Thursday, August 21, 2014

Compiling meteoIO Examples with Eclipse (under Mac OS 10.4.9)

This is part of an ongoing effort to compile GEOtop 2.0 under Eclipse and, subsequently, to embed part of it in OMS v3.

Since working with GEOtop 2.0 requires knowing what meteoIO does, I started from there.

To keep it simple, I first compiled the meteoIO libraries from the command line, as described in the related post (compiling meteoIO itself under Eclipse is a task to explore later). Then I opened Eclipse. For the occasion I made a fresh installation of the recent Luna IDE, with the default CDT for managing C/C++ projects.

I created a new project for one of the examples. (From the command line, following the instructions on the meteoIO website, you can compile all the examples at once. However, if you run into problems executing them, you need to explore the content of the examples, and for that you need an IDE or, at least, a text editor to browse the files.)

Then you have to add the example file you need. Assume, for instance, the file called time.cpp.
Import it into Eclipse as follows:

  • ->File
  • ->Import
  • ->File System
  • -> Next Button 
  • -> Browsing directories 
  • -> Selecting the desired file by checking the box 
  • -> Finish


Before you can compile successfully, you have to tell the toolchain two things: where the meteoIO include files are (they contain the declarations of the meteoIO classes and methods), and where the meteoIO libraries are. The first is required at compile time, the second at link time.

In particular, the meteoIO libraries are meteoio, meteoio.2, and meteoio.2.4.3 (please note that under Mac OS X each of them is named lib*.dylib, where * stands for any of the three names above). I actually placed them in one of my working directories, but a standard location would probably be a better choice.

The include files were instead placed under /usr/local/include (they were actually under /usr/local/include/meteoio, but that last directory is specified inside the code).

Here are the instructions for making the libraries visible to the linker:
  • -> right click on the project
  • -> find “Properties” on the menu and click it
  • -> Expand “Mac osx C/C++ linker”
  • -> select Libraries
  • -> specify the name of the libraries (without lib and .dylib)
  • -> specify the path where the linker can locate the libraries
  • -> Finish


Here instead are the instructions for making the include files visible:
  • -> Right click on the project
  • -> Select Properties
  • -> Expand the options
  • -> choose “C/C++ General”
  • -> choose “Path and symbols”
  • -> Browse the filesystem to find the /usr/local/include directory
  • -> Finish


Then you are ready to build your project and execute it. ^1^2^3
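
For readers who want to check the same settings outside Eclipse, the two dialogs above correspond to the usual compiler options: -I for the include directory, -L for the library directory and -l for the library name. What follows is only a minimal sketch, assuming g++ as the compiler; the library directory /path/to/meteoio/lib is a placeholder and the function names are hypothetical. It just assembles the command line, it does not run it.

```python
def dylib_filename(name):
    """File name of a Mac OS X shared library given its linker name."""
    return "lib" + name + ".dylib"


def build_command(source, include_dir, lib_dir, libs, compiler="g++"):
    """Assemble a compile-and-link command as a list of arguments."""
    executable = source.rsplit(".", 1)[0]      # time.cpp -> time
    command = [compiler, source]
    command.append("-I" + include_dir)         # where the headers are (compile time)
    command.append("-L" + lib_dir)             # where the libraries are (link time)
    command += ["-l" + lib for lib in libs]    # names without "lib" and ".dylib"
    command += ["-o", executable]
    return command


if __name__ == "__main__":
    # Placeholder library directory: replace it with your own location.
    cmd = build_command("time.cpp", "/usr/local/include",
                        "/path/to/meteoio/lib", ["meteoio"])
    print(dylib_filename("meteoio"))
    print(" ".join(cmd))
```

The name passed with -l is the same one typed in the Libraries dialog, i.e. without the "lib" prefix and the ".dylib" suffix.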

______________________________________________________________________________

^1 - Please be aware that when you change the build configuration, e.g. from "Debug" to "Release" (via the hammer icon in Eclipse), you have to tell the linker again where the libraries are.

^2 - The executables can be found under Workspace Folder/Project Folder/Release, where "Workspace Folder" is the directory declared when starting Eclipse and "Project Folder" is the name of the current project.

^3 - Many meteoIO examples require some input and do not fail gracefully when this input is not provided (they give the error "segmentation fault 11").

Data cleaning is part of any science process


I quote this post verbatim from the R blog Revolution.

"A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing, cleaning and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of the data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying,

“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”

As an illustration of this point, check out the essay by Julia Evans, Machine learning isn't Kaggle competitions (hat tip: Drew Conway). A Kaggle competition typically presents a nice, clean, regularized data set to the competitors, but this isn't representative of the real-world process of making predictions from data. As Julia points out:

Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.

While there are projects underway to help automate the data cleaning process and reduce the time it takes, the task of automation is made difficult by the fact that the process is as much art as science, and no two data preparation tasks are the same. That's why flexible, high-level languages like R are a key part of the process. As Mitchell Sanders notes in a Tech Republic article,

Data science requires a difficult blend of domain knowledge, math and statistics expertise, and code hacking skills. In particular, he suggests that expert knowledge of tools like R and SAS are critical. "If you can't use the tools, you can't analyze the data."

This is a critical step to gaining any kind of insight from data, which is why data scientists still command premium salaries today, according to data from Indeed.com."

Wednesday, August 20, 2014

Rills in Utah

Going from Zion National Park to Arches National Park, I had the occasion to see several astonishing geological landscapes. Particularly exciting to me were the rills that I could observe along the road, eroded in different types of lithology. Below are some images at low resolution (click on an image to get the higher resolution). Unfortunately I took them with a cell phone, and therefore they are not as clear as they could be.
In the first image, where red sedimentary rocks are present, the rills are in the central part of the image. At the top, processes other than erosion dominate, i.e. rockfall, but in the deposits at the bottom rills and fluvial-type forms appear everywhere.
Same as above, but in a different material. At the bottom, rock formations that repeat sequentially are also present. The geometry there is convex-divergent but still filled with rills.
Same material as above. Different geometries, more pronounced aggregative formations.
Same area as above, with a complete network eroded in the center of the image. Remarkably, a sedimentary (?) layer runs across the formation without much effect on the rill structure.


I do not know if a laser altimetry survey of the area is available but, if so, it would be really interesting for geomorphologists to analyse the literally thousands of rills and river networks that formed in this arid environment.

"Two Rivulets side by side,
Two blended, parallel, strolling tides,
Companions, travelers, gossiping as they journey."

W. Whitman

Friday, August 1, 2014

What is life ? (by Erwin Schroedinger) and Hydrology

The excuse for this blog post was reading an old (1944) little book entitled “What is life ?” by Erwin Schroedinger. It presents a physicist's point of view on life before the discovery of the structure of DNA, and it actually influenced the subsequent research by Watson and Crick.
My reading, besides being driven by general curiosity, had a purpose. Hydrology, especially in its very modern form called ecohydrology (see also here), has a lot to do with the complexity of physical, chemical and biological interactions. However, even the more physical aspects of hydrology, as they unfold in space, present patterns, heterogeneities and feedbacks that are by themselves overwhelmingly complex. Therefore, grasping the method used there, for understanding life, could help in finding a method here, in hydrology. The whole book is enjoyable; however, my commentary here covers mostly three chapters: the first and the sixth and, very briefly, the seventh. Excerpts from the book are in italics, my own notes in normal characters.

CHAPTER 1 - The Classical Physicist’s Approach to the Subject

INTRODUCTION

“.. though warned at the outset that the subject-matter was a difficult one a …, even though the physicist’s most dreaded weapon, mathematical deduction, would hardly be utilized. The reason for this was not that the subject was simple enough to be explained without mathematics, but rather that it was much too involved to be fully accessible to mathematics.”

Here I see a parallel with many hydrological processes, for instance hillslope processes. Many outstanding colleagues support the idea that the physics of the subject is too complex to be treated mathematically.

The large and important and very much discussed question is: How can the events in space and time which take place within the spatial boundary of a living organism be accounted for by physics and chemistry? The preliminary answer which this little book will endeavor to expound and establish can be summarized as follows: The obvious inability of present-day physics and chemistry to account for such events is no reason at all for doubting that they can be accounted for by those sciences.

Now, just substitute “river basin” for “living organism” and you have an answer to the first question for hydrology. It is indubitable that, in the seventy years since the publication of the book, biology itself, and molecular biology in particular, has taken many steps in the direction traced by E.S. It is equally clear that the knowledge of hydrological processes, and the establishment of Hydrology as a physical science since the work by P. Eagleson, have made extraordinary leaps forward.

STATISTICAL PHYSICS. THE FUNDAMENTAL DIFFERENCE IN STRUCTURE


Yet the difference which I have just termed fundamental is of such a kind that it might easily appear slight to anyone except a physicist who is thoroughly imbued with the knowledge that the laws of physics and chemistry are statistical throughout.

This statement applies verbatim to Hydrology.

THE NAIVE PHYSICIST'S APPROACH TO THE SUBJECT

I propose to develop first what you might call 'a naive physicist's ideas about organisms', that is, the ideas which might arise in the mind of a physicist who, after having learnt his physics and, more especially, the statistical foundation of his science, begins to think about organisms and about the way they behave and function and who comes to ask himself conscientiously whether he, from what he has learnt, from the point of view of his comparatively simple and clear and humble science, can make any relevant …

Substitute “organisms” with hydrology, hydrological processes, watersheds, at your convenience.

WHY ATOMS ARE SO SMALL ?

Why are atoms so small? … Suppose that you could mark the molecules in a glass of water; then pour the contents of the glass into the ocean and stir the latter thoroughly so as to distribute the marked molecules uniformly throughout the seven seas; if then you took a glass of water anywhere out of the ocean, you would find in it about a hundred of your marked molecules.

Besides being a truly hydrological example, attributed to Lord Kelvin, it also spans the scales of hydrology, from the molecule (in the quantum domain) to the oceans (so-called global hydrology).
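
Kelvin's example can be checked with a few lines of arithmetic. The sketch below is only an order-of-magnitude estimate under assumed values that are not from the book: a glass of 0.25 litres, an ocean volume of about 1.335 billion cubic kilometres, and the density of fresh water everywhere. With these numbers the count comes out at the order of a thousand rather than a hundred; since the result grows with the square of the glass volume, a smaller glass brings it closer to the figure in the excerpt.

```python
AVOGADRO = 6.022e23          # molecules per mole
MOLAR_MASS_WATER = 18.015    # grams per mole
WATER_DENSITY = 1000.0       # grams per litre (fresh water, assumed everywhere)


def molecules_in(volume_litres):
    """Number of water molecules in a given volume of water."""
    return volume_litres * WATER_DENSITY / MOLAR_MASS_WATER * AVOGADRO


def marked_molecules_per_glass(glass_litres, ocean_litres):
    """Marked molecules found in one glass after perfect mixing in the ocean."""
    marked = molecules_in(glass_litres)       # molecules poured into the ocean
    fraction = glass_litres / ocean_litres    # share of the ocean taken back out
    return marked * fraction


if __name__ == "__main__":
    # Assumed ocean volume: 1.335e9 km^3, with 1 km^3 = 1e12 litres.
    ocean_litres = 1.335e9 * 1e12
    print(round(marked_molecules_per_glass(0.25, ocean_litres)))
```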

CHAPTER 6 - Order, Disorder and Entropy

A REMARKABLE GENERAL CONCLUSION FROM THE MODEL

From Delbruck's general picture of the … substance it emerges that living matter, while not eluding the 'laws of physics' as established up to date, is likely to involve 'other laws of physics' hitherto unknown, which, however, once they have been revealed, will form just as integral a part of this science as the former.

Substitute “modern Hydrology” for “Delbruck's” and “hydrological processes” for “living matter”. Where these other laws are is the new frontier of hydrology. A frontier that has indeed been envisioned for some time, because I cannot deny that I can see in it the “Gold medal search” of Ignacio Rodriguez-Iturbe's own work.

….

LIVING MATTER EVADES THE DECAY TO EQUILIBRIUM 

When a system that is not alive is isolated or placed in a uniform environment, all motion usually comes to a standstill very soon as a result of various kinds of friction; differences of electric or chemical potential are equalized, substances which tend to form a chemical compound do so, temperature becomes uniform by heat conduction. After that the whole system fades away into a dead, inert lump of matter. A permanent state is reached, in which no observable events occur. The physicist calls this the state of thermodynamical equilibrium, or of 'maximum entropy'.

There is poetry in this passage, but it could be subtly imperfect: natural systems usually work under disequilibrium conditions. In fact, E.S. remarks on this later in the chapter. However, not only living organisms but also eco-hydro-systems work the same way, even if at a more aggregate and “higher” level of organisation. Spatially organised physical systems, like river networks, and hydrological interactions work the same way, and they often show the same type of complex organisation. For their organisation, obviously, we would be less inclined to talk about evading equilibrium conditions, and there we would probably be correct, but at the same time a little wrong …

IT FEEDS ON ‘NEGATIVE ENTROPY’

By eating, drinking , breathing and (in case of plants) assimilating. The technical term is metabolism. The Greek word means change or exchange. Exchange of what? Originally the underlying idea is, no doubt, exchange of material …That the exchange of material should be the essential thing is absurd …  For a while in the past our curiosity was silenced by being told that we feed upon energy …Needless to say, taken literally, this is just as absurd. … Every process, event, happening -call it what you will; in a word, everything that is going on in Nature means an increase of the entropy of the part of the world where it is going on.

I do not completely agree with the excerpted phrases. E.S. himself, in commenting further, moves away from this strict vision. Entropy represents the uncertainty in the microscopic configurational space of kinetic energy. However, it is driven by energy which, like mass (because space-time is locally hyperbolic and we work in non-relativistic conditions), is conserved. It is precisely the supply of heat that moves water from a less entropic state (ice) to a more entropic state (vapor). Once in a given energetic state, the configuration of the water molecules is (more or less) the most probable one but, as experience teaches, the way the passage between energetic states takes place can strongly affect the final “metastable” configuration (snowflakes, for instance, are an example). So for living systems, as well as for hydrological fluxes and states, metastable, out-of-equilibrium states are the key. Once the systems are no longer fed with mass and energy, they decay to a stable state, which is at the same time a state of feasible minimum potential energy and feasible maximum entropy. Metastability is intrinsic to everything. The universe itself, as we conceive it, is a metastable state that evolves from the Big Bang. It would be odd if the same were not true for hydrological fluxes.

CHAPTER 7 - Is Life based on the Laws of Physics ?

The title itself is compelling. E.S. certainly opens many questions, as in: NEW LAWS TO BE EXPECTED IN THE ORGANISM. He concludes that new laws are to be expected, emerging (though the word did not have this meaning seventy years ago) from disorder, or organising the new order that appears at macroscopic scales:

The orderliness encountered in the unfolding of life springs from a different source. It appears that there are two different 'mechanisms' by which orderly events can be produced: the 'statistical mechanism' which produces order from disorder and the new one, producing order from order.

The same type of problem can arise in watershed hydrology too (read the title as: Is Hydrology based on the Laws of Physics ?). Current practice holds that the collective work of many water molecules and their interactions can be described, under certain circumstances, by macroscopic laws in which the collective behaviour, the spatial structure of the problem, or other conditions are more important than the simple molecular dynamics (think of the residence time interpretation of the Instantaneous Unit Hydrograph, for the Italians, here, or, remaining on the same topic, the fact that the hydrologic response is determined mainly by the geomorphic organisation rather than by the Navier-Stokes equations).

In “THE NEW PRINCIPLES ARE NOT ALIEN TO PHYSICS”, E.S. in fact claims that the new physics is still physics, even if, in some sense, super-physical. He seems to me to be in search, certainly not yet concluded, of a unifying principle for understanding the stratification of reality, even physical reality, in layers, each one governed by its own rules. This was enunciated more recently (the translation into English is mine) as follows:

“We cannot deny that our universe is not a chaos; we recognise beings, objects that we call by names. These objects or things are forms, structures endowed with a certain stability; they fill a certain portion of space and last for a certain time …”


The search for scaling, scale invariance and scale breaking in hydrology, which made history in the last two decades, was the analogous search for understanding these higher levels of organisation of hydrological processes, which are still quite elusive.

_______________________________________________________________________________

On the same topics as What is life ?, I also found the Ph.D. thesis by Nathaniel Virgo, entitled “Thermodynamics and the structure of living systems”. He is also the author of interesting papers listed on his website.
Besides E.S.'s work, the thesis also cites the previous work by Morowitz and an interesting paper by Schneider and Kay.

References

- N.Virgo, Thermodynamics and the structure of living systems, University of Sussex, 2011
- Morowitz, H. (1968). Energy flow in biology. New York and London: Academic Press.
- Morowitz, H. (1978). Foundations of bioenergetics. Academic Press.

- Schneider, E. D., & Kay, J. J. (1994). Life as a manifestation of the second law of thermodynamics. Mathematical and Computer Modelling, 19(6–8), 25–48.

Wednesday, July 30, 2014

Uncertainty and Information Theory

We are all persuaded that uncertainty is a big topic, in life but also in hydrology. So important that many hydrologists dedicate their life to its estimation in connection with hydrological processes. Uncertainty, being uncertain, also generates confusion, and some of its literature is confused and confusing (I don't want to cite anyone negatively, but I could).
Whatever the case, one of the best talks I attended at the last American Geophysical Union Fall Meeting was the invited lecture by Hoshin Gupta. Hoshin has an outstanding (really outstanding, I mean) career in calibration methods, parameter identifiability and understanding uncertainty in models. Recently (see for instance Gong et al., 2013) he started to apply concepts derived from information theory to hydrology. BTW, you can find the PDFs of his AGU presentations here: one on the necessity of applying information theory concepts to evaluate models' structural hypotheses, and another one about information theory and Bayesian inference in hydrology (both with a lot of citations).

I never really understood why hydrologists do not use information theory concepts. I-Theory is a well-developed mathematical theory with a lot of tools, and it could help to get out of the fuzziness around the determination of uncertainty in models. Besides, using the I-Theory concepts of information and uncertainty, one can gain knowledge about the complexity of process outputs and, possibly, infer something about the "complexity" of the models required to account for them mathematically in a proper way (remember: "Everything should be made as simple as possible but not simpler").
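
As a concrete taste of these tools, the most basic I-Theory quantity is Shannon's entropy, H = -sum(p_i log2 p_i), which measures in bits the uncertainty of a discrete variable. The sketch below is illustrative only and is not taken from any of the papers cited here: it bins a series into equal-width classes and computes the entropy of the binned values. The function names, the number of bins and the sample values are assumptions, and the estimate depends on the binning.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits, of the empirical distribution of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def discretize(values, n_bins):
    """Map each value to the index of its equal-width bin (0 .. n_bins - 1)."""
    low, high = min(values), max(values)
    if high == low:
        return [0 for _ in values]            # constant series: a single bin
    width = (high - low) / n_bins
    # The maximum falls on the upper edge, so it is clamped into the last bin.
    return [min(int((v - low) / width), n_bins - 1) for v in values]


if __name__ == "__main__":
    # Made-up discharge values, for illustration only.
    discharge = [1.0, 1.2, 0.9, 5.0, 7.5, 3.1, 1.1, 1.0]
    print(shannon_entropy(discretize(discharge, 4)))
```

A constant series has zero entropy, while a series spread evenly over the bins reaches the maximum of log2(n_bins) bits.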

Hoshin is not the only one attracted by information theory. In my occasional browsing of the topic, I also found some other interesting papers. The first one, by Majda and Gershgorin, is concerned with climate models. This is encouraging, because climate models are certainly at least as involved as hydrological models, if not more. A second is Weijs et al. (2013), which is concerned with time series: we compare time series, therefore knowing how much information is hidden in a time series (at least with reference to some encoding key) is certainly useful. For Weijs and van de Giesen, this paper is just a return to the topic (see also Weijs et al., 2010, and Weijs CV).

Another paper came from Ruddell in Eos, remarkably highlighting that I-Theory applications to hydrology attracted many more people last year than they used to.
To make myself feel among the smarter ones, I bought a book by Mezard (see also, and GS) and Montanari (Andrea, not our colleague Alberto, who also has quite a production on uncertainty: please see his website), which can be a further source of ideas and thoughts.

So far, I have not actually read any of the papers (or the book) carefully, but I am excited at the idea of having time to do it in depth.

References

Gong, W., H. V. Gupta, D. Yang, K. Sricharan, and A. O. Hero III (2013), Estimating epistemic and aleatory uncertainties during hydrologic modeling: An information theoretic approach, Water Resour. Res., 49, 2253–2273, doi:10.1002/wrcr.20161.

Mézard, M. and Montanari, A. , Information, Physics, and Computation, Oxford University press, 2009

Majda, A. J.,  and Gershgorin, B., Quantifying uncertainty in climate change science through empirical information theory, PNAS, August 24, 2010, vol. 107,no. 34, 14958–14963

Ruddell, B. L., N. A. Brunsell and P. C. Stoy, Applying Information Theory in the Geosciences to Quantify Process Uncertainty, Feedback, Scale, Eos, Vol. 94, No. 5, 29 January 2013

 Weijs, S. V.;  Schoups, G.  and van de Giesen, N., Why hydrological predictions should be evaluated using information theory, Hydrol. Earth Syst. Sci., 14, 2545-2558, 2010, www.hydrol-earth-syst-sci.net/14/2545/2010/, doi:10.5194/hess-14-2545-2010

Weijs, S. V., van de Giesen, N. and Parlange, M. B., Data compression to define information content of hydrological time series, Hydrol. Earth Syst. Sci., 17, 3171–3187, 2013 www.hydrol-earth-syst-sci.net/17/3171/2013/ doi:10.5194/hess-17-3171-2013

Tuesday, July 22, 2014

Patterns for the application of modern informatics to the integration of PDEs: the case of the Boussinesq Equation

Today Francesco Serafin graduated, completing his master's degree in civil and environmental engineering. In brief, the scope of his thesis was to implement a series of classes, eventually to be ported to OMS, to solve the groundwater Boussinesq equation, but with the broader aim of envisioning an object-oriented structure that could work for any PDE. Let Francesco's introduction speak:

"Mathematical models play a fundamental role in many scientific and engineering fields in today’s world. They are used for example in geotechnics to evaluate the hillslope stability, in weather science to predict weather trends and produce weather reports, in structural design to study the resistance to stress, and in fluid dynamics to compute fluid flows and air flows.

Consequently mathematical models are evolving all the time: more and more new numerical methods are being invented to solve the Partial Differential Equations (PDEs) that describe physical problems with increasing precision, and more and more complex and efficient processor units are being created to reduce the computational time.
Therefore, the code into which the mathematical models are translated has to be “dynamic” in order to be easily updated on the basis of the continuous developments (Formetta et al. (2014) [16]).
On the other hand, completely different physical problems are often described using similar PDEs. For this reason, the numerical methods which provide solutions to different problems can be the same. This suggests the implementation of an IT infrastructure that hosts a standard structure for solving PDEs and that can serve various disciplines with the minimum of hassles.

This work is focused on the application of what is envisioned above, with the main purpose of the creation of an abstract code for implementing every type of mathematical model described by PDEs.

We work on hydrological topics but we hope to design a structure of general interest. Obviously the final goal of any work of this type is to find a proper numerical solver, and therefore, part of the thesis is devoted to the analysis of the problem under scrutiny, and the description of the solution found."
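
To give an idea of what such a common structure might look like, here is a toy sketch. It is not taken from the thesis and it is in Python only for brevity: the class and function names are hypothetical, and the concrete problem is an explicit scheme for 1D linear diffusion with fixed boundary values, used as a stand-in, not the Boussinesq equation. The point is only that the time loop talks to an abstract interface, so a different PDE can be plugged in without touching it.

```python
from abc import ABC, abstractmethod


class PdeProblem(ABC):
    """Hypothetical common interface: advance a discrete state by one step."""

    @abstractmethod
    def step(self, state, dt):
        """Return the new state after a time step dt."""


class ExplicitDiffusion1D(PdeProblem):
    """Forward-Euler linear diffusion on a uniform grid, fixed boundary values."""

    def __init__(self, diffusivity, dx):
        self.diffusivity = diffusivity
        self.dx = dx

    def step(self, state, dt):
        r = self.diffusivity * dt / self.dx ** 2   # stable only if r <= 0.5
        new_state = list(state)                    # boundary nodes stay unchanged
        for i in range(1, len(state) - 1):
            new_state[i] = state[i] + r * (state[i + 1] - 2 * state[i] + state[i - 1])
        return new_state


def integrate(problem, state, dt, n_steps):
    """Generic time loop: it only knows the PdeProblem interface."""
    for _ in range(n_steps):
        state = problem.step(state, dt)
    return state


if __name__ == "__main__":
    problem = ExplicitDiffusion1D(diffusivity=1.0, dx=1.0)
    print(integrate(problem, [0.0, 0.0, 1.0, 0.0, 0.0], dt=0.25, n_steps=1))
```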

Tuesday, July 8, 2014

Quickness and exactitude

I post here an internal review of one of our manuscripts because, I hope, it can be useful in general. The topic is evaluating the rainfall-runoff of a small catchment (but I hoped it was an estimation of the global hydrological cycle, even if without evapotranspiration measurements).

"The paper is written in good English (finally). However, good English does not make a good paper. It lacks focus and is not concise (lack of exactitude and quickness, see the end of the post). The objectives are not clear, and the novelties of the paper are not evident. However, I do not despair of obtaining something reasonable in the end, but only because I know the amount of work behind it and, in part, the raw material.

Making a rainfall-runoff model cannot usually be considered an exercise at the frontier of our science (citing conversations with Ignacio Rodriguez-Iturbe; however, it could be, as testified by Gunther Bloschl's ERC). Rainfall-runoff modelling certainly can bring information about a certain basin. However, in our case, the works of N* and M* already filled this space. So what is the goal of this paper?
The initial idea was to assess the uncertainty in the prediction of discharges by using appropriate statistical techniques. In particular, the idea was to assess the uncertainty inherent in the extrapolation of rainfall from point measurements to spatial estimates.
This task has been only partially fulfilled, for the following reasons: errors due to instrument precision were not included (perfectly functioning instruments were simply assumed); the way rainfall was included in the model is not yet clear (whether average rainfall, one point for each hillslope, average rainfall volume, or other approximations were used), and no sensitivity analysis was performed with respect to the way distributed rainfall was squeezed into the model; the interplay between rainfall and discharge forecasting is not developed as well as it could be, i.e. how it works inside the whole procedure is not explained well.
Therefore the overall rainfall prediction analysis is incomplete, and I expect it to be completed for the thesis.
The technical novelty of this work is that we use a calibration tool (LUCA) to estimate variograms, and we do it at an hourly time step, while others usually do it at a daily time step. A few questions here: how much does this approach improve rainfall estimates? I.e., taking uncalibrated variograms and/or constant variograms (not varying in time), how much difference do we get? How much does this affect the forecasting of the volumes of water? What overall effect does this have on the forecasting of the discharges?

It could be that all of these approximations have negligible effects on the forecasting of discharges. But this would indeed be good to know, and an achievement that has not been obtained so far.

A second topic of interest was the simulation of the whole hydrological cycle, and an attempt to close the hydrological budget with the Priestley-Taylor simulation of evapotranspiration. These simulations were done but not shown at all in the manuscript. Why not? Do the simulated discharges and the simulated ET sum to the total volume of rainfall? If not, what is our interpretation of the missing mass? Are we able to assess the uncertainty in the predictions of each single component of the hydrological cycle obtained with this method? Are we able to observe interannual variability (both in discharges and evapotranspiration and, where applicable, in storage)? Is this variability estimate reliable, at least as a gross budget?

Having failed to answer each of the questions above, the paper ends up wandering around, which breaks our karma (citation from Vijay K. Gupta). Please save us with more rigor."
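
The budget closure question raised in the review amounts to computing the residual of P = Q + ET + ΔS over a given period. A minimal sketch follows, with hypothetical function names and made-up numbers (they are not the catchment's data), all expressed in mm over the same period:

```python
def budget_residual(rainfall, discharge, evapotranspiration, storage_change=0.0):
    """Water not accounted for: P - Q - ET - dS, in the same units as the inputs."""
    return rainfall - discharge - evapotranspiration - storage_change


def relative_residual(rainfall, discharge, evapotranspiration, storage_change=0.0):
    """Residual expressed as a fraction of the rainfall."""
    residual = budget_residual(rainfall, discharge, evapotranspiration, storage_change)
    return residual / rainfall


if __name__ == "__main__":
    # Made-up annual totals in mm, for illustration only.
    print(budget_residual(1000.0, 400.0, 500.0))
    print(relative_residual(1000.0, 400.0, 500.0))
```

Repeating the computation year by year would show whether the residual is systematic or varies with the interannual variability mentioned in the review.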

Regarding quickness and exactitude, I suggest the reading of Italo Calvino's Six Memos for the next Millennium.^1^2

^1 - Here is a video seminar on the Six Memos by Paolo Granata
^2 - Wandering around, in a digression maybe, and unfortunately in Italian: the Discorso sulla Matematica (Talk on Mathematics), inspired and guided by Calvino's lectures, written by Gabriele Lolli