Thursday, November 23, 2017

Monday's discussion on evapotranspiration - Part I - The vapor budget

Last Monday at lunch, my students and I discussed evapotranspiration. I have already talked about it in various posts here. The starting point, however, was the impression, coming from one of my students, that the topic of transpiration is still in its infancy. I agreed with him and offered my synthesis.

  1. In hydrology we use Dalton’s law (here, slide 21) or the equations derived from it, known as Penman-Monteith and Priestley-Taylor (forgetting all the empirical formulas); a minimal sketch of the Penman-Monteith form is given right after this list.
  2. Dalton’s law puts together diffusive vapor flux, vapor storage and turbulent transport.
  3. We should instead have a water vapor budget equation, written for some control volume, where all the stuff is in its right place.
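
A minimal sketch, in Python, of the Penman-Monteith combination equation mentioned in point 1. This is not the post's own code: the function name, the variable names, the units and the constant values are illustrative assumptions.

```python
import math

# Hedged sketch of the Penman-Monteith combination equation (assumed form and constants).
# R_n, G in W m-2; T_air in deg C; e_s, e_a in kPa; r_a, r_s in s m-1; pressure in kPa.
def penman_monteith(R_n, G, T_air, e_s, e_a, r_a, r_s, pressure=101.3):
    """Latent heat flux [W m-2] from the combination equation."""
    rho_a = 1.2                               # air density [kg m-3], assumed constant
    c_p = 1013.0                              # specific heat of moist air [J kg-1 K-1]
    lam = 2.45e6                              # latent heat of vaporization [J kg-1]
    gamma = c_p * pressure / (0.622 * lam)    # psychrometric constant [kPa K-1]
    # slope of the saturation vapor pressure curve at T_air [kPa K-1]
    delta = 4098.0 * 0.6108 * math.exp(17.27 * T_air / (T_air + 237.3)) / (T_air + 237.3) ** 2
    num = delta * (R_n - G) + rho_a * c_p * (e_s - e_a) / r_a
    den = delta + gamma * (1.0 + r_s / r_a)
    return num / den
```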

It is not difficult to recognize that, in our case, the control volume is bounded by the open atmosphere and by the surfaces that emit vapor as a consequence of their thermodynamics. It is also not difficult to recognize that such bounding surfaces can be very complicated, as those of a canopy, for instance, which also varies in shape and form with time. Out of these surfaces comes a vapor flux which depends on the thermodynamics and physiology existing below them, on the number and distribution of the openings through which the vapor is emitted, on the water availability (and dynamics) in the storage below the surface and, last but not least, on the vapor content in the control volume which, as in Dalton’s law, commands the driving force.

It is not easy, actually, to account well for all of these factors. We can, maybe, for a single leaf. It is more complicated for the canopy of a single tree. It is even more difficult for a forest. Unless some goddess acts to simplify the vapor budget, over the billions of details, and reduces it all to some tractable statistics (we can call this statistics the Holy Grail of evapotranspiration, or of the whole of Hydrology itself).

Assume we can deal with it, and we have the fluxes right. Then the vapor budget seems crystal clear to obtain: the variation of vapor in the control volume is given by the incoming vapor flux, minus the outgoing vapor flux, minus, in case, the vapor condensation. The good old mass conservation (a minimal numerical sketch is given below). Unfortunately, the output flux is also not that easy to estimate, because the transport agent is atmospheric turbulence, which is affected by the non-linearities of the Navier-Stokes equations (NSeq) and by their interactions with the complex boundary represented by the terrain/vegetation surfaces. All of this involves a myriad of spatio-temporal scales and degrees of freedom which are not easy to simplify.
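
A minimal numerical sketch of that budget, under the (strong) assumption that all the fluxes are known; names and numbers are illustrative only.

```python
# Hedged sketch of the vapor mass budget of a control volume, with fluxes assumed known.
def vapor_budget(W0, F_in, F_out, C, dt):
    """Forward-Euler update of the vapor mass W [kg] stored in the control volume.

    F_in, F_out : incoming and outgoing vapor fluxes at each step [kg/s]
    C           : condensation rate at each step [kg/s]
    dt          : time step [s]
    """
    W = [W0]
    for fin, fout, c in zip(F_in, F_out, C):
        W.append(W[-1] + dt * (fin - fout - c))
    return W
```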

Therefore, the literature treats evaporation as a flux, forgets the real mechanics of the fluxes and simplifies turbulence according to similarity theory, essentially due to Prandtl’s work at the beginning of the last century with the additions of Monin-Obukhov theory. However, in real cases, the hypotheses of similarity theory are easily broken and the velocity distributions are rarely those expected. All of this makes the transport theory, applied in a pedestrian way (as we do), largely unreliable. See the References below and here a quite informed lecture to get a deeper view.
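
To fix ideas, the neutral-stability core of that similarity theory is the logarithmic wind profile. Here is a minimal sketch of mine (not taken from the lecture linked above); the friction velocity and roughness length used are assumed values.

```python
import math

# Hedged sketch of the neutral logarithmic wind profile used by similarity theory.
VON_KARMAN = 0.41

def log_wind_profile(z, u_star, z0):
    """Mean wind speed [m/s] at height z [m], for friction velocity u_star [m/s]
    and roughness length z0 [m], under neutral conditions."""
    return (u_star / VON_KARMAN) * math.log(z / z0)

# e.g., an assumed friction velocity of 0.3 m/s over a grass-like surface (z0 = 0.01 m)
speeds = [log_wind_profile(z, u_star=0.3, z0=0.01) for z in (0.5, 2.0, 10.0)]
```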

Summarizing, the transport is complicated because turbulence interacting with complex surfaces is complicated (probably it would be better to say "complex"). Numerically, it is a problem whose solution is still open (a full branch of science, indeed), and we do not know how to model the rustling of leaves (“and the icy cool of the far, far north, with rustling cedars and pines”).
In fact, the models we use for what we usually call potential evapotranspiration are an extreme simplification of this wish list.

Finally, the above picture forgets the role of air and soil temperature (thermodynamics). We were thinking, in fact, only of the mass budget and the momentum budget (the latter is what the NSeq are), but there is no doubt that evaporation and transpiration are commanded also, and in many ways, by the energy budget. Turbulence itself is modified by temperature gradients, and so is the water vapor tension, which concurs to establish the quantity of water vapor ready to be transported at the vapor-emitting surfaces. The good news is that energy is conserved as well, but this conservation includes the phase of the matter transported (the so-called latent heat). So, necessarily, in relevant hydrological cases, we have to solve, besides the mass and momentum conservation, the energy conservation itself.


Looking for simplified versions of the mass, momentum and energy budgets would require a major rethinking of all the derivations and a new impulse to proper measurements, a path that some authors have already started to follow (e.g., Schymanski, Or and coworkers, here).

References

Tuesday, November 14, 2017

Open Science

Nothing really original in this post. I just recollect what is already said on the FosterOpenScience web pages. Their definition is:

"Open Science represents a new approach to the scientific process based on cooperative work and new ways of diffusing knowledge by using digital technologies and new collaborative tools (European Commission, 2016b:33). The OECD defines Open Science as: “to make the primary outputs of publicly funded research results – publications and the research data – publicly accessible in digital format with no or minimal restriction” (OECD, 2015:7), but it is more than that. Open Science is about extending the principles of openness to the whole research cycle (see figure 1), fostering sharing and collaboration as early as possible thus entailing a systemic change to the way science and research is done." The wikipedia page is also useful.

This approach goes along with that of doing reproducible research, which I have already talked about several times. I do not have very much to add to what they wrote, but I also want you to note that "there are in fact multiple approaches to the term and definition of Open Science, that Fecher and Friesike (2014) have synthesized and structured by proposing five Open Science schools of thought".

In our work, a basic assumption is that openness also requires the appropriate tools, and we are working hard to produce them and to use the others that make a scientific workflow open.

Friday, November 10, 2017

About Benettin et al. 2017, equation (1)

Gianluca (Botter), in his review of Marialaura (Bancheri)'s Ph.D. thesis, brought to my attention the paper Benettin et al. 2017. A great paper indeed, where a couple of ideas are clearly explained:

  • SAS functions can be derived from the knowledge of travel and residence time probabilities
  • a virtual experiment where they show that traditional pdfs (travel time pdfs) can be seen as the ensemble of the actual time-varying travel time distributions.

The paper is obviously relevant also for the hydrological contents it explains, but it is not the latter point that I want to argue about a little. I want here just to argue about the way they present their first equation.

SAS stands for StorAge Selection functions, and they are defined, for instance in Botter et al. 2011 (with a little difference in notation), as:
$$
\omega_x(t,t_{in}) = \frac{p_x(t-t_{in}|t)}{p_S(t-t_{in}|t)} \ \ \ (1)
$$
as the ratio between the travel time probability related to output $x$ (for instance discharge or evapotranspiration) and the residence time probability.
In the above equation (1)
  •  $\omega_x$ is the symbol that identifies the SAS
  • $t$ is the clock time
  • $t_{in}$ is the injection time, i.e. the time when water has entered the control volume
  • $p_x(t-t_{in}|t)$, with $x \in \{Q, ET, S\}$, is the probability that a water molecule which entered the system at time $t_{in}$ is found, at time $t$, inside the control volume (storage, $S$), in the discharge, $Q$, or in the evapotranspiration, $ET$
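
Just to make equation (1) concrete, here is a minimal numerical sketch of mine (not from the paper): given discretized backward pdfs of travel time and residence time, the SAS function is their pointwise ratio. The two exponential pdfs are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of equation (1): SAS function as the ratio of the travel time pdf of
# output x to the residence time pdf, both conditional on the clock time t.
ages = np.linspace(0.0, 100.0, 1001)       # T = t - t_in [days]
p_x = np.exp(-ages / 10.0) / 10.0          # assumed travel time pdf of output x (e.g. Q)
p_S = np.exp(-ages / 20.0) / 20.0          # assumed residence time pdf of the storage
omega_x = p_x / p_S                        # SAS function, equation (1)
```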

Equation (1) in Benettin et al. is therefore written as
$$
\frac{\partial S_T(T,t)}{\partial t} + \frac{\partial S_T(T,t)}{\partial T} = J(t) - Q(t) \Omega_Q(S_T(T,t),t)-ET(t) \Omega_{ET}(S_T(T,t),t) \ \ \ \ (2)
$$
Where:

  • $T$ is the residence time (they call it water age, but this could be a little misleading because, by their own theory, the age of water could be different in storage, discharge and evapotranspiration)
  • $S_T$ is the age-ranked storage, i.e. “the cumulative volumes of water in storage as ranked by their age” (I presume the word “cumulative” implies some integration. After thinking a while and looking around, also at the paper by van der Velde et al. 2012, I presume the integration is over all the travel times up to $T$, which, because the variable of integration in my notation is $t_{in}$, means that $t_{in} \in [t-T, t]$)
  • $J(t)$ is  the precipitation rate at time $t$
  • $Q(t)$ is the discharge rate at time $t$
  • $\Omega_x$ are the integrals of the SAS functions, which are derived more extensively below.

In fact, this (2) should be just an integrated version (integrated over $t_{in}$) of equation (9) of Rigon et al., 2016:
$$
\frac{ds(t,t_{in})}{dt} = j(t,t_{in}) - q(t,t_{in}) -et(t,t_{in})
\ \ \ \ (3)
$$
where:
  • $s(t,t_{in})$ is the water stored in the control volume at time $t$ that was injected at time $t_{in}$
  • $j(t,t_{in})$ is the water input at time $t$ that was injected at time $t_{in}$, i.e. that has age $T=t-t_{in}$
  • $q(t,t_{in})$ is the discharge that exits the control volume at time $t$ and entered the control volume at time $t_{in}$
  •  $et(t,t_{in})$ is the evapotranspiration that exits the control volume at time $t$ and entered the control volume at time $t_{in}$
In terms of the SAS and the formulation of the problem given in Rigon et al. (2016), the $\Omega$s can be defined as follows:
\begin{equation}
\Omega_x(T,t) \equiv \Omega_x(S_T(T,t),t) := \int_{t-T}^t \omega_x(t,t_{in})\, p_S(t-t_{in}|t)\, dt_{in} = \int_0^{P_S(T|t)} \omega_x(P_S,t)\, dP_S
\end{equation}
Where the equality ":=" on the l.h.s. is a definition, so the $\Omega$s ($\Omega_Q$ and $\Omega_{ET}$) are this type of object. The identity $\equiv$ stresses that the dependence on $t_{in}$ is mediated by a dependence on the cumulative storage $S_T$, and $T$ is the travel time. As soon as $T \to \infty$, $\Omega_x \to 1$ (which is what is written in equation (2) of Benettin's paper). This is easily understood because, by definition, $\omega_x(t,t_{in})\, p_S(t-t_{in}|t) \equiv p_x(t-t_{in}|t)$ is a probability density (as deduced from (1)), and its integral over all ages is one.
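
A minimal numerical check of this limit (again my own illustration, with the same assumed pdfs as above): the cumulative integral of $\omega_x\, p_S$ over the ages tends to 1 as $T$ grows.

```python
import numpy as np

# Hedged sketch: Omega_x(T,t) as the cumulative integral of omega_x * p_S over age,
# which approaches 1 for large T (the pdfs are illustrative assumptions).
ages = np.linspace(0.0, 200.0, 2001)
p_x = np.exp(-ages / 10.0) / 10.0
p_S = np.exp(-ages / 20.0) / 20.0
omega_x = p_x / p_S
Omega_x = np.cumsum(omega_x * p_S) * (ages[1] - ages[0])   # ~ int_0^T p_x dT'
print(Omega_x[-1])   # close to 1 when T is large enough
```
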
An intermediate passage to derive (2) from (3) requires making explicit the dependence of the age-ranked functions on the probabilities. From the definitions given in Rigon et al., 2016, it is
$$
\frac{d\,[S(t)\, p_S(t-t_{in}|t)]}{dt} = J(t)\, \delta (t-t_{in}) - Q(t)\, p_Q(t-t_{in}|t) - ET(t)\, p_{ET}(t-t_{in}|t)
$$
which  is Rigon et al. equation (14).
Now the integration over $t_{in} \in [t-T, t]$ can be performed to obtain:
$$
S_T(T,t):= \int_{t-T}^{t} s(t,t_{in})\, dt_{in}
$$
and, trivially,
$$
J(t) = J(t) \int_{t-T}^{t} \delta(t-t_{in})\, dt_{in}
$$
while for the $\Omega$s I have already said above what is needed.
The final step is to make a change of variables that eliminates $t_{in}$ in favor of $T := t-t_{in}$. This implies one last transformation. In fact:
$$
\frac{dS_T(t,T(t_{in},t))}{dt} =\frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{ \partial T}\frac{\partial T}{ \partial t} = \frac{\partial S_T(t,T(t_{in},t))}{\partial t} + \frac{\partial S_T(t,T(t_{in},t))}{ \partial T}
$$
since $\partial T/\partial t = 1$. Assembling all the results, equation (2) is obtained.
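
For the curious, here is a minimal discrete-time sketch of how the age-ranked balance (3) can be marched forward: uniform (random-sampling) SAS functions are assumed, and the fluxes and the time step are illustrative values of mine, not taken from any of the papers cited.

```python
import numpy as np

# Hedged sketch of marching the age-ranked balance (3) in time. Uniform SAS functions
# are assumed, so each output removes water in proportion to the age-ranked storage.
dt = 1.0                                   # time step [days]
n_steps = 100
J = np.full(n_steps, 5.0)                  # precipitation rate [mm/day] (assumed)
Q = np.full(n_steps, 3.0)                  # discharge rate [mm/day] (assumed)
ET = np.full(n_steps, 1.5)                 # evapotranspiration rate [mm/day] (assumed)

s = np.zeros(n_steps)                      # s[k]: water still stored that entered at step k [mm]
for n in range(n_steps):
    s[n] += J[n] * dt                      # the new injection opens a new age class
    total = s[: n + 1].sum()
    if total > 0.0:
        frac = s[: n + 1] / total          # discrete residence time distribution p_S
        s[: n + 1] -= (Q[n] + ET[n]) * dt * frac   # uniform SAS: outputs sample p_S
        s[: n + 1] = np.clip(s[: n + 1], 0.0, None)

S_total = s.sum()                          # total storage S(t) at the final time
ages = (n_steps - 1 - np.arange(n_steps)) * dt   # age T = t - t_in of each class
```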

Note:
Benettin et al., 2017 redefine the probability $p_S$ as the “normalized rank storage … which is confined in [0,1]”, which seems weird with respect to the Authors' own literature. In previous papers this $p_S$ was called the backward probability and written as $\overleftarrow{p}_S(T,t)$. Now, probably, they have doubts that we are talking about a probability. In any case, please read it again: "normalized rank storage … which is confined in [0,1]”. Does it not sound unnatural to say it is not a probability? Especially when you repeatedly estimate averages with it and come out with “mean travel times”? Operationally, it IS a probability. Ontologically, the discussion about whether there is really random sampling or not, because there is some kind of convoluted determinism in the formation of travel times, can be interesting, but it leads to a dead end. On the same premises we should ban the word probability from the stochastic theory of water flow which, since Dagan, has been enormously fruitful.

This long circumlocution looks to me like the symbol below

or TAFKAP, which was used by The Artist Formerly Known As Prince when he had problems with his record company.
In any case, Authors should pay attention to this never-ending tendency to redefine the problem, because it can look like what Fisher (attribution by Box, 1976) called mathematistry. This is fortunately not the case for the paper we are talking about. But then, why not stick with the established notation?

The Authorea version of this blog post can be found here.

References 

Tuesday, October 31, 2017

Meledrio, or a simple reflection on Hydrological modelling - Part VI - A little about calibration

The normal calibration strategy is to split the data we want to reproduce into two sets:

  • one for the calibration phase
  • one for the "validation" phase
Let's assume that we have an automatic calibrator. It usually:
  • generates a set of model parameters,
  • estimates the discharges with the rainfall-runoff hydrological model for any given set of parameters,
  • compares what is computed with what is measured by using a goodness-of-fit indicator,
  • keeps the set of parameters that gives the best performance,
  • repeats the operation a huge number of times (and uses some heuristics for searching for the best set overall).

This set of parameters is the one used for "forecasting" and

  • is now used against the validation set to check its performance.
However, my experience (with my students, who usually perform it) is that the best parameter set in the calibration procedure is not usually the best in the validation procedure. So I suggest, at least as a trial and for further investigation, to:

  • separate the initial data set into 3 parts (one for a first calibration, one for selection, and one for validation);
  • select the 1% (or x%, where x is left to your decision) best-performing parameter sets of the calibration phase (the behavioural sets); then further sieve the 1% best performing among them in the selection phase (one over 10^4 overall);
  • this 1 per ten thousand is the set chosen to be used in the validation phase.
The hypothesis to test is that this three-step way of calibrating usually returns better performances in validation than the original two-step one (a minimal sketch of the procedure follows).
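
A minimal sketch of the three-step selection, just to fix ideas: `run_model` is a hypothetical placeholder for your rainfall-runoff model, the goodness of fit is Nash-Sutcliffe, and the fractions are the ones discussed above.

```python
import numpy as np

# Hedged sketch of the proposed three-step calibration; run_model(params, forcing) is a
# hypothetical placeholder returning simulated discharges.
def nash_sutcliffe(sim, obs):
    sim, obs = np.asarray(sim), np.asarray(obs)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def three_step_calibration(run_model, parameter_sets, forcing, obs, top_fraction=0.01):
    """Return the parameter set surviving calibration and selection, and its validation score."""
    (f_cal, f_sel, f_val), (o_cal, o_sel, o_val) = forcing, obs   # data split into 3 parts

    # 1) calibration: keep the behavioural sets (best top_fraction by goodness of fit)
    scores_cal = [nash_sutcliffe(run_model(p, f_cal), o_cal) for p in parameter_sets]
    n_keep = max(1, int(top_fraction * len(parameter_sets)))
    behavioural = [parameter_sets[i] for i in np.argsort(scores_cal)[::-1][:n_keep]]

    # 2) selection: among the behavioural sets, keep the one performing best on the selection data
    scores_sel = [nash_sutcliffe(run_model(p, f_sel), o_sel) for p in behavioural]
    best = behavioural[int(np.argmax(scores_sel))]

    # 3) validation: check the surviving set on the third, untouched, part of the data
    return best, nash_sutcliffe(run_model(best, f_val), o_val)
```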

Sunday, October 29, 2017

Open Science Framework - OSF

And recently I discovered OSF, the Open Science Framework. My students told me that there exist many of them, of this type of on-line tools that leverage the cloud to store material and help groups to manage their workflow. However, OSF seems particularly well suited for scientists' groups, since it links to various science-oriented features, like connections to Mendeley, Figshare, Github and others. An OSF “project” can contain writings, figures, codes, data. All of this can be uploaded for free to their servers or be maintained in one of your cloud storages like Dropbox or GoogleDrive.

To start, you can take one hour of your time to follow one of their YouTube videos, like the one below.

Their web page also contains some useful guides that do the rest (do not hesitate to click on the icons: they contain useful material!). The first you can start with is the one about the wiki, a customizable initial page that appears in any project or sub-project. There are some characteristics that I want to emphasize here. Starting a new project is easy, and when you have learned how to do it, you have almost learned all of it. Any project can have subprojects, called “components”. Each component behaves like a project by itself, so when dealing with it you do not have to learn anything really new. Any (sub)project can be private (the default) or public, separately, and therefore your global workflow can contain private and public stuff.

Many people are working on OSF. For instance, Titus Brown's Living in an Ivory Basement blog also has a detailed review of it. They also coded a command-line client for downloading files from OSF, which can be further useful.

Wednesday, October 25, 2017

Return Period


Some people, I realised, have problems with the concept of return period. This is the definition in Wikipedia (accessed October 25th, 2017):
A return period, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], or a river discharge flow to occur.
It is a statistical measurement typically based on historic data denoting the average recurrence interval over an extended period of time, and is usually used for risk analysis (e.g. to decide whether a project should be allowed to go forward in a zone of a certain risk, or to design structures to withstand an event with a certain return period). The following analysis assumes that the probability of the event occurring does not vary over time and is independent of past events.
Something that Wikipedia does not include is rainfall intensity. The first paragraph should then read something like:
"A return period of x time units, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], rainfall intensity, a river discharge flow or any observable, to occur (or be overcome) on average every x time units."

Return period clearly involves a statistical concept, which is traced back to a probability, and a time concept, that is, the sampling time.
Let us assume we have a sequence of data composed of a discrete number, $n$, of values, for which, at the moment, the sampling time is unknown.
The empirical cumulative distribution function (ECDF) of the data is a representation of the empirical statistics of those data. Let $ECDFc$ be the complementary empirical cumulative distribution function, meaning $ECDFc(h) \equiv 1 - ECDF(h)$.
Let $h^*$ be one of the possible values of these data (not necessarily present in the sequence but included in the range of experimental values). We are interested in the probability of $h^*$ being exceeded. If $m$ is the number of times $h^*$ is matched or exceeded, then
$$ ECDFc(h^*)= m/n $$
$$ECDF(h^*) = 1 - m/n$$
We can, at this point, assume that the ECDF resembles some probability function, but this is a further topic we do not want to talk about here. What we want to stress is that ECDFs (probabilities) are not automatically associated with a time. All the data in the sequence refer to different picks of a random variable, and these picks are not necessarily time-ordered, or could have happened all at the same time. So the “frequencies" that can be associated with the above events are not time frequencies.
Now let us introduce time by saying, for instance, that each datum was sampled at a regular time step $\Delta t$, what I called “time units” before, and that, for practical reasons, we are not interested in the ECDF of the data but in knowing how frequently (in a clock time sense) a value is repeated. So we can say that the total time of our record is
$$T = n\, \Delta t$$
and in this time span, the number of times $h^*$ is exceeded is (by construction)
$$m = ECDFc(h^*)\, n$$
On average, along the record obtained, the time frequency at which values greater than $h^*$ are obtained is the empirical return period:
$$T_r:=\frac{T}{m} =\frac{n\, \Delta t}{ECDFc(h^*)\, n} = \frac{\Delta t}{ECDFc(h^*)}$$
So, the empirical return period of $h^*$ is inversely proportional to the complementary ECDF($h^*$), but, properly, there is a “$\Delta t$” to remind us that it is given in time units. One basic assumption in our definition is that the underlying probability is well defined, which it is not if climate change is in action. This is a delicate and well discussed topic**, but, again, not the core of this page.

There is a crucial initial step, the sampling of the data, which affects the final result. If the data in the sequence are, for instance, annual maxima of precipitation, then the return period is given in years. If the data were daily precipitation totals, then the return period would be given in days. And so on. Because usually the time unit has value “1” (but the dimension of a time), the numeric value of the return period is just the inverse of the ECDFc. We should not forget, however, that the equation contains a mute dimension. We are talking about times, not dimensionless numbers (probabilities). A minimal numerical sketch is given below.
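
A minimal worked example of the formula above, assuming a record of annual precipitation maxima (so $\Delta t$ = 1 year); the numbers are invented for illustration.

```python
import numpy as np

# Hedged sketch of the empirical return period T_r = dt / ECDFc(h*), for a record of
# annual precipitation maxima (dt = 1 year). The data values are invented.
annual_maxima = np.array([42.0, 55.5, 61.2, 38.7, 70.1, 49.3, 80.4, 58.8, 65.0, 47.6])  # [mm]
dt = 1.0                                    # sampling time [years]
h_star = 60.0                               # value of interest [mm]

n = annual_maxima.size
m = np.sum(annual_maxima >= h_star)         # number of times h* is matched or exceeded
ecdfc = m / n                               # complementary ECDF at h*
T_r = dt / ecdfc                            # empirical return period [years]
print(f"ECDFc({h_star}) = {ecdfc:.2f}, return period = {T_r:.1f} years")
```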

Being Bayesian, you can probably introduce this in a different way. I leave it to you as an exercise.

** On the topic of stationarity, please give a look at:

Milly, P. C. D., Betancourt, J., Falkenmark, M., Hirsch, R. M., Kundzewicz, Z. W., Lettenmaier, D. P., & Stouffer, R. J. (2008). Stationarity Is Dead: Whither Water Management? Science, 319, 1–2.

Montanari, A., & Koutsoyiannis, D. (2014). Modeling and mitigating natural hazards: Stationarity is immortal! Water Resources Research, 50(12), 9748–9756. doi:10.1002/2014WR016092

Serinaldi, F., & Kilsby, C. G. (2015). Stationarity is undead: Uncertainty dominates the distribution of extremes. Advances in Water Resources, 77(C), 17–36. http://doi.org/10.1016/j.advwatres.2014.12.013

Sunday, October 22, 2017

Simple models for hydrological hazard mapping

This post contains the second talk I gave to high-school teachers at MUSE for the Life Project FRANCA. My intention was to show (under a lot of simplifying assumptions) how hydrological models work, and to give a few hints on which types of hydraulic models of sediment transport can be useful.
Clicking on the figure above, you can access the slides (in Italian, but with a little time I will provide a translation). In their simplicity, the slides are a storyboard for actions that could be taken in the SteepStream project to provide an estimation of the hazards of the Meledrio river basin (and the other two selected basins).