Category Archives: programming

Predicting equilibrium channel geometry with a neural network

In an attempt to learn more about machine learning, I decided to just jump in and try a project: predicting channel geometry with a simple neural network.

[source code]

All in all, I’m fairly sure I didn’t learn anything about equilibrium channel geometries, but I had some fun and learned an awful lot about machine learning and neural networks. A lot of the concepts I have been reading about for weeks finally clicked when I actually started to implement the simple network.

I decided to use the data from Li et al., 2015 [paper link here], which contains data for 231 river geometries.

The dataset has variable bankfull discharge, width, depth, channel slope, and bed-material D50 grain size.

We want to be able to predict the width, depth, and slope from the discharge and grain size alone. This would typically be a problem, because we are trying to map two input features onto three output features. In this case though, the model works because two of the outputs, depth (H) and width (B), are highly correlated.

The network is a simple ANN with one hidden layer of 3 nodes, trained with stochastic gradient descent. The training curve is below.
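For the curious, here is a minimal numpy sketch of a network like the one described: 2 inputs, one hidden layer of 3 tanh nodes, 3 linear outputs, trained one sample at a time with stochastic gradient descent. The data below are synthetic stand-ins, not the Li et al. values, so only the mechanics carry over:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic stand-in for the Li et al. data: 2 input features mapped to
# 3 correlated output features (the real inputs were discharge and D50;
# the outputs width, depth, and slope).
X = rng.normal(size=(200, 2))
W_true = rng.normal(size=(2, 3))
Y = X @ W_true + 0.05 * rng.normal(size=(200, 3))

# One hidden layer with 3 tanh nodes, linear output layer.
W1 = rng.normal(scale=0.5, size=(2, 3)); b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 3)); b2 = np.zeros(3)

lr = 0.05
losses = []
for step in range(2000):
    i = rng.integers(0, len(X))        # stochastic: one sample per step
    x, y = X[i], Y[i]
    h = np.tanh(x @ W1 + b1)           # hidden activations
    yhat = h @ W2 + b2                 # linear output
    err = yhat - y
    # backpropagate the squared-error loss
    dh = (err @ W2.T) * (1 - h**2)     # tanh derivative
    W2 -= lr * np.outer(h, err); b2 -= lr * err
    W1 -= lr * np.outer(x, dh);  b1 -= lr * dh
    losses.append(float(err @ err))
```

Plotting `losses` gives a training curve of the same flavor as the one shown here.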

Matlab speed comparison of switch-case and if-then statements and hard-code

I frequently use the extremely methodical approach in scientific programming of “just trying things”. This means that I create a lot of different ways to do something, in order to find the best way to do that thing. Sometimes “best” means a realistic result, sometimes a stable result, and sometimes a fast computation.

Nearly all programming languages offer if-then statements, whereby a section of code is evaluated when a conditional expression evaluates to true. An if-then statement can be extended with an elseif to test further conditions. Using this framework, different “cases” or “methods” of doing a single thing can be quickly tested by switching between them in a script. The purpose of all this is to make the code more readable and maintainable, by not having to comment/uncomment blocks of code to change the calculation method.

For example, two methods x = 'test1' | 'test2' could be executed by a function, depending on the definition of x:

if strcmp(x, 'test1')
y = 1;
elseif strcmp(x, 'test2')
y = 2;
end

A similar functionality can be obtained with a switch-case statement:

switch x
case 'test1'
y = 1;
case 'test2'
y = 2;
end

But which is faster? I suppose I always assumed switch-case statements were slower than if-then statements, but I wanted to know how much of a difference it could make. I also often have >4 of these cases as optional methods, and I wanted to know if the number of cases (or really, how “deep” down the list they were) made a difference in speed.

I designed a simple experiment in Matlab to test this out. I looped through a simple set of statements 1 million times, and timed each scenario. You can find the source code here.

It turns out that switch-cases are about 30% slower than if-then statements. Both are more than an order of magnitude slower than simply hard-coding the chosen method.

Most importantly though, the time increases linearly with each “layer” of the if-then or switch-case statement. To me, this stressed the importance of 1) having few cases that aren’t simply hard-coded in, or 2) at least sorting the cases in order of likelihood of being used during program execution.
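The experiment itself was in Matlab (source linked above), but the same question can be probed in Python, where the idiomatic stand-in for switch-case is a dict lookup. A rough sketch, with made-up case names; the absolute numbers will differ from the Matlab results, but the shallow-versus-deep comparison works the same way:

```python
import timeit

def via_if(x):
    # chain of comparisons; deeper cases cost more to reach
    if x == 'test1':
        return 1
    elif x == 'test2':
        return 2
    elif x == 'test3':
        return 3
    elif x == 'test4':
        return 4

def via_dict(x):
    # hash lookup; cost is roughly independent of "depth"
    return {'test1': 1, 'test2': 2, 'test3': 3, 'test4': 4}[x]

for label, fn in [('if-elif', via_if), ('dict', via_dict)]:
    for case in ('test1', 'test4'):  # shallow vs deep case
        t = timeit.timeit(lambda: fn(case), number=100_000)
        print(f'{label:8s} {case}: {t:.3f} s')
```

On my reading, the dict version trades a constant hashing cost for immunity to case ordering, which is a different trade-off than Matlab’s switch-case.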

Markov Chain stratigraphic model

I recently returned from the NCED2 Summer Institute for Earth Surface Dynamics at Saint Anthony Falls Laboratory at University of Minnesota, which is a 10-day long, intense workshop for early-career scientists to talk about novel methods and ideas in our field. This was the ninth, and hopefully not last, year of the meeting.

Typically the participants do some sort of physical experiment (the kind of work the lab is known worldwide for), but this year’s theme was mathematics in Earth surface processes. We covered a range of subjects, but a lot of discussion came back to random walks, diffusion-like processes/simulations, and probability. Kelly Sanks and I worked on a project together, which combined a lot of these ideas.

Our question was: is deltaic sedimentation (and the resulting stratigraphy) random? Specifically, we hypothesized that a pure “random walk” model cannot capture the effect of the “stratigraphic filter”, but that a biased random walk model, where Δz depends on the previous time’s Δz, can capture the dynamics of the system sufficiently to look like an actual stratigraphic sequence. The null hypothesis, then, is that both random walk models are indistinguishable from a stratigraphic succession.

To attack the problem, we decided to use a simple biased random walk model: Markov chains. These models have no physics incorporated, only probability, which made the project simple and easy to complete over beers at night. A Markov chain can be summarized as a sequence of “states” where the next state is chosen at random, based on the probability of changing to each other state or staying in the same state. Repeating this over and over gives a chain/sequence of states. In our simulation, the “states” were changes in elevation. Said another way, the next timestep’s change in elevation is dependent on the current timestep’s change in elevation. This hypothesis is grounded in the physics of river deltas, whereby channel location (and thus the locus of deposition/erosion elevation change) does not change immediately and at random, but rather is somewhat gradual.

When a system has more than a few states it becomes complicated to keep track of the probabilities, so we naturally use a matrix to define the probabilities. The so called “transition matrix” records the probability that given a current state (rows), the next state will be any given state (columns).
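A minimal Python sketch of drawing a Markov chain from a transition matrix (the matrix below is a made-up 3-state example, not our data-trained one):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 3-state transition matrix (states could be binned dz values,
# e.g. erosion / no change / aggradation): rows = current state, columns =
# probability of each next state.
P = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
assert np.allclose(P.sum(axis=1), 1.0)  # each row is a probability distribution

def markov_chain(P, state0, n, rng):
    """Draw a sequence of n states from transition matrix P."""
    states = [state0]
    for _ in range(n - 1):
        states.append(rng.choice(len(P), p=P[states[-1]]))
    return states

chain = markov_chain(P, state0=1, n=1000, rng=rng)
```

Each draw conditions only on the current state, which is exactly the “biased but memory-one” behavior we wanted.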

We used the Tulane Delta Basin experiment 12-1, which has a constant Qw/Qs ratio and RSLR rate, for our data. This is essentially necessary for our question, because we needed to know the elevation at some regular interval, and the resulting stratigraphy after deposition and erosion. The experimental delta surface was measured to mm accuracy every hour for 900 hours. We calculated Δz values over the entire spatiotemporal domain of the experiment to inform our main biased random walk model.

Markov transition matrix of dz values calculated from the experimental elevation profiles. (axes are in mm/hr)

The main data-trained transition matrix shows that states tend toward slightly aggradational. This makes sense, since the system is deterministically net aggradational due to the RSLR forcing. We compare this data-trained model with two synthetic transition matrices: a Gaussian distribution (intentionally similar to the data) and a uniform distribution (i.e., purely random).
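Training the matrix on data boils down to counting transitions between binned Δz states and row-normalizing. A sketch with a toy state sequence (the real version ran over the whole experiment domain):

```python
import numpy as np

def transition_matrix(states, n_states):
    """Count transitions in a state sequence and row-normalize."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(states[:-1], states[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    row_sums[row_sums == 0] = 1  # avoid dividing empty rows by zero
    return counts / row_sums

# Toy dz series binned into 3 states (0 = erosion, 1 = static, 2 = aggradation).
dz_states = [2, 2, 1, 2, 0, 1, 2, 2, 1, 1, 2]
P = transition_matrix(dz_states, n_states=3)
```

The synthetic comparison matrices are built the same shape: Gaussian rows centered near the data’s tendency, or uniform rows of 1/n for the purely random case.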

We then simulated the elevation change sequences predicted by each of the transition matrices and used the stratigraphic filter concept to determine the resultant stratigraphy. We did this a bunch of times, and totally chose one of the more “convincing” results (i.e., where the slope difference between simulations was larger).
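The stratigraphic filter itself amounts to a backward running minimum on the elevation series: any deposit sitting above a later erosional low gets removed. A sketch of that idea:

```python
import numpy as np

def apply_strat_filter(eta):
    """Preserved stratigraphic surface at each time = the minimum of all
    later elevations (deposits above a later erosional low are eroded)."""
    eta = np.asarray(eta, dtype=float)
    return np.minimum.accumulate(eta[::-1])[::-1]

# Toy elevation series: aggrade, erode, aggrade again.
eta = [0.0, 1.0, 2.0, 0.5, 1.5, 3.0]
strat = apply_strat_filter(eta)
# strat -> [0.0, 0.5, 0.5, 0.5, 1.5, 3.0]
```

Note how the excursion to 2.0 is clipped down to 0.5 by the later erosional low: that clipping is the “filter”.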

a SINGLE realization of the modeled Markov sequences.

We found that the data-trained model reproduced the stratigraphic sequences from the experiment (based only on comparing slopes…). The other models were not far behind, suggesting that deltaic sedimentation is in fact not too different from purely random.

Ultimately, we would simulate the models hundreds of times and interpret ensembles of the results. We would also use more meaningful statistics of the stratigraphic sequences to compare model and data, such as the number of “channel cuts”, unconformity-bed thicknesses, “drowned” sequences, etc.

The main source code for all this lives on GitHub here (https://github.com/amoodie/markov_delta) if you are interested in checking anything out or exploring some more. You will need to obtain the data from the SEN website listed above, or ask me to send you a copy of the data formatted into a `.mat` file (it’s too big to put on here…).
If you format the data yourself, produce a 3D matrix where the x-y-z coordinates are time, across-stream, and along-stream.

Authors of the code and project are Kelly Sanks (@kmsanks) and Andrew J. Moodie (@amoodie).

Rivers 2 Stratigraphy

Explore the construction of stratigraphy through the interaction of channel geometry, lateral migration rate, subsidence rate, and avulsion frequency — interactively!

Imagine there is a river in a subsiding basin. As the river laterally migrates, it leaves behind sandy bar deposits, which are encapsulated in a matrix of floodplain muds.

As the river deposits are lowered away from the basin surface by subsidence, they become part of the stratigraphic record (figure below). Depending on the amount of lateral migration, the frequency of avulsion, and the subsidence rate, the degree of overlap of channel bodies in the stratigraphic deposit will vary. The movie above shows a vertical slice across the basin width, with a channel (the colorful boxes) laterally migrating across a river floodplain.

The movie comes from an educational module I am working on called Rivers2stratigraphy (https://github.com/amoodie/rivers2stratigraphy). The module is open source and relies only on Python dependencies.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1450681. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Python 3 silly random name generator

I was recently working on a project to try to annoy my collaborator Eric with a scheduled script that sends him an email if there are open pull requests that I authored in a GitHub repository he owns (pull-request-bot).

Along the way, I created this fun little function for coming up with random, sort-of-realistic but mostly-funny-sounding names in Python. I’m in the process of switching from Matlab to Python, so little toy functions and projects like this are a good way for me to learn the nuances of Python faster. In fact, I would argue this is the best way to learn a new programming language: no need for books or classes, just try things!

The script is super simple, and I have posted it as a GitHub Gist, so I won’t describe it all again here. Below the markdown description of the function is an actual Python file you can grab, or you can grab it from here.
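For a flavor of the idea without leaving this page, a stripped-down version (not the Gist itself) might alternate random consonant-vowel pairs:

```python
import random

VOWELS = 'aeiou'
CONSONANTS = 'bcdfghjklmnprstvwz'

def silly_name(n_syllables=3, seed=None):
    """Build a pronounceable-ish name from consonant-vowel pairs."""
    rng = random.Random(seed)
    pairs = [rng.choice(CONSONANTS) + rng.choice(VOWELS)
             for _ in range(n_syllables)]
    return ''.join(pairs).capitalize()

print(silly_name(seed=1))
```

The alternating consonant-vowel structure is what makes the gibberish pronounceable; everything else is just a random draw.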

Outreach module — Flooding risk in low-lying landscapes

I have put together an outreach module that describes some of the risks of flooding in low-lying landscapes. The module runs in Matlab, either within a licensed environment, or with the Matlab Runtime Environment (which is available to anyone).

Accompanying the GUI is a worksheet that steps through all the aspects of the GUI and attempts to demonstrate the principles of flooding in deltas without detailing the math or physics behind the model. My hope is that it will be helpful to High School educators in their course work.

So far, I have only written a worksheet targeted at 9th-12th graders, but I plan to write two more in the near future: one for younger students, and one for more advanced undergraduate/graduate students.

Below is a demonstration of the GUI. The full information for the module (including the source code) is on my GitHub here. The project website is here.


This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. 1450681 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

LaTeX CV formatting

I’ve recently spent quite a bit of time trying to get my CV formatted exactly the way I want, using LaTeX. Those who know me know that I love LaTeX, probably to the point of being a snob about it… so naturally my CV has to be made with LaTeX. The final product can be found here.

The bulk of the CV was pretty easy to work up; I just did most of it manually with the help of just a few critical packages (e.g., tabularx, enumitem). What was important to me, though, was to develop an easily maintainable method for keeping my CV up to date with publications and presentations. I use Zotero to manage references, so exporting the references to a .bib file to be processed in my LaTeX CV seemed like an obvious place to start.

Below is a style file I have created, providing the package cvlist, which takes the exported .bib file containing all of the papers and conferences I have been a part of and formats them into a beautiful LaTeX list. To do so, I’m using BibLaTeX to process the .bib file.

There were a few things I wanted my CV list to support:

  1. multiple sections for different types of “references”, e.g., peer-reviewed publications, other publications, and conference proceedings
  2. reverse counting of items within each section, such that the most recent items come first and have the highest number
  3. the ability to have publications in various states (e.g., submitted, in prep) and format them properly into the reference
  4. provide links (formatted as a clean single word “[link]”) for items which have a URL included in the .bib file

This is mostly achieved within the style file and the cvlist package it provides. The code comes from various places all over the internet (mostly Stack Exchange though <3), and I have tried to attribute where possible. The package can be found here.

The only things required in the actual CV .tex file, then, are to call the package, add the bib resource, and define the section filters. I have chosen to define the section filters in the actual .tex file, but they could be moved into the style file as defaults too.

\usepackage{cvlist}
\addbibresource{../CV_biblist.bib}
% define the filters that will separate out the sections printed
\defbibfilter{peer}{type=article and not subtype=nopeer}
\defbibfilter{nopeer}{not type=inproceedings and subtype=nopeer}
\defbibfilter{conf}{type=inproceedings}

Finally, the reference sections are printed with the following calls:

\nocite{*}
\section*{Refereed Publications}
\printbibliography[filter=peer,heading=none]

\section*{Other Publications}
\printbibliography[filter=nopeer,heading=none,resetnumbers]

\section*{Scientific Presentations with Abstracts}
\printbibliography[filter=conf,heading=none,resetnumbers]

I don’t expect that my solution will solve the problems of anyone trying to do something similar to what I have done, but I do hope it provides a good starting point. The package can be found here.

Building a simple delta numerical model: Part VI

This is the final piece of the model that we need for a working delta-growth code: the time routine. We will define a few more terms to set up the model to loop through all the time steps and update the evolving delta.

T = 500; % yrs
timestep = 0.1; % timestep, fraction of years
t = T/timestep; % number of timesteps
dtsec = 31557600 * timestep; % seconds in a timestep

T is the total time the model will run (in years), timestep is the fraction of a year simulated with each timestep, expressed in seconds as dtsec, and t is the number of timesteps to loop through. Now we simply take the block of code that we’ve built up to calculate all the properties of the delta (slope, sediment transport, deposition, etc.) and surround it with a for statement:

for i = 1:t
[S] = get_slope(eta, nx, dx); % bed slope at each node
[H] = get_backwater_fixed(eta, S, H0, Cf, qw, nx, dx); % flow depth
U = Qw ./ (H .* B0); % velocity
[qs] = get_transport(U, Cf, d50, Beta);
qsu = qs(1); % fixed equilibrium at upstream
[dqsdx] = get_dqsdx(qs, qsu, nx, dx, au);
[eta] = update_eta(eta, dqsdx, phi, If, dtsec);
end

To explain the above block in words: at every timestep i, we calculate the slope of the bed (eta) everywhere in the model domain, then use this slope (and other defined parameters) to determine the flow depth and velocity through the entire delta. The velocity is used to calculate sediment transport, and the change in sediment transport over space produces a change in the channel bed elevation over time (i.e., the Exner equation). Finally, we return to the top of the loop to calculate a new slope based on the new bed.

That’s it! Our delta model is now complete and we can outfit it with all sorts of bells and whistles to test hypotheses about delta evolution in the natural (or experimental) world. A powerful tool indeed!

Below is a simple movie output from this model that shows the results of our hard work! The complete code for the delta model can be found here.

Note that a small instability grows at the front of the sediment wedge. This isn’t a huge problem, depending on what you want to do with your model, but you can tweak dx and dt to make things run more “smoothly” (see the CFL condition for more information).

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. 1450681 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Building a simple delta numerical model: Part V

Now we need to add a routine to update the channel bed, based upon the change in sediment transport over space calculated in the previous step. We’ll use the calculation from the last Part of the tutorial (get_dqsdx) to update the channel bed (η) at the end of each timestep. Define the following parameters:

phi = 0.6; % bed porosity
If = 0.2; % intermittency factor

where If is an intermittency factor representing the fraction of the year during which the river experiences significant morphodynamic activity. We are basically assuming that the only major change to the river occurs when the river is in flood. One year is the temporal resolution of the model, which we’ll define in the next Part.

function [eta] = update_eta(eta, dqsdx, phi, If, dtsec)
eta0 = eta;
eta = eta0 - ((1/(1-phi)) .* dqsdx .* (If * dtsec));
end

This module works simply by plugging our vector for the change in sediment transport capacity over space into the Exner equation (reproduced below) to evaluate the change in bed elevation over time (i.e., at the next timestep).
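The equation image has not survived here; reconstructed from the symbol definitions used in this series (λp is phi in the code), the Exner equation and the discrete update implemented by update_eta are:

```latex
\frac{\partial \eta}{\partial t}
  = -\frac{1}{1-\lambda_p}\,\frac{\partial q_s}{\partial x}
\qquad\Longrightarrow\qquad
\eta^{t+1} = \eta^{t}
  - \frac{1}{1-\lambda_p}\,\frac{\partial q_s}{\partial x}\,
    \bigl(I_f\,\Delta t_{\mathrm{sec}}\bigr)
```

The factor If·Δtsec converts the one-second sediment transport calculation to the morphodynamically active time elapsed in a timestep.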

There isn’t really anything exciting to show at this stage, as we’ve only calculated the change in the bed for a given dqsdx vector, which represents only a single timestep. In the next Part, we’ll add a time routine to the model, completing the setup required for the simple delta model.

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. 1450681 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Building a simple delta numerical model: Part IV

In this part of “Building a simple delta numerical model”, we’ll write the part of our model that will solve the “Exner” equation, which determines changes in bed elevation through time. We’ll start with the module in this post, and then implement the time-updating routine in the next post of this series.

We’re going to make an assumption about the adaptation length of sediment in the flow in order to simplify the Exner equation. The full Exner equation contains two terms for sediment transport, a bed-material load and a suspended load term. The bed material load is probably transported predominantly as suspended load in an actual river, but because it is in frequent contact and exchange with the bed (i.e., suspension transport is unsteady), the flow and bed can be considered to be in equilibrium over short spatial scales, or rather, there is a short adaptation length.

It has been suggested that in fine-grained rivers the adaptation length of the flow to the bed conditions (and vice versa) is so long that sediment stays in suspension over a sufficiently long distance that the fraction of sediment mass held in suspension must be conserved separately from the bed fraction of sediment. We’re going to neglect this idea, because 1) we’re considering medium sand, which likely has a short adaptation length, and 2) the process may actually not be that important for fine-grained rivers either (An et al., in prep).

So, the Exner equation is then given by (Equation 1):

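The image of Equation 1 has not survived; reconstructed from the symbol list that follows (and consistent with the sign convention described there), it reads:

```latex
\frac{\partial \eta}{\partial t}
  = -\frac{1}{1-\lambda_p}\,\frac{\partial q_s}{\partial x}
\qquad (1)
```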
where λp = 0.6 is the channel-bed porosity, η is the bed elevation, t is time, qs is sediment transport capacity per-unit channel width, and x is the downstream coordinate. The form of Equation 1 shows that a reduction in the sediment transport over space (negative ∂qs/∂x) causes a positive change in the bed elevation over time (∂η/∂t).

In the last post we calculated a vector qs for the sediment transport capacity everywhere, and now we simply need to calculate the change in this sediment transport capacity over space (x). We’ll do this with a winding scheme and vectorized version of the calculation for speed. This guide is not meant to be comprehensive for solving PDEs or even winding calculations, and we already did a similar calculation for slope, so I’ll just present the code below:

au = 1; % winding coefficient
qsu = qs(1); % fixed equilibrium at upstream

function [dqsdx] = get_dqsdx(qs, qsu, nx, dx, au)
dqsdx = NaN(1, nx+1); % preallocate
dqsdx(nx+1) = (qs(nx+1) - qs(nx)) / dx; % gradient at downstream boundary, downwind always
dqsdx(1) = au*(qs(1)-qsu)/dx + (1-au)*(qs(2)-qs(1))/dx; % gradient at upstream boundary (qt at the ghost node is qt_u)
dqsdx(2:nx) = au*(qs(2:nx)-qs(1:nx-1))/dx + (1-au)*(qs(3:nx+1)-qs(2:nx))/dx; % use winding coefficient in central portion
end

We’ve introduced two new parameters in this step, a winding coefficient (au) and an upstream sediment feed rate (qsu). The sediment feed rate constitutes a necessary boundary condition for solving the equation. Here, we set this boundary to be in perfect equilibrium by defining the feed rate as the transport capacity of the first model cell; however, this condition has interesting implications for the model application (more discussion on this in Part VI!).

For a single t (which we implicitly set to be 1 second with our sediment transport calculation), the change in sediment transport capacity over space is maximized in the backwater region.

This vector is converted to the commensurate change in bed elevation over this single t by multiplying by (1/(1−λp)). Of course, we can consider many ts at once, in which case the vector is also multiplied by the timestep (Δt). We’ll consider this in the next post, where we add a time loop to the model, such that on every timestep the backwater, slope, sediment transport, and change in bed elevation are each recalculated and updated, giving a delta model which evolves in time!

The code for this part of the model is available here.


This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No. 1450681 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.