Blog

StratGAN

Coupling a generative adversarial network with image quilting to produce basin-scale realizations of fluvial stratigraphy.

This project began out of an interest in educating myself on machine learning topics, in association with a class I organized. The project goal was to explore subsurface uncertainty quantification through the use of GANs, and it is meant to be an exploratory analysis and thought experiment. I have decided to open source the project so that it is available as part of my professional portfolio.

This post contains some brief explanation of the model, but omits a lot of detail.
I hope to detail the model in a manuscript in the future.

The GitHub repository with the complete code can be found here!

Abstract

Subsurface reservoir size estimates involve considerable uncertainty, which impacts the quality of reserve size and valuation models. Conventional rules-based and process-based numerical models are useful for modeling this uncertainty, but involve simplifying assumptions about depositional environments and reservoir geology that may be poorly constrained. Generative adversarial networks (GANs) are machine learning models optimized to produce synthetic data that are indistinguishable from an arbitrary training dataset, which makes them an attractive tool for modeling subsurface reservoirs. We have developed a generative adversarial network that is trained on laboratory experimental stratigraphy and produces realizations of basin-scale reservoir geology, while honoring ground-truth well log data. In this way, StratGAN reduces subsurface uncertainty through a large sampling of realistic potential rock geometries throughout a reservoir, without any a priori assumptions about the geology. StratGAN couples a deep-convolutional generative adversarial network (DCGAN) with an image quilting algorithm to scale up channel-scale realizations of reservoir geometries to basin-scale realizations.

Model description

The model is a deep-convolutional generative adversarial network (DCGAN) that has been trained on laboratory experiment data. I use custom tensorflow implementations of convolutional layers and dense layers, which include my flexible batch normalization operation. The GAN is trained on patches cropped out of slices from the Tulane Delta Basin 10-1 experiment. Each of the six slices was treated as a separate categorical label and fed into the GAN as a one-hot label vector.
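To make the label conditioning concrete, here is a minimal sketch of a one-hot-conditioned DCGAN generator in tf.keras. This is not the StratGAN code (which uses custom layer and batch-normalization implementations); the latent size, patch size, and layer widths here are my own assumptions.

import tensorflow as tf
from tensorflow.keras import layers

LATENT_DIM = 100   # assumed latent-vector size
N_LABELS = 6       # six experiment slices, one-hot encoded
# output is an assumed 64 x 64 single-channel patch

def build_generator():
    z = layers.Input(shape=(LATENT_DIM,))            # random latent vector
    y = layers.Input(shape=(N_LABELS,))              # one-hot slice label
    h = layers.Concatenate()([z, y])                 # condition by concatenation
    h = layers.Dense(8 * 8 * 256, use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU()(h)
    h = layers.Reshape((8, 8, 256))(h)
    for filters in (128, 64, 32):                    # upsample 8 -> 16 -> 32 -> 64 px
        h = layers.Conv2DTranspose(filters, 5, strides=2,
                                   padding='same', use_bias=False)(h)
        h = layers.BatchNormalization()(h)
        h = layers.ReLU()(h)
    img = layers.Conv2D(1, 5, padding='same', activation='tanh')(h)
    return tf.keras.Model([z, y], img)

In a conditional GAN, the discriminator typically sees the same one-hot label, so the trained generator can be asked for patches in the style of any one of the six slices.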

Figure 1: example of a slice from the Tulane Delta Basin experiment that was cropped to produce training data. The red box is the approximate size of a cropped training image. Note that this image shows only a small part of the slice.

Demonstration of workflow

Patches from the GAN are at approximately the channel scale; black pixels represent channel, and white pixels represent non-channel.

Figure 2: patch realization from the trained GAN.

I have implemented Efros-Freeman minimum-cost-boundary image quilting, which stitches together the patches from the GAN (channel scale) to make a single realization at the basin scale. The patches fed to the E-F algorithm are optimized via conditional inpainting methods (e.g., Dupont et al., 2018) to match along the quilting overlap, as well as at any ground-truth data (e.g., cores).

Together, this method optimizes for the best patch possible, then identifies the best path for cut-and-pasting in the image quilting stage.
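That boundary cut is a small dynamic program over the squared error of the overlap region. Below is a minimal numpy sketch of the Efros-Freeman minimum-cost cut for a vertical overlap; the function and variable names are mine, and a full quilting implementation also needs horizontal and L-shaped cuts.

import numpy as np

def min_cost_vertical_seam(overlap_a, overlap_b):
    """Column index of the cheapest cut through each row of the overlap."""
    err = (overlap_a - overlap_b) ** 2                 # per-pixel mismatch surface
    cost = err.copy()
    for i in range(1, err.shape[0]):                   # accumulate best path cost from the top
        for j in range(err.shape[1]):
            lo, hi = max(j - 1, 0), min(j + 2, err.shape[1])
            cost[i, j] += cost[i - 1, lo:hi].min()
    seam = np.empty(err.shape[0], dtype=int)           # trace the cheapest path back up
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(err.shape[0] - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, err.shape[1])
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

Pixels on one side of the seam come from the existing quilt and pixels on the other side from the new patch, which hides the boundary far better than a straight edge would.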
I have produced routines to include ground-truth data in the basin-scale realizations, such as vertical core logs that record channel and non-channel intervals.

Figure 3: Example of a ground-truthed realization. Similar to the image at the top of the page, but with coarser intervals of channel bodies in the core logs. This demonstrates the variability of realizations from the model.

We can produce an ensemble of ~100 realizations from the model and average the samples to produce a single map quantifying the expected probability of reservoir presence at any location.
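The ensemble step is simple in code; with stand-in data (the array shapes and channel fraction below are placeholders, not the model output) it looks something like:

import numpy as np

rng = np.random.default_rng(0)
ensemble = rng.random((100, 256, 1024)) < 0.3   # placeholder for 100 binary basin-scale realizations
prob_map = ensemble.mean(axis=0)                # fraction of realizations with channel at each pixel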

Figure 4: Ensemble-average realization for the bottom panel in Figure 3; darker color indicates higher probability of reservoir presence.

Going one step further, we can query the size of the reservoir connected to a specific channel interval in a core log (red dashed line in Figure 4).
This analysis gives a low-side/high-side estimate of reservoir size.
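A sketch of that query, under the assumption that each realization is a binary channel/non-channel image (the core location, array shapes, and names here are illustrative):

import numpy as np
from scipy import ndimage

def connected_size(realization, row, col):
    """Pixel count of the channel body connected to a cored interval."""
    labels, _ = ndimage.label(realization)       # label each connected channel body
    body = labels[row, col]
    return int((labels == body).sum()) if body > 0 else 0

rng = np.random.default_rng(1)
ensemble = rng.random((100, 256, 1024)) < 0.3    # placeholder realizations
sizes = [connected_size(r, 40, 512) for r in ensemble]
p10, p50, p90 = np.percentile(sizes, [10, 50, 90])   # low-side / mid / high-side sizes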

Figure 5: Quantification of the probability of reservoir size at the red dashed line in Figure 4. We can quantify p10, p50, and p90 estimates to give a low-side/high-side estimate of reservoir size.

The model is a proof-of-concept for using GANs in subsurface uncertainty quantification.

Density stratification in the Yellow River, China

I recently submitted another dissertation chapter for publication, this one titled “Suspended-sediment induced stratification inferred from concentration and velocity profile measurements in the lower Yellow River, China”.

The study examines the vertical structure of velocity and concentration profiles in the Yellow River. In particular, we examine how the addition of sediment modulates the turbulent kinetic energy budget of the flow. Adding sediment to the flow (in this case in very high concentrations) causes the flow to become stratified and inhibits vertical mixing.

When we break down the stratification by grain size, we find that the smallest grain-size classes (<25 um) additionally extract momentum from the flow. This was surprising, since conventional notion dictates that this sediment would be uniformly distributed in the water column and behave as “washload”.

Stratification adjustment coefficient α measured in the Yellow River, China. Values of α<1 imply a density stratification effect. Panel a) total concentration profile adjustment. Panel b) grain-size specific adjustment to the concentration profiles.

Look out for the study in Water Resources Research!

Machine learning in Geoscience Seminar: syllabus and review

I led the organization of a “Machine Learning in Geosciences” seminar this fall (2018). I did not do it alone: I worked with my advisor Jeff Nittrouer and Texas A&M professor Ryan Ewing. My responsibilities included selecting the reading material for the course; Jeff and Ryan handled the invited speakers and student presentations.

The seminar was not for credit, but we did have a healthy 10-12 students and faculty participate throughout the semester. Ultimately I think the seminar was a huge success; I learned an absolute ton about machine learning, and consider myself to be literate on the subject now. In fact, I’ve even begun to incorporate some deep learning into my research and have written a proposal for funding to continue to pursue this endeavor. 

The course design was roughly as follows:

  • four (4) weeks of background reading on machine learning tools, techniques, vocabulary
  • four (4) weeks of primary scientific readings on various geoscience subjects
  • three (3) weeks of invited presenters
  • two (2) weeks of student project presentations

You can view the complete, detailed syllabus we created here: https://docs.google.com/document/d/1Bmq9m96jLT-WlDNlsAu_jYB6J-qg-Uxg0eg0FIg04Gs/edit?usp=sharing
This page contains a ton of resources.

AGU posters 2018: SedEdu and Density Stratification

I’m presenting two posters at AGU this year!

I’m really excited to be sharing a project I’ve been working on which integrates research and teaching. I am giving a poster Friday morning titled: “SedEdu: developing and testing a suite of computer-based interactive educational activities for introductory sedimentology and stratigraphy courses”. The poster will explain my SedEdu project and particularly emphasize the rivers2stratigraphy module. Here is the abstract. Below is the poster; click the image to view a pdf.


http://andrewjmoodie.com/wp-content/uploads/2018/12/AGU_2018_poster_2.pdf


I’m also looking forward to sharing an update on my ongoing research on sediment transport in the Yellow River, China. This poster will be on Friday afternoon, and is titled: “Suspended-sediment induced stratification inferred from concentration and velocity profile measurements in the flooding lower Yellow River, China”. Here is the abstract.

If you’re going to be at AGU, let’s chat!

The Rouse-Vannoni-Ippen concentration profile interactive module

I have made another interactive GUI toy model for teaching the Rouse concentration profile. The activity is really simple right now, but I may add more features in the future. You can see the source code and download the module here.

You can specify the grain size and shear velocity (two of the major controls on the Rouse profile) and see how the profile responds in a simplified model (helpful for teaching) and the full Rouse model.

The simplified model is useful for teaching the mechanics of the concentration profile and can be defined as:

$$ \frac{c}{c_b} = \left( \frac{z}{b} \right)^{-Z_R} $$

where c is the concentration at height z above the bed, c_b is a known reference concentration defined at height b above the bed, and Z_R is the Rouse number given as:

$$ Z_R = \frac{w_s}{\alpha \kappa u_*} $$

where w_s is the settling velocity of the grain size in question, α and κ are constants equal to 1 and 0.41, respectively, and u_* is the shear velocity.

You can see from the form of this simplified model how, for a constant Rouse number, the profile decays away from the reference concentration as a power law in height. It is also easy to see how changing the Rouse number, through a change in grain size (settling velocity) or shear velocity, will change the rate of decay.

The full model has just an additional term to consider, but has the same basic decaying form. Here, H is the total flow depth:

$$ \frac{c}{c_b} = \left( \frac{H - z}{z} \cdot \frac{b}{H - b} \right)^{Z_R} $$

The module uses the Garcia-Parker entrainment relation to calculate the near-bed concentration, and the Ferguson-Church relation for settling velocity.
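As a rough sketch of the core calculation (my own minimal version, not the module’s code), here is the Ferguson-Church (2004) settling velocity feeding the full Rouse profile. The constants are the commonly used values, and the near-bed concentration c_b is taken as given here rather than computed from Garcia-Parker.

import numpy as np

def ferguson_church(D, R=1.65, g=9.81, nu=1.0e-6, C1=18.0, C2=1.0):
    """Ferguson-Church settling velocity (m/s) for grain diameter D (m)."""
    return (R * g * D**2) / (C1 * nu + np.sqrt(0.75 * C2 * R * g * D**3))

def rouse_profile(z, cb, b, H, ws, ustar, alpha=1.0, kappa=0.41):
    """Full Rouse-Vannoni-Ippen concentration profile."""
    ZR = ws / (alpha * kappa * ustar)
    return cb * (((H - z) / z) * (b / (H - b)))**ZR

H = 10.0                                  # flow depth (m)
z = np.linspace(0.05 * H, 0.95 * H)       # heights above the bed (avoid z = 0 and z = H)
ws = ferguson_church(100e-6)              # settling velocity of a 100-micron grain
c = rouse_profile(z, cb=0.01, b=0.05 * H, H=H, ws=ws, ustar=0.05)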

https://github.com/sededu/rouse_model_toy

The Graduate Interdisciplinary Earth Science Symposia: a year in review, and looking to the future

Below is an article I wrote for our department newsletter about the GIESS symposia. I’m publishing it here because it didn’t make the cut for the newsletter in this cycle, and it’s kind of a time-sensitive article because it justifies moving forward with GIESS in the current format.

There is a short description of what GIESS is in the third paragraph.

———————————————————————————————————————————–

The Graduate Interdisciplinary Earth Science Symposia:
a year in review, and looking to the future

Andrew J. Moodie

Scientists are typically effective written communicators, since professional success in academia is so closely linked with funded grant proposals and published manuscripts. However, oral communication skills are frequently considered subordinate and not consciously developed and practiced by early-career scientists.

In the summer of 2017, a group of EEPS graduate students addressed this training gap and the shortcomings of a weekly departmental seminar series by launching the Graduate Interdisciplinary Earth Science Symposia, or GIESS.

GIESS (pronounced “geese”) provides a forum for oral communication skill development while simultaneously encouraging department-wide participation in advancing the presenter’s science. The GIESS committee’s plan was to decrease the burden from weekly to monthly, limit the seminar to two speakers, and intentionally add prestige to presenting, thereby increasing the quality of the talks. Furthermore, the plan introduced “pop-up talks”, brief presentations allowing no more than two slides for no more than two minutes, as a lower-stakes opportunity to practice oral communication. Finally, lunch would be provided for participants to bolster attendance, and year-end awards would be given to select speakers and participants. At the end of the Spring semester, department members were invited to provide feedback through an online survey to assess the inaugural year of the GIESS. The survey garnered 22 responses, predominantly from graduate students (19 of 22, 86%).

Figure 1: a) respondents overwhelmingly agreed that the GIESS was an improvement over the old seminar format. b) a paired t-test indicates, at above the 99% confidence level, that the perceived quality of talks was improved by the GIESS format (p = 0.0001), based on respondents’ declared average quality of talks in the old seminar format and in the GIESS format on an integer scale from 1–10. c) respondents agreed that the pop-ups were a good addition to the GIESS. d) unsurprisingly, respondents were very happy with lunch. It should be noted that the survey designer (that’s me) has no training in survey design and the sample size is small, so the results presented herein should not be expected to withstand rigorous methodological or statistical scrutiny.

Overwhelmingly, participants agreed that the GIESS was an improvement over the old Friday Seminar format (Figure 1a, 68%), suggesting that the committee’s primary objective to improve the program was achieved. More importantly, the committee met the goal to improve the quality of the talks. A paired t-test indicates, at above the 99% confidence level, that the perceived quality of talks was improved by the GIESS format (Figure 1b, p = 0.0001), based on respondents’ declared average quality of talks in the old seminar format and in the GIESS format on an integer scale from 1–10. The mean score given to talk quality improved from 4.9 in the old format to 7.6 in the GIESS format. The committee intends to keep the presentation format in the GIESS largely the same in the 2018–19 year, possibly tweaking the duration of each talk based on respondent feedback.

Pop-up talks were a new addition, so a close look at participant opinions is warranted. Pop-up talks were scheduled between the two main presentations of the meeting, with typically three to five people participating. The addition of a short presentation style was decidedly favored (Figure 1c), with more than 80% of respondents agreeing that pop-ups were a positive addition. However, three respondents (15%) felt strongly that pop-up talks were a bad idea, though none clarified their position in an open-ended response. Therefore, the committee will keep pop-ups as a permanent feature of the GIESS. The committee also surveyed respondents about allotting time for audience questions of pop-up presenters. Seven respondents stated there should be no questions, seven were neutral on the issue, and seven thought it would be a good idea; in the 2018–19 year, the committee will explore allowing questions directed at pop-up presenters.

Unsurprisingly, when more than 80% of survey respondents are hand-to-mouth graduate students, offering a free lunch to participants was favored; 20 out of 21 survey respondents agreed that providing lunch was a positive (Figure 1d). Attendance at the GIESS was vastly improved from the old seminar format; however, the number of students and faculty in attendance faltered late in the year. The committee tentatively attributes the increased attendance to lunch, though hopefully participants also came for the programming. In specific lunch-related feedback, one student asked for a “dessert table” next year, and more than one respondent added that they would like to see “platters of Chick-fil-A spicy chicken sandwiches with tangy Polynesian sauce.” Noted.

Finally, the GIESS committee would like to recognize the award winners who gave outstanding presentations, creative pop-ups, and engaged throughout the symposia. Best Talk awards for the year go to Brandee Carlson and David Blank, whose research presentations are respectively titled “Tie channels on deltas: A case study from the Huanghe (Yellow River) delta, China” and “Discrete element method as a tool for simulating megathrust earthquakes: insights into stress transfer”. Chenliang Wu and Cailey Condit received the Best Pop-up awards for presentations that were both fun and informative. The Best Participant awards were given to two first-year students, Michael Lara and Patrick Phelps, because of their active engagement in all aspects of the GIESS.

I would like to directly thank the committee members who helped make this inaugural year of the GIESS an outstanding success: Laura B. Carter, James Eguchi, Sahand Hajimirza, Harsh Vora, and Daniel Woodworth. Let’s make it even better next year.

Predicting equilibrium channel geometry with a neural network

In an attempt to learn more about ML, I decided to just jump in and try a project: predicting equilibrium channel geometry with a simple neural network.

[source code]

All in all, I’m fairly sure I didn’t learn anything about equilibrium channel geometries, but I had some fun and learned an awful lot about machine learning and neural networks. A lot of the concepts I have been reading about for weeks finally clicked when I actually started to implement the simple network.

I decided to use the data from Li et al., 2015 [paper link here], which contains data for 231 river geometries.

The dataset includes bankfull discharge, width, depth, channel slope, and bed-material D50 grain size.

We want to be able to predict the width, depth, and slope from the discharge and grain size alone. This is typically a problem, because we are trying to map two input features onto three output features. In this case, though, the model works because the outputs H (depth) and B (width) are highly correlated.

The network is a simple ANN with one hidden layer of three nodes, trained with stochastic gradient descent. The training curve is below.
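For reference, something close to this setup can be written in a few lines. This is a sketch with stand-in data and my own assumed log-transform preprocessing, not the original code:

import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.lognormal(size=(231, 2))     # stand-in rows of [bankfull discharge, D50]
Y = rng.lognormal(size=(231, 3))     # stand-in rows of [width B, depth H, slope S]

# one hidden layer with 3 nodes, trained by stochastic gradient descent
model = MLPRegressor(hidden_layer_sizes=(3,), solver='sgd',
                     learning_rate_init=0.01, max_iter=5000)
model.fit(np.log(X), np.log(Y))      # log space, since channel geometries scale as power laws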

Matlab speed comparison of switch-case and if-then statements and hard-code

I frequently use the extremely methodical approach in scientific programming of “just trying things”. This means that I create a lot of different ways to try to do something to find the best way to do that thing. Sometimes this means getting a realistic result, a stable result, or a fast computation.

Virtually all programming languages offer if-then statements, whereby sections of code are evaluated on True evaluations of a statement. An if-then statement can be extended with an elseif to provide further evaluations to compare with for True. Using this framework, different “cases” or “methods” of doing a single thing can be quickly tested by switching between them in a script. The purpose of all this is to make the code more readable and maintainable, by not having to comment/uncomment blocks of code to make a change in the calculation method.

For example, two methods selected by x = 'test1' or x = 'test2' could be executed by a function, depending on the definition of x:

if strcmp(x, 'test1')
    y = 1;
elseif strcmp(x, 'test2')
    y = 2;
end

A similar functionality can be obtained with a switch-case statement:

switch x
    case 'test1'
        y = 1;
    case 'test2'
        y = 2;
end

But which is faster?? I suppose I always knew switch-case statements were slower than if-then statements, but I wanted to know how much of a difference it could make. I also often have >4 of these case statements as optional methods, and I wanted to know if the number of cases (or really how “deep” down the list they were) made a difference in speed.

I designed a simple experiment in Matlab to test this out. I looped through a simple set of statements 1 million times, and timed each scenario. You can find the source code here.

It turns out that switch-case statements are about 30% slower than if-then statements. Both are more than an order of magnitude slower than hard-coding the method directly.

Most importantly though, the time increases linearly with each “layer” of the if-then or switch-case statement. To me, this stressed the importance of 1) having few cases that aren’t simply hard-coded in, or 2) at least sorting the cases in order of likelihood of being used during program execution.

Markov Chain stratigraphic model

I recently returned from the NCED2 Summer Institute for Earth Surface Dynamics at Saint Anthony Falls Laboratory at University of Minnesota, which is a 10-day long, intense workshop for early-career scientists to talk about novel methods and ideas in our field. This was the ninth, and hopefully not last, year of the meeting.

Typically the participants will do some sort of physical experiment (the kind of work the lab is known worldwide for), but this year’s theme was about mathematics in Earth surface processes. We covered a range of subjects, but a lot of discussion came back to random walks, diffusion-like processes/simulations, and probability. Kelly Sanks and I worked on a project together, which combined a lot of these ideas.

Our question was: is deltaic sedimentation, and the resulting stratigraphy, random? Specifically, we hypothesized that a pure “random walk” model cannot capture the effect of the “stratigraphic filter”. However, a biased random walk model, where Δz depends on the previous time’s Δz, can capture the dynamics of the system sufficiently to look like an actual stratigraphic sequence. The null hypothesis, then, is that both random walk models are indistinguishable from a stratigraphic succession.

To attack the problem, we decided to use a simple biased random walk model: Markov chains. These models have no physics incorporated, only probability, which made the project simple and easy to complete over beers at night. The Markov chain can be summarized as a sequence of “states”, where the next state is chosen at random based on some probability of changing to another given state or staying in the same state. Repeating this over and over gives a chain/sequence of states. In our simulation, the “states” were changes in elevation. Said another way, the next timestep’s change in elevation is dependent on the current timestep’s change in elevation. This hypothesis is grounded in the physics of river deltas, whereby channel location (and thus the locus of deposition/erosion elevation change) does not change immediately and at random, but rather somewhat gradually.

When a system has more than a few states, it becomes complicated to keep track of the probabilities, so we naturally use a matrix to define them. The so-called “transition matrix” records the probability that, given a current state (rows), the next state will be any given state (columns).

For our data, we used Tulane Delta Basin experiment 12-1, which has a constant Qw/Qs ratio and RSLR rate. This is necessary for our question because we needed to know the elevation at some regular interval, and the resulting stratigraphy after deposition and erosion. The experimental delta surface was measured to mm accuracy every hour for 900 hours. We calculated Δz values over the entire experiment spatiotemporal domain to inform our main biased random walk model.
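The core machinery is only a few lines; here is my own minimal version (not the markov_delta code), with the bin edges, state count, and stand-in data chosen arbitrarily:

import numpy as np

def transition_matrix(states, n_bins):
    """Count transitions between successive states; rows become P(next | current).
    Assumes every bin is visited at least once (regularize empty rows in practice)."""
    T = np.zeros((n_bins, n_bins))
    for a, b in zip(states[:-1], states[1:]):
        T[a, b] += 1                            # count observed transitions
    return T / T.sum(axis=1, keepdims=True)

def simulate(T, start, n_steps, seed=0):
    """Draw a chain of states from the transition matrix."""
    rng = np.random.default_rng(seed)
    chain = [start]
    for _ in range(n_steps - 1):
        chain.append(rng.choice(len(T), p=T[chain[-1]]))
    return np.array(chain)

dz = np.random.default_rng(2).normal(0.25, 1.0, 900)   # stand-in for measured dz (mm/hr)
edges = np.linspace(-5, 5, 26)                          # 25 dz bins
states = np.clip(np.digitize(dz, edges) - 1, 0, 24)
T = transition_matrix(states, 25)
new_states = simulate(T, start=states[0], n_steps=900)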

Markov transition matrix of Δz values calculated from the experimental elevation profiles (axes are in mm/hr).

The main data-trained transition matrix shows that states tend toward slightly aggradational values. This makes sense, since the system is deterministically net-aggradational due to the RSLR forcing. We compare this data-trained model with two synthetic transition matrices: a Gaussian distribution (intentionally similar to the data) and a uniform distribution (i.e., purely random).

We then simulated the elevation change sequences predicted by each of the transition matrices and used the stratigraphic filter concept to determine the resultant stratigraphy. We did this a bunch of times, and totally chose one of the more “convincing” results (i.e., where the slope difference between simulations was larger).
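The stratigraphic filter itself reduces to a running minimum over future elevations: a surface is preserved only if nothing later erodes below it. A minimal sketch (variable names mine):

import numpy as np

def stratigraphic_filter(eta):
    """eta: 1-D elevation time series at a point; returns the preserved record."""
    return np.minimum.accumulate(eta[::-1])[::-1]   # min over all later times

eta = np.cumsum(np.random.default_rng(3).normal(0.25, 1.0, 900))  # synthetic elevation walk
strat = stratigraphic_filter(eta)   # non-decreasing preserved stratigraphic surface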

a SINGLE realization of the modeled Markov sequences.

We found that the data-trained model reproduced the stratigraphic sequences from the experiment (based only on comparing slopes…). The other models were not far off either, suggesting that deltaic sedimentation is in fact not too different from purely random.

Ultimately, we would simulate the models hundreds of times and make ensembles of the results to interpret. We also would use more meaningful statistics of the stratigraphic sequences to compare model and data, such as # of “channel cuts”, unconformity-bed thicknesses, “drowned” sequences, etc.

The main source code for all this lives on GitHub (https://github.com/amoodie/markov_delta) if you are interested in checking anything out or exploring some more. You will need to obtain the data from the SEN website, or ask me to send you a copy of the data formatted into a `.mat` file (it’s too big to put on here…).
If you format the data yourself, produce a 3D matrix where the x-y-z coordinates are time, across-basin, and along-basin position.

Authors of the code and project are Kelly Sanks (@kmsanks) and Andrew J. Moodie (@amoodie).

Rivers 2 Stratigraphy

Explore the construction of stratigraphy through the interaction of channel geometry, lateral migration rate, subsidence rate, and avulsion frequency — interactively!

Imagine there is a river in a subsiding basin. As the river laterally migrates, it leaves behind sandy bar deposits, which are encapsulated in a matrix of floodplain muds.

As the river deposits are lowered away from the basin surface by subsidence, they become part of the stratigraphic record (Figure below). Depending on the amount of lateral migration, frequency of avulsion, and subsidence rate, the degree of overlap of channel bodies in the stratigraphic deposit will vary. The movie above shows a vertical slice across the basin width, and a channel (the colorful boxes) laterally migrating across a river floodplain.

The movie comes from an educational module I am working on called Rivers2stratigraphy (https://github.com/amoodie/rivers2stratigraphy). The module is open source and relies only on Python dependencies.

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1450681. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.