Dietrich Settling Velocity Matlab code

Just about two years ago, I published a code for calculating the settling velocity of a particle in water using the method of Dietrich (1982). I later discovered an error in the code that caused it to produce erroneous estimates. I replaced it with a corrected version written in R, which you can still find on my Github page, but won't link here. I've taken down the old Matlab code and unpublished the original post.

Since that first publication two years ago, I've dramatically improved my technical coding skills and, just as importantly, learned a ton about how to share code with others. A new version of the code is linked at the end of this post, but first, some philosophy about coding.

As scientists, we want to be able to reliably reproduce results, and we are drawn to openness and access to knowledge; as such, we inevitably share code with one another. I've shared a good bit of my code thus far in my career, and I've made a few notes about the matter below:

  1. It is better to organize your code in a way that may not be the most succinct (when speed is not an issue) if it improves readability. In essence, make your code readable to those unfamiliar with it, so they can follow along with your logic.
  2. Comment everything. It is so easy to get into the groove of writing something and blow through it until it’s finished and then never return to clean up and document the code. When others (or even you) go back to look at it, simple comments explaining what a line does go miles further than a detailed explanation in the documentation.
  3. Write your code to withstand more than you are currently using it for. What I mean by this is to not write code that works only when you are manipulating this particular dataset or working this particular problem, but code that will work over a broader range of data or can be applied to another problem. Along the same line, use explicit values as sparingly as possible, e.g., if X = [1:12] is a 12 element vector and you want the middle value, do not use X(6), but instead use X(length(X)/2); or better yet, X(ceil(length(X)/2)), which will not throw an error when X has an odd number of elements (and, for even lengths, simply returns the first of the two middle elements). A short sketch of this idea appears after this list.
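As a quick illustration of point 3, here is a minimal sketch written in R (the language of the corrected settling-velocity code); the vector x and its length are purely illustrative.

```r
# Illustrative vector; any length works with the robust version below
x <- 1:12

# Brittle: hard-coded index, only correct for this particular 12-element vector
middle_bad <- x[6]

# Robust: no explicit values, works for any length of x, and does not error
# when x has an odd number of elements (it then returns the true middle value)
middle <- x[ceiling(length(x) / 2)]

print(middle)
```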

 

Anyway, the code is uploaded to my Github and you can find it there. The documentation output from a help get_DSV call in Matlab is reproduced below.

get_dsv_documentation

 

References:
Dietrich, W. E. Settling velocity of natural particles. Water Resources Research 18, 1615–1626 (1982).

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1450681. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Groundwater flow demonstration slides

I recently came across a set of old slides from the National Water Well Association, created in 1977. The slides can be found here as an imgur album. These images show principles of groundwater flow that were not novel at the time, but they do a great job of showing (primarily through time series) how different media can drastically change groundwater flow. Further, the slides show the effects of groundwater extraction by both natural (gaining streams) and anthropogenic (wells) means. Below is a gif I put together of the first ~20 slides, which shows the experimental setup over the duration of the experiment. There is a constant "head", or presence of water, at the right side that flows towards the left side of the experimental setup, with some dye injected at a few depths to show the flow lines.

groundwaterexperiment

I’ve included my favorite image from the set below, which shows (beautifully) a cone of depression. When drilling a well, it is (obviously) critical to extend the well into the water table, such that the pumping end of the well is entirely submerged. Once the well is “switched on”, there is a drawdown of the water table above the well pump head.

groundwater14

The cone forms around the well extraction point, with its deepest point at the well head, as the well pulls water in from ALL directions (below included!). This disturbs the flow lines, which is shown really well by this experiment. Note that this experiment shows just a 2D slice of the cone; in reality the “aquifer” is 3D and would extend into and out of your computer screen, making the wedge you see here a cone in three dimensions.

Yellow River shoreline traces

For some ongoing research I needed a set of shoreline traces of the Yellow River delta, from which I could measure some properties to compare with modeling results. My model predicts delta lobe growth rates, for example the growth of the Qingshuiguo lobe on the eastern side of the delta from 1976 to 1996.

Early growth stage of the Qingshuiguo lobe of the Yellow River delta; figure from van Gelder et al., 1994.


It’s a little early for me to release the data I’ve collected from the shoreline traces (n=296), or the code I used to generate the traces, but below is a figure mapping all the shoreline traces I have from 1855 to present. My model can also predict rates of delta system growth (i.e., growth over many lobe cycles), and I have generated data from these traces to compare against as well. I like the figure below because I think it nicely demonstrates the rapid reworking of delta lobes (yes, even on “river dominated” deltas!), evidenced most clearly by the retreat of the Diaokou lobe on the north side of the delta.

xom_shoreline_map

 

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 1450681. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Yellow River (Huanghe) 2016 survey plans

I’ll be departing soon for my second field campaign on the Yellow River, China. We traveled there around the same time last year to generate a dataset we could begin to explore and use to develop research ideas; now, a year later, we will return with a more focused plan to gather the data we need to address our research questions. We will bring with us a suite of sophisticated data collection equipment. Below is a (terrible) sketch I made to demonstrate most of the data we will collect at each of our stations on the river.

YR2016_deployment

Pretend that the grey thing floating on the water is a boat, viewed from the back. We aim to characterize the mechanisms of sediment transport within the Yellow River at various stages of a flood discharge curve. We use a point-integrating sediment sampler to collect numerous suspended sediment and water samples from various depths in the water column (yellow stars). In order to compare the composition of suspended material to its source (i.e., the bed), we use a Shipek grab sampler to collect sediment from the bed for our analysis. Deployed off the other side of the boat, an aDcp (acoustic Doppler current profiler) characterizes the velocity structure of the water column, supplemented by a mechanical propeller-driven velocimeter for near-bed measurements where the aDcp may lack resolution.

This setup represents only a portion of our survey plans; of course, we’ll be there for six weeks, which will generate a diverse and (hopefully!) comprehensive dataset for our future work.

Hiking Garmisch-Partenkirchen

I’m posting this on the off chance it may help some future traveler who wants to find the same kind of day hike I did, so that he/she can enjoy it as much as I did. I recently spent a few days in München and wanted to take a day trip to the Alps for a hike. I was having an awful time finding any actual hiking routes, so I figured I would just wing it once I got there. So I boarded a train to the south for a city called Garmisch-Partenkirchen. On the train I asked a fellow rider (a German) if he knew of any hiking trails out of G-P. While not immensely helpful, he did suggest my starting point.

2016-05-14 16.25.12

The train station at Garmisch-Partenkirchen

The idea was to set out from the train station in G-P for a gorge called Partnachklamm (the Partnach is the river that has cut the gorge); from there I would just continue climbing and wandering up trails and mountains until I got tired, and then head back down. The Partnachklamm begins about 1 km up a small road that starts at the Olympic Ski Jump facility. The hike I followed was ~19 km, or approximately 10 km up and 9 km down. HERE is a .kmz file so that you can see the approximate path I followed; you may have to click “view raw” in order to initiate the download (please note that I just mapped this now in Google Earth, and it does not necessarily trace the exact trails I followed!!).

So for some direction, the hike I enjoyed could be mapped as follows (be sure to enjoy the scenery along the way too!).

  1. Head out to the north and east of the train station, following St. Martin-Straße toward the east. When you near the river, head south on Partnachhauenstraße, which runs along the river. Here there is a neat “geology of the Alps” exhibit that actually has some spectacular specimens. The path crosses the river at Schornstraße. Continue to follow the path to the ski jump stadium.
  2. To the west of the stadium (right-hand side as seen from your approach) is a paved road. Follow it (and the signs) for the Partnachklamm. There is a 4 euro entrance fee to the Partnachklamm, which may seem unfair, but it was definitely worthwhile to me. Enjoy the hike through the Partnachklamm.
  3. Once out of the gorge, you will follow the path up a series of switchbacks, following signs for “Das Graseck”. This will take ~30 minutes.
  4. Continue beyond Das Graseck and take the paved road up the hill on your right. Eventually the pavement ends in favor of a dirt path at the “peak” of a hill. You then want to follow the small dirt path up, in the steepest direction, through a long series of switchbacks in a dense pine forest. This is the path to the “Berggasthof Eckbauer”. This will take ~45 to 60 minutes.
  5. Finally you will reach the Berggasthof Eckbauer. Have a pint and enjoy the spectacular view.
    2016-05-14 14.03.29
    2016-05-14 14.09.20
  6. My final piece of advice: when you return down, you will want to follow the same path you took up, but once you pass Das Graseck, follow a steep path down and to the right. This path will lead you down past some waterfalls, and you will see a sign warning that the trail is for experienced hikers only. You should heed this warning, but don’t be frightened by it either. This path will return you to the “tollbooth” of the Partnachklamm, but allows you to walk above the gorge and get a different perspective.

Lastly, here is my new computer background. Go and hike the spectacular German Alps.

2016-05-14 15.01.11

TeXlive install process, or developing an intelligent download timer: Part 2

In part one of this series, I presented the download predictions for a program installation. The ultimate goal here is to develop a method for accurately predicting total download times from the beginning of the download process. To elaborate: as the download progresses, it should become increasingly easy to make an accurate prediction of the time remaining, because there is less and less left to download, and you have more and more information about the download history.

Naturally, the first thing we want to do is see if the data actually follow any sort of trend. In theory, larger packages in the TeXlive install should take longer to download, but the time per unit volume of each should be roughly the same, and should remain roughly constant (or really, vary about some constant mean) over time. The easiest way to check this is to simply plot the download speed over the duration of the download.

secperkB_overtime

The speed varies significantly, but that’s okay. Qualitatively, the speeds appear to vary randomly about a mean value. This is good for us, because it means there is no trend in speed over time, as we would see, for example, if the mean speed itself were changing over the course of the install.

This means that we can build a model of download speeds to predict the total download time. If we simply fit a linear model to the data above (i.e., elapsed package time ~ size), we find that the data are reasonably well explained (r^2 = 0.60) by a line of slope = 3.003797e-4 and intercept = 3.661338e-1.

model

Then, we can evaluate our linear model to predict the time it will take to download each package based on its size, and sum these per-package predictions to produce a prediction of the total download time. This evaluation produces a predicted total download time of 29:26 (mm:ss), plotted as the dashed line below.

timeseries_witheval

29:26 happens to be the exact time that our download took. That means that despite all the variations in download speed, the mean over time was so constant that a simple linear model (i.e., an essentially constant download speed) perfectly predicts the observed data; perhaps this is not surprising when you see the roughly constant-slope red line above.
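For concreteness, here is a minimal R sketch of the fit-and-sum procedure described above. It assumes a data frame pkgs with one row per package and columns size (in kB) and elapsed (seconds spent downloading that package); the object and column names are illustrative, not the exact code used.

```r
# Fit elapsed package time against package size (a constant-speed model)
fit <- lm(elapsed ~ size, data = pkgs)
summary(fit)$r.squared                 # ~0.60 for the data shown above

# Predict a download time for each package from its size, then sum the
# per-package predictions to get a predicted total download time
predicted_total <- sum(predict(fit, newdata = pkgs))

# Format the total as mm:ss for comparison with the observed install time
sprintf("%02d:%02d",
        as.integer(predicted_total %/% 60),
        as.integer(round(predicted_total %% 60)))
```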

Now, this model was based on perfect information at the end of the download, but in the next post, we’ll explore a common, simple, and popular prediction algorithm as a test of an a priori and ongoing prediction tool.

TeXlive install process, or developing an intelligent download timer: Part 1

I recently got a new laptop, and during the process of setting it up to my preferences, I installed LaTeX through TeXlive. This means a massive download of many small packages that get included in the LaTeX install. In effect, this is how most software downloads go: many small parts that make up the whole. Installing TeXlive on Linux gave me the chance to actually see the report of the download, and of course to save it and plot it up after completion. Here is what the data output to the console looks like during the install:

data

After 3 downloads, the installer makes a prediction of the total time, and then reports the elapsed time against the predicted time, along with some information about the current download. If we take this information for all 3188 packages and parse it apart for the desired information, we can plot the actual time versus the predicted time, to see how the prediction performs over the course of the download.

timeseries

There are some pretty large swings in the predicted time at the beginning of the download, but by about 25% of the total download by size, the prediction becomes pretty stable, making only minor corrections. The corrections continue until the very end of the downloads.
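For anyone wanting to reproduce this, here is a hedged R sketch of how the console report might be parsed. The line format in the regular expression is an assumption (something like "[12/3188, 00:10/28:30] install: pkgname [345k]") and would need to be adjusted to match the actual installer output; the file name is also illustrative.

```r
# Read the saved console output and pull out one row per downloaded package
log_lines <- readLines("texlive_install.log")

# Assumed line format: "[n/3188, elapsed/predicted] install: name [size k]"
pattern <- "\\[(\\d+)/\\d+, ([0-9:]+)/([0-9:]+)\\] install: (\\S+) \\[(\\d+)k\\]"
m <- regmatches(log_lines, regexec(pattern, log_lines))
m <- m[lengths(m) == 6]                  # keep only lines that matched

to_seconds <- function(x) {              # "mm:ss" or "hh:mm:ss" -> seconds
  parts <- as.numeric(strsplit(x, ":")[[1]])
  sum(parts * 60 ^ ((length(parts) - 1):0))
}

pkgs <- data.frame(
  n         = as.integer(sapply(m, `[`, 2)),
  elapsed   = sapply(sapply(m, `[`, 3), to_seconds),   # cumulative elapsed time
  predicted = sapply(sapply(m, `[`, 4), to_seconds),   # installer's prediction
  size      = as.numeric(sapply(m, `[`, 6)),           # package size in kB
  stringsAsFactors = FALSE
)

# Per-package download times (the quantity modeled in part 2) are the differences
pkgs$per_package <- diff(c(0, pkgs$elapsed))
```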

Download time prediction is a really interesting problem to work on, since you are attempting to control for download speed, which is largely dependent on things outside the realm of the personal computer and is likely to vary over timescales longer than a few minutes. I’ll be making a few posts about this topic over the next months, culminating with what I hope is a simple, fast, and accurate download time prediction algorithm. More to come!

Gastroliths in the Morrison Fm

I’ve recently returned from a trip to New Mexico for the field methods class for which I am a TA. It was an awesome trip and a great teaching experience for me, but that’s another story. The field area we work in lies within the Jurassic Morrison depositional basin (active roughly during the Kimmeridgian, ~157-152 Ma). Within the Morrison Formation is the Brushy Basin member (abbreviated Jmb), renowned for the abundance of dinosaur fossils found within the rock unit. Jmb is found within our field area, so I told the students to keep an eye out for any good finds when walking within the unit.

Although Jmb has an abundance of fossils, we didn’t find any. BUT, the Morrison is also famous for its bounty of another paleontological tool, the gastrolith. Gastroliths are interpreted as stones that dinosaurs would ingest in order to help break down food and aid in digestion. A gastrolith may be more generally defined as “a hard object of no caloric value (e.g., a stone, natural or pathological concretion) which is, or was, retained in the digestive tract of an animal” [1].

There are gastroliths all over the Jmb, some small and some larger. Below are three photos of the “best” gastrolith I found on our trip.

gastrolith3 gastrolith2 gastrolith1

Gastroliths are often recognized by their very smoothed and polished appearance (some other examples here). I suppose that to be certain my rock is a gastrolith, and not simply a rock polished by water or wind, it should have been found in association with the remains of the animal it was within. Regardless, I’m really glad to be able to add a rock with such an interesting backstory to my collection.

 

 

Google Opinion Rewards

Anyone who knows me knows that I’m a Google fanboy. I don’t own any Apple products and will jump at any opportunity to tell anyone around me why Google is superior to Apple. Let me share with you one reason. Google Opinion Rewards is an app for your phone that periodically offers you the option to take a short survey.

Sometimes the survey is about a place you have visited recently (based on location history) and sometimes it is something completely random; but always, it only takes a few moments. You get paid (“rewarded”) for your time with Google Play Store credit. I take the surveys when I get a free second and I actually enjoy them; I think it’s kind of fun to know what companies want to know from you (the surveys are sponsored by other non-Google companies).

I first signed up for Google Opinion Rewards just over two years ago and wanted to see how much money I had made for what is basically zero time and effort investment. The app allows you to see your rewards history, but only on your phone. So I had to get a little creative here… I ended up taking screen grabs of all of the “data” and putting them through an image-to-text converter (free-ocr.com) to get the data into a text form I could use.

2016-01-06 17.48.19

The screen-captured form of the data

Below is the distribution of reward amounts and a cumulative total of the reward amounts received from the service.

google_rewards_hist_hist
google_rewards_hist
Unsurprisingly, the bulk of reward amounts are on the low end of the spectrum. 10 cents is the mode, which is essentially the “thanks but no thanks” reward for your opinion. 25 cents is a popular number too, but interestingly 50 cents is not.

The cumulative plot shows that there have been some “dry spells” and some “hot streaks” to my survey responses, but I’ve been pretty consistently rewarded, especially in 2015. There is no trend in reward amount over time since I signed up for the service.
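For the curious, here is a minimal R sketch of how these two plots can be made. It assumes the OCR'd text has already been cleaned into a file with a date and an amount (in dollars) column, so the file and column names are illustrative rather than the actual ones.

```r
# Load the cleaned-up reward history (one row per survey reward)
rewards <- read.csv("rewards.csv", stringsAsFactors = FALSE)
rewards$date <- as.Date(rewards$date)
rewards <- rewards[order(rewards$date), ]

# Distribution of individual reward amounts
hist(rewards$amount, breaks = 20,
     xlab = "Reward amount ($)", main = "Reward amounts")

# Cumulative total of rewards over time
plot(rewards$date, cumsum(rewards$amount), type = "s",
     xlab = "Date", ylab = "Cumulative reward ($)")
```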

I have earned a total of $52.50 in Google Play credit for what I would effectively call doing nothing. That’s a pretty great deal; of course, you can only use the credits in Google’s own store, but hey, that’s pretty good considering you can get apps, music, movies, and books there. For me, it has meant that I can buy any app whenever I want one, without hesitation, for “free”. I have about $20 of credit sitting around right now, maybe I’ll go buy some Snapchat head effects…

History of the Houston Rodeo performances

The Houston Livestock Show and Rodeo is one of Houston’s largest and most famous annual events. Now, I won’t claim to know much about the Houston Rodeo; heck, I’ve only been to the Rodeo once, and have lived in Houston for a little over a year and a half! I went to look for the lineup for 2016 to see what show(s) I might want to see, but they haven’t released the lineup yet (it comes out Jan 11, 2016). I got curious about the history of the event, and conveniently, they have a past performers page; this is the base source for the data used in this post.

First, I pulled apart the data on the page and built a dataset of each performer and every year they performed. The code I used to do this is an absolute mess, so I’m not even going to share it, but I will post the dataset here (.rds file). Basically, I had to convert all the non-formatted year data into clean, uniformly formatted lists of years for each artist.

hr_hist

Above is the histogram of the number of performances across all performers. As expected, the distribution is skewed right, with a long tail toward higher numbers of performances per performer. Just over 51% of performers have performed only one time, and 75% of performers have performed fewer than 3 times. This actually surprised me; I expected to see even fewer repeat performers. There have been a lot of big names come to the Rodeo over the years. The record for the most performances (25) is held by Wynonna Judd (Wynonna).
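As a rough sketch of how these summary numbers come out of the dataset (assuming the .rds file loads to a data frame shows with one row per performance and columns performer and year; the file and column names are assumptions):

```r
# Load the performer/year dataset and count performances per performer
shows <- readRDS("houston_rodeo_performances.rds")
counts <- table(shows$performer)

# Histogram of the number of performances per performer
hist(as.numeric(counts), breaks = 25,
     xlab = "Performances per performer", main = "")

mean(counts == 1)                 # fraction of performers with exactly one show
mean(counts < 3)                  # fraction with fewer than three shows
names(counts)[which.max(counts)]  # performer with the most appearances
max(counts)                       # and their number of performances
```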

I then wanted to see how the number of shows per year changed over time, since the start of the Rodeo.

hr_peryr

The above plot shows every year from the beginning of the Rodeo (1931) to the most recent completed event (2015). The blue line is a Loess smoothing of the data. Now, I think that the number of performances corresponds with the number of days of the Rodeo (i.e., one concert a night), but I don’t have any data to confirm this. It looks like the number of concerts in recent years has declined, but I’m not sure if the event has also been shortened (e.g., from 30 to 20 days). Let’s compare that with the attendance figures from the Rodeo.
hr_comps

Despite fewer performances per year since the mid-1990s, the attendance has continued to climb. Perhaps the planners realized they could lower the number of performers (i.e., cost) and still have people come to the Rodeo. The Rodeo is a charity that raises money for scholarships and such, so more excess revenue means more scholarships! Even without knowing why the planners decided to reduce the number of performers per year, it looks like the decision was a good one.

If we look back at the 2016 concerts announcement page, you can see that they list the genre of the shows each night, but not yet the performers. I wanted to see how the division of performer genres has changed over the years of the Rodeo. So, I used my dataset and the Last.fm API to get the top two user-submitted “tags” for each artist. I then classed the performers into 8 different genres based on these tags. Most of the tags are genres, so about 70% of the data was easy to class; I then manually binned all the remaining artists into the genres, trying to be as unbiased as possible.
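For reference, here is a hedged R sketch of pulling the top tags per artist. It assumes the Last.fm artist.getTopTags endpoint and a registered API key, and the exact JSON structure returned may differ slightly from what is unpacked here; this is a sketch, not the exact code used for the figure below.

```r
library(jsonlite)

# Fetch the top n user-submitted tags for one artist from the Last.fm API
top_tags <- function(artist, api_key, n = 2) {
  url <- paste0("http://ws.audioscrobbler.com/2.0/",
                "?method=artist.gettoptags",
                "&artist=", URLencode(artist, reserved = TRUE),
                "&api_key=", api_key,
                "&format=json")
  res <- fromJSON(url)
  head(res$toptags$tag$name, n)   # e.g., c("country", "pop")
}

top_tags("Wynonna", api_key = "YOUR_API_KEY")
# The returned tags were then mapped (mostly automatically, the remainder by
# hand) onto the 8 genre classes shown in the figure below.
```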

hr_breakdown

It’s immediately clear that, since the beginning, country music has always dominated the Houston Rodeo lineup. I think it’s interesting to see the increase in the variety of music since the late 1990s, with the lineup beginning to include a lot more Latin music and pop. I should caveat, though, that the appearance of pop music may be complicated by the fact that what was once considered “pop” is now considered “oldies”. There have been a few comedians throughout the Rodeo’s run, but none in recent years. 2016 will feature 20 performances again, with a split that looks pretty darn similar to 2015, with a few substitutions:

hr_2016