# Building a simple delta numerical model: Part V

Now we need to add a routine to update the channel bed, based upon the calculated change in sediment transport over space from the previous step. We’ll use the calculation from the last Part of the tutorial (get_dqsdx) in order to update the channel bed (η) at the end of each timestep. Define the following parameters

```phi = 0.6; % bed porosity If = 0.2; % intermittency factor ```

where If is an intermittency factor representing the fraction of the year that the river is experiencing significant morphodynamic activity. We are basically assuming that the only major change to the river occurs when the river is in flood. One year is the temporal resolution for the model, which we’ll define in the next Part.

```function [eta] = update_eta(eta, dqsdx, phi, If, dtsec) eta0 = eta; eta = eta0 - ((1/(1-phi)) .* dqsdx .* (If * dtsec)); end```

This module works simply by multiplying our vector for change in sediment transport capacity over space to the Exner equation (reproduced below) to evaluate the change in bed elevation over time (i.e., at the next timestep). There isn’t really anything exciting to show at this stage, as we’ve only calculated the change in the bed for a given dqsdx vector, which represents only a single timestep. In the next Part, we’ll add a time routine to the model, completing the setup required for the simple delta model.

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No.145068 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

# Building a simple delta numerical model: Part IV

In this part of “Building a simple delta numerical model”, we’ll write the part of our model that will solve the “Exner” equation, which determines changes in bed elevation through time. We’ll start with the module in this post, and then implement the time-updating routine in the next post of this series.

We’re going to make an assumption about the adaptation length of sediment in the flow in order to simplify the Exner equation. The full Exner equation contains two terms for sediment transport, a bed-material load and a suspended load term. The bed material load is probably transported predominantly as suspended load in an actual river, but because it is in frequent contact and exchange with the bed (i.e., suspension transport is unsteady), the flow and bed can be considered to be in equilibrium over short spatial scales, or rather, there is a short adaptation length.

It has been suggested that in fine-grain rivers the adaptation length of the flow to the bed conditions, and vice versa, is so long that sediment stays in suspension over a sufficiently long distance that the fraction of sediment mass held in suspension must be conserved separately from the bed fraction of sediment. We’re going to neglect this idea, because 1) we’re considering medium sand which likely has a short suspension period, and 2) the process may actually not be that important for fine-grain rivers either (An et al., in prep).

So, the Exner equation is then given by (Equation 1): where λp = 0.6 is the channel-bed porosity, η is the bed elevation, t is time, qs is sediment transport capacity per-unit channel width, and x is the downstream coordinate. The form of Equation 1 shows that a reduction in the sediment transport over space (negative ∂qs/∂x) causes a positive change in the bed elevation over time (∂η/∂t).

In the last post we calculated a vector qs for the sediment transport capacity everywhere, and now we simply need to calculate the change in this sediment transport capacity over space (x). We’ll do this with a winding scheme and vectorized version of the calculation for speed. This guide is not meant to be comprehensive for solving PDEs or even winding calculations, and we already did a similar calculation for slope, so I’ll just present the code below:
``` au = 1; % winding coefficient qsu = qs(1); % fixed equilibrium at upstream```

```function [dqsdx] = get_dqsdx(qs, qsu, nx, dx, au) dqsdx = NaN(1, nx+1);% preallocate dqsdx(nx+1) = (qs(nx+1) - qs(nx)) / dx; % gradient at downstream boundary, downwind always dqsdx(1) = au*(qs(1)-qsu)/dx + (1-au)*(qs(2)-qs(1))/dx; % gradient at upstream boundary (qt at the ghost node is qt_u) dqsdx(2:nx) = au*(qs(2:nx)-qs(1:nx-1))/dx + (1-au)*(qs(3:nx+1)-qs(2:nx))/dx; % use winding coefficient in central portion end```

We’ve introduced two new parameters in this step, a winding coefficient (au) and an upstream sediment feed rate (qsu). The sediment feed rate constitutes a necessary boundary condition for solving the equation. Here, we set this boundary to be in perfect equilibrium by defining the feed rate as the transport capacity of the first model cell; however, this condition has interesting implications for the model application (more discussion on this in Part VI!).

The calculation then, for a single t (which we implicitly set to be 1 second with out sediment transport calculation), the change in sediment transport capacity over space is maximized in the backwater region. This vector is converted to the commensurate change in bed elevation over this single t by multiplying by (1/(1-λp)). Of course, we can consider many ts at once, and so the vector would again be multiplied by the timestep (Δt). We’ll consider this in the next post, where we add a time loop to the model, such that on ever timestep, the backwater, slope, sediment transport and change in bed elevation are each recalculated and updated so that we have a delta model which evolves in time!

The code for this part of the model is available here.

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No.145068 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

# Building a simple delta numerical model: Part III

In this part of “Building a simple delta numerical model”, we’ll simply develop a module for calculating sediment transport at all locations within our model domain. This is naturally a critical piece of the model to get right, because the magnitude of sediment transport that occurs within the delta system will have a direct correlation with the magnitude of geomorphic change to the delta system. Just as before, we’ll be working our module for the Mississippi River, and so we will use the Engelund and Hansen, 1967 formulation for bed-material load sediment transport.

It is worth noting here that there are countless sediment transport equations published within the literature, with applicability for different scenarios, covering the entire range of alluvial rivers, the shelf environment, to turbidity currents. A nice review of some relations is given by Garcia, 2008, who produces the plot below with six selected relations, where τ* is shear stress (Shields number) and q* is a dimensionless sediment transport. The only points I hope to make with this figure are that stress and transport are nonlinearly related, and that many of the relations are quite similar. In fact, the majority of sediment transport equations are based on an excess-shear stress formulation, where, as shown above, the sediment transport prediction scales with the shear stress. The Engelund and Hansen, 1967 equation is given in its dimensional form by (Equation 1) where qs is the sediment transport per unit width (m2/s), β is an adjustment coefficient discussed below, R = 1.65 is submerged specific gravity of the bed sediment, = 9.81 m/s2 is gravitational acceleration, is the grain size in question, Cf is the coefficient of friction, ρ = 1000 kg/m3 is the fluid density, and τ = ρ Cf U2 is the fluid shear stress, where is depth-averaged fluid velocity.

The solution to this equation is obtained by simple algebra, and so I will display the code calculations below without any explanation.

``` U = Qw ./ (H .* B0); % velocity```

```function [qs] = get_transport(U, Cf, d50, Beta) % return sediment transport at capacity per unit width % 1000 = rho = fluid density % 1.65 = R = submerged specific gravity % 9.81 = g = gravitation acceleration u_a = sqrt(Cf .* (U.^2)); tau = 1000 .* (u_a.^2); qs = Beta .* sqrt(1.65 * 9.81 * d50) * d50 * (0.05 / Cf) .* ...```
```        (tau ./ (1000 * 1.65 * 9.81 * d50)) .^ 2.5; end```

Let’s first simply examine the behavior of Equation 1, in its dimensionalized form, for the Mississippi River. We need to define the grain size and adjustment coefficient to use in our model; we’ll follow from the literature and use 300 μm for the grain diameter, assuming this makes up the entirety of the bed, and β = 0.64 an empirical adjustment (Nittrouer et al., 2012; Nittrouer and Viparelli, 2014).

Evaluating the equation over a range of velocities from 0 to 3 m/s, we find that the nonlinear behavior shown in the above figure is also true for the Engelund and Hansen formulation. Now, let us implement this into our delta model. Following from earlier, the flow depth and velocity are related by the cross sectional area and discharge, where for a rectangular cross section, the velocity = Qw / H B. We therefore can take our calculation of flow depth from the previous lesson, and solve for velocity at each location. We then simply plug this into the relation for τ given above and solve Equation 1. The bottom plot reveals the behavior of the flow velocity calculated through the backwater region (shown in red), and the resulting sediment transport calculated (shown in green). The nonlinearity between stress (i.e., velocity) and transport is again revealed by the change in slope of the lines through the backwater region. Following from left to right along the green curve, it becomes obvious that there is a decrease in transport downstream. If there is more sediment being transported upstream than downstream, then there must be sediment that is deposited to the bed somewhere along the channel bed between ‘upstream’ and ‘downstream’.

This thought experiment represents one half of the famous Exner equation, which we will explore in the next part, and will provide the last module to complete our delta model. My complete code for the sediment transport calculation in our backwater delta model can be found here.

References

1. Engelund, F. & Hansen, E. A monograph on sediment transport in alluvial streams. (Technisk Vorlag, Copenhagen, Denmark, 1967).
2. García, M. H. Sediment Transport and Morphodynamics in Sedimentation engineering processes, measurements, modeling, and practice 21–163 (American Society of Civil Engineers, 2008).
3. Nittrouer et al., Spatial and temporal trends for water-flow velocity and bed-material sediment transport in the lower Mississippi River. 2012. Geological Society of America Bulletin, 5 (Table 1).
4. Nittrouer, J. A. & Viparelli, E. Sand as a stable and sustainable resource for nourishing the Mississippi River delta. Nature Geoscience, 7, 350–354 (2014).

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No.145068 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

# Building a simple delta numerical model: Part II

Taking the simpler form of the backwater equation, that is, assuming that width does not vary over the channel reach, we find an expression for changing flow depth along the channel streamwise coordinate x: (Equation 1) where S is the channel bed slope, Cf  is the dimensionless coefficient of friction, and Fr is the Froude number: (Equation 2) A couple quick notes about the overall delta model we are developing herein:

1. The code for this entire model will be developed in Matlab syntax, however, the math is all just math, and the code could easily be translated to another programming language.
2. The model will be set up for the conditions of the Mississippi River, to what I feel are the best constraints published in the literature. Surely not everyone will agree with some of my choices.

We set up our model domain by defining some parameters:

```L = 1200e3; % length of domain (m) nx = 400; % number of nodes (+1) dx = L/nx; % length of each cell (m) x = 0:dx:L; % define x-coordinates start = 63; % pin-point to start eta from (m) S0 = 7e-5; % bed slope eta = linspace(start, start - S0*(nx*dx), nx+1); % channel bed values H0 = 0; % fixed base level (m) Cf = 0.0047; % friction coeff ``` Each of these terms are necessary for defining the boundary conditions the backwater calculation requires. We will additionally need to define the water discharge (Qw), and it’s width averaged derivative qw = Qw / B, where B is the channel width. We’ll use a fixed width of 1100 m (Nittrouer et al., 2012) and a discharge of 10,000 m3/s; since width is fixed in our model, qw = 31.8182 m2/s.

Finally, to solve the above backwater equation, we’ll need to find the local slope at all nodes — our bed holds a constant slope so we could just skip this step and send S0 directly into the calculation, but we’ll be wise beyond our present experience, and prepare our calculation now for the future when our bed slope is constantly changing in time and space. A simple slope calculation is completed by a central difference calculation, applying forward difference and backward difference calculations at the domain upstream and downstream boundaries, respectively.

```function [S] = get_slope(eta, nx, dx) % return slope of input bed (eta) S = zeros(1,nx+1); S(1) = (eta(1) - eta(2)) / dx; S(2:nx) = (eta(1:nx-1) - eta(3:nx+1)) ./ (2*dx); S(nx+1) = (eta(nx) - eta(nx+1)) / dx; end```

As described in the previous section, we will use a predictor-corrector scheme to find a solution to Equation 1. This is not meant to be a comprehensive guide on solving ODEs, and so I will only summarize the method here.

1. We first can evaluate the flow depth H at the downstream boundary, where elevation is fixed (H = H0 – η = 21 m).
2. Then we predict the value at the next upstream node by solving Equation 1 for dH/dx at our known downstream-most node, using the inputs described above and H and S at the known node. This yields dH/dxp = 6.5784e-05.
3. Multiplying dH/dx by dx = 3000 m and subtracting this change from the known H at the downstream-most node, we find our predicted value Hp for the next upstream node.
4. We now repeat the evaluation of Equation 1 with the same inputs except using our new Hp value in Equation 2. This yields our corrected change in flow depth over x:
d
H/dxc = 6.5663e-05.
5. We then combine the results of these two predictions in flow depth change to give our best solution to the ODE Equation 1 at the downstream node. We combine the values using the trapezoidal rule for Hi = H(i+1) – ((0.5) * (dH/dxp + dH/dxc) * dx), giving a predicted change in flow depth of 0.1972 m, for a depth at Hi of 20.8028 m (it was 21 m at the downstream node).
6. repeat steps 2-5 solving for each node moving up the model domain until the final node is determined.

This method is fairly stable for most discharges, but may fail when discharge is very low. In thsi case, the backwater region becomes condensed, and the rate of change in H becomes very large over a very small distance. My code for the predictor corrector implementation can be found as the get_backwater_fixed() function within the complete backwater_fixed model. Below is a gif showing the results of a range of input discharges to our model.

and a still multi-surface example to demonstrate the variability. The low and moderate discharge water surface curves in the above figure (solid and dashed lines) demonstrate the classical convex-up (aka M1, e.g., Chow, 1959) condition that characterizes lowland delta systems during most of the time. The third, high discharge, water surface profile exemplifies a convex-down water surface profile. This condition, called the M2 condition has been recently demonstrated to have importance to the cyclic processes of avulsion and delta-lobe growth (e.g., Chatanantavet et al, 2012; Ganti et al., 2016).

Building a simple delta numerical model: Part III

References:
1. Nittrouer et al., Spatial and temporal trends for water-flow velocity and bed-material sediment transport in the lower Mississippi River. 2012. Geological Society of America Bulletin (Table 1).
2. Chow, Ven Te. Open Channel Hydaulics. N.p., 1959. Print. McGraw-Hill Civil Engineering Series.
3. Chatanantavet, Phairot, Michael P. Lamb, and Jeffrey A. Nittrouer. Backwater Controls of Avulsion Location on Deltas. 2012. Geophysical Research Letters 39.1
4. Ganti, V., A. J. Chadwick, H. J. Hassenruck-Gudipati, B. M. Fuller, and M. P. Lamb. Experimental River Delta Size Set by Multiple Floods and Backwater Hydrodynamics. 2016. Science Advances 2 (5).

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No.145068 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

# Building a simple delta numerical model: Part I

This is the first in a series I’m going to write over an undetermined-number-of-parts series where we will develop all the pieces to a simple delta numerical model. I’ll outline all the pieces here while I work up an interactive version with a GUI in Matlab, and we write pieces of the code together within the posts.

——————————————————————————–

The water depth (H) over a channel bed with a fixed-elevation downstream boundary can be calculated by a mass-and-momentum conservative derivation following from the full Navier-Stokes equations (e.g., Chow 1959). The so-called backwater equation, in one form, is presented as (Equation 1) where H is the flow depth, x is the streamwise coordinate of the flow, S is the channel bed slope, Cf  is the dimensionless coefficient of friction (corresponding to form drag in the channel bed and walls (i.e., roughness and bedforms)), and Fr is the Froude number, which scales with the ratio of inertia of the flow to the gravitational field driving the flow and can be given as (Equation 2) where qw = Qw is the width averaged water discharge, and g is the gravitational constant. Equations 1 and 2 are derived for a width-averaged, constant flow width scenario.

Equation 1 is useful to the research we do as scientists of lowland depositional systems because it predicts changes in flow depth through a delta system. The equation is solved with a boundary condition of a known flow depth (H) at the fixed-elevation downstream “receiving basin” boundary. For variations in river conditions (i.e., river water discharge), the surface elevation of receiving basins of large lowland rivers are effectively fixed, therefore the equation can be used to estimate a water surface over the lower reaches of the rivers we study for any input discharge conditions! With a calculation of H, we can then use a conservation of water equation Qw = U A (where U is flow velocity and A = B H is channel cross sectional area for an approximately rectangular channel), to evaluate the flow velocity at all points along the channel. Calculation of U in turn allows for calculation of sediment transport, and ultimately sediment deposition — but more on these parts in future parts of this series!

Equation 1 is valid for a solution where width (B) is fixed over the downstream length of x. Where width is roughly constant, say within the lower reaches of a channel we can use this simple form of the backwater equation. However, where width is not constant over all x (i.e., dB/dx != 0), say in the river plume beyond the channel mouth, we must retain an addition term from the derivation from the Navier-Stokes equations to account for this change in width. Since we’re using computers to solve the equations, and the dB/dx term is basically only one extra gradient calculation, this will an easy addition to our model in the future. I’ll write a separate post and link it here with a solution for that version of the backwater equation as well.

So how do we solve Equation 1? Let’s see what we know: the elevation of the receiving basin is fixed at z = 0. Therefore, if we know the channel bed profile, we know at the downstream boundary for ALL possible values of Qw. We can evaluate Equation 1 from the known downstream boundary, and knowing all variables on the right hand side at this boundary, calculate the change in flow depth moving upstream over the channel bed (dH/dx)!

The solution to Equation 1 (or in its slightly more complex form for a varying width domain) can be numerically estimated pretty easily by a predictor-corrector scheme. We’ll work up a calculation for a fixed-width form of the backwater equation (Equation 1) in the next post, and then proceed to use this as one module of our simple delta model.

Building a simple delta numerical model: Part II

This material is based upon work supported by the National Science Foundation (NSF) Graduate Research Fellowship under Grant No.145068 and NSF EAR-1427177. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

# Dietrich Settling Velocity Matlab code

Just about two years ago, I published a code for calculating the settling velocity of a particle in water by the Dietrich, 1982 method. I later realized an error in the code that made it produce erroneous estimates. I replaced it with a corrected version of the code written in R, that you can still find on my Github page, but won’t link here. I’ve taken down the old Matlab code and unpublished the original post.

Since the first publication two years ago, I’ve dramatically improved my technical coding skills, and, just as importantly, learned a ton about how to share code with others. A new version of the code can be found linked a the end of this post, but first some philosophy about coding.

As scientists, we want to be able to reliably reproduce results and we have an attraction to openness and access to knowledge, as such, we inevitably share code with one another. I’ve shared a good bit of my code thus far in my career, and I’ve made a few notes about the matter below:

1. It is better to organize your code in a way that may not be the most succinct (when speed is not an issue) while improving the readability of the code. In essence, making your code readable to those unfamiliar with it, so they can follow along with your logic.
2. Comment everything. It is so easy to get into the groove of writing something and blow through it until it’s finished and then never return to clean up and document the code. When others (or even you) go back to look at it, simple comments explaining what a line does go miles further than a detailed explanation in the documentation.
3. Write your code to withstand more that you are currently using it for. What I mean by this is to not write code that works only when you are manipulating this particular dataset or working this particular problem, but code that will work over a broader range of data or can be applied to another problem. Along the same line, use explicit values as sparingly as possible, e.g., if X = [1:12] is a 12 element vector and you want the middle value, do not use X(6), but instead use X(length(X)/2); or slightly better yet, X(ceil(length(X)/2)) to at least not throw an error and give you the first of the two middle rows if X is an odd number of elements in length.

Anyway, the code is uploaded to my Github and you can find it there. The documentation output on a help get_DSV call in Matlab is produced below. References:
Dietrich, W. E. Settling velocity of natural particles. Water Resources Research 18, 1615–1626 (1982).

This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No.1450681. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors(s) and do not necessarily reflect the views of the National Science Foundation.

# Yellow River shoreline traces

For some ongoing research I needed a set of shoreline traces of the Yellow River delta that I could measure some properties of to compare modeling results to. My model predicts delta lobe growth rates, for example the growth of the Qingshuiguo lobe on the eastern side of the delta from 1976 to 1996. early growth stage of Qingshuiguo lobe of the Yellow River delta, figure from van Gelder et al., 1994.

It’s a little early for me to release the data I’ve collected from the shoreline traces (n=296), or the code I used to generate the traces — but below is a figure mapping all the shoreline traces I have from 1855 to present. My model can also predict rates of delta system growth (i.e., growth over many lobe cycles) and I have generated data from these traces that I can compare to as well. I like the below figure, because I think it nicely demonstrates the rapid reworking of delta lobes (yes even on “river dominated” deltas!); evidenced most clearly by the retreat of the Diakou lobe to the north side of the delta. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No.1450681. Any opinion, findings, and conclusions or recommendations expressed in this material are those of the authors(s) and do not necessarily reflect the views of the National Science Foundation.

# TeXlive install process, or developing an intelligent download timer: Part 2

Naturally, the first thing we want to do is see if the data actually follow any sort of trend. In theory, larger packages in the TexLive install should take longer, but the time/volume of each should be roughly the same, and should remain roughly constant (or really vary about some mean constantly) over time. The easiest way to determine this would be to simply plot the download speed over the duration of the download. The speed varies significantly, but that’s okay. It, qualitatively, appears that the distribution of speeds randomly varies about a mean value. This is good for us, because it means that there is no trend in the speed over time, like we would see for example if the mean speed were changing over time.

This means that we can build a model of download speeds to predict the total download time. If we simply fit a linear model to the data above (i.e., elapsed package time ~ size) we find that the data are reasonably well explained (r^2 = 0.60) by a line of slope = 3.003797e-4 and intercept = 3.661338e-1. Then, we can use our linear model to evaluate, in essence, to predict the time it will take to download each package based on the size of each package and then sum them to produce a prediction for the total download time. Evaluation produces a predicted total download time of 29:26 mm:ss (plotted as dashed line below). 29:26 happens to be the exact time that our download took. That means that despite all the variations in download speeds, the mean over time was so constant that a simple linear model (a constant download speed) perfectly predicts the observed data; perhaps this is not surprising when you see the roughly constant-slope red line above.

Now, this model was based on perfect information at the end of the download, but in the next post, we’ll explore a common, simple, and popular prediction algorithm as a test of an a priori and ongoing prediction tool.

# TeXlive install process, or developing an intelligent download timer: Part 1

I recently got a new laptop and during the process of setting up to my preferences, I install LaTeX through TeXlive. This means a massive download of many small packages that get included in the LaTeX install. In effect, this is how all software downloads go, many small parts that make up the whole. Installing TeXlive on Linux gave me the chance to actually see the report of the download, and of course to save it and plot it up after completion. Here is what the data output to the console looks like during install: After 3 downloads, the installer makes a prediction of the total time, and then reports the elapsed time against predicted time, along with some information about the current download. If we take this information for all 3188 packages and parse it apart for the desired information, we can plot the actual time, versus predicted time, so see how the prediction performs over time. There are some pretty large swings in the predicted time at the beginning of the model, but by about 25% of the total download by size, the prediction becomes pretty stable, making only minor corrections. The corrections continue until the very end of the downloads.

Download time prediction is a really interesting problem to work on, since you are attempting to control for download speed which is largely dependent on things outside the realm of the personal computer and is likely to vary over timescales longer than a few minutes. I’ll be making a few posts about this topic over the next months, culminating with what I hope is a simple, fast, and accurate download time prediction algorithm. More to come!

# History of the Houston Rodeo performances

The Houston Livestock Show and Rodeo is one of Houston’s largest and most famous annual events. Now, I won’t claim to know much about the Houston Rodeo, heck, I’ve only been to the Rodeo once, and have lived in Houston for a little over a year and a half! I went to look for the lineup for 2016 to see what show(s) I may want to see, but they haven’t released the lineup yet (comes out Jan 11 2016). I got curious of what the history of the event was like, and conveniently, they have a past performers page; this is the base source for the data used in this post.

First, I pulled apart the data on the page and built a dataset of each performer and every year they performed. The code I used to do this is an absolute mess so I’m not even going to share it, but I will post the dataset here (.rds file). Basically, I had to convert all the non-formatted year data, to clean uniformly formatted lists of years for each artist. Above is the histogram of the number of performances across all the performers. As expected, the distribution is skewed right, towards the higher number of performances per performer. Just over 51% of performers have only performed one time, and 75% of performers have performed fewer than 3 times. This actually surprised me, I expected to see even fewer repeat performers. There have been a lot of big names come to the Rodeo over the years. The record for the most performances (25) is held by Wynonna Judd (Wynonna).

I then wanted to see how the number of shows per year changed over time, since the start of the Rodeo. The above plot shows every year since the beginning of the Rodeo (1931) to the most recent completed event (2015). The blue line is a Loess smoothing of the data. Now, I think that the number of performances corresponds with the number days of the Rodeo (i.e. one concert a night), but I don’t have any data to confirm this. It looks like the number of concerts in recent years has declined, but I’m not sure if the event has also been shortened (e.g. from 30 to 20 days). Let’s compare that with the attendance figures from the Rodeo. Despite fewer performances per year since the mid 1990s, the attendance has continued to climb. Perhaps the planners realized they could lower the number of performers (i.e. cost) and still have people come to the Rodeo. The Rodeo is a charity that raises money for scholarships and such, so more excess revenue means more scholarships! Even without knowing why the planners decided to reduce the number of performers per year, it looks like the decision was a good one.

If we look back at the 2016 concerts announcement page, you can see that they list the genre of the shows each night, but not yet the performers. I wanted to see how the division of genre of performers has changed over the years of the Rodeo. So, I used my dataset and the Last.fm API to get the top two user submitted “tags” for each artist. I then classed the performers into 8 different genres based on these tags. Most of the tags are genres so about 70% of the data was easy to class, I then manually binned all the remaining artists into the genres, trying to be as unbiased as possible. It’s immediately clear that since the beginning, country music has always dominated the Houston Rodeo lineup. I think it’s interesting to see the increase in variety of music since the late 1990s, beginning to include a lot more Latin music and pop. I should caveat though, that the appearance of pop music may be complicated by the fact that what was once considered “pop” is now considered “oldies”. There have been a few comedians throughout the Rodeo’s run, but none in recent years. 2016 will feature 20 performances again, with a split that looks pretty darn similar to 2015, with a few substitutions: 