Monthly Archives: December 2015

History of the Houston Rodeo performances

The Houston Livestock Show and Rodeo is one of Houston’s largest and most famous annual events. Now, I won’t claim to know much about the Houston Rodeo, heck, I’ve only been to the Rodeo once, and have lived in Houston for a little over a year and a half! I went to look for the lineup for 2016 to see what show(s) I may want to see, but they haven’t released the lineup yet (comes out Jan 11 2016). I got curious of what the history of the event was like, and conveniently, they have a past performers page; this is the base source for the data used in this post.

First, I pulled apart the data on the page and built a dataset of each performer and every year they performed. The code I used to do this is an absolute mess so I’m not even going to share it, but I will post the dataset here (.rds file). Basically, I had to convert all the non-formatted year data, to clean uniformly formatted lists of years for each artist.


Above is the histogram of the number of performances across all the performers. As expected, the distribution is skewed right, towards the higher number of performances per performer. Just over 51% of performers have only performed one time, and 75% of performers have performed fewer than 3 times. This actually surprised me, I expected to see even fewer repeat performers. There have been a lot of big names come to the Rodeo over the years. The record for the most performances (25) is held by Wynonna Judd (Wynonna).

I then wanted to see how the number of shows per year changed over time, since the start of the Rodeo.


The above plot shows every year since the beginning of the Rodeo (1931) to the most recent completed event (2015). The blue line is a Loess smoothing of the data. Now, I think that the number of performances corresponds with the number days of the Rodeo (i.e. one concert a night), but I don’t have any data to confirm this. It looks like the number of concerts in recent years has declined, but I’m not sure if the event has also been shortened (e.g. from 30 to 20 days). Let’s compare that with the attendance figures from the Rodeo.
hr_compsDespite fewer performances per year since the mid 1990s, the attendance has continued to climb. Perhaps the planners realized they could lower the number of performers (i.e. cost) and still have people come to the Rodeo. The Rodeo is a charity that raises money for scholarships and such, so more excess revenue means more scholarships! Even without knowing why the planners decided to reduce the number of performers per year, it looks like the decision was a good one.

If we look back at the 2016 concerts announcement page, you can see that they list the genre of the shows each night, but not yet the performers. I wanted to see how the division of genre of performers has changed over the years of the Rodeo. So, I used my dataset and the API to get the top two user submitted “tags” for each artist. I then classed the performers into 8 different genres based on these tags. Most of the tags are genres so about 70% of the data was easy to class, I then manually binned all the remaining artists into the genres, trying to be as unbiased as possible.


It’s immediately clear that since the beginning, country music has always dominated the Houston Rodeo lineup. I think it’s interesting to see the increase in variety of music since the late 1990s, beginning to include a lot more Latin music and pop. I should caveat though, that the appearance of pop music may be complicated by the fact that what was once considered “pop” is now considered “oldies”. There have been a few comedians throughout the Rodeo’s run, but none in recent years. 2016 will feature 20 performances again, with a split that looks pretty darn similar to 2015, with a few substitutions:


Adding real data into GMT map, making a hillshade, and positioning

Alright, let’s get down to business plotting some real data on a map. Grab the zip file here, and uncompress it in the folder you want to build a map in. The zip contains the data to map, as well as  on of my favorite color palettes from cptcity by M. Burak Yilkilmaz (file named mby.cpt).

Here’s the end product of this tutorial, we’re going to make two maps, the first will be a simple data map, and the second a hillshaded version of the same map. We’ll also learn how to position them side by side like below.


We’ll start by plotting a simple map, with just the data represented as different colors (left side of the map). The data we’re plotting are from a digital elevation model (DEM), and of Eastern North America. Making this image is simple, we simply need to call
gmt grdimage -R278/286/36/42 -JM4 -B2WESN -Xc -Yc -Cmby.cpt -V >
where you already know most of the switches above. grdimage is a gmt tool that you can use to plot any data stored in the netCDF format. To color the image based on the values in the .nc file, we need to use a color palette. The color palette is called with the -C switch, followed by the name of the color palette. More on color palettes in a future tutorial though.

Now, let’s talk about hillshades. A hillshade (also known as shaded relief) is an element of a map that adds depth to topography. It is essentially simulating a light source and the shadows cast on the landscape from the hills onto the valleys. Making one in GMT is straightforward but requires a few steps. The main step of the process is done first with grdgradient. We use the following command
gmt grdgradient -A345 -Ne0.6 -V
In this command, -G has a different meaning than in this tutorial; another common use of the -G switch is to give the name of the output file for the command. -A is the azimuth from which the light source should be directed, and -Ne is a normalization parameter for the output from the calculation, just use 0.6.

Then, we will ensure that the shaded relief map is reasonably balanced by normalizing it again along a Gaussian distribution
gmt grdhisteq -N -V

Finally, in order to ensure the values are between -1 and 1, a final step required for creating the intensity file needed for a hillshade to the image. We do this by dividing by some value just larger than the range of values in the histogram equalized grid file. Let’s find this range by looking at the info for this file.

$ gmt grdinfo Title: Produced by grdhisteq Command: grdhisteq -N -V Remark: Normalized directional derivative(s) Gridline node registration used [Geographic grid] Grid file format: nf = GMT netCDF format (32-bit float), COARDS, CF-1.5 x_min: 278.004166667 x_max: 286.004166667 x_inc: 0.00833333333333 name: longitude [degrees_east] nx: 961 y_min: 35.9958333333 y_max: 41.9958333333 y_inc: 0.00833333333333 name: latitude [degrees_north] ny: 721 z_min: -4.67873573303 z_max: 4.67873573303 name: Elevation relative to sea level [m] scale_factor: 1 add_offset: 0 format: netCDF-4 chunk_size: 138,145 shuffle: on deflation_level: 3

So we will use 5 as our divisor, in the command
gmt grdmath 5 DIV =
which takes our -hist file and divides every grid cell by 5, putting it within the range of -1 to 1.

Finally, we use
gmt grdimage -R278/286/36/42 -JM4 -B2WESN -Xc -Yc -Cmby.cpt -V >
the grdimage tool again, in almost exactly the same way as before, except adding the -I switch with our normalized intensity file as the input.

Now, to put the two images side by side, we need to set different -X and -Y targets for each image. These switches set the offset of the lower-left corner of the PostScript layer being produced, from the last reference point used on the PostScript page. The a argument can be added to the switches to reset the reference point to the lower-left corner of the page. Try using -Xa1 -Yc for the flat image and then overlay the hillshade image at -Xa6.5 -Yc. Remember, you’ll need to add in -K, -O, and >> in order to produce a multi-layer map.

See if you can make it work by yourself, but if you need a hint, you can get the script I used from here.

Lal, 1991 in situ 10-Be production rates

10Be is a cosmogenic radioactive nuclide that is produced when high energy cosmic rays collide with nuclides and cause spallation. 10Be is produced in the atmosphere (and then transported down to the surface) as “meteoric”, and produced within mineral lattices in soil and rocks as “in situ“. In 1991, Devendra Lal wrote a highly cited paper about the physics of in situ produced Beryllium-10 (10Be). In the paper he lays out an equation for the production of in situ 10Be (q) based on latitude and altitude. I’m currently working on an idea I have for using cosmogenic nuclides as tracers for basin scale changes in uplift rate, so I wanted to see what his equation looked like applied. The equation is a third degree polynomial, with coefficients that depend on latitude (L), and direct dependency on altitude (y).

I grabbed an old raster (GEBCO 2014 30 arc second) I had laying around for Eastern North America and plotted it up. First, the elevation map (obviously latitude is on the y-axis…)


Elevation map for ENAM.

And then apply the Lal, 1991 equation and find


Plotting Lal’s 1991 in situ production rate equation for ENAM. Green–>red increasing production rate. Production rate = NA in water.

I think the interesting observation is for how little of the mapped area there is any significant change in the production rate. Maybe this should be obvious since the polynomial has direct dependence on altitude and altitude doesn’t change that much in most of the map. Further the dependence of latitude is not all all observable with this map; perhaps because the latitude range is not very large, or the coefficients never change by more than an order of magnitude anyway. Next time, maybe a world elevation map! Not sure my computer has enough memory…

You can grab the code I used from here and Lal’s paper from here.