Category Archives: Python

Predicting equilibrium channel geometry with a neural network

In an attempt to learn more about ML, I decided to just jump in and try a project: predicting channel geometry with a simple neural network.

[source code]

All in all, I’m fairly sure I didn’t learn anything about equilibrium channel geometries, but I had some fun and learned an awful lot about machine learning and neural networks. A lot of the concepts I have been reading about for weeks finally clicked when I actually started to implement the simple network.

I decided to use the data from Li et al., 2015 [paper link here], which contains data for 231 river geometries.

The dataset includes bankfull discharge, width, depth, channel slope, and bed material D50 grain size.

We want to be able to predict the width, depth, and slope from the discharge and grain size alone. This is typically a problem, because we are trying to map two input features onto three output features. In this case though, the model works because two of the outputs, depth (H) and width (B), are highly correlated.

The network is a simple ANN with one hidden layer of 3 nodes, trained with stochastic gradient descent. The training curve is below.
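For readers who want to see the shape of such a network, here is a minimal NumPy sketch of a 2-input, 3-hidden-node, 3-output network trained with per-sample stochastic gradient descent. It is not the code from the linked source: the `tanh` activation, the learning rate, the epoch count, and the synthetic stand-in data are all assumptions made for illustration, and the real dataset is not reproduced here.

```python
import numpy as np

def train(X, Y, n_hidden=3, learning_rate=0.05, n_epochs=50, seed=0):
    """Train a one-hidden-layer network with per-sample SGD on squared error."""
    rng = np.random.default_rng(seed)
    n_in, n_out = X.shape[1], Y.shape[1]
    W1 = rng.normal(scale=0.5, size=(n_in, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, n_out))
    b2 = np.zeros(n_out)

    def predict(inputs):
        hidden = np.tanh(inputs @ W1 + b1)   # hidden layer (assumed tanh)
        return hidden, hidden @ W2 + b2      # linear output layer

    def mse():
        return float(np.mean((predict(X)[1] - Y) ** 2))

    loss_history = [mse()]                   # loss before any training
    for _ in range(n_epochs):
        for i in rng.permutation(len(X)):    # one sample at a time, shuffled
            x, y = X[i:i + 1], Y[i:i + 1]
            hidden, y_hat = predict(x)
            err = y_hat - y                  # gradient of 0.5 * squared error
            d_hidden = (err @ W2.T) * (1.0 - hidden ** 2)  # backprop through tanh
            W2 -= learning_rate * hidden.T @ err
            b2 -= learning_rate * err[0]
            W1 -= learning_rate * x.T @ d_hidden
            b1 -= learning_rate * d_hidden[0]
        loss_history.append(mse())           # loss after each epoch
    return predict, loss_history

# Synthetic stand-in data (NOT the Li et al. measurements): 231 rows,
# 2 inputs (think log discharge, log D50) and 3 correlated outputs.
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(231, 2))
mixing = np.array([[0.5, 0.4, -0.3], [0.0, 0.0, 0.6]])  # hypothetical values
Y = X @ mixing + 0.05 * data_rng.normal(size=(231, 3))

predict, loss_history = train(X, Y)
```

Plotting `loss_history` against epoch number gives a training curve of the kind mentioned above.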

Python 3 silly random name generator

I was recently working on a project to try to annoy my collaborator Eric with a scheduled script that sends him an email if there are open pull requests that I authored in a GitHub repository he owns (pull-request-bot).

Along the way, I created this fun little function for coming up with random sort-of-realistic-but-mostly-funny-sounding names in Python. I’m in the process of switching from Matlab to Python, so little toy functions and projects like this are a good way for me to learn the nuances of Python faster. In fact, I would argue this is the best way to learn a new programming language: no need for books or classes, just try things!

The script is super simple and I have posted it as a GitHub Gist so I won’t describe it all again here. Below the markdown description of the function is an actual Python file you can grab, or grab it from here.

Reddit data — When is it really too soon to retail Christmas?

About this time every year, people begin to complain about retail stores having Christmas-themed displays and merchandise out. Well, speaking objectively, I think it is totally fair game for retail stores to shift to Christmas mode once people begin to think and talk about Christmas. Can Reddit post topics act as a proxy to determine when people begin to talk about Christmas? In each of the following plots, the black open circles represent a single day’s value, and the red line is a 7-point moving average designed to eliminate the weekly periodicity of Reddit posting.
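As a rough sketch of the smoothing step, a 7-point moving average can be computed with a uniform convolution kernel. The function name and the toy daily counts below are made up for illustration, and I am assuming a simple unweighted window that drops the incomplete ends rather than padding them.

```python
import numpy as np

def moving_average(values, window=7):
    """Unweighted moving average; returns len(values) - window + 1 points."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(window) / window
    # mode="valid" keeps only positions where the full window fits
    return np.convolve(values, kernel, mode="valid")

# Hypothetical counts with a repeating weekly pattern (weekend spike)
daily_counts = [10, 12, 11, 13, 12, 30, 32] * 4
smoothed = moving_average(daily_counts)
```

Because every 7-day window covers exactly one full week, the weekly pattern in this toy series averages out to a flat line, which is the effect the red line is meant to have on the real data.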

posts with 'christmas' in title

Well, it looks like the beginning of an increase in Christmas-related posts occurs in the middle of October, with a substantial increase at the very end of November (just after Thanksgiving). Let’s dig a little deeper though. In the plot below, I’ve taken the same data and plotted them on a logged y-axis to highlight the variability.

log plot of xmas posts

From the above plot, it seems that the steady increase begins as early as the middle of September! Is a steady increase really enough to conclude that the conversation has begun though? Well, I decided to take a look at the variation in the data to try to answer that.

xmas_nmlzd

In the above plot, the data have been normalized per day as the percentage of total posts that have Christmas in the title. On December 25th, 16% of posts to Reddit included the word Christmas in the title (over 16,000 posts)! Now, I took the period from April 1 to Aug 15, and determined the mean and standard deviation. The horizontal black line represents the mean for this period, and the gray box is 2 standard deviations from the mean.

Taking 2 complete standard deviations from the mean to be a good indicator of significant change, the conversation about Christmas breaches this threshold right in the middle of September.
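The threshold test can be sketched as follows. This is an illustration under assumptions rather than the original analysis: the function name and the toy values are hypothetical, the standard deviation is the population form (NumPy's default), and "breaches" is read as the first single day above the threshold rather than a sustained exceedance.

```python
import numpy as np

def first_crossing(percent_by_day, baseline, n_sigma=2.0):
    """Return (index of first post-baseline day above threshold, threshold).

    `baseline` is a slice selecting the quiet reference period. The index
    is None if no later day exceeds mean + n_sigma * std of that period.
    """
    values = np.asarray(percent_by_day, dtype=float)
    reference = values[baseline]
    threshold = reference.mean() + n_sigma * reference.std()
    start = baseline.stop                      # only look after the baseline
    above = np.nonzero(values[start:] > threshold)[0]
    day = start + int(above[0]) if above.size else None
    return day, threshold

# Hypothetical daily percentages: six quiet days, then a ramp
percent = [1.0, 1.2, 0.8, 1.0, 1.1, 0.9, 1.0, 1.2, 2.5, 3.0]
day, threshold = first_crossing(percent, baseline=slice(0, 6))
```

With the real series, `baseline` would be the slice covering April 1 to Aug 15, and applying the test to the 7-point smoothed series instead of the raw one would make it less sensitive to single-day spikes.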

Now, you are probably thinking “Well these posts in September and October are probably just all people complaining about the early retailing of Christmas!” Well yes, you may be right. But hey, even if that’s the case, the retail companies are succeeding at getting you to talk about Christmas which means it is worth their time to put up the merchandise early, since you then buy more!

 

Just for fun, here are the conversation plots for some other holidays!

all_holiday

Reddit data — Graduate School talk

This is the first post in a series I’ll be doing about posting on Reddit for 2013. The posts in this series search through every single post made to Reddit in 2013 — that’s over 50 GB worth of data, and over 39,000,000 posts!

For this post, I examined every post made to any subreddit for any word that related to graduate school (including law and medical school) for each day of 2013, in either the ‘title’ or the ‘self-text’. The key used for positive matches is at the end of this post.

grad_talk

Posts made to Reddit with words relating to graduate school in 2013. One data point for each day; the red line is a 7-point moving average.

This is maybe not as telling as I had expected, since there’s a ton of variance day to day and week to week. The most obvious observation is the spike in graduate school related posts in the month of April, following a consistent increase in posts in March. I would suggest this is probably due to the fact that this is the time of the year when a lot of acceptance decisions come out.

Normalizing the data against total posts for the day is not any more telling; the profile is just stretched a bit in the y-direction. The 7-point moving average is an attempt to remove the weekly periodicity of Reddit posting.

The key used was [‘grad school’, ‘graduate school’, “master’s”, ‘masters’, ‘ phd ‘, ‘ gre ‘, ‘letter of recommendation’, ‘letters of recommendation’, ‘doctorate’, ‘law school’, ‘med school’, ‘medical school’, ‘transcript’, ‘undergraduate gpa’, ‘undergrad gpa’]. There are of course more keywords that could have been used, but many have multiple implications, and this list was used as an attempt to minimize false positives.
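A minimal sketch of how a key like this can be applied is below. The keyword list is copied from above (with straight quotes), but the matching logic is an assumption: I lowercase the text, pad it with spaces so that space-delimited keys such as ' gre ' can match at the start or end of a title, and count at most one hit per post. The function names and the toy posts are hypothetical.

```python
from collections import Counter

KEYWORDS = [
    'grad school', 'graduate school', "master's", 'masters', ' phd ', ' gre ',
    'letter of recommendation', 'letters of recommendation', 'doctorate',
    'law school', 'med school', 'medical school', 'transcript',
    'undergraduate gpa', 'undergrad gpa',
]

def mentions_grad_school(title, selftext=""):
    """True if any keyword appears in the title or self-text (case-insensitive)."""
    text = f" {title} {selftext} ".lower()   # padding is an assumption
    return any(keyword in text for keyword in KEYWORDS)

def matches_per_day(posts):
    """Count matching posts per day; `posts` yields (date, title, selftext)."""
    counts = Counter()
    for date, title, selftext in posts:
        if mentions_grad_school(title, selftext):
            counts[date] += 1
    return counts

# Hypothetical posts, for illustration only
posts = [
    ("2013-04-01", "Just took the GRE today", ""),
    ("2013-04-01", "I regret nothing", ""),
    ("2013-04-02", "Help", "working on my law school application"),
]
daily = matches_per_day(posts)
```

The spaces around ' phd ' and ' gre ' are what keep a word like "regret" from counting as a match, at the cost of missing cases such as "GRE." followed directly by punctuation.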

Weathering of a theoretical cube

Another old assignment here, this one to model the erosion of a theoretical 1 cm³ granitoid cube of a given initial composition and neatly present the results. I did the modelling in Python, as one of my earliest Python projects. The plots were made manually in LibreOffice Calc. Below are the plots; for more information, including assumptions and citations, see the short report here. The code used for the model can be found here.

Log-log plot of the dissolution of minerals through time in a theoretical 1 cm³ granitoid cube in a soil at pH 5. Dissolution rates taken from Bandstra et al. (2007) and Palandri and Kharaka (2004). Initial weathering rates of all minerals at the start of the experiment (quartz, plagioclase feldspar, annite, tremolite, and potassium feldspar) are faster than all subsequent weathering rates due to the continually decreasing surface area of the mineral through time. Aggradation of kaolinite is dependent on the dissolution rates of plagioclase feldspar, potassium feldspar and annite. Iron oxide aggradation is dependent on annite weathering rate. Minerals reach “zero value point” at 0.00001 moles of mineral. These times are roughly 50 years for annite, 100 years for tremolite, 300 years for plagioclase feldspar, 1,500 years for potassium feldspar, and 30,000 years for quartz.
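The surface-area effect described in the caption can be illustrated with a very small shrinking-cube sketch. This is not the model linked above: it treats one mineral as a single cube, uses a simple forward-Euler time step, and the molar volume and rate constant are placeholder numbers rather than values from Bandstra et al. (2007) or Palandri and Kharaka (2004).

```python
def dissolve_cube(moles0, molar_volume_cm3, rate_mol_per_cm2_yr, dt_yr, n_steps):
    """Forward-Euler shrinking cube: dn/dt = -k * A, with A = 6 * edge**2."""
    moles = moles0
    history = [moles]
    for _ in range(n_steps):
        edge_cm = (moles * molar_volume_cm3) ** (1.0 / 3.0)  # cube edge length
        area_cm2 = 6.0 * edge_cm ** 2                        # exposed surface
        # moles lost this step scale with the current surface area
        moles = max(moles - rate_mol_per_cm2_yr * area_cm2 * dt_yr, 0.0)
        history.append(moles)
    return history

# Placeholder inputs, for illustration only (not from the report)
history = dissolve_cube(
    moles0=0.01,
    molar_volume_cm3=22.7,
    rate_mol_per_cm2_yr=1e-6,
    dt_yr=1.0,
    n_steps=100,
)
first_loss = history[0] - history[1]
last_loss = history[-2] - history[-1]
```

Because the surface area shrinks as the cube dissolves, `first_loss` is larger than `last_loss`, which is the same behaviour the caption describes for the initial weathering rates.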


 

figure_2_in_report

The aggradation of solutes from the same granitoid cube described in Figure 1, in a soil. The aggradation is entirely dependent on the weathering rates of the primary minerals in the granitoid cube. Plagioclase feldspar yields calcium and sodium ions into solution, quartz yields silicic acid, annite yields potassium ions, tremolite yields calcium and sodium ions and silicic acid, and potassium feldspar yields potassium ions and silicic acid.