Category Archives: Python

Reddit data — When is it really too soon to retail Christmas?

About this time every year, people begin to complain about retail stores having Christmas-themed displays and merchandise out. Speaking objectively, I think it is totally fair game for retail stores to shift to Christmas mode once people begin to think and talk about Christmas. Can Reddit post topics act as a proxy to determine when people begin to talk about Christmas? In each of the following plots, the black open circles represent a single day’s value, and the red line is a 7-point moving average designed to smooth out the weekly periodicity of Reddit posting.
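For anyone curious how a 7-point moving average flattens a weekly cycle, here is a minimal sketch. It assumes a centered window and uses made-up daily counts; it is not the code behind the plots, and the function name `moving_average` is just for illustration.

```python
def moving_average(values, window=7):
    """Centered moving average.

    Positions where the full window does not fit are left as None.
    Assumption: a centered window; a trailing window would also work.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = [None] * len(values)
    for i in range(half, len(values) - half):
        chunk = values[i - half:i + half + 1]
        smoothed[i] = sum(chunk) / window
    return smoothed


# Made-up daily post counts with a weekly rhythm (quiet weekends), two weeks long.
daily_counts = [100, 110, 120, 115, 105, 60, 50] * 2
smoothed = moving_average(daily_counts)
```

Because every 7-day window covers exactly one of each weekday, the weekly rhythm cancels out and only the longer-term trend is left.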

posts with 'christmas' in title

Well, it looks like the beginning of an increase in Christmas-related posts occurs in the middle of October, with a substantial increase at the very end of November (just after Thanksgiving). Let’s dig a little deeper though. In the plot below, I’ve taken the same data and plotted them on a logged y-axis to highlight the variability.

log plot of xmas posts

From the above plot, it seems that the steady increase begins as early as the middle of September! Is a steady increase really enough to conclude that the conversation has begun, though? I decided to take a look at the variation in the data to try to answer that.

xmas_nmlzd

In the above plot, the data have been normalized per day to the percentage of all posts that have Christmas in the title. On December 25th, 16% of posts to Reddit included the word Christmas in the title (over 16,000 posts)! I then took the period from April 1 to Aug 15 and determined its mean and standard deviation. The horizontal black line represents the mean for this period, and the gray box spans 2 standard deviations from the mean.

Taking 2 complete standard deviations from the mean to be a good indicator of significant change, the conversation about Christmas breaches this threshold right in the middle of September.
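The threshold test can be sketched roughly as follows. This is an illustration under assumptions, not the original analysis code: the counts are made up, the helper names are hypothetical, the sample standard deviation is used (the post does not say which flavor was used), and the breach is checked on single-day values rather than on the smoothed line.

```python
from statistics import mean, stdev


def percent_of_posts(matching, total):
    """Per-day percentage of all posts that matched the keyword."""
    return [100.0 * m / t if t else 0.0 for m, t in zip(matching, total)]


def first_breach(series, baseline_start, baseline_end, n_sigma=2.0):
    """Return (index, threshold) for the first value after the baseline
    period that exceeds mean + n_sigma * standard deviation of the baseline.

    The index is None if the threshold is never exceeded.
    """
    baseline = series[baseline_start:baseline_end]
    threshold = mean(baseline) + n_sigma * stdev(baseline)
    for i in range(baseline_end, len(series)):
        if series[i] > threshold:
            return i, threshold
    return None, threshold


# Made-up data: six quiet baseline days, then a gradual rise.
matching = [10, 12, 8, 11, 9, 10, 12, 20, 35]
total = [10000] * len(matching)
percent = percent_of_posts(matching, total)
breach_index, threshold = first_breach(percent, 0, 6)
```

With real data, the baseline slice would cover April 1 to Aug 15 and the returned index would map back to a calendar date.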

Now, you are probably thinking, “Well, these posts in September and October are probably just all people complaining about the early retailing of Christmas!” Well yes, you may be right. But hey, even if that’s the case, the retail companies are succeeding at getting you to talk about Christmas, which means it is worth their time to put up the merchandise early, since you then buy more!

Just for fun, here are the conversation plots for some other holidays!

all_holiday

Reddit data — Graduate School talk

This is the first post in a series I’ll be doing about posting on Reddit for 2013. The posts in this series search through every single post made to Reddit in 2013 — that’s over 50 GB worth of data, and over 39,000,000 posts!

For this post, I examined every post made to any subreddit for any word that related to graduate school (including law and medical school) for each day of 2013, in either the ‘title’ or the ‘self-text’. The key used for positive matches is at the end of this post.

grad_talk

Posts made to Reddit with words relating to graduate school in 2013. One data point for each day; the red line is a 7-point moving average.

Maybe not as telling as I had expected. There’s a ton of variance day to day and week to week, but the most obvious feature is the spike in graduate school related posts in April, following a consistent increase in March. I would suggest this is probably because this is the time of year when a lot of acceptance decisions come out.

Normalizing the data against total posts for the day is not any more telling; the profile is just stretched a bit in the y-direction. The 7-point moving average is an attempt to remove the weekly periodicity of Reddit posting.

The key used was [‘grad school’, ‘graduate school’, “master’s”, ‘masters’, ‘ phd ‘, ‘ gre ‘, ‘letter of recommendation’, ‘letters of recommendation’, ‘doctorate’, ‘law school’, ‘med school’, ‘medical school’, ‘transcript’, ‘undergraduate gpa’, ‘undergrad gpa’]. There are of course more keywords that could have been used, but many have multiple implications, and this list was used as an attempt to minimize false positives.
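A minimal sketch of how this kind of keyword matching and daily counting could look is below. Several things here are assumptions for illustration: the post layout (dicts with `created_utc`, `title`, and `selftext` fields), the helper names, lowercasing before matching, and padding the text with spaces so that the space-wrapped keys like ' phd ' can still match at the very start or end of a post.

```python
from collections import Counter
from datetime import datetime, timezone

KEYWORDS = ['grad school', 'graduate school', "master's", 'masters', ' phd ',
            ' gre ', 'letter of recommendation', 'letters of recommendation',
            'doctorate', 'law school', 'med school', 'medical school',
            'transcript', 'undergraduate gpa', 'undergrad gpa']


def mentions_grad_school(title, selftext=''):
    """True if any keyword appears in the title or self-text (case-insensitive)."""
    # Padding with spaces is an assumption, see the note above.
    text = f" {title} {selftext} ".lower()
    return any(keyword in text for keyword in KEYWORDS)


def daily_match_counts(posts):
    """Count matching posts per UTC calendar day."""
    counts = Counter()
    for post in posts:
        if mentions_grad_school(post['title'], post.get('selftext', '')):
            day = datetime.fromtimestamp(post['created_utc'], tz=timezone.utc).date()
            counts[day] += 1
    return counts


# Made-up posts in a hypothetical layout.
posts = [
    {'created_utc': 1364774400, 'title': 'Got into law school!', 'selftext': ''},
    {'created_utc': 1364778000, 'title': 'Need advice',
     'selftext': 'Who should write my letter of recommendation?'},
    {'created_utc': 1364860800, 'title': 'My degree arrived', 'selftext': ''},
]
counts = daily_match_counts(posts)
```

The spaces around ' gre ' are what keep a word like "degree" from counting as a match, which is the false-positive problem the key is trying to avoid.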

Weathering of a theoretical cube

Another old assignment here, this one to model the erosion of a theoretical 1 cm³ granitoid cube of a given initial composition and neatly present the results. I did the modelling in Python, as one of my earliest Python projects. The plots were made manually in LibreOffice Calc. Below are the plots. For more information, including assumptions and citations, see the short report here. The code used for the model can be found here.

Time-dependent log-log plot of the dissolution of minerals in a theoretical 1 cm³ granitoid cube in a soil at pH 5. Dissolution rates taken from Bandstra et al. (2007) and Palandri and Kharaka (2004). Initial weathering rates of all minerals at the start of the experiment (quartz, plagioclase feldspar, annite, tremolite, and potassium feldspar) are faster than all subsequent weathering rates due to the continually decreasing surface area of the mineral through time. Aggradation of kaolinite is dependent on the dissolution rates of plagioclase feldspar, potassium feldspar and annite. Iron oxide aggradation is dependent on annite weathering rate. Minerals reach the “zero value point” at 0.00001 moles of mineral. These times are roughly 50 years for annite, 100 years for tremolite, 300 years for plagioclase feldspar, 1,500 years for potassium feldspar, and 30,000 years for quartz.
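To illustrate why a shrinking surface area slows weathering through time, here is a rough sketch of a single mineral dissolving. It is not the model from the report. It assumes the mineral behaves like one shrinking cube, that moles are lost at a fixed rate per unit surface area, and it uses explicit Euler time steps. The rate constant, molar volume, and starting amount are made-up illustrative values, not the published rates; only the 0.00001 mole cutoff comes from the caption above.

```python
SECONDS_PER_YEAR = 3.15576e7


def dissolve(n0, rate, molar_volume, dt, zero_point=1e-5, max_steps=1_000_000):
    """Explicit Euler integration of dn/dt = -rate * A(n).

    n0            initial moles of mineral
    rate          mol per m^2 per s (illustrative value, an assumption)
    molar_volume  m^3 per mol (illustrative value, an assumption)
    dt            time step in seconds
    A(n) is the surface area of a cube of volume n * molar_volume.
    Returns a list of (time_in_seconds, moles) pairs.
    """
    n, t = n0, 0.0
    history = [(t, n)]
    for _ in range(max_steps):
        if n <= zero_point:
            break
        area = 6.0 * (n * molar_volume) ** (2.0 / 3.0)
        n = max(n - rate * area * dt, 0.0)
        t += dt
        history.append((t, n))
    return history


# Made-up inputs: 0.01 mol of a mineral, stepped one year at a time.
history = dissolve(n0=0.01, rate=1e-9, molar_volume=2.3e-5, dt=SECONDS_PER_YEAR)
years_to_zero_point = history[-1][0] / SECONDS_PER_YEAR
yearly_losses = [a[1] - b[1] for a, b in zip(history, history[1:])]
```

In this sketch each yearly loss is smaller than the one before it, because the cube has less surface left to attack, which is the same behavior the caption describes.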

figure_2_in_report

The aggradation of solutes from the same granitoid cube described in Figure 1, in a soil. The aggradation is entirely dependent on the weathering rates of the primary minerals in the granitoid cube. Plagioclase feldspar yields calcium and sodium ions into solution, quartz yields silicic acid, annite yields potassium ions, tremolite yields calcium and sodium ions and silicic acid, and potassium feldspar yields potassium ions and silicic acid.