Category Archives: Reddit

Reddit data — When is it really too soon to retail Christmas?

About this time every year, people begin to complain about retail stores having Christmas themed displays and merchandise out. Well, speaking objectively, I think it is totally fair game for retail stores to shift to Christmas-mode, once people begin to think and talk about Christmas. Can Reddit post topics act as a proxy to determine when people begin to talk about Christmas? In each of the following plots, the black open circles represent a single day’s value, and the red line is a 7-point moving average designed to eliminate the weekly periodicity of Reddit posting.

posts with 'christmas' in titleWell it looks like the beginning of an increase in Christmas related posts occurs in the middle of October, with a substantial increase at the very end of November (just after Thanksgiving). Let’s dig a little deeper though. In the plot below, I’ve taken the same data and plotted them on a logged y-axis to highlight the variability.

log plot of xmas posts

From the above plot, it seems that the steady increase begins as early as the middle of September! Is a steady increase really enough to conclude that the conversation has begun though? Well I decided to take a look at the variation in the data to try and answer that.

xmas_nmlzdIn the above plot, the data has been normalized per day to a percent of total posts that have Christmas in the title. On December 25th, 16% of posts to Reddit included the word Christmas in the title (over 16,000 posts)! Now, I took the period from April 1 to Aug 15, and determined the mean and standard deviation. The horizontal black line represents the mean for this period, and the gray box is 2 standard deviation from the mean.

Taking 2 complete standard deviations from the mean to be a good indicator of significant change, the conversation about Christmas breaches this threshold right in the middle of September.

Now, you are probably thinking “Well these posts in September and October are probably just all people complaining about the early retailing of Christmas!” Well yes, you may be right. But hey, even if that’s the case, the retail companies are succeeding at getting you to talk about Christmas which means it is worth their time to put up the merchandise early, since you then buy more!

 

Just for fun, here are the conversation plots for some other holidays!

all_holiday

Reddit data — Graduate School talk

This is the first post in a series I’ll be doing about posting on Reddit for 2013. The posts in this series search through every single post made to Reddit in 2013 — that’s over 50 GB worth of data, and over 39,000,000 posts!

For this post, I examined every post made to any subreddit for any word that related to graduate school (including law and medical school) for each day of 2013, in either the ‘title’ or the ‘self-text’. The key used for positive matches is at the end of this post.

grad_talk

posts made to Reddit with words relating to graduate school in 2013. 1 data point for each day, red line is 7 point moving-average.

Maybe not as telling as I had expected, there’s a ton of variance day to day and week to week, but the most obvious observation would be the spike in graduate school related comments in the month of April, following a consistent increase in posts in March. I would suggest this is probably due to the fact that this is the time of the year when a lot of acceptance decisions come out.

Normalizing the data against total posts for the day is not any more telling, the profile is stretched a bit in the y-direction. The 7 point moving-average is an attempt to remove the weekly periodicity of Reddit posting.

The key used was [‘grad school’, ‘graduate school’, “master’s”, ‘masters’, ‘ phd ‘, ‘ gre ‘, ‘letter of recommendation’, ‘letters of recommendation’, ‘doctorate’, ‘law school’, ‘med school’, ‘medical school’, ‘transcript’, ‘undergraduate gpa’, ‘undergrad gpa’]. There are of course more keywords that could have been used, but many have multiple implications, and this list was used as an attempt to minimize false positives.