Tuesday, March 19, 2013

Behavioral Economics and Beer... highly correlated

Short:
I plot the frequency of Wikipedia searches for “Behavioral Economics” and “Beer”. Who knew the correlation would be 0.7!

Data reference:
Data on any Wikipedia search (back to 2007) are available at http://glimmer.rstudio.com/pssguy/wikiSearchRates/. The website lets you download daily hit counts as a CSV file, which is what I've done here.

# Behavioral Economics and Beer:

# Author: Mark T Patterson Date: March 18, 2013

# Clear workspace:
rm(list = ls())

# libraries:
library(lubridate)
library(ggplot2)

# data:
# Read the CSV by its full path rather than changing the working directory
ts = read.csv("C:/Users/Mark/Desktop/Blog/Data/BehavEconBeer.csv", header = TRUE)

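# cleaning the dataset (inspect the structure with str(ts)):
ts$date = as.character(ts$date)  # make sure dates are plain strings, in case they were read as a factor
ts$date = mdy(ts$date)           # parse month/day/year strings into dates
## Using date format %m/%d/%Y.
ts = ts[, -1]                    # drop the first column (row names)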

Note: the mdy function is from the lubridate package, which handles date and time data cleanly. I've dropped the first column of data, which just holds row names inherited from Excel.
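To make the cleaning step concrete, here is a minimal sketch on a toy data frame. The values and the row-name column `X` are made up for illustration, and base R's `as.Date` with an explicit format stands in for `mdy` so that the snippet runs without lubridate installed.

```r
# Toy stand-in for the downloaded CSV; the values and the column name "X"
# are hypothetical, only date/count/name mirror the columns used in the plot.
toy <- data.frame(
  X = 1:4,
  date = c("3/17/2013", "3/18/2013", "3/17/2013", "3/18/2013"),
  count = c(120, 340, 9000, 8700),
  name = c("Behavioral Economics", "Behavioral Economics", "Beer", "Beer"),
  stringsAsFactors = TRUE  # mimic how older versions of read.csv load strings
)

# Factor -> character -> Date; the explicit format plays the role of mdy()
toy$date <- as.Date(as.character(toy$date), format = "%m/%d/%Y")

# Drop the first column (the row names)
toy <- toy[, -1]

str(toy)
```

After this, `date` is a proper Date column, so ggplot2 can place it on a continuous time axis.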

p = ggplot(ts, aes(x = date, y = count)) + geom_line(aes(color = factor(name)), 
    size = 2)
p

[Figure: line plot of daily search counts over time, one colored line per search term]
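The script above draws the two series but does not show how a correlation between them could be computed. One possible way, sketched here on made-up numbers, is to split the long-format data by term, line the two series up by date, and call `cor`. The counts and the term labels in `name` are assumptions; only the column names `date`, `count` and `name` come from the plotting code.

```r
# Hypothetical long-format data with the same columns as the plotted data frame
toy <- data.frame(
  date = rep(as.Date("2013-03-11") + 0:6, times = 2),
  count = c(310, 330, 325, 300, 260, 180, 200,           # made-up counts, first term
            9100, 9300, 9250, 9000, 8800, 8100, 8300),   # made-up counts, second term
  name = rep(c("Behavioral Economics", "Beer"), each = 7)
)

# One data frame per term, keeping only date and count
be   <- toy[toy$name == "Behavioral Economics", c("date", "count")]
beer <- toy[toy$name == "Beer", c("date", "count")]

# Align the two series by date: one row per day, one count column per term
wide <- merge(be, beer, by = "date", suffixes = c(".be", ".beer"))

# Pearson correlation between the two daily series
r <- cor(wide$count.be, wide$count.beer)
r
```

Merging on `date` rather than relying on row order means that a day missing from one series drops out instead of silently misaligning the rest.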

It turns out the weekly pattern visible in the plot isn't at all unique: many variables follow (predictable) patterns of variation through the week. This doesn't necessarily mean, though, that the correlation between beer and behavioral economics is entirely spurious!
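One way to probe how much of a correlation comes from a shared weekly cycle is to subtract each series' day-of-week mean and correlate what remains. The sketch below uses simulated data (the weekly shape, levels and noise are all assumptions), so it illustrates the check rather than reproducing the result from the Wikipedia data.

```r
set.seed(1)  # simulated data; fixed seed so the run is repeatable

dates <- as.Date("2013-01-07") + 0:55   # eight weeks of daily dates
wday  <- as.POSIXlt(dates)$wday         # 0 = Sunday, ..., 6 = Saturday

# A shared weekly shape (made up), indexed Sunday through Saturday
shape <- c(10, 50, 60, 55, 50, 30, 0)[wday + 1]

# Two series that share the weekly shape but have independent noise
a <- 300  + shape      + rnorm(length(dates), sd = 5)
b <- 9000 + 20 * shape + rnorm(length(dates), sd = 100)

# Correlation of the raw series, driven here by the common weekly cycle
r_raw <- cor(a, b)

# Remove each series' day-of-week mean, then correlate the remainders
a_adj <- a - ave(a, wday)
b_adj <- b - ave(b, wday)
r_adj <- cor(a_adj, b_adj)

c(raw = r_raw, adjusted = r_adj)
```

In this simulation the two series are unrelated apart from the weekly cycle, so the adjusted correlation should fall well below the raw one. If a sizeable correlation survived the same adjustment in real data, that would suggest the link is more than a day-of-week artifact.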
