1. Talk on Ranking NFL Teams

    I gave a talk last month at the New York Open Statistical Programming Meetup on ranking systems, specifically applied to the NFL. You can find slides, code, and an IPython notebook which contains most of the information. I encourage you to look at the slides, which I spent a lot of time on.  They contain two embedded interactive visualizations.  I did get my Super Bowl prediction wrong, though.

    There were about 200 attendees, but unfortunately there is no video of the talk. Thanks to everyone who came; it was incredibly fun for me.

    The talk was mostly a review/comparison of different methods:

    • Pythagorean wins
    • Eigenvector methods
    • the Bradley-Terry-Luce model
    • optimal rankings
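    As a rough illustration of two of the methods above, here is a minimal Python sketch. It is not the code from the talk. The Pythagorean exponent of 2.37 is an assumption (a value often quoted for the NFL). The Bradley-Terry-Luce fit uses Hunter's MM (minorization-maximization) iteration, one standard way to compute maximum-likelihood strengths. `games` is assumed to be a list of `(winner, loser)` pairs, and the function names are hypothetical.

```python
from collections import defaultdict


def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning fraction from points scored and allowed.

    The default exponent is an assumed NFL value, not a fitted one."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def bradley_terry(games, iterations=200):
    """Fit Bradley-Terry-Luce strengths with Hunter's MM algorithm.

    games: list of (winner, loser) pairs.
    Model: P(i beats j) = p[i] / (p[i] + p[j]).
    Assumes the win graph is strongly connected, which guarantees
    that the maximum-likelihood estimate exists."""
    teams = sorted({t for g in games for t in g})
    wins = defaultdict(int)
    pair_counts = defaultdict(int)  # games played between each unordered pair
    for winner, loser in games:
        wins[winner] += 1
        pair_counts[frozenset((winner, loser))] += 1

    p = {t: 1.0 for t in teams}
    for _ in range(iterations):
        new_p = {}
        for i in teams:
            # MM update: p_i <- W_i / sum_j n_ij / (p_i + p_j)
            denom = 0.0
            for pair, n in pair_counts.items():
                if i in pair:
                    (j,) = pair - {i}
                    denom += n / (p[i] + p[j])
            new_p[i] = wins[i] / denom
        # Strengths are only identified up to scale; normalize to mean 1.
        total = sum(new_p.values())
        p = {t: v * len(teams) / total for t, v in new_p.items()}
    return p
```

    Sorting teams by the returned strengths gives a ranking. A pure three-team cycle produces equal strengths, which is the model's way of saying the data cannot separate those teams.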

    The last one warrants more explanation.  I had previously reviewed the optimal descriptive ranking problem and my solution.  It’s a fascinating application of graph theory to a problem that most people wouldn’t consider graph-theoretic.  Once the ranking problem is posed as a topological sort of a graph that contains cycles, it’s easy to describe an exact (if non-unique) solution and to find an algorithm that approximates it.  The results are quite stunning: a 10+% increase in the number of correctly described games compared with the other models.
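    A ranking describes a game correctly when the winner is ranked above the loser. Maximizing that count is the same as finding a minimum feedback arc set of the win graph (edges point from winner to loser). The sketch below is not the algorithm from the earlier post. It substitutes the well-known greedy heuristic of Eades, Lin, and Smyth: repeatedly send teams with no remaining wins to the bottom and teams with no remaining losses to the top, and otherwise place the team with the largest wins-minus-losses margin next. Function names are hypothetical.

```python
def count_described(ranking, games):
    """Number of (winner, loser) games whose winner is ranked above the loser."""
    pos = {team: i for i, team in enumerate(ranking)}
    return sum(1 for winner, loser in games if pos[winner] < pos[loser])


def greedy_fas_ranking(games):
    """Eades-Lin-Smyth greedy heuristic for a minimum feedback arc set.

    Returns a ranking (best first) that tries to minimize upsets,
    i.e. games whose loser is ranked above the winner."""
    remaining = {team for game in games for team in game}
    head, tail = [], []

    def degrees(v):
        # Wins and losses of v, counted only against teams not yet placed.
        wins = sum(1 for w, l in games if w == v and l in remaining)
        losses = sum(1 for w, l in games if l == v and w in remaining)
        return wins, losses

    while remaining:
        changed = True
        while changed:
            changed = False
            for v in sorted(remaining):
                wins, losses = degrees(v)
                if wins == 0:  # sink: goes just above previously placed sinks
                    tail.insert(0, v)
                    remaining.remove(v)
                    changed = True
                elif losses == 0:  # source: goes just below previously placed sources
                    head.append(v)
                    remaining.remove(v)
                    changed = True
        if remaining:
            # Only cycles remain: place the team with the best margin next.
            v = max(sorted(remaining), key=lambda t: degrees(t)[0] - degrees(t)[1])
            head.append(v)
            remaining.remove(v)
    return head + tail
```

    For a rock-paper-scissors cycle, any ranking describes two of the three games, and the heuristic finds one of those rankings. For an acyclic set of results, it recovers the topological order.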

     
  2. I made a promise to myself when I interned at Facebook that I would write at least one blog post on the NFL.  Today I got to publish it!  Technically this was all quite trivial, mostly just aggregating users by teams and geography, then producing the maps using D3 (I had to rasterize them to post them on Facebook).
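    The aggregation step can be sketched in a few lines. This is not the code used for the post. It assumes a hypothetical list of `(region, team)` pairs, one per fan, and returns the most-Liked team in each region along with that team's share of the region's fans, which is the kind of summary a map can be colored by.

```python
from collections import Counter, defaultdict


def top_team_by_region(likes):
    """likes: iterable of (region, team) pairs, one entry per fan.

    Returns {region: (most_liked_team, share_of_region_fans)}."""
    counts = defaultdict(Counter)
    for region, team in likes:
        counts[region][team] += 1
    result = {}
    for region, team_counts in counts.items():
        team, n = team_counts.most_common(1)[0]
        result[region] = (team, n / sum(team_counts.values()))
    return result
```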

    The NFL friendships thing involved a join of the social graph with the Like graph.  If I had had more time, I would have looked for rivalries.  My plan was to look for friendships that were unlikely when holding relative geography fixed but varying the two teams.  The idea was to find pairs of teams whose fans rarely become friends, even when they live close to each other.
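    The planned rivalry analysis could be sketched as an observed-versus-expected comparison. Pool all fan pairs to get a baseline friendship rate for each distance bucket. Then, for each pair of teams, divide the observed number of friendships by the number those baseline rates predict. Ratios well below 1 would flag candidate rivalries. The input format `(team_a, team_b, distance_bucket, are_friends)` and the function name are hypothetical, since this analysis was never run.

```python
from collections import defaultdict


def rivalry_scores(fan_pairs):
    """fan_pairs: iterable of (team_a, team_b, distance_bucket, are_friends).

    Returns {(team, team): observed / expected friendships}, where the
    expectation holds geography fixed via per-bucket baseline rates."""
    pairs = list(fan_pairs)

    # Baseline friendship rate per distance bucket, pooled over all team pairs.
    bucket_total = defaultdict(int)
    bucket_friends = defaultdict(int)
    for _, _, bucket, friends in pairs:
        bucket_total[bucket] += 1
        bucket_friends[bucket] += int(friends)
    base_rate = {b: bucket_friends[b] / bucket_total[b] for b in bucket_total}

    # Compare each team pair against what its geography alone predicts.
    observed = defaultdict(float)
    expected = defaultdict(float)
    for a, b, bucket, friends in pairs:
        key = tuple(sorted((a, b)))
        observed[key] += int(friends)
        expected[key] += base_rate[bucket]
    return {k: observed[k] / expected[k] for k in observed if expected[k] > 0}
```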

    The finding that winning is correlated with Likes is also interesting.  To an economist, this would seem to be an excellent instrumental variable.  (This is not a new idea.)  Conditional on the point spread of the game, wins and losses are basically coin flips, and yet they seem to be correlated with big increases in team fan bases. This strategy could potentially be used to identify peer effects if one were willing to make a bundle of assumptions about dynamics.
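    The coin-flip argument can be made concrete with a small sketch. Convert the expected margin implied by the spread into a win probability, take the "surprise" (actual result minus that probability), and regress the change in Likes on it. This is only the reduced-form step, not a full instrumental-variables or peer-effects estimate. The normal model for the final margin and its 13.5-point standard deviation are assumptions. `expected_margin` is positive when the team is favored, and the function names are hypothetical.

```python
import math


def win_probability(expected_margin, sd=13.5):
    """P(team wins) if the final margin is Normal(expected_margin, sd).

    The normal model and the default sd are assumptions."""
    return 0.5 * (1.0 + math.erf(expected_margin / (sd * math.sqrt(2.0))))


def surprise_effect(games, sd=13.5):
    """games: list of (expected_margin, won, like_change), with won in {0, 1}.

    Returns the OLS slope of like_change on the win surprise,
    where surprise = won - P(win | spread)."""
    surprise = [won - win_probability(m, sd) for m, won, _ in games]
    changes = [c for _, _, c in games]
    mean_s = sum(surprise) / len(surprise)
    mean_c = sum(changes) / len(changes)
    cov = sum((s - mean_s) * (c - mean_c) for s, c in zip(surprise, changes))
    var = sum((s - mean_s) ** 2 for s in surprise)
    return cov / var
```

    Because the spread already prices in team quality, the surprise term should be close to random, which is what makes it attractive as an instrument.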

     
  3. 16:13 25th Jan 2013

    Real scientists make their own data

    Around budding social- and data scientists, a question you often hear is “where can I get data?”  It happens so often that people like Hilary Mason, who I’m sure gets this question all the time, have posted pages with resources. Getting new data can be just what you need to practice a technique you are learning or complete a project that you can publish or add to your portfolio.

    Here I argue that if you want to make a bigger impact as a scientist, you should make your own data instead of downloading it. Here are my points: