Movie Time

I was sorting my Netflix queue today, musing about the next DVD we'd receive and lamenting that we have already watched all of Star Trek DS9 since it was released on "Watch Instantly" in October, when I noticed something wonderful. Nay, something blog-worthy:

Netflix saves a record of your entire rental history!

Three years of data, just begging to be looked at... This is right up my alley. I got busy scraping it off the website and putting it into some code. I immediately wondered: what can these data tell me about my Netflix habits? Am I getting my money's worth?

The answer, I realized, is not at all straightforward: how do you define getting your "money's worth" anyhow? I've really enjoyed the Netflix service, and the monthly fee is so insignificant that I forget about it entirely. By that metric it has been a very good value. But I wanted to make some nice-looking figures, which I'm finding is a total addiction of mine... though it is also excellent practice.

I suspect most people would agree with this statement:
"Netflix would be such a great deal if I would just return those damned DVDs sooner!" 
With postage constantly on the rise, Netflix no doubt needs you to hold on to those DVDs in order to break even (exactly how long that break-even time is has been a subject of speculation since Netflix first began). Based on the first figure, it seems I am doing my part to keep Netflix in the "black"...

Rental length distribution for 3 years of Netflix viewing.


Just 103 disc (DVD or Blu-ray) rentals in total since I became a member in 2009... when you put it that way it sounds quite sad. Moreover, I'm holding these discs for well over 10 days on average (the mean rental length was ~16 days)! And check out those outliers (they aren't errors): that last point is like a 6-sigma outlier! I held on to that damn DVD for 3 months, and worst of all, I don't think I even watched it before I finally sent the bloody thing back.
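For anyone wanting to reproduce numbers like these, here is a minimal sketch. It assumes the scraped history has been turned into a list of (ship date, return date) pairs; that layout and the function names are hypothetical, not anything Netflix provides. The "sigma" of the longest hold is simply how many (population) standard deviations it sits above the mean.

```python
from datetime import date
from statistics import mean, median, pstdev


def rental_lengths(rentals):
    """Days each disc was held.

    Assumes `rentals` is a list of (ship_date, return_date) pairs of
    datetime.date objects (a hypothetical layout for the scraped history).
    """
    return [(returned - shipped).days for shipped, returned in rentals]


def summarize(lengths):
    """Mean, median, longest hold, and how many sigma that hold is above the mean."""
    mu = mean(lengths)
    sigma = pstdev(lengths)  # population standard deviation
    worst = max(lengths)
    z = (worst - mu) / sigma if sigma > 0 else 0.0
    return {"mean": mu, "median": median(lengths), "max": worst, "max_sigma": z}


if __name__ == "__main__":
    # Invented dates for illustration only, not my real rental history.
    history = [
        (date(2010, 1, 4), date(2010, 1, 12)),
        (date(2010, 1, 13), date(2010, 1, 25)),
        (date(2010, 2, 1), date(2010, 5, 1)),
    ]
    lengths = rental_lengths(history)
    print(lengths)  # [8, 12, 89]
    print(summarize(lengths))
```

Note that with only a few dozen points, a single 3-month hold inflates both the mean and the standard deviation, which is why the median is worth printing alongside them.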

I then found that Netflix had my "Watch Instantly" (streaming) history available as well. I have streamed 935 items so far. Here is the comparison of DVD (and Blu-ray) rentals to streaming for the last 3 years. The argument that physical media is "dead/dying" could easily be made using this figure alone.


Also intriguing are the monthly numbers of physical rentals vs. streaming views (totaled over all years). Streaming seems to be roughly bimodal, with broad peaks near winter and summer, likely due to the academic calendar. This is essentially the behavior I expected.

Rentals, however, don't seem to follow this trend very well at all. Granted, these are dominated by small-number statistics (only of order 10 rentals per monthly bin). To give this trend real meaning, we might consider the time of year (I'll revisit this at the end), or perhaps the DVDs themselves. If we're watching a series heavily we may rip through 6-8 DVDs in a short span of time, and with so few DVD rentals in total this can heavily skew the data.
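Here is a sketch of pooling all years into 12 monthly bins, with a rough sqrt(N) counting (Poisson) uncertainty attached to each bin to show why small-number statistics matter: a bin with 9 rentals carries an uncertainty of about 3, roughly a third of the count. The input is assumed to be a plain list of datetime.date objects, and the function names are hypothetical.

```python
from collections import Counter
from datetime import date
from math import sqrt

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def monthly_totals(event_dates):
    """Count events per calendar month, pooling every year into the same 12 bins."""
    counts = Counter(d.month for d in event_dates)
    return [counts[m] for m in range(1, 13)]  # index 0 = January


def with_poisson_errors(totals):
    """Pair each bin count N with a rough sqrt(N) counting uncertainty."""
    return [(n, sqrt(n)) for n in totals]


if __name__ == "__main__":
    # Invented dates for illustration only.
    rentals = [date(2009, 12, 20), date(2010, 1, 5), date(2011, 1, 9), date(2011, 7, 4)]
    for name, (n, err) in zip(MONTHS, with_poisson_errors(monthly_totals(rentals))):
        print(f"{name}: {n} +/- {err:.1f}")
```

The same two functions work unchanged on the streaming history, so both platforms can be binned identically.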


A neat, but somewhat opaque, diagram is the normalized cumulative number of rentals (or streams) as a function of time. The curve starts at 0 on the y-axis and increases by 1 with each successive rental (or stream) until it reaches the total number. I then divided by the total number of rentals (or streams) so that both curves span from 0 to 1.
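The construction just described fits in a few lines. This sketch assumes the events are a list of datetime.date objects; the function name is hypothetical. The curve is 0 before the first event, and after the k-th event it sits at k/N.

```python
from datetime import date


def normalized_cumulative(event_dates):
    """Return (sorted dates, fractions) for a normalized cumulative curve.

    Each event raises the running count by 1; dividing by the total N means
    the value after the k-th event is k / N, so the last point is exactly 1.
    The curve is implicitly 0 before the first event.
    """
    ordered = sorted(event_dates)
    n = len(ordered)
    fractions = [(i + 1) / n for i in range(n)]
    return ordered, fractions


if __name__ == "__main__":
    # Invented dates for illustration only.
    dates, frac = normalized_cumulative(
        [date(2010, 3, 1), date(2010, 1, 1), date(2010, 2, 1), date(2010, 4, 1)]
    )
    for d, f in zip(dates, frac):
        print(d, f)
```

Plotting the returned pairs as a step function for both rentals and streams puts them on a common 0-to-1 axis, which is what makes their slopes directly comparable.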

The steeper the curve, the more rapid the consumption of media. A few notable things stuck out to me in this diagram:
  1. My Watch Instantly rate is (overall) accelerating. This is also shown coarsely in Figure 2.
  2. My DVD rate is fairly steady over the last year.
  3. When we "binge" on either DVDs or streaming, and that curve turns up sharply for a brief stretch, the other curve tends to flatten out. This is quite noticeable around Jan 2010 (a big DVD binge) and around March 2011 (a steady streaming binge).

Item #3 brings about perhaps the most important point thus far: we evidently only have the attention span for one platform or the other at a time.


Then I went a bit off the deep end, so I'll start with a classic word of caution: correlation does not imply causation. Oftentimes, however, it does indicate that a more important third parameter is at work.

With that in mind, I downloaded the average temperature data for Seattle from weather.com. Once again I was pleased with the result. A nice trend emerged (plotted as a grey line): the warmer it is outside, the fewer DVDs we watch (only physical rentals shown). The outliers (Jan, Sep, Aug) are the winter and summer breaks that Sarah and I have together, so while the overall rental rates vs. month did not track the school-year cycle very well, the outliers certainly do. If we removed these high points (which arguably don't belong to the same parent distribution), an even cleaner trend would be left, and this would seem to bode well for our social lives.
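The post doesn't say how the grey trend line was made, so the following is only a sketch assuming an ordinary least-squares line plus a Pearson correlation coefficient, refit after dropping the break months. The function names and month labels are hypothetical, and no real temperatures or rental counts are included.

```python
from math import sqrt


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus Pearson r."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


def fit_without(months, temps, rentals, excluded):
    """Refit after dropping the months in `excluded` (e.g. {"Jan", "Aug", "Sep"})."""
    kept = [(t, r) for m, t, r in zip(months, temps, rentals) if m not in excluded]
    xs, ys = zip(*kept)
    return linear_fit(list(xs), list(ys))
```

Comparing `linear_fit(temps, rentals)` with `fit_without(months, temps, rentals, {"Jan", "Aug", "Sep"})` shows how much the break months pull on the slope and on r. With only 12 monthly points, though, dropping 3 of them is a judgment call, not a statistical test.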

I neglected to show the streaming data vs. temperature, but it was less pretty anyhow. The shape is roughly parabolic, with a minimum around 55 °F and maxima at the hot and cold extremes. Again, this is essentially tracking the school calendar.
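A "minimum around 55 °F" can be made concrete by fitting y = a*u^2 + b*u + c and locating the turning point at u = -b / (2a). Below is a sketch under that assumption, using the 3x3 least-squares normal equations solved by Gaussian elimination. Temperatures are centered on their mean first to keep the system well conditioned. Function names are hypothetical and no real data is included.

```python
def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*u**2 + b*u + c with u = x - mean(x).

    Returns (a, b, c, x0), where x0 is the centering offset.
    """
    x0 = sum(xs) / len(xs)
    us = [x - x0 for x in xs]
    s = [sum(u ** k for u in us) for k in range(5)]                 # s[k] = sum u^k
    t = [sum(u ** k * y for u, y in zip(us, ys)) for k in range(3)]  # t[k] = sum u^k * y
    # Augmented normal equations for the unknowns (c, b, a).
    m = [
        [s[0], s[1], s[2], t[0]],
        [s[1], s[2], s[3], t[1]],
        [s[2], s[3], s[4], t[2]],
    ]
    # Forward elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            f = m[row][col] / m[col][col]
            for k in range(col, 4):
                m[row][k] -= f * m[col][k]
    # Back substitution.
    sol = [0.0, 0.0, 0.0]
    for row in range(2, -1, -1):
        acc = m[row][3] - sum(m[row][k] * sol[k] for k in range(row + 1, 3))
        sol[row] = acc / m[row][row]
    c, b, a = sol
    return a, b, c, x0


def turning_point(xs, ys):
    """x of the parabola's vertex and the curvature a (a > 0 means a minimum)."""
    a, b, _, x0 = quadratic_fit(xs, ys)
    return x0 - b / (2 * a), a
```

Feeding in monthly mean temperatures and stream counts would give a vertex position and a positive `a` if the curve really is U-shaped. With 12 points the vertex location is rough, so it is best read as "somewhere in the mid-50s", not a precise number.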


Alright, that's about all I can squeeze out of this little data set for now. What a fun exercise; I think we all really learned something today. I learned that my addiction to making pretty plots to satisfy my idiotic curiosity is as strong as ever. What did you learn?