Today I'm going to discuss my thoughts on a silly topic that has gone by many names over the years. These days I see it dubbed the "Reddit Effect", usually meaning the impact on your website from being posted on Reddit. This has also been called the "digg effect", the "twitter effect", the "stumbleupon effect", the "slashdot effect"... Basically, it is an explosive increase in web traffic to your blog/site that can cripple servers, and has the same (though accidental) result as a DDoS attack.
Of course, bloggers/writers crave this kind of attention. Your home-grown webpage can suddenly get exposed to 100K viewers overnight, if you can navigate the social media/news network waters. Nobody can sign up for your email list, however, if your discount webserver gets fried under peak traffic load. Thus, it's a sensible and practical phenomenon to study.
Some Background
My thinking about this started when I decided to submit my blog posts to r/dataisbeautiful (a wonderful subreddit I'd strongly recommend!). Whenever I'd submit a link that fared well, I saw a characteristic "spike" in my blog traffic - as you'd expect!
In my astronomy research, I've been interested in stellar flares for many years. Particularly neat are flares from M dwarf stars: faint stars with about 1/2 the Sun's mass that make up ~70% of the stars in our Galaxy by number. These common little stars are also notoriously "magnetically active", and have been known to produce flares with 100's or 1000's of times more energy than those on our own Sun, and with many times greater frequency!
Neat, but what's the point here?
When you stare at an M dwarf star, waiting for it to flare (say, with the amazing Kepler space telescope!), the resulting "light curve" (flux versus time) that you measure looks an awful lot like web traffic. I'm going to use this as a cute analogy, so please don't read too much into it.
|  | 
| Kepler "light curve" for the M dwarf GJ 1243, showing low-amplitude spot modulation, and flares. (Figure generated by MAST @ STScI, using public data here) | 
The Effect
There was a cool discussion about the "reddit effect" recently after UReddit was featured on the front page (see discussion here for more details). The impact was not only a painful load on their servers, but also a huge increase in "followers" or subscribers. This really is the dream, right?! Your website generates content that is so exciting people stampede to see it, and then you handle it with grace!
People like to talk about this experience, especially when it's the first time they're getting serious attention! Here are some other good examples of the "Effect" that I found while reading various sites. First, an example from Wikipedia, considering the server side of a website that worries about how much data is being uploaded to the internet per second:
| The "slashdot effect" | 
A great "flare-like" example of the digg effect in action. I think the shoulder during the explosive rise is very interesting. The detailed structure invites us to consider 1) what kind of conversation or sharing is going on within digg and the entire online community about the article? and 2) to what extent is the decay of the web traffic due to the digg algorithm for pushing new content to the top? In other words, has the article "saturated" its market of interested viewers on its own, or has it been automatically pushed down by the ranking algorithm? (Interesting discussion of the reddit ranking algorithm here)
| The "Digg effect" | 
I personally love this example, posted by the reddit team themselves: the impact on reddit of the AMA thread posted by President Obama. Naturally this crashed reddit, and I think it's just awesome and a bit ironic: most people would be happy to simply hit the front page (myself included). The POTUS spent ~30min on reddit and killed their servers (and they even had warning!)
|  | 
| The POTUS effect on reddit | 
Some more analysis of the digg effect, and some of the Twitter effect, if you're interested.
My Experience
I've kept a close eye on many metrics for my blog, trying to track views/tweets/shares/upvotes, to better understand what makes good content. Here is a simple chart of mentions (tweets) for the past couple months using Topsy:
Most of this action is due to my Starbucks post, which to date has received 21,557 views on my blog, and 297 tweets. These numbers are enough to blow my mind, but what makes it truly exciting (but infinitely more difficult to keep track of) is the secondary traffic: re-posts, pick-ups, other blogs/sites discussing, etc. I posted the main figure on Visually, for example, which in turn received 3200 views and 143 tweets. The post also had at least 30 direct pickups, including perhaps most notably Huffington Post, which then generated another 439 tweets, 1468 likes... and so on and so forth. I have no idea how much traffic the post actually generated in total, but I do know that HuffPo has more traffic than NYT.
But I digress somewhat. The traffic to my blog has always followed these explosive spikes. Individual spikes are usually due to single sources. This example shows traffic to ifweassume.com from a couple months ago, and really calls to mind my analogy of flares.
|  | 
| Data from Google Analytics | 
Here I'm zooming in on two different spikes to show the fine details. 
|  | 
| Data from Google Analytics | 
|  | 
| Data from Google Analytics | 
Lastly, here is the entire pageloads-per-day history for my blog. This definitely looks like a bunch of flares in a light curve...
|  | 
| Data from Statcounter | 
Since the traffic per day spans more than 3 orders of magnitude, it can be hard to see the growth in baseline traffic that is hidden by the "flares". So here is the same history for If We Assume, but with a log scale on the vertical axis. The traffic flares do indeed seem to result in higher baseline traffic over time. Progress!
|  | 
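For anyone who wants to put a number on that baseline rather than eyeball it from a log-scale plot, here is a minimal sketch. It is not how the figure above was made: the function name `log_baseline`, the 7-day window, and the choice of a rolling median are all my own assumptions, and the numbers in the usage lines are made up purely for illustration.

```python
import math
import statistics


def log_baseline(daily_views, window=7):
    """Rolling median of log10(daily views).

    Hypothetical helper: a median over a short window largely ignores
    one-day "flares", so what remains approximates the quiet baseline.
    Days with zero views are clipped to 1 so the logarithm is defined.
    """
    logs = [math.log10(max(v, 1)) for v in daily_views]
    half = window // 2
    baseline = []
    for i in range(len(logs)):
        lo = max(0, i - half)               # window is truncated at the edges
        hi = min(len(logs), i + half + 1)
        baseline.append(statistics.median(logs[lo:hi]))
    return baseline


# Made-up example: a quiet blog with a single one-day spike.
example_views = [10] * 5 + [5000] + [10] * 5
example_baseline = log_baseline(example_views)
```

A wider window would be needed for spikes that last longer than a few days, since the median only rejects outliers that fill less than half of the window.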
Questions
I've been talking about this explosive web traffic first with an analogy to astrophysics, then with some broad examples, and then with some specific examples using my own website. Now let's bring it back to the analogy and ask some fundamental questions of the web traffic "flares".
Note: I don't necessarily have answers to these questions, but I think they could be understood (and probably are well studied by people much smarter than me!)
When we talk about the observed profiles of stellar flares in light curves, we naturally ask important questions like:
- Why are they so explosive?
- How frequently do they occur? Are they periodic or patterned in some way?
- What physics governs each phase of the flare?
- Do all stars have the same kinds of flares that obey general rules?
So too should we ask questions of the spikes in web traffic from social networks and media sites:
- What determines the slope of the initial explosive rise? How about the shape of the decay?
- Is the initial rise related to the decay? Could you predict the decay given the rise?
- Similarly, what determines the total number of views?
- How similar are all the traffic spikes?
- Why do some articles have very high-amplitude peaks but stay popular only for very short timescales, and vice versa?
- How long do the exponential tails extend in time?
and perhaps many others. One could imagine a controlled study of many articles, considering perhaps only 1 source directing traffic to 1 website. Of course the decay/tail of the traffic will naturally include information about the ranking algorithm or news cycle rate for a given traffic source. But I wonder if we could control the experiment enough such that we actually began to measure the interest/interaction people had with an article. This information would be very interesting/useful to determine, for example, the optimal news/headline cycle for a given article, or a topic-dependent ranking algorithm.
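One of these questions, how long the tail lasts, is easy to turn into a number if you assume the tail really is exponential. The sketch below fits a straight line to the logarithm of the views, which is a standard trick rather than anything specific to web traffic. The function name `fit_exponential_tail` is hypothetical, and the data in the usage lines are synthetic, generated from a known decay time just to show the fit recovers it.

```python
import math


def fit_exponential_tail(times, views):
    """Fit views ~ amplitude * exp(-t / tau) by least squares on ln(views).

    Hypothetical helper. Assumes every value in `views` is positive and
    that `times` contains at least two distinct values.
    Returns (amplitude, tau), with tau in the same units as `times`.
    """
    logs = [math.log(v) for v in views]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, logs))
    slope = sxy / sxx                      # equals -1 / tau for a pure decay
    intercept = mean_y - slope * mean_t    # equals ln(amplitude)
    return math.exp(intercept), -1.0 / slope


# Synthetic example: ten days of a tail with a 4-day decay time.
days = list(range(10))
synthetic_views = [500.0 * math.exp(-d / 4.0) for d in days]
amplitude, tau_days = fit_exponential_tail(days, synthetic_views)
```

Comparing the fitted decay time across many articles from a single traffic source would be one way to start on the controlled study described above, though real tails are noisy and the log fit gives extra weight to the low-count days.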
If we assume a traffic spike is defined by the following components:
1) an explosive and nearly linear rise
2) a peak of a given amplitude
3) an initial rapid decay
4) a long, low-amplitude, exponential tail
what does each component tell us about the traffic source? Supplemental sources/sharing? The traffic-receiving website? Tell me what you think in the comments!
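To make the four components concrete, here is a toy model of a single spike. It is a sketch under my own assumptions, not a fit to any of the data shown in this post: the function name `traffic_spike`, every default parameter value, and the choice to represent the rapid decay and the long tail as a sum of two exponentials are all hypothetical.

```python
import math


def traffic_spike(t, t_start=0.0, rise_time=2.0, peak=1000.0,
                  fast_frac=0.8, tau_fast=3.0, tau_slow=48.0, baseline=10.0):
    """Toy page views per hour at time t (hours) for a single spike.

    All parameter values are illustrative assumptions:
      rise_time  duration of the linear rise           (component 1)
      peak       height of the spike above baseline    (component 2)
      tau_fast   e-folding time of the rapid decay     (component 3)
      tau_slow   e-folding time of the long tail       (component 4)
      fast_frac  fraction of the peak that decays quickly
    """
    if t < t_start:
        return baseline                     # quiet traffic before the post
    dt = t - t_start
    if dt < rise_time:
        # 1) nearly linear rise toward 2) the peak
        return baseline + peak * dt / rise_time
    since_peak = dt - rise_time
    # 3) rapid decay plus 4) a slow, low-amplitude exponential tail
    fast = fast_frac * math.exp(-since_peak / tau_fast)
    slow = (1.0 - fast_frac) * math.exp(-since_peak / tau_slow)
    return baseline + peak * (fast + slow)


# Sample the model hourly over four days.
hourly_views = [traffic_spike(float(h)) for h in range(-5, 96)]
```

With a model like this, each question above maps onto a parameter: the rise slope is `peak / rise_time`, and the two decay times separate the initial drop from the long tail.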
Happy Thanksgiving!
 

 

Nice article! This just happened to me... and I'm going to try to answer some of your questions vis-a-vis my own big star. I'll parse the web-logs myself in R to see two effects: 1) How do different networks behave? Big peak then nothing? Or slow-and-long burn? 2) How long does it take to trickle through the internet. Anyway... I'll post the analysis on www.amitkohli.com at the end of the month, look for it!