2014-07-31

Lying with statistics and StackOverflow

A recent article on StackOverflow performance reminded me how easy it is to lie with statistics.

The interesting thing about the article on StackOverflow performance is that it focuses on "560M page views per month on 25 servers". That is actually not that impressive... Given the title, I think most people assume the load is spread across all 25 servers, so 560M/25/30/24/60/60 = 8.6 page views per machine per second. Using Fiddler to hit StackOverflow, you'll see that each page view consists of 3 calls to the StackOverflow servers, with the rest going to various other sites such as CDNs for static content. So each of the 25 machines is getting about 26 requests per second.
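The arithmetic above is easy to check with a few lines of Python. This is only a back-of-the-envelope sketch: the 30-day month and the even spread over all 25 machines are the same simplifying assumptions used in the text, and the variable names are made up for illustration.

```python
# Back-of-the-envelope: page views per month -> requests per second per server.
PAGE_VIEWS_PER_MONTH = 560_000_000
SERVERS = 25                # assumption: load is spread evenly over all 25 machines
DAYS_PER_MONTH = 30         # assumption: a 30-day month
CALLS_PER_PAGE_VIEW = 3     # calls to StackOverflow servers per page view (seen in Fiddler)

seconds_per_month = DAYS_PER_MONTH * 24 * 60 * 60
views_per_server_per_sec = PAGE_VIEWS_PER_MONTH / SERVERS / seconds_per_month
requests_per_server_per_sec = views_per_server_per_sec * CALLS_PER_PAGE_VIEW

print(f"{views_per_server_per_sec:.1f} page views/s per server")   # 8.6
print(f"{requests_per_server_per_sec:.1f} requests/s per server")  # 25.9
```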

The first counterargument was that the average is far from peak load. I beg to differ: since StackOverflow is used all over the world, the peak load is probably never more than twice the average load. Roughly 50 requests per second should not be a problem for a web server. Not even for IIS...

However, reading through all the comments on that article, the most interesting one is from a StackOverflow sysadmin. He clarifies that a lot of the traffic is actually not page views but API calls, averaging 1750 requests per second across all servers (peaking at 3000, so my assumption that peak load is at most twice the average turned out to be spot on). That comment also lets us know that all the requests are handled by nine (9) servers and that the rest of the 25 are databases etc. So at peak each web server is getting about 330 requests per second. Now it starts to sound good to me.
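Plugging the sysadmin's numbers into the same kind of sketch (again assuming an even spread over the nine web servers; the variable names are illustrative):

```python
# Figures quoted from the sysadmin's comment.
AVG_REQUESTS_PER_SEC = 1750
PEAK_REQUESTS_PER_SEC = 3000
WEB_SERVERS = 9             # assumption: requests are spread evenly over these nine

peak_to_average = PEAK_REQUESTS_PER_SEC / AVG_REQUESTS_PER_SEC
avg_per_server = AVG_REQUESTS_PER_SEC / WEB_SERVERS
peak_per_server = PEAK_REQUESTS_PER_SEC / WEB_SERVERS

print(f"peak is {peak_to_average:.2f}x the average")            # 1.71x
print(f"{avg_per_server:.0f} requests/s per server on average")  # 194
print(f"{peak_per_server:.0f} requests/s per server at peak")    # 333
```

The peak-to-average ratio comes out at about 1.7, comfortably under the factor of two guessed earlier.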

Also, these servers were apparently running at 15% of their capacity. For the sake of simplicity, let's assume that figure applies at peak and that throughput scales linearly with utilization; then each server should be able to handle about 2000 requests per second (at 90% utilization), which is definitely impressive for IIS.
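The extrapolation can be sketched as follows. It rests on two assumptions that the source does not confirm: that the 15% utilization figure was measured at peak, and that throughput scales linearly with utilization. Real servers rarely scale that cleanly, so treat the result as a rough upper bound.

```python
PEAK_REQUESTS_PER_SEC = 3000
WEB_SERVERS = 9
UTILIZATION_AT_PEAK = 0.15  # assumption: the 15% figure applies at peak load
TARGET_UTILIZATION = 0.90   # the utilization level used in the text

peak_per_server = PEAK_REQUESTS_PER_SEC / WEB_SERVERS
# Assumption: throughput grows linearly with utilization.
full_capacity_per_server = peak_per_server / UTILIZATION_AT_PEAK
capacity_at_target = full_capacity_per_server * TARGET_UTILIZATION

print(f"{full_capacity_per_server:.0f} requests/s per server at 100%")  # 2222
print(f"{capacity_at_target:.0f} requests/s per server at 90%")         # 2000
```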

I have no idea why the initial post used the 560M page views metric, since it is not impressive while the real data (requests per second) actually is. Probably because 560M sounds like a lot. But then why not go for "6.72B page views per year", because that sounds like a lot more!
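To see how much the choice of unit changes the impression, here is the same traffic expressed per year, per day and per second. It is a rough sketch that assumes 12 months per year and a 30-day month, as in the calculations above.

```python
PAGE_VIEWS_PER_MONTH = 560_000_000
DAYS_PER_MONTH = 30         # assumption: a 30-day month

per_year = PAGE_VIEWS_PER_MONTH * 12
per_day = PAGE_VIEWS_PER_MONTH / DAYS_PER_MONTH
per_second = per_day / (24 * 60 * 60)

print(f"{per_year / 1e9:.2f}B page views per year")  # 6.72B
print(f"{per_day / 1e6:.1f}M page views per day")    # 18.7M
print(f"{per_second:.0f} page views per second")     # 216
```

Same traffic, three very different-sounding numbers.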

I think this is an excellent example of how you can make one statistic support your point by presenting it differently. Next week I'll go over some generic examples to further illustrate this. In the meantime, remember that everybody over the age of one who dies was breathing air...
