Out of a sample of 6511 random cat images taken from Flickr, 640×640 represents 31% of all image resolutions.
Below is my chart covering the cumulative distribution of the 731 distinct image resolutions:
As we can clearly see, 10% of resolutions make up 75% of images on Flickr.
Also, the average size of a Flickr image is 1.7MB. They recently announced that they have 6 billion images, for a total of 10PB of images. 10PB of images would use 1,700 WD60EFRX 6TB drives, which retail for $300 each (and yes, you can buy 6TB drives off the shelf). The total cost would be $510,000 for a single set of drives, not counting the computers to run them, and different image resolutions, and replica sets, and so forth. Google did a study in 2007 which found that hard drives which are constantly spinning have an expected lifespan of around 5 or 6 years (you have to do some extrapolating to get that, and it’s an estimate). That comes out to 24 expected failures per month, or $7,200 per month in replacements.
We can assume that Flickr has around 10 datacenters, as a sort of fermi approximation (and CDN caching, etc). If the computers to run the drives cost about as much as the drives itself (which seems reasonable), then we get a total bill of $10 million to setup all of the datacenters, plus $72,000 per month in replacement drives, plus $1 million/yr in staff and $1 million/yr in housing.
Amazon S3 sells data for $0.0275/GB/month over 5PB/month. That comes out to $280,500 per month, or $3.4 million per year. Slightly cheaper than the in-house datacenter.
What is the point of all this? Compression. Every 1% of compression that they can eke out of their images saves them at least $27,000 per year, and that’s a number that’s only growing (unless Flickr folds). So if you can manage to make a JPEG 3% smaller, you could make a hundred grand a year for the rest of your life off the savings.
Flickr is but a small fish in a big pond. According to Quora, Facebook currently has 90 billion pictures, 15x the size of Flickr, and adding 6 billion pictures per month. Someone is uploading all of Flickr to Facebook every month. Using our numbers from above, Facebook spends $5 million per month on new hard drives to put pictures on, not counting replacements. A 1% improvement saves them $50,000 per month.
Something to think about, next time you run a large datacenter.