To start this post, we’re going to play a game called “Spot the Error.” Can you spot the error?
Did you find it? How long did it take you? And this is only 25 log events! (If you couldn’t find it, click here to see the solution.)
In 2012, according to research Marcia Conner did for her Fast Company article, “Time to Build Your Big Data Muscles,” over 2 exabytes of data were created every day. That’s over 2 billion gigabytes every 24 hours, or more than 23,000 gigabytes per second. Additionally, according to IDC, the volume of digital data is expected to reach 7.9 trillion gigabytes in 2015, with 90% of digital data generated by machines. By any measure, that’s a ton of data, and it’s only getting bigger.
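If you want to check those conversions yourself, here is a minimal sketch. It assumes decimal (SI) units, where 1 exabyte is 10^18 bytes and 1 gigabyte is 10^9 bytes; the function names are made up for this illustration.

```python
# Decimal (SI) units, assumed for this sketch:
# 1 exabyte = 10**18 bytes, 1 gigabyte = 10**9 bytes.
EXABYTE = 10**18
GIGABYTE = 10**9
SECONDS_PER_DAY = 24 * 60 * 60


def gigabytes_per_day(exabytes_per_day):
    """Convert a daily volume in exabytes to gigabytes."""
    return exabytes_per_day * EXABYTE / GIGABYTE


def gigabytes_per_second(exabytes_per_day):
    """Spread a daily volume in exabytes evenly over one day's seconds."""
    return gigabytes_per_day(exabytes_per_day) / SECONDS_PER_DAY


print(f"{gigabytes_per_day(2):,.0f} GB per day")
print(f"{gigabytes_per_second(2):,.0f} GB per second")
```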
What does that mean from a logging perspective? Well, it means get a good logging tool, because you’re going to have a lot of data to get a handle on! For instance, here at Logentries we process over 10 billion log events per day…that’s billion with a “b.” Dr. Anton Chuvakin put out a nice blog post a while back that helps us see exactly how much data that is:
100,000 log messages / second x 300 bytes / log message ~ 28.6 MB / second
x 3600 seconds ~ 100.6 GB / hour
x 24 hours ~ 2.35 TB / day
x 365 days ~ 860.5 TB / year
x 3 years ~ 2.52 PB
So, with over 10 billion log events per day at roughly 300 bytes each, we’re looking at about 3 trillion bytes of data per day (better known as about 2,800 gigabytes, or roughly 2.7 terabytes). And that’s not including the data that we append from our pre-processing. Nor does it count anything done to that log data after it’s indexed…that’s just the volume of data we index every day, before further work is done to bring out the insights.
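The arithmetic above is easy to rerun with your own numbers. The sketch below is one way to do it, not anything Logentries or Dr. Chuvakin published: it assumes an average of 300 bytes per log message and binary units (1 MB = 1024 x 1024 bytes), which is what the figures above appear to use. The function names are hypothetical.

```python
BYTES_PER_EVENT = 300  # assumed average message size, as in the estimate above


def bytes_per_second(events_per_second, bytes_per_event=BYTES_PER_EVENT):
    """Raw log volume in bytes for one second of traffic."""
    return events_per_second * bytes_per_event


def binary_units(num_bytes):
    """Express a byte count in binary units (1 MB = 1024**2 bytes, and so on)."""
    return {
        "MB": num_bytes / 1024**2,
        "GB": num_bytes / 1024**3,
        "TB": num_bytes / 1024**4,
        "PB": num_bytes / 1024**5,
    }


# The 100,000 messages-per-second scenario, scaled up step by step.
rate = bytes_per_second(100_000)
print(f"{binary_units(rate)['MB']:.1f} MB / second")
print(f"{binary_units(rate * 3600)['GB']:.1f} GB / hour")
print(f"{binary_units(rate * 3600 * 24)['TB']:.2f} TB / day")
print(f"{binary_units(rate * 3600 * 24 * 365)['TB']:.1f} TB / year")
print(f"{binary_units(rate * 3600 * 24 * 365 * 3)['PB']:.2f} PB over 3 years")

# The 10-billion-events-per-day scenario.
daily = binary_units(10_000_000_000 * BYTES_PER_EVENT)
print(f"{daily['GB']:,.0f} GB / day, or {daily['TB']:.2f} TB / day")
```

Swap in your own event rate and average message size to get a rough estimate for your environment. Real message sizes vary a lot, so treat the output as a ballpark.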
But what does this mean from the perspective of someone like you? Based on Dr. Chuvakin’s work and what we typically see here at Logentries, we’ve put together a chart based on size/type of company to show you how much log data typical companies can put out:
And the numbers can easily go quite a bit higher than this chart shows. With hundreds of thousands of log events produced daily at a minimum, you’ll need a better way to find the important tidbits than our little game of I Spy above.
To put this in perspective, if you had a dollar bill for every log event you had in a day and you laid them end to end, a large cloud provider’s log data would stretch halfway from Earth to Mars at their closest approach (that’s 26 million km of log data); an online marketing organization’s log data would stretch 1.25 times around the world; even an early stage startup’s logs would stretch almost all the way across the English Channel at the Strait of Dover (that’s 26 kilometers of log data)!
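For the curious, the dollar-bill comparison can be run in reverse to see roughly how many daily log events each distance implies. This is a back-of-the-envelope sketch: it assumes a US bill is 6.14 inches (about 0.156 m) long and an equatorial circumference of about 40,075 km for the Earth, and the event counts it prints are back-calculated from the distances above rather than taken from the chart.

```python
BILL_LENGTH_M = 0.155956         # a US bill is 6.14 inches long
EARTH_CIRCUMFERENCE_KM = 40_075  # equatorial circumference, approximate


def distance_km(events_per_day):
    """Length in km of one bill per log event, laid end to end."""
    return events_per_day * BILL_LENGTH_M / 1000


def events_for_distance(km):
    """Number of bills (log events) needed to cover a distance in km."""
    return km * 1000 / BILL_LENGTH_M


# Distances quoted in the comparison above.
examples = [
    ("Large cloud provider", 26_000_000),
    ("Online marketing organization", 1.25 * EARTH_CIRCUMFERENCE_KM),
    ("Early stage startup", 26),
]

for label, km in examples:
    events = events_for_distance(km)
    print(f"{label}: {km:,.0f} km is about {events:,.0f} events per day")
```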
The upside is that all of this data presents a great opportunity to get a better understanding of your application(s), your users and more if you’re paying attention to it. The downside is that it can just be overwhelming and lead to prolonged downtime and costly outages if you can’t use it well.