Over the past few years we have brought on board many companies that started out by rolling their own logging solution. Companies often start down this path largely because they can (there are plenty of open source technologies available) and because it’s free, so you can get started with zero down.
But as we all know, there’s no such thing as a free lunch, and rolling your own solution carries a number of hidden costs: some beefy servers for when your log volumes grow, your valuable time wiring together the initial solution and, most costly of all, maintaining and managing your solution as it begins to span a set of clustered servers. In some cases this works well for organizations, and they manage all their logs in-house with their custom-built solution, often combining a number of open source components.
However, one trend we continue to see is that as companies grow, their logs grow, and so do their in-house custom solutions, along with all their complexities. As these systems become more complex, the engineers who built them need to spend more time maintaining and managing them. The more hassle these systems become, the more likely organizations are to jump off the “roll your own” complexity elevator.
We’ve also noticed a trend in the types of systems that get developed in-house over time. They generally start off fairly simple and can grow into much more sophisticated solutions. Below we outline the different “roll your own logging” stages we have come to see over the past few years:
Stage 1 – The insurance policy
The most basic logging solution is generally used as an insurance policy, “just in case” you need access to your logs at some point in the future. This usually involves a combination of Syslog and Logrotate to manage logs on each individual server and a mechanism to archive the logs periodically (e.g. daily). The most common approach we see is for companies to simply use S3 for such archiving, which does the job nicely.
This solution is pretty basic: it doesn’t really give you a good way to interrogate your logs if you need to, and simply acts as a mechanism for storing logs in case of an emergency. That being said, this is often the first step people take as they step onto the “roll your own” logging complexity elevator.
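The daily archiving step described in this stage might look roughly like the sketch below. It only covers compressing a rotated log file and choosing a date-partitioned object key; the key layout, the host name and the function names are illustrative assumptions, not something prescribed by Syslog, Logrotate or S3.

```python
import gzip
import shutil
from datetime import date
from pathlib import Path


def archive_key(hostname: str, log_path: Path, day: date) -> str:
    """Build a date-partitioned object key (hypothetical layout: logs/YYYY/MM/DD/host/file.gz)."""
    return f"logs/{day:%Y/%m/%d}/{hostname}/{log_path.name}.gz"


def compress_log(log_path: Path, out_dir: Path) -> Path:
    """Gzip a rotated log file into out_dir and return the path of the compressed copy."""
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / (log_path.name + ".gz")
    with log_path.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target
```

The compressed file would then be uploaded under the generated key with whatever S3 client you already use; that step is left out here so the sketch stays free of third-party dependencies.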
Stage 2 – Searching for the needle
Once logs are being kept around, the question often arises: “hey, can we search across this data?” This usually comes up in the face of some operational outage or customer support query. The log archiving solution above doesn’t really help here, as it doesn’t provide any simple way to search your logs.
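To make the limitation concrete: without an index, a search over archived logs amounts to decompressing and scanning every file, roughly as in the sketch below. It assumes the archives are gzip files sitting in a local directory (i.e. already downloaded); the function name is illustrative.

```python
import gzip
from pathlib import Path
from typing import Iterator, Tuple


def search_archives(archive_dir: Path, needle: str) -> Iterator[Tuple[str, int, str]]:
    """Yield (file name, line number, line) for every archived line containing needle."""
    for archive in sorted(archive_dir.glob("*.gz")):
        # Every file is read in full on every query; nothing is indexed or cached.
        with gzip.open(archive, "rt", encoding="utf-8", errors="replace") as handle:
            for line_number, line in enumerate(handle, start=1):
                if needle in line:
                    yield archive.name, line_number, line.rstrip("\n")
```

The cost of each query grows linearly with the total volume of archived logs, which is why this approach stops being practical fairly quickly.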
Enter Logstash. Logstash is an open source tool that collects and parses log data and, typically paired with a search back-end such as Elasticsearch that indexes it, allows you to search across it. It’s often the first place people turn to start digging into their logs, given that there’s no money required up front. The main issue we hear from users of open source solutions is that, as log volumes increase, so does the amount of time spent managing the solution, in particular if your back-end requires clustering. Note that you’ll also have to host Logstash (e.g. on some AWS instances), and this cost will also grow over time.
Stage 3 – Show me the Metrics
Logs are addictive! Once you start digging into them you’ll find more and more valuable info that you can use for a range of different use cases. IMHO the real power of logs comes when you begin to use your logs as data! At Logentries, we regularly see relatively small systems that produce on the order of 10s or 100s of log events per day. These events can contain vital pieces of data for understanding your systems (e.g. response time, memory usage, CPU usage). Where logs become really powerful is when you can identify important field values in your data, and then roll up these values into metrics dashboards to visualize and understand key trends. You can thus use your logs to dynamically build reports that give you different views into your system for a range of different use cases (e.g. performance monitoring, product usage, web analytics, etc.).
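As a small illustration of treating logs as data, the sketch below pulls a response time field out of each log line and rolls it up into a per-minute average. The log line format, the field name response_time and the function name are assumptions made up for this example; real logs will need their own pattern.

```python
import re
from collections import defaultdict
from statistics import mean

# Hypothetical log format: "2015-03-09T10:15:42 path=/checkout response_time=182ms"
LINE_PATTERN = re.compile(
    r"^(?P<minute>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}):\d{2} .*\bresponse_time=(?P<ms>\d+)ms"
)


def response_time_per_minute(lines):
    """Return {minute: average response time in ms}, skipping lines that do not match."""
    buckets = defaultdict(list)
    for line in lines:
        match = LINE_PATTERN.search(line)
        if match:
            buckets[match.group("minute")].append(int(match.group("ms")))
    return {minute: mean(values) for minute, values in sorted(buckets.items())}
```

The resulting dictionary is the kind of time series a dashboard would plot; swapping mean for max or a percentile gives a different view of the same events.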
Using ‘logs as data’ is becoming more and more common, and when rolling your own solution this can again be achieved with something like Logstash by combining it with StatsD and Graphite. Again, this does not require any hard cash investment, but you will spend time managing and configuring it, which becomes more challenging as your data volumes grow.
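For a sense of what feeding StatsD involves, here is a minimal sketch of its plain-text line format (name:value|type) sent over UDP. The metric names, helper function names and the local host address are illustrative assumptions; 8125 is the port StatsD conventionally listens on, but check your own setup.

```python
import socket


def statsd_timing(name: str, milliseconds: int) -> bytes:
    """Encode a timer in the StatsD line format: '<name>:<value>|ms'."""
    return f"{name}:{milliseconds}|ms".encode("ascii")


def statsd_counter(name: str, amount: int = 1) -> bytes:
    """Encode a counter increment: '<name>:<amount>|c'."""
    return f"{name}:{amount}|c".encode("ascii")


def send_metric(payload: bytes, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget a single metric over UDP (assumed StatsD address)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

A log processor could call send_metric(statsd_timing("web.response_time", 182)) for each parsed event and leave the aggregation and graphing to StatsD and Graphite.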
Stage 4 – The deep dive
The final type of roll your own logging solution we see is where companies also write their log data to something like HDFS, so that they can run more complex queries against their data (e.g. to identify correlations or associations between error events that lead to serious issues, or to build reports such as funnel or cohort reports for web analytics).
This type of analysis can be super powerful and is really only limited by the quality of your data (i.e. what data you decide to include in your log events) and the intelligence of your data scientist 🙂 However, at this stage you really require deep expert skills and someone who can play the data scientist role at your organization. So again, while your solution might not require a cash investment up front, you are going to require some serious tech skills and someone with the time to invest in building out your Hadoop cluster and queries.
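To give a flavour of the kind of query involved, the sketch below computes a simple funnel report from log events. It runs in memory on a small list rather than on a Hadoop cluster, and the event names, the (user_id, event_name) tuple shape and the function name are assumptions for illustration only.

```python
from collections import defaultdict


def funnel_counts(events, steps):
    """Count users who completed each step of `steps` in order.

    `events` is an iterable of (user_id, event_name) pairs in chronological order.
    """
    progress = defaultdict(int)  # user_id -> number of funnel steps completed so far
    for user_id, event_name in events:
        reached = progress[user_id]
        if reached < len(steps) and event_name == steps[reached]:
            progress[user_id] = reached + 1
    return [
        (step, sum(1 for done in progress.values() if done > index))
        for index, step in enumerate(steps)
    ]
```

Each entry in the result pairs a step with the number of users who got at least that far, so the drop-off between consecutive steps can be read straight off the list. At real volumes the same logic would be expressed as a distributed job, which is where the expert skills come in.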