Last updated at Mon, 06 Nov 2017 20:27:05 GMT

Lack of an enterprise logging policy is a common shortcoming in the organizational discipline of logging within large, distributed applications. Just because you can get log data into a system, it does not follow that the data you are entering is useful. The old adage "garbage in, garbage out" holds true. If an enterprise allows anybody to enter log data in any way possible, anybody will. In the long run, without a proper policy for logging, and procedures to support that policy, an enterprise will spend an unnecessary amount of time and money getting meaningful information from its logs.

It’s a pay-me-now, pay-me-later sort of thing. If your company allows developers to log data on a whim, the amount of badly or randomly structured log data in your system will grow at an increasing rate. And, as time goes on, you will spend more time and money on the backend trying to make sense of logs that should have been easy to process from the frontend in the first place.

You have a choice. You can have a simple policy that ensures structured, easy-to-process log data is always entered into the system. Or, you can spend time and money cleaning up your log data on the backend. When it comes to log entry and subsequent log processing, you can pay me now or you can pay me later, but you will pay me.

Implementing a logging policy that is easy to follow makes sense. Yet many companies shy away from implementing such a policy under the misconception that it’s burdensome and costly. It’s not. In fact, I am going to show you how to do it. It’s all about following one principle: Use Self-Describing Data Formats.

The Case for Self Describing Data Formats

Consider the following log entry shown below in Listing 1:

25 Mar 2016 16:39:03.305 info wXjpihBWuCwe1jkiNiV8YB

Listing 1: A cryptic log entry

What does it tell you? Well, we can infer that the data was entered on March 25, 2016 at a certain time. Also, we can infer that the entry had something to do with info. But what about wXjpihBWuCwe1jkiNiV8YB? You’ve got me.

Now consider this log entry shown below in Listing 2:

25 Mar 2016 17:12:07.061 info {"applicationName":"GoodDogBadDog","token":"wXjpihBWuCwe1jkiNiV8YB"}

Listing 2: A log entry that structures data using JSON

The start of the entry in Listing 2 is the same as the previous entry in Listing 1; the entry was made on March 25, 2016, and it has something to do with info. But what follows tells you exactly what the entry is about. An application named GoodDogBadDog submitted a token, and that token has a value of wXjpihBWuCwe1jkiNiV8YB. This may seem trivially obvious, but it’s not. The first entry tells you nothing. The second entry tells you everything. The second entry is self-describing: it describes both the structure of the information and the information itself. The format of the entry, all the data between the first left curly bracket and the last right curly bracket, is JSON. JSON is a self-describing format. There are others, XML and key-value pairs, for example.
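As a quick sketch of how an entry like the one in Listing 2 might be produced (the timestamp/level prefix and field names follow the listings; the makeLogEntry helper itself is hypothetical):

```javascript
// Build a self-describing log entry like the one in Listing 2.
// The field names mirror the listing; the helper is an illustrative sketch,
// not a real Logentries client API.
function makeLogEntry(level, fields) {
  const timestamp = new Date().toUTCString(); // e.g. "Fri, 25 Mar 2016 17:12:07 GMT"
  return `${timestamp} ${level} ${JSON.stringify(fields)}`;
}

const entry = makeLogEntry('info', {
  applicationName: 'GoodDogBadDog',
  token: 'wXjpihBWuCwe1jkiNiV8YB'
});
// entry ends with:
// info {"applicationName":"GoodDogBadDog","token":"wXjpihBWuCwe1jkiNiV8YB"}
```

Because the fields travel together as one JSON object, the entry stays self-describing no matter how many fields you add later.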

Listing 3 below shows the JSON entry above converted into a set of key-value pairs:

25 Mar 2016 17:12:07.061 info "applicationName":"GoodDogBadDog","token":"wXjpihBWuCwe1jkiNiV8YB"

Listing 3: A log entry that uses key-value pairs to structure information

Here is the JSON entry above converted into XML (please see Listing 4 below):

25 Mar 2016 17:12:07.061 info <entry><applicationName>GoodDogBadDog</applicationName><token>wXjpihBWuCwe1jkiNiV8YB</token></entry>

Listing 4: You can use XML to structure the information of a log entry

No matter which self-describing format you use, the important thing to know is this: logging data within the structure of a self-describing data format will save you time and money almost from the get-go. Structured data is easier to parse and easier to index. Indexed data is easier to query. In fact, a technology such as Logentries has the intelligence built in to parse and index JSON automatically. It’s a win-win situation all the way around.
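To see why parsing structured entries is so cheap, here is a minimal sketch (the parsePayload helper is an assumption; the entry format follows Listing 2): everything after the first opening curly bracket is valid JSON, so a standard parser does all the work.

```javascript
// Recover the structured payload from a log line shaped like Listing 2.
// Hypothetical helper: everything from the first "{" onward is parsed as JSON.
function parsePayload(logLine) {
  const start = logLine.indexOf('{');
  if (start === -1) return null; // unstructured entry: nothing to recover
  return JSON.parse(logLine.slice(start));
}

const payload = parsePayload(
  '25 Mar 2016 17:12:07.061 info {"applicationName":"GoodDogBadDog","token":"wXjpihBWuCwe1jkiNiV8YB"}'
);
// payload.applicationName is now directly queryable as a field
```

Run the same function against the cryptic entry from Listing 1 and you get null back, which is exactly the point: there is no structure there to recover.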

Making It Happen

So, if we agree that using a structured data format such as JSON or key-value pairs is a good thing to do, how do we make it happen within the enterprise?

First, we need a policy. At the enterprise level, the best policies are the simplest ones. Take this policy, for example:

All employees will display their ID badges in plain sight at all times when on the company’s premises.

It’s a simple policy that is easy to verify. When an employee is in the building, either you can see his or her badge or you can’t.

Because the policy is so simple, it is easy to create one or many procedures to support the policy. Thus, one can well imagine this set of procedures to facilitate the badge display policy:

  1. Upon issuing an employee an ID badge, provide the employee both a shirt clip and an ID badge necklace.
  2. Instruct the employee that he or she needs to wear his or her ID badge in plain sight using either the shirt clip or badge necklace when on the company’s premises.

The policy is simple and, as a result, the procedures put in place to support it are simple and easy to enforce. The more complex a policy gets, the more complex the procedures that follow become. And enforcement becomes complex too.

So, when it comes to ensuring that developers use structured data formats when logging, the key is to Keep it Simple.

Consider this policy:

All developers will log data using JSON to name the field(s) of log data being entered as well as the value of each field.


Next come the procedures. You have a few choices. One way is to enhance your logging component at the code level so that only structured data is sent from an application to the log collector. If you are using a compiled language such as C# or Java, this is a viable approach. (Please see Listing 5, below.)

    // This class ensures that log entries
    // are submitted as structured data
    public class Logger
    {
        public static void Log(LogData logData)
        {
            // Convert the object to JSON
            // Send the JSON to the log collector
        }
    }

    var logData = new LogData();
    . . .
    Logger.Log(logData);

Listing 5: C# code that enforces using JSON structure log entries

You create a logging method that accepts a POCO or POJO, and then internally the method converts the object to a structured data format, such as JSON. Thus, the only way you can log is through this method. If your code goes errant, logging will not happen; and if you fail to use a POCO or POJO, the non-compliance will be picked up at compile time.

When you are using runtime languages such as JavaScript or PHP, things get a little harder. Yes, you can throw an error at runtime should your code come across log data that is random. But with this approach, the horse has already left the barn. Of course, you can stop the code upon a “RandomLogEntryException”, and this might be a good thing to do if you want to catch the problem in a Development or Q/A environment. However, stopping the code in production is just bad business. If you have a good runtime testing environment, raising exceptions on random log data entry can work.
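A minimal sketch of that runtime check in JavaScript (the NODE_ENV check and the exception message are illustrative assumptions, and sending to the collector is stubbed out):

```javascript
// Accept only plain objects; raise on random/unstructured log data
// outside of production. The "RandomLogEntryException" wording and the
// NODE_ENV-based environment check are assumptions for the sketch.
function log(logData) {
  const isPlainObject =
    logData !== null && typeof logData === 'object' && !Array.isArray(logData);
  if (!isPlainObject) {
    if (process.env.NODE_ENV !== 'production') {
      // Fail fast in Development or Q/A so the bad entry gets fixed early
      throw new Error('RandomLogEntryException: log data must be an object');
    }
    return null; // in production, drop the entry rather than stop the code
  }
  // In a real logger, this JSON would be sent to the log collector
  return JSON.stringify(logData);
}
```

The key design choice is that the same guard behaves differently by environment: it is loud where loudness is cheap, and silent where stopping the code would be worse than losing one bad entry.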

Log inspection and code reviews are alternative procedures, should ensuring policy compliance not be possible at the code level. The procedures must be very exact. How will log inspection happen? Who or what will do it? How frequently will inspection occur? How will the code review be conducted? How often? How will the activity be documented? How will non-compliance be addressed? Answering these questions will bring about the clarity and detail required for procedures that produce reliable results.

Thus, we can have a policy that looks like this:

All developers will log data using JSON to name the field(s) of log data being entered as well as the value of each field.

With accompanying procedure to support the policy:

  1. The company will, if possible, use or enhance logging clients to ensure that only JSON is sent to the enterprise’s log collectors.
  2. Should a code solution not be possible, personnel will create automated, server-side tools that conduct daily analysis of log entries to ensure that structured JSON data is submitted for logging. Sources that create log data not submitted in JSON will be notified by email that correction is required.
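The audit in procedure 2 could start as something like the following sketch (the findNonCompliantLines helper is hypothetical; reading the day's log sources and sending the notification email are left out):

```javascript
// Scan a batch of log lines and flag any whose payload is not valid JSON.
// Line format follows the listings above; this is an illustrative sketch,
// not a production auditing tool.
function findNonCompliantLines(logLines) {
  const offenders = [];
  for (const line of logLines) {
    const start = line.indexOf('{');
    try {
      if (start === -1) throw new Error('no JSON payload');
      JSON.parse(line.slice(start)); // throws on malformed JSON
    } catch (err) {
      offenders.push(line); // candidate for the correction-required email
    }
  }
  return offenders;
}

const report = findNonCompliantLines([
  '25 Mar 2016 17:12:07.061 info {"applicationName":"GoodDogBadDog","token":"wXjpihBWuCwe1jkiNiV8YB"}',
  '25 Mar 2016 16:39:03.305 info wXjpihBWuCwe1jkiNiV8YB'
]);
// report contains only the second, unstructured line
```

Run daily, a tool like this makes the procedure verifiable: either the offenders list is empty, or you know exactly which sources need the email.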

The key takeaway is that the policy is simple and the procedures to support it are relatively easy to implement. Also, the procedures are verifiable.

Why JSON?

If it seems that I am partial to using JSON for log entries, you are right. I find JSON to be efficient in terms of layout specification, self-describing in terms of format, and easily adaptable into runtime objects in environments such as browser-side JavaScript and server-side Node.js.

However, JSON is my preference, and preferences vary by developer. The key is to use a data format that is easy to adopt in your enterprise. If your developers like key-value pairs, use them.

Putting It All Together

Data formats that are self-describing are easy to work with. You don’t have to spend time and money trying to figure out what is going on; the format tells you what you need to know. In addition to being informative, self-describing formats such as JSON and key-value pairs index easily once parsed. And typically, parsing these formats on the server side is easy too.

When it comes time to implement a policy and supporting procedures to ensure the use of self-describing data formats in your enterprise, remember that it’s best to keep it simple. A simple-to-understand policy accompanied by easy-to-follow procedures will save you time and money. Remember, when it comes to your enterprise’s logging efforts, you might pay now, or you might pay later. But using self-describing data formats will reduce the price you pay, no matter what.

Logentries makes it easy to capture, visualize, and alert on the log data from your enterprise. Get started with a free account today.