I’ve recently been reading Nate Silver’s book, “The Signal and the Noise.” In the book, Silver looks at a number of areas where predictions have been made and considers how successful they have been, as well as the reasons why they have been accurate (or not).
I couldn’t help but draw parallels with how most companies use log management tools today.
Silver’s particular interests are political forecasting (see www.fivethirtyeight.com) and baseball, particularly predicting player performance. He doesn’t always get his predictions right, but he does explain the rationale behind his predictions and seeks to be unbiased.
In his book, he also considers other areas such as meteorology, where forecasts are built on established scientific models and prediction accuracy increases as more computing power is applied to the problem.
Silver shows that predictions in areas such as economics have been less successful. For example, he examines why many economists missed the recession, and why supposedly expert forecasters get election predictions wrong so often. Before the recession of 2008, the assumption was made that house prices would continue to rise, whereas history has shown they can decline in certain circumstances. Consequently, there was false confidence about the risks associated with a housing bubble, including the risk that a fall in house prices could trigger a global crisis.
Other than giving us interesting problems around prediction where computing power can be applied to big data sets, why should his findings matter to me or to anyone interested in logs and big data?
Firstly, I like the statement he makes that “before we demand more of our data, we need to demand more of ourselves.”
In terms of the data being generated, “most of it is just noise, and the noise is increasing faster than the signal.”
Silver’s approach can be summarized as an attitude based on Bayes’ theorem. While that is a mathematical formula, he uses it as a basis for incorporating probability, uncertainty and testing into analysis, as well as for questioning assumptions and beliefs.
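To make the Bayesian attitude concrete for log analysis, here is a minimal Python sketch of a single update. The scenario and all of the numbers (the prior chance of an incident, and how often an error spike appears with and without one) are my own illustrative assumptions, not figures from the book.

```python
def posterior(prior, p_evidence_given_h, p_evidence_given_not_h):
    """Bayes' theorem: P(H | E) = P(E | H) * P(H) / P(E)."""
    # Total probability of seeing the evidence, whether or not H is true.
    p_evidence = (p_evidence_given_h * prior
                  + p_evidence_given_not_h * (1 - prior))
    return p_evidence_given_h * prior / p_evidence


# Assumed, illustrative numbers (not measurements):
prior_incident = 0.02          # belief in a real incident before looking at logs
p_spike_given_incident = 0.90  # error spikes usually accompany real incidents
p_spike_given_normal = 0.10    # but spikes also happen on ordinary days

belief = posterior(prior_incident, p_spike_given_incident, p_spike_given_normal)
print(f"Belief in an incident after one error spike: {belief:.3f}")
```

With these assumed numbers, one spike moves the belief from 2% to only about 16%, which is why a single noisy signal should rarely be treated as proof.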
The following are key lessons I took from the book:
- Consider Risk vs Uncertainty
- Appreciate the value of a domain expert (using computer/data support)
- Be aware of your biases
- Ask yourself if you are a fox or a hedgehog
Risk vs Uncertainty
The relationship between Risk (a gamble with odds that you can put a price on) and Uncertainty (a risk that is hard to measure) is key, as we often tend to ignore uncertainty or make incorrect assumptions about it. Silver considers this relationship to be central to many of the issues seen in predictions in the finance industry.
In finance, there is plenty of computing power available, but predictions often turn out to be wrong because of incorrect assumptions about the level of uncertainty. One example is a mortgage-backed security whose risk is calculated on the basis that the individual mortgages are independent of each other, using models that assume a manageable downturn in house prices. Of course, these assumptions turned out not to be true in the event of a global fall in house prices, and so the risks had in fact been greater than calculated.
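The effect of the independence assumption can be illustrated with a small simulation. This is only a sketch under assumptions I have made up for the purpose: a pool of 100 mortgages, an average default rate of 5% in both scenarios, and a "shared downturn" that happens 10% of the time and raises every mortgage's default rate at once. None of these figures come from the book.

```python
import random


def tail_probability(trials, pool_size, threshold, default_prob_for_trial, rng):
    """Estimate P(more than `threshold` of `pool_size` mortgages default)."""
    bad_trials = 0
    for _ in range(trials):
        # One default probability is drawn per trial and applies to the whole pool.
        p = default_prob_for_trial(rng)
        defaults = sum(rng.random() < p for _ in range(pool_size))
        if defaults > threshold:
            bad_trials += 1
    return bad_trials / trials


def independent(rng):
    # Every mortgage defaults with the same fixed 5% chance, independently.
    return 0.05


def shared_downturn(rng):
    # Assumed: a downturn occurs 10% of the time and hits all mortgages together.
    # 0.10 * 0.32 + 0.90 * 0.02 = 0.05, so the average default rate is unchanged.
    return 0.32 if rng.random() < 0.10 else 0.02


rng = random.Random(42)  # fixed seed so the run is repeatable
independent_tail = tail_probability(10_000, 100, 20, independent, rng)
correlated_tail = tail_probability(10_000, 100, 20, shared_downturn, rng)

print(f"P(>20 defaults), independent mortgages: {independent_tail:.4f}")
print(f"P(>20 defaults), shared downturn:       {correlated_tail:.4f}")
```

Both scenarios have the same average default rate, yet heavy losses are vanishingly rare in the independent case and far more likely once a common factor links the mortgages. The model's arithmetic is fine in both cases; what differs is the assumption about uncertainty.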
Appreciate the value of a domain expert
The second bullet is good news to anyone who thinks they have developed expertise based on experience, such as the expert sysadmin or DevOps developer.
Models are best when applied with human judgement to understand risk vs. uncertainty and the weighting to be applied to different factors. For example, in baseball prediction Silver cites the skills a scout may have in judging a prospect under different headings, such as work ethic, focus and humility.
It is important for this judgement to be used to ensure that the greater computing power available is not just used to make seemingly more accurate predictions based on incorrect assumptions, or on a greater amount of noise rather than the signal.
Be aware of your biases
The third bullet is a warning to be aware of our biases: there is massive value in the experience of a domain expert, provided they are not biased! Silver says that pursuing the objective truth is a goal for all those making predictions, but forecasters must realize that they perceive it imperfectly.
He points out we often focus on signals that tell the story we want, and not the story we have. Or we make assumptions that are not true. We may not think we have any biases, but ask yourself a few questions.
- Have you got used to one tool for looking at system performance or failure analysis?
- Are you prone to always blame a particular webserver instance, a certain application, database or a single vendor and begin by looking at those components before (or even to the exclusion of) other ones when a failure happens?
- Would you be inclined to use the available data selectively to focus on one particular component or technology, or would you consider it in an open way to identify the component at fault?
In selecting the four bullets above as key points from the book, I hope I have not shown any bias of my own!
Are you a fox or a hedgehog?
The final bullet is important in terms of how much information you rely on and gather to find errors, analyze for trends and maybe even make predictions.
Silver says hedgehogs believe “in governing principles about the world that behave as though they were physical laws,” while foxes “are scrappy creatures who believe in a plethora of little ideas and in taking a multitude of approaches toward a problem.”
Or put another way, “The fox knows many things, but the hedgehog knows one big thing.”
This is inspired by an Isaiah Berlin essay, “The Hedgehog and the Fox”, whose title is borrowed from the Greek poet Archilochus (see http://fivethirtyeight.com/features/what-the-fox-knows/).
A key point is that a hedgehog may only gather the information that confirms their existing views and/or ignore new information that conflicts with them. For example, Silver cites work on political pundits which shows that those “experts” who do the most interviews tend to be the most confident and most strident in their views, but make the worst predictions.
The key attributes of a fox are that he/she is multi-disciplinary, adaptable, self-critical, tolerant of complexity, cautious and empirical. On the other hand, a hedgehog tends to be specialized, stalwart, stubborn, order-seeking, confident and ideological.
How do log management tools fit into this world of foxes and hedgehogs?
By allowing logs of all sorts from your data servers to be uploaded, searched and analyzed using a powerful UI to highlight areas of interest, we enable a pluralistic or foxlike approach that can contribute to your data analysis for a range of purposes.
You could of course limit yourself to being a hedgehog by only loading logs from those systems you have a bias against, but note the third bullet above and you don’t really have an excuse any more.
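The difference between the two views can be sketched in a few lines of Python. The log lines, component names and the `errors_by_component` helper below are all hypothetical, invented purely for illustration; they are not output from any real system or part of any log management product.

```python
from collections import Counter

# Hypothetical log lines in an assumed "timestamp component level message" format.
log_lines = [
    "2024-01-01T10:00:00 webserver-1 ERROR upstream timeout",
    "2024-01-01T10:00:01 database-1 ERROR connection pool exhausted",
    "2024-01-01T10:00:02 database-1 ERROR connection pool exhausted",
    "2024-01-01T10:00:03 app-1 INFO request served",
    "2024-01-01T10:00:04 database-1 ERROR slow query",
    "2024-01-01T10:00:05 webserver-1 INFO health check ok",
]


def errors_by_component(lines, only=None):
    """Count ERROR lines per component, optionally restricted to `only`."""
    counts = Counter()
    for line in lines:
        _timestamp, component, level, _message = line.split(" ", 3)
        if level != "ERROR":
            continue
        if only is not None and component not in only:
            continue
        counts[component] += 1
    return counts


# The hedgehog only looks at the component it already suspects.
hedgehog_view = errors_by_component(log_lines, only={"webserver-1"})
# The fox looks at everything before deciding where to dig.
fox_view = errors_by_component(log_lines)

print("Hedgehog view:", dict(hedgehog_view))
print("Fox view:     ", dict(fox_view))
```

In this made-up data the hedgehog finds an error on the web server and feels vindicated, while the fox sees that most of the errors come from the database.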
The point above on the value of applying the reasoning and experience of experts together with computing power to analyze the right sets of data should inspire those of us who believe in the experience of computer professionals and see the potential in all the data we are gathering and analyzing from logs and elsewhere.
So, just check your biases and try to be a fox.