Over the past year I reckon I have spoken to more than a thousand developers, IT Ops and DevOps folk through customer calls, demos of Logentries, and conferences such as Velocity, DevOpsDays and AWS re:Invent, as well as a bunch of more low-key meetups across the US and Europe.
Naturally, one of the first questions I tend to ask is: “hey what do you use for logging?”
Quickly followed by: “What other tools do you use?”
Below is a list of tools I frequently come across (note: this is not exhaustive) that I see making up today’s modern IT and Dev Ops toolkit.
The Modern IT & Dev Ops Toolkit?
This is how I think about the modern day IT and Dev management tools that are critical to supporting the distributed and complex environments, applications, and diverse end users today…
Below is an outline of each key category of tools that need to be in your toolkit, and the leading technologies to consider as you build the toolkit that best supports your team and your organization.
You’ll notice a lot of SaaS services; over the past few years people have really started to move away from on-prem, “roll your own” solutions, and from some of the old dinosaurs that tried to provide everything in one box (think Tivoli, Splunk…). Instead they are taking advantage of more specific cloud-based services that are more flexible, require less investment up front, and need practically zero management.
Here they are. The top eight must-have technologies for YOUR modern IT and DevOps toolkit:
Configuration/Automation: Because it is so easy to spin server instances up and down these days, organizations will regularly have hundreds or thousands of instances associated with a specific app or set of services. Furthermore, one thing I have noticed over the past year is the number of organizations with autoscaling in place.
Through 2012/2013 a lot of people were talking about autoscaling, but this year I’ve noticed a big increase in the percentage of organizations that are actually using it. Large and dynamic environments call for orchestration and automation tools such as Chef, Puppet and Ansible. Vagrant is a complementary tool that lets you easily manage your development environments, and it is also a common fixture.
Server Monitoring: Keeping an eye on server resource usage and performance metrics has long been a common practice. However, the tool set has moved away from solutions that were traditionally installed on premise (e.g. Nagios, SolarWinds’ server monitoring) to more lightweight SaaS services that require very little effort to configure and maintain.
CloudWatch is Amazon’s monitoring service, which gives you insight into metrics on your service instances and other AWS services, and it is a very popular choice across the AWS community. Datadog is another popular SaaS service that allows you to easily collect all your server and application metrics in one place. It can plug into your application components to retrieve metrics, as well as into other SaaS services and any existing monitoring tools you have in place. Other common SaaS tools for server monitoring include ServerDensity and ScoutApp.
Log Management & Analytics: Logs are important for a range of activities including developer troubleshooting, monitoring production systems, real-time alerting, customer support, application usage analytics… the list goes on. In fact, being particularly well positioned to talk about logging use cases 🙂, we see them limited only by the type of data you choose to log.
Logs have begun to provide a very simple way to perform “risk-free analytics”; you do not have to invest heavily in application instrumentation or an expensive BI tool to start getting immediate insights and visibility into your application behavior.
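To make the “risk-free analytics” idea a little more concrete, here is a minimal sketch of pulling a couple of metrics straight out of log lines. The log format, the `summarize` helper and the sample lines are all made up for illustration; they are not a Logentries format or API.

```python
import re
from collections import Counter

# Hypothetical request-log format (an assumption for this sketch):
#   "<timestamp> <METHOD> <path> <status> <duration>ms"
LINE_RE = re.compile(
    r"(?P<method>[A-Z]+) (?P<path>\S+) (?P<status>\d{3}) (?P<ms>\d+)ms"
)


def summarize(lines):
    """Count responses per status code and average the response time."""
    statuses = Counter()
    total_ms = 0
    matched = 0
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue  # skip lines that do not look like request logs
        statuses[m.group("status")] += 1
        total_ms += int(m.group("ms"))
        matched += 1
    avg_ms = total_ms / matched if matched else 0.0
    return statuses, avg_ms


# Made-up sample data, including one line that is not a request log.
sample = [
    "2014-11-03 10:00:01 GET /login 200 120ms",
    "2014-11-03 10:00:02 GET /cart 500 340ms",
    "2014-11-03 10:00:03 POST /checkout 200 200ms",
    "2014-11-03 10:00:04 worker heartbeat",
]
statuses, avg_ms = summarize(sample)
print(dict(statuses), avg_ms)
```

Nothing in the application had to change to get an error count and an average response time; the data was already sitting in the logs.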
That being said, often the primary reason organizations need a log management solution is to centralize their logs, so that Development and Ops teams can easily access log data from hundreds or thousands of instances in a single location without having to manually log into individual boxes.
The open source logging tool of choice tends to be Elasticsearch, Logstash and Kibana (commonly known as ELK). While this is a great open source option, as soon as log volumes grow, maintenance of ELK can become painful and expensive, and organizations tend to look for a commercial SaaS solution. Organizations with big budgets, dedicated data scientists at hand, and the time and energy to invest in educating their users have traditionally looked at Splunk for their log management solution. However, organizations are frequently looking for a more lightweight, cost-effective and easier-to-use technology that does not break the bank.
Logentries is a real time log management technology designed for the cloud with more than 35,000 global users. It also provides a unique unlimited logging technology which allows you to send as much data as you like and decide dynamically what data you want to analyze immediately, and what data you route to cold storage for on demand analysis. This can reduce logging costs to a fraction of traditional solutions.
Incident Alert Management: As is evident from this post, the Modern DevOps Toolkit will regularly consist of a number of different lightweight tools that are used side by side, rather than the large monolithic solutions of days past. As such, alerting can be a bit of a nightmare, with alerts firing from different end points and potentially producing a lot of noise.
Furthermore, managing on-call schedules and deciding which team members should get which alerts can also be challenging, especially as teams grow in size. Tools like PagerDuty and VictorOps have been designed to take the pain out of incident alerts. They provide a range of capabilities that allow you to filter important alerts from the noise and route them to the correct team members at the right time.
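As a rough illustration of what “filter the noise and route to the right person” means, here is a toy sketch. It is not how PagerDuty or VictorOps work internally and it does not use their APIs; the `Alert` class, the `ON_CALL` table and the `route` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str    # tool that fired the alert, e.g. "cloudwatch"
    service: str   # affected service
    severity: str  # "info", "warning" or "critical"


# Hypothetical on-call table (service -> engineer), assumed for this sketch.
ON_CALL = {"api": "alice", "db": "bob"}
PAGE_SEVERITIES = {"critical"}


def route(alerts, on_call=ON_CALL, default="ops-team"):
    """Return {recipient: [alerts]}, keeping only page-worthy,
    de-duplicated alerts."""
    seen = set()
    pages = {}
    for alert in alerts:
        if alert.severity not in PAGE_SEVERITIES:
            continue  # noise: not worth waking anyone up
        key = (alert.service, alert.severity)
        if key in seen:
            continue  # same problem already reported by another tool
        seen.add(key)
        recipient = on_call.get(alert.service, default)
        pages.setdefault(recipient, []).append(alert)
    return pages


# Made-up alerts from four different tools.
alerts = [
    Alert("cloudwatch", "api", "critical"),
    Alert("nagios", "api", "critical"),   # duplicate of the one above
    Alert("datadog", "db", "warning"),    # below the paging threshold
    Alert("pingdom", "web", "critical"),  # no owner, goes to the default
]
pages = route(alerts)
for recipient, items in pages.items():
    print(recipient, [a.source for a in items])
```

Four alerts from four tools collapse into two pages here. Real incident management tools add schedules, escalation and acknowledgement on top of this basic idea.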
Data Visualization: DevOps teams using lots of different tools to manage their environment often require a centralized dashboard to view and correlate data from different sources. For example, almost every tech company office you walk into these days has a number of flat screens showing key performance indicators for the entire team to keep an eye on. Technologies like Geckoboard, Librato, and Graphite (or Hosted Graphite for those not wanting to maintain their own Graphite deployment) are some of the more popular operations dashboards used by DevOps teams today.
Real-Time Messaging: Chat clients are used across almost all organizations for real-time comms and have been around for quite some time. HipChat has likely been the most popular such tool among the Dev and Ops community, with its nice integrations with Jira and GitHub as well as its ability to ingest alerts from your different monitoring tools. Slack seems to be the new kid on the block, and I’d personally rank it as one of the fastest adopted tools I’ve come across. Expect Slack to be everywhere in 2015 … if it is not there already.
APM/End User Experience: Application performance monitoring (APM) is a key technology for developers wanting to optimize and manage the performance of their apps. For Ops teams concerned with overall user experience, APM can give insight into what is happening in your production environments, and it can be dynamically tuned to provide more information on demand so that it does not have a constant performance impact on your running systems.
I spent time building APM solutions 10 years ago, when they were all on-premise solutions that you downloaded, deployed and managed in house. Today, they have largely moved to the cloud, with the likes of Smartbear AlertSite, New Relic, AppNeta and AppDynamics leading the charge.
Health Checks: While most server monitoring, logging and APM choices have moved to the cloud, I still regularly hear that recursive acronym … Nagios Ain’t Gonna Insist On Sainthood. DevOps teams still have a love/hate relationship with this old reliable, where the common feeling is that while it is not pretty, it does the job. When I come across Nagios these days it tends to be in the context of health checks for vital services to make sure they are ‘Up’, and it is usually complemented with tools that provide a deeper dive if investigation is required (e.g. New Relic, Logentries, CloudWatch…).
Alternatively, health checks are also commonly performed using Pingdom, which gives you a pretty coarse-grained view of service uptime and downtime. Another alternative is to perform health checks at the log level, e.g. by using inactivity alerting to get notified when expected events do not show up in your logs.
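The idea behind a log-level inactivity check can be sketched in a few lines. This is only an illustration under assumed inputs: the `inactive` function and the heartbeat timestamps are hypothetical and do not represent the Logentries alerting feature itself.

```python
from datetime import datetime, timedelta


def inactive(event_times, now, max_silence):
    """Return True if no expected event was seen within max_silence of now."""
    if not event_times:
        return True  # never seen at all counts as inactive
    return now - max(event_times) > max_silence


# Made-up timestamps of an expected "heartbeat" log entry.
now = datetime(2014, 11, 3, 12, 0)
heartbeats = [
    datetime(2014, 11, 3, 11, 40),
    datetime(2014, 11, 3, 11, 45),
]

# The last heartbeat was 15 minutes ago.
quiet = inactive(heartbeats, now, timedelta(minutes=10))     # exceeds 10 min
healthy = inactive(heartbeats, now, timedelta(minutes=30))   # within 30 min
print(quiet, healthy)
```

The useful property is that the alert fires on the absence of a log entry, which catches the case where a service has died so quietly that it never logged an error.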
Tell us what you think?
Above is an overview and categorization of the most popular Dev and Ops tools we have regularly come across in 2014 as we have engaged with the Logentries Community across customer calls, meetups and conferences. This list is by no means exhaustive; it is our view of what makes up the modern IT and Dev Ops toolkit. Let us know how it lines up with what you see, or if you think we are missing anything.