There is a whole lot of talk about DevOps: pushing teams to move faster, increasing focus on results, and doing it all with ever-better quality. But the elephant in the room is how we go from immutable infrastructure to scalable environments that move with the code, not against it. Making infrastructure move at the speed of code takes more than orchestration tools. Operations needs to be confident that it can let the meta-application scale infrastructure on demand without ending up with a huge mess of tangled servers.
Virtual machines (VMs) help make infrastructure more flexible. A machine can now be treated much like any other file, albeit a very large one: it can be moved around and copied on demand. And if you think about it, in a truly flexible model VMs should have very short life spans. They should live and die based on the current application load. Or, for complete end-to-end testing, a full set of infrastructure should be provisioned for each test run and then deleted when the test completes. When a developer asks for a machine, the response should be “you will have it in minutes,” not days. Even better: “you can do it yourself,” without burdening IT operations with every request, while still maintaining oversight of the infrastructure. This vision is not how it usually goes.
The reality is most of your VMs have been up and running for months. And they are not as agile as we want them to be.
Are you a server hugger?
As we all know, we are creatures of habit. We are accustomed to thinking about servers as the physical things they used to be, something you set and forget. And if you are not running your own datacenter, or do not control the virtualization layer, you may have limited control over how fast VMs can be copied, moved, and spun up, so you do not do it often. This can be solved with a well-managed private cloud, or a high-powered public cloud designed for exactly this, such as spot instances on AWS.
But the other huge, and perhaps most pressing, reason we don’t free our VMs is fear of server sprawl. Server sprawl is a real problem: it drives up costs, and if you are using a cloud provider it makes it hard to know which servers are handling which workloads, and it can simply waste a tremendous amount of resources.
How many rogue VMs are currently in your environment?
In any case, most of us want to avoid this situation as much as we can. A free-rein, scalable environment driven by some meta-level orchestration layer is a bit frightening, and rightly so.
The trick to making it all happen is log analysis.
Herd Servers with Log Analysis
Normally you think of log analysis as something added to VMs after they are created, in order to monitor the logs each VM produces. But it also gives you the possibility of letting your server farm run free without the fear of losing oversight.
And the method is quite simple: create a gold master VM, or an orchestration script. On that VM you pre-install your log agent with the appropriate settings. When the configuration changes, such as updates and new installs, you make that change only on the gold master scripts or VMs, not in production. That change may or may not trigger replacement of machines already provisioned.
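To make the idea concrete, here is a minimal sketch, assuming a cloud-init-capable image: a function that renders the user-data baked into the gold master so every clone registers with the log platform at first boot. The install URL and `log-agent` CLI are hypothetical placeholders for whatever your platform actually provides.

```python
def render_user_data(log_endpoint: str, api_key: str) -> str:
    """Render a cloud-init user-data snippet for the gold master.

    Every VM cloned from the master runs this at first boot, so it
    ships logs from minute one. The install script URL and the
    'log-agent' command are hypothetical stand-ins for your vendor's
    real agent.
    """
    return "\n".join([
        "#cloud-config",
        "runcmd:",
        "  - curl -sSL https://logs.example.com/install-agent.sh | sh",
        f"  - log-agent configure --endpoint {log_endpoint} --key {api_key}",
    ])
```

Because the agent settings live only in the gold master, changing the log endpoint means re-rendering this snippet once, not touching running machines.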
All your VMs will be provisioned off this gold master, and when done correctly they will automatically be linked to your log analysis platform. Now here is the gotcha: naming conventions. Before taking on this approach you need a strong, universal, and easy-to-understand naming convention.
This matters for easier management, but also for the ability to remote into machines without much guessing. Or, if you spot a machine in a log file, you want to know from the name alone its purpose, location, creation date, and workload type. I’m not suggesting your names get silly long like some file names. Only that the name tells you enough about an isolated machine to take the next step.
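As one possible shape for such a convention, here is a sketch in Python; the field order and separators are assumptions, not a standard, and the point is simply that a name should be machine-parseable both ways:

```python
from datetime import date

def vm_name(purpose: str, location: str, workload: str, index: int,
            created: date) -> str:
    """Encode purpose, location, workload type, creation date, and a
    sequence number into the machine name. Fields must not contain
    hyphens, since the hyphen is the separator."""
    return f"{purpose}-{location}-{workload}-{created:%Y%m%d}-{index:03d}"

def parse_vm_name(name: str) -> dict:
    """Recover the metadata fields from a name built by vm_name()."""
    purpose, location, workload, created, index = name.split("-")
    return {"purpose": purpose, "location": location,
            "workload": workload, "created": created, "index": int(index)}
```

A name like `web-useast1-batch-20240115-003` stays short but answers the purpose/location/date/workload questions at a glance, and any script in the farm can decode it.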
As part of provisioning, this will require something like Sysprep on Windows, or an orchestration tool on Linux, to make the necessary dynamic changes to each machine’s admin accounts, network configuration, and machine and host names.
Here is where log analysis helps again. You can take the log files from your virtualization layer and associate server provisioning information with individual servers. This way, even if you did not proactively create a good information architecture for your VMs, you can associate machines with their logs and be able to ask questions about the details, respond to an issue, or take action on it.
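A rough sketch of that join, assuming both the hypervisor’s provisioning events and the application logs can be reduced to records keyed on VM name (the field names here are illustrative, not any particular platform’s schema):

```python
def enrich_logs(app_logs, hypervisor_events):
    """Join application log records with the hypervisor's provisioning
    events, keyed on VM name, so every log line can answer "where did
    this machine come from?" even if the VM never recorded it itself.

    Each event/record is a dict; 'vm', 'host', 'image', and 'ts' are
    illustrative field names.
    """
    provisioned = {e["vm"]: e for e in hypervisor_events}
    for record in app_logs:
        meta = provisioned.get(record["vm"], {})
        yield {**record,
               "host": meta.get("host"),
               "image": meta.get("image"),
               "provisioned_at": meta.get("ts")}
```

In practice your log platform would do this lookup at query time, but the principle is the same: the virtualization layer’s logs become the system of record for machine identity.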
For the Advanced
More advanced implementations will likely run custom services on the VMs that send more detailed logs to the analysis platform. And many organizations will consider using Docker as the container layer instead of full virtualization.
The other advanced scenario, and perhaps a challenge, is keeping track of machine IPs. This is especially important if your model allows front- and back-end developers to access machines in the ever-changing farm. To do so they need a way to identify a machine’s IP quickly.
This requires some smarts at the network layer. If you are using virtualization like VMware, it is possible to snapshot multi-tier environments, including the network layer, and make them portable as well. That way all IPs are maintained but contained within individual environments, with each VLAN isolated from all the others; all you need to know is the environment name. However, this will complicate any configuration changes to the gold master.
Or you can make IP allocations variable in your orchestration or configuration management scripts, but record all the details in your log platform, as we suggest in this post. There are also some new orchestration-as-a-service tools that will do variable IP allocation for you, an area that is sure to improve.
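A minimal sketch of that second approach: hand out addresses from a pool and emit a structured event the log platform can index, so the farm’s current IP map can always be rebuilt from logs alone. The event schema is an assumption; real tooling would also need to persist the in-use set and handle concurrency.

```python
import ipaddress
import json

def allocate_ip(subnet: str, in_use: set) -> str:
    """Hand out the next free host address in the subnet, tracking
    allocations in the caller-supplied in_use set."""
    for ip in ipaddress.ip_network(subnet).hosts():
        addr = str(ip)
        if addr not in in_use:
            in_use.add(addr)
            return addr
    raise RuntimeError(f"subnet {subnet} exhausted")

def record_allocation(vm: str, ip: str) -> str:
    """Emit a structured event (hypothetical schema) for the log
    platform, so 'which IP is which VM' is always answerable from
    logs rather than from tribal knowledge."""
    return json.dumps({"event": "ip_allocated", "vm": vm, "ip": ip})
```

With every allocation logged this way, a developer who needs a machine in the ever-changing farm queries the log platform for the latest `ip_allocated` event for that VM name, instead of asking operations.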
It is not terribly easy to get this engine running, and the most complex part is planning for change management, not the log analysis. For example, do you allow outdated VMs to keep running in the data center, or do you automatically kill them and replace them with new ones based on an updated gold master?
Do it now or do it later, unshackling your infrastructure is a must as you move further into the DevOps framework. And the way to make sure highly scalable environments do not slip out of everyone’s control is to build in log analysis. Log analysis that automatically attaches itself to every machine provisioned helps everyone keep a picture of the entire environment without necessarily touching a single VM. It is not easy setting your infrastructure free, but when you do, the benefits of better application performance, better tools for developers, and better control over your “datacenter” make it worth it. And log analysis is the way to ease the tension.