To understand log management, you first need to understand what problem it solves. Once you see that, you’ll know both what it is and why you need it.
Software these days involves a lot of complexity that didn’t exist once upon a time. We’ve moved things into the cloud, created software/platforms/infrastructure as services, and embraced distributed computing.
That’s a sea change from the good ol’ days of the 1990s. Back then, you’d write a bunch of code, build it, put it on CDs or floppy disks, and mail it to people. It’s even a sea change from the 2000s, when the web application took over. Instead of CDs, you’d set up a web server, deploy your software to that, and let users and their browsers have at it.
But today, we have containers and microservices. We have software intelligence distributed around the globe, spinning up and down on demand, collaborating and orchestrating. We’ve traded the simplicity of the historical monolith for the flexibility and complexity of distributed intelligence.
Log Files in a Distributed World
Think about the change I’ve just described. And now imagine what that means for the existence of a log file.
In the 1990s, you’d add code to your application that dumped information to a single log file. If your users had problems, they could zip up that log file, along with an OS log file for good measure, and send those to you for troubleshooting. With 2000s web applications, that same application log file, along with the web server log file and the database log file, did the trick.
But now? Good luck. Your production operations include six RESTful microservices on six different servers, a bunch of on-demand containers, a few miscellaneous web apps, a service bus, and who knows what else? Each of those concerns is contained, isolated, simple, and useful.
But troubleshooting across those concerns, when the issue happens in the gaps, can be a mess. And gathering 20 different log files that you attempt to reassemble into some facsimile of order doesn’t help matters at all.
Log Management to the Rescue
That is where the idea of log management as a first-class need enters the picture. If you have a desktop app or a simple web app, you can probably get by with grep, text editors, and elbow grease. But as soon as you grow beyond that, you’re going to need a better approach.
Log management is that better approach. Instead of regarding your applications’ logs as separate, unrelated entities, you conceive of them as parts of a whole. You weave them together and then use them to paint a dynamic, intelligent, and visual picture of the health of all your systems.
If that sounds daunting, don’t worry. You don’t need to implement all of this yourself. In fact, you definitely shouldn’t do it yourself any more than you should write your own source control. A lot of talented toolmakers have invested significant effort in helping you with your log management.
But rather than focus on specific tools, let’s take a look at log management as a function of its components. What does a good log management scheme involve, and what should you expect out of it?
Gathering and Storing Your Log Files
First up, it should do the basics. I’ve just described the value proposition of unifying your distributed logging mess. So, obviously, your log management tool should do that, at a bare minimum.
Specifically, this means the ability to take logs from all of the sources of interest that generate them and pull them together. It should also interleave them so that you have a chronological perspective on your different log files.
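To make the interleaving idea concrete, here is a minimal sketch in Python. It assumes every line starts with an ISO 8601 timestamp followed by a space and that each source is already in time order on its own. The function names and the line format are made up for illustration, and a real tool also has to deal with things like clock skew and time zones.

```python
import heapq
from datetime import datetime


def parse_line(source, line):
    # Assumption: "<ISO timestamp> <message>", e.g. "2024-01-01T10:00:00 GET /login".
    stamp, _, message = line.partition(" ")
    return (datetime.fromisoformat(stamp), source, message)


def parsed(source, lines):
    # Lazily turn one source's raw lines into (timestamp, source, message) tuples.
    for line in lines:
        yield parse_line(source, line)


def interleave(sources):
    """Merge per-source log lines into one chronological list.

    `sources` maps a source name to an iterable of lines that are
    already sorted by time within that source.
    """
    streams = [parsed(name, lines) for name, lines in sources.items()]
    # heapq.merge combines already-sorted streams without re-sorting everything.
    return list(heapq.merge(*streams))
```

Because each source is already ordered, the merge only ever compares the next entry from each stream, which is what lets this approach scale to more than a handful of files.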
On top of that, your log management tool should have the capability to set policy on maintenance of these log files. For how long should logs be kept? How much storage space should you let them consume? A log management tool unifies your logs and answers these questions.
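A retention policy of that kind can be sketched in a few lines. This is only an illustration under assumptions of my own: entries are (timestamp, size in bytes, name) tuples, and the policy has one age limit and one size budget. Real tools offer far richer policies than this.

```python
from datetime import datetime, timedelta


def apply_retention(entries, now, max_age, max_bytes):
    """Return the entries to keep under an age limit and a size budget.

    `entries` is a list of (timestamp, size_in_bytes, name) tuples.
    """
    # How long to keep logs: drop anything older than the age limit.
    fresh = [e for e in entries if now - e[0] <= max_age]
    # How much space to allow: keep the newest entries until the budget is spent.
    fresh.sort(key=lambda e: e[0], reverse=True)
    kept, used = [], 0
    for entry in fresh:
        if used + entry[1] > max_bytes:
            break
        kept.append(entry)
        used += entry[1]
    return kept
```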
Parsing and Turning Your Log Files into Data
Aggregation is a huge step, and a critical one for you to manage your production footprint. But it doesn’t stop there.
It won’t do you a whole lot of good if you have a haphazard mess of 20 different log file formats. So a good log management solution needs to understand your log files. It should be able to parse them, extract common elements and store them in a common format.
In other words, it should turn your logs into first-class data rather than just a bunch of text with timestamps. This will help you look for patterns and consume the information in an intelligent way.
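Here is a rough sketch of what turning logs into data can look like. Both log formats below are hypothetical, invented for this example, as are the field names in the resulting record. The point is only that two differently shaped lines end up as records with the same keys.

```python
import re
from datetime import datetime

# Hypothetical web server format: [2024-01-01T10:00:00] "GET /admin" 403
WEB = re.compile(
    r'^\[(?P<ts>[^\]]+)\] "(?P<method>\w+) (?P<path>\S+)" (?P<status>\d{3})$'
)
# Hypothetical application format: 2024-01-01T10:00:02 ERROR some message
APP = re.compile(r"^(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.*)$")


def parse(source, line):
    """Return a record in one common shape, or None if the line is not recognized."""
    if source == "web":
        m = WEB.match(line)
        if m:
            status = int(m["status"])
            # Derive a level so web and app records can be filtered the same way.
            level = "ERROR" if status >= 500 else "WARN" if status >= 400 else "INFO"
            return {
                "timestamp": datetime.fromisoformat(m["ts"]),
                "source": source,
                "level": level,
                "message": f'{m["method"]} {m["path"]}',
                "status": status,
            }
    elif source == "app":
        m = APP.match(line)
        if m:
            return {
                "timestamp": datetime.fromisoformat(m["ts"]),
                "source": source,
                "level": m["level"],
                "message": m["message"],
                "status": None,
            }
    return None
```

Once every entry has a timestamp, a source, and a level, questions like "show me all warnings across every system in the last hour" stop depending on which file an entry came from.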
Let You Search, and Do It Quickly
When you turn the log entries into data and you store that data, you get some cool capabilities. In short, you can think of your aggregated log files as a sort of database. This means that you can expect log management to help you with some of the things databases help people with.
Key among these? Search. Probably the number one use case for a log file in general is hunting down some production issue. And ever since the first log file, that has meant opening the file and searching through it for some error message or particular timestamp.
So a good log management solution should support search. And, beyond that, it should support fast search. Time is short and tensions high during production issues. A good log management tool should never be your bottleneck.
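One reason search over log data can be fast is indexing. The sketch below is a toy version of that idea, assuming records are dictionaries with a "message" field; the function names are my own. It maps each word to the records that contain it, so a query becomes a set intersection instead of a scan through every line. Real tools use far more sophisticated indexes.

```python
from collections import defaultdict


def build_index(records):
    # Map each lowercase word to the positions of the records containing it.
    index = defaultdict(set)
    for i, record in enumerate(records):
        for token in record["message"].lower().split():
            index[token].add(i)
    return index


def search(index, records, query):
    """Return the records whose message contains every word in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    # Intersect the per-word sets; no record text is rescanned here.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return [records[i] for i in sorted(matches)]
```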
It Should Get You Out in Front of Issues with Monitoring and Alerts
A log management tool gathers your log files, turns them into data, and lets you search them. But it shouldn’t do this in batches or every now and then.
It should do this in real time.
This means that a good solution is constantly creating intelligent, searchable, live data for you. Think of what you can do with that data. You can get out in front of issues by monitoring live trends and setting up alerts.
For instance, consider the case of security. You’re pulling data from your web server log as well as your application log. And let’s also say that there’s a simultaneous spike in 403 Forbidden responses and a rise in invalid login credentials.
Historically, you’d have found this out in retrospect during a post-mortem, after someone broke in and did some damage. “Oh, yeah, wow, someone was really trying to break in.” But you can leverage a log management tool to give you the warning before anything bad happens. Set it up to alert you when a certain number of 403s occur or when a certain number of logins fail. This lets you investigate ahead of time.
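The alert on a certain number of 403s or failed logins boils down to counting events in a sliding time window. Here is a minimal sketch, assuming events arrive in time order; the class name and its parameters are hypothetical, not any particular tool's API.

```python
from collections import deque
from datetime import datetime, timedelta


class ThresholdAlert:
    """Fire when at least `threshold` events occur within `window`."""

    def __init__(self, threshold, window):
        self.threshold = threshold
        self.window = window
        self.events = deque()

    def record(self, timestamp):
        """Record one matching event; return True if the alert should fire."""
        self.events.append(timestamp)
        # Forget events that have fallen out of the window.
        while self.events and timestamp - self.events[0] > self.window:
            self.events.popleft()
        return len(self.events) >= self.threshold
```

You would feed one instance every 403 response and another every failed login, for example `ThresholdAlert(threshold=20, window=timedelta(minutes=1))`, and investigate as soon as either one returns True.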
Visualize Your Production Environments Using a Dashboard
The final point that I’ll mention is the one that will probably draw rave reviews throughout the business. I’m talking about visualization.
Don’t get me wrong. Reducing troubleshooting times and preventing and monitoring issues are things everyone can certainly appreciate. And they’re valuable to your organization. But dashboards really bring all of that to life for folks outside of IT. And a good log management tool will offer graphs, trend visualizations, and dashboards.
I’ve talked before about the importance of visualization. It’s one thing for you to monitor your logs and see that you record many more purchase transactions on Monday mornings than at any other time during the week. But it’s a whole different thing for you to put up a graph with a huge spike. With the former, decision makers file it away to think about later. With the latter, they tend to act.
If you have various logs that you tend to mine for information, but you’ve never heard of log management before, I encourage you to give it a try. A log management tool will make your life easier, and it will also help everyone in the organization understand what you do and what’s actually going on with the production software.