One day, one of our main web APIs was down, and the first person that knew it was my boss. We were so worried about bringing the API up that we never paid attention to how he was able to be one step ahead of us. There were times when we even thought he had nothing else to do than constantly refresh the web page. But the truth is that he wasn’t doing that at all. He was using an HTTP monitor that emailed him every time the API was down, slow, or unresponsive.
It was actually lucky for us that he had that monitor: it helped everyone fix things before our clients could notice. But what is an HTTP monitor, anyway? And why else would you need it?
These are the questions we’ll answer in this post. We’ll start with the “what” question, by explaining what an HTTP monitor is and the ways in which it can help you. After that, we’ll walk you through some of the main metrics you should care about when considering adopting HTTP monitoring. As it turns out, there are quite a few of them, but not all generate the same value for you. It’s important to distinguish the wheat from the chaff and we’ll help you with that. We’re just that nice.
After walking you through the important metrics you should know, we cover the importance of alerts and how to configure and use them wisely, under penalty of getting underwhelmed by the sheer volume of them. We then part ways while sharing some final thoughts.
HTTP Monitor 101
As promised, let’s start with the basics. You’ll learn what an HTTP monitor is and what it can do for you.
What’s an HTTP Monitor? It’s a Friend Who’s Watching Out for You All the Time
Think of an HTTP monitor like it’s a friend you’re paying to constantly check if your HTTP endpoints are working. That friend will hit the endpoints you tell them to, and they’ll follow the exact orders you gave them. Those instructions will be something like, “Hey, I need you to hit this endpoint every five minutes. And only call me if they’re responding with anything other than OK or if you haven’t heard anything back from them in less then a second.”
In the larger universe of monitoring, HTTP monitor belongs to the category known as synthetic monitoring. That means that, unlike real-user monitoring, synthetic monitoring solutions simulate interactions with the website or app in an automatic way.
A basic check, on the other hand, verifies the configured metrics—which can even include something related to the content of the page—but it never renders the downloaded content into a browser window. HTTP monitors are usually classified as basic checks.
Why Are HTTP Monitors Used?
You can use an HTTP monitor not just for web APIs but also for any of your websites. All that matters is that the application you want to monitor can be accessed using HTTP.
It’s also important that they’re publicly accessed. If they’re not, you can make use of a bastion host for critical-no-public endpoints. This is needed because an HTTP monitor is something that’s outside your network. Imagine that you or your provider have massive downtime. Your application will be down, but you also have that good friend who’s supposed to tell that you’re down. People also use HTTP monitors as part of the integration testing suite to validate a deployment.
HTTP monitors are really simple, but fault tolerance and high availability in these type of systems make them complicated. And I bet you have enough of a time trying to keep your application up and running to want to worry about something else. Scalyr helps you to alleviate the load by offering HTTP monitors to probe your servers and measure availability, status, and performance.
Some Metrics You Need to Know Before You Start
It’s not just about whether it works or it doesn’t work. In order to make better use of HTTP monitors, we need to understand a handful of metrics and codes you can use to know how bad, exactly, your app is behaving.
Let’s explore these metrics and codes:
HTTP Response Codes
There are several codes that will give you more information about what happened in a request. But when monitoring HTTP endpoints, the only code that you’ll probably use is the “200 OK.” If you don’t receive this code in the response, something is wrong.
They are grouped like this:
- 1XX is only informational.
- 2XX means success.
- 3XX means a redirection happened.
- 4XX means something in the user’s request is not correct.
- 5XX means that something is wrong in your servers.
So, it doesn’t make sense to tell your monitor to watch for any code other than the “200 OK.” If you get something different than this code, it will be useful to troubleshoot, not to monitor.
This is simply the payload you receive back when hitting the endpoint. It could be in various formats such as JSON, HTML, or even plain text. This is useful because you can configure your monitors to extract, parse, and analyze the text.
So for example, if the word “error” is found, you can trigger an alert even though you’ve received a 200-status code.
Or what about when, for example, an endpoint should return data for Ferraris and you’re getting back data for Lamborghinis? That could be caused by a data migration in the database. But it doesn’t really matter what caused it; it matters that you can be notified almost in real time automatically.
The response time is the time the endpoint takes to respond with a status code and a response body. It’s also called latency. Response time will depend on the endpoint you’re monitoring and the SLAs you’ve defined for it. If it’s an API, it usually needs to be in a range of 100 ms to 500 ms, but if it’s a website, the rage varies from 1 s to 2 s. But it depends on your use case.
There are different metrics that serve different purposes. You can combine them so the monitor will alert you only when something really bad is happening, and not just because there was a temporary problem.
And it’s important that your applications are returning the proper code too. I’ve seen cases where the endpoint is returning a 200 code, but when I looked at the response body, it was clear that the application was throwing an error.
You Can React When Something’s Wrong
After you know what data is important when probing HTTP endpoints, the next step is to configure the monitor so that it can notify the proper people. This part is tricky, but it’s the most important step because you don’t want your monitor to become the boy who cried wolf.
For this reason, you might want to only be alerted when something can’t be recovered or healed by itself. Otherwise, the next time you receive an alert, you’ll rapidly think, “Nah, I always get these alerts. The recovery notifications will come after—they always do.” But no, after five minutes you start receiving calls from everyone, including your boss, telling you that the site is down.
So let me give you some examples of HTTP monitor configurations that have worked for me. An alert will be triggered when at least one of the following happens:
- The system received a status code that’s different than 200 OK.
- It takes more than 800 ms to respond back.
- The response body size in KBs is zero.
- The response JSON doesn’t include the “success” value in the “status” property.
- A custom header in the response isn’t found.
- The HTTPS endpoint is timing out after 800 ms and the status code is not 200 OK.
The idea is that when you receive an alert, you previously verified all the contexts of the request. Downtime could be caused by a ton of different reasons. It could be something with a server, and you just need to replace it. Or maybe an external dependency like others’ API endpoints are having problems and you just need to disable them temporarily. What happened doesn’t matter too much at the moment of being alerted. What matters is that you can act soon to reduce your system’s downtime.
You’ll Always Be One Step Ahead
Don’t let your clients be the first in line when reporting a problem in your system. Always be one step ahead of them and find the mechanisms to recover quickly. HTTP monitors will help you to minimize downtime by being aware. They’ll do this boring job for you.
These type of monitors will inform you about the uptime percentage of your system endpoints. They’ll let you know when endpoints aren’t loading or taking too much time. And it’s not just that. You can do interesting things like parse the response body and look for any unexpected results. So it’s important that the dev team knows what status should be returned depending on the result.
Know what you have at your disposal, and take advantage of it.
As we’ve mentioned in the post, Scalyr offers, among many other valuable features, HTTP monitors you can use to check the availability and performance of your apps. We invite you to try Scalyr today.