Distributed tracing and monitoring across multiple systems have generated quite a buzz. It’s more important than ever to see what’s going on inside our requests as they span multiple software services. In response, the OpenTracing initiative has sprung up to help developers avoid vendor lock-in. Let’s take a look at what this means for developers.
OpenTracing is a distributed tracing topic, so it makes sense to open the post by briefly covering distributed tracing itself. What is it? Who benefits from it? These are the kinds of questions we’ll answer.
With that out of the way, we’ll move on to OpenTracing itself. We’ll start with an overview of the concept, followed by a discussion of the motivations behind OpenTracing. With the “what” and the “why” covered, we’ll be ready to tackle the “how” of OpenTracing. You’ll learn the terminology OpenTracing uses and see code examples and tips on how to get started.
Before wrapping up, we hope to clear up some misconceptions about OpenTracing by listing a few things that it is not. Let’s get to it.
Distributed Tracing Fundamentals
Before moving on to the more specific topic of OpenTracing, let’s take a quick detour into distributed tracing in general.
What Is Distributed Tracing?
Distributed tracing is a mechanism you can use to profile and monitor applications. Unlike traditional tracing, it is designed for applications whose work is spread across many services, such as those built on a microservice architecture, hence the name.
To quote another post from our blog:
Distributed tracing tracks a single request through all of its journey, from its source to its destination, unlike traditional forms of tracing which just follow a request through a single application domain.
In other words, we can say that distributed tracing is the stitching of multiple requests across multiple systems. The stitching is often done by one or more correlation IDs, and the tracing is often a set of recorded, structured log events across all the systems, stored in a central place.
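To make the stitching idea concrete, here is a minimal sketch that uses only the .NET base class library. The `LogEvent` type, its field names, and the `TraceStitcher` class are hypothetical and exist only for illustration; real tracing tools do considerably more. It assumes C# 9 or later (for `record`) and that every service has already written its events to one central collection.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical structured log event. The field names are illustrative only.
public sealed record LogEvent(
    string CorrelationId,
    string Service,
    DateTimeOffset Timestamp,
    string Message);

public static class TraceStitcher
{
    // Groups events from every service by correlation ID and orders each
    // group by timestamp, producing one timeline per request.
    public static IReadOnlyDictionary<string, List<LogEvent>> Stitch(IEnumerable<LogEvent> events)
    {
        return events
            .GroupBy(e => e.CorrelationId)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(e => e.Timestamp).ToList());
    }
}
```

Calling `TraceStitcher.Stitch(allEvents)` returns one ordered list of events per correlation ID, which is roughly the raw material a tracing UI would render as a timeline.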
The Motivation Behind Distributed Tracing
What are the benefits of distributed tracing? What do organizations get from it? First, the same benefits you get from any tracing: the ability to monitor and profile applications, detect and fix problems, and improve performance.
Distributed tracing, in particular, is well suited to the way applications are written today. Distributed architectures, such as microservices, benefit the most from this kind of tracing.
But you don’t need to wait for your code to make it all the way to production so you can reap the benefits from distributed tracing. If you’re a developer, you can use this technique to help you debug and improve your code even before deployment.
OpenTracing: An Overview
So, what is OpenTracing? It’s a vendor-agnostic API to help developers easily instrument tracing into their code base. It’s open because no one company owns it. In fact, many tracing tooling companies are getting behind OpenTracing as a standardized way to instrument distributed tracing.
OpenTracing wants to form a common language around what a trace is and how to instrument traces in our applications. In OpenTracing, a trace is a directed acyclic graph of Spans with References that may look like this (from their website):
            [Span A]  ←←←(the root span)
                |
         +------+------+
         |             |
     [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
         |             |
     [Span D]      +---+-------+
                   |           |
               [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                           ↑
                                           ↑
                                           ↑
                             (Span G `FollowsFrom` Span F)
This allows us to model how our application calls out to other applications, internal functions, asynchronous jobs, etc. All of these can be modeled as Spans, as we’ll see below.
For example, suppose I have a consumer website where customers place orders. When an order comes in, the site calls my payment system and my inventory system, then acknowledges the order asynchronously. With an OpenTracing library, I can trace the entire order process through every system and render it like this:
    ––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

     [Place Order···················································]
       [Receive Payment·····]
                               [Fulfill Order··]
                                                  [Email Order...]
Each one of these bracketed blocks is a Span representing a separate software system communicating over messaging or HTTP. You can find a deeper overview on OpenTracing with great visuals here.
Why Should I Care About OpenTracing?
Why, as software team members, should we even care about OpenTracing? For that matter, why should we even care about tracing in general? A previous post answers the latter question in depth.
In short, tracing is becoming more and more important because software systems are becoming more and more distributed and complex. We need ways to correlate them so that we can understand what is happening inside them. When we know what’s happening inside them, we can quickly hunt down defects and other incidents. Good distributed tracing tooling can save you hours or days of frustration.
Now, why is OpenTracing good for this? Let’s put it this way: there are numerous tracing tools out there, but many software teams either don’t use one or don’t know they exist. Tracing is reaching a tipping point, and that means these tools will keep evolving. Because of this, some tools will be more useful to your team than others, and this may change over time. Do you want to be locked into a specific tool because you’ve coupled numerous pieces of code to its libraries? I’d hope not!
OpenTracing provides a specification that these tools can adopt so that development teams can remain free to use the best tool for their current state.
It’s also good for these tools because it lets developers dive headfirst into trying them out with little risk. Otherwise, a developer may steer clear or take months to decide which tool to use. The situation is similar to HTTP. Because HTTP isn’t owned by any web server company, developers have the flexibility to change how they host their services, knowing that they can still communicate with the rest of the world with ease. Without this, the World Wide Web might never have been born.
Who Owns It?
OpenTracing is owned and managed by a council of individuals from various companies and organizations. They intentionally maintain a diversified set of interests so that they can ensure the specification meets all appropriate needs. This council frequently meets with an advisory board of tracing experts to review the specification and gather input.
Let’s talk a bit about the components of the OpenTracing API. It’s fairly straightforward, but I’ll also link to places where you can explore each concept in depth. At the time this post was written, the most recent accepted version of the OpenTracing specification was 1.1. All example code will be in C#.
The Tracer is the entry point into the tracing API. It gives us the ability to create Spans. It also lets us extract tracing information from external sources and inject information into external destinations. You can find more information here.
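    // Assumes `tracer` is an ITracer supplied by your tracing library and
    // `carrier` is an IDictionary<string, string>, such as a set of HTTP headers.
    // Requires: using OpenTracing.Propagation;

    // Create a span
    var span = tracer.BuildSpan("noop").Start();

    // Extract tracing information from an external source
    var spanContext = tracer.Extract(BuiltinFormats.TextMap, new TextMapExtractAdapter(carrier));

    // Inject the current tracing information into an external destination
    tracer.Inject(span.Context, BuiltinFormats.TextMap, new TextMapInjectAdapter(carrier));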
A Span represents a unit of work in the Trace. For example, a web request that initiates a new Trace is called the root Span. If it calls out to another web service, that HTTP request is wrapped in a new child Span. Spans carry a set of tags holding information pertinent to the request being carried out, and you can also log events within the context of a Span. Spans can support more complex workflows than web requests, such as asynchronous messaging. They have timestamps attached, so we can easily construct a timeline of events for the Trace. More information is available here.
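    // Example span: an active client-side span that finishes when the scope is disposed.
    // Requires: using OpenTracing; using OpenTracing.Tag;
    using (IScope scope = tracer.BuildSpan("send")
        .WithTag(Tags.SpanKind.Key, Tags.SpanKindClient)
        .WithTag(Tags.Component.Key, "example-client")
        .StartActive(finishSpanOnDispose: true))
    {
        // The work being traced goes here.
    }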
The SpanContext is the serializable form of a Span. It lets Span information transfer easily across the wire to other systems.
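As a rough illustration of a SpanContext crossing a process boundary, the sketch below injects a context into a plain dictionary (standing in for HTTP headers) and extracts it again on the "other side." It assumes the OpenTracing NuGet package is referenced and uses its `MockTracer` purely as a stand-in for a real tracer; the class name and operation names are made up for this example, and a real caller and callee would each have their own tracer instance.

```csharp
using System.Collections.Generic;
using OpenTracing;
using OpenTracing.Mock;
using OpenTracing.Propagation;

public static class SpanContextRoundTrip
{
    public static (string SentTraceId, string ReceivedTraceId) Run()
    {
        // MockTracer is an in-memory tracer intended for tests and demos.
        ITracer tracer = new MockTracer();

        // Caller side: start a span and serialize its context into the carrier.
        ISpan clientSpan = tracer.BuildSpan("call-payment-service").Start();
        var headers = new Dictionary<string, string>();
        tracer.Inject(clientSpan.Context, BuiltinFormats.TextMap, new TextMapInjectAdapter(headers));
        clientSpan.Finish();

        // Callee side: rebuild the context from the carrier and continue the trace.
        ISpanContext parent = tracer.Extract(BuiltinFormats.TextMap, new TextMapExtractAdapter(headers));
        ISpan serverSpan = tracer.BuildSpan("receive-payment").AsChildOf(parent).Start();
        serverSpan.Finish();

        return (clientSpan.Context.TraceId, serverSpan.Context.TraceId);
    }
}
```

The keys written into the carrier are an implementation detail of each tracer, which is why application code only ever hands the dictionary to `Inject` and `Extract`.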
So far, Spans can connect to each other via two types of reference: ChildOf and FollowsFrom. A ChildOf reference means the parent Span depends on the child Span in some way, as in our previous example, where our ordering website sent child requests to both our payment system and inventory system. A FollowsFrom reference describes a Span that was caused by an earlier Span whose outcome does not depend on it. So, a FollowsFrom Span is just saying, “I started after this other Span, and that Span isn’t waiting on my result.”
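The two relationships can be sketched with the ordering example from earlier. This is an illustration under assumptions rather than a recommended structure: it assumes the OpenTracing NuGet package, uses `MockTracer` so the spans stay in memory, and the class and operation names are invented for the example.

```csharp
using OpenTracing;
using OpenTracing.Mock;

public static class OrderTraceExample
{
    public static MockTracer Run()
    {
        var tracer = new MockTracer();

        // Root span for the whole order.
        ISpan placeOrder = tracer.BuildSpan("place-order").Start();

        // ChildOf: placing the order depends on the result of these calls.
        ISpan payment = tracer.BuildSpan("receive-payment").AsChildOf(placeOrder).Start();
        payment.Finish();

        ISpan fulfill = tracer.BuildSpan("fulfill-order").AsChildOf(placeOrder).Start();
        fulfill.Finish();

        placeOrder.Finish();

        // FollowsFrom: the email is caused by the order, but the order
        // does not wait for it to complete.
        ISpan email = tracer.BuildSpan("email-order")
            .AddReference(References.FollowsFrom, placeOrder.Context)
            .Start();
        email.Finish();

        return tracer;
    }
}
```

`AsChildOf` is shorthand for adding a `References.ChildOf` reference, so both relationships go through the same underlying mechanism.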
How Do I Use OpenTracing in My Application?
One wonderful thing about OpenTracing is that once you learn it, you can apply it across a variety of tools and libraries. The OpenTracing site has many guides, depending on your language of choice. It also has an excellent set of best practices that cover a wide variety of communication patterns. It’s almost guaranteed that they list the patterns your application follows.
I’ll also give my two cents on instrumenting tracing here:
- Use dependency injection where possible. This will make things easily testable and configurable.
- Follow the idioms of your language and frameworks as much as possible. This will let your team members easily onboard into OpenTracing and tracing in general.
- Many frameworks provide extensibility points around units of work. For example, Spring Boot has pre- and post-request handlers for web requests. Leverage these as much as possible to save you effort when instrumenting tracing.
- If you don’t have a framework or can’t use its extensibility points, keep most of the tracing instrumentation as isolated from the business logic as possible. You can use patterns like Decorator and Chain of Responsibility for this, or even aspect-oriented programming. This will make your code’s intent clearer and easier to read. Exceptions to this include when you need to add tags or log an event; these are commonly specific to the business logic your code is executing.
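A couple of the tips above, dependency injection and the Decorator pattern, can be combined as in the following sketch. `IOrderService`, `OrderService`, and `TracedOrderService` are hypothetical names invented for this example; only `ITracer`, `IScope`, and `Tags` come from the OpenTracing NuGet package, which the sketch assumes is referenced.

```csharp
using System;
using OpenTracing;
using OpenTracing.Tag;

// Hypothetical business interface, for illustration only.
public interface IOrderService
{
    string PlaceOrder(string customerId);
}

// Business logic with no knowledge of tracing.
public sealed class OrderService : IOrderService
{
    public string PlaceOrder(string customerId) => $"order-for-{customerId}";
}

// Decorator that wraps any IOrderService in a span.
public sealed class TracedOrderService : IOrderService
{
    private readonly IOrderService _inner;
    private readonly ITracer _tracer;

    // Both dependencies are injected, so tests can pass in a mock tracer.
    public TracedOrderService(IOrderService inner, ITracer tracer)
    {
        _inner = inner ?? throw new ArgumentNullException(nameof(inner));
        _tracer = tracer ?? throw new ArgumentNullException(nameof(tracer));
    }

    public string PlaceOrder(string customerId)
    {
        using (IScope scope = _tracer.BuildSpan("place-order")
            .WithTag(Tags.Component.Key, "order-service")
            .StartActive(finishSpanOnDispose: true))
        {
            try
            {
                return _inner.PlaceOrder(customerId);
            }
            catch (Exception ex)
            {
                // Mark the span as failed and record why, then rethrow.
                scope.Span.SetTag(Tags.Error.Key, true);
                scope.Span.Log(ex.Message);
                throw;
            }
        }
    }
}
```

Wiring it up is one line, for example `IOrderService service = new TracedOrderService(new OrderService(), tracer);`, and swapping tracing vendors then only changes which `ITracer` gets injected.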
Another valuable thing you can do is keep an eye out for existing projects and contributions that already use OpenTracing. The opentracing-contrib organization on GitHub features many repositories containing third-party projects that use OpenTracing, written for a variety of programming languages and platforms. To name just a few:
- nginx-opentracing: NGINX plugin for OpenTracing
- java-spring-web: OpenTracing Spring Web instrumentation
- python-django: OpenTracing instrumentation for the Django framework
- java-metrics: Span-based application metrics for OpenTracing-compliant Java tracers
What OpenTracing Is Not
We covered quite a bit about OpenTracing, and I hope you find it valuable. But I want to be clear about what OpenTracing is not. It isn’t:
- A specific program or implementation of tracing to download and run.
- One specific tool to use.
- A specification for how to serialize this information across the wire. For that sort of effort, you can look into things like W3C’s proposed trace context specification.
Be Open to Tracing
Tracing will continue to be an important part of keeping our systems running and hunting down nefarious bugs. Serverless infrastructure, microservices, and other such distributed patterns will keep us operationally busy for years to come.
OpenTracing gives us a shared way to tackle our operations head on. It lets us agree on how to add information to our systems so that we can share it with others. We can ultimately use this to see clearly what’s happening in our entire software ecosystem.
If you’re struggling with finding the cause of incidents in your system, consider using a tool or library that supports OpenTracing. You now have the knowledge you need to easily learn and harness it. What should your next step be?
Well, a logical progression for most readers would be learning more about the encompassing topic of observability. Tracing is considered one of the three pillars of observability, the other two being metrics and logging. As it turns out, the Scalyr blog is a great place to learn more about all three of these topics.
While you’re at it, you might want to take a look at Scalyr’s offering, a comprehensive log management tool that will certainly ease your journey towards observability.