Orchestrating Microservices: A Guide for Architects

Microservices continue to be a popular way of dividing up functionality in complex systems. They give us the flexibility to hone and scale specific capabilities while staying agile in delivery.

But when you have all these separate modules doing their own things, the question inevitably comes up: How do we stitch them together?

We have to stitch microservices together very carefully because the cost of coupling is high. With tight coupling, we encounter:

more failures,
a higher testing burden,
and systems that are harder to understand, among other costs.

Let’s take a look at what we can do to orchestrate microservices carefully and effectively.

First, a Quick Primer on Orchestration

Before we dive in, I want to be clear: there are two different meanings to the word “orchestration.”

One is composition, joining together data to show on a screen. The other is what I would call true orchestration: coordinating commands across multiple services.

Composition

Say you have data sitting in 10+ different data sources and you’re thinking, “It would be useful to see all this data together at once.”

So you stitch together the data into giant grids on a screen that would make Amazon’s product pages weep in shame.

These are the “reads” of the system. This is actually composition, not orchestration.

In the vast majority of cases, I caution against orchestrating read data if you are not on the front-end development team that directly delivers value to customers.

Orchestration provides better value when coordinating back-end, state-changing tasks.

This is because when you are composing data from multiple sources, you are seen as the source of that data unless you are very clearly the “front-end.”

That said, if circumstances outside of your control put you in a role where you must compose data for other front-ends, some of these tips will still help.

Command Orchestration

The other meaning is what I usually think of when I hear the word “orchestration”: getting a bunch of services to work nicely together in order to get some real work done.

We are talking about state changes. Netflix’s Conductor comes to mind here. ETL systems are great examples of systems that need this type of orchestration. These are the “writes” of the system.

Here are some tips that cover both types of orchestration.

1. Be Clear As Crystal

Whether you are composing or orchestrating services, people will think you are the source of the produced data.

You can insist all you want, “We are not the ones actually failing”—but this often falls on deaf ears. People leave and join; the information gets lost.

New people are constantly exploring your API and have no knowledge of what lies beneath. And when errors occur, they will attempt to attribute them to you. It’s human nature.

Error Messaging

However, you can combat this in how you message these errors. When an error (or any important change in the system) occurs, ensure its source is clear.

Include a URL that points to their API. Show the contact info for their support team. Give the consumer absolutely all the information they need to take up their concern with the owning service, instead of with you.
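To make this concrete, here is a minimal sketch of an error body that names the failing service. The field names, the describe_upstream_failure helper, and the example.com addresses are all hypothetical, chosen only for illustration; adapt them to whatever error format your API already uses.

```python
from dataclasses import asdict, dataclass


@dataclass
class UpstreamError:
    """Error envelope that names the service where the failure originated."""
    message: str
    source_service: str   # name of the upstream service that failed
    source_api_url: str   # where the consumer can find that service's API
    support_contact: str  # who owns the service and can actually fix it
    correlation_id: str   # lets both teams find the same request in their logs


def describe_upstream_failure(exc, service, api_url, contact, correlation_id):
    """Build a response body that points the consumer at the owning service."""
    return asdict(UpstreamError(
        message=f"{service} failed: {exc}",
        source_service=service,
        source_api_url=api_url,
        support_contact=contact,
        correlation_id=correlation_id,
    ))


# Example values only; none of these services or addresses are real.
body = describe_upstream_failure(
    TimeoutError("no response after 2s"),
    service="inventory",
    api_url="https://inventory.example.com/api",
    contact="inventory-support@example.com",
    correlation_id="req-123",
)
```

The point of the shape is that a consumer who has never heard of the inventory service can still see who failed and where to go next.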

Keep in mind: you should still be supportive. Don’t reject your consumer’s cry for help just because you weren’t the direct cause of their problem!

If all else fails and they still want help from you, help them out. And while you’re at it, guide them to whom they should contact if they have trouble in the future.

2. Avoid Transformation

Remember: you are an orchestrator, a coordinator of data and functions. You are not a transformer. Stay out of the business of messing with other people’s schema.

This mostly applies when you are composing data. Any data you expose from another service should stay as close to its source schema as possible.

If you start transforming the data, you own it. And, as mentioned above, people will attribute errors and issues with it to you.
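One way to picture this is a composer that nests each upstream payload, unchanged, under the name of the service it came from. This is only a sketch: the compose function, the envelope keys, and the sample services are assumptions, not a prescribed format.

```python
import copy


def compose(payloads):
    """Combine payloads from several services without reshaping them.

    `payloads` maps a source service name to the payload it returned.
    Each payload is copied as-is, so field names and structure stay
    exactly what the owning service published.
    """
    return {
        "sources": [
            {"service": name, "data": copy.deepcopy(payload)}
            for name, payload in payloads.items()
        ]
    }


# Hypothetical upstream responses with deliberately different schemas.
upstream = {
    "pricing": {"sku": "A-1", "unit_price_cents": 1999},
    "inventory": {"itemCode": "A-1", "onHand": 12},
}
page = compose(upstream)
```

Notice that the composer does not rename itemCode to sku or merge the two records. The moment it does, it owns that mapping and every bug in it.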

It’s alright, and often necessary, to ask these services to conform to some standard in order for you to orchestrate them.

3. Keep It Small

Ensure the surface area of coupling between you and the services you orchestrate is small. Keep it to a few fields.

Ensure the rules to each schema are relatively simple. If possible, keep the interfaces consistent across all services, both in the requests and in the responses.

The more you are coupled to the surfaces of these services, the more painful and frequent the changes will be to your code.

Additionally, ensure you keep chattiness to a minimum between services. That means a small, infrequent surface area of network calls.

The more network calls that go back and forth between two services or between you and a service, the more pain you’ll have to deal with. You will also have high-latency communication.

Chattiness like this is usually a sign that the team or teams have badly designed the services.

4. Be Resilient

As the fallacies of distributed computing point out, the network is not reliable, and neither are the services on the other side of it. You have to be prepared for their failure and respond accordingly.

This often means retrying intermittent failures and circuit breaking permanent ones. You can cache slowly changing data and skip bad services if coordinating a chain of them.
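Here is a rough sketch of those two ideas: retrying with exponential backoff, and a simple circuit breaker. TransientError, CircuitOpenError, CircuitBreaker, and retry are hypothetical names written for this example, and the thresholds are arbitrary. In production you would more likely reach for an established resilience library than hand-roll this.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying, such as a timeout."""


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call a service it considers down."""


def retry(func, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call func, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return func()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller see the failure
            sleep(base_delay * 2 ** attempt)


class CircuitBreaker:
    """Stop calling a service after repeated failures, then probe it again."""

    def __init__(self, failure_threshold=3, reset_after=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after = reset_after  # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; skipping call")
            # Cool-down elapsed: allow one trial call. A single failure
            # from here reopens the circuit.
            self.opened_at = None
            self.failures = self.failure_threshold - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

The sleep and clock parameters exist so the behavior can be tested without waiting on real time.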

If you think not only about the happy paths in your orchestration, but also about the paths of failure, you will have a system that costs much less to maintain.

I also recommend putting queues into place so that work that failed during an outage can be rerun quickly. Your systems will also be able to handle bursts of requests better when you can queue them into a buffer.
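The rerun idea can be sketched with an in-memory deque. That is an assumption made to keep the example small: a real system would use a durable queue or message broker so buffered requests survive a restart. The drain helper and the send_to_service function are hypothetical.

```python
from collections import deque


def drain(queue, handler):
    """Try every queued request once and return the ones that failed."""
    failed = deque()
    while queue:
        request = queue.popleft()
        try:
            handler(request)
        except Exception:
            failed.append(request)  # keep it so it can be rerun later
    return failed


# Hypothetical downstream service that is down during the first pass.
service_up = False
processed = []


def send_to_service(request):
    if not service_up:
        raise ConnectionError("service unavailable")
    processed.append(request)


pending = deque(["order-1", "order-2", "order-3"])  # a burst, buffered
pending = drain(pending, send_to_service)  # outage: every request fails
service_up = True
pending = drain(pending, send_to_service)  # rerun once the service recovers
```

Nothing is lost during the outage; the failed requests simply wait in the buffer until the rerun.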

5. Use Known Patterns

There are a few patterns that have been developed over time to deal with orchestrating distributed services.

Using them instead of custom, one-off solutions will save you much pain and angst. Recommended ones include the Saga pattern, routing slips, and stateful workflows.

Each pattern works with a certain level of complexity. Study up and match the right patterns to your orchestration.
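As one illustration, the core of the Saga pattern is that every step comes with a compensating action that undoes it. The sketch below is deliberately simplified: run_saga and the order steps are hypothetical, and it ignores things a real saga has to handle, such as persisting progress and compensations that fail themselves.

```python
def run_saga(steps):
    """Run (action, compensation) pairs in order.

    If an action raises, run the compensations of the steps that already
    completed, in reverse order, then re-raise the original error.
    """
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception:
            for undo in reversed(completed):
                undo()
            raise
        completed.append(compensate)


log = []


def fail_shipping():
    raise RuntimeError("shipping service rejected the order")


# Hypothetical order workflow: each step is paired with its undo.
steps = [
    (lambda: log.append("reserve stock"), lambda: log.append("release stock")),
    (lambda: log.append("charge card"), lambda: log.append("refund card")),
    (fail_shipping, lambda: log.append("cancel shipment")),
]

try:
    run_saga(steps)
except RuntimeError:
    log.append("saga aborted")
```

When the shipping step fails, the card is refunded and then the stock is released, so the system ends up in a consistent state without a distributed transaction.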

6. Make Your Orchestration Observable

Observability deserves its own post; it’s important to apply it to any distributed system. In a nutshell, observability is the ability to infer and troubleshoot what is happening in a distributed system by observing its external outputs.

This includes:

monitoring,
traces,
and logs.

When orchestrating distributed services, be sure to work in observability.

As mentioned above, you will encounter failure even while building in resiliency. Add correlation IDs and request IDs so you can trace requests throughout all the microservices you coordinate.

Log your units of work while you orchestrate, and your life will be much easier.
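A small sketch of what that can look like with Python's standard logging module: a LoggerAdapter stamps every log line with the correlation ID, and the same ID is passed along to downstream calls. The configure, for_request, and outbound_headers helpers are hypothetical, and the X-Correlation-ID header name is a common convention rather than a standard.

```python
import logging
import uuid

LOG_FORMAT = "%(levelname)s correlation_id=%(correlation_id)s %(message)s"


def configure(stream=None):
    """Set up one orchestrator logger whose format includes the correlation ID."""
    logger = logging.getLogger("orchestrator")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.handlers.clear()  # keep repeated calls from stacking handlers
    handler = logging.StreamHandler(stream)  # defaults to stderr when None
    handler.setFormatter(logging.Formatter(LOG_FORMAT))
    logger.addHandler(handler)
    return logger


def for_request(logger, correlation_id=None):
    """Return a logger that tags every line with one request's correlation ID."""
    cid = correlation_id or uuid.uuid4().hex
    return logging.LoggerAdapter(logger, {"correlation_id": cid})


def outbound_headers(log):
    """Headers to send downstream so other services can log the same ID."""
    return {"X-Correlation-ID": log.extra["correlation_id"]}
```

A request handler would call for_request once, log each unit of work through the returned adapter, and attach outbound_headers to every call it makes to the services it coordinates.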

7. Prefer Choreography

Since there are many risks to orchestrating microservices, it is prudent to limit your orchestration to the places that need it.

There is another pattern, called choreography, where services communicate with each other in a publish-subscribe manner via events. This will keep coupling low and remove the need for an intelligent orchestrator.

Removing this need will make your system much simpler in most cases. It will also free you up to focus your orchestration on those services that are ill-suited for events or have high coupling to each other.
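To show the difference in shape, here is a toy choreography in which no component knows the whole workflow. The EventBus class is a synchronous, in-memory stand-in for a real message broker, and the services and event names are hypothetical.

```python
from collections import defaultdict


class EventBus:
    """In-memory stand-in for a message broker (hypothetical, synchronous)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Deliver to every handler registered for this event type.
        for handler in list(self._subscribers.get(event_type, [])):
            handler(payload)


bus = EventBus()
log = []


def payment_service(order):
    log.append(f"payment captured for order {order['id']}")
    bus.publish("payment_captured", order)  # announce; don't call shipping


def shipping_service(order):
    log.append(f"shipment scheduled for order {order['id']}")


# Each service only declares which events it cares about.
bus.subscribe("order_placed", payment_service)
bus.subscribe("payment_captured", shipping_service)

bus.publish("order_placed", {"id": 42})
```

The payment service never calls the shipping service directly. It only announces what happened, and adding a new reaction to payment_captured requires no change to it.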

Tread Carefully While Orchestrating Microservices

There are many pitfalls that can arise when orchestrating microservices. You can confuse your consumers about data ownership. Intermittent failures can leak in and ruin the entire workflow.

You can significantly increase the latency of your system. But if you practice the tips listed here, you will avoid many of these pitfalls.

You can orchestrate successfully, so long as you are careful. If you are, you will bring success to yourself, your team, and your consumers.