The Pitfalls of Free

Lynn the Kitten

A kitten or puppy is only a cute and cuddly baby for a few months. Free offers in technology tend to be a lot like a free kitten or puppy: they start out small and sweet, but all too often they grow into a big problem that might be more trouble than it’s really worth. Now, I love cats, dogs, and open source, but let’s be honest: all of them are major, long-term commitments. I have seen engineers (me included) create a solution, support and maintain it, and then ultimately abandon it for a shiny new one more times than I can count on my fingers and toes.

Open-source Elasticsearch is a great example of a wonderful product that can be challenging to grow and support long-term. I’ve tried on three separate occasions, and I know its strengths and weaknesses firsthand. While it comes with great power, the underlying system requires a lot of administrator TLC to configure and maintain, especially for the high-throughput needs of event data and observability use cases.

The Promise

Elasticsearch promises “a distributed, RESTful search and analytics engine capable of addressing a growing number of use cases”. Using Lucene as a core technology, Elasticsearch relies on inverted indexing to ensure that queries run quickly. If you are dealing with a small or static dataset, Elasticsearch works really well. Unless you are inserting thousands of records a second, it will generally do its job and return results predictably.

I can tell you from firsthand experience that Elasticsearch works well until you need thousands of writes a second. As scale increases, so do the challenges.

Growth

Like our kittens and puppies, Elasticsearch clusters grow, and grow, and grow. Maybe it is because the word “elastic” is embedded in the name, but once a cluster is created, its growth is typically both vertical and horizontal. Weirdly, Elasticsearch isn’t very “elastic”. If you want to burst, you have to know that high-water mark beforehand, design it into the system, and pay for it. The fact that Elasticsearch can scale is a tribute to the technology, but scaling requires knowledge, and acquiring that knowledge can feel like trial by fire.

One of the first trials of growth is sharding. A shard is one part of a dataset. In Elasticsearch, an index is divided into shards when it is created, and re-sharding requires a scan and rewrite of that index. Write throughput to an index is tightly coupled to the number of shards: the more shards, the more nodes participate in that dataset, and the faster you can write. With too few shards, you run the risk of not being able to ingest data fast enough. Push too much data too quickly, and your data nodes might crash with out-of-memory errors. It’s a balancing act that must be performed constantly, and it is exhausting, time-consuming, and takes resources and expertise to manage.
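To see why changing the shard count means rewriting the index, here is a minimal sketch of hash-based document routing. It is a simplified model, not Elasticsearch's actual routing code: `zlib.crc32` stands in for the real hash function, and the names `shard_for` and `fraction_moved` are made up for illustration. The idea is that a document's shard is derived from its ID and the number of primary shards, so changing that number changes where most documents belong.

```python
import zlib


def shard_for(doc_id: str, num_primary_shards: int) -> int:
    """Pick a shard by hashing the document ID.

    crc32 is only a stand-in hash for this sketch.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def fraction_moved(doc_ids, old_shards: int, new_shards: int) -> float:
    """Share of documents whose shard changes when the shard count changes."""
    moved = sum(
        1
        for doc_id in doc_ids
        if shard_for(doc_id, old_shards) != shard_for(doc_id, new_shards)
    )
    return moved / len(doc_ids)


if __name__ == "__main__":
    # Hypothetical document IDs, purely for illustration.
    ids = [f"event-{i}" for i in range(10_000)]
    share = fraction_moved(ids, old_shards=5, new_shards=6)
    print(f"{share:.0%} of documents would land on a different shard")
```

Under this model, going from 5 to 6 shards relocates the large majority of documents, which is why a re-shard amounts to reading everything and writing it again.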

As you right-size your shards, you will eventually want to add cluster capacity by adding new nodes. To benefit from the added capacity, shards have to actually land on the new nodes, and each shard copy lives on exactly one node, so an index can never use more nodes than it has shard copies. I have continually seen teams wonder why their single index of 5 shards didn’t benefit from having a 20-node cluster.
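The arithmetic behind that surprise is short. The sketch below is a back-of-the-envelope model with a made-up helper name, `max_nodes_used`. It assumes each shard copy (primary or replica) sits on exactly one node and ignores allocation rules, disk watermarks, and other indexes sharing the cluster.

```python
def max_nodes_used(primary_shards: int, replicas_per_primary: int, data_nodes: int) -> int:
    """Upper bound on how many data nodes can hold a piece of one index."""
    # Every primary plus each of its replicas is one shard copy.
    shard_copies = primary_shards * (1 + replicas_per_primary)
    # An index cannot occupy more nodes than it has copies, or than exist.
    return min(shard_copies, data_nodes)


if __name__ == "__main__":
    # The 5-shard index on a 20-node cluster from the text, without replicas.
    print(max_nodes_used(primary_shards=5, replicas_per_primary=0, data_nodes=20))
    # The same index with one replica per primary.
    print(max_nodes_used(primary_shards=5, replicas_per_primary=1, data_nodes=20))
```

Even with one replica per primary, half of that 20-node cluster never touches the index.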

Can you figure this out eventually? Absolutely, but the time spent learning the gotchas of Elasticsearch can be a huge distraction and a major source of anxiety. This is just one example. There are many. Is this how you want to spend your time, or your team’s? There may be more productive uses of all that brain and coding power.

Mistakes Were Made

As I mentioned, your cluster will grow, and as it grows you will start to see your mistakes piling up. Missing data mappings on indexes can be a huge burden. Most people will not spend the time up front to optimize their data for Elasticsearch. This means that your data will use the default mapping, which tries to be everything to everyone at the expense of doubling your data storage. As data flows in and changes over time, you will pay again. Solving this can be tricky, since you often need to preserve backwards compatibility because queries and visualizations may be built on those fields. You are left with a bunch of field aliases, or a bunch of broken dashboards. Worst case, you will see index conflicts that can result in data loss.
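For contrast, here is a rough sketch of what spending that time up front can look like: an explicit mapping in the body of an index-creation request, built as a plain Python dictionary. The field names (`@timestamp`, `service`, `message`, `status_code`) are hypothetical, and the right types depend entirely on your data. With default dynamic mapping, a string field is typically indexed both as full text and as a keyword; declaring one or the other avoids paying for both.

```python
import json

# Hypothetical index-creation body with an explicit mapping.
index_body = {
    "mappings": {
        # "strict" rejects documents that contain fields not declared below,
        # instead of silently guessing a type for them.
        "dynamic": "strict",
        "properties": {
            "@timestamp": {"type": "date"},
            # Exact-match filtering and aggregations only: keyword, not text.
            "service": {"type": "keyword"},
            # Free-form log line: full-text search only, no keyword copy.
            "message": {"type": "text"},
            "status_code": {"type": "short"},
        },
    }
}

if __name__ == "__main__":
    # Print the JSON you would send when creating the index.
    print(json.dumps(index_body, indent=2))
```

Whether `strict` is the right choice is a judgment call: it surfaces schema drift immediately, but it also means unexpected fields cause writes to fail rather than slip through.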

Another mistake of convenience is the general-purpose node. Elasticsearch offers the ability to select node roles to locally optimize performance. Among those roles, you have the master role, the coordinating role, and the all-important data role. But wait, it gets even more fun. Within the data role, you can have data tiers as well, and these are important for ensuring local optimizations at the node level to save money. They also enable index lifecycle management so that indexes don’t stick around forever, since the more indexes you have, the more shards you have and the more data nodes you require. Infrastructure requires money, and by now we have invested quite a lot in infrastructure and lots of valuable engineering time managing it.

Infamous Upgrades

A number of people I have spoken with have simply given up on upgrading. If you are diligent and can keep up, then upgrades can be managed, but the truth is that Elastic increments major versions frequently because they break compatibility frequently. This can put a major strain on a centralized observability team. Not only do you have to perform a rolling upgrade that can take a number of hours to do safely, but you also run the risk of breaking dashboards and applications that depend on a contract for query execution. A RESTful API with a weak contract is a rough sell. Most people learn this the hard way when it breaks.

Hidden Costs and Missing Features

Elastic wants you to buy X-Pack, its set of commercially licensed features. Single sign-on, SSL, and Watcher are some of the major ones, but if you browse the subscription page, you will realize that eventually you will either need a subscription or need to add extra tools to fill the gaps. I want to love open source. I’m an engineer, after all. I do love the people I’ve worked with in the past at Elasticsearch, but I no longer believe this is the best approach for most companies. I’ve tried three times, and each time I was smarter, but in the end the costs of open source outweigh the benefits, even when you’re smart and experienced.

This is the excruciating reality that we live in: open-source software is a tease. And unfortunately, it’s the worst kind of tease, because it starts out as a cute little kitten and grows into an overweight cat that needs a full-time vet (sorry, Lynn).

Lynn the Cat

I love open source. I’m an engineer, after all. I do love all the people I’ve worked with in the past at Elastic.co, but I no longer believe in feature-gapped open-source solutions. They entice you with free, fracture the community, and often lead you to pay in some way.