Zalando Engineering Team Standardizes on Scalyr for Log Management   

Overview 

Zalando, Europe’s leading online fashion platform, made the transition to the cloud two years ago. As part of the move to AWS, they were looking for a log management tool that was flexible enough to fit their agile engineering culture, powerful enough to scale, and fast enough to allow them to investigate incidents. After evaluating several solutions, they standardized on Scalyr as their log management solution across their entire engineering team.

About Zalando

Zalando is Europe’s leading online fashion platform for women, men and children. They offer their customers a one-stop, convenient shopping experience with an extensive selection of fashion articles including shoes, apparel, and accessories, with free delivery and returns. Their assortment of almost 2,000 international brands ranges from popular global brands to fast fashion and local brands, and is complemented by their private label products. Their localized offering addresses the distinct preferences of their customers in each of the 15 European markets they serve: Austria, Belgium, Denmark, Finland, France, Germany, Italy, Luxembourg, the Netherlands, Norway, Spain, Sweden, Switzerland, Poland and the United Kingdom.

Customer Challenges

Zalando transitioned to the cloud two years ago. They went from a monolith code base to microservices in the cloud, which changed their log management needs. They evaluated Scalyr along with three other solutions.

Their evaluation criteria required:

  • An agent that can collect all the logs from every service
  • A UI where engineers can search logs
  • The ability to search logs for specific applications
  • The ability to see every single log in the UI
  • The ability to scale
  • A fit with the engineering culture of Radical Agility

After evaluating the four solutions, they narrowed the field to two and let the teams decide. They liked that with Scalyr it was easy to implement the agent and roll it out onto EC2 instances, and that they could define custom parsers for log lines.
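To illustrate what a custom parser for log lines does in general, here is a minimal Python sketch. It is not Scalyr's parser format or API; the log line layout, the field names, and the `parse_line` helper are all assumptions made up for this example. The idea is only that each team can describe its own line format and turn raw text into searchable fields.

```python
import re
from typing import Optional

# Hypothetical line format, invented for illustration only:
#   2017-03-01T12:00:00Z ERROR app=checkout msg="payment timeout"
LINE_PATTERN = re.compile(
    r'(?P<timestamp>\S+)\s+'      # first whitespace-delimited token
    r'(?P<level>[A-Z]+)\s+'       # severity such as INFO or ERROR
    r'app=(?P<app>\S+)\s+'        # application identifier
    r'msg="(?P<message>[^"]*)"'   # quoted free-text message
)


def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of a log line, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    return match.groupdict() if match else None
```

A team with a different log layout would swap in its own pattern while the rest of the pipeline stays the same, which is the kind of per-team flexibility described here.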

The engineering culture at Zalando is built on Radical Agility. In order to empower their teams with autonomy, they need to automate everything around how they provision machines. This includes giving people the tools they need to do everything in a compliant way in their accounts. They found that the custom parsers were particularly important in giving each team flexibility to do things in their own way, which is a key pillar of the success of the engineering team.

Results of Using Scalyr

Scalyr is now deployed across the entire engineering team at Zalando. The main ways the team uses Scalyr are:

  • Incident response and mitigation
  • Analysis of what’s happening on the service
  • Metrics for monitoring
  • Proactive investigations

They were able to get Scalyr up and running very quickly. Once it was set up, their teams had access to their logs right away: they didn’t need to configure the agent and were able to see their logs instantly.

Given the number of autonomous services Zalando runs, they needed a coherent solution for how to get to the logs.

When asked how Scalyr has helped them, Tim Kröger, Head of Engineering – Visibility, and Andreas Pfeiffer, Cloud and Network Architect, responded, “It feels like asking how breathing helped you with your life.”

Before Scalyr, when an application crashed, the developer had to go to the log server, find the host where the app was running and grab all the logs. This would take at least 10 minutes. With Scalyr, developers can now deploy an application and, when an error is reported, log into Scalyr, enter the app ID and immediately see all the logs from the deployment. They were able to go from 10 minutes of work to 13 seconds (which includes logging into Scalyr!).

Overall, Scalyr’s cloud log management has helped Zalando make the transition to the cloud and mitigated the risk of increasing errors while moving to AWS.