Three Techniques to Increase Scalyr Agent Upload Throughput

The Scalyr agent typically handles incoming logs well with its default configuration, but if the log volume grows too large, the defaults may need adjusting to keep the agent from being overwhelmed.

In this article, we describe three sets of Scalyr agent configuration parameters that you can use to increase agent upload performance. The first two increase the number of sessions and the number of CPU cores processing logs to get more throughput. The third involves the use of Scalyr Teams, each with their own API key and associated workers.

Multiple sessions

In a default configuration, the Scalyr Agent creates a single session with the Scalyr servers to upload all collected logs. This performs well enough for most users, but at some point the log generation rate can exceed a single session’s capacity (typically 2 to 8 MB/s depending on line size and other factors). In this case, you can modify the agent configuration to open additional sessions with the Scalyr servers. Each session is assigned a subset of the log files to be uploaded, which increases the overall upload throughput.

The simplest way to increase the number of sessions is to set the default_sessions_per_worker config option.

{
    "default_sessions_per_worker": 3, // use 3 sessions instead of 1.
    "api_key": "<your_key>",
    "logs": [
      {
        "path": "/var/log/app/*.log",
      },
    ]
}

Here all the matched log files are distributed among three independent upload sessions. Each session uploads only a subset of the log files, which reduces the load on any single session.
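To pick a starting value for default_sessions_per_worker, a rough estimate can be derived from the per-session capacity mentioned above. The following Python sketch is only an illustration: the function name is hypothetical, it is not part of the agent, and it assumes that throughput scales roughly linearly with the number of sessions, which may not hold in practice. It uses the conservative end of the 2 to 8 MB/s range by default.

```python
import math


def sessions_needed(log_rate_mb_s: float, per_session_mb_s: float = 2.0) -> int:
    """Estimate how many sessions a given log rate may need.

    Assumption: each session sustains about per_session_mb_s and
    throughput scales linearly with the number of sessions.
    """
    if log_rate_mb_s <= 0:
        return 1  # the agent always uses at least one session
    return max(1, math.ceil(log_rate_mb_s / per_session_mb_s))


# 5 MB/s of logs at roughly 2 MB/s per session
print(sessions_needed(5.0))  # 3
```

Treat the result as a first guess and adjust it based on the upload behavior you actually observe.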

Multiprocess sessions

Even when multiple sessions are created, by default they still run within the same Python process, which limits them to a single CPU core (see: Global Interpreter Lock). This can become a problem when there are too many log files for a single CPU core to handle, especially if additional features (such as sampling and redaction rules) are enabled.

This limitation can be addressed using the use_multiprocess_workers option in the agent configuration.

{
    "use_multiprocess_workers": true, // each session runs in its own process, so sessions can run on separate CPU cores.
    "default_sessions_per_worker": 3,
    "api_key": "<your_key>",
    "logs": [
      {
        "path": "/var/log/app/*.log",
      },
    ]
}

Now each session runs in its own process and no longer has to share a single CPU core with the other sessions.
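Because each session becomes a separate process, running more sessions than the machine has CPU cores is unlikely to help much. The Python sketch below caps a desired session count at the number of available cores. This is a heuristic of our own for illustration, not an agent requirement, and the function name is hypothetical.

```python
import os
from typing import Optional


def capped_sessions(desired: int, cores: Optional[int] = None) -> int:
    """Cap the desired number of sessions at the number of CPU cores.

    Heuristic only: with use_multiprocess_workers enabled, each session
    is a process, so extra sessions beyond the core count compete for CPU.
    """
    if cores is None:
        cores = os.cpu_count() or 1  # os.cpu_count() may return None
    return max(1, min(desired, cores))


print(capped_sessions(6, cores=4))  # 4
print(capped_sessions(3, cores=8))  # 3
```

On a host that also runs the application producing the logs, you may want to leave some cores free for it.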

Separation of log sources using Teams

A worker is a session or set of sessions configured to send logs using the same api_key. The previous configurations mentioned only sessions, not workers, but those sessions are actually part of the default worker, which uses the API key from the configuration.

Imagine there are two teams of developers, “messaging” and “queue”, and each has its own Team in the company’s Scalyr account. The agent uploads some logs using the API key of the “messaging” team and others using the API key of the “queue” team. Each team sees only its own logs in its own team account.

If we want to send some logs using a different API key, then we have to create another worker with another API key.

{
    "api_key": "<messaging_team_api_key>", // the default worker uses the "messaging" team's API key.
    "workers": [
        {
            // Create a new worker for the "queue" team, which uses another API key.
            "api_key": "<queue_team_api_key>",
            "id": "queue",
        }
     ],
...

After a new worker is defined, we have to associate the relevant logs with it.

...
    "logs": [
      {
        "path": "/var/log/app/messaging/*.log", // these logs are uploaded to the "messaging" team's account because the "worker_id" field is omitted, so the default worker is used.
      },
      {
        "path": "/var/log/app/queue/*.log", // these logs are uploaded to the "queue" team's account.
        "worker_id": "queue" // matches the "id" of the worker that uses <queue_team_api_key>.
      }
    ]
}

The full example:

{
    "api_key": "<messaging_team_api_key>",
    "workers": [
        {
            "api_key": "<queue_team_api_key>",
            "id": "queue",
        }
     ],
    "logs": [
      {
        "path": "/var/log/app/messaging/*.log",
      },
      {
        "path": "/var/log/app/queue/*.log",
        "worker_id": "queue"
      }
    ]
}
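The worker_id of a log entry has to match the id of a defined worker, and a mismatch is easy to introduce by hand. The Python sketch below checks this for a configuration that has already been loaded into a dictionary. It is an illustration only: the function name is hypothetical, the check is not part of the agent, and the configuration is written as a Python dict because the agent's config file format allows comments that a strict JSON parser would reject.

```python
def unknown_worker_ids(config: dict) -> list:
    """Return (path, worker_id) pairs whose worker_id matches no defined worker.

    Log entries without a worker_id are skipped, since they are handled
    by the default worker.
    """
    defined = {worker.get("id") for worker in config.get("workers", [])}
    problems = []
    for log in config.get("logs", []):
        worker_id = log.get("worker_id")
        if worker_id is not None and worker_id not in defined:
            problems.append((log.get("path"), worker_id))
    return problems


config = {
    "api_key": "<messaging_team_api_key>",
    "workers": [{"api_key": "<queue_team_api_key>", "id": "queue"}],
    "logs": [
        {"path": "/var/log/app/messaging/*.log"},
        {"path": "/var/log/app/queue/*.log", "worker_id": "queue"},
    ],
}

print(unknown_worker_ids(config))  # []
```

An empty list means every explicit worker_id refers to a worker defined in the "workers" section.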

In summary, there are three ways to increase the throughput of the Scalyr agent when uploading logs. Two are independent of your use of the Scalyr Team structure to separate logs: increasing the number of sessions and increasing the number of CPU cores processing logs. The third involves the use of Scalyr Teams, each with their own API key and associated workers.