PowerQueries provide a useful way to analyze data. Scalyr makes building tables and computations of vast amounts of data easy. In this post we are going to focus on a few techniques to produce more useful reports: parsing from fields, calculating time deltas, enriching and classifying your data, and adding summary rows and columns.
If you are unfamiliar with PowerQueries, I would recommend going through the documentation first. We also have a series of quickstart videos. Below are some tips and tricks for people relatively comfortable with basic PowerQueries.
Parse from Fields
Parsing is a key part of the data ingestion pipeline in Scalyr, but what if your data did not parse properly, or a parser was not preset for your logs? PowerQueries allow us to parse on the fly.
Let’s take a simple example of a file path on a Widows machine
$serverHost in ("Server 2008") logfile = * |columns serverHost, logfile |group count() by logfile
Let’s say that we want to extract data from the
We can do so with the
In this dataset, let’s extract the drive letter and the logfile name. To do so we will write a regular expression in the parse command. The regular expression (make sure to double-escape your backslashes), will take the value before the
C:\ and will take the value before the
$serverHost in ("Server 2008") logfile = * |columns serverHost, logfile |group count = count() by logfile |parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile
Now we can perform functions on our new fields
$serverHost in ("Server 2008") logfile = * |columns serverHost, logfile |group count = count() by logfile |parse "$drive$:(\\\\[^\\\\]+)+\\\\$file$\\.\\w+" from logfile |group count = count() by drive, file |columns file, drive
Did you know that you can perform conditional logic in PowerQueries using the ternary operators? This gives us the opportunity to specify more readable labels, or classify our data into named buckets.
Programmers use ternary operators in languages like Python for decision making in place of conditional statements if and else. The ternary operator takes three arguments: The first argument is a comparison argument, the second is the result upon a true comparison, and the third is the result upon a false comparison. If it helps, you can think of the operator as shortened way of writing an if-else statement.
Here’s a simple decision-making example using if and else statements in Python:
import re hosts = ["server-prod", "server-dev", "server-qa"] value = [100000000,110000000,140000000] env =  score =  for i in range(len(hosts)): #evaluate host if re.search("prod", hosts[i]): env.append("prod") elif re.search("dev", hosts[i]): env.append("dev") else: env.append("qa") #evaluate score if value[i] > 130000000: score.append("High") else: score.append("Low") print(hosts) print(value) print(env) print(score)
We can do the same in PowerQueries
tag='logVolume' metric='logBytes' |group value = sum(value) by host |sort -value |columns value, host, score = (value > 130000000) ? "high" : "low", env = (host matches "prod") ? "Prod" : (host matches "sb") ? "sb" : "qa"
In other words, we are bucketing each row in to a
low based on
value, and specifying a value for
env based on the
PowerQueries can be utilized in a multitude of applications. It is a great way to perform SQL-like queries to large unstructured data sets. If you would like to view more about PowerQueries, check out our public documentation.