The extract function in Kusto Query Language (KQL) is used to retrieve specific parts of a string based on a pattern. It allows you to pull out specific information from a text or string column by using regular expressions. This function is very helpful when you want to isolate parts of a string, such as extracting email addresses, IP addresses, or other patterns from a column that contains mixed or formatted data.
Concept of extract
- The
extractfunction lets you pull specific substrings from a larger string by specifying a regular expression pattern. - This is useful for parsing strings in columns that contain complex or structured text data, such as URLs, logs, or JSON-like formats.
- Regular expressions (regex) allow you to define patterns to match specific text, making
extracta powerful tool for text manipulation in KQL.
Syntax
The syntax for the extract function is as follows:
extract("<regex_pattern>", <capture_group>, <column>)
<regex_pattern>: The regular expression pattern that specifies what to extract.<capture_group>: The number of the capture group in the regex pattern. Capture groups are portions of the pattern enclosed in parentheses()that specify what part of the pattern to extract.- The capture group is numbered starting at
1. <column>: The name of the column containing the text you want to parse.
Example Syntax
extract(@"\b(\d{3})\b", 1, SomeColumn)
In this example:
\b(\d{3})\bis a regex pattern that matches any sequence of three digits surrounded by word boundaries.1specifies that we want the first capture group, which is the sequence of digits.SomeColumnis the column in which we are searching.
Usage and Examples
Let’s go over some examples to see how the extract function works in practical scenarios.
Example 1: Extracting Area Codes from Phone Numbers
Suppose you have a table Contacts with a column PhoneNumber that stores phone numbers in the format (123) 456-7890. If you want to extract the area code (the first three digits in parentheses), you can use extract as follows:
Contacts
| extend AreaCode = extract(@"\((\d{3})\)", 1, PhoneNumber)
- Explanation: The pattern
\((\d{3})\)matches the area code. Here: \(and\)match the literal parentheses around the area code.(\d{3})matches a sequence of three digits inside the parentheses.- Result: The
AreaCodecolumn will contain the extracted area codes, such as123.
Example 2: Extracting Email Domains
If you have a UserLogs table with an Email column and want to extract the domain from each email (e.g., example.com from user@example.com), you can use the following query:
UserLogs
| extend Domain = extract(@"@([a-zA-Z0-9.-]+)", 1, Email)
- Explanation: The pattern
@([a-zA-Z0-9.-]+)captures everything after the@symbol. @matches the@symbol.([a-zA-Z0-9.-]+)captures the domain, including letters, numbers, dots, and hyphens.- Result: The
Domaincolumn will contain the domain part of each email, such asexample.com.
Example 3: Extracting IP Address Components
Suppose you have a table NetworkLogs with an IPAddress column, and each IP address is in the format 192.168.1.1. To extract just the first two parts of the IP (e.g., 192.168), you could do:
NetworkLogs
| extend IPPrefix = extract(@"^(\d+\.\d+)", 1, IPAddress)
- Explanation: The pattern
^(\d+\.\d+)captures the first two parts of the IP. ^matches the start of the string.(\d+\.\d+)captures the first two sets of digits separated by a period.- Result: The
IPPrefixcolumn will contain just the first two parts of the IP, such as192.168.
Example 4: Extracting Timestamps from Log Messages
Imagine you have a Logs table with a Message column that includes timestamps embedded in text, like Started at 2023-10-05 14:45:30. You can extract just the date and time using:
Logs
| extend Timestamp = extract(@"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})", 1, Message)
- Explanation: The pattern
(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})matches the date and time formatYYYY-MM-DD HH:MM:SS. \d{4}matches the year (four digits).\d{2}-\d{2}matches the month and day, each with two digits.\d{2}:\d{2}:\d{2}matches the hour, minute, and second.- Result: The
Timestampcolumn will contain just the extracted timestamp, such as2023-10-05 14:45:30.
Example 5: Extracting Status Codes from URLs
If you have a table called WebLogs with a column URL containing URLs like /status/404/error, and you want to extract just the status code (404), you can use the following:
WebLogs
| extend StatusCode = extract(@"/status/(\d+)", 1, URL)
- Explanation: The pattern
/status/(\d+)looks for/status/followed by one or more digits. /status/matches the literal string/status/.(\d+)captures the status code.- Result: The
StatusCodecolumn will contain the status code, such as404.
- The
extractfunction in KQL helps you extract specific parts of a string based on a pattern. - It’s especially useful for handling data with structured text where you only need specific components, such as dates, IP addresses, email domains, or other patterns.
- Regular expressions allow you to define flexible patterns to match the exact part of the string you want to extract.
The extract function is a powerful tool for string manipulation, enabling you to parse and retrieve only the information you need from complex text data.
