The topic for this month’s T-SQL Tuesday #171 hosted by the great and powerful Brent Ozar (Blog) is about our most recent closed issue.
Learning KQL and ADX
Over the past few months I have been learning and working more with ADX and KQL. These were new technologies to me. I’ve been interested in the world of NoSQL and what a database would look like with being an RDBMS held to the principles of Edgar Codd. About 10 years ago I did some self-learning while consulting about the newer tech stack at that time – technologies like Hadoop, Hive, Hbase, pig, etc. Those technologies were interesting but too much in the beta-stage. In order to achieve mass adoption, they needed to mature.
Recently, I had the opportunity to work with 2 new technologies on the Microsoft data platform.
Azure Data Explorer (ADX) is a big data analytics platform for storing and querying massive amounts of information. You can use structured / semi-structured data e.g. XML, JSON or unstructured data like heaps of text files e.g. log files. It’s an impressive specimen that can ingest and process huge swaths of data very fast.
Kusto Query Language (KQL) is the main query language for ADX. KQL is powerful and simple. The principle of piping from shell-scripting (ex. PowerShell) is the fundamental way data flows inside a query. You build a pipeline of data that rolls forward and you process. The output from one statement is the input for another until you get the data you want they way you want to see it.
The Problem
My last task was to query a massive table to extract errors and warnings from a component of the software product and parse the logs to make sense of what happened. Sounds easy but when the data is totally unstructured you end up having to parse tons of text. We don’t often appreciate the structure in RDBMS like SQL Server until it is taken away.
The Solution
The first problem was that although the Microsoft documentation on KQL is decent, there isn’t a lot of blogs or things to reference to do anything other than the most basic things. I had read the few bloggers who write about ADX and KQL along with the few YouTube videos on it. While they are good at explaining very basic tasks, I wasn’t able to find much for more advanced challenges.
I had to re-learn and write a bunch of regular expressions in KQL to parse Windows Server Event Log Viewer files. I needed to extract parts of the error message for alerting and presentation but not a lot of the noise. For the micro-service I was investigating there were many different types of errors all formatted their own special way. I am reminded that there are groups of developers who feel no sense of duty to standardize their logging for the poor soul who must make sense of the crime scene that their service bailed on.
Conclusion
ADX is an awesome technology and KQL is a simple and easy to use query language for querying monsterously big databases in lightning speed. Regular expressions are just how I remembered them from earlier in my career and school. The whole thing reminds me of coding in Perl – write once, read never.
Thanks for reading!
[…] Extracting Errors from Unstructured Text Logs – Jeff Mlakar had to figure out the aftermath from a “crime scene” (I love that expression) and used regular expressions in KQL to do it. […]
[…] T-SQL Tuesday #171 – Describe the Most Recent Issue You Closed InfoSec Design Principles – 8 Security Principles To Implement […]
[…] Describe the last project or thing the team worked on – gives a good feel for what work actually might be done that might deviate from the job advertisement. Here’s an example. […]