Intro to Splunk

cover6 January 15, 2020

Splunk is a Security Information and Event Management (SIEM) tool that ingests various types of machine data to allow analysts to quickly search, monitor, and visualize data.

It can index structured or unstructured data, perform real-time or historical processing and analysis, generate statistical reports, and create visualizations such as tables and charts. In short, Splunk takes your machine or software data, typically found in log files, and makes it searchable and easier to understand. This is the essence of Splunk, but many other features have been added over time.

Machine data can make up as much as 90% of all data accumulated by an organization. But how can it be collected and organized in some useful way? This is where a SIEM like Splunk comes in.

There are typically three main components of a Splunk installation:

  • Forwarders – these are instances that reside on various source devices where they collect machine data that is sent to Splunk for indexing
  • Indexers – these accept incoming raw machine data from the forwarders that is then processed and organized into indexes of events
  • Search Head – this is normally where users interact with Splunk as they run search queries that request event data from the indexers

The indexers are really the heart of Splunk. They examine the raw machine data coming in from the forwarders, attach sourcetype labels, and normalize timestamps. Once this is done, the resulting events are stored in an index to await search queries from users.
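To make the indexing steps described above more concrete, here is a toy Python model of them. It is only a sketch and not how Splunk is implemented: the sample log lines, the `guess_sourcetype` rule, the sourcetype names, and the `main` index name are all assumptions made up for illustration.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical raw data as a forwarder might send it: (source path, raw line).
RAW = [
    ("/var/log/nginx/access.log", "2020-01-15T10:00:00+01:00 GET /index.html 200"),
    ("/var/log/auth.log", "2020-01-15T09:30:00+00:00 Failed password for root"),
]


def guess_sourcetype(source):
    """Toy labeling rule based on the source path (an assumption, not Splunk's logic)."""
    if "nginx" in source:
        return "nginx_access"
    if "auth" in source:
        return "linux_secure"
    return "unknown"


def to_event(source, raw):
    """Turn one raw line into an event with a label and a normalized timestamp."""
    stamp, _, _message = raw.partition(" ")
    # Normalize whatever offset the line carries to UTC epoch seconds.
    ts = datetime.fromisoformat(stamp).astimezone(timezone.utc)
    return {
        "_time": int(ts.timestamp()),
        "sourcetype": guess_sourcetype(source),
        "source": source,
        "_raw": raw,
    }


def build_index(raw_lines, index_name="main"):
    """Store events under an index name, ordered by normalized time."""
    index = defaultdict(list)
    for source, raw in raw_lines:
        index[index_name].append(to_event(source, raw))
    index[index_name].sort(key=lambda event: event["_time"])
    return index


if __name__ == "__main__":
    for event in build_index(RAW)["main"]:
        print(event["_time"], event["sourcetype"], event["_raw"])
```

Note that the first line carries a +01:00 offset, so after normalization it sorts ahead of the second line even though its local clock time looks later.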

It’s easy to get started with Splunk using the sort of search methods you might use for a Google search. However, Splunk offers a powerful tool called Search Processing Language (SPL) that can sift through your log data and perform analytical operations to uncover relevant information for you to use. You may be familiar with the use of Structured Query Language (SQL) in relational database management, but with Splunk, there is no database and no schema. The power of Splunk and SPL comes from their ability to work with simple log files. In fact, Splunk can handle almost any text-based data.
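As a rough illustration of the difference between a plain keyword search and an analytical SPL-style operation, the Python sketch below filters some made-up events by a keyword and then counts the matches per host. It loosely mirrors what an SPL search such as `index=web error | stats count by host` would express. The events, field names, and helper functions are hypothetical, and the substring matching used here is a simplification rather than Splunk's actual matching behavior.

```python
from collections import Counter

# Hypothetical events, already indexed, each with a host field and raw text.
EVENTS = [
    {"host": "web01", "_raw": "GET /login 500 error upstream timeout"},
    {"host": "web02", "_raw": "GET /index.html 200"},
    {"host": "web01", "_raw": "POST /api 500 error database unavailable"},
    {"host": "web03", "_raw": "GET /cart 502 Error bad gateway"},
]


def keyword_search(events, *terms):
    """Keep events whose raw text contains every term (case-insensitive substring match)."""
    lowered = [term.lower() for term in terms]
    return [e for e in events if all(t in e["_raw"].lower() for t in lowered)]


def count_by(events, field):
    """Count events per value of a field, similar in spirit to 'stats count by <field>'."""
    return Counter(e[field] for e in events)


if __name__ == "__main__":
    hits = keyword_search(EVENTS, "error")   # the "Google-style" step
    print(count_by(hits, "host"))            # the analytical step
```

The first step is all a simple keyword search gives you; the second step is where a pipeline of commands starts to turn matching events into a summary.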

Since there is no database on the backend to manage, Splunk is very easy to install and configure. It also scales efficiently, so if you have very large amounts of data to index, adding another Splunk server is simple. 

There are many ways to build your Splunk skills; one way to do it is through Boss of the SOC, aka BOTS. This is a capture-the-flag (CTF) game where you can improve your searching skills while attempting to solve various challenging puzzles. It requires a lot of patience and curiosity and a fair amount of experimentation, but if you are comfortable with a problem-solving approach to learning Splunk, you may want to give it a try.

The free trial version of Splunk is available for download and includes many of the main features of the full program, with the restriction that you can only index 500 MB of data per day.