Let’s Learn Elastic Stack (Part 3) — Logstash Architecture

Isanka Rajapaksha
Jun 21, 2022

Hello readers! If you’re a beginner to the Elastic Stack, I recommend reading my previous blogs, (Part 1) — Introduction and (Part 2) — Elasticsearch, to gain some prior knowledge before learning Logstash.

In the Elastic Stack (Elasticsearch, Logstash, Kibana, and Beats), the crucial task of parsing data falls to Logstash.

Logstash began as an open-source tool designed to handle the streaming of massive amounts of log data from a variety of sources. Following its integration into the Elastic Stack, it evolved into the stack’s workhorse, processing log messages, enhancing and massaging them, and then dispatching them to a defined storage destination (stashing).

Logstash processes logs from different servers and data sources, and in doing so behaves as a shipper. A shipper is an instance of Logstash installed on a server; it accesses the server’s logs and sends them to a specific output location. Shippers are installed on every input source to collect its logs.
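To preview what that looks like in practice, here is a minimal sketch of a shipper pipeline; the log path and the central destination host are assumptions for illustration:

input {
  file {
    path => "/var/log/app/*.log"             # hypothetical location of this server's logs
    start_position => "beginning"            # read existing content as well as new lines
  }
}
output {
  elasticsearch {
    hosts => ["central-es.example.com:9200"] # hypothetical central destination
  }
}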

Logstash Internal Architecture

Logstash processes and aggregates events in three stages: collection, processing, and dispatching. Which data is collected, how it is processed, and where it is sent are defined in a Logstash configuration file that describes the pipeline. A Logstash pipeline has two required elements, input and output, and one optional element, filter.

Each of these stages is defined in the Logstash configuration file with what are called plugins — “Input” plugins for the data collection stage, “Filter” plugins for the processing stage, and “Output” plugins for the dispatching stage.
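Putting the three stages together, every pipeline configuration file follows the same skeleton:

input {
  # input plugins: where the data is collected from
}
filter {
  # filter plugins (optional): how the data is processed
}
output {
  # output plugins: where the data is dispatched to
}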

Figure: the Logstash pipeline, from inputs through filters to outputs (image source: [1])

Logstash receives logs using input plugins and then uses filter plugins to parse and transform the data. The parsing and transformation are performed according to the systems in the output destination: Logstash parses the logging data, forwards only the required fields, and transforms those fields into a form the destination system can understand.

Input plugins

Logstash’s ability to combine logs and events from many sources is one of its most significant features. Logstash can be configured to collect and analyze data from a variety of platforms, databases, and applications, and then transmit it to other systems for storage and analysis using more than 50 input plugins.
The most popular inputs are file, beats, syslog, http, tcp, udp, and stdin, but data can also be ingested from a variety of other sources.

The input plugin is specified in the configuration file’s input section. Each plugin has its own set of configuration settings, which you should familiarize yourself with before implementing. You can refer to this document to check out the available Elastic supported input plugins.

Example:

input {
  http {
    id => "my_plugin_id"
    host => "0.0.0.0"
    port => "8080"
  }
}

Here we are using the http input plugin. The id setting adds a unique ID to the plugin configuration; if no ID is specified, Logstash will generate one. The host is the host or IP address to bind to, and the port is the TCP port to listen on.
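For comparison, here is an equally minimal sketch of the beats input mentioned earlier, which listens for events shipped by Filebeat and the other Beats agents (5044 is the conventional port):

input {
  beats {
    port => 5044   # conventional port that Beats agents ship to
  }
}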

Filter plugins

Logstash uses filters in the middle of the pipeline between input and output. Logstash supports a number of extremely powerful filter plugins that enable you to enrich, manipulate, and process logs. Because of the strength of these filters, Logstash is an extremely useful and adaptable tool for parsing log data.

Filters can be combined with conditional statements to perform an action if a specific criterion is met (see the sketch below). The most commonly used filters are grok, date, mutate, and drop. You can read more about filters here.
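For instance, a minimal sketch of a filter wrapped in a conditional, dropping any event whose loglevel field (an assumption; the field must have been parsed earlier, e.g. by grok) is DEBUG:

filter {
  if [loglevel] == "DEBUG" {
    drop { }   # discard noisy debug events entirely
  }
}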

The filter section in the configuration file specifies which filter plugins, or more precisely, what processing, should be applied to the logs. Before using, you should familiarize yourself with each plugin’s unique configuration settings.

Example:

filter {
  grok {
    match => [ "message", "%{LOGLEVEL:loglevel} - %{NOTSPACE:taskid} - %{NOTSPACE:logger} - %{WORD:label}( - %{INT:duration:int})?" ]
  }
  date {
    match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
  geoip {
    source => "clientip"
  }
}

In this example we are processing log events and applying:

  • A grok filter that parses the log string and populates the event with the relevant information (a sample matching line is shown after this list).
  • A date filter to parse a string date field into the event’s timestamp. Logstash stamps every event with a timestamp, and this filter sets it from the log’s own date rather than the processing time.
  • A geoip filter to enrich the clientip field with geographical data. Using this filter will add new fields to the event (e.g. country_name) based on the clientip field.
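To make the grok pattern concrete, a hypothetical log line it would match, and the fields it would populate, might look like this:

# Hypothetical input line:
#   INFO - task_42 - com.example.OrderService - checkout - 153
# Fields the grok filter would extract:
#   loglevel => "INFO"
#   taskid   => "task_42"
#   logger   => "com.example.OrderService"
#   label    => "checkout"
#   duration => 153   (stored as an integer because of the :int suffix)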

Output plugins

Output is the last stage in the Logstash pipeline; it sends the filtered data to a specified destination. Similar to the input plugins, Logstash offers a variety of output plugins that let you push your data to other platforms, services, and locations. You can use outputs like file, CSV, and S3 to store events, convert them into messages with RabbitMQ and SQS, or send them to services like HipChat, PagerDuty, or IRC. This variety of input and output configurations is what makes Logstash such a flexible event transformer.

Because Logstash events might arrive from a variety of places, it’s critical to check whether an event should be processed by a specific output. If you don’t specify an output, Logstash will use the default of stdout. Multiple output plugins can be used to process a single event.
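As a sketch of that idea, the output below routes events whose (assumed) loglevel field is ERROR to Elasticsearch, while also archiving every event to a local file:

output {
  if [loglevel] == "ERROR" {
    elasticsearch {
      hosts => ["127.0.0.1:9200"]
    }
  }
  file {
    path => "/var/log/logstash/all-events.log"   # hypothetical archive path
  }
}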

The output section of the configuration file specifies the location to which the logs should be sent. Each plugin, as before, has its own set of configuration settings, which you should familiarize yourself with before using. You can read more about output plugins here.

Example:

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
  }
}

In this example, we are sending events to a locally installed instance of Elasticsearch.
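A slightly fuller sketch (the index pattern is an assumption for illustration) writes each event into a daily index named from the event’s timestamp:

output {
  elasticsearch {
    hosts => ["127.0.0.1:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"   # hypothetical daily index pattern
  }
}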

References

[1] https://static.packt-cdn.com/products/9781787281868/graphics/7a01ea40-7ddc-44a8-a1e9-b8214703872c.jpg

[2] https://www.elastic.co/logstash/

[3] https://www.tutorialspoint.com/logstash/index.htm

[4] https://logz.io/learn/complete-guide-elk-stack/#intro
