Let’s Learn Elastic Stack (Part 5) — Filebeat Architecture
Hello readers! If you’re new to the Elastic Stack, I recommend reading my previous blogs — (Part 1) — Introduction, (Part 2) — Elasticsearch, (Part 3) — Logstash Architecture, and (Part 4) — Kibana Architecture — to get some prior knowledge before learning Filebeat.
The Elastic Stack comprises four components: Elasticsearch, Logstash, Kibana, and Beats. The last is a family of log shippers for different use cases, and Filebeat is the most popular of them.
Generally, Beats are open-source, lightweight data shippers that you install as agents on your servers to send operational data to Elasticsearch. Beats can send data directly to Elasticsearch or via Logstash, where you can further process and enhance the data. The Beats family consists of Filebeat, Metricbeat, Packetbeat, Winlogbeat, Auditbeat, Journalbeat, Heartbeat, and Functionbeat. Each Beat is dedicated to shipping a different type of information — Winlogbeat, for example, ships Windows event logs, Metricbeat ships host metrics, and so forth.
Filebeat is designed to ship log files. It is a lightweight shipper for forwarding and centralizing log data. Installed as an agent on your servers, Filebeat monitors the log files or locations that you specify, collects log events, and forwards them either to Elasticsearch or Logstash for indexing.
At this point, we want to emphasize that Filebeat is not a replacement for Logstash; rather, the two are designed to be used together, which lets you take advantage of a particularly useful feature: backpressure handling. Filebeat uses a backpressure-sensitive protocol when sending data to Logstash or Elasticsearch to account for higher volumes of data. If Logstash is busy processing data, it lets Filebeat know to slow down its reads. Once the congestion is resolved, Filebeat builds back up to its original pace and keeps on shipping.
How does Filebeat work?
When you start Filebeat, one or more inputs begin looking for log data in the locations you’ve specified. Filebeat starts a harvester for each log file it discovers. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and delivers the aggregated data to the output you’ve configured for Filebeat.
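To make that flow concrete, here is a minimal filebeat.yml sketch wiring a single input to an Elasticsearch output. The paths and hosts are illustrative, not from this article:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - "/var/log/app/*.log"     # each matching file gets its own harvester

output.elasticsearch:
  hosts: ["localhost:9200"]    # libbeat batches events and ships them here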
What is a harvester?
A harvester is responsible for reading the content of a single file. The harvester reads each file, line by line, and sends the content to the output. One harvester is started for each file. The harvester is responsible for opening and closing the file, which means that the file descriptor remains open while the harvester is running. If a file is removed or renamed while it’s being harvested, Filebeat continues to read the file; as a side effect, the disk space held by a deleted file is not freed until the harvester closes it.
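The harvester lifecycle can be tuned per input. As a sketch with illustrative values, the log input’s close_* settings control when Filebeat releases the file handle:

filebeat.inputs:
- type: log
  paths:
    - "/var/log/app/*.log"
  close_inactive: 5m    # close the file handle after 5 minutes without new lines
  close_renamed: true   # stop harvesting when the file is renamed
  close_removed: true   # stop harvesting when the file is deleted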
Filebeat inputs
Filebeat inputs are responsible for locating specific files and applying basic processing to them. Here you configure the path (or paths) to the files you want to track. You can also use additional configuration options such as the input type, the encoding to use for reading the file, excluding and including specific lines, adding custom fields, and more (see the extended example after the snippet below).
filebeat.inputs:
- type: log
  # Change value to true to activate the input configuration
  enabled: true
  paths:
    - "/var/log/apache2/*"
    - "/var/log/nginx/*"
    - "/var/log/mysql/*"
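As a sketch of those extra options, the same input could set the encoding, filter lines, and attach a custom field. The patterns and field values here are made up for illustration:

filebeat.inputs:
- type: log
  enabled: true
  paths:
    - "/var/log/apache2/*"
  encoding: utf-8                    # encoding used when reading the files
  exclude_lines: ['^DBG']            # drop lines starting with DBG
  include_lines: ['^ERR', '^WARN']   # keep only error and warning lines
  fields:
    env: staging                     # custom field added to every event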
Filebeat processors
Filebeat can process and enhance the data before forwarding it to Logstash or Elasticsearch. Its processing capabilities are more limited than Logstash’s, but they are still useful: you can decode JSON strings, drop specific fields, add various metadata (e.g. Docker, Kubernetes), and more. Processors are defined in the Filebeat configuration file per input, and you can use conditional statements to control when your processing rules apply. Below is an example using the drop_fields processor to drop some fields from Apache access logs:
filebeat.inputs:
- type: log
  paths:
    - "/var/log/apache2/access.log"
  fields:
    apache: true
  processors:
    - drop_fields:
        fields: ["verb", "id"]
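Processors can also be applied conditionally with a when clause. As a sketch (the regexp pattern is illustrative), this drops entire events whose message looks like a debug line:

processors:
  - drop_event:
      when:
        regexp:
          message: "^DBG:"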
Filebeat output
Filebeat output defines where the data is shipped. Most often you will use the Logstash or Elasticsearch output, but you should know that there are other options such as Redis and Kafka. If you point Filebeat at a Logstash instance, you get access to its advanced processing and data enrichment; otherwise, if your data is already well structured, you can ship directly to the Elasticsearch cluster. You can also define multiple hosts for an output and use the load balancing option to spread the forwarded data across them. For forwarding logs to Elasticsearch:
output.elasticsearch:
  hosts: ["localhost:9200"]
For forwarding logs to Logstash:
output.logstash:
  hosts: ["localhost:5044"]
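And as a sketch of the load balancing option mentioned above (the host names are illustrative), two Logstash hosts can share the load:

output.logstash:
  hosts: ["logstash1:5044", "logstash2:5044"]
  loadbalance: true   # distribute events across the listed hosts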
Read — How does Filebeat keep the state of files?
Read — How does Filebeat ensure at-least-once delivery?