Let’s Learn Elastic Stack (Part 2): Elasticsearch Architecture

Isanka Rajapaksha
Jun 3, 2022


Hello readers! If you’re new to the Elastic Stack, I recommend reading my previous post, “Let’s Learn Elastic Stack (Part 1): Introduction”, to get a basic idea of the Elastic Stack first.

What is Elasticsearch?

Elasticsearch is a highly available, distributed search and analytics engine that works in near real-time. It is used for full-text search, structured search, analytics, or all three in combination. It is a schema-free, document-oriented data store built on top of the Apache Lucene library. Elasticsearch comes with simple REST APIs that use JSON over HTTP, which lets you integrate, manage, and query indexed data in a variety of ways.
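To make “JSON over HTTP” concrete, here is a minimal sketch that builds a full-text search request without sending it. The host http://localhost:9200, the index name products, and the field name are assumptions for illustration; the _search endpoint and the match query are part of the standard REST API.

```python
import json
import urllib.request


def build_search_request(base_url, index, field, text):
    """Build (but do not send) a full-text search request for one index."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical local node, index, and field names.
request = build_search_request("http://localhost:9200", "products", "name", "laptop")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen would send it to a running node; the response comes back as JSON too.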

What is Elasticsearch not?

Elasticsearch is not a primary data store. Although it’s technically possible to use it as one, there’s no guarantee that your data will stay correct. Each document has a version number that increases monotonically. When two calls write to the same document at about the same time, both writes are accepted, but only one of them ends up as the latest version, and the other is silently overwritten. Out of the box, Elasticsearch does not support ACID transactions.
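The versioning behavior can be pictured with a toy in-memory store. This is only an illustration of “every write bumps the version and the last write wins”; the VersionedStore class is hypothetical and is not how Elasticsearch is implemented.

```python
class VersionedStore:
    """Toy in-memory store: every write to an id bumps that document's version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        # No transaction and no conflict check: the write is simply accepted.
        version = self._docs.get(doc_id, (0, None))[0] + 1
        self._docs[doc_id] = (version, source)
        return version

    def get(self, doc_id):
        return self._docs[doc_id]


store = VersionedStore()
v1 = store.index("order-1", {"status": "paid"})
v2 = store.index("order-1", {"status": "cancelled"})
print(v1, v2, store.get("order-1"))
```

Both writes succeed, but only the second one is visible afterwards. Nothing ties the two writes together the way a transaction would.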

Elasticsearch Use Cases

Application search, website search, enterprise search: You can use Elasticsearch to enable search across all types of data and various locations. The engine ingests data from multiple locations, stores it, and indexes it according to a mapping that is either defined manually or generated automatically.

Because Elasticsearch has a distributed architecture, users can search and analyze massive volumes of data in near real-time. It also makes search scalable: you can start with just one machine and scale out to hundreds.

Logging and log analytics: Elasticsearch is commonly used for ingesting and analyzing log data in near real-time and in a scalable manner. It also provides operational insights from log metrics that you can act on.

Infrastructure metrics and container monitoring: Many companies use the ELK stack to analyze various metrics. This may involve gathering data across several performance parameters that vary by use case.

Security analytics: Another major analytics application of Elasticsearch is security analysis. Access logs and similar security-related logs can be analyzed with the ELK stack, giving a more complete picture of what’s going on across your systems in near real-time.

Business analytics: Many of the built-in features of the ELK Stack make it a good option as a business analytics tool.

Elasticsearch Architecture

Key concepts

To better understand how Elasticsearch works, let’s cover some basic concepts of how it organizes data and its backend components.

1. Elasticsearch Cluster

An Elasticsearch cluster is a group of nodes that store data. Settings such as which nodes join the cluster and the IP address of the virtual or physical server a node runs on go in the config/elasticsearch.yml file, the main configuration file.

Nodes in an Elasticsearch cluster are connected to each other, and each node holds a portion of the cluster’s data. You can run as many clusters as needed, but usually one cluster is sufficient. A cluster is formed automatically when the first node starts. The nodes take part in the cluster-wide work of indexing and searching.
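As a rough illustration of what goes into config/elasticsearch.yml, the sketch below renders a few settings as YAML lines. The setting names (cluster.name, node.name, network.host, http.port, discovery.seed_hosts) are real Elasticsearch settings; every value and the render_elasticsearch_yml helper are made up for this example.

```python
import json


def render_elasticsearch_yml(settings):
    """Render flat settings as YAML lines (JSON scalars and lists are valid YAML)."""
    return "\n".join(f"{key}: {json.dumps(value)}" for key, value in settings.items())


# Hypothetical values for a two-node cluster.
node_settings = {
    "cluster.name": "shop-cluster",
    "node.name": "node-1",
    "network.host": "192.168.1.10",
    "http.port": 9200,
    "discovery.seed_hosts": ["192.168.1.10", "192.168.1.11"],
}
yml_text = render_elasticsearch_yml(node_settings)
print(yml_text)
```

Nodes that share the same cluster.name and can reach each other through the seed hosts would join the same cluster.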

2. Elasticsearch Node

In general, the term node refers to a server that works as part of a cluster. In Elasticsearch, however, a node is a running instance of Elasticsearch, not a machine. This means you can run multiple nodes on a single machine. A cluster consists of one or more such nodes. By default, starting an Elasticsearch instance starts one node.

3. Index

Elasticsearch indices are logical partitions of documents and can be compared to a database in the world of relational databases.

Take an e-commerce app as an example: you could have one index containing all of the data related to the products and another with all of the data related to the customers.

You can define as many indices in Elasticsearch as you want, but a large number of indices can affect performance. Each index holds its own documents, which belong to that index only.

Indices are identified by lowercase names that are used when performing various actions (such as searching and deleting) against the documents that are inside each index.

4. Documents

Documents are JSON objects that are stored within an Elasticsearch index and are considered the base unit of storage. In the world of relational databases, a document can be compared to a row in a table.

In the example of our e-commerce app, you could have one document per product or one document per order. There is no limit to how many documents you can store in a particular index.

Data in documents is defined with fields made up of keys and values. A key is the name of the field, and a value can be one of many different types, such as a string, a number, a boolean, another object, or an array of values.

Documents also contain reserved fields that constitute the document metadata such as _index, _type and _id.
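Here is a sketch of what a product document from the e-commerce example could look like, together with the metadata wrapped around it when it is read back. All field names and values are invented for illustration. The _type field is left out because types are being phased out (see section 5).

```python
import json

# The document body: keys are field names, values can be of several types.
product = {
    "name": "Mechanical keyboard",                    # string
    "price": 79.99,                                   # number
    "in_stock": True,                                 # boolean
    "dimensions": {"width_cm": 44, "depth_cm": 13},   # another object
    "tags": ["keyboard", "usb"],                      # array of values
}

# Metadata fields start with an underscore; the original JSON sits in _source.
stored = {"_index": "products", "_id": "1", "_source": product}

print(json.dumps(stored, indent=2))
```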

5. Types

Elasticsearch types are used within documents to subdivide similar types of data wherein each type represents a unique class of documents. Types consist of a name and a mapping (see below) and are used by adding the _type field. This field can then be used for filtering when querying a specific type.

Types are gradually being removed from Elasticsearch. Starting with Elasticsearch 6, indices can have only one mapping type. Starting in version 7.x, specifying types in requests is deprecated. Starting in version 8.x, specifying types in requests is no longer supported.

6. Mapping

Like a schema in the world of relational databases, mapping defines the different types that reside within an index. It defines the fields for documents of a specific type — the data type (such as string and integer) and how the fields should be indexed and stored in Elasticsearch.

A mapping can be defined explicitly or generated automatically when a document is indexed using templates. (Templates include settings and mappings that can be applied automatically to a new index.)
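A sketch of an explicit mapping, written as the JSON body you would send when creating an index (for example with PUT /products). The index and field names are assumptions. The field types text, keyword, float, boolean, and date are real Elasticsearch field types; note that in current versions string data is mapped as text or keyword.

```python
import json

# Request body for creating an index with an explicit, typeless mapping.
create_index_body = {
    "settings": {"number_of_shards": 3, "number_of_replicas": 1},
    "mappings": {
        "properties": {
            "name": {"type": "text"},        # analyzed for full-text search
            "sku": {"type": "keyword"},      # exact-match string
            "price": {"type": "float"},
            "in_stock": {"type": "boolean"},
            "created_at": {"type": "date"},
        }
    },
}

print(json.dumps(create_index_body, indent=2))
```

If you skip the mappings part, Elasticsearch guesses a type for each new field the first time it sees it in a document.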

7. Shards

Index size is a common cause of Elasticsearch crashes. Since there is no limit to how many documents you can store on each index, an index may take up an amount of disk space that exceeds the limits of the hosting server. As soon as an index approaches this limit, indexing will begin to fail.

One way to counter this problem is to split indices horizontally into pieces called shards. This lets you distribute operations across shards and nodes to improve performance. You can control the number of shards per index and host these “index-like” shards on any node in your Elasticsearch cluster.
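How does a document end up on a particular shard? Conceptually, a routing value (the document id by default) is hashed and taken modulo the number of primary shards. The sketch below shows that idea only: it uses MD5 from the standard library as a stand-in hash, which is not the hash function Elasticsearch uses, and pick_shard is a hypothetical helper.

```python
import hashlib


def pick_shard(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number with a stable (stand-in) hash."""
    digest = hashlib.md5(routing_value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % number_of_primary_shards


# Spread 1000 hypothetical document ids over 3 primary shards.
counts = [0] * 3
for doc_id in range(1000):
    counts[pick_shard(str(doc_id), 3)] += 1
print(counts)
```

Because the shard number depends on the number of primary shards, changing that number would send existing ids to different shards.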

8. Replicas

To help you recover easily from system failures such as unexpected downtime or network issues, Elasticsearch lets you make copies of shards, called replicas. Because replicas are designed to ensure high availability, a replica is never allocated on the same node as the shard it was copied from. As with shards, the number of replicas is defined when creating the index; unlike the number of primary shards, it can also be changed at a later stage.
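The rule that a replica never shares a node with its primary can be sketched as a simple round-robin placement. This is a toy model with a hypothetical allocate function, not Elasticsearch’s actual allocation algorithm. A real cluster leaves replicas unassigned when it lacks nodes, whereas this sketch raises an error.

```python
def allocate(nodes, number_of_shards, number_of_replicas):
    """Return {node: [(shard, role), ...]} with no two copies of a shard on one node."""
    if number_of_replicas >= len(nodes):
        raise ValueError("not enough nodes to place every copy on its own node")
    layout = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        # Copy 0 is the primary; each further copy goes to the next node over.
        for copy in range(number_of_replicas + 1):
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append((shard, role))
    return layout


layout = allocate(["node-1", "node-2", "node-3"], number_of_shards=3, number_of_replicas=1)
for node, copies in layout.items():
    print(node, copies)
```

If any single node goes down, every shard still has one copy on another node.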

Search request from start to finish

Now that you know about clusters, nodes, indices, shards, and documents, let’s go over what happens when you make a search request to Elasticsearch.

When you send a request to the cluster, it first passes through a coordinating node. Every node in the cluster knows the cluster state, which records which nodes hold which indices and shards.

Since this is a search request, it doesn’t matter whether we read from a primary shard or a replica shard; the copy to read from is chosen to balance the load. The search request must be routed to one copy of every distinct shard in the index. Each shard returns its top results (10 by default) to the coordinating node. The coordinating node then merges these results to get the global top results, which it returns to the user.
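The scatter-and-gather flow described above can be sketched in a few lines: each shard returns its own top hits, and the coordinating node merges them into a global top list. The scores and document ids are made up, and the functions are hypothetical stand-ins for what happens inside the cluster.

```python
import heapq


def search_shard(shard_docs, size):
    """One shard: return its local top `size` (score, doc_id) pairs."""
    return heapq.nlargest(size, shard_docs)


def coordinate(shards, size=10):
    """Coordinating node: query every shard, then merge into a global top `size`."""
    partial = [hit for shard in shards for hit in search_shard(shard, size)]
    return heapq.nlargest(size, partial)


# Three shards holding (score, doc_id) pairs for some query.
shards = [
    [(0.9, "a"), (0.2, "b"), (0.5, "c")],
    [(0.8, "d"), (0.7, "e")],
    [(0.1, "f"), (0.95, "g")],
]
top_hits = coordinate(shards, size=3)
print(top_hits)  # [(0.95, 'g'), (0.9, 'a'), (0.8, 'd')]
```

Each shard has to return a full top list of its own, because the coordinating node cannot know in advance which shard holds the best matches.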

Parallel Concepts Between Elasticsearch and Databases

An index is like a database as it lets users search across many different types of documents; it can help you silo off information or organize it. For instance, if you have US data and UK data, indices make it really easy to limit your searches to one region. When you want to explicitly search across multiple regions, there’s syntax that makes that query equally simple.

Documents are the JSON objects that Elasticsearch searches through and returns as results, much like rows in a database table.

