An Elasticsearch Tutorial: Getting Started

Elasticsearch is the living heart of what is today the world’s most popular log analytics platform, the ELK Stack (Elasticsearch, Logstash, and Kibana). Elasticsearch’s role is so central that it has become synonymous with the name of the stack itself.

Used primarily for search and log analysis, Elasticsearch is one of the most popular database systems available today. This Elasticsearch tutorial provides new users with the prerequisite knowledge and tools to start using Elasticsearch, including installation instructions and initial indexing and data handling instructions.

What is Elasticsearch?

Initially released in 2010, Elasticsearch (ES) is a search and analytics engine based on Apache Lucene. It is completely open source and built with Java.

Elasticsearch is a NoSQL database, which means it stores data in an unstructured, document-oriented way and that you cannot use SQL to query it.

This Elasticsearch introduction could therefore also be considered a NoSQL tutorial. However, unlike most NoSQL databases, Elasticsearch has such a strong focus on search capabilities that the easiest way to get data out of ES is to search for it using the extensive Elasticsearch REST API.

In the context of data analysis, Elasticsearch is used together with the other components in the ELK Stack, Logstash and Kibana, and plays the role of data indexing and storage.

Let’s Start by Installing Elasticsearch

The requirements for Elasticsearch are simple: Java 8 :)
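
Before moving on, you can check which Java version, if any, is already installed:

java -version

If this reports a version older than 8 (or no Java at all), install Java first; the cluster section later in this tutorial uses sudo apt-get install default-jre.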

You can download Elasticsearch as a standalone distribution or install it using the apt and yum repositories. We will install Elasticsearch on an Ubuntu 16.04 machine running on AWS EC2 using apt.

First, you need to add Elastic’s signing key so you can verify the downloaded package (skip this step if you’ve already installed packages from Elastic):

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

For Debian, we then need to install the apt-transport-https package:

sudo apt-get install apt-transport-https

The next step is to add the repository definition to your system:

echo "deb stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-6.x.list

All that’s left to do is to update your repositories and install Elasticsearch:

sudo apt-get update
sudo apt-get install elasticsearch

Configuring Elasticsearch

Elasticsearch configurations are done using a configuration file whose location depends on your operating system (on Deb-based installs, /etc/elasticsearch/elasticsearch.yml). In this file you can configure general settings (e.g. node name and security settings), as well as network settings (e.g. host and port), where data is stored, memory, log files, and more.

For development and testing purposes the default settings will suffice, yet it is recommended that you do some research into what settings you should manually define before going into production.
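
To give a feel for the file, here is a minimal sketch of an elasticsearch.yml with a few commonly defined settings (the cluster and node names are made-up examples; the paths are the Deb package defaults):

cluster.name: my-cluster
node.name: node-1
path.data: /var/lib/elasticsearch
path.logs: /var/log/elasticsearch
network.host: localhost
http.port: 9200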

For example, and especially if you are installing Elasticsearch on the cloud, it is a best practice to bind Elasticsearch to either a private IP or localhost:

sudo vim /etc/elasticsearch/elasticsearch.yml

network.host: "localhost"

Running Elasticsearch

Elasticsearch will not run automatically after installation and you will need to manually start it. How you run Elasticsearch will depend on your specific system. On most Linux and Unix-based systems you can use this command:

sudo service elasticsearch start
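
On systemd-based systems such as Ubuntu 16.04, you can alternatively manage the service with systemctl and enable it to start on boot:

sudo systemctl daemon-reload
sudo systemctl enable elasticsearch.service
sudo systemctl start elasticsearch.service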

And that’s it! To confirm that everything is working fine, simply point curl or your browser to http://localhost:9200, and you should see something like the following output:

"name" : "33QdmXw",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "mTkBe_AlSZGbX-vDIe_vZQ",
"version" : {
"number" : "6.1.2",
"build_hash" : "5b1fea5",
"build_date" : "2018-01-10T02:35:59.208Z",
"build_snapshot" : false,
"lucene_version" : "7.1.0",
"minimum_wire_compatibility_version" : "5.6.0",
"minimum_index_compatibility_version" : "5.0.0"
"tagline" : "You Know, for Search"

To debug the process of running Elasticsearch, use the Elasticsearch log files located (on Deb) in /var/log/elasticsearch/.
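
For example, to follow the main log file while the service starts (with the default cluster name, the file is called elasticsearch.log):

sudo tail -f /var/log/elasticsearch/elasticsearch.log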

Creating an Elasticsearch Index

Indexing is the process of adding data to Elasticsearch. The name comes from the fact that when you feed data into Elasticsearch, the data is placed into Apache Lucene indexes, which Elasticsearch uses to store and retrieve its data. Although you do not need to know a lot about Lucene, it does help to know how it works when you start getting serious with Elasticsearch.
Elasticsearch exposes a REST API, so you can use either the PUT or the POST method to add data to it. You use PUT when you know, or want to specify, the id of the data item, and POST if you want Elasticsearch to generate an id for the data item:

curl -XPOST 'localhost:9200/logs/my_app' -H 'Content-Type: application/json' -d'
{
  "timestamp": "2018-01-24 12:34:56",
  "message": "User logged in",
  "user_id": 4,
  "admin": false
}'

curl -X PUT 'localhost:9200/app/users/4' -H 'Content-Type: application/json' -d'
{
  "id": 4,
  "username": "john",
  "last_login": "2018-01-25 12:34:56"
}'

And the response (for the POST request; note the auto-generated _id, which we will see again in the search results below):

{
  "_index" : "logs",
  "_type" : "my_app",
  "_id" : "ZsWdJ2EBir6MIbMWSMyF",
  "_version" : 1,
  "result" : "created",
  "_shards" : {
    "total" : 2,
    "successful" : 1,
    "failed" : 0
  },
  "_seq_no" : 0,
  "_primary_term" : 1
}

The data for the document is sent as a JSON object. You might be wondering how we can index data without defining the structure of the data. Well, with Elasticsearch, like with any other NoSQL database, there is no need to define the structure of the data beforehand. To ensure optimal performance, though, you can define Elasticsearch mappings according to data types.
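
For example, here is a sketch of creating a new index with an explicit mapping for the user fields used above (my_index is a hypothetical name; the types are standard Elasticsearch 6.x field types):

curl -X PUT 'localhost:9200/my_index' -H 'Content-Type: application/json' -d'
{
  "mappings": {
    "users": {
      "properties": {
        "id": { "type": "integer" },
        "username": { "type": "keyword" },
        "last_login": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss" }
      }
    }
  }
}'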

To see a list of your Elasticsearch indices, use:

curl -XGET 'localhost:9200/_cat/indices?v&pretty'

health status index               uuid                   pri rep docs.count docs.deleted store.size pri.store.size
yellow open   logstash-2018.01.23 y_-PguqyQ02qOqKiO6mkfA   5   1      17279            0      9.9mb          9.9mb
yellow open   app                 GhzBirb-TKSUFLCZTCy-xg   5   1          1            0      5.2kb          5.2kb
yellow open   .kibana             Vne6TTWgTVeAHCSgSboa7Q   1   1          2            0      8.8kb          8.8kb
yellow open   logs                T9E6EdbMSxa8S_B7SDabTA   5   1          1            0      5.7kb          5.7kb

The list in this case includes the indices we created above, a Kibana index and an index created by a Logstash pipeline.

Elasticsearch Querying

Once you index your data into Elasticsearch, you can start searching and analyzing it. The simplest query you can do is to fetch a single item.

Once again, via the Elasticsearch REST API, we use GET:

curl -XGET 'localhost:9200/app/users/4?pretty'

And the response:

"_index" : "app",
"_type" : "users",
"_id" : "4",
"_version" : 1,
"found" : true,
"_source" : {
"id" : 4,
"username" : "john",
"last_login" : "2018-01-25 12:34:56"

The fields starting with an underscore are all meta fields of the result. The _source object is the original document that was indexed.
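
If you only want the document itself, without the meta fields, Elasticsearch 6.x also lets you fetch just the source:

curl -XGET 'localhost:9200/app/users/4/_source?pretty'
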
We also use GET to do searches by calling the _search endpoint:

curl -XGET 'localhost:9200/_search?q=logged'

The matching document is returned inside the response’s hits array:

"_source" : {
  "timestamp" : "2018-01-24 12:34:56",
  "message" : "User logged in",
  "user_id" : 4,
  "admin" : false
}

The result contains a number of extra fields that describe both the search and the result. Here’s a quick rundown:

  • took: The time in milliseconds the search took
  • _shards: The number of Lucene shards searched, and their success and failure rates
  • hits: The actual results, along with meta information for the results

The search we did above is known as a URI Search, and is the simplest way to query Elasticsearch. By providing only a word, ES will search all of the fields of all the documents for that word.

There are many other ways to search, including the use of boolean logic, the boosting of terms, and more.
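
As a taste, a URI search can target a specific field and combine conditions with boolean operators, using Lucene query string syntax (this sketch reuses the logs index from above; the + signs are URL-encoded spaces):

curl -XGET 'localhost:9200/logs/_search?q=message:logged+AND+user_id:4&pretty'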

Elasticsearch Query DSL

URI searches are just the beginning. Elasticsearch also provides a request body search with a Query DSL for more advanced searches. There is a wide array of options available in these kinds of searches, and you can mix and match different options to get the results that you require.

The Query DSL contains two kinds of clauses: 1) leaf query clauses that look for a value in a specific field, and 2) compound query clauses, which can wrap one or several leaf query clauses or other compound clauses, as sketched below.
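
Here is a minimal sketch of such a compound query: a bool clause wrapping two leaf clauses (a match and a term), again assuming the logs documents indexed earlier:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "bool": {
      "must": [
        { "match": { "message": "logged" } },
        { "term": { "user_id": 4 } }
      ]
    }
  }
}'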

Elasticsearch Query Types

Query types include:

  1. Geo queries
  2. “More like this” queries
  3. Scripted queries
  4. Full text queries
  5. Shape queries
  6. Span queries
  7. Term-level queries
  8. Specialized queries

Elasticsearch has merged queries and filters into a single Query DSL (as of Elasticsearch 2.0), but ES still differentiates them by context: the DSL distinguishes between a filter context and a query context for query clauses. Clauses in a filter context test documents in a boolean way: does the document match the filter, “yes” or “no”?

Filters are also generally faster than queries because they skip relevance scoring. Clauses in a query context, by contrast, calculate a relevance score according to how closely a document matches the query, and this score determines the ordering of the results:

curl -XGET 'localhost:9200/logs/_search?pretty' -H 'Content-Type: application/json' -d'
{
  "query": {
    "match_phrase": {
      "message": "User logged in"
    }
  }
}'

And the result:

"took" : 28,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"skipped" : 0,
"failed" : 0
"hits" : {
"total" : 1,
"max_score" : 0.8630463,
"hits" : [
"_index" : "logs",
"_type" : "my_app",
"_id" : "ZsWdJ2EBir6MIbMWSMyF",
"_score" : 0.8630463,
"_source" : {
"timestamp" : "2018-01-24 12:34:56",
"message" : "User logged in",
"user_id" : 4,
"admin" : false

Creating an Elasticsearch Cluster

Maintaining an Elasticsearch cluster can be time-consuming, but given Elasticsearch’s powerful search and analytics capabilities, such clusters are indispensable.

What is an Elasticsearch cluster, precisely? Elasticsearch clusters group multiple Elasticsearch nodes and/or instances together. Of course, you can always choose to maintain a single Elasticsearch instance or node inside a given cluster. The main point of such a grouping lies in the cluster’s distribution of tasks, searching, and indexing across its nodes. Node options include data nodes, master nodes, client nodes, and ingest nodes.
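
A node’s role is controlled by a few boolean flags in its elasticsearch.yml. As a sketch, a dedicated master-eligible node in 6.x would be configured like this:

node.master: true
node.data: false
node.ingest: false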

Installing nodes can involve a lot of configuration, but here’s the basic Elasticsearch cluster node installation:

First, install Java:

sudo apt-get install default-jre

Next, add Elastic’s signing key:

wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

Next, install the new iteration of Elasticsearch:

sudo apt-get update && sudo apt-get install elasticsearch

You will have to create and/or set up each Elasticsearch node’s own config file (/etc/elasticsearch/elasticsearch.yml).
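
As a minimal sketch, the per-node cluster settings might look like this (the cluster and node names match the demo output below, while the IP addresses are made-up examples; discovery.zen.ping.unicast.hosts is the 6.x discovery setting):

cluster.name: elasticsearch-cluster-demo
node.name: es-node-1
network.host: 0.0.0.0
discovery.zen.ping.unicast.hosts: ["10.0.0.1", "10.0.0.2"]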

From there, start Elasticsearch and then check your Elasticsearch cluster state, for example with curl -X GET "localhost:9200/_cluster/state?pretty". Responses will look something like this:

"cluster_name" : "elasticsearch-cluster-demo",
"compressed_size_in_bytes" : 255,
"version" : 7,
"state_uuid" : "50m3ranD0m54a531D",
"master_node" : "IwEK2o1-Ss6mtx50MripkA",
"blocks" : { },
"nodes" : {
"m4-aw350m3-n0D3" : {
"name" : "es-node-1",
"ephemeral_id" : "x50m33F3mr--A11DnuM83r",
"transport_address" : "",
"attributes" : { }

Elasticsearch cluster health will be next on your list. Periodically check your cluster’s health with the following API call:

curl -X GET "localhost:9200/_cluster/health?wait_for_status=yellow&local=false&level=shards&pretty"

This example shows the local parameter set to false (which is actually the default). This will show you the status of the master node. To check the local node instead, change the parameter to true.

The level parameter will, by default, show you cluster health, but ranks beyond that include indices and shards (as in the example above).

That’s it for this article. Hope you enjoyed reading it ❤❤❤
