ELK Stack

[box]The ELK stack is a collection of three open-source products — Elasticsearch, Logstash, and Kibana — all developed, managed, and maintained by Elastic. Logstash collects and parses logs, and then Elasticsearch indexes and stores the information. Kibana then presents the data in visualizations that provide actionable insights into one’s environment.[/box]

[box] ELK Concepts [/box]

Elasticsearch

Elasticsearch is a search engine based on the Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents. Elasticsearch is developed in Java.

The speed and scalability of Elasticsearch and its ability to index many types of content mean that it can be used for a number of use cases:

  • Application search
  • Website search
  • Enterprise search
  • Logging and log analytics
  • Infrastructure metrics and container monitoring
  • Application performance monitoring
  • Geospatial data analysis and visualization
  • Security analytics
  • Business analytics

Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in Elasticsearch. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data. From Kibana, users can create powerful visualizations of their data, share dashboards, and manage the Elastic Stack.
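
As a sketch of what such a query can look like (the index name logs-2020.09 and the numeric field response.status are hypothetical), this request combines a full-text match with a terms aggregation:

$ curl -X GET "localhost:9200/logs-2020.09/_search?pretty" -H 'Content-Type: application/json' -d'
{
  "query": { "match": { "message": "error" } },
  "aggs": {
    "status_codes": { "terms": { "field": "response.status" } }
  }
}
'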

An Elasticsearch index is a collection of documents that are related to each other. Elasticsearch stores data as JSON documents. Each document correlates a set of keys (names of fields or properties) with their corresponding values (strings, numbers, Booleans, dates, arrays of values, geolocations, or other types of data).

Elasticsearch uses a data structure called an inverted index, which is designed to allow very fast full-text searches. An inverted index lists every unique word that appears in any document and identifies all of the documents each word occurs in.
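
For example, given these two tiny documents, the inverted index maps each term to the documents containing it:

    Doc 1: "quick brown fox"
    Doc 2: "brown dog"

    Term      Appears in
    brown     Doc 1, Doc 2
    dog       Doc 2
    fox       Doc 1
    quick     Doc 1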

During the indexing process, Elasticsearch stores documents and builds an inverted index to make the document data searchable in near real-time. Indexing is initiated with the index API, through which you can add or update a JSON document in a specific index.
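
A minimal sketch of the index API (the index name customer and the document fields are made up for illustration):

$ curl -X PUT "localhost:9200/customer/_doc/1?pretty" -H 'Content-Type: application/json' -d'
{
  "name": "John Doe",
  "signed_up": "2020-09-01"
}
'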

Logstash

Logstash, one of the core products of the Elastic Stack, is used to aggregate and process data and send it to Elasticsearch. Logstash is an open source, server-side data processing pipeline that enables you to ingest data from multiple sources simultaneously and enrich and transform it before it is indexed into Elasticsearch.
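
Every Logstash pipeline follows the same input → filter → output structure. As a quick illustration (assuming the RPM install layout used below), a throwaway pipeline that echoes stdin back to stdout can be run directly from the command line:

$ /usr/share/logstash/bin/logstash -e 'input { stdin { } } output { stdout { } }'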

Kibana

Kibana is a data visualization and management tool for Elasticsearch that provides real-time histograms, line graphs, pie charts, and maps. Kibana also includes advanced applications such as Canvas, which allows users to create custom dynamic infographics based on their data, and Elastic Maps for visualizing geospatial data.

Why use Elasticsearch?

Elasticsearch is fast. Because Elasticsearch is built on top of Lucene, it excels at full-text search. Elasticsearch is also a near real-time search platform, meaning the latency from the time a document is indexed until it becomes searchable is very short — typically one second. As a result, Elasticsearch is well suited for time-sensitive use cases such as security analytics and infrastructure monitoring.
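
That latency is governed by the index refresh interval, which defaults to one second; a refresh can also be forced manually (my-index is a placeholder):

$ curl -X POST "localhost:9200/my-index/_refresh"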

Elasticsearch is distributed by nature. The documents stored in Elasticsearch are distributed across different containers known as shards, which are duplicated to provide redundant copies of the data in case of hardware failure. The distributed nature of Elasticsearch allows it to scale out to hundreds (or even thousands) of servers and handle petabytes of data.
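
The number of primary shards and replica copies is set per index at creation time. A sketch (the index name and counts are arbitrary):

$ curl -X PUT "localhost:9200/my-index?pretty" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1
  }
}
'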

Elasticsearch comes with a wide set of features. In addition to its speed, scalability, and resiliency, Elasticsearch has a number of powerful built-in features that make storing and searching data even more efficient, such as data rollups and index lifecycle management.
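
As an illustration of index lifecycle management (available in recent Elasticsearch versions), here is a minimal sketch of a policy that rolls an index over and later deletes it (the policy name and thresholds are made up):

$ curl -X PUT "localhost:9200/_ilm/policy/logs-policy?pretty" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": { "rollover": { "max_size": "50gb", "max_age": "30d" } }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
'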

The Elastic Stack simplifies data ingest, visualization, and reporting. Integration with Beats and Logstash makes it easy to process data before indexing into Elasticsearch. And Kibana provides real-time visualization of Elasticsearch data as well as UIs for quickly accessing application performance monitoring (APM), logs, and infrastructure metrics data.

[box] Installation [/box]

When installing the Elastic Stack, you must use the same version across the entire stack. For example, if you are using Elasticsearch 7.9.0, you install Beats 7.9.0, APM Server 7.9.0, Elasticsearch Hadoop 7.9.0, Kibana 7.9.0, and Logstash 7.9.0.

Installing Elasticsearch

$ sudo rpm --import https://artifacts.elastic.co/GPG-KEY-elasticsearch

$ sudo vi /etc/yum.repos.d/elasticsearch.repo

   [elasticsearch-6.x]
    name=Elasticsearch repository for 6.x packages
    baseurl=https://artifacts.elastic.co/packages/6.x/yum
    gpgcheck=1
    gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
    enabled=1
    autorefresh=1
    type=rpm-md

$ sudo yum install elasticsearch

$ sudo vi /etc/elasticsearch/elasticsearch.yml

    . . .
    cluster.name: behnam
    node.name: node1
    . . .
    network.host: localhost
    . . .

A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to a name that describes the purpose of the cluster.
Make sure that you don’t reuse the same cluster name in different environments; otherwise, nodes might end up joining the wrong cluster.

Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch, so it is included in the response of many APIs. It defaults to the hostname of the machine when Elasticsearch starts, but it can be set explicitly in elasticsearch.yml.
By default, Elasticsearch binds to loopback addresses only, e.g. 127.0.0.1 and [::1]. This is sufficient to run a single development node on a server.
To form a cluster with nodes on other servers, your node will need to bind to a non-loopback address. While there are many network settings, usually all you need to configure is network.host.

[button link="https://www.elastic.co/guide/en/elasticsearch/reference/current/important-settings.html" color="orange" newwindow="yes"] Read more about Elasticsearch configuration[/button]

$ sudo systemctl enable elasticsearch

$ sudo systemctl start elasticsearch

$ curl -X GET "localhost:9200"
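
If Elasticsearch is running, this returns a JSON banner with the node name, cluster name, and version. Cluster health can be checked the same way:

$ curl -X GET "localhost:9200/_cluster/health?pretty"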

Installing Kibana

$ sudo yum install kibana

$ sudo systemctl enable kibana

$ sudo systemctl start kibana

Configuring Reverse Proxy

Because Kibana is configured to listen only on localhost, we must set up a reverse proxy to allow external access to it. Here, Nginx serves as the proxy and protects Kibana with basic authentication; the following command creates a user named kibanaadmin (openssl prompts for the password and appends the hashed entry to /etc/nginx/htpasswd.users).

$ echo "kibanaadmin:`openssl passwd -apr1`" | sudo tee -a /etc/nginx/htpasswd.users

$ sudo vi /etc/nginx/conf.d/example.com.conf

  server {
      listen 80;

      server_name example.com www.example.com;

      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/htpasswd.users;

      location / {
          proxy_pass http://localhost:5601;
          proxy_http_version 1.1;
          proxy_set_header Upgrade $http_upgrade;
          proxy_set_header Connection 'upgrade';
          proxy_set_header Host $host;
          proxy_cache_bypass $http_upgrade;
      }
  }

$ sudo systemctl restart nginx

Installing Logstash

$ sudo yum install logstash
$ sudo vi /etc/logstash/conf.d/02-beats-input.conf

    input {
      beats {
        port => 5044
      }
    }

$ sudo vi /etc/logstash/conf.d/10-syslog-filter.conf

    filter {
      if [fileset][module] == "system" {
        if [fileset][name] == "auth" {
          grok {
            match => { ..... }
            pattern_definitions => {
              "GREEDYMULTILINE"=> "(.|\n)*"
            }
            remove_field => "message"
          }
          date {
            match => [ "[system][auth][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
          }
          geoip {
            source => "[system][auth][ssh][ip]"
            target => "[system][auth][ssh][geoip]"
          }
        }
        else if [fileset][name] == "syslog" {
          grok {
            match => { .... }
            remove_field => "message"
          }
          date {
            match => [ "[system][syslog][timestamp]", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
          }
        }
      }
    }

$ sudo vi /etc/logstash/conf.d/30-elasticsearch-output.conf

    output {
      elasticsearch {
        hosts => ["localhost:9200"]
        manage_template => false
        index => "%{[@metadata][beat]}-%{[@metadata][version]}-%{+YYYY.MM.dd}"
      }
    }

If you want to add filters for other applications that use the Filebeat input, be sure to name the files so they’re sorted between the input and the output configuration, meaning that the file names should begin with a two-digit number between 02 and 30.
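
For example, a hypothetical filter for Apache logs could be saved as 11-apache-filter.conf, so that it loads after the Beats input (02) and before the Elasticsearch output (30).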

$ sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t

The -t flag tests the configuration and exits; look for "Configuration OK" in the output before starting the service.

$ sudo systemctl start logstash

$ sudo systemctl enable logstash

Installing Filebeat

The Elastic Stack uses several lightweight data shippers called Beats to collect data from various sources and transport it to Logstash or Elasticsearch. These are the Beats currently available from Elastic:

  • Filebeat: collects and ships log files.
  • Metricbeat: collects metrics from your systems and services.
  • Packetbeat: collects and analyzes network data.
  • Winlogbeat: collects Windows event logs.
  • Auditbeat: collects Linux audit framework data and monitors file integrity.
  • Heartbeat: monitors services for their availability with active probing.

$ sudo yum install filebeat

$ sudo vi /etc/filebeat/filebeat.yml

     output.logstash:
       # The Logstash hosts
       hosts: ["localhost:5044"]

$ sudo filebeat modules list
$ sudo filebeat modules enable system
$ sudo systemctl start filebeat
$ sudo systemctl enable filebeat

To verify that Elasticsearch is indeed receiving this data, query the Filebeat index with this command:

$ curl -X GET 'http://localhost:9200/filebeat-*/_search?pretty'

Installing on Docker

$ docker pull sebp/elk
$ vi docker-compose.yml
    elk:
      image: sebp/elk
      ports:
        - "5601:5601"
        - "9200:9200"
        - "5044:5044"

$ sudo docker-compose up elk
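
Once the container is up, the stack can be verified through the mapped ports, for example:

$ curl -X GET 'http://localhost:9200'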

 
