Using Elasticsearch Logstash Kibana (ELK) to monitor server performance

There are myriad tools that claim to be able to monitor server performance for you, but when you’ve already got a sizeable bag of tools doing various automated operations, it’s always nice to be able to fulfil an operational requirement with one of those rather than having to onboard yet another one.

I love Elasticsearch. It can be a bit of a minefield to learn, but once you get to grips with it, and bolt on Kibana, you realise there is very little you can’t do with it.

Even better, Amazon AWS now have their own Elasticsearch Service, so you can reap all the benefits of the technology without having to worry about maintaining a cluster of Elasticsearch servers.

In this case, my challenge was to expose performance data from a large fleet of Amazon EC2 server instances. Yes, there is a certain amount of data available in AWS CloudWatch, but it lacks key metrics like memory usage and load average, which are invariably the metrics you most want to review.

One approach would be to put some sort of agent on the servers and have a central server poll it, but again, that’s an extra tool. Another approach would be to put scripts on the servers that push metrics to CloudWatch, augmenting the existing EC2 CloudWatch data. We considered this, but with this method the custom metrics aren’t logged to the same place in CloudWatch as the EC2 data, so it all felt a bit clunky. And you only get two weeks of history.
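For illustration, the push-to-CloudWatch approach might have looked something like the script below, run from cron on each instance. This is only a rough sketch of the idea we decided against; the namespace, metric name and dimension are purely illustrative, not something we actually deployed.

#!/bin/bash
# Hypothetical script (run from cron) pushing load average to CloudWatch as a custom metric.
# The namespace and dimension names here are illustrative.
INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
LOAD=$(awk '{print $1}' /proc/loadavg)
aws cloudwatch put-metric-data \
    --namespace "Custom/EC2" \
    --metric-name LoadAverage \
    --value "$LOAD" \
    --dimensions InstanceId="$INSTANCE_ID"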

This is where we turned to Elasticsearch. We were already using Elasticsearch to store information about access to our S3 buckets, and we were happy with it. I figured there had to be a way to leverage this to monitor server performance, so I set about some testing.

Our basic setup was a Logstash server using the S3 input plugin and the Elasticsearch output plugin, configured to send output to our Elasticsearch domain in AWS:

output {
    if [type] == "s3-access" {
        elasticsearch {
            index => "s3-access-%{+YYYY.MM.dd}"
            hosts => ["search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com:443"]
            ssl => true
        }
    }
}

We now wanted to create a different type of index to hold our performance metric data. This data was going to come from lots of servers, so Logstash needed a way to ingest it from lots of remote hosts. The easiest way to do this is with the Logstash syslog input plugin. We first set up Logstash to listen for syslog input:

input {
    syslog {
        type => "syslog"
        port => 8514
    }
}

We then get our servers to send their syslog output to our Logstash server by giving them a universal rsyslog configuration, where logs.mydomain.com is our Logstash server:

#Logstash Configuration
$WorkDirectory /var/lib/rsyslog # where to place spool files
$template LogFormat,"%HOSTNAME% ops %syslogtag% %msg%"
*.* @@logs.mydomain.com:8514;LogFormat
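After dropping this configuration in place (for example as a file under /etc/rsyslog.d/ — the exact location depends on your distro), restart rsyslog on each server so it picks up the change:

# restart rsyslog so the new forwarding rule takes effect
service rsyslog restart      # or: systemctl restart rsyslog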

We now update our output plugin in Logstash to create the necessary Index in Elasticsearch:

output {
    if [type] == "syslog" {
        elasticsearch {
            index => "test-syslog-%{+YYYY.MM.dd}"
            hosts => ["search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com:443"]
            ssl => true
        }
    } else {
        elasticsearch {
            index => "s3-access-%{+YYYY.MM.dd}"
            hosts => ["search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com:443"]
            ssl => true
        }
    }
}

Note that I have called the syslog index “test-syslog-…”. I will explain why in a moment, but it’s important that you do this.

Once these steps have been completed, it should be possible to see syslog data in Kibana, as indexed by Logstash and stored in our AWS Elasticsearch domain.

Building on this, all we had to do next was get our performance metric data into the syslog stream on each of our servers. This is very easy: logger is a handy little utility that comes pre-installed on most Linux distros and allows you to send messages to syslog (/var/log/messages by default).

We trialled this with Load Average. To get the data to syslog, we set up the following cronjob on each server:

* * * * * root cat /proc/loadavg | awk '{print "LoadAverage: " $1}' | xargs logger

This writes the following line to /var/log/messages every minute:

Jun 21 17:02:01 server1 root: LoadAverage: 0.14

It should then be possible to search for this line in Kibana

message: "LoadAverage"

to verify that it is being stored in Elasticsearch. When we do find results in Kibana, we can see that the LogFormat template we used in our server rsyslog conf has converted the log line to:

server1 ops root: LoadAverage: 0.02
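The same pattern works for any metric you can read from the shell. For example, a hypothetical cron entry for memory usage might look like the following (the metric name MemoryUsedMB is our own invention for illustration):

* * * * * root free -m | awk '/^Mem:/ {print "MemoryUsedMB: " $3}' | xargs logger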

To really make this data useful, however, we need to be able to perform visualisation logic on it in Kibana. This means exposing the fields we require and making sure those fields have the correct data type for numerical visualisations. This involves using some extra filters in your Logstash configuration.

filter {
    if [type] == "syslog" {
        grok {
            match => { "message" => '(%{HOSTNAME:hostname})\s+ops\s+root:\s+(%{WORD:metric-name}): (%{NUMBER:metric-value:float})' }
        }
    }
}

This filter operates on the message field after it has been converted by rsyslog, rather than on the format of the log line in /var/log/messages. The crucial part is to expose the Load Average value (metric-value) as a float, so that Kibana/Elasticsearch can treat it as a number rather than a string. If you only specify NUMBER as your grok pattern, the field will be indexed as a string, so you need to add “:float” to complete the conversion to a numeric type.
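If you prefer, the same conversion can be done with an explicit mutate filter instead of the grok type suffix. A minimal sketch, assuming the field has already been extracted by grok:

filter {
    if [type] == "syslog" {
        mutate {
            # convert the extracted string field to a float before it is indexed
            convert => { "metric-value" => "float" }
        }
    }
}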

To check the data type of the exposed field, look in Kibana under Settings -> Indices. You should only have a single Index Pattern at this point (test-syslog-*). Refresh the field list for it and search for “metric-value”. At this point, it may indicate that the data type is “String”, which we can now deal with. If it already has data type “Number”, you’re all set.

In Elasticsearch, the data type for a field is fixed when the index is created. If your “test-syslog-” index was created before “metric-value” was properly converted to a number, you can now create a new index and verify that “metric-value” is numeric. To do this, update the output plugin in your Logstash configuration and restart Logstash.

output {
    if [type] == "syslog" {
        elasticsearch {
            index => "syslog-%{+YYYY.MM.dd}"
            hosts => ["search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com:443"]
            ssl => true
        }
    }
}

A new index (syslog-) will now be created. Delete the existing index pattern in Kibana and create a new one for syslog-*, using @timestamp as the default time field. Once this has been created, Kibana will obtain an updated field list (after a few seconds), and in it you should see that “metric-value” now has a data type of “Number”.
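If you prefer to confirm the mapping directly in Elasticsearch rather than via the Kibana field list, you can query the index mapping on the domain with curl (the dated index name here is illustrative):

curl -s 'https://search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com/syslog-2016.06.22/_mapping?pretty'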

(For neatness, you may want to replace the “test-syslog-” index with a properly named index even if your data type for “metric-value” is already “Number”.)
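Once you are happy with the new index, the old test indices can be cleaned up with a delete call along these lines (assuming your domain permits wildcard index deletion):

curl -XDELETE 'https://search-*********-5isan2svbmpipm2xznyupbeabe.us-west-2.es.amazonaws.com/test-syslog-*'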

Now that you have the data you need in Elasticsearch, you can graph it with a visualisation.

First, set your time range to “Last Hour” and create/save a Search for what you want to graph, e.g.:

metric-name: "LoadAverage" AND hostname: "server1"

Now, create a Line Chart visualisation based on that Search, setting the Y-axis to the Average of the “metric-value” field and the X-axis to a Date Histogram. Click “Apply” and you should see a graph like the one below:

[Screenshot: Kibana line chart of Load Average for server1]