How to install and set up Logstash

So you’ve finally decided to put a system in place to deal with the tsunami of logs your web applications are generating, and you’ve looked here and there for something Open Source, and you’ve found Logstash, and you’ve had a go at setting it up…

…and then you’ve lost all will to live?

And maybe, too, you’ve found that every trawl through Google for some decent documentation leads you to a video of some guy giving a presentation about Logstash at some geeky conference, in which he talks in really general terms about Logstash and doesn’t give you any clues as to how you go about bringing it into existence?

Yes? Well, hopefully by landing here your troubles are over, because I’m going to tell you how to set up Logstash from scratch.

First, let’s explain the parts and what they do. Logstash is in fact a collection of different technologies, of which the Java programme, Logstash, is only one part.

The Shipper

This is the bit that reads the logs and sends them for processing. This is handled by the Logstash Java programme.

Grok

This is the bit that takes logs that have no uniform structure and gives them a structure that you define. This occurs prior to the logs being shipped. Grok is a standalone technology. Logstash uses its shared libraries.

Redis

This is a standalone technology that acts as a broker. Think of it like a turnstile at a football ground. It allows multiple events (i.e. lines of logs) to queue up, and then spits them out in a nice orderly line.
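Once Redis is installed (later in this guide), you can see that queue behaviour for yourself by poking at a list with redis-cli. This is purely an illustration: Logstash does all the pushing and popping for you, and the key name here is made up.

redis-cli lpush testqueue "line one" "line two"
redis-cli llen testqueue
redis-cli rpop testqueue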

The Indexer

This takes the nice ordered output from Redis, which is neatly structured, and indexes it, for faster searching. This is handled by the Logstash Java programme.

Elasticsearch

This is a standalone technology, into which The Indexer funnels data, which stores the data and provides search capabilities.

The Web Interface

This is the bit that provides a user interface to search the data that has been stored in Elasticsearch. You can run the web server that is provided by the Logstash Java programme, or you can run the separate Kibana web interface. Both use the Apache Lucene structured query language, but Kibana has more features, a better UI and is less buggy (IMO).

(Kibana 2 was a Ruby based server side application. Kibana 3 is an HTML/Javascript based client side application. Both connect to an Elasticsearch backend.)
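As a taste of what those searches look like, here is the kind of Lucene-style query you can type into either interface. The field names are only examples; what you actually get depends on your logs and, later, on your Grok config.

type:"apache" AND response:404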

That’s all the bits, so let’s talk about setting it up.

First off, use a server OS that has access to lots of RPM repos. CentOS and Amazon Linux (for Amazon AWS users) are a safe bet, Ubuntu slightly less so.

For Redis, Elasticsearch and the Logstash programme itself, follow the instructions here:

http://logstash.net/docs/1.2.1/

(We’ll talk about starting services at bootup later)

Re. the above link, don’t bother working through the rest of the tutorial beyond the installation of the software. It demos Logstash using STDIN and STDOUT, which will only serve to confuse you. Just make sure that Redis, Elasticsearch and Logstash are installed and can be executed.
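A quick sanity check at this point (paths will vary depending on where you unpacked things; the Logstash flatjar only needs a working Java runtime):

java -version
./src/redis-server --version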

Now, on a separate system, we will set up the Shipper. For this, all you need is the Java Logstash programme and a shipper.conf config file.

Let’s deal with 2 real-life, practical scenarios:

1. You want to send live logs to Logstash
2. You want to send old logs to Logstash

1. Live logs

Construct a shipper.conf file as follows:

input {

   file {
      type => "apache"
      path => [ "/var/log/httpd/access.log" ]
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is a file, located at /var/log/httpd/access.log, and you want to record the content of this file as the type “apache”. You can use wildcards in your specification of the log file, and type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, the Redis service running on your Logstash server (put that server’s address in the empty host => "" field).
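As an aside, if you wanted to pick up every log in that directory rather than just access.log, the path can take a glob rather than a single filename (adjust the pattern to your own filenames):

   file {
      type => "apache"
      path => [ "/var/log/httpd/*.log" ]
   }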

2. Old logs

Construct a shipper.conf file as follows:

input {

   tcp {
      type => "apache"
      port => 3333
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is whatever data arrives on TCP port 3333, and you want to record that content as the type “apache”. As before, type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, to the Redis service running on your Logstash server.

That’s all you need to do for now on the Shipper. Don’t run anything yet. Go back to your main Logstash server.

In the docs supplied at the Logstash website, you were given instructions on how to install Redis, Logstash and Elasticsearch, including the Logstash web server. We are not going to use the Logstash web server; we’ll use Kibana instead, so you’ll need to set up Kibana (version 3, not 2; version 2 is a Ruby based server side application):

https://github.com/elasticsearch/kibana/

Onward…

(We’re going to be starting various services in the terminal now, so you will need to open several terminal windows)

Now, start the Redis service on the command line:

./src/redis-server --loglevel verbose
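To check it is up and listening, ping it from another terminal:

./src/redis-cli ping

It should answer PONG.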

Next, construct an indexer.conf file for the Indexer:

input {
   redis {
      host => "127.0.0.1"
      type => "redis-input"
      # these settings should match the output of the agent
      data_type => "list"
      key => "logstash"

      # We use json_event here since the sender is a logstash agent
      format => "json_event"
   }
}

output {
   stdout { debug => true debug_format => "json"}

   elasticsearch {
      host => "127.0.0.1"
   }
}

This should be self-explanatory: the Indexer is taking input from Redis, and sending it to Elasticsearch.

Now start the Indexer:

java -jar logstash-1.2.1-flatjar.jar agent -f indexer.conf

Next, start Elasticsearch:

./elasticsearch -f
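Once Elasticsearch is up, a quick way to confirm it is alive is to ask for its cluster health over HTTP:

curl 'http://127.0.0.1:9200/_cluster/health?pretty'

A status of green or yellow is fine for a single-node setup like this.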

Finally, crank up Kibana.

You should now be able to access Kibana at:

http://yourserveraddress:5601

Now that we have all the elements on the Logstash server installed and running, we can go back to the shipping server and start spitting out some logs.

Regardless of how you’ve set up your shipping server (live logs or old logs), starting the shipping process involves the same command:

java -jar logstash-1.2.1-flatjar.jar agent -f shipper.conf

If you’re shipping live logs, that’s all you will need to do. If you are shipping old logs, you will need to pipe them to the TCP port you opened in your shipper.conf file. Do this in a separate terminal window.

nc localhost 3333 < /var/log/httpd/old_apache.log
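If you have several old log files to replay, a simple shell loop will feed them in one after the other (assuming they all match the pattern below; adjust it to your own filenames):

for f in /var/log/httpd/old_*.log; do nc localhost 3333 < "$f"; done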

Our shipping configuration is set up to output logs both to STDOUT and Redis, so you should see lines of logs appearing on your terminal screen. If the shipper can’t contact Redis, it will tell you it can’t contact Redis.
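If you want to see events actually passing through the broker, you can watch the length of the logstash list on the Logstash server; it grows as the shipper pushes events in and shrinks as the Indexer drains them:

./src/redis-cli llen logstash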

Once you see logs being shipped, go back to your Kibana interface and run a search for content.

IMPORTANT: if your shipper is sending old logs, you need to search for logs from a time period that exists in those logs. There is no point in searching for content from the last 15 mins if you are injecting logs from last year.

Hopefully, you’ll see results in the Kibana window. If you want to learn the ins and outs of what Kibana can do, have a look at the Kibana website. If Kibana is reporting errors, retrace the steps above, and ensure that all of the components are running, and that all necessary firewall ports are open.

2 tasks now remain: using Grok and setting up all the components to run as services at startup.

Init scripts for Redis, ElasticSearch and Kibana are easy to find through Google. You’ll need to edit them to ensure they are correctly configured for your environment. Also, for the Kibana init script, ensure you use the kibana-daemon.rb Ruby script rather than the basic kibana.rb version.

Place the various scripts in /etc/init.d, and, again on CentOS, set them up to start at boot using chkconfig, and control them with the service command.
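On CentOS that boils down to something like the following for each component (the script names here are whatever you called the init scripts you downloaded):

chkconfig --add redis
chkconfig redis on
service redis start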

Grok isn’t quite so easy.

The code is available from here:

https://github.com/jordansissel/grok/

You can download a tarball of it from here:

https://github.com/jordansissel/grok/archive/master.zip

Grok has quite a few dependencies, which are listed in its docs. I was able to get all of these on CentOS using yum and the EPEL repos:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/$(uname -i)/epel-release-5-4.noarch.rpm

then

yum install -y gcc gperf make libevent-devel pcre-devel tokyocabinet-devel

Also, after you have compiled grok, make sure you run ldconfig, so that its libraries are shared with Logstash.
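The compile-and-share sequence itself is usually nothing more exotic than this (exact make targets may differ slightly between versions of the Grok source):

make
sudo make install
sudo ldconfig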

How to explain Grok?

In the general development of software over the last 20-30 years, very little thought has gone into the structure of log files, which means we have lots of different log formats in the wild.

Grok allows you to "re-process" logs from different sources so that you can give them all the same structure. This structure is then saved in Elasticsearch, which makes querying logs from different sources much easier.

Even if you are not processing logs from different sources, Grok is useful, in that you can give the different parts of a log line field names, which again makes querying much easier.

Grok "re-processing", or filtering, as it is called, occurs in the same place as your Shipper, so we add the Grok config to the shipper.conf file.

This involves matching the various components in your log format to Grok data types, or patterns as they are referred to in Grok. Probably the easiest way to do this is with this really useful Grok debugger:

http://grokdebug.herokuapp.com/

Cut and paste a line from one of your logs into the input field, and then experiment with the available Grok patterns until you see a nice clean JSON object rendered in the output field below.
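Once the debugger gives you a pattern that matches, the filter block you add to shipper.conf looks something like this (a sketch for Apache access logs on Logstash 1.2.x; option names have shifted between Logstash versions, so check the docs for yours):

filter {
   grok {
      type => "apache"
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
   }
}

COMBINEDAPACHELOG is one of the patterns that ships with Logstash and Grok, and it breaks a standard Apache combined-format line into named fields such as clientip, response and request.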

3 thoughts on “How to install and set up Logstash”

  1. David

    Dude. This guide effectively translates the Klingon of every ELK (elasticsearch, logstash, kibana) guide to English.
    Nice work, keep it up!

  2. Mark Sherman

    Thanks. That was very helpful. I am working in Windows so I needed to make some adjustments. Also, there have been some changes in the Logstash config since you wrote this. I still have a problem that the data being shipped to Redis is not being indexed into Elasticsearch. I am sending a JSON message to the shipper. If I change the shipper to output to Elasticsearch I see the JSON fields in the Elasticsearch doc. But when I send it to Redis and then the indexer pulls it from Redis, all that shows up in Elasticsearch is the timestamp, type and version.
    Thanks again,
    Mark

  3. Mark Sherman

    OK, I found the problem. Needed to change the indexer input to codec => json instead of format => “json_event”.
