Stagduction

Stagduction

(Noun) A web application state in which the service provided is not monitored, not redundant and has not been performance tested, but which is in use by a large community of people as a result of poor planning, poor communication and over-zealous sales people.

Install Ruby for Rails on Amazon Linux

A quick HOWTO on installing Ruby for Rails on Amazon Linux.

Check your Ruby version (bundled in Amazon Linux)


ruby -v
ruby 2.0.0p481 (2014-05-08 revision 45883) [x86_64-linux]

Check your sqlite3 version (bundled with Amazon Linux)


sqlite3 --version
3.7.17 2013-05-20 00:56:22 118a3b35693b134d56ebd780123b7fd6f1497668

Check Rubygems version (bundled with Amazon Linux)


gem -v
2.0.14

Install Rails (this sits at the command line for a while, so be patient. The extra parameters exclude the documentation, which, if installed, can melt the CPU on smaller instances while it compiles)


sudo gem install rails --no-ri --no-rdoc

Check Rails installed


rails --version
Rails 4.1.6

Install gcc (always handy to have)


sudo yum install -y gcc

Install ruby and sqlite development packages


sudo yum install -y ruby-devel sqlite-devel

Install node.js (Rails wants a JS interpreter)

sudo bash
curl -sL https://rpm.nodesource.com/setup | bash -
exit
sudo yum install -y nodejs
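
Check node.js installed (the exact version reported will depend on what the NodeSource repo provides at the time)


node -v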

Install the sqlite3 and io-console gems


gem install sqlite3 io-console

Make a blank app


mkdir myapp
cd myapp
rails new .

Start it (in the background)


bin/rails s &
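
If you need to reach the app from outside the instance rather than via localhost, you can instead bind it to all interfaces with the standard -b flag (remember to open TCP port 3000 in the instance's security group)


bin/rails s -b 0.0.0.0 &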

Hit it


wget -qO- http://localhost:3000

Debug (Rails console)


bin/rails c

Application monitoring with Nagios and Elasticsearch

As the applications under your control grow, both in number and complexity, it becomes increasingly difficult to rely on predictive monitoring.

Predictive monitoring is monitoring things that you know should be happening. For instance, you know your web server should be accepting HTTP connections on TCP port 80, so you use a monitor to test that HTTP connections are possible on TCP port 80.

In more complex applications, it is harder to predict what may or may not go wrong; similarly, some things can’t be monitored in a predictive way, because your monitoring system may not be able to emulate the process that you want to monitor.

For example, let’s say your application sends push messages to a mobile phone application. To monitor this thoroughly, you would need a monitor that persistently sends push messages to a mobile phone, and some way of verifying that the phone received them.

At this stage, you need to invert your monitoring system, so that it stops asking if things are OK, and instead listens for applications that are telling it that they are not OK.

Using your application logs files is one way to do this.

Well-written applications are generally quite vocal when it comes to being unwell, and will always describe an ERROR in their logs if something has gone wrong. What you need to do is find a way of linking your monitoring system to that message, so that it can alert you that something needs to be checked.

This doesn’t mean you can dispense with predictive monitoring altogether; what it does mean is that you don’t need to rely on predictive monitoring entirely (or in other words, you don’t need to be able to see into the future) to keep your applications healthy.

This is how I’ve implemented log based monitoring. This was something of a nut to crack, as our logs arise from an array of technologies and adhere to very few standards in terms of layout, logging levels and storage locations.

The first thing you need is a logstash implementation. Logstash comprises a stack of technologies: an agent to ship logs out to a Redis server; a Redis server to queue logs for indexing; a logstash server for creating indices and storing them in elasticsearch; an elasticsearch server to search your indices.

The setup of this stack is beyond the scope of this article; it’s well described over on the logstash website, and is reasonably straightforward.

Once you have your logstash stack set up, you can start querying the elasticsearch search api for results. Queries are based on HTTP POST and JSON, and results are output in JSON.

Therefore, to test your logs, you need to issue an HTTP POST query from Nagios, check the results for ERROR strings, and alert accordingly.

The easiest way to have Nagios send a POST request with a JSON payload to elasticsearch is with the Nagios jmeter plugin, which allows you to create monitors based on your jmeter scripts.

All you need then is a correctly constructed JSON query to send to elasticsearch, which is where things get a bit trickier.

Without going into this in any great detail, formulating a well-constructed JSON query that will parse just the right log indices in elasticsearch isn’t easy. I cheated a little in this. I am familiar with the Apache Lucene syntax that the Logstash Javascript client, Kibana, uses, and was able to formulate my query based on this.

Kibana sends encrypted queries to elasticsearch, so you can’t pick them out of the HTTP POST/GET variables. Instead, I enabled logging of slow queries on elasticsearch (threshold set to 0s) so that I could see in the elasticsearch logs what exact queries were being run against elasticsearch. Here’s an example:


{
  "size": 100,
  "sort": {
    "@timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "NOT @source_host:\"uatserver\"",
          "default_field": "_all",
          "default_operator": "OR"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-10-06T11:05:25+00:00",
            "to": "2014-10-06T12:05:25+00:00"
          }
        }
      }
    }
  },
  "from": 0
}

You can test a query like this by sending it straight to your elasticsearch API:


curl -XPOST 'http://localhost:9200/_search' -d '{"size":100,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"query":{"query_string":{"query":"NOT @source_host:\"uatserver\"","default_field":"_all","default_operator":"OR"}},"filter":{"range":{"@timestamp":{"from":"2014-10-06T11:05:25+00:00","to":"2014-10-06T12:05:25+00:00"}}}}},"from":0}'

This returns a batch of up to 100 log entries that did not originate from the host “uatserver”, over the one-hour period specified in the @timestamp range.
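
Before going near jmeter, a quick way to sanity-check the approach from the shell is to save the JSON above to a file (query.json here is just an arbitrary name) and count ERROR strings in the response; a non-zero count means something needs looking at:


curl -s -XPOST 'http://localhost:9200/_search' -d @query.json | grep -o ERROR | wc -l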

Now that we know what we want to send to elasticsearch, we can construct a simple jmeter script. In it, we simply specify an HTTP POST request containing the JSON above as Body Data, and include a Response Assertion for the strings we do not want to see in the logs.

We can then use that script in Nagios with the jmeter plugin. If the script finds the ERROR string in the logs, it will generate an alert.

2 things are important here:

The alert will only tell you that an error has appeared in the logs, not what that error was; and if the error isn’t persistent, the monitor will eventually recover.

Clearly, there is a lot of scope for false positives in this, so if your logs are full of tolerable errors (they really shouldn’t be) you are going to have to be more specific about your search strings.

The good news is that once you get this all working, it’s very easy to create new monitors. Rather than writing bespoke scripts and working with Nagios plugins, all you need to do is change the queries and the Response Assertions in your jmeter script, and you should be able to monitor anything that is referenced in your application logs.

To assist in some small way, here is a link to a pre-baked JMeter script that includes an Apache Lucene query, and is also set up with the necessary Javascript-based date variables to search over the previous 15 minutes.

Negative matching on multiple ip addresses in SSH

In sshd_config, you can use the Match directive to apply different configuration parameters to ssh connections depending on their characteristics.

In particular, you can match on ip address, both positively and negatively.

You can specify multiple conditions in the match statement. All conditions must be matched before the match configuration is applied.

To negatively match an ip address, that is, to apply configuration if the connection is not from a particular ip address, use the following syntax:

Match Address *,!62.29.1.162/32
ForceCommand /sbin/sample_script

To negatively match more than one ip address, that is, to apply configuration if the connection is not from one or more ip addresses, use the following syntax:

Match Address *,!62.29.1.162/32,!54.134.118.96/32
ForceCommand /sbin/sample_script
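
To check how sshd will treat a connection from a given address before relying on the Match block, you can use sshd's extended test mode; the user and host values here are just placeholders, and the sshd path may differ on your distribution:

sudo /usr/sbin/sshd -T -C user=testuser,host=client1,addr=62.29.1.162 | grep -i forcecommand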

Is Skype an appropriate tool in corporate environments?

This is a question that has plagued me for several years, in that I have never been able to establish a consistent level of Skype quality in a corporate environment, despite having lots of bandwidth and having obtained the consultancy services of CCIE-level network experts.

The answer to the question is ultimately, no.

Let me explain by running through the questions.

1. How does Skype work at a network level?

Skype is a “Peer To Peer” (P2P) application. That means that when 2 people are having a Skype conversation, their computers *should* be directly connected, rather than connected via a 3rd computer. For the sake of comparison, Google Hangouts is not a P2P application. Google Hangout participants connect to each other via Google Conference Servers.

2. Does Skype work with UDP or TCP?

Skype’s preference is for UDP, and when Skype can establish a direct P2P connection using UDP, which is typically the case for residential users, call quality is very good. This is because UDP is a much faster protocol than TCP when used for streaming audio and video.

3. What’s the difference between residential and corporate users?

Residential internet connections are typically allocated a temporary fixed public ip address. This IP gets registered to a Skype user on Skype’s servers, so when someone needs to contact that user, Skype knows where to direct the call, and can use UDP to establish a call between the participating users.

In corporate environments, where there are lots of users using the same internet connection, sharing of a single public IP address between those users has to occur (Port Address Translation). That means that the Skype servers will have registered the same public ip address for all the users in that organisation. This means that Skype is not able to establish a direct UDP P2P connection between a user on the outside of that organisation and a user in that organisation, and has to use other means to make that connection.

4. What are those other means?

When direct connectivity between clients is not possible, Skype uses a process called “UDP hole punching”. In this mechanism, 2 computers that cannot communicate directly with each other communicate with one or more third party computers that can reach both of them.

Connection information is passed between the computers in order to try and establish a direct connection between the 2 computers participating in the Skype call.

If ultimately a direct connection cannot be established, Skype will use the intermediary computers to relay the connection between the 2 computers participating in the conversation.

In Skype terminology, these are known as “relay nodes”, which are basically just computers running Skype that have direct UDP P2P capability (typically residential users with good broadband speeds).

From the Skype Administrators Manual:

http://download.skype.com/share/business/guides/skype-it-administrators-guide.pdf

2.2.4 Relays

If a Skype client can’t communicate directly with another client, it will find the appropriate relays for the connection and call traffic. The nodes will then try connecting directly to the relays. They distribute media and signalling information between multiple relays for fault tolerance purposes. The relay nodes forward traffic between the ordinary nodes. Skype communication (IM, voice, video, file transfer) maintains its encryption end-to-end between the two nodes, even with relay nodes inserted.

As with supernodes, most business users are rarely relays, as relays must be reachable directly from the internet. Skype software minimizes disruption to the relay node’s performance by limiting the amount of bandwidth transferred per relay session. 

5. Does that mean that corporate Skype traffic is being relayed via anonymous third party computers?

Yes. The traffic is encrypted, but it is still relayed through other unknown hosts if a direct connection between 2 Skype users is not possible.

6. Is this why performance in corporate environments is sometimes not good?

Yes. If a Skype conversation is dependent on one or more relay nodes, and one of those nodes experiences congestion, this will impact on the quality of the call.

7. Surely, there is some solution to this?

A corporate network can deploy a proxy server, which is directly mapped to a dedicated public ip address. Ideally, this should be a UDP-enabled SOCKS5 server, but a TCP HTTP Proxy server can also be used. If all Skype connections are relayed through this server, Skype does not have to use relay nodes, as Port Address Translation is not in use.

8. So what’s the catch?

The problem with this solution is that it is not generally possible to force the Skype client to use a Proxy Server. When the client is configured to use a Proxy Server, it will only use it if there is no other way to connect to the Internet. So, if you have a direct Internet connection, even one based on Port Address Translation, which impacts on Skype quality, Skype will continue to use this, even if a better solution is available via a Proxy Server.

9. Why would Skype do this?

Skype is owned by Microsoft. Skype have a business product that attaches to Microsoft Active Directory and allows you to force a Proxy connection. So if you invest in a Microsoft network, Microsoft will give you a solution to enable better Skype performance in corporate networks. If you don’t want to invest in a Microsoft network, you’re stuck, and your only option is to block all outbound Internet access from your network and divert it via your Proxy server.

For a lot of companies, particularly software development companies who depend on 3rd party web services, this is not a practical option.

10. What is the solution?

At this time the primary options for desktop Audio/Video conferencing are either Skype or Google Hangouts.

When Skype can be used in an environment where P2P UDP connectivity is “always on”, it provides a superior audio/video experience to Google Hangouts, which is not P2P, and which communicates via central Google Servers.

Where an environment uses Port Address Translation, Skype performance will depend on the ability of Skype client to establish connections via relays, which means Skype performance becomes dependent on the resources available to those relays.

In this instance, Google Hangouts may be a better choice where consistent quality is required, as quality can be guaranteed by providing sufficient bandwidth between the corporate network and Google.

 

How to use DJ Bernstein’s daemontools

When I first started working in IT, one of the first projects I had to undertake was to set up a QMail server, which first brought me into contact with DJ Bernstein and his various software components.

One of these was daemontools, which is “a collection of tools for managing UNIX services”, and which is most frequently used in connection with Qmail.

The daemontools website is from another time: flat HTML files, no CSS, horizontal rules…it’s like visiting some sort of online museum. In fact, the website hasn’t changed in over 20 years; daemontools has been around for about that long, and hasn’t changed much in the interim.

The reason for daemontools’ longevity is quite simple. It works. And it works every time, all the time, which isn’t something you can say about every software product.

So if you need to run a process on a UNIX/Linux server, and that process needs to stay up for a very long time, without interruption, there probably isn’t any other software that can offer the same reliability as daemontools.

Here’s a quick HOWTO:

Firstly, install it, exactly as described here:

http://cr.yp.to/daemontools/install.html

If you get an error during the installation about a TLS reference, edit the file src/conf-cc, and add

-include /usr/include/errno.h

to the gcc line.

Once installed, check:

1. That you have a /service directory
2. That the command /command/svscanboot exists

If this is the case, daemontools is successfully installed.

Now, you can create the process/service that you want daemontools to monitor.

Create a directory under /service, with a name appropriate to your service, eg

/service/growfile

(you can also use a symbolic link for this directory, to point to an existing service installation)

In that directory, create a file called run, and give it 755 permissions:


touch /service/growfile/run
chmod 755 /service/growfile/run

Next, update the run file with the shell commands necessary to run your service


#!/bin/sh

while :
do
    echo "I am getting bigger..." >> /tmp/bigfile.txt
    sleep 1
done

Your service is now set up. To have daemontools monitor it, run the following command:


/command/svscan /service &

(To start this at boot, add /command/svscanboot to /etc/rc.local, if the install hasn’t done this already)

To see this in action, run ps -ef and have a look at your process list. You will see

1. A process called svscan, which is scanning the /service directory for new processes to monitor
2. A process called “supervise growfile”, which is keeping the job writing to the file alive

Also, run


tail -f /tmp/bigfile.txt

Every 1 second, you should see a new line being appended to this file:


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

To test daemontools, delete /tmp/bigfile.txt


rm -f /tmp/bigfile.txt

It should be gone, right?

No! It’s still there!


tail -f /tmp/bigfile.txt


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

Finally, if you want to actually kill your process, you can use the “svc” command supplied with daemontools:

svc -h /service/yourdaemon: sends HUP
svc -t /service/yourdaemon: sends TERM, and automatically restarts the daemon after it dies
svc -d /service/yourdaemon: sends TERM, and leaves the service down
svc -u /service/yourdaemon: brings the service back up
svc -o /service/yourdaemon: runs the service once
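
You can also check the state of a supervised service at any time with the svstat command that ships with daemontools; output will be something like the following (pid and uptime will obviously differ):


svstat /service/growfile
/service/growfile: up (pid 12345) 120 seconds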

This is the basic functionality of daemontools. There is a lot more on the website.

Managing Logstash with the Redis Client

Users of Logstash will be familiar with the stack of technologies required to implement a logstash solution:

The client that ships the logs to Redis

Redis which queues up the files for indexing

Logstash which creates the indices

Elasticsearch which stores the indices

Kibana which queries Elasticsearch

When you’re dealing with multiple components like this, things will inevitably go wrong.

For instance, say for some reason your client stops, and then you start it again 4 days later, and now the stack has to process 4 days of old log files before letting you search the latest ones.

One of the best ways to deal with this is to set up the Redis queue (“list” is the correct term) so that you can selectively remove entries from the list, allowing chunks of old logs to be skipped.

Take a look at this config from the logstash shipper:


output {
  stdout { debug => false debug_format => "json"}
  redis { host => "172.32.1.172" data_type => "channel" key => "logstash-%{@type}-%{+yyyy.MM.dd.HH}" }
}

You’ll see here that I’ve modified the default key value for logstash, by adding the log file type and date stamp to the key. The default key value in the Logstash documentation is “logstash”, which means every entry goes into Redis with the same key value.

You will also notice that I have changed the data_type from the default “list” to “channel”, more on which in a moment.

To see what this means, you should now log in to your Redis server with the standard redis-cli command line interface.

To list all available keys, just type


KEYS *logstash*

and you will get something like


redis 127.0.0.1:6379> keys *logstash*
 1) "logstash-nodelog-2014.03.07.17"
 2) "logstash-javalog-2014.03.07.15"
 3) "logstash-applog-2014.03.07.14"
 4) "logstash-catalina-2014.03.08.23"
 5) "logstash-applog-2014.03.08.23"
 6) "logstash-catalina-2014.03.07.15"
 7) "logstash-nodelog-2014.03.07.14"
 8) "logstash-javalog-2014.03.07.14"
 9) "logstash-nodelog-2014.03.08.23"
10) "logstash-applog-2014.03.07.15"
11) "logstash-javalog-2014.03.08.23"

This shows that your log data are now stored in Redis according to log file type, date and hour, rather than all just under the default “logstash” key. In other words, there are now multiple keys, rather than just the single default key.

You also need to change the indexer configuration at this point, so that it looks for multiple keys in Redis rather than just the “logstash” key:


input {
  redis {
    host => "127.0.0.1"
    type => "redis-input"
    # these settings should match the output of the agent
    data_type => "pattern_channel"
    key => "logstash*"

    # We use json_event here since the sender is a logstash agent
    format => "json_event"
  }
}

For data_type here, I am using “pattern_channel”, which means the indexer will ingest the data from any key where the key matches the pattern “logstash*”.

If you don’t change this, and you have changed your shipper, none of your data will get to Elasticsearch.

Using Redis in this way also requires a change to the default Redis configuration. When Logstash keys are stored in Redis in a List format, the List is constantly popped by the Logstash indexer, so it remains in a steady state in terms of memory usage.

When the Logstash indexer pulls data from a Redis channel, the data isn’t removed from Redis, so Redis memory usage grows.

To deal with this, you need to set up memory management in Redis, namely:

maxmemory 500mb
maxmemory-policy allkeys-lru

What this means is that when Redis reaches a limit of 500mb of used memory, it will drop keys according to a “Least Recently Used” algorithm. The default policy is volatile-lru, which depends on the TTL value of the key, but as Logstash doesn’t set a TTL on Redis keys, you need to use the allkeys-lru alternative instead.
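
If you would rather not restart Redis to apply these settings, they can also be set on the fly from redis-cli with CONFIG SET (persist them in redis.conf as well, or they will be lost on the next restart):


redis 127.0.0.1:6379> config set maxmemory 500mb
redis 127.0.0.1:6379> config set maxmemory-policy allkeys-lru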

Now, if you want to remove a particular log file type from a particular date and time from the Logstash process, you can simply delete that data from Redis


DEL logstash-javalog-2014.03.08.23

You can also check the length of individual lists by using LLEN, to give you an idea of which logs from which dates and times will take the longest to process


redis 127.0.0.1:6379> llen logstash-javalog-2014.03.08.23
(integer) 385460

You can also check your memory consumption in Redis with:

redis 127.0.0.1:6379> info
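
INFO also accepts a section argument (in Redis 2.6 and later) if you only want the memory-related fields:


redis 127.0.0.1:6379> info memory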

Command line tool for checking status of instances in Amazon EC2

I manage between 10 and 15 different Amazon AWS accounts for different companies.

When I needed to find out information about a particular instance, it was a pain to have to log into the web interface each time. Amazon do provide an API that allows you to query data about instances, but to use it, you need to store an Access Key and Secret on your local computer, which isn’t very safe when you’re dealing with multiple accounts.

To overcome this, I patched together Tim Kay’s excellent aws tool with GPG and a little PHP, to create a tool which allows you to query the status of all instances in a specific region of an Amazon EC2 account, using access credentials that are encrypted locally, so that storing them isn’t an issue.

Output from the tool is presented on a line by line basis, so you can use grep to filter the results.

Sample output:

ec2sitrep.sh aws.account1 us-east-1

"logs-use"  running  m1.medium  us-east-1a  i-b344b7cb  172.32.1.172  59.34.113.133
"adb2-d-use"  running  m1.small  us-east-1d  i-07d3e963  172.32.3.54  67.45.139.235
"pms-a-use"  running  m1.medium  us-east-1a  i-90852ced  172.32.1.27  67.45.108.146
"s2-sc2-d-use"  running  m1.medium  us-east-1d  i-3d40b442  172.32.3.26  67.45.175.244
"ks2-sc3-d-use"  running  m1.small  us-east-1d  i-ed2ed492  172.32.3.184  67.45.163.141
"ks1-sc3-c-use"  running  m1.small  us-east-1c  i-6efb9612  172.32.2.195  67.45.159.221
"adb1-c-use"  running  m1.small  us-east-1c  i-98cf44e4  172.32.2.221  67.45.139.196
"s1-sc1-c-use"  running  m1.medium  us-east-1c  i-956a76e8  172.32.2.96  67.45.36.97
"sms2-d-use"  running  m1.medium  us-east-1d  i-a86ef686  172.32.3.102  34.90.28.159
"uatpms-a-use"  running  m1.small  us-east-1a  i-b8cf5399  172.32.1.25  34.90.163.110
"uatks1-sc3-c-use"  running  t1.micro  us-east-1c  i-de336dfe  172.32.2.26  34.90.99.226
"uats1-sc1-c"  running  m1.medium  us-east-1c  i-35396715  172.32.2.217  34.90.183.23
"uatadb1-c-use"  running  t1.micro  us-east-1c  i-4d316f6d  172.32.2.29  34.90.109.171
"sms1-c-use"  running  m1.medium  us-east-1c  i-31b29611  172.32.2.163  34.90.100.25

(Note that public ips have been changed in this example)
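
Because each instance is a single line, filtering with grep is straightforward; for example, to see only the m1.small instances in the account (any column value works as a pattern):


ec2sitrep.sh aws.account1 us-east-1 | grep m1.small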

You can obtain the tool from Bitbucket:

https://bitbucket.org/garrethmcdaid/amazon-ec2-sitrep/

How to monitor the Amazon Linux AMI Security Patch RSS feed with Nagios

People who use Amazon AWS will be familiar with the Amazon Linux AMI, which is a machine image provided by Amazon with a stripped down installation of Linux.

The AMI acts as a starting point for building up your own AMIs, and has its own set of repos maintained by Amazon for obtaining software package updates.

Amazon also maintains an RSS feed, which announces the availability of new security patches for the AMI.

One of the requirements of PCI DSS V2 compliance is as follows:

6.4.5 Document process for applying security patches and software updates

That means you have to have a written down process for being alerted to and applying software patches to servers in your PCI DSS scope.

You could of course commit to reading the RSS feed every day, but that’s human intervention, which is never reliable. You could also set up your Amazon servers to simply take a system wide patch update every day, but if you’d prefer to review the necessity and impact of patches before applying them, that isn’t going to work.

Hence, having your monitoring system tell you if a new patch has been released for a specific software component would be a nice thing to have, and here it is, in the form of a Nagios plugin.

The plugin is written in PHP (I’m an ex-Web Developer) but is just as capable when it comes to Nagios as Perl or Python (without the need for all those extra modules).

I’ve called it check_rss.php, as it can be used on any RSS feed. There is another check_rss Nagios plugin, but it won’t work in this instance, as it only checks the most recent post in the RSS feed, and doesn’t include any way to retire alerts.

You can obtain the Plugin from Bitbucket:

https://bitbucket.org/garrethmcdaid/nagios-rss-checker/src

The script takes the following arguments:

“RSS Feed URL”

“Quoted, comma Separated list of strings you want to match in the post title”

“Number of posts you want to scan”

“Number of days for which you want the alert to remain active”

eg

commands.cfg

define command {
    command_name check_rss
    command_line $USER1$/check_rss.php $ARG1$ $ARG2$ $ARG3$ $ARG4$
}

<sample>.cfg

<snip>
check_command   check_rss!http://aws.amazon.com/rss/amazon-linux-ami.rss!"openssl"!30!3
<snip>
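
For completeness, a full (hypothetical) service definition using that command might look something like the following; the generic-service template, host_name and check intervals are placeholders you should adjust for your own Nagios setup:

define service {
    use                     generic-service
    host_name               amazonlinux-web01
    service_description     Amazon Linux AMI security patches (openssl)
    check_command           check_rss!http://aws.amazon.com/rss/amazon-linux-ami.rss!"openssl"!30!3
    normal_check_interval   60
    retry_check_interval    15
}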

You need to tell Nagios how long you want the alert to remain active, as you have no way of resolving the alert (ie you can’t remove it from the RSS feed).

This mechanism allows you to “silence” the alert after a number of days. This isn’t a feature of Nagios, rather of the script itself.

The monitor will alert if it finds *any* patches, and include *all* matching patches in its alert output.

How to install and setup Logstash

So you’ve finally decided to put a system in place to deal with the tsunami of logs your web applications are generating, and you’ve looked here and there for something Open Source, and you’ve found Logstash, and you’ve had a go at setting it up…

…and then you’ve lost all will to live?

And maybe, too, you’ve found that every trawl through Google for some decent documentation leads you to a video of some guy giving a presentation about Logstash at some geeky conference, in which he talks in really general terms about Logstash, and doesn’t give you any clues as to how you go about bringing it into existence?

Yes? Well, hopefully by landing here your troubles are over, because I’m going to tell you how to set up Logstash from scratch.

First, let’s explain the parts and what they do. Logstash is in fact a collection of different technologies, of which the Java programme, Logstash, is only one part.

The Shipper

This is the bit that reads the logs and sends them for processing. This is handled by the Logstash Java programme.

Grok

This is the bit that takes logs that have no uniform structure and gives them a structure that you define. This occurs prior to the logs being shipped. Grok is a standalone technology. Logstash uses its shared libraries.

Redis

This is a standalone technology that acts as a broker. Think of it like a turnstile at a football ground. It allows multiple events (ie lines of logs) to queue up, and then spits them out in a nice orderly line.

The Indexer

This takes the nice ordered output from Redis, which is neatly structured, and indexes it, for faster searching. This is handled by the Logstash Java programme.

Elasticsearch

This is a standalone technology, into which The Indexer funnels data, which stores the data and provides search capabilities.

The Web Interface

This is the bit that provides a User Interface to search the data that has been stored in Elasticsearch. You can run the web server that is provided by the Logstash Java programme, or you can run the separate web client, Kibana. Both use the Apache Lucene structured query language, but Kibana has more features, a better UI and is less buggy (IMO).

(Kibana 2 was a Ruby based server side application. Kibana 3 is an HTML/Javascript based client side application. Both connect to an ElasticSearch backend).

That’s all the bits, so let’s talk about setting it up.

First off, use a server OS that has access to lots of RPM repos. CentOS and Amazon Linux (for Amazon AWS users) are a safe bet, Ubuntu slightly less so.

For Redis, Elasticsearch and the Logstash programme itself, follow the instructions here:

http://logstash.net/docs/1.2.1/

(We’ll talk about starting services at bootup later)

Re. the above link, don’t bother working through the rest of the tutorial beyond the installation of the software. It demos Logstash using STDIN and STDOUT, which will only serve to confuse you. Just make sure that Redis, Elasticsearch and Logstash are installed and can be executed.

Now, on a separate system, we will set up the Shipper. For this, all you need is the Java Logstash programme and a shipper.conf config file.

Let’s deal with 2 real-life, practical scenarios:

1. You want to send live logs to Logstash
2. You want to send old logs to Logstash

1. Live logs

Construct a shipper.conf file as follows:

input {

   file {
      type => "apache"
      path => [ "/var/log/httpd/access.log" ]
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is a file, located at /var/log/httpd/access.log, and you want to record the content of this file as the type “apache”. You can use wildcards in your specification of the log file, and type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, to the Redis service running on your Logstash server

2. Old logs

Construct a shipper.conf file as follows:

input {

   tcp {
      type => "apache"
      port => 3333
   }

}

output {
   stdout { debug => true debug_format => "json"}
   redis { host => "" data_type => "list" key => "logstash" }
}

What this says:

Your input is whatever data arrives on TCP port 3333, and you want to record that data as the type “apache”. As before, the type can be anything.

You want to output to 2 places: firstly, your terminal screen, and secondly, to the Redis service running on your Logstash server.

That’s all you need to do for now on the Shipper. Don’t run anything yet. Go back to your main Logstash server.

In the docs supplied at the Logstash website, you were given instructions on how to install Redis, Logstash and Elasticsearch, including the Logstash web server. We are not going to use the Logstash web server; we’ll use Kibana instead, so you’ll need to set up Kibana (version 3, not 2; version 2 is a Ruby based server side application).

https://github.com/elasticsearch/kibana/

Onward…

(We’re going to be starting various services in the terminal now, so you will need to open several terminal windows)

Now, start the Redis service on the command line:

./src/redis-server --loglevel verbose

Next, construct an indexer.conf file for the Indexer:

input {
   redis {
      host => "127.0.0.1"
      type => "redis-input"
      # these settings should match the output of the agent
      data_type => "list"
      key => "logstash"

      # We use json_event here since the sender is a logstash agent
      format => "json_event"
   }
}

output {
   stdout { debug => true debug_format => "json"}

   elasticsearch {
      host => "127.0.0.1"
   }
}

This should be self-explanatory: the Indexer is taking input from Redis, and sending it to Elasticsearch.

Now start the Indexer:

java -jar logstash-1.2.1-flatjar.jar agent -f indexer.conf

Next, start Elasticsearch:

./elasticsearch -f

Finally, crank up Kibana.

You should now be able to access Kibana at:

http://yourserveraddress:5601

Now that we have all the elements on the Logstash server installed and running, we can go back to the shipping server and start spitting out some logs.

Regardless of how you’ve set up your shipping server (live logs or old logs), starting the shipping process involves the same command:

java -jar logstash-1.2.1-flatjar.jar agent -f shipper.conf

If you’re shipping live logs, that’s all you will need to do. If you are shipping old logs, you will need to pipe them to the TCP port you opened in your shipper.conf file. Do this in a separate terminal window.

nc localhost 3333 < /var/log/httpd/old_apache.log

Our shipping configuration is set up to output logs both to STDOUT and Redis, so you should see lines of logs appearing on your terminal screen. If the shipper can’t contact Redis, it will tell you so.

Once you see logs being shipped, go back to your Kibana interface and run a search for content.

IMPORTANT: if your shipper is sending old logs, you need to search for logs from a time period that exists in those logs. There is no point in searching for content from the last 15 mins if you are injecting logs from last year.

Hopefully, you’ll see results in the Kibana window. If you want to learn the ins and outs of what Kibana can do, have a look at the Kibana website. If Kibana is reporting errors, retrace the steps above, and ensure that all of the components are running, and that all necessary firewall ports are open.

2 tasks now remain: using Grok and setting up all the components to run as services at startup.

Init scripts for Redis, ElasticSearch and Kibana are easy to find through Google. You’ll need to edit them to ensure they are correctly configured for your environment. Also, for the Kibana init script, ensure you use the kibana-daemon.rb Ruby script rather than the basic kibana.rb version.

Place the various scripts in /etc/init.d, and, again on CentOS, set them up to start at boot using chkconfig, and control them with the service command.

Grok isn’t quite so easy.

The code is available from here:

https://github.com/jordansissel/grok/

You can download a tarball of it from here:

https://github.com/jordansissel/grok/archive/master.zip

Grok has quite a few dependencies, which are listed in its docs. I was able to get all of these on CentOS using yum and the EPEL repos:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/$(uname -i)/epel-release-5-4.noarch.rpm

then

yum install -y gcc gperf make libevent-devel pcre-devel tokyocabinet-devel

Also, after you have compiled grok, make sure you run ldconfig, so that its libraries are shared with Logstash.

How to explain Grok?

In the general development of software over the last 20-30 years, very little thought has gone into the structure of log files, which means we have ended up with lots of different log file structures.

Grok allows you to "re-process" logs from different sources so that you can give them all the same structure. This structure is then saved in Elasticsearch, which makes querying logs from different sources much easier.

Even if you are not processing logs from different sources, Grok is useful, in that you can give the different parts of a line of a log field names, which again makes querying much easier.

Grok "re-processing", or filtering, as it is called, occurs in the same place as your Shipper, so we add the Grok config to the shipper.conf file.

This involves matching the various components in your log format to Grok data types, or patterns as they are referred to in Grok. Probably the easiest way to do this is with this really useful Grok debugger:

http://grokdebug.herokuapp.com/

Cut and paste a line from one of your logs into the input field, and then experiment with the available Grok patterns until you see a nice clean JSON object rendered in the output field below.
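
As a minimal sketch (assuming Apache access logs in the standard combined format, and the Logstash 1.2.x config syntax used above), the resulting filter block in shipper.conf might look like this; COMBINEDAPACHELOG is one of Grok's built-in patterns, and the field names it produces (clientip, response, and so on) are what you will then see in Kibana:

filter {
   grok {
      type => "apache"
      match => [ "message", "%{COMBINEDAPACHELOG}" ]
   }
}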