Installing Passenger for Puppet on Amazon Linux

Introduction

Puppet ships with a built-in web server called WEBrick. This is fine for testing and for small numbers of nodes, but it will cause problems with larger fleets. For production environments, it is recommended to run Puppet under the Passenger Ruby application server.

Setup

Provision a new server instance.

Install required RPMs. Use Ruby 1.8 rather than Ruby 2.0. Both are shipped with the Amazon Linux AMI at the time of writing, but you need to set up the server to use version 1.8 by default.

sudo yum install -y ruby18 httpd httpd-devel mod_ssl ruby18-devel rubygems18 gcc mlocate
sudo yum install -y gcc-c++ libcurl-devel openssl-devel zlib-devel git

Make Ruby 1.8 the default

sudo alternatives --set ruby /usr/bin/ruby1.8

Set Apache to start at boot

sudo chkconfig httpd on

Install Passenger gem

sudo gem install rack passenger

Update the location DB (you will need this to find files later)

sudo updatedb

Find the path to the installer and add it to your PATH

locate passenger-install-apache2-module
sudo vi /etc/profile.d/puppet.sh
 
export PATH=$PATH:/usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/bin/
 
sudo chmod 755 /etc/profile.d/puppet.sh

Make some Linux swap space (the installer will fail on smaller instances if this doesn’t exist)

sudo dd if=/dev/zero of=/swap bs=1M count=1024
sudo mkswap /swap
sudo chmod 0600 /swap
sudo swapon /swap

At this point, open a separate shell to the server (so that you have two shells). This isn't absolutely essential, but the installer will ask you to update an Apache config file mid-flow, so a second shell makes it easier to follow the instructions exactly.

Next, run the installer, and accept the default options.

sudo /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/bin/passenger-install-apache2-module

The installer will ask you to add some Apache configuration before it completes. Do this in your second shell, adding the config to a file called /etc/httpd/conf.d/puppet.conf. You can ignore the warning about the PATH.

<IfModule mod_passenger.c>
  PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10
  PassengerDefaultRuby /usr/bin/ruby1.8
</IfModule>

Restart Apache after you add this, then press Enter to complete the installation.
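On Amazon Linux the restart is typically:

sudo service httpd restart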

Next, make the necessary directories for the Ruby application

sudo mkdir -p /usr/share/puppet/rack/puppetmasterd
sudo mkdir /usr/share/puppet/rack/puppetmasterd/public /usr/share/puppet/rack/puppetmasterd/tmp

Copy the application config file to the application directory and set the correct permissions

sudo cp /usr/share/puppet/ext/rack/files/config.ru /usr/share/puppet/rack/puppetmasterd/
sudo chown puppet:puppet /usr/share/puppet/rack/puppetmasterd/config.ru

Add the necessary SSL config for the Ruby application to Apache. You can append this to the existing puppet.conf file you created earlier. Note that you need to update this file to specify the correct file names and paths for your Puppet certs (puppet.pem in the example below). The entire file should now look like this:

LoadModule passenger_module /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10/buildout/apache2/mod_passenger.so
<IfModule mod_passenger.c>
  PassengerRoot /usr/lib/ruby/gems/1.8/gems/passenger-5.0.10
  PassengerDefaultRuby /usr/bin/ruby1.8
</IfModule>
# And the passenger performance tuning settings:
# Set this to about 1.5 times the number of CPU cores in your master:
PassengerMaxPoolSize 12
# Recycle master processes after they service 1000 requests
PassengerMaxRequests 1000
# Stop processes if they sit idle for 10 minutes
PassengerPoolIdleTime 600
Listen 8140
<VirtualHost *:8140>
    # Make Apache hand off HTTP requests to Puppet earlier, at the cost of
    # interfering with mod_proxy, mod_rewrite, etc. See note below.
    PassengerHighPerformance On
    SSLEngine On
    # Only allow high security cryptography. Alter if needed for compatibility.
    SSLProtocol ALL -SSLv2 -SSLv3
    SSLCipherSuite EDH+CAMELLIA:EDH+aRSA:EECDH+aRSA+AESGCM:EECDH+aRSA+SHA384:EECDH+aRSA+SHA256:EECDH:+CAMELLIA256:+AES256:+CAMELLIA128:+AES128:+SSLv3:!aNULL:!eNULL:!LOW:!3DES:!MD5:!EXP:!PSK:!DSS:!RC4:!SEED:!IDEA:!ECDSA:kEDH:CAMELLIA256-SHA:AES256-SHA:CAMELLIA128-SHA:AES128-SHA
    SSLHonorCipherOrder     on
    SSLCertificateFile      /var/lib/puppet/ssl/certs/puppet.pem
    SSLCertificateKeyFile   /var/lib/puppet/ssl/private_keys/puppet.pem
    SSLCertificateChainFile /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCACertificateFile    /var/lib/puppet/ssl/ca/ca_crt.pem
    SSLCARevocationFile     /var/lib/puppet/ssl/ca/ca_crl.pem
    #SSLCARevocationCheck   chain
    SSLVerifyClient         optional
    SSLVerifyDepth          1
    SSLOptions              +StdEnvVars +ExportCertData
    # Apache 2.4 introduces the SSLCARevocationCheck directive and sets it to none
    # which effectively disables CRL checking. If you are using Apache 2.4+ you must
    # specify 'SSLCARevocationCheck chain' to actually use the CRL.
    # These request headers are used to pass the client certificate
    # authentication information on to the puppet master process
    RequestHeader set X-SSL-Subject %{SSL_CLIENT_S_DN}e
    RequestHeader set X-Client-DN %{SSL_CLIENT_S_DN}e
    RequestHeader set X-Client-Verify %{SSL_CLIENT_VERIFY}e
    DocumentRoot /usr/share/puppet/rack/puppetmasterd/public
    <Directory /usr/share/puppet/rack/puppetmasterd/>
      Options None
      AllowOverride None
      # Apply the right behavior depending on Apache version.
      <IfVersion < 2.4>
        Order allow,deny
        Allow from all
      </IfVersion>
      <IfVersion >= 2.4>
        Require all granted
      </IfVersion>
    </Directory>
    ErrorLog /var/log/httpd/puppet-server.example.com_ssl_error.log
    CustomLog /var/log/httpd/puppet-server.example.com_ssl_access.log combined
</VirtualHost>

The Ruby application is now ready. Next, install the Puppet master package. Note: do NOT start the puppetmaster service or set it to start at boot.

sudo yum install -y puppet-server

Restart Apache and test using a new puppet agent. You can also import the SSL assets from an existing puppet master into /var/lib/puppet/ssl. This will allow your existing puppet agents to continue to work.
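For example, assuming the master is reachable as "puppet" (as in the certificate paths above; adjust the server name to suit your setup):

sudo service httpd restart
puppet agent --test --server=puppet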

Allowing puppet agents to manage their own certificates

What?

Why would you want to allow a puppet agent to manage the certificates the puppet master holds for that agent? Doesn't that defeat the whole purpose of certificate-based authentication in Puppet?

Well, yes, it does, but there are situations in which this is useful; only ever do it where security is not a concern!

Enter Cloud Computing.

Servers in Cloud Computing environments are like fruit flies. There are millions of them all over the world being born and dying at any given time. In an advanced Cloud configuration they can have lifespans of hours, if not minutes.

As puppet generally relies on fully qualified domain names to match agent requests to stored certificates, this can become a bit of a problem, as server instances that come and go in something like Amazon AWS can sometimes be required to have the same hostname at each launch.

Imagine the following scenario:

You are running automated performance testing, in which you want to test the amount of time it takes to re-stage an instance with a specific hostname and run some tests against it. Your script both launches the instance and expects the instance to contact a puppet master to obtain its application.

In this case, the first time the instance launches, the puppet agent will generate a certificate signing request, send it to the master, get it signed and pull the necessary catalog. The puppet master will then hold a certificate for that agent.

Now, you terminate the instance and re-launch it. The agent presents another signing request, with the same hostname, but this time the puppet master refuses to play, telling you that it already has a certificate for that hostname, and the one you are presenting doesn’t match.

You’re snookered.

Or so you think. The puppet master has a REST API that is disabled by default, but which you can open up to receive HTTP requests to manage certificates. To enable the necessary feature, add the following to your auth.conf file:

path /certificate_status
auth any
method find, save, destroy
allow *

Restart the puppet master when you’ve done this.


sudo service puppetmaster restart

Next, when you start your server instance, include the following script at boot. It doesn't actually matter when it is run, provided it runs after the hostname of the instance has been set.


#!/bin/bash

# Revoke the agent's existing certificate on the master
curl -k -X PUT -H "Content-Type: text/pson" --data '{"desired_state":"revoked"}' https://puppet:8140/production/certificate_status/$HOSTNAME

# Delete the revoked certificate from the master
curl -k -X DELETE -H "Accept: pson" https://puppet:8140/production/certificate_status/$HOSTNAME

# Remove the agent's local copy of its SSL assets
rm -Rf /var/lib/puppet/ssl/*

# Generate a new CSR, have it signed and pull the catalog
puppet agent -t

This will revoke and delete the agent's certificate on the master, delete the agent's local copy of the certificate and re-run the signing process, giving you new certs on the agent and master and allowing the catalog to be applied on the agent.
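If you want to confirm the result on the master, you can list its certificates; a quick check, assuming Puppet 3.x command syntax, where <agent-fqdn> is the hostname of the agent in question:

sudo puppet cert list --all | grep <agent-fqdn>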

You can also pass a script like this as part of the Amazon EC2 process of launching an instance.

aws ec2 run-instances  --user-data file://./pclean.sh

Where pclean.sh is the name of the locally saved script file, saved in your current working directory (otherwise include the absolute path).
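For illustration, a fuller invocation might look like the following; the AMI ID, instance type and key pair name are placeholders for your own values:

# Placeholder values: substitute your own AMI ID, instance type and key pair
aws ec2 run-instances \
  --image-id ami-xxxxxxxx \
  --instance-type m1.small \
  --key-name my-keypair \
  --user-data file://./pclean.sh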

With this in place, each time you launch a new instance, regardless of its hostname, it will revoke any existing cert that has the same hostname, and generate a new one.

Obviously, if you are launching hundreds of instances at the same time, you may have concurrency issues, and some other solution will be required.

Again, this is only a solution for environments where security is not an issue.

Stagduction

Stagduction

(Noun) A web application state in which the service provided is not monitored, not redundant and has not been performance tested, but which is in use by a large community of people as a result of poor planning, poor communication and over-zealous sales people.

Install Ruby for Rails on Amazon Linux

A quick HOWTO on how to install Ruby for Rails on Amazon Linux

Check your Ruby version (bundled in Amazon Linux)


ruby -v
ruby 2.0.0p481 (2014-05-08 revision 45883) [x86_64-linux]

Check your sqlite3 version (bundled with Amazon Linux)


sqlite3 --version
3.7.17 2013-05-20 00:56:22 118a3b35693b134d56ebd780123b7fd6f1497668

Check Rubygems version (bundled with Amazon Linux)


gem -v
2.0.14

Install Rails (this sticks on the command line for a while; be patient. The extra parameters exclude the documentation, which, if installed, can melt the CPU on smaller instances while compiling)


sudo gem install rails --no-ri --no-rdoc

Check Rails installed


rails --version
Rails 4.1.6

Install gcc (always handy to have)


sudo yum install -y gcc

Install ruby and sqlite development packages


sudo yum install -y ruby-devel sqlite-devel

Install node.js (Rails wants a JS interpreter)

 sudo bash
curl -sL https://rpm.nodesource.com/setup | bash -
exit
sudo yum install -y nodejs

Install the sqlite3 and io-console gems


gem install sqlite3 io-console

Make a blank app


mkdir myapp
cd myapp
rails new .

Start it (in the background)


bin/rails s &

Hit it


wget -qO- http://localhost:3000

Debug (Rails console)


bin/rails c

Application monitoring with Nagios and Elasticsearch

As the applications under your control grow, both in number and complexity, it becomes increasingly difficult to rely on predictive monitoring.

Predictive monitoring is monitoring things that you know should be happening. For instance, you know your web server should be accepting HTTP connections on TCP port 80, so you use a monitor to test that HTTP connections are possible on TCP port 80.

In more complex applications, it is harder to predict what may or may not go wrong; similarly, some things can't be monitored in a predictive way, because your monitoring system may not be able to emulate the process that you want to monitor.

For example, let's say your application sends Push messages to a mobile phone application. To monitor this thoroughly, you would have to have a monitor that persistently sends Push messages to a mobile phone, and some way of monitoring that the mobile phone received them.

At this stage, you need to invert your monitoring system, so that it stops asking if things are OK, and instead listens for applications that are telling it that they are not OK.

Using your application log files is one way to do this.

Well-written applications are generally quite vocal when it comes to being unwell, and will always describe an ERROR in their logs if something has gone wrong. What you need to do is find a way of linking your monitoring system to that message, so that it can alert you that something needs to be checked.

This doesn't mean you can dispense with predictive monitoring altogether; what it does mean is that you don't need to rely on predictive monitoring entirely (or in other words, you don't need to be able to see into the future) to keep your applications healthy.

This is how I've implemented log-based monitoring. It was something of a nut to crack, as our logs arise from an array of technologies and adhere to very few standards in terms of layout, logging levels and storage locations.

The first thing you need is a logstash implementation. Logstash comprises a stack of technologies: an agent to ship logs out to a Redis server; a Redis server to queue logs for indexing; a logstash server for creating indices and storing them in elasticsearch; an elasticsearch server to search your indices.

The setup of this stack is beyond the scope of this article; it is well described over on the logstash website, and is reasonably straightforward.

Once you have your logstash stack set up, you can start querying the elasticsearch search api for results. Queries are based on HTTP POST and JSON, and results are output in JSON.

Therefore, to test your logs, you need to issue an HTTP POST query from Nagios, check the results for ERROR strings, and alert accordingly.
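Before reaching for JMeter, the idea can be sketched as a plain Nagios-style shell check. This is only an illustration, assuming Elasticsearch is reachable on localhost:9200 and the JSON query is saved in a file called query.json:

#!/bin/bash
# Minimal sketch: POST a saved query to Elasticsearch and raise a Nagios
# CRITICAL (exit 2) if any returned log entry contains the string ERROR.
RESULT=$(curl -s -XPOST 'http://localhost:9200/_search' -d @query.json)
if echo "$RESULT" | grep -q "ERROR"; then
  echo "CRITICAL: ERROR entries found in recent logs"
  exit 2
fi
echo "OK: no ERROR entries found"
exit 0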

The easiest way to have Nagios send a POST request with a JSON payload to elasticsearch is with the Nagios JMeter plugin, which allows you to create monitors based on your JMeter scripts.

All you need then is a correctly constructed JSON query to send to elasticsearch, which is where things get a bit trickier.

Without going into this in any great detail, formulating a well-constructed JSON query that will parse just the right log indices in elasticsearch isn't easy. I cheated a little here. I am familiar with the Apache Lucene syntax that the Logstash JavaScript client, Kibana, uses, and was able to formulate my query based on this.

Kibana sends encrypted queries to elasticsearch, so you can't pick them out of the HTTP POST/GET variables. Instead, I enabled logging of slow queries on elasticsearch (threshold set to 0s) so that I could see in the elasticsearch logs exactly what queries were being run against elasticsearch. Here's an example:


{
  "size": 100,
  "sort": {
    "@timestamp": {
      "order": "desc"
    }
  },
  "query": {
    "filtered": {
      "query": {
        "query_string": {
          "query": "NOT @source_host:\"uatserver\"",
          "default_field": "_all",
          "default_operator": "OR"
        }
      },
      "filter": {
        "range": {
          "@timestamp": {
            "from": "2014-10-06T11:05:25+00:00",
            "to": "2014-10-06T12:05:25+00:00"
          }
        }
      }
    }
  },
  "from": 0
}

You can test a query like this by sending it straight to your elasticsearch API:


curl -XPOST 'http://localhost:9200/_search' -d '{"size":100,"sort":{"@timestamp":{"order":"desc"}},"query":{"filtered":{"query":{"query_string":{"query":"NOT @source_host:\"uatserver\"","default_field":"_all","default_operator":"OR"}},"filter":{"range":{"@timestamp":{"from":"2014-10-06T11:05:25+00:00","to":"2014-10-06T12:05:25+00:00"}}}}},"from":0}'

This searches a batch of 100 log entries whose @source_host is not "uatserver", over the time window given in the range filter (one hour in this example).

Now that we know what we want to send to elasticsearch, we can construct a simple JMeter script. In this, we simply specify an HTTP POST request containing Body Data of the JSON given above, and include a Response Assertion for the strings we do not want to see in the logs.

We can then use that script in Nagios with the JMeter plugin. If the script finds the ERROR string in the logs, it will generate an alert.

Two things are important here:

1. The alert will only tell you that an error has appeared in the logs, not what that error was.
2. If the error isn't persistent, the monitor will eventually recover.

Clearly, there is a lot of scope for false positives in this, so if your logs are full of tolerable errors (they shouldn't be, really) you are going to have to be more specific about your search strings.

The good news is that if you get this all working, it's very easy to create new monitors. Rather than writing bespoke scripts and working with Nagios plugins, all you need to do is change the queries and the Response Assertions in your JMeter script, and you should be able to monitor anything that is referenced in your application logs.

To assist in some small way, here is a link to a pre-baked JMeter script that includes an Apache Lucene query, and is also set up with the necessary Javascript-based date variables to search over the previous 15 minutes.

Negative matching on multiple ip addresses in SSH

In sshd_config, you can use the Match directive to apply different configuration parameters to SSH connections depending on their characteristics.

In particular, you can match on ip address, both positively and negatively.

You can specify multiple conditions in the match statement. All conditions must be matched before the match configuration is applied.
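For example, this hypothetical block only applies to the user deploy connecting from the 192.0.2.0/24 network; both conditions must hold:

Match Address 192.0.2.0/24 User deploy
ForceCommand /sbin/sample_script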

To negatively match an ip address, that is, to apply configuration if the connection is not from a particular ip address, use the following syntax

Match Address *,!62.29.1.162/32
ForceCommand /sbin/sample_script

To negatively match more than one ip address, that is, to apply configuration if the connection is not from one or more ip addresses, use the following syntax

Match Address *,!62.29.1.162/32,!54.134.118.96/32
ForceCommand /sbin/sample_script

Is Skype an appropriate tool in corporate environments?

This is a question that has plagued me for several years, in that I have never been able to establish a consistent level of Skype quality in a corporate environment, despite having lots of bandwidth and having obtained the consultancy services of CCIE-level network experts.

The answer to the question is ultimately, no.

Let me explain by running through the questions.

1. How does Skype work at a network level?

Skype is a “Peer To Peer” (P2P) application. That means that when 2 people are having a Skype conversation, their computers *should* be directly connected, rather than connected via a 3rd computer. For the sake of comparison, Google Hangouts is not a P2P application. Google Hangout participants connect to each other via Google Conference Servers.

2. Does Skype work with UDP or TCP?

Skype’s preference is for UDP, and when Skype can establish a direct P2P connection using UDP, which is typically the case for residential users, call quality is very good. This is because UDP is a much faster protocol than TCP when used for streaming audio and video.

3. What’s the difference between residential and corporate users?

Residential internet connections are typically allocated a temporary fixed public ip address. This IP gets registered to a Skype user on Skype’s servers, so when someone needs to contact that user, Skype knows where to direct the call, and can use UDP to establish a call between the participating users.

In corporate environments, where there are lots of users using the same internet connection, a single public IP address has to be shared between those users (Port Address Translation). That means that the Skype servers will have registered the same public IP address for all the users in that organisation. This means that Skype is not able to establish a direct UDP P2P connection between a user outside that organisation and a user inside it, and has to use other means to make that connection.

4. What are those other means?

When direct connectivity between clients is not possible, Skype uses a process called “UDP hole punching”. In this mechanism, 2 computers that cannot communicate directly with each other communicate with one or more third party computers that can communicate with both computers.

Connection information is passed between the computers in order to try and establish a direct connection between the 2 computers participating in the Skype call.

If ultimately a direct connection cannot be established, Skype will use the intermediary computers to relay the connection between the 2 computers participating in the conversation.

In Skype terminology, these are known as "relay nodes", which are basically just computers running Skype that have direct UDP P2P capability (typically residential users with good broadband speeds).

From the Skype Administrators Manual:

http://download.skype.com/share/business/guides/skype-it-administrators-guide.pdf

2.2.4 Relays

If a Skype client can’t communicate directly with another client, it will find the appropriate relays for the connection and call traffic. The nodes will then try connecting directly to the relays. They distribute media and signalling information between multiple relays for fault tolerance purposes. The relay nodes forward traffic between the ordinary nodes. Skype communication (IM, voice, video, file transfer) maintains its encryption end-to-end between the two nodes, even with relay nodes inserted.

As with supernodes, most business users are rarely relays, as relays must be reachable directly from the internet. Skype software minimizes disruption to the relay node’s performance by limiting the amount of bandwidth transferred per relay session. 

5. Does that mean that corporate Skype traffic is being relayed via anonymous third party computers?

Yes. The traffic is encrypted, but it is still relayed through other unknown hosts if a direct connection between 2 Skype users is not possible.

6. Is this why performance in corporate environments is sometimes not good?

Yes. If a Skype conversation is dependent on one or more relay nodes, and one of those nodes experiences congestion, this will impact on the quality of the call.

7. Surely, there is some solution to this?

A corporate network can deploy a proxy server, which is directly mapped to a dedicated public ip address. Ideally, this should be a UDP-enabled SOCKS5 server, but a TCP HTTP Proxy server can also be used. If all Skype connections are relayed through this server, Skype does not have to use relay nodes, as Port Address Translation is not in use.

8. So what’s the catch?

The problem with this solution is that it is not generally possible to force the Skype client to use a Proxy Server. When the client is configured to use a Proxy Server, it will only use it if there is no other way to connect to the Internet. So, if you have a direct Internet connection, even one based on Port Address Translation, which impacts on Skype quality, Skype will continue to use this, even if a better solution is available via a Proxy Server.

9. Why would Skype do this?

Skype is owned by Microsoft. Skype has a business product that attaches to Microsoft Active Directory and allows you to force a Proxy connection. So if you invest in a Microsoft network, Microsoft will give you a solution to enable better Skype performance in corporate networks. If you don't want to invest in a Microsoft network, you're stuck, and your only option is to block all outbound Internet access from your network and divert it via your Proxy server.

For a lot of companies, particularly software development companies who depend on 3rd party web services, this is not a practical option.

10. What is the solution?

At this time the primary options for desktop Audio/Video conferencing are either Skype or Google Hangouts.

When Skype can be used in an environment where P2P UDP connectivity is “always on”, it provides a superior audio/video experience to Google Hangouts, which is not P2P, and which communicates via central Google Servers.

Where an environment uses Port Address Translation, Skype performance will depend on the ability of the Skype client to establish connections via relays, which means Skype performance becomes dependent on the resources available to those relays.

In this instance, Google Hangout may be a better choice where consistent quality is required, as quality can be guaranteed by providing sufficient bandwidth between the corporate network and Google.

 

How to use DJ Bernstein’s daemontools

When I first started working in IT, one of the first projects I had to undertake was to set up a QMail server, which first brought me into contact with DJ Bernstein and his various software components.

One of these was daemontools, which is “a collection of tools for managing UNIX services”, and which is most frequently used in connection with Qmail.

The daemontools website is from another time: flat HTML files, no CSS, horizontal rules... it's like visiting some sort of online museum. In fact, the website hasn't changed in over 20 years, and daemontools has been around for that long and hasn't changed much in the interim.

The reason for daemontools' longevity is quite simple: it works. And it works every time, all the time, which isn't something you can say about every software product.

So if you need to run a process on a UNIX/Linux server, and that process needs to stay up for a very long time without interruption, there probably isn't any other software that can offer the same reliability as daemontools.

Here’s a quick HOWTO:

Firstly, install it, exactly as described here:

http://cr.yp.to/daemontools/install.html

If you get an error during the installation about a TLS reference, edit the file src/conf-cc and add

-include /usr/include/errno.h

to the gcc line.

Once installed, check:

1. That you have a /service directory
2. That the command /command/svscanboot exists

If this is the case, daemontools is successfully installed

Now, you can create the process/service that you want daemontools to monitor.

Create a directory under /service, with a name appropriate to your service, eg

/service/growfile

(you can also use a symbolic link for this directory, to point to an existing service installation)

In that directory, create a file called run, and give it 755 permission


touch /service/growfile/run
chmod 755 /service/growfile/run

Next, update the run file with the shell commands necessary to run your service


#!/bin/sh

# Append a line to the file every second, forever
while :
do
  echo "I am getting bigger..." >> /tmp/bigfile.txt
  sleep 1
done

Your service is now set up. To have daemontools monitor it, run the following command:


/command/svscan /service &

(To start this at boot, add /command/svscanboot to /etc/rc.local, if the install hasn’t done this already)
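One way to do that, assuming your distribution runs /etc/rc.local at boot:

sudo sh -c 'echo "/command/svscanboot &" >> /etc/rc.local'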

To see this in action, run ps -ef and have a look at your process list. You will see

1. A process called svscan, which is scanning the /service directory for new processes to monitor
2. A process called “supervise growfile”, which is keeping the job writing to the file alive

Also, run


tail -f /tmp/bigfile.txt

Every 1 second, you should see a new line being appended to this file:


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

To test daemontools, delete /tmp/bigfile.txt


rm -f /tmp/bigfile.txt

It should be gone, right?

No! It's still there!


tail -f /tmp/bigfile.txt


I am getting bigger...
I am getting bigger...
I am getting bigger...
I am getting bigger...

Finally, if you want to actually kill your process, you can use the “svc” command supplied with daemontools:

svc -h /service/yourdaemon: sends HUP
svc -t /service/yourdaemon: sends TERM, and automatically restarts the daemon after it dies
svc -d /service/yourdaemon: sends TERM, and leaves the service down
svc -u /service/yourdaemon: brings the service back up
svc -o /service/yourdaemon: runs the service once
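For example, to take the growfile service from earlier down and then bring it back up:

svc -d /service/growfile
svc -u /service/growfile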

This is the basic functionality of daemontools. There is a lot more on the website.

Managing Logstash with the Redis Client

Users of Logstash will be familiar with the stack of technologies required to implement a logstash solution:

The client that ships the logs to Redis

Redis which queues up the files for indexing

Logstash which creates the indices

Elasticsearch which stores the indices

Kibana which queries Elasticsearch

When you're dealing with multiple components like this, things will inevitably go wrong.

For instance, say for some reason your client stops, and then you start it again 4 days later; now the stack has to process 4 days of old log files before letting you search the latest ones.

One of the best ways to deal with this is to set up the Redis queue ("list" is the correct term) so that you can selectively remove entries from the list, allowing chunks of old logs to be skipped.

Take a look at this config from the logstash shipper:


output {
  stdout { debug => false debug_format => "json"}
  redis { host => "172.32.1.172" data_type => "channel" key => "logstash-%{@type}-%{+yyyy.MM.dd.HH}" }
}

You'll see here that I've modified the default key value for logstash by adding the log file type and date stamp to the key. The default key value in the Logstash documentation is "logstash", which means every entry goes into Redis with the same key value.

You will also notice that I have changed the data_type from the default "list" to "channel", more of which in a moment.

To see what this means, you should now log in to your Redis server with the standard redis-cli command line interface.

To list all available keys, just type


KEYS *logstash*

and you will get something like


redis 127.0.0.1:6379> keys *logstash*
 1) "logstash-nodelog-2014.03.07.17"
 2) "logstash-javalog-2014.03.07.15"
 3) "logstash-applog-2014.03.07.14"
 4) "logstash-catalina-2014.03.08.23"
 5) "logstash-applog-2014.03.08.23"
 6) "logstash-catalina-2014.03.07.15"
 7) "logstash-nodelog-2014.03.07.14"
 8) "logstash-javalog-2014.03.07.14"
 9) "logstash-nodelog-2014.03.08.23"
10) "logstash-applog-2014.03.07.15"
11) "logstash-javalog-2014.03.08.23"

This shows that your log data are now stored in Redis according to log file type, date and hour, rather than all under the single default "logstash" key. In other words, there are now multiple keys, rather than just the default "logstash" key.

You also need to change the indexer configuration at this point, so that it looks for multiple keys in Redis rather than just the “logstash” key


input {
  redis {
    host => "127.0.0.1"
    type => "redis-input"
    # these settings should match the output of the agent
    data_type => "pattern_channel"
    key => "logstash*"

    # We use json_event here since the sender is a logstash agent
    format => "json_event"
  }
}

For data_type here, I am using “pattern_channel”, which means the indexer will ingest the data from any key where the key matches the pattern “logstash*”.

If you don’t change this, and you have changed your shipper, none of your data will get to Elasticsearch.

Using Redis in this way also requires a change to the default Redis configuration. When Logstash keys are stored in Redis in a List format, the List is constantly popped by the Logstash indexer, so it remains in a steady state in terms of memory usage.

When the Logstash indexer pulls data from a Redis channel, the data isn't removed from Redis, so memory usage grows.

To deal with this, you need to set up memory management in Redis, namely:

maxmemory 500mb
maxmemory-policy allkeys-lru

What this means is that when Redis reaches a limit of 500mb of used memory, it will drop keys according to a "Least Recently Used" (LRU) algorithm. The default policy is volatile-lru, which depends on the TTL value of the key, but as Logstash doesn't set a TTL on Redis keys, you need to use the allkeys-lru alternative instead.
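You can also apply these settings at runtime from redis-cli to test them before committing them to redis.conf (note that older Redis versions may require the maxmemory value in bytes):

redis 127.0.0.1:6379> config set maxmemory 500mb
redis 127.0.0.1:6379> config set maxmemory-policy allkeys-lru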

Now, if you want to remove a particular log file type from a particular date and time from the Logstash process, you can simply delete that data from Redis


DEL logstash-javalog-2014.03.08.23

You can also check the length of individual lists by using LLEN, to give you an idea of which logs from which dates and times will take the longest to process


redis 127.0.0.1:6379> llen logstash-javalog-2014.03.08.23
(integer) 385460

You can also check your memory consumption in Redis with:

redis 127.0.0.1:6379>info

Command line tool for checking status of instances in Amazon EC2

I manage between 10 and 15 different Amazon AWS accounts for different companies.

When I needed to find out information about a particular instance, it was a pain to have to log into the web interface each time. Amazon do provide an API that allows you to query data about instances, but to use it, you need to store an Access Key and Secret on your local computer, which isn't very safe when you're dealing with multiple accounts.

To overcome this, I patched together Tim Kay's excellent aws tool with GPG and a little PHP, to create a tool that allows you to query the status of all instances in a specific region of an Amazon EC2 account, using access credentials that are locally encrypted, so that storing them locally isn't an issue.

Output from the tool is presented on a line by line basis, so you can use grep to filter the results.

Sample output:

ec2sitrep.sh aws.account1 us-east-1

"logs-use"  running  m1.medium  us-east-1a  i-b344b7cb  172.32.1.172  59.34.113.133
"adb2-d-use"  running  m1.small  us-east-1d  i-07d3e963  172.32.3.54  67.45.139.235
"pms-a-use"  running  m1.medium  us-east-1a  i-90852ced  172.32.1.27  67.45.108.146
"s2-sc2-d-use"  running  m1.medium  us-east-1d  i-3d40b442  172.32.3.26  67.45.175.244
"ks2-sc3-d-use"  running  m1.small  us-east-1d  i-ed2ed492  172.32.3.184  67.45.163.141
"ks1-sc3-c-use"  running  m1.small  us-east-1c  i-6efb9612  172.32.2.195  67.45.159.221
"adb1-c-use"  running  m1.small  us-east-1c  i-98cf44e4  172.32.2.221  67.45.139.196
"s1-sc1-c-use"  running  m1.medium  us-east-1c  i-956a76e8  172.32.2.96  67.45.36.97
"sms2-d-use"  running  m1.medium  us-east-1d  i-a86ef686  172.32.3.102  34.90.28.159
"uatpms-a-use"  running  m1.small  us-east-1a  i-b8cf5399  172.32.1.25  34.90.163.110
"uatks1-sc3-c-use"  running  t1.micro  us-east-1c  i-de336dfe  172.32.2.26  34.90.99.226
"uats1-sc1-c"  running  m1.medium  us-east-1c  i-35396715  172.32.2.217  34.90.183.23
"uatadb1-c-use"  running  t1.micro  us-east-1c  i-4d316f6d  172.32.2.29  34.90.109.171
"sms1-c-use"  running  m1.medium  us-east-1c  i-31b29611  172.32.2.163  34.90.100.25

(Note that public ips have been changed in this example)
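Because the output is one line per instance, you can filter it with standard tools; for example, to list only the m1.medium instances in that account and region:

ec2sitrep.sh aws.account1 us-east-1 | grep m1.medium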

You can obtain the tool from Bitbucket:

https://bitbucket.org/garrethmcdaid/amazon-ec2-sitrep/