Monitoring whether or not a Docker container is alive on a remote host should be fairly easy, right?
The standard approach to this is to install a suitable NRPE script on the remote host, and call it from your Nagios server via the NRPE TCP daemon on that host. This script is a good example of same, and we’ll refer to it in the rest of the article.
This generally works fine when you’re doing innocuous things like checking free disk space or whether a certain process is running. Checking a Docker container is a little bit harder, because the docker inspect command that the script relies on can only be run as root, whereas the NRPE service on the remote host runs as a non-privileged user (usually called nagios).
As such, when you test your NRPE call from the Nagios server, like so:
/usr/lib64/nagios/plugins/check_nrpe -H dockerhost.yourdomain.com -c check_docker_container1
You will see a response like:
NRPE: Unable to read output
UNKNOWN - container1 does not exist.
You get this response because the nagios user cannot execute the docker control command.
You could get around this by running NRPE on the remote host as the root user, but that really isn’t a good idea, and you should never do this.
A better play (if you are confident that your Nagios setup is secure) is to extend controlled privileges to the nagios user via sudo. You can create the following file in /etc/sudoers.d/docker to achieve this:
nagios ALL=(ALL:ALL) NOPASSWD: /usr/bin/docker inspect *
nagios ALL=(ALL:ALL) NOPASSWD: /usr/lib64/nagios/plugins/check-docker-container.sh *
This allows the nagios user to run both the wrapper script around the docker inspect command and the docker control command itself, without requiring a password. Note, only inspect permission is granted. Obviously, we don’t want to give nagios permission to actually manipulate containers.
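The wrapper script itself is not reproduced here. As a rough idea of what a plugin like this does, the following is a minimal sketch, not the actual check-docker-container.sh. The function name check_container and the DOCKER variable are made up for illustration. It assumes the usual Nagios plugin exit codes and uses docker inspect with a format template to read the container's running state.

```shell
#!/bin/sh
# Hypothetical sketch of a check-docker-container.sh style plugin.
# Nagios plugin exit codes: 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

# DOCKER is an assumption of this sketch; it lets the binary path be overridden.
DOCKER="${DOCKER:-/usr/bin/docker}"

check_container() {
    container="$1"
    if [ -z "$container" ]; then
        echo "UNKNOWN - no container name given."
        return 3
    fi

    # docker inspect exits non-zero when the container does not exist,
    # and also when the caller is not allowed to talk to the Docker daemon.
    if ! running=$("$DOCKER" inspect --format '{{.State.Running}}' "$container" 2>/dev/null); then
        echo "UNKNOWN - $container does not exist."
        return 3
    fi

    if [ "$running" = "true" ]; then
        echo "OK - $container is running."
        return 0
    fi

    echo "CRITICAL - $container is not running."
    return 2
}

# When saved as a standalone script, finish with:
#   check_container "$1"; exit $?
```

A script shaped like this cannot tell a missing container from a permission failure, which is one plausible reason an unprivileged nagios user sees "does not exist" for a container that is in fact running.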
In addition to this, we must make provision for NRPE to run the command using sudo when called via the NRPE TCP daemon. So, in nrpe.cfg, the command definition must call the script through sudo rather than directly:
command[check_docker_container1]=sudo /usr/lib64/nagios/plugins/check-docker-container.sh container1
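Before going back to the Nagios server, it can be worth confirming on the Docker host that the sudo rules behave as intended. The helper below is a hypothetical sketch (the function name verify_nagios_sudo is made up), meant to be run as root. Its defaults are the paths and container name used in this article, so adjust them to your own setup.

```shell
# Hypothetical pre-flight helper; run it as root on the Docker host.
verify_nagios_sudo() {
    sudoers_file="${1:-/etc/sudoers.d/docker}"
    plugin="${2:-/usr/lib64/nagios/plugins/check-docker-container.sh}"
    container="${3:-container1}"

    # 1. Syntax-check the drop-in file before relying on it.
    visudo -c -f "$sudoers_file" || return 1

    # 2. Run the plugin the way NRPE will: as nagios, via passwordless sudo.
    #    -n makes sudo fail instead of prompting if a password would be needed.
    sudo -u nagios sudo -n "$plugin" "$container"
}
```

Calling verify_nagios_sudo with no arguments should print the plugin's normal status line. If it does, restart the NRPE service so it picks up the nrpe.cfg change, then repeat the check_nrpe call from the Nagios server.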