Nagios reporting
What is Nagios reporting?
Nagios is a common system administration tool for running system health checks. A central node continually polls a set of checks, and any of these checks can be a script. The scripts must adhere to a simple protocol: the exit code they exit with reports the state of the check.
- 0 - everything is fine
- 1 - warning (don’t get out of bed)
- 2 - critical (things are on fire!)
The script can also emit a message by printing a single line on stdout, followed by a newline. The convention is to prefix the message according to the exit code. For example:
$ ./is-everything-ok.sh
OK - Everything is fine!
$ echo $?
0
$ ./is-everything-ok.sh
WARNING - This could get very bad!
$ echo $?
1
$ ./is-everything-ok.sh
CRITICAL - Call the fire department!
$ echo $?
2
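The protocol above can be sketched as a small Python check. This is only an illustration and not part of crontabber: the function names, the disk usage check, and the thresholds are all hypothetical.

```python
import shutil

# Exit codes from the protocol described above.
OK, WARNING, CRITICAL = 0, 1, 2
PREFIXES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def classify(percent_used, warn_at=80.0, crit_at=95.0):
    """Map a usage percentage to an exit code (thresholds are made up)."""
    if percent_used >= crit_at:
        return CRITICAL
    if percent_used >= warn_at:
        return WARNING
    return OK


def format_line(code, message):
    """Build the single status line, prefixed according to the exit code."""
    return "%s - %s" % (PREFIXES[code], message)


def main(path="/"):
    """Print one status line on stdout and return the matching exit code."""
    usage = shutil.disk_usage(path)
    percent_used = 100.0 * usage.used / usage.total
    code = classify(percent_used)
    print(format_line(code, "%s is %.1f%% full" % (path, percent_used)))
    return code
```

A real script would finish by passing the return value of main() to sys.exit(), so that the returned code becomes the exit code of the process.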
How crontabber can be a Nagios script
This is simple: pass the --nagios parameter, like this:
crontabber --admin.conf=crontabber.ini --nagios
The rules that determine the exit code are fairly simple, but to follow them you need to understand a bit more about Backfillable Jobs.
If no application in your configuration has errored in the last run, the exit code is simply 0 (“OK”).
If any of your applications that is not a backfillable job has errored, the exit code is 2 (“CRITICAL”).
If a backfillable job has errored only once, the exit code is 1 (“WARNING”).
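The three rules can be sketched as a small function. This is a hedged illustration, not crontabber's actual implementation: the job dictionaries and their keys ("error_count", "backfillable") are hypothetical, and the text above does not say what happens when a backfillable job has errored more than once, so the sketch assumes that case is critical.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def nagios_exit_code(jobs):
    """Return the exit code for an iterable of job dicts.

    Each dict uses hypothetical keys:
      "error_count"  -- how many times the job has errored (0 if the
                        last run succeeded)
      "backfillable" -- True if the job is a backfillable job
    """
    worst = OK
    for job in jobs:
        if job["error_count"] == 0:
            continue  # this job did not error in its last run
        if job["backfillable"] and job["error_count"] == 1:
            # A backfillable job that has errored only once is a warning.
            worst = max(worst, WARNING)
        else:
            # Any non-backfillable failure is critical. Assumption: a
            # backfillable job with more than one error is critical too.
            worst = max(worst, CRITICAL)
    return worst
```

Taking the maximum means a single critical failure decides the result even if other jobs only produce warnings.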
If the exit code is 1 or 2, the message printed on stdout will look something like this:
CRITICAL - my-first-app (MyFirstApp) | <type 'exceptions.OSError'> | [Errno 13] Permission denied: '/etc'
If multiple apps have failed, their messages (like the example above) are concatenated with a ; character, so the output is still one long line.
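If you need the individual messages back, for example in a wrapper script, the line can be split again. The sketch below is a naive illustration: parse_status_line is a hypothetical helper, not something crontabber provides, and it will split incorrectly if an error message itself contains a ; character.

```python
def parse_status_line(line):
    """Split a status line into its prefix and the individual messages."""
    # Everything before the first " - " is the prefix (e.g. "CRITICAL").
    prefix, _, rest = line.strip().partition(" - ")
    # Messages are separated by ";". Surrounding whitespace is dropped.
    messages = [part.strip() for part in rest.split(";") if part.strip()]
    return prefix, messages


# The first message is the example above. The second one is made up.
line = (
    "CRITICAL - my-first-app (MyFirstApp) | <type 'exceptions.OSError'> "
    "| [Errno 13] Permission denied: '/etc'; "
    "my-second-app (MySecondApp) | <type 'exceptions.ValueError'> | bad value"
)
prefix, messages = parse_status_line(line)
```

Here prefix holds the status word and messages holds one entry per failed app.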