Nagios is by far one of the best solutions for monitoring just about everything on a server, and its excellent API system means that anything it doesn’t include out of the box can be written in just about any programming language, as long as the program output conforms to the Nagios standard. I’ve personally written dozens of modules for micro-managing network interfaces, disk I/O and so on. I’ve even heard of elaborate schemes for detecting when system load is too high on web servers and launching more Amazon EC2 instances, or checking when load is low enough to terminate EC2 instances, all fully automated.
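To make "conforms to the Nagios standard" concrete, here is a minimal sketch of a check in Python. The convention itself (exit code 0 for OK, 1 for WARNING, 2 for CRITICAL, 3 for UNKNOWN, plus one line of text with optional performance data after a pipe) is the standard plugin contract; the function name evaluate_load and the thresholds of 4.0 and 8.0 are assumptions made up for illustration, not one of the modules mentioned above.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate_load(load, warn=4.0, crit=8.0):
    """Map a 1-minute load average to (exit_code, status_line).

    The warn/crit thresholds are illustrative assumptions.
    """
    if load >= crit:
        code = CRITICAL
    elif load >= warn:
        code = WARNING
    else:
        code = OK
    # Text before the pipe is for humans; text after it is performance data
    # in the form label=value;warn;crit.
    line = "LOAD %s - load1=%.2f | load1=%.2f;%.2f;%.2f" % (
        LABELS[code], load, load, warn, crit)
    return code, line


def main():
    """Return the exit code a real plugin would pass to sys.exit()."""
    try:
        load = os.getloadavg()[0]
    except (AttributeError, OSError):
        # Load average is not available on every platform.
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, line = evaluate_load(load)
    print(line)
    return code


# Deterministic sample call, independent of the machine's real load.
print(evaluate_load(0.5)[1])
```

Saved as a script, a real plugin would finish with sys.exit(main()) so that Nagios sees the exit code; the sample call at the bottom only prints what the status line looks like.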
That said, next to Decaf, Nagios monitoring tools were a strong requirement for my Android arsenal. Waiting for Emails or SMS messages to arrive is one thing, but being able to actively manage system alerts from Nagios is entirely different. In today’s segment, I’ll be looking at two different systems, Nagroid and NagMonDroid. Searching for “nagios” in the Market on a Nexus One with Froyo installed only came up with these two applications. Under Eclair (2.1), I recall seeing several other apps, some of them commercial apps for purchase.
To begin, I will introduce and describe the two apps, and then cause one of my employer’s systems to fall into an alert state so we can see what happens in a real-world scenario. Again, thanks to my employer for being trusting about this sort of thing.
At the time of this writing, Nagroid, by Simon Schoar, had 4.5 stars, and describes itself as such:
Nagroid is an unofficial nagios client for android devices.
This utility will help system administrators who are using nagios to monitor their servers and services.
The latest version is 0.0.7 and weighs in at 151KB. It’s in the 1000-5000 download range, and its 4.5 stars are from 140 ratings, including comments such as ‘Anthony’ saying Nagroid is “much better than having nagios SMS me!”, and helpful feedback such as ‘rince’ saying “Would love the ability to mark a problem as seen so that I stop being bugged once I’ve acknowledged the issue.” The developer web page (something I haven’t listed in previous articles but will in all upcoming segments) is http://code.google.com/nagroid/. Its primary security requirements are Internet access and the ability to prevent the phone from sleeping; its secondary requirements are viewing the network state of the phone, controlling the hardware vibration module, and automatically starting at boot time.
Upon starting, some fake Nagios systems are entered into your device’s memory so that you can see how the system details will be displayed (image 1). As you can see, the output is quite simplistic. The software also adds an icon to the notification bar on my device. Pulling down the notification area shows us a simple alert (image 2). Tapping on any of the alerts will briefly highlight the alert but doesn’t do anything; likewise, a long tap does nothing either.
Tapping on your Submenu button will show you some controls (image 3), such as refreshing the list, seeing information about the Nagroid app in an ‘about’ dialog, viewing a log of events, and a More button which lets you view Help pages at the author’s web site, enable or disable the service when your Android device starts up, open the configuration screens, or jump to “My Nagios”, which is a URL you enter in the configuration screens for your primary Nagios monitoring server (images 4, 5 and 6). The submenu in the configuration screen is similar to that on the opening screen of the app, but slightly rearranged (image 7).
There are 4 configuration sections: Nagios, Polling, Notification, and Misc.
The Nagios section has 5 settings. The first is the URL for your main Nagios monitoring server. This is the URL you will be redirected to any time you select “My Nagios” from any submenu while in the app. The second option is for self-signed SSL certificates, so security warnings can be bypassed (it warns you that man-in-the-middle attacks are possible if enabled). Third is a setting for basic HTTP Authentication, in case you have it enabled on your main Nagios monitoring server, at which point options 4 and 5 (username and password, respectively) are enabled. Once I entered my URL and other credentials, doing a refresh on the main screen showed me nothing at all (image 8), proving the old cliché of “no news is good news.” It’s also interesting to note that the Notification bar icon has changed: when critical events were seen from the test data, the android icon had red arms; when everything is okay, the icon has green arms. Very clever, and details like this really show the app developer has some ingenuity behind his design process.
The Polling section of the configuration only has one option, which is how often you want your Android device to check your Nagios server for event data, with times of “off”, 1 to 5 minutes, 10, 15, 30, 45 and 60 minute intervals, with 10 minutes being the default. I set mine to one minute for some testing which we’ll get to later.
The Notification section has some interesting options worth talking about. The first checkbox tells Nagroid to only show you events that have not already been marked as ‘handled’ within Nagios itself. By default, this is already checked for you; however, if you have subordinates who do primary service monitoring, you may want to turn this off so you can keep an eye on everything about your system, not just events that haven’t been handled by other staff members. The second checkbox will enable the hardware vibration module within your device, and is on by default. The third option, off by default, will hide the icon on the notification bar if everything is okay; I enable this to keep with the “no news is good news” saying, although having the icon present can assure you the service is actually running. The fourth checkbox will enable audible alarms, and the rest of the options in the Notification section allow you to set appropriate sounds/music.
Finally, the Miscellaneous section has two checkboxes which are both checked by default: one for auto-starting the monitoring service when your Android device restarts, and one to actively check for updates to the software. The latter of the two can be de-selected if you use Android 2.2 (Froyo) and flag the app to auto-update when a new version is available.
Real World Setup and Usage
After entering my URL and login credentials, I refreshed the home screen, and selected “Log” from the submenu, which shows a simple list of information. Since I haven’t begun any Nagios events yet, everything in image 9 looks pretty normal: one minute intervals of showing everything is Ok. There is a “Clear” button at the top to clear any log entries.
Selecting “My Nagios” from any submenu opens the Android browser and redirects me to a sample Nagios monitoring server I configured for this review, and shows me that my one host is online, and 11 services on the server are operational.
We’ll Be Back in a Moment
I’ll introduce NagMonDroid first, and then step into some of my testing to show real usage of these two apps.
NagMonDroid was developed by Simon McLaughlin, and currently is rated 2.5 stars by 28 ratings. It too has had 1000-5000 downloads, and describes itself in the Market like this:
Now open source!
NagMonDroid (formally NagiosMonitor) retrieves the current problems from your Nagios install and displays them. It has a variable update frequency and can be set to vibrate on new update.
The latest version is 1.5.3 and weighs in at 146KB. The developer web page is http://code.google.com/p/nagmondroid/ and most comments are from before 2010. The only three comments from 2010 include ‘lowecg2004’ in January saying “not bad but really needs to alert on status change rather than on update. Would also like to see a different icon and sounds for ok/warn/critical.” The other two comments, from ‘Matt’ and ‘Piotr’ in March 2010, say the app does not work, or is slow, or that it doesn’t work with SSL-enabled Nagios servers. Several comments ask the developer to make the software open source, which he appears to have done. The security settings on the app installation ask for Internet access, and to control the hardware vibration in the device.
Upon starting, an alert box is immediately shown, telling me to set the URL of my Nagios monitoring server in the Settings menu (image 11). No offense to the developer, but that’s sort of a given, isn’t it? Once that alert is cleared, we can see a blank screen showing that background checking is disabled by default (image 12), which makes sense when the author hasn’t pre-configured any test data. The submenu on this home screen (image 13) shows icons for Settings, an About button (which simply shows a dialog box showing the author’s name, the author’s personal web site (not the application’s web site as listed in the Market), and the current version number), and buttons to start and stop what I can only guess is the background service to check my Nagios server for information.
The settings menu is shown in images 14 and 15, though the second image only adds the one setting that doesn’t fit on the first screen.
The settings menu is split into two sections called Settings and Preferences.
The Settings section has an area to enter the URL of your Nagios monitoring server, a username and password to log into Nagios (if your server is configured as such), and an update interval with settings of 10, 30, 60 and 90 seconds, 3, 5, 10, 30, 60, 120 minutes, and 3, 6, 12, and 24 hours. Frankly, any sysadmin who would use an app to monitor Nagios instead of getting Email/SMS alerts would be foolish to set anything higher than 60 minutes, unless their systems are so low-volume as to not matter. However, the fact that the author gives sub-minute checking could be extremely vital to sysadmins who need near real-time information from their systems, though the resources and network latency of a 10-second monitoring cycle could be expensive in battery life and bandwidth. It’s also important to note that since the author doesn’t include options for basic HTTP authentication,
The Preferences section includes three checkboxes. The first is to display status messages for services which are not in Warning or Critical modes, and is off by default. The second checkbox, enabled by default, allows you to hide any services that are configured within Nagios to disable notifications. The third option, off by default, would enable the hardware vibration module. Since the author didn’t see fit to include options for audible alerts, this last option should be enabled unless you stare at your Android device 24x7.
Real World Setup and Usage
I entered my Nagios URL, and typed in my basic HTTP authentication details in the Nagios username/password fields, enabled all three of the checkboxes from the Preferences area of the settings, then used the Back button to return to the home screen, pressed the Submenu button, and selected Start. It claims to be connected to my system, at which point I see a new home screen (image 16). However, since the one checkbox under Settings->Preferences is supposed to display all services, even those with notifications turned off within Nagios, the fact that I see a blank screen causes me to believe the app does not work as intended.
I stopped the monitoring service, went back to the Settings menu, erased the Nagios username/password credentials, and changed my URL to use the horribly insecure http://username:password@example.com/ style of authentication. I then restarted the monitoring service, and saw no change.
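For readers wondering why embedding credentials in the URL is no different in substance from filling in the username/password fields, here is a minimal Python sketch of how basic HTTP authentication works. It is not taken from either app's source: the helper names basic_auth_header and split_credentials and the host nagios.example.com are hypothetical. The point is that the header is only Base64-encoded, not encrypted, and that credentials left in a URL can additionally end up in browser history and logs.

```python
import base64
from urllib.parse import urlsplit


def basic_auth_header(username, password):
    """Build the value of the Authorization header for basic HTTP auth."""
    raw = ("%s:%s" % (username, password)).encode("utf-8")
    # Base64 is an encoding, not encryption: anyone who sees the header
    # can decode the password unless the connection itself uses SSL.
    return "Basic " + base64.b64encode(raw).decode("ascii")


def split_credentials(url):
    """Separate user:password@ from a URL.

    Returns (url_without_credentials, username, password).
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host += ":%d" % parts.port
    clean_url = parts._replace(netloc=host).geturl()
    return clean_url, parts.username, parts.password


# Hypothetical server and credentials, for illustration only.
url, user, password = split_credentials("http://user:pw@nagios.example.com/nagios/")
print(url)
print(basic_auth_header(user, password))
```

Either way the same header goes over the wire, so if one style of login fails against a server, the other would normally be expected to fail too.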
At this point, I cannot recommend NagMonDroid for any further testing. Granted, the author has made his software open-source, so perhaps someone could pick it up and improve the application. My advice for Nagroid would be to check this author’s code for the setup involved with enabling notifications for services set within Nagios with no notifications, as this is a handy means to see that everything is working as it should. Also, sub-minute alert checking would be quite handy as well, though cautioning the user about battery drain and bandwidth would be a wise move.
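To put the battery and bandwidth caution into rough numbers, the following Python sketch counts how many polls per day each interval produces. The 50 KB response size is purely an assumption for illustration; the real size of a Nagios status page depends on how many hosts and services it lists.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Assumed size of one status response; not measured from either app.
ASSUMED_RESPONSE_KB = 50


def polls_per_day(interval_seconds):
    """Number of polls in 24 hours at a fixed polling interval."""
    return SECONDS_PER_DAY // interval_seconds


def daily_traffic_kb(interval_seconds, response_kb=ASSUMED_RESPONSE_KB):
    """Approximate download volume per day, ignoring protocol overhead."""
    return polls_per_day(interval_seconds) * response_kb


for label, seconds in [("10 seconds", 10), ("1 minute", 60),
                       ("10 minutes", 600), ("60 minutes", 3600)]:
    print("%-10s %5d polls/day, about %.1f MB/day" % (
        label, polls_per_day(seconds), daily_traffic_kb(seconds) / 1024.0))
```

Under that assumption a 10-second interval means 8,640 wake-ups a day, sixty times more than a 10-minute interval, which is why a warning in the settings screen seems worthwhile.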
Real-World Testing with Nagroid
On a separate system at work, I installed a basic Nagios setup with native services for watching disk space, system load, as well as services to the outside world such as Web, Email and SSH access, as well as a few custom modules I wrote to monitor network interface traffic and disk I/O. To show how Nagroid can alert us, I will begin to fill up a local disk partition with dummy data using a simple Linux command:
dd if=/dev/zero of=/tmp/empty.txt count=6M ibs=1K
This will essentially attempt to create a 6GB file of empty data (roughly 6 million blocks of 1KB each = 6GB). The disk partition which mounts the /tmp/ file system is an 8GB partition, and currently has about 0.5GB of data on it, so adding 6GB more, for a total of 6.5GB (about 81% of the partition), should cross the 80% “Warning” boundary of Nagios.
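The arithmetic behind that expectation can be checked with a short Python sketch. It assumes dd's binary suffixes (K = 1024, M = 1024 × 1024) and ignores file system overhead and reserved blocks, so the real percentage reported by Nagios may differ slightly; the function names are made up for illustration.

```python
KIB = 1024
GIB = 1024 ** 3


def dd_bytes(count, block_size):
    """Total bytes dd copies for a given block count and block size."""
    return count * block_size


def percent_used(used_gb, added_gb, partition_gb):
    """Partition usage in percent after adding more data."""
    return 100.0 * (used_gb + added_gb) / partition_gb


written = dd_bytes(6 * 1024 * 1024, 1 * KIB)  # count=6M, ibs=1K
added_gb = written / GIB                      # size of the dummy file in GB
usage = percent_used(0.5, added_gb, 8.0)      # 0.5GB already on an 8GB partition

WARNING_PERCENT = 80.0  # the "Warning" boundary mentioned in the text
print("file size: %.1f GB, usage: %.2f%%, warning: %s" % (
    added_gb, usage, usage >= WARNING_PERCENT))
```

The margin over the 80% boundary is small, a little over one percentage point, so a noticeably smaller dummy file would not have tripped the alert.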
While this was going on, Nagios actually tripped on a new Critical alert: system load. Unfortunately, nothing showed up in the Nagroid Log screen about it (images 17 and 18), and the red line doesn’t give me any information at all about why a service has gone critical or when, which seems like a serious problem with the reporting of Nagroid. Being able to tap on a line of information should give information about why the service is Warning/Critical, when it was first reported, and when it was last checked.
Once our disk-filling test was complete, we could see it also show up in our list at the Warning level (image 19), but again, nothing showed up in the Log screen (image 20), leading me to believe the Log screen must only show status about Nagroid connecting to our Nagios server.
Deleting the files and leaving the system alone for a few minutes removed all alerts from Nagroid, and everything went green.
Well, it always seems a little awkward when there’s only one app in the Market to do something you deem vital to your daily routine, especially if that app doesn’t do everything you want/need/expect. In the case of Nagroid, while I love the interface, it seems that it’s not even equivalent to the Email or SMS alerts that can be sent by Nagios, which would include information about WHY a service is in a Warning or Critical state. This kind of information is vital, especially knowing WHEN something has happened. This is a major negative on the part of the system; if it can retrieve status information from Nagios, it should be able to retrieve why. Aside from this lack of information, the only thing on my wish list for the app would be the ability to submit information back to Nagios, such as acknowledging an alert or turning a service (or notifications for a service) on or off. Still, it does log us into our Nagios panel from the configuration of Nagroid itself, and anything Nagroid can’t do can be done well enough within the Android web browser, and this extra step to get to Nagios can provide far more information than an Email or SMS message. I would whole-heartedly recommend Nagroid as a must-have application for any sysadmin who uses Nagios to keep an eye on their systems. Now if only someone could make a reasonable/reliable app (Android or otherwise) to *configure* Nagios …
As for NagMonDroid, I’m disappointed that it couldn’t even connect to Nagios, let alone give us any basis of comparison against Nagroid. Since the application is now open sourced, perhaps some of us within the Android community can pick up the project and improve upon it.