Any experience with either monitoring system?

UnknownSky · Oct 1, 2013

Hello everyone!

I have come to you today to request some information.

Currently the company I am working for is looking for a systems monitoring product.
We have looked into two specific products, and I would like those with experience with the software to post their views/opinions.
We would be monitoring mostly network devices and a small number of servers/applications.
Those who have not worked with either product before are also welcome to post thoughts/ideas.
Nagios
http://www.nagios.org
Opsview
http://www.opsview.com

If you have experience with either product, my basic questions are as follows (by the way, I already know that Opsview is built on Nagios, so we can nix those posts right now):
Dependability?
High/low maintenance?
Ease of use?
Has it proven its cost?
Overall satisfaction?

If this is the wrong place to post this topic, feel free to correct me ^_______^

I hope everyone has a great week and thank you ahead of time for the reference/information.

DelJo63 · Oct 2, 2013

Well now, this feels like a class reunion. I worked on a product like this (CCC), which was deployed across heterogeneous systems (i.e., mainframes, Unix systems and PCs). I'll share some things for you to consider.

"Proven its cost" (i.e., return on investment) is a great question, but very difficult to evaluate. I suggest you track how many people are needed to install, configure and maintain the product.

My experience was that the install was arduous, because each monitored node (system) needed hands-on access to install and configure the agents (the software that gathers information and reports it to the central monitoring site).

These products frequently shift the customer's focus to SLAs (service level agreements) when they may only want some early warning on specific problems.

Make sure you understand your operational issues and how they interact with systems monitoring. We fought with an issue for some time where the agent failed to report every night at just about midnight. It turned out that the Unix server was put into single-user mode (thus killing all running programs) so that backups could be taken. Clearly, there are Planned Outages during which the agent cannot report, and the monitor will raise a false alert. The monitoring site must know about such operational events.
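One way to let the monitoring site "know" about planned outages is to check each failure against a list of maintenance windows before alerting. Here is a minimal sketch in PHP; the function names and the 23:45 to 00:30 backup window are hypothetical examples, and the only subtle part is handling a window that wraps past midnight.

```php
<?php
// Sketch: suppress alerts that fall inside a planned outage, such as a
// nightly backup that takes a Unix host down to single-user mode.
// Window times are "HH:MM" strings; the example window is hypothetical.

function minutesOfDay(string $hhmm): int
{
    [$h, $m] = array_map('intval', explode(':', $hhmm));
    return $h * 60 + $m;
}

// True if $now (minutes since midnight) is inside [start, end).
// Handles windows that wrap past midnight, e.g. 23:45-00:30.
function inMaintenanceWindow(int $now, string $start, string $end): bool
{
    $s = minutesOfDay($start);
    $e = minutesOfDay($end);
    if ($s <= $e) {
        return $now >= $s && $now < $e;
    }
    return $now >= $s || $now < $e;   // window crosses midnight
}

// Decide what to do with a failed check at a given time of day.
function handleFailure(int $now, array $windows): string
{
    foreach ($windows as [$start, $end]) {
        if (inMaintenanceWindow($now, $start, $end)) {
            return 'suppressed (planned outage)';
        }
    }
    return 'ALERT';
}

$backupWindows = [['23:45', '00:30']];   // hypothetical nightly backup window
echo handleFailure(minutesOfDay('23:58'), $backupWindows), "\n"; // prints: suppressed (planned outage)
echo handleFailure(minutesOfDay('03:00'), $backupWindows), "\n"; // prints: ALERT
```

A real product would also want the window to be per host and editable from the monitoring station, but the time test is the same.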

In such cases, you also need to make sure the agent(s) are restarted automatically at boot time - - yeah, it's obvious, but frequently overlooked :sigh:

Beware of over-implementation. You may need to monitor only a few pieces of data, but the product is so rich that you start sampling far more than necessary. Our product could even sample paging rates, which have nothing to do with availability or SLAs; they are only one atom of system performance.

I'd suggest that PING is not a good measure of a system's availability. The NIC can reply when in fact the system is stalled or swamped with too much work. The availability of a system (particularly a server) is proven by its ability to receive a request, process it and reply - - i.e., programs are actually running. SNMP (see Wikipedia) is very useful for devices (routers and printers). For monitored systems (like servers), the SNMP MIB object hrSystemProcesses (1.3.6.1.2.1.25.1.6) reports the number of processes currently running on that host. The beauty of SNMP is that it requires NO install and only needs to be configured (btw: don't use the "public" community string, and try to use only SNMPv3).
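As a rough illustration of reading hrSystemProcesses over SNMPv3, here is a sketch using PHP's SNMP extension (which must be installed separately). It assumes the host's SNMP agent has a v3 user configured with SHA authentication and AES privacy; the user name and pass phrases are placeholders, and `processCount`/`parseSnmpInt` are hypothetical helper names.

```php
<?php
// Sketch: read hrSystemProcesses (1.3.6.1.2.1.25.1.6) over SNMPv3 using
// PHP's SNMP extension. The user name and pass phrases are placeholders;
// the agent on the host must be configured with a matching v3 user.

const HR_SYSTEM_PROCESSES = '1.3.6.1.2.1.25.1.6.0'; // ".0" selects the scalar instance

// Turn an SNMP value such as "Gauge32: 153" (or a plain "153") into an int.
function parseSnmpInt(string $value): ?int
{
    if (preg_match('/(-?\d+)\s*$/', trim($value), $m) !== 1) {
        return null;   // e.g. "No Such Instance"
    }
    return (int) $m[1];
}

// Returns the process count, or null if the host did not answer.
function processCount(string $host, string $user, string $authPass, string $privPass): ?int
{
    // timeout is in microseconds: 2 s, 1 retry
    $session = new SNMP(SNMP::VERSION_3, $host, $user, 2000000, 1);
    $session->setSecurity('authPriv', 'SHA', $authPass, 'AES', $privPass);
    $value = @$session->get(HR_SYSTEM_PROCESSES);   // false on failure
    $session->close();
    return $value === false ? null : parseSnmpInt((string) $value);
}
```

No answer at all, or a count far below what the host normally runs, is worth an alert even while the same host still answers ping.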

Make sure that when an agent raises an alert (with any and all types of sampling techniques), the monitor status is reset automatically when the alert condition clears itself. For example, a printer will alert when it is out of paper and, when reloaded, will report the condition cleared - - make sure the monitoring system sees and responds to both conditions.
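The "respond to both conditions" rule boils down to tracking the last known state per device and reporting only the transitions. A minimal PHP sketch follows; `AlertTracker` is a hypothetical class name and "printer-2F" an example device.

```php
<?php
// Sketch: turn a stream of raw check results into ALERT and CLEAR events,
// so the monitor reacts both when a condition appears (printer out of paper)
// and when it clears itself (paper reloaded). Device names are examples.

final class AlertTracker
{
    /** @var array<string, bool> last known state per device: true = alerting */
    private array $alerting = [];

    // Feed one check result; returns 'ALERT', 'CLEAR', or null if unchanged.
    public function update(string $device, bool $ok): ?string
    {
        $wasAlerting = $this->alerting[$device] ?? false;
        $this->alerting[$device] = !$ok;

        if (!$ok && !$wasAlerting) {
            return 'ALERT';   // new problem: notify once
        }
        if ($ok && $wasAlerting) {
            return 'CLEAR';   // condition went away: reset the status
        }
        return null;          // no change: stay quiet
    }
}

$t = new AlertTracker();
foreach ([true, false, false, true, true] as $ok) {
    echo $t->update('printer-2F', $ok) ?? '-', "\n";
}
// prints, one per line: - ALERT - CLEAR -
```

Reporting only transitions also keeps a device that stays broken overnight from flooding the console with repeated alerts.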

Some servers may need to be monitored at the application level. While the concept is easy, the implementation becomes customized (specific to your company and to the server itself). The concept is to have the server respond to a real request, not just a ping as described above. If SNMP is available, fine, but it's better to see the server respond in its own protocol, e.g. HTTP answering a CGI request. That way you are assured that the server is performing normally. Other servers to consider are DNS, email (grr, Exchange) and LDAP (Active Directory). Btw: you don't need to sample these kinds of servers every 30 seconds - - an interval of a few minutes is sufficient.
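For the HTTP case, a check that demands a 200 status and some expected text in the body is a reasonable starting point. This is a sketch only: `httpCheck` and `statusCode` are hypothetical names, the URL and expected text are placeholders, and redirects are deliberately treated as failures to keep the status test simple.

```php
<?php
// Sketch: check a server by asking it to do real work over its own protocol
// (an HTTP request) rather than relying on ping. The URL and the expected
// text are placeholders for whatever page or CGI your server should serve.

// Extract the numeric status from a status line such as "HTTP/1.1 200 OK".
function statusCode(string $statusLine): ?int
{
    return preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $statusLine, $m) === 1 ? (int) $m[1] : null;
}

// Returns true only if the server answered 200 and the body contains $expect.
function httpCheck(string $url, string $expect, int $timeoutSec = 10): bool
{
    $ctx = stream_context_create(['http' => [
        'method'          => 'GET',
        'timeout'         => $timeoutSec,
        'follow_location' => 0,      // a redirect counts as a failure here
        'ignore_errors'   => true,   // return the body even on 4xx/5xx
    ]]);
    $body = @file_get_contents($url, false, $ctx);
    if ($body === false || empty($http_response_header)) {
        return false;                // no connection or no reply at all
    }
    return statusCode($http_response_header[0]) === 200
        && strpos($body, $expect) !== false;
}

// Run every few minutes (e.g. from cron), not every 30 seconds:
// var_dump(httpCheck('http://domain.name/cgi-bin/httpBeacon.php', 'WEBSERVER:'));
```

The same pattern (send a real request, check the real answer) carries over to DNS or LDAP with a protocol-specific client in place of the HTTP call.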

Configurations: the software should be configurable from the monitoring station and, by all means, make sure you can export & import the settings. Rebuilding your monitoring triggers from scratch is intolerable and will quickly lead to your dissatisfaction.
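To show what "export & import" can look like at its simplest, here is a sketch that keeps the trigger definitions in a JSON file and checks that a round trip returns exactly what was saved. The field names (host, check, interval, threshold) and the two sample triggers are illustrative only, not any product's format.

```php
<?php
// Sketch: keep monitoring triggers in a plain JSON file so they can be
// exported, versioned and re-imported instead of rebuilt by hand.
// The field names (host, check, interval, threshold) are illustrative.

function exportTriggers(array $triggers, string $path): void
{
    $json = json_encode($triggers, JSON_PRETTY_PRINT | JSON_THROW_ON_ERROR);
    if (file_put_contents($path, $json) === false) {
        throw new RuntimeException("cannot write $path");
    }
}

function importTriggers(string $path): array
{
    $json = file_get_contents($path);
    if ($json === false) {
        throw new RuntimeException("cannot read $path");
    }
    return json_decode($json, true, 512, JSON_THROW_ON_ERROR);
}

$triggers = [
    ['host' => '192.168.0.5', 'check' => 'http', 'interval' => 180],
    ['host' => 'core-switch', 'check' => 'snmp', 'interval' => 300, 'threshold' => 90],
];
$file = sys_get_temp_dir() . '/triggers.json';
exportTriggers($triggers, $file);
var_dump(importTriggers($file) === $triggers); // bool(true)
```

Whatever product you pick, a text-based export like this also lets you keep the settings under version control.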

DelJo63 · Oct 2, 2013

Here's a simplistic httpBeacon implemented in PHP.

Save the file to your webserver's /cgi-bin directory and rename it to strip off the trailing .txt:
httpBeacon.php.txt becomes httpBeacon.php

A monitoring system can then be configured to access

http://domain.name/cgi-bin/httpBeacon.php

and expect a one line string response like:

WEBSERVER:hostname IP:192.168.0.5

which is a set of tag:value pairs separated by a tab (\t).
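The attached script isn't reproduced here, but a beacon with the same output format could be sketched like this, together with the matching parser a monitor would use on the reply. Both functions are hypothetical, and `$_SERVER['SERVER_ADDR']` assumes a webserver (such as Apache) that sets it.

```php
<?php
// Sketch of a beacon like the one described above (not the attached file):
// reply with one line of tab-separated tag:value pairs, e.g.
// "WEBSERVER:hostname<TAB>IP:192.168.0.5".

function beaconLine(string $host, string $ip): string
{
    return "WEBSERVER:{$host}\tIP:{$ip}";
}

// Monitor side: split the line back into ['WEBSERVER' => ..., 'IP' => ...].
function parseBeacon(string $line): array
{
    $pairs = [];
    foreach (explode("\t", trim($line)) as $field) {
        [$tag, $value] = array_pad(explode(':', $field, 2), 2, '');
        $pairs[$tag] = $value;
    }
    return $pairs;
}

if (PHP_SAPI !== 'cli') {
    // Running under the webserver: answer the poll as plain text.
    header('Content-Type: text/plain');
    echo beaconLine(gethostname() ?: 'unknown', $_SERVER['SERVER_ADDR'] ?? 'unknown'), "\n";
}
```

Splitting each field on the first colon only keeps the parser safe if a value ever contains a colon (an IPv6 address, for instance).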

DelJo63 · Oct 3, 2013

A brute-force PHP command-line monitor would be invoked as

php httpBeaconTest.php

and could be structured like this:

Code:

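<?php
// httpBeaconTest.php: brute-force beacon poller, run as "php httpBeaconTest.php"
$host     = '192.168.0.5';             // webserver to test (example address)
$port     = 80;
$path     = '/cgi-bin/httpBeacon.php';
$interval = 60;                        // seconds to wait between polls

while (true) {                         // polling loop
    // HTTP/1.0 closes the connection after each reply, so open a socket per poll
    $fp = @fsockopen($host, $port, $errno, $errstr, 10);
    if ($fp === false) {
        // could be a configuration error or a firewall issue; wait and retry
        echo date('c'), " open failed: $errstr ($errno)\n";
        sleep($interval);
        continue;
    }
    stream_set_timeout($fp, 10);
    fwrite($fp, "GET $path HTTP/1.0\r\nHost: $host\r\n\r\n");
    $reply = stream_get_contents($fp);
    $timedOut = stream_get_meta_data($fp)['timed_out'];
    fclose($fp);

    if ($reply === false || $reply === '' || $timedOut) {
        echo date('c'), " read error or timeout\n";
    } else {
        // report the status line and the one-line beacon body
        [$head, $body] = array_pad(explode("\r\n\r\n", $reply, 2), 2, '');
        echo date('c'), ' ', strtok($head, "\r\n"), ' | ', trim($body), "\n";
    }
    sleep($interval);                  // wait interval, then poll again
}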

St1ckM4n · Oct 3, 2013

Best solution - implement a Microsoft solution... whose name I forget.

The latest can even monitor/manage Mac machines, Android, etc.

DelJo63 · Oct 3, 2013

St1ckM4n said:
Best solution - implement a Microsoft Solution....

hmm - - LOL (n)

St1ckM4n · Oct 4, 2013

Well, it's true. This way you have the same product looking after everything, rather than a mish-mash of different products which don't integrate properly.

DelJo63 · Oct 4, 2013

Solutions for heterogeneous systems do integrate - - and well, too.

MS stuff works well as long as you're working only with MS.
As soon as there's any other platform involved - - lots of luck.

As noted in my first reply, I've got years on these types of systems:

deployed across heterogeneous systems (i.e., mainframes, Unix systems and PCs)

DelJo63 · Oct 10, 2013

Here's a complete HTTP availability monitor (command line, not GUI)

Contents:

httpBeaconClient-sessionDoc.txt (sample usage results)
httpBeacon.Client.pl (the PERL script to run on the PC)
httpBeacon.php (the PHP script to be saved in /cgi-bin/)

The webserver needs PHP, but that is available with almost any hosting solution.
The PC side uses Perl because my PHP library has errors in its Socket support :sigh:

Move httpBeacon.php to your webserver's /cgi-bin location - - it will be invoked by httpBeacon.Client.pl as needed and consumes no resources until called.

Download and install Perl to C:\Perl

Move httpBeacon.Client.pl to \Perl

The invocation is then:

$ c:
$ cd \Perl

$ perl.exe .\httpBeaconClient.pl
Processing arguments ...

*W* must provide -IP address
[-h this help list]
[-IP server_ip_address (not domain.names )]
[-P server_port_number]
[-S polling_interval_in_seconds]
[-d {1 or 2} enables debug trace level]

When invoked without any parameters, you get the above -h (help) display and that's all.

The default port (-P) is 80 and the default polling interval (-S) is 30 seconds; you must provide the -IP xx.ww.zz.aa address.

Run some sample tests yourself using -S 5, and if you control the webserver, stop it, wait 30 seconds and then restart it to see how httpBeaconClient.pl reacts.

A 5-second polling interval is too frequent for everyday use - - -S 60, -S 120 or even -S 180 are reasonable values.
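The arithmetic behind that advice is simple: a day has 86,400 seconds, so the interval directly sets how many requests each monitored server receives per day. A tiny PHP sketch (`pollsPerDay` is just an illustrative helper) tabulates the values mentioned above.

```php
<?php
// Worked arithmetic: how many requests one monitored server receives per day
// at each -S polling interval mentioned above (86,400 seconds in a day).

function pollsPerDay(int $intervalSec): int
{
    return intdiv(86400, $intervalSec);
}

foreach ([5, 30, 60, 120, 180] as $s) {
    printf("-S %3d  => %5d polls/day\n", $s, pollsPerDay($s));
}
// -S   5  => 17280 polls/day
// -S  30  =>  2880 polls/day
// -S  60  =>  1440 polls/day
// -S 120  =>   720 polls/day
// -S 180  =>   480 polls/day
```

Going from -S 5 to -S 180 cuts the load 36-fold while still detecting an outage within three minutes.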

For those who demand a GUI presentation, you could always use this paradigm:

perl.exe .\httpBeaconClient.pl -IP ww.xx.yy.aa -S 180 | GUI_beacon_program.exe

but you're on your own for that development project.


Which would be a better monitoring solution?

Nagios XI

Opsview Enterprise
