Archive for the ‘SNMP’ Category

Vendors, Open Source, and Hypocrisy

May 7th, 2008

A couple weeks ago, one of our support customers requested help in configuring OpenNMS to collect performance data from a network storage server in their environment. I was not familiar with the storage server vendor, but the vendor’s web site touts their rating among the fastest-growing tech companies in the U.S. The vendor’s MIB was good and provided plenty of useful objects. With the data collection definitions in place, we restarted OpenNMS, discovered one of the storage servers, and… nothing. Data collection failed, and we started seeing some new SNMP-related messages in the logs:

ERROR [DefaultUDPTransportMapping_127.0.0.1/0]
org.snmp4j.MessageDispatcherImpl: java.io.IOException: Only 32bit unsigned integers are
supported at position 52

Anybody who does enterprise management for a living knows that there are plenty of lame SNMP agents out there. I did a few walks and learned that the storage server is running a Linux kernel and is using version 5.2.1.2 of the open-source Net-SNMP agent (the project formerly known as UCD-SNMP). That’s a pretty old version of Net-SNMP, so I was not too surprised that it was giving us trouble. After taking some packet traces from the customer’s system and spending some time with Wireshark, the SNMP4J source code, and William Stallings’ SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Third Edition, I tracked down the exact problem. I’ll just quote my notes from the ticket here, redacting to protect the guilty.

This device is definitely exhibiting buggy SNMP protocol behavior that is preventing OpenNMS from collecting interface statistics from it. This reply and the several comments that precede it should be adequate documentation to open a bug report with the vendor.

By manually decoding the varbinds in this response PDU, one can see the problem. Starting at the 52nd octet in the dump we see the value of the first varbind (for ifInOctets.3) described:

41:05:01:0f:1c:a5:26

The first two octets (41:05) identify the type (counter, tag 0x41) and the encoded length (5). The next five octets (01:0f:1c:a5:26) should encode the actual value of the counter. The problem is that a counter is defined as a 32-bit unsigned integer. All integers (regardless of signedness) are stored as two’s-complement according to the ASN.1 BER, so a 32-bit unsigned value needs a fifth octet only as a leading 00 that keeps the sign bit clear. Any five-octet encoding must therefore have 00 as its first octet. Here the first octet is 01, which means the encoded value does not fit in 32 bits. A quick inspection of the SNMP4J code confirms the issue, in file org/snmp4j/asn1/BER.java:

public static final long decodeUnsignedInteger(BERInputStream is, MutableByte type)
throws IOException
/* snippage — jeffg was here */

// check for legal uint size
int b = is.read();
if ((length > 5) || ((length > 4) && (b != 0x00))) {
throw new IOException("Only 32bit unsigned integers are supported"+
getPositionMessage(is));
}
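
For anyone who wants to try the same check outside Java, here is a minimal Python sketch of the rule applied to the seven octets quoted above. The function name decode_unsigned32 is made up for this illustration and is not part of SNMP4J, and the sketch handles only short-form BER lengths, which is all this example needs.

```python
def decode_unsigned32(tlv: bytes) -> int:
    """Decode one BER type-length-value holding a 32-bit unsigned integer.

    Hypothetical helper for illustration only. It raises ValueError on the
    same condition that the SNMP4J excerpt above rejects.
    """
    if len(tlv) < 3:
        raise ValueError("need a type octet, a length octet, and content")
    length, content = tlv[1], tlv[2:]
    if length >= 0x80:
        raise ValueError("long-form length not handled in this sketch")
    if len(content) != length:
        raise ValueError("length octet does not match the content size")
    # A fifth octet is legal only as a 0x00 pad that keeps the sign bit clear.
    if length > 5 or (length > 4 and content[0] != 0x00):
        raise ValueError("only 32-bit unsigned integers are supported")
    return int.from_bytes(content, "big")


# The varbind value from the packet trace, written as in the ticket notes.
bad = bytes.fromhex("41:05:01:0f:1c:a5:26".replace(":", ""))

try:
    decode_unsigned32(bad)
except ValueError as err:
    raw = int.from_bytes(bad[2:], "big")
    print(f"rejected: {err}")
    print(f"raw value {raw} exceeds 2**32 - 1: {raw > 2**32 - 1}")
```

A well-formed five-octet counter such as 41:05:00:ff:ff:ff:ff passes the same check, because its first content octet is the 00 pad.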

As an additional reference, please see the attached excerpt from SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd Ed., William Stallings (Addison Wesley, 1999), p. 591.

The upshot of all this analysis is that we won’t be able to collect performance data from these storage server nodes until the vendor can provide a software update that resolves the counter-encoding bug. This will probably take the form of a newer Net-SNMP agent (the 5.2.1.2 version currently loaded is from mid-2005) that addresses this issue. I’ve spent some time trying to track down what release fixed this problem but can’t find a reference to it in the Net-SNMP changelog. I’m certain it’s fixed in some later release, though, because we don’t see this problem with modern Net-SNMP agents.

I thought the above would be plenty of ammunition for our customer to go straight to developer-level support with the storage vendor. Today our customer contact (Mike) updated the support ticket with the storage vendor’s reply:

Mike,

Not sure what you’re looking for here but we only support SNMP agent for HP OpenView, CA Unicenter, IBM Tivoli NetView and BMC Patrol. [We provide] SNMP support to integrate [storage server product] management into an existing enterprise management solution such as HP OpenView, CA Unicenter, IBM Tivoli NetView and BMC Patrol.

No support for OpenNMS to my knowledge…

Rgds,
Jacques

Did I miss something? OpenNMS exists. It’s an enterprise management solution. It’s bested HP OpenView and IBM Tivoli NetView in at least one survey of actual users. The storage server’s SNMP agent is clearly and demonstrably in violation of the encoding rules specified for the SNMP SMI, a fact that is likely to cause interoperability problems with any reasonably strict implementation of SNMP. Why should a storage vendor dictate which enterprise management products its customers should use to manage their storage servers?

After an appeal from our customer, Jacques grudgingly agreed to escalate the issue, but not without being snooty about it:

Mike,

Sorry for that but I’m not making the rules…
Will however escalate your ‘concerns’ to higher level support.
You also might wanna ask for an RFE (Request for enhancement) thru your sales or SA.

Rgds,
Jacques

Nobody had used the word “concerns” up to this point, so Jacques’ use of quote marks around it is pretty clearly for the sheer contempt of it. I do hope that Mike will contact his sales or SA, not to request an enhancement, but to suggest a proper fix as a great way to keep commission checks coming.

This story would not bother me nearly as much if the storage vendor were not standing on the shoulders of two open-source giants while thumbing its nose at a third open-source project — and at one of its own paying customers! By using the Linux kernel and the UCD-SNMP agent in its storage servers, the vendor eliminated the huge cost of developing these components in-house. Likewise, by choosing OpenNMS over the very expensive commercial management products in the storage vendor’s anointed list, our mutual customer has a far larger slice of budget available to buy network storage servers. Given these facts, I fully expect that the answer from the escalation team will be “Bien sûr! Tout de suite!”

Update 18 June 2008

Despite my faith that the vendor would act appropriately, they came back and said that the only supported management platforms are the ones called out above (HPOV NNM, Unicenter, Tivoli NetView, and BMC Patrol). I spent a little time and built Net-SNMP 5.2.1.2 on an x86_64 Linux system, shoved enough traffic through its interface to trigger the BER bug when I requested ifInOctets, and tried hitting it with xnmgraph from OpenView Network Node Manager. Just like OpenNMS, NNM’s SNMP library discards the invalid response PDUs and reports a timeout. I’m recommending that our customer install the HPOV NNM demo, discover the storage server, and see how the vendor feels about the situation now.

Update 25 June 2008

Our customer brought up a VM for us to install NNM 7.53. As expected, the NNM tools choke on the mangled Counter in the response PDUs that the storage server sends:

[root@nnmserver ~]# /opt/OV/bin/snmpget -d -v 1 -c public -r 1 10.11.12.13 ifInOctets.1
Transmitted 45 bytes to 10.11.12.13 port 161:
Initial Timeout: 0.80 seconds
...
Received 50 bytes from 10.11.12.13 port 161:
    0:  30 30 02 01 00 04 06 70 75 62 6c 69 63 a2 23 02     00.....public.#.
   16:  04 5c f8 55 1d 02 01 00 02 01 00 30 15 30 13 06     .\.U.......0.0..
   32:  0a 2b 06 01 02 01 02 02 01 0a 01 41 05 02 d3 c0     .+.........A....
   48:  65 fe -- -- -- -- -- -- -- -- -- -- -- -- -- --     e...............

    0:  SNMP MESSAGE (0x30): 48 bytes
    2:    INTEGER VERSION (0x2) 1 bytes: 0 (SNMPv1)
    5:    OCTET-STR COMMUNITY (0x4) 6 bytes: "public"
   13:    RESPONSE-PDU (0xa2): 35 bytes
   15:      INTEGER REQUEST-ID (0x2) 4 bytes: 1559778589
   21:      INTEGER ERROR-STATUS (0x2) 1 bytes: noError(0)
   24:      INTEGER ERROR-INDEX (0x2) 1 bytes: 0
   27:      SEQUENCE VARBIND-LIST (0x30): 21 bytes
   29:        SEQUENCE VARBIND (0x30): 19 bytes
   31:          OBJ-ID (0x6) 10 bytes: .1.3.6.1.2.1.2.2.1.10.1
   43:          error parsing number
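
The failure at offset 43 can be reproduced by hand from the hex dump above. The following Python sketch is a rough illustration and not how NNM or OpenNMS actually parse packets: the walk function and the DUMP constant are names invented here, and only short-form BER lengths are handled. It steps through the 50 received octets and flags any Counter (tag 0x41) whose content is too wide for 32 bits.

```python
# The 50 octets copied from the "Received" hex dump above.
DUMP = """
30 30 02 01 00 04 06 70 75 62 6c 69 63 a2 23 02
04 5c f8 55 1d 02 01 00 02 01 00 30 15 30 13 06
0a 2b 06 01 02 01 02 02 01 0a 01 41 05 02 d3 c0
65 fe
"""
message = bytes.fromhex("".join(DUMP.split()))


def walk(buf, start=0, end=None):
    """Yield (offset, tag, content) for each primitive TLV, depth first."""
    end = len(buf) if end is None else end
    pos = start
    while pos < end:
        tag, length = buf[pos], buf[pos + 1]
        if length >= 0x80:
            raise ValueError("long-form length not handled in this sketch")
        body = pos + 2
        if tag & 0x20:
            # Constructed types (SEQUENCE 0x30, response PDU 0xa2) nest TLVs.
            yield from walk(buf, body, body + length)
        else:
            yield pos, tag, buf[body:body + length]
        pos = body + length


for offset, tag, content in walk(message):
    if tag == 0x41:  # Counter
        legal = len(content) <= 4 or (len(content) == 5 and content[0] == 0)
        octets = ":".join(f"{b:02x}" for b in content)
        verdict = "legal" if legal else "illegal Counter encoding"
        print(f"offset {offset}: {octets} -> {verdict}")
```

The offsets the sketch reports line up with the left-hand column of NNM's own decode, so the Counter it flags is the same element NNM gives up on.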

Our customer has asked the storage vendor to take another look at the problem since their SNMP agent is demonstrably incompatible with one of their anointed management platforms.

OpenNMS, Rants, SNMP, Software

Austin, and the Ghost of Bosses Past

April 3rd, 2008

I’ve just wrapped up a trip to Austin, Texas. This was my first visit to Austin despite my parents having lived there for over a year, so Mandy and I went out on Friday afternoon and spent the weekend at their place. We had no idea how beautiful Austin is — the hill country is breathtaking in much the same way that Dallas and Houston (both fine cities!) are not.

There’s also very good eating. From Bergstrom we went to the Eastside Café for a tasty and healthy dinner. Saturday brunch at Chez Zee left us literally lying around the living room (the crème brûlée French toast is delicious and formidable) for a few hours before heading to The Oasis for drinks and an unfortunately hazy view of Lake Travis. Duly libated, we proceeded to Siena for an excellent Tuscan dinner. Sunday included a new Egoscue menu work-up with my dad and lunch at the Kerbey Lane Café, which besides good food has perhaps the coolest 1960s-retro sink counter ever in its men’s bathroom.

I spent Monday and Tuesday with one of our customers, a video game development house that’s integrating OpenNMS into the system that will monitor and manage the health of the many servers that will power a new MMO game. These guys are using our software in a way that’s not quite like anything I’ve heard of before, and it’s going to be really cool. As a testament to the flexibility of OpenNMS we were able, in just two days, to come to a good understanding of what they want to accomplish and how to approach the project.

Back to food for a moment :) We broke for an authentically Austinite Tex-Mex lunch at Chuy’s (whose salsa rivals the hole-in-the-wall Mexiclone place near our house) and had dinner at Rudy’s for my first-ever helping of Texas brisket barbecue. On Tuesday one of the guys took me for a quick lunch at Conans, which besides excellent pepperoni and veggie supreme pizza also boasts a really unique atmosphere that I’m told is very Austin.

When I got back from lunch on Tuesday, there was a voicemail waiting in our sales mailbox from Kathleen, the former boss who unwittingly launched my career in the network management field. She wanted to talk with somebody about using OpenNMS to replace an installation of HP OpenView Network Node Manager (HPOV NNM).

Now there’s a bit of history here — when I turned in my notice eight years ago to Kathleen, it was to go to work for a vendor whose software she had paid to train me on. She wasn’t terribly happy about that situation, but I heard that the vendor gave her a couple of free training seats as penance.

If you’ve been paying attention, you’ll realize that this voicemail arrived on 01 April. I was well primed for pranks already, having been pwned hard-core by YouTube’s masterful RickRoll and having made the OpenNMS “Enterprise Edition” price calculator for one of Tarus’ series of blog posts. I immediately IMed Johnny to see if he had put her up to calling, but he had had nothing to do with it.

If you’re reading this, Kathleen, thanks for considering OpenNMS and I hope to see you soon in one of our training classes or as the guy doing your GreenLight!

Tuesday night I met up at Koreana with some folks from one of our other Austin customers, also a video game house, who also want to use OpenNMS in their MMO game server monitoring system! If OpenNMS keeps spreading through the video game industry at this rate, the old stand-by of blaming slow game servers for a failed raid will soon be history ;) The bluefin sashimi was great, by the way.

Geeky, OpenNMS, SNMP, Software, Travel

The Worst MIB in the World

March 5th, 2007

This lovely turd is courtesy of NexTone Communications (Update: whose newer MIBs are much better — see Scott’s second comment below). I have personally never seen a worse MIB in my life — it consists of a single OID that is just a string and is used by the solitary generalTrap to convey who-knows-what when some unknown thing happens.

Update: NexTone’s more recent MIBs are much better written and convey much more useful and deterministic information. If you’re selecting a vendor based on SNMP support, and NexTone is in the running, I’d say that NexTone looks pretty good now. I’d still like to see more than just traps, but as trap-only MIBs go, the new ones aren’t bad at all. See Scott’s second comment below.

Geeky, Rants, SNMP, VoIP

Reporting on Cisco IP SLA tests in OpenNMS

February 20th, 2007

A large customer with exacting standards has seen some voice quality problems lately. The WAN guys set up a pair of IP SLA VoIP Jitter tests on the routers at either end of one of the customer’s critical links, and I spent a weekend hacking away at OpenNMS collection groups and graph definitions so that we can do historical reporting and, ultimately, thresholding on the stats. I was pleasantly surprised at how easy it was, all thanks to the new “resource types” (support for generic indexed tables) in the 1.3.2 release. I blame DJ for my having any time to relax this week, since he did the bulk (all?) of the resourceType work.
Here’s the write-up on what’s needed (no code changes, w00t!) as well as some eye candy.

Update: This work is included in OpenNMS releases since 1.3.6.

Cisco, Geeky, OpenNMS, QoS, SNMP, Software, VoIP

Interface-specific routing in Solaris

December 11th, 2006

In Solaris, you can add a route whose traffic should go out a specific interface by adding -ifp [ifname] to the route command line. For instance, suppose a host has two interfaces (eri0 and hme0) on the same IP subnet (10.4.2.9/24 with gateway 10.4.2.254), and traffic for just a few hosts needs to go out the secondary hme0 interface. One reason this setup may be needed is for monitoring both some firewalls and the apps that those firewalls protect from a single network management station. On the firewalls you would add host-specific routes for the network management station’s secondary interface via the firewall management network, allowing that interface to talk directly to the firewalls. The primary interface of the network management station gets routed normally, though, and so is able to talk to hosts protected by the same firewalls.
The following commands make this happen:

# route add -host 172.29.4.3 10.4.2.254 -ifp hme0

add host 172.29.4.3: gateway 10.4.2.254

# route add -host 172.29.4.4 10.4.2.254 -ifp hme0

add host 172.29.4.4: gateway 10.4.2.254

# route add -host 172.29.7.31 10.4.2.254 -ifp hme0

add host 172.29.7.31: gateway 10.4.2.254

# route add -host 172.29.7.32 10.4.2.254 -ifp hme0

add host 172.29.7.32: gateway 10.4.2.254

Now all traffic for the four hosts above will go out hme0 instead of eri0.
This trick is actually buried in a tiny section of the route(1M) man page that is worded such that my tiny brain didn’t get it. I’m not even sure what ifp stands for. The obvious candidate, the -iface or -interface flag, can’t be right because it requires the use of proxy ARP.
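
If the list of hosts grows, typing these by hand gets tedious. Here is a small Python sketch that generates the same command lines for review before pasting them into a root shell. The route_commands helper is a hypothetical name of my own; the only route syntax it uses is the -host and -ifp form shown above, and it does not run anything itself.

```python
import ipaddress


def route_commands(hosts, gateway, interface):
    """Build one Solaris 'route add' line per host, pinned to an interface."""
    ipaddress.ip_address(gateway)  # raises ValueError on a malformed gateway
    commands = []
    for host in hosts:
        ipaddress.ip_address(host)  # raises ValueError on a malformed host
        commands.append(f"route add -host {host} {gateway} -ifp {interface}")
    return commands


# The four hosts, gateway, and secondary interface from the example above.
hosts = ["172.29.4.3", "172.29.4.4", "172.29.7.31", "172.29.7.32"]
for line in route_commands(hosts, "10.4.2.254", "hme0"):
    print(line)
```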

Firewall, Geeky, SNMP, solaris

Getting sysObjectID out of cpsnmpd on a Crossbeam APM

March 10th, 2006

I’d like to use OpenNMS to discover and collect statistics from the SNMP agent of Check Point FW-1 firewalls running on Crossbeam APMs. The major challenge in doing so is the fact that cpsnmpd in its default configuration defines a MIB view that includes only 1.3.6.1.4.1.2620.1.1 or enterprises.checkpoint.products.fw. This fact means that the sysObjectID MIB object (1.3.6.1.2.1.1.2.0) is excluded from this view, so OpenNMS cannot determine what kind of device it is dealing with. I know of at least one person who is having the same issue trying to discover these agents using CA (Concord) eHealth.

I gathered the above information on the MIB view from looking at a FW-1 R55 installation on a Solaris system; in $CPDIR/lib/snmp I found a file called view.conf with the following contents:

#
# entries are in the following format:
# viewIndex viewSubtree viewStatus viewMask
# where viewstatus is either "included" or "excluded",
# and mask is either "null" or a hex number 1-16 bytes long.

10 .iso.org.dod.internet.private.enterprises.checkpoint.products.fw included Null

This file does not exist on the FW-1 APM installations that I have looked at (where the CPMs are running XOS 6.0.1 and 7.0.1), but the $CPDIR/lib/snmp directory does exist. I would propose creating the view.conf here with the following contents:

#
# entries are in the following format:
# viewIndex viewSubtree viewStatus viewMask
# where viewstatus is either "included" or "excluded",
# and mask is either "null" or a hex number 1-16 bytes long.

10 .iso.org.dod.internet.private.enterprises.checkpoint.products.fw included Null
10 .iso.org.dod.internet.mgmt.mib-2.system included Null

That should add the MIB-2 system group to the default MIB view, allowing most network management systems to discover these devices. A cprestart will almost certainly be needed to get the cpsnmpd to reload its view configuration, though sending a HUP signal to the running cpsnmpd process might do the trick.
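
To see why the extra line matters, here is a minimal Python sketch of the subtree test a MIB view performs. It is an illustration under simplifying assumptions, not cpsnmpd's actual logic: it treats every entry as "included" with a null mask (as both files above do), and the names parse_oid and in_view are invented for this example. The numeric form of mib-2.system, 1.3.6.1.2.1.1, is the standard one.

```python
def parse_oid(text):
    """Turn a dotted numeric OID string into a tuple of integers."""
    return tuple(int(part) for part in text.strip(".").split("."))


def in_view(oid, subtrees):
    """True if the OID sits at or below any included subtree."""
    return any(oid[:len(subtree)] == subtree for subtree in subtrees)


CHECKPOINT_FW = parse_oid("1.3.6.1.4.1.2620.1.1")  # enterprises.checkpoint.products.fw
MIB2_SYSTEM = parse_oid("1.3.6.1.2.1.1")           # mgmt.mib-2.system
SYS_OBJECT_ID = parse_oid("1.3.6.1.2.1.1.2.0")     # sysObjectID.0

default_view = [CHECKPOINT_FW]
proposed_view = [CHECKPOINT_FW, MIB2_SYSTEM]

print("default view exposes sysObjectID: ", in_view(SYS_OBJECT_ID, default_view))
print("proposed view exposes sysObjectID:", in_view(SYS_OBJECT_ID, proposed_view))
```

With only the Check Point subtree in the view, a request for sysObjectID falls outside it, which matches the discovery failure described above.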

Firewall, Geeky, OpenNMS, SNMP