LWCE 2008: An honest assessment

August 16th, 2008

This post is a bit late in coming, but I’ve been working on it since my body was somewhere over southern Utah during my flight from Atlanta (home to me) to San Francisco. Having had my nose buried in Lyle Estill’s book Small Is Possible, my mind was in Chatham County, North Carolina (home of OpenNMS Group World HQ). Walking from the head through business class back to my seat I thought that one of the guys I walked past looked familiar. For a brief instant I could see his face at Chatham Marketplace. That notion was plausible because I’ve been there a couple times with Tarus, but not at all likely. I climbed back across two neighbors to my window seat in coach, looked out again at canyon country, and got out my laptop to start writing this post.

Originally, nobody from the OpenNMS project was slated to go to LWCE. We have had a booth in the .ORG pavilion in past years — up until last year, in fact:

Page 42 of the LWCE 2008 Attendee Guide features Ben (left) and Antonio in the OpenNMS booth at LWCE 2007!

Despite having been offered a booth again, we decided to pass this year as the show has become increasingly commercialized and decreasingly about Open Source software. The purpose of my trip as I understood it was to provide technical support for a demo, to run in the Open Solutions Alliance (OSA) showcase, showing how alerts from Hyperic HQ could be turned into help-desk tickets in Concursive ConcourseSuite (formerly Centric CRM). Both Hyperic and Concursive are OSA members; The OpenNMS Group is not a member, but has independently developed integrations with both HQ and ConcourseSuite. The idea was to show off a software stack that David, in a clever play on LAMP, dubbed OUCH:

  • OpenNMS
  • Ubuntu
  • Concursive
  • Hyperic

David had built an Ubuntu virtual machine and installed OpenNMS, HQ Open Source, and ConcourseSuite Community Edition before his plate filled up and he handed off the project to me. All that was left for me to do was to set up the integrations. As luck would have it, the VM became corrupted on setup day (Monday) and I had to start completely from scratch. I’d left my Ubuntu media at home, and the Internet access both on the show floor and at my hotel was slow and unreliable. I hit up the team from Canonical for a Hardy x86 server CD. Those guys have a real penchant for giving away install media, and would not let me leave without also taking DVDs of the desktop version for both x86 and x86_64 and a pile of stickers. In exchange for their help, I promised to put an Ubuntu sticker on my MacBook Pro for the demo. Chalk one up for a project that does business without ever charging for the right to use its software, and therefore has no agenda that could put it at odds with its community of volunteer contributors.

On the way out of the hall that day, I stopped by the Hyperic booth to say “hi” to Stacey, and also got to meet Jeremy Hogan, Hyperic’s new Director of Community Management. Jeremy lives in the RTP area, right in the back yard of our World HQ. Hopefully we’ll get him to join us for lunch before he gets whisked away to the other coast!

I spent Tuesday fighting crappy Internet access, working out operational kinks in the integrations among the three applications in the OUCH stack (with lots of great help from Josh and Ananth of Concursive), watching people walk through the OSA Showcase area without watching the demos, and wandering the show floor dodging the Dice operatives who simply could not fathom that I am employed at what is literally my dream job. Outside Canonical’s mammoth booth and the .ORG ghetto (where I ran into Josh from PostgreSQL), I spotted only one software exhibitor that actually gets what Open Source means and lives it the way we do. That vendor is OpsView, a Nagios® integrator that sells only services. The guy I talked to in their booth was Adrian, who not only works with OpsView but turns out to be a co-worker of the OGP’s own Jonathan Sartin. Truly it’s a small (Linux)world. I was happy to learn that OpsView won Best System Management Tool in this year’s Product Excellence Awards at LWCE, despite all the money that another Nagios integrator (one that tries to do a hybrid Open Source / commercial play) was spending to put people on the floor wearing penguin costumes and carrying sandwich boards.

At lunch time on Wednesday, OGP member Jason (who also cooks a damn tasty pork chop) came by for a while. He and I walked around the floor together and talked to a couple of interesting hardware vendors on the NGDC side of the show. Apart from those guys and the .ORG ghetto, I’m inclined to agree with Jay’s assessment that the show at large amounted to “a lot of vendor wankery”. After more help from David my demo was finally ready to go, so I took over the podium and the big screen. Despite looping among a pretty slide show, a video loop that I had made for LUGRadio Live USA 2008, and a demo of the actual integration, only a few people ever sat down, and probably 75% of those who did had stopped just to rest their feet. At one point I abandoned the presentation entirely and sat down to talk with two gentlemen who were more interested in substance (“So what’s your demo actually supposed to show me?”) than in flash, and in ideology (“Wait, you’re not pulling an open-source bait and switch?”) than in hype. Even so, I suspect the number of people I truly reached and engaged is larger than the number drawn in during the rest of the show by the other demos from the OSA Showcase.

I came home early on the Wednesday night redeye and felt fine about missing the last day of the show on Thursday. All in all, I think the week affirmed our assessment that the U.S. LWCE is no longer a worthwhile show for us. I’ll see you in London in October!

Vendors, Open Source, and Hypocrisy

May 7th, 2008

A couple weeks ago, one of our support customers requested help in configuring OpenNMS to collect performance data from a network storage server in their environment. I was not familiar with the storage server vendor, but the vendor’s web site touts their rating among the fastest-growing tech companies in the U.S. The vendor’s MIB was good and provided plenty of useful objects. With the data collection definitions in place, we restarted OpenNMS, discovered one of the storage servers, and… nothing. Data collection failed, and we started seeing some new SNMP-related messages in the logs:

ERROR [DefaultUDPTransportMapping_127.0.0.1/0]
org.snmp4j.MessageDispatcherImpl: java.io.IOException: Only 32bit unsigned integers are
supported at position 52

Anybody who does enterprise management for a living knows that there are plenty of lame SNMP agents out there. I did a few walks and learned that the storage server is running a Linux kernel and is using version 5.2.1.2 of the open-source Net-SNMP agent (the successor to UCD-SNMP). That’s a pretty old version of Net-SNMP, so I was not too surprised that it was giving us trouble. After taking some packet traces from the customer’s system and spending some time with Wireshark, the SNMP4J source code, and William Stallings’ SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, Third Edition, I tracked down the exact problem. I’ll just quote my notes from the ticket here, redacting to protect the guilty.

This device is definitely exhibiting buggy SNMP protocol behavior that is preventing OpenNMS from collecting interface statistics from it. This reply and the several comments that precede it should be adequate documentation to open a bug report with the vendor.

By manually decoding the varbinds in this response PDU, one can see the problem. Starting at the 52nd octet in the dump we see the value of the first varbind (for ifInOctets.3) described:

41:05:01:0f:1c:a5:26

The first two octets (41:05) identify the type (Counter, tag 0x41) and the encoded length (5). The next five octets (01:0f:1c:a5:26) should encode the actual value of the counter. The problem is that a Counter is defined as a 32-bit unsigned integer. All integers (regardless of signedness) are stored as two’s-complement according to the ASN.1 BER, so a 32-bit unsigned value needs at most four octets, plus a leading 00 octet when its high bit is set; a five-octet encoding must therefore always have 00 as its first octet. Here the first octet is 01, which means the agent is sending a value wider than 32 bits. A quick inspection of the SNMP4J code confirms the issue, in file org/snmp4j/asn1/BER.java:

public static final long decodeUnsignedInteger(BERInputStream is, MutableByte type)
    throws IOException
/* snippage -- jeffg was here */

    // check for legal uint size
    int b = is.read();
    if ((length > 5) || ((length > 4) && (b != 0x00))) {
        throw new IOException("Only 32bit unsigned integers are supported"+
                              getPositionMessage(is));
    }

As an additional reference, please see the attached excerpt from SNMP, SNMPv2, SNMPv3, and RMON 1 and 2, 3rd Ed., William Stallings (Addison Wesley, 1999), p. 591.
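To make the rule concrete, here is a minimal, standalone sketch of the same legality check. It is not SNMP4J code: the class and method names are made up for illustration, and it simply mirrors the size test quoted above (at most five value octets, and a five-octet value must start with 00).

```java
import java.io.IOException;

/** Illustrative sketch only; the names here are hypothetical, not SNMP4J's API. */
class Counter32Check {

    /**
     * Decodes the value octets of a Counter (tag 0x41). Rejects anything that
     * cannot fit in 32 unsigned bits: no octets, more than five octets, or
     * five octets whose first octet is not 0x00.
     */
    static long decodeCounter32(int[] valueOctets) throws IOException {
        int length = valueOctets.length;
        if (length < 1 || length > 5 || (length == 5 && valueOctets[0] != 0x00)) {
            throw new IOException("not a legal 32-bit unsigned integer");
        }
        long value = 0;
        for (int octet : valueOctets) {
            value = (value << 8) | (octet & 0xFF);
        }
        return value;
    }

    public static void main(String[] args) throws IOException {
        // A legal five-octet encoding: 00 pad octet, then ff:ff:ff:ff.
        System.out.println(decodeCounter32(new int[] {0x00, 0xFF, 0xFF, 0xFF, 0xFF}));
        try {
            // The value octets from the packet trace: 01:0f:1c:a5:26.
            decodeCounter32(new int[] {0x01, 0x0f, 0x1c, 0xa5, 0x26});
        } catch (IOException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The octets from the trace fail the check because the first of the five is 01 rather than 00.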

The upshot of all this analysis is that we won’t be able to collect performance data from these storage server nodes until the vendor can provide a software update that resolves the counter-encoding bug. This will probably take the form of a newer Net-SNMP agent (the 5.2.1.2 version currently loaded is from mid-2005) that addresses this issue. I’ve spent some time trying to track down what release fixed this problem but can’t find a reference to it in the Net-SNMP changelog. I’m certain it’s fixed in some later release, though, because we don’t see this problem with modern Net-SNMP agents.

I thought the above would be plenty of ammunition for our customer to go straight to developer-level support with the storage vendor. Today our customer contact (Mike) updated the support ticket with the storage vendor’s reply:

Mike,

Not sure what you’re looking for here but we only support SNMP agent for HP OpenView, CA Unicenter, IBM Tivoli NetView and BMC Patrol. [We provide] SNMP support to integrate [storage server product] management into an existing enterprise management solution such as HP OpenView, CA Unicenter, IBM Tivoli NetView and BMC Patrol.

No support for OpenNMS to my knowledge…

Rgds,
Jacques

Did I miss something? OpenNMS exists. It’s an enterprise management solution. It’s bested HP OpenView and IBM Tivoli NetView in at least one survey of actual users. The storage server’s SNMP agent is clearly and demonstrably in violation of the encoding rules specified for the SNMP SMI, a fact that is likely to cause interoperability problems with any reasonably strict implementation of SNMP. Why should a storage vendor dictate which enterprise management products its customers should use to manage their storage servers?

After an appeal from our customer, Jacques grudgingly agreed to escalate the issue, but not without being snooty about it:

Mike,

Sorry for that but I’m not making the rules…
Will however escalate your ‘concerns’ to higher level support.
You also might wanna ask for an RFE (Request for enhancement) thru your sales or SA.

Rgds,
Jacques

Nobody had used the word “concerns” up to this point, so Jacques’ use of quote marks around it is pretty clearly for the sheer contempt of it. I do hope that Mike will contact his sales or SA, not to request an enhancement, but to suggest a proper fix as a great way to keep commission checks coming.

This story would not bother me nearly as much if the storage vendor were not standing on the shoulders of two open-source giants while thumbing its nose at a third open-source project — and at one of its own paying customers! By using the Linux kernel and the UCD-SNMP agent in its storage servers, the vendor eliminated the huge cost of developing these components in-house. Likewise, by choosing OpenNMS over the very expensive commercial management products in the storage vendor’s anointed list, our mutual customer has a far larger slice of budget available to buy network storage servers. Given these facts, I fully expect that the answer from the escalation team will be “Bien sûr! Tout de suite!”

Update 18 June 2008

Despite my faith that the vendor would act appropriately, they came back and said that the only supported management platforms are the ones called out above (HPOV NNM, Unicenter, Tivoli NetView, and BMC Patrol). I spent a little time and built Net-SNMP 5.2.1.2 on an x86_64 Linux system, shoved enough traffic through its interface to trigger the BER bug when I requested ifInOctets, and tried hitting it with xnmgraph from OpenView Network Node Manager. Just like OpenNMS, NNM’s SNMP library discards the invalid response PDUs and reports a timeout. I’m recommending that our customer install the HPOV NNM demo, discover the storage server, and see how the vendor feels about the situation now.

Update 25 June 2008

Our customer brought up a VM for us to install NNM 7.53. As expected, the NNM tools choke on the mangled Counter in the response PDUs that the storage server sends:

[root@nnmserver ~]# /opt/OV/bin/snmpget -d -v 1 -c public -r 1 10.11.12.13 ifInOctets.1
Transmitted 45 bytes to 10.11.12.13 port 161:
Initial Timeout: 0.80 seconds
…
Received 50 bytes from 10.11.12.13 port 161:
    0:  30 30 02 01 00 04 06 70 75 62 6c 69 63 a2 23 02     00.....public.#.
   16:  04 5c f8 55 1d 02 01 00 02 01 00 30 15 30 13 06     .\.U.......0.0..
   32:  0a 2b 06 01 02 01 02 02 01 0a 01 41 05 02 d3 c0     .+.........A....
   48:  65 fe -- -- -- -- -- -- -- -- -- -- -- -- -- --     e...............

    0:  SNMP MESSAGE (0x30): 48 bytes
    2:    INTEGER VERSION (0x2) 1 bytes: 0 (SNMPv1)
    5:    OCTET-STR COMMUNITY (0x4) 6 bytes: "public"
   13:    RESPONSE-PDU (0xa2): 35 bytes
   15:      INTEGER REQUEST-ID (0x2) 4 bytes: 1559778589
   21:      INTEGER ERROR-STATUS (0x2) 1 bytes: noError(0)
   24:      INTEGER ERROR-INDEX (0x2) 1 bytes: 0
   27:      SEQUENCE VARBIND-LIST (0x30): 21 bytes
   29:        SEQUENCE VARBIND (0x30): 19 bytes
   31:          OBJ-ID (0x6) 10 bytes: .1.3.6.1.2.1.2.2.1.10.1
   43:          error parsing number

Our customer has asked the storage vendor to take another look at the problem since their SNMP agent is demonstrably incompatible with one of their anointed management platforms.
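For anyone who wants to reproduce that failure by hand, here is a rough sketch that walks the tag-length-value structure of the 50 bytes NNM received and reports where the first illegal Counter sits. The class, method, and constant names are invented for this example, and it only handles short-form (single-octet) lengths, which is all this particular message uses.

```java
/** Illustrative sketch only: a tiny BER walker, not any real SNMP library's API. */
class PduWalker {

    static final int COUNTER = 0x41;

    /** The 50 bytes from the NNM trace above. */
    static final String DUMP =
        "30 30 02 01 00 04 06 70 75 62 6c 69 63 a2 23 02 " +
        "04 5c f8 55 1d 02 01 00 02 01 00 30 15 30 13 06 " +
        "0a 2b 06 01 02 01 02 02 01 0a 01 41 05 02 d3 c0 " +
        "65 fe";

    static byte[] fromHex(String hex) {
        String[] parts = hex.trim().split("\\s+");
        byte[] out = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = (byte) Integer.parseInt(parts[i], 16);
        }
        return out;
    }

    /** Returns the offset of the first illegal Counter TLV in [start, end), or -1. */
    static int findBadCounter(byte[] msg, int start, int end) {
        int pos = start;
        while (pos + 2 <= end) {
            int tag = msg[pos] & 0xFF;
            int length = msg[pos + 1] & 0xFF;
            int valueStart = pos + 2;
            if (length >= 0x80 || valueStart + length > end) {
                throw new IllegalArgumentException("length not handled by this sketch at offset " + pos);
            }
            if ((tag & 0x20) != 0) {
                // Constructed type (SEQUENCE 0x30, RESPONSE-PDU 0xa2): descend into it.
                int inner = findBadCounter(msg, valueStart, valueStart + length);
                if (inner >= 0) {
                    return inner;
                }
            } else if (tag == COUNTER && (length > 5 || (length == 5 && msg[valueStart] != 0))) {
                return pos;
            }
            pos = valueStart + length;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] msg = fromHex(DUMP);
        System.out.println("first illegal Counter at offset " + findBadCounter(msg, 0, msg.length));
    }
}
```

The offset it reports for this message is 43, the same position at which NNM gives up with “error parsing number”.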

Austin, and the Ghost of Bosses Past

April 3rd, 2008

I’ve just wrapped up a trip to Austin, Texas. This was my first visit to Austin despite my parents having lived there for over a year, so Mandy and I went out on Friday afternoon and spent the weekend at their place. We had no idea how beautiful Austin is — the hill country is breathtaking in much the same way that Dallas and Houston (both fine cities!) are not.

There’s also very good eating. From Bergstrom we went to the Eastside Café for a tasty and healthy dinner. Saturday brunch at Chez Zee left us literally lying around the living room (the crème brûlée French toast is delicious and formidable) for a few hours before heading to The Oasis for drinks and an unfortunately hazy view of Lake Travis. Duly libated, we proceeded to Siena for an excellent Tuscan dinner. Sunday included a new Egoscue menu work-up with my dad and lunch at the Kerbey Lane Café, which besides good food has perhaps the coolest 1960s-retro sink counter ever in its men’s bathroom.

I spent Monday and Tuesday with one of our customers, a video game development house that’s integrating OpenNMS into the system that will monitor and manage the health of the many servers that will power a new MMO game. These guys are using our software in a way that’s not quite like anything I’ve heard of before, and it’s going to be really cool. As a testament to the flexibility of OpenNMS we were able, in just two days, to come to a good understanding of what they want to accomplish and how to approach the project.

Back to food for a moment :) We broke for an authentically Austinite Tex-Mex lunch at Chuy’s (whose salsa rivals the hole-in-the-wall Mexiclone place near our house) and had dinner at Rudy’s for my first-ever helping of Texas brisket barbecue. On Tuesday one of the guys took me for a quick lunch at Conans, which besides excellent pepperoni and veggie supreme pizza also boasts a really unique atmosphere that I’m told is very Austin.

When I got back from lunch on Tuesday, there was a voicemail waiting in our sales mailbox from Kathleen, the former boss who unwittingly launched my career in the network management field. She wanted to talk with somebody about using OpenNMS to replace an installation of HP OpenView Network Node Manager (HPOV NNM).

Now there’s a bit of history here — when I turned in my notice eight years ago to Kathleen, it was to go to work for a vendor whose software she had paid to train me on. She wasn’t terribly happy about that situation, but I heard that the vendor gave her a couple of free training seats as penance.

If you’ve been paying attention, you’ll realize that this voicemail arrived on 01 April. I was well primed for pranks already, having been pwned hard-core by YouTube’s masterful RickRoll and having made the OpenNMS “Enterprise Edition” price calculator for one of Tarus’ series of blog posts. I immediately IMed Johnny to see if he had put her up to calling, but he had had nothing to do with it.

If you’re reading this, Kathleen, thanks for considering OpenNMS and I hope to see you soon in one of our training classes or as the guy doing your GreenLight!

Tuesday night I met up at Koreana with some folks from one of our other Austin customers, also a video game house, who also want to use OpenNMS in their MMO game server monitoring system! If OpenNMS keeps spreading through the video game industry at this rate, the old stand-by of blaming slow game servers for a failed raid will soon be history ;) The bluefin sashimi was great, by the way.

The best kind of love

January 31st, 2008

Picking up on Dave’s recent post, I’d like to spend a moment discussing love.

My mother tells the story of a day when I was being a kid, getting dirty playing in the garden as she and her own mother watched. Suddenly I just put my fun-having on hold, ran over to Gran, and gave her a giant muddy hug. She turned to my mother and said, “that’s the best kind of love, when you don’t even have to ask for it.”

Back in grown-up land, a guy named Neil Watson has been hanging out for a while on the OpenNMS discussion list, asking and answering questions, and generally being part of the community. Neil is not a network management guru, but he’s displayed a keen interest in OpenNMS and has always been appreciative of the help he’s received and willing to give back. That alone is good enough to call love, especially when contrasted with some of the help vampires who have plagued our community over the years.

Neil didn’t stop at giving back on the mailing list, though. He’s written a very nice and accessible review of OpenNMS 1.3.9. Nobody on the mailing list asked him to do this, and as far as I can tell his employer did not pay him to do it. He just did it. It’s like a big unsolicited hug for the OGP and everyone else who loves OpenNMS, and we didn’t even have to get muddy.

So thanks, Neil. If I ever get to Toronto, I owe you a beer and a big hug.

Why buy support for free software?

January 23rd, 2008

Why should you consider purchasing a support contract for OpenNMS or another true open-source product? The answer may surprise you.

My employer, The OpenNMS Group, maintains OpenNMS as a truly and unquestionably open-source project. The software is really free, as in beer, speech, and freedom — we do not sell an enterprise version with more features. Every penny of my paycheck represents revenue from the sale of support contracts, professional services, and training for OpenNMS, which we offer at darned reasonable prices.

One of our support customers, a service provider in the United Kingdom, reported a problem last Friday with their OpenNMS server. The OpenNMS daemons had stopped unexpectedly, and after our customer’s staff had started them back up, the daemons stopped again after a few minutes. This happened repeatedly, so we had a fairly hot support ticket on our hands. It turned out that the customer had added RAM to the OpenNMS server at our suggestion (the amount of stuff they are now managing with OpenNMS is larger than they initially expected), and it was immediately after bringing up the server with the new memory that the daemons started crashing.

In a previous life working on commercial software, if a ticket like this had gotten escalated to me I would have requested some logs and recommended that the customer run an exhaustive memory test on the server, since a bad memory module can easily cause crashes. In this case, I requested the memory test. Then I logged in to the customer’s server and set to work looking at the system. After bad RAM, my second suspicion was that the daemons were trying to breathe more deeply in the newly expanded physical memory, but were running out of heap space since the Java VM under which they were running was constrained to a maximum heap size that was appropriate for the pre-upgrade system. As soon as the workday was over in the customer’s timezone, I adjusted the maximum heap size setting and restarted the daemons. I continued to monitor the system throughout the weekend even though this customer does not pay for 24×7 support. The good news was that the daemons were no longer crashing after just a few minutes. The bad news was that they were instead stopping after a few hours.

On Monday morning, I came back to this ticket in earnest. Logging in again and looking closely at the OpenNMS logs, I saw no telltale signs of bad memory being the culprit. In fact, the logs painted a picture of the daemons having been shut down in a somewhat orderly fashion — shutdown hooks were being called, bean contexts were being destroyed, and resources were being freed. This is not what a memory-related daemon crash looks like at all. I saved the logs and started the daemons again, and went about my morning, checking back periodically to see whether the daemons were still running.

An hour or so later on our daily scrum call, I brought up this issue. Everybody agreed that the scenario was odd, and I got a couple of additional pairs of eyeballs committed to have a look. Dave stuck his head in and quickly found a message in the system’s kernel logs indicating that something called oom_killer had actually sent a shutdown signal to the OpenNMS daemons. That explained the orderly shutdown I had observed in the daemon logs, but neither of us was familiar with the OOM Killer. It turns out that this dreaded beast comes into play when the Linux kernel is Out Of Memory (hence OOM), sacrificing a running process in order to free up enough resources for the rest of the system to continue running. But why was the system running out of memory? After the memory upgrade, it should have had plenty of RAM.

It turns out that the server was exhausting its physical memory for short periods when OpenNMS detected critical outages and queued up e-mail notifications. When physical memory is exhausted, the kernel turns to paging for a respite, using swap space specially allocated on the system’s disks as a place to “page out” some less critical processes that are using the physical memory that the system needs to handle the tasks at hand. Most of the time this strategy works very well, but the customer had not adjusted the amount of swap space allocated on the server’s disks to compensate for the just-added physical memory. The rule of thumb is that a server should have an amount of swap space equal to twice the size of its physical memory, but now this server’s swap size was just 25% of its physical memory size. This comparatively tiny amount of swap was quickly exhausted, but the kernel needed to make more room, so it dispatched the OOM Killer to assassinate a process whose demise would free up a ton of virtual memory. That unlucky process was the Java VM under which the OpenNMS daemons were running.
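A quick way to spot this mismatch on a Linux box is to compare the SwapTotal and MemTotal lines in /proc/meminfo. The sketch below does that arithmetic; the class and method names are made up, the sample figures are invented for illustration (they are not the customer’s), and the 2.0 threshold is just the rule of thumb mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;

/** Illustrative sketch only: compares swap size with physical memory. */
class SwapCheck {

    /** Returns the kB value for a /proc/meminfo key such as "MemTotal", or -1 if absent. */
    static long readKb(List<String> meminfoLines, String key) {
        for (String line : meminfoLines) {
            if (line.startsWith(key + ":")) {
                String[] fields = line.substring(key.length() + 1).trim().split("\\s+");
                return Long.parseLong(fields[0]);
            }
        }
        return -1;
    }

    /** Swap size as a fraction of physical memory (0.25 means swap is 25% of RAM). */
    static double swapToRamRatio(List<String> meminfoLines) {
        return (double) readKb(meminfoLines, "SwapTotal") / readKb(meminfoLines, "MemTotal");
    }

    public static void main(String[] args) throws IOException {
        Path meminfo = Paths.get("/proc/meminfo");
        // Fall back to made-up sample figures when not running on Linux.
        List<String> lines = Files.exists(meminfo)
            ? Files.readAllLines(meminfo)
            : Arrays.asList("MemTotal:        8388608 kB", "SwapTotal:       2097152 kB");
        double ratio = swapToRamRatio(lines);
        System.out.printf("swap is %.0f%% of physical memory%n", ratio * 100);
        if (ratio < 2.0) {
            System.out.println("below the twice-physical-memory rule of thumb");
        }
    }
}
```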

Fairly confident that we had a handle on the problem, I suggested to Dave that adding swap space on the customer’s server was a good next step. He agreed, and since for this customer we manage the server and operating system as well as the OpenNMS installation, I brought online additional swap to bring the system in line with the twice-physical-memory guideline. The daemons have not crashed since, and I’m crossing my fingers that clicking the “Publish” button on this blog post will not cause them to fall over :)

Now, ask yourself whether you would expect this level of service from a commercial software vendor. In a previous life at such a company, my approach to this same issue would have been very different. I would have requested logs from the customer, and after spotting the appearance of an ordered shutdown in the log messages, the customer would have been on his own to track down the OOM Killer’s involvement in the problem. The ticket would have dragged on for weeks or months. Managers and directors would have wrung their hands in one weekly meeting after another at the pernicious red-highlighted row in a spreadsheet. The customer would have become disgruntled but been powerless to do anything besides escalate the issue, because as much as they might want to cancel their product maintenance (which typically costs from 17% to 25% of the list price of the software), the vendor would cut them off from getting new product versions if they did so.

Because the software that we support at The OpenNMS Group is available to anybody free of charge, we cannot say to a customer, “oh, you didn’t pay your support bill, so you can’t have the new version of the product.” Therefore our support has to go far beyond what most people expect from support that costs five or ten times as much. We also cannot build a revenue model around charging for support according to the number of nodes, interfaces, or agents being managed, or even according to the number of OpenNMS servers that are running. Everybody who buys a given level of support pays the same amount for that support.

So why buy support for free software? Because it’s very likely that you will get much more and much better support for your money from a company that does not have a software license revenue cash cow, and has therefore figured out how to transform support from a “cost center” to the star of its show.

Injections and reflections and web apps, oh my!

January 19th, 2008

I love application security. It has always been a part of my work, whether as an explicit part of my job or as something my curious nature just couldn’t resist spilling over into.

I hate tickingclock.gif. A long time ago, there was a horrible animated GIF image of a ticking clock in the web interface of my employer’s main product. When you ran a report, you saw the ticking clock. There was no progress indicator, no cancel button, just the ticking clock.

In the 5.0 release, engineering added a Cancel button, and there was much rejoicing. The cancel button was just the submit button of an HTML form that held in a hidden field the PID of the running report process, which would be sent a kill signal on the server. Of course, since the PID was unchecked on the server side, anybody who had a little cunning and permission to run reports could kill any process running as NH_USER, including the server processes themselves. Oops. They fixed that issue a year or so later.
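I don’t know how the eventual fix was implemented, but one common way to close this kind of hole is for the server to remember which report processes it started for each session and to refuse to signal anything else. A minimal sketch of that idea follows; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Illustrative sketch only: server-side bookkeeping for cancellable reports. */
class ReportRegistry {

    // Session ID -> PIDs of report processes the server itself started for that session.
    private final Map<String, Set<Long>> reportsBySession = new ConcurrentHashMap<>();

    /** Called by the server when it launches a report on behalf of a session. */
    void registerReport(String sessionId, long pid) {
        reportsBySession
            .computeIfAbsent(sessionId, k -> ConcurrentHashMap.<Long>newKeySet())
            .add(pid);
    }

    /**
     * Returns true only if this PID was registered for this session. The PID
     * is forgotten on success, so a replayed request cannot hit a recycled PID.
     */
    boolean mayCancel(String sessionId, long pid) {
        Set<Long> pids = reportsBySession.get(sessionId);
        return pids != null && pids.remove(pid);
    }

    public static void main(String[] args) {
        ReportRegistry registry = new ReportRegistry();
        registry.registerReport("sessionA", 4242L);
        System.out.println(registry.mayCancel("sessionB", 4242L)); // someone else's report
        System.out.println(registry.mayCancel("sessionA", 1L));    // a PID we never started
        System.out.println(registry.mayCancel("sessionA", 4242L)); // the legitimate cancel
    }
}
```

The PID in the hidden form field then becomes a mere hint: the kill signal is sent only when `mayCancel` agrees with what the server already knows.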

Fast-forward to the present. A good samaritan notified my new employer of an SQL injection vulnerability in the asset information part of the OpenNMS web app. We were building SQL queries directly out of data received from the web UI, which is high on the list of ways to get your application cracked. Remember how I said I can’t resist app security? I spent a few hours changing all the code in the web app AssetModel classes to use prepared statements and parameter binding, which gives you escaping of SQL special characters for free. It also gives a performance boost in some situations.
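For readers who have not seen the difference side by side, here is a minimal JDBC sketch. The table and column names are invented and this is not the actual AssetModel code; it only contrasts gluing input into the SQL text with binding it as a parameter.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch only; the schema here is hypothetical. */
class AssetQueries {

    static final String FIND_BY_CATEGORY = "SELECT nodeid FROM assets WHERE category = ?";

    /** The vulnerable pattern: user input becomes part of the SQL text. */
    static String unsafeSql(String category) {
        return "SELECT nodeid FROM assets WHERE category = '" + category + "'";
    }

    /** The safer pattern: the SQL text is constant and the input is bound as data. */
    static List<Integer> findNodeIds(Connection conn, String category) throws SQLException {
        List<Integer> ids = new ArrayList<>();
        try (PreparedStatement stmt = conn.prepareStatement(FIND_BY_CATEGORY)) {
            stmt.setString(1, category);
            try (ResultSet rs = stmt.executeQuery()) {
                while (rs.next()) {
                    ids.add(rs.getInt(1));
                }
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        String hostile = "x' OR '1'='1";
        // The concatenated query now matches every row.
        System.out.println(unsafeSql(hostile));
        // The prepared statement's text never changes, whatever the input.
        System.out.println(FIND_BY_CATEGORY);
    }
}
```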

I knew at the time I was doing these fixes that it would be impossible to plug every hole. The web app is something we plan to redo from whole cloth in the next year or two, so this is work that will be thrown away. As if to remind me of these twin facts, I got a follow-up e-mail from our good samaritan just moments after checking in the SQL injection fixes. He had now found a reflected cross-site scripting (XSS) vulnerability. I was fried and my wife had come home, so I shelved the XSS problem until this morning.

Reflected XSS vulnerabilities are the less nasty kind (the nastier being the stored variety), and their potential for wreaking havoc is sometimes difficult to see at first. An attacker has to be somewhat crafty in order to get a victim to bite, but complacency and e-mails composed in 24-point purple MS Comic Sans font (with importance set to high, please) make the attacker’s job easier. Once the victim clicks on the attacker’s link (and logs in if not already authenticated) the fun can begin. It’s almost no work at all to craft a URL that sends the user’s session ID cookie to a drop-box where the attacker can retrieve it, allowing him to hijack the victim’s web app session. If the victim is assigned administrative privileges, it’s game over.

Fixing a reflected XSS vulnerability is harder than the technical part of exploiting one. When I started banging on this bug this morning, I did a proof-of-concept fix that I felt confident would work: I escaped the offending parameter string before putting it into the exception that our code throws when it’s unable to parse the string as an integer. Still, the alert pop-up that I had crafted into my test URL came right back. Why didn’t that work? I did a clean build and a fresh new install and tried again. Still no change. Another cup of coffee helped. The problem was that the JavaScript encoded into one of the URL parameters was being reflected not once but twice. Even though I had escaped the dirty data before putting it into my exception’s message, the next exception up the stack contained the dirty data verbatim. Building a nice tailored exception class and a corresponding error-page fixed this particular XSS issue, and had the added benefit of making it look much nicer when a typo or an old link triggered the same exception-handling code that the XSS attack targeted.
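The escaping itself is the easy part. A bare-bones version, written here as a standalone sketch with made-up names rather than the code that went into OpenNMS, looks something like this:

```java
/** Illustrative sketch only: minimal HTML escaping for text placed in a page body. */
class HtmlEscape {

    static String escapeHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() + 16);
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A reflected parameter such as this one is rendered inert once escaped.
        System.out.println(escapeHtml("<script>alert('x')</script>"));
    }
}
```

The double reflection described above suggests doing the escaping where data is written into the page, so that every path to the error page passes through it, rather than at each place a message is built.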

Almost every web app vulnerability exploits the fact that some code uses data received from the browser without validating the data first. I went through the remainder of the web app code and replaced about a million Integer.parseInt method calls with an equivalent method that sanitizes the data first, removing anything that’s not a decimal digit. There were also a dozen or so Long.parseLong calls and a handful of Double.parseDouble calls, all of them now using safe equivalent methods. I also ran across one more SQL injection vulnerability that I had previously missed because the offending code was hidden in inner classes instead of being in plain sight in its own package.
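A sanitizing replacement for Integer.parseInt along those lines might look like the sketch below. The name and the fallback parameter are my own for this example; what to do when no digits survive is a choice the calling code has to make.

```java
/** Illustrative sketch only: a digit-stripping stand-in for Integer.parseInt. */
class SafeParse {

    /**
     * Removes everything that is not an ASCII decimal digit, then parses what
     * is left. Returns the fallback when nothing parseable remains.
     */
    static int safeParseInt(String raw, int fallback) {
        if (raw == null) {
            return fallback;
        }
        String digits = raw.replaceAll("[^0-9]", "");
        if (digits.isEmpty()) {
            return fallback;
        }
        try {
            return Integer.parseInt(digits);
        } catch (NumberFormatException e) {
            // Too many digits to fit in an int.
            return fallback;
        }
    }

    public static void main(String[] args) {
        System.out.println(safeParseInt("17<script>", 0));
        System.out.println(safeParseInt("<b>", -1));
    }
}
```

One caveat of stripping rather than rejecting: a minus sign is dropped too, so “-5” comes back as 5, and digits embedded in junk are glued together. That is fine for IDs that are never negative, less so elsewhere.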

There are really two lessons from all of this, I think. First, never trust data received from the user. That’s elementary, but too often overlooked by even good programmers who just want to finish a project. Second, never fall into the trap of thinking you’ve plugged all the holes. There are always more out there.

Identity support in Apple’s Mail.app

August 9th, 2007

I was about to give up on Mail.app and install Thunderbird when I found this article on making Mail.app support multiple identities within a single e-mail account. This dead-simple, undocumented (AFAICT) solution did the trick for my needs.

The Worst MIB in the World

March 5th, 2007

This lovely turd is courtesy of NexTone Communications (Update: whose newer MIBs are much better — see Scott’s second comment below). I have personally never seen a worse MIB in my life — it consists of a single OID that is just a string and is used by the solitary generalTrap to convey who-knows-what when some unknown thing happens.

Update: NexTone’s more recent MIBs are much better written and convey much more useful and deterministic information. If you’re selecting a vendor based on SNMP support, and NexTone is in the running, I’d say that NexTone looks pretty good now. I’d still like to see more than just traps, but as trap-only MIBs go, the new ones aren’t bad at all. See Scott’s second comment below.

Reporting on Cisco IP SLA tests in OpenNMS

February 20th, 2007

A large customer with exacting standards has seen some voice quality problems lately. The WAN guys set up a pair of IP SLA VoIP Jitter tests on the routers at either end of one of the customer’s critical links, and I spent a weekend hacking away at OpenNMS collection groups and graph definitions so that we can do historical reporting and, ultimately, thresholding on the stats. I was pleasantly surprised at how easy it was, all thanks to the new “resource types” (support for generic indexed tables) in the 1.3.2 release. I blame DJ for my having any time to relax this week, since he did the bulk (all?) of the resourceType work.
Here’s the write-up on what’s needed (no code changes, w00t!) as well as some eye candy.

Interface-specific routing in Solaris

December 11th, 2006

In Solaris, you can add a route whose traffic should go out a specific interface by adding -ifp [ifname] to the route command line. For instance, suppose a host has two interfaces (eri0 and hme0) on the same IP subnet (10.4.2.9/24 with gateway 10.4.2.254), and traffic for just a few hosts needs to go out the secondary hme0 interface. One reason this setup may be needed is for monitoring both some firewalls and the apps that those firewalls protect from a single network management station. On the firewalls you would add host-specific routes for the network management station’s secondary interface via the firewall management network, allowing that interface to talk directly to the firewalls. The primary interface of the network management station gets routed normally, though, and so is able to talk to hosts protected by the same firewalls.
The following commands make this happen:

# route add -host 172.29.4.3 10.4.2.254 -ifp hme0

add host 172.29.4.3: gateway 10.4.2.254

# route add -host 172.29.4.4 10.4.2.254 -ifp hme0

add host 172.29.4.4: gateway 10.4.2.254

# route add -host 172.29.7.31 10.4.2.254 -ifp hme0

add host 172.29.7.31: gateway 10.4.2.254

# route add -host 172.29.7.32 10.4.2.254 -ifp hme0

add host 172.29.7.32: gateway 10.4.2.254

Now all traffic for the four hosts above will go out hme0 instead of eri0.
This trick is actually buried in a tiny section of the route(1M) man page that is worded such that my tiny brain didn’t get it. I’m not even sure what ifp stands for. The obvious candidate, the -iface or -interface flag, can’t be right because it requires the use of proxy ARP.