Musings
muse: to turn something over in the mind meditatively and often inconclusively


Conservatism

Something that worried me when I joined as a peewee, wet-behind-the-ears developer was what I took to be stagnation.

My team is responsible for, among other things, an SDK that the rest of our software builds on top of. When I first joined this team the SDK had been through three releases (a period of a few years at least) with almost zero changes. The first release I was involved in saw more changes than all the releases before it combined.

And I was pretty disheartened by the reaction. People expected code that was written against a later version of our SDK to run unchanged on earlier versions of the SDK. They clearly hadn't stopped and thought about this, because it implies that we somehow ensure that code added in a later release of the SDK makes its way into versions that had already been released. I thought a lot about that and it didn't take much to see what caused this reaction: our developers had grown used to the fact that the SDK was static.

Thankfully we've dragged them kicking and screaming out of this mindset. I view that as progress: imagine where we'd be if the latest Java release had APIs that were unchanged from the 1.1 release. Or Python. Or any library.

While not on this scale, I've bumped into a related phenomenon over the past few years. It's a kind of conservatism that manifests as an attempt to stifle any change that's viewed as even remotely risky.

I ran into it again this evening and it makes my blood boil. My team introduced a profiler-like mechanism into our SDK that allows us to report on database activity (mostly timing data) in an attempt to tackle a recurring problem on client sites: the app locks up briefly, often because of excessive locking at the database level. One of the first things we need to understand is which database operations were involved. And more importantly, we need to know then and there. This is about diagnosing a problem in the field.

Now the first implementation was flawed and resulted in a few problems under load. We corrected it and those problems are gone. Recently another performance problem arose and it was tracked back to this code. As it turns out, badly behaved code was (erroneously) generating thousands of unique query strings, resulting in our internal profiling structures gobbling up excessive amounts of RAM. Fair enough, the SDK can be (and has been) updated to cope with badly behaved applications.
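
For the curious, here is roughly what "coping with badly behaved applications" can look like. This is only a sketch in Python, not our SDK's actual code: the class name `BoundedQueryStats`, the `max_entries` cap, and the least-recently-used eviction policy are all assumptions made up for illustration. The idea is simply that per-query timing data is kept in a structure with a hard ceiling, so a flood of unique query strings cannot grow it without limit.

```python
from collections import OrderedDict


class BoundedQueryStats:
    """Hypothetical per-query timing store with a cap on distinct queries."""

    def __init__(self, max_entries=1000):
        if max_entries < 1:
            raise ValueError("max_entries must be at least 1")
        self.max_entries = max_entries
        # Maps query text -> [call count, total seconds], oldest use first.
        self._stats = OrderedDict()
        # How many entries were dropped to stay under the cap.
        self.evicted = 0

    def record(self, query, seconds):
        """Add one timing sample for the given query string."""
        entry = self._stats.get(query)
        if entry is None:
            entry = [0, 0.0]
            self._stats[query] = entry
        entry[0] += 1
        entry[1] += seconds
        # Mark this query as the most recently used.
        self._stats.move_to_end(query)
        # Drop the least recently used entries once over the cap.
        while len(self._stats) > self.max_entries:
            self._stats.popitem(last=False)
            self.evicted += 1

    def report(self):
        """Return {query: (call count, total seconds)} for retained queries."""
        return {q: (count, total) for q, (count, total) in self._stats.items()}

    def __len__(self):
        return len(self._stats)


if __name__ == "__main__":
    stats = BoundedQueryStats(max_entries=3)
    # Simulate an application that generates a unique query string per call.
    for i in range(10000):
        stats.record("SELECT * FROM t WHERE id = %d" % i, 0.001)
    print(len(stats), "queries retained,", stats.evicted, "evicted")
```

A non-zero `evicted` count is itself a useful diagnostic: it points at an application that is generating query strings it probably shouldn't be.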

One of the guys involved in tracking this down has now demanded that we disable this functionality by default. It's just too risky, he feels. What he fails to understand is that this functionality is useless unless it's enabled. That's the whole point of a tool aimed at problem diagnosis.

The knee-jerk reaction in this case, "take it out, take it out, it's too risky!", really gets up my nose. Yes, there's risk. Is it unmanageable? No. Do we expect a few teething problems? Yes. Is the potential gain here worth accepting those teething problems? I think so.

Without tools like this we are up the creek when it comes to diagnosing this class of problem. Just mention "610 event" to any one of our developers or support engineers and watch the reaction.

It's really (really) important to me that the environment I work in views change as positive. Progress should be a given, and not something we're occasionally prepared to risk.

Posted at 07:37 PM