For an industry that prides itself on its analytical ability and abstract mental processing, we often don't do a great job applying that mental skill to the most important element of the programmer's tool chest: ourselves.

It's long been a maxim of the development community that the management team is from an entirely different planet than the rest of us. You know to whom I refer - the CxOs and VPs and other folks in the spiffy suits and ties and heels who speak in languages we don't understand, using tools we don't recognize, engaging in activities we see as pointless…. What's up with those folks? They always seem like they're on “the other side” from us, using strange nomenclature and near-religiously-held beliefs about the world to block us from getting our job done; or worse, they demand things of us that no sane human being would ever ask.

(If you're a manager, or just want to engage in a little “grass is greener” exercise, take the previous paragraph and replace “CxOs and VPs” with “coders and developers,” and “spiffy suits and ties and heels” with “T-shirts, jeans, and flip-flops,” and then re-read the paragraph. Funny how the world works, eh?)

But since becoming CTO of a consulting company, I've had to spend more time with those folks, and while some of them definitely qualify as “pointy-haired boss” material, a lot of them are just like developers: thoughtful people who are trying to figure out how best to accomplish the tasks they need to get done. Like developers, they often look for answers in the literature of their industry. And, much as in the technical industry, sometimes the answers they find are taken out of context, misquoted, and misused, and collect a bad reputation as a result. But as I've been reading through some management books, I've discovered that if you sort through the noise, there's a lot of interesting material that applies just as well to the developer arena as it does to the management arena.

Of course I wouldn't make that assertion without an example to back it up: benchmarking.

Benchmarking?

We in the technical arena think of benchmarking as comparing the performance of two products against a given suite of tests, as best exemplified by things like the TPC-n SQL database benchmarks, or the disastrous “PetShop” benchmark that Microsoft launched against the enterprise Java servers twelve years ago. And typically, those benchmarks don't prove much, since each vendor claims that the exact same benchmark proves its product superior to all the others in some way - which somehow makes sense to each of the vendors, but not to anybody else.

But benchmarking - as the management concept describes it - is actually a little more fundamental than that and, quite frankly, something that we as developers and technologists really need to do better.

Theory…

The basic theory behind benchmarking is that “if someone is doing something more successfully than you are, it makes sense to look over their shoulder and see what you can learn from them.” Historically, the term really began to see use when US manufacturers, accustomed to being the big dogs in the manufacturing arena, found themselves getting kicked to the curb by Japanese manufacturers. The most obvious example was the automobile arena: Americans used to be pretty proud of their domestic vehicles, while Japanese cars were paragons of ineptitude and terrible quality. By the 1980s, that situation was quite obviously (and horrifyingly) reversed: Japanese cars were the ones running for decades at a time, and American cars were falling apart literally as you drove them off the lot.

Benchmarking actually started about ten years prior, in the 70s, when Xerox (remember them?) was feeling a little behind the eight ball, so to speak, relative to its competitors. Xerox wanted to get better, but the first maxim of trying to get better - as any good debugger knows - is that you first have to figure out where the problems are. So the company took all the key parts of its business, from sales through production and all the way through to maintenance, and measured them against its competitors. If the performance of a competitor's process was better in some way (quicker, cheaper, more efficient - you name it), Xerox resolved to match it.

You gotta admit, that takes some guts.

Benchmarking is a lens that can be focused in a variety of ways, on a number of different subjects. A company can benchmark internally, comparing how different regions of the company handle warranty claims, or which departments deal with customer complaints more effectively. Internal benchmarks have the advantage of serving as a useful “dry run” before engaging in more audacious benchmarking - if nothing else, the management team learns how to benchmark.

External benchmarking - in which the company benchmarks against an external entity along the same axes of interest - can be much harder to do, but ideally is a great deal more productive. (Internally, even scattered across the country, the company may be in an “echo chamber” situation, where the principals in each region are already comparing notes informally and no really new ideas are emerging.) Of course, doing it with direct competitors can be tricky, since you are technically looking for ways to beat those same people, but in some areas (health care and public-safety organizations, for example) it's absolutely necessary.

The really fascinating thing is that many companies, seeking that external benchmark but leery of trying to benchmark against competitors or companies too close to their own industry, look to benchmark against entities in industries that are entirely orthogonal to their own but face similar challenges. For example: if you're an airport trying to benchmark your performance in moving massive numbers of people through the mandatory steps (check-in, TSA, and so on), where do you turn? Sure, other airports, but that's likely to be more of the echo chamber. Where else?

How about to a race course? Or a football stadium? British airport operator BAA did exactly this when it benchmarked against Ascot racecourse and Wembley football stadium, and got a number of good ideas for improving its own performance by doing so. (Personally, I'd have suggested Disney - that company knows how to manage lines like no place I've ever been.)

...and Parallels…

Interestingly enough, several groups within the software industry are already aware of some of this. The drive to benchmark comes from the same basic impulse the Japanese are famous for: “continuous improvement,” also known as kaizen (Japanese for “good change”), which some in the software craftsmanship community have adopted as their core mantra.

But what I find most interesting is that while much of the software craftsmanship community claims a greater-or-lesser adherence to the principles of kaizen, that adherence tends to be nebulous and/or confined to self-driven practices (code reviews, a set of agile principles that are well-known by this point), whereas management benchmarking takes a much more holistic and broad approach.

In broad strokes, benchmarking doesn't have any set standards to start from - it can apply to any standard of performance, from production rates and defect levels to how you answer the phone. The key is to first analyze your own performance, using whatever metric offers some kind of view of the item/task/process being analyzed, and then compare it against others' performance in the same area. The rest seems self-explanatory: if theirs is the superior result, you examine the differences between the two and look for ways to match them or, better yet, exceed them. In this respect, management benchmarking seems like a more focused subset of kaizen, with more in the way of analysis and follow-through.
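To make that concrete, here's a minimal sketch (in Python, with entirely invented metric names and numbers) of what the “measure yourself, then compare” step might look like for a development team; the point is the shape of the exercise, not the specific figures:

    # A minimal benchmark gap analysis; every metric name and number
    # here is hypothetical, chosen just to illustrate the idea.

    # Our own measurements, gathered however we normally gather them.
    ours = {
        "defects_per_kloc": 4.2,
        "build_time_minutes": 18.0,
        "days_from_commit_to_production": 14.0,
    }

    # The benchmark partner's measurements for the same metrics.
    theirs = {
        "defects_per_kloc": 1.9,
        "build_time_minutes": 6.5,
        "days_from_commit_to_production": 3.0,
    }

    # For all three of these metrics, lower is better; the "gap" is
    # simply how far we trail on each one.
    for metric, our_value in ours.items():
        their_value = theirs[metric]
        gap = our_value - their_value
        if gap > 0:
            print(f"{metric}: we trail by {gap:.1f} "
                  f"({our_value:.1f} vs. {their_value:.1f})")
        else:
            print(f"{metric}: we lead by {-gap:.1f}")

Nothing here is sophisticated, and that's rather the point: the hard part of benchmarking is gathering honest numbers from both sides, not crunching them.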

...and Dangers

Of course, benchmarking has its downsides, too. Management consultants (Daniel Levinthal of the Wharton business school being one of the most vocal) are now arguing against it, on the grounds that while imitation may be the sincerest form of flattery, it also runs the risk of simply enlarging the echo chamber: “Copying best practices may make you more efficient, but it will also make you look more like your competitors.” The implicit assumption is that looking just like your competitors makes it harder to differentiate yourself from them, transforming your business from one competing on quality or features into one competing on price - which usually means a “race to the bottom” will ensue.

Another danger, one near and dear to many readers' hearts, comes from Microsoft: the stack rank. For those unfamiliar with the practice, Microsoft adopted an approach in which the members of a given team were performance-ranked against one another: each member was ranked not against some objective standard, but against each other, to determine bonuses, “performance plans,” discipline, or termination. It may have started out as a way of benchmarking employees against one another, but because it became the metric by which an employee's performance was measured, it quickly “went bad” and ended up turning employees against one another.
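To see why it goes bad, consider a hypothetical sketch of a forced-curve stack rank; the names, scores, and bucket percentages below are all invented for illustration, not Microsoft's actual formula:

    # A hypothetical forced-curve stack rank: the team, the scores,
    # and the 20/60/20 split are all invented, illustrative only.

    team = {
        "Alice": 94, "Bob": 92, "Carol": 91, "Dave": 90, "Eve": 89,
    }

    # Rank everyone against each other, best score first.
    ranked = sorted(team, key=team.get, reverse=True)

    # Force the ranking into buckets: top 20%, middle 60%, bottom 20%.
    n = len(ranked)
    cut = max(1, n // 5)
    top, bottom = ranked[:cut], ranked[n - cut:]

    for person in ranked:
        if person in top:
            label = "exceeds (bonus)"
        elif person in bottom:
            label = "underperforms (performance plan)"
        else:
            label = "achieves"
        print(f"{person}: {team[person]} -> {label}")

Note that Eve scores an 89 out of 100 - objectively fine - and still lands in the bottom bucket, because the ranking is zero-sum: for me to look good, you have to look bad.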

...and Practice

It certainly seems like benchmarking might be a tactic the development arena could use - we've tried something like it on a limited basis with case studies, but those are usually vendor-driven and more about the technology than the people or the processes involved. And, granted, when developers from different companies meet, we often engage in an informal and limited form of the practice, usually under the guise of “Whose Job Sucks Worse?” games.

If we were going to take this management practice and apply it, it's actually a pretty straightforward exercise:

Analyze the results. This is going to depend on the data itself, but given the benchmark's target, the analysis should provide at least a good hint as to where the “gap” is. Once the analysis is done, if no obvious solutions present themselves, cast the net far and wide within the company to solicit suggestions, implement, and return to Step 5 for an internal benchmark after the implementation has “taken root.” Periodically return to Steps 5, 6, and 7 to make sure the change is still in place, and that the expected results are actually still there.
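That periodic re-measurement is the part most easily let slide, so here's a small hypothetical sketch of the follow-through: re-run the same measurement on a schedule, and raise a flag when the metric drifts back toward the old baseline. (All the numbers are invented.)

    # Hypothetical follow-through: re-measure the same metric over
    # time and complain when it drifts back toward the old baseline.

    baseline = 4.2   # defects per KLOC before the change
    target = 1.9     # the benchmark figure we set out to match

    # Quarterly re-measurements after the implementation "took root."
    remeasurements = [2.0, 2.1, 2.4, 3.1, 3.8]

    for quarter, value in enumerate(remeasurements, start=1):
        # Flag anything that has drifted more than halfway back.
        drifted = value > target + (baseline - target) / 2
        status = "REGRESSING - re-benchmark" if drifted else "holding"
        print(f"Q{quarter}: {value:.1f} defects/KLOC ({status})")

In this made-up run, the gains hold for three quarters and then start slipping - exactly the situation the periodic re-check exists to catch.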

But a couple of caveats come with this:

  • Don't confuse benchmarking with taking a survey. Asking your developers to rate your team's effectiveness over SurveyMonkey is not benchmarking. You don't want numbers by themselves; you're looking for the analysis behind the numbers.
  • Don't confuse benchmarking with research. Asking, “How do they develop line-of-business applications which are similar in scope and nature to ours?” is a benchmark. Asking, “How do they develop Internet-of-Things applications, which is something we'd like to do, too?” is research.
  • Don't take on too much or change the goals. Monolithism and scope creep can kill non-software projects, too.

...and End

Benchmarking isn't going to solve all of software development's ills, but it's fascinating how something that the suit-and-tie crowd cooked up actually can be useful to the T-shirt-and-jeans crowd.

Maybe managers aren't quite as clueless as we thought they were.