Monday, August 23, 2010

What's in a name?

When I first started at OTI, and as we were slowly digested into IBM, our team didn't have any formal coding guidelines. We were a small group, and everyone basically followed the same unwritten conventions and tried to keep things sensible.

As we grew it became apparent that this wasn't going to scale. So a few years ago we wrote down our coding guidelines for C and C++. (We write Smalltalk, too, but we've never had to codify the Smalltalk conventions. We do a lot less Smalltalk now, so we probably never will.) Most of them are fairly straightforward, but one overarching theme is simplicity, simplicity, simplicity.

I believe that reading code is significantly harder than writing code. (I think the opposite is true of prose.) When you're writing a new function you've already worked out the structure in your head (hopefully) and you just need to translate that into something a compiler can understand. But when you're reading code you've got to reverse-engineer the structure from the simple instructions to the compiler. Plus there's a multiplier: on any large project you'll probably come back to debug and read certain pieces of code dozens or hundreds of times, but (hopefully) you only need to write it once. So not only is reading harder, but you'll do it much more frequently.

One of our most important guidelines, and one which seems to be fairly unusual, is that variable names need to be spelled out. Don't use idx when index will do. Don't use firstOpt where firstOption (or firstOptimization?) will do.

It seems fairly minor when you're writing the code, but it's only a few extra keystrokes and it makes your code that much easier to decipher later on. Why waste a few brain cycles (regardless how minor) to expand objcnt, when you could have just written objectCount in the first place? (Or object_count if you don't like CamelCase.)

(Of course there are few exceptions. You can use i in a for loop, since it's such a common idiom. You can use abbreviations if they're at least as commonly used in speech as their expansions.)

So why is this so hard for new team members to follow sometimes? I think it largely comes down to bad examples, and there are a few reasons for this.

Computer science developed largely out of mathematics. Mathematicians love using short names. First they go through the Latin alphabet, then they capitalize all the letters, then they start stealing letters from other alphabets, and finally, like Prince, they just start making up new symbols. If they simply can't find a symbol for some concept, they might condescend to string together two or maybe three letters, but anything more than that seems to make them uncomfortable. But even very complex theorems are unlikely to use more than a few dozen variables and constants, and they've got a few hundred years of precedent, so they can get away with it.

Other sources of bad abbreviation precedents, I think, are academic papers. Because of the standard two column layout used in most journals, code examples must fit in narrow columns. I counted 36 characters in the examples here. Don't they know that impressionable young students read these things?

Finally, remember that you're not the only one who needs to read your code. Once you've moved on to newer and cooler projects someone else might have to read and maintain that code. So be polite and make their job a little bit easier -- please don't abbreviate your variable names!

1 comment:

  1. Great post, Peter. I completely agree with you. I take the view that code is primarily a notation for documenting a solution to a problem, and that the solution is intended to be read by humans. So clearly communicating the solution is key. The fact that we can compile and execute that solution is secondary. When you think like that you start optimizing for human consumption. Sure, we all know that the solution must execute fast, but let's not optimize for execution too early, otherwise you're at risk of trading human readability for insignificant runtime performance gains.

    In addition to writing code for easy reading, I also find myself writing code for easy debugging, since I tend to do that a lot too. This leads me to writing painfully simple code, be it a little verbose. For example, my code uses plenty of temporary variables, and complicated conditions are always placed in boolean methods that have intention revealing names.