Wednesday, August 13, 2008

Beautiful Code: What Is Beautiful Code?

This post is part of the Beautiful Code series.

No complex programming concepts in this post. Just a question: What is beautiful code? What makes code ugly? (And isn’t that a matter of opinion? Pshaw…) Most of the posts so far have been about beautiful ideas encapsulated in the programs, but what about the code itself?

This post is all over the place, so brace yourself for it.

Reusability

To start with, some thoughts from Adam Kolawa.

…. Code reuse significantly reduces the effort required for code development, testing, and maintenance. In fact, it is one of the best ways to increase developers’ productivity and reduce their stress. The problem is that reuse is typically difficult. Very often, code is so complicated and difficult to read that developers find it easier to rewrite code from scratch than to reuse somebody else’s code. Good design and clear, concise code are vital to promoting code reuse.

Unfortunately, much of the code written today falls short in this respect. Nowadays, most of the code written has an inheritance structure, which is encouraged in the hope that it will bring clarity to the code. However, I must admit that I’ve spent hours on end staring at a few lines of code…and still could not decipher what it was supposed to do. This is not beautiful code; it is bad code with a convoluted design. If you cannot tell what the code does by glancing at the naming conventions and several code lines, then the code is too complicated.

Beautiful code should be easy to understand. I hate reading code that was written to show off the developer’s knowledge of the language, and I shouldn’t need to go through 25 files before I can really understand what a piece of it is really doing. The code does not necessarily need to be commented, but its purpose should be explicit, and there should be no ambiguity in each operation. The problem with the new code being written today—especially in the C++ language—is that developers use so much inheritance and overloading that it’s almost impossible to tell what the code is really doing, why it’s doing it, and whether it’s correct. To figure this out, you need to understand all of the hierarchy of inheritance and overloading. If the operation is some kind of complicated overloaded operation, this code is not beautiful to me.

This is an interesting quote—one with which I both agree and disagree. I definitely agree that code which is difficult to understand is not beautiful; Kolawa uses reusability as a criterion for beauty, which I think is a good criterion to use. I also agree that if it takes a few hours to understand what a piece of code is doing, that code is not beautiful, and not reusable.

And I also agree that code written to show off the developer’s knowledge of the language is not beautiful. You might have a very good reason why you had to take advantage of some obscure feature of the language, but if your program is going to be at all maintainable, you’re going to have to comment it so well that you might have been better off finding another way to do it, that other programmers would understand more easily. Either that or you don’t actually care about maintainability; you might feel that people should have to spend hours going through the language reference guides, trying to decipher your code—which is arrogant—or you might just be worried about job security, for which I have no respect.

However, at the same time, this quote also seems to be a diatribe against object-oriented programming. It almost sounds curmudgeonly; “In my day, we didn’t have no object-oriented programmin’. We had miles and miles of procedural code, and that’s the way we liked it!” I agree with Kolawa’s main points, but I disagree that object-oriented programming in general, or C++ in particular, is to blame for bad code (if that’s what he’s saying). You can write clear, concise, object-oriented code, and you can write really terrible object-oriented code, just as you can with procedural code. It’s true that you have to get your mind around object-oriented programming before object-oriented code is going to make sense to you, but that doesn’t make object-oriented programming bad, or even necessarily more difficult; it’s just a different way of thinking about code. (I can’t believe I’m typing this in 2008.)

On the next page, Kolawa had another quote that I also found interesting:

My next criterion for beautiful code is that you can tell that a lot of thought went into how the code will be running on the computer. What I’m trying to say is that beautiful code never forgets that it will be running on a computer, and that a computer has limitations. As I said earlier in this chapter, computers have limited speed, sometimes operate better on floating-point numbers or integer numbers, and have finite amounts of memory. Beautiful code must consider these limitations of reality. Quite often, people writing code assume that memory is infinite, computer speed is infinite, and so on. This is not beautiful code; it’s arrogant code. Beautiful code is frugal about things like memory use and reuses memory whenever possible.

I have mixed feelings about this quote, too. I find myself wanting to agree, and I do agree under certain conditions, but I also disagree under other conditions. There are cases where a developer should not worry about memory, should not try to optimize for particular CPU architectures, should not worry about the low-level details of a computer. (After all, computer science has been trying to abstract these concepts away almost since its inception.) Writing code in C++ that will run on a Windows desktop is much different from writing code in Java that will run on a J2EE application server.

But, to Kolawa’s point, using memory as an example, even if Java is supposed to abstract away details about memory cleanup, that doesn’t mean that Java code running on a J2EE server can use up as much memory as it wants. Even when some of the details are no longer your concern, you still have to put some thought into how much memory you’re allocating. Maybe loading up 10MB worth of data into a user’s session every time the user logs in to your web site is a really bad idea. The fact that Java will automatically garbage collect objects when they’re no longer used isn’t going to help you in this case, because the session keeps those objects reachable for as long as it lives; you’re just wastefully using up too much memory.

The Actual Source Code—The Seven Pillars

This is all well and good, but it’s still a bit too conceptual; what about the code itself? The semicolons and the tabs and spaces and newlines? What makes that beautiful? Chapter 32 talked about that, based on an article by Christopher Seiwald called Seven Pillars of Pretty Code. I won’t bother to list all of the “pillars” (you can read the article if you wish), but some examples they gave are making code “bookish”, making alike look alike, and overcoming indentation.

“Bookish” Code

When they say making code bookish, they’re talking about a couple of things, which can be summed up as laying out your code the way that text is laid out in books or magazines; “columns” of code shouldn’t be too wide, and the code should be broken up into chunks, not put in continuous blocks. The writers commented on this:

Research also seems to show that, when it comes to line lengths of text, there’s a difference between reading speed and reading comprehension. Longer lines can be read faster, but shorter lines are easier to comprehend.

Chunked text is also easier to comprehend than a continuous column of text. That’s why columns in books and magazines are divided into paragraphs. Paragraphs, verses, lists, sidebars, and footnotes are the “transition markers” of text, saying to our brains, “Have you grokked everything so far? Good, please go on.”

As a side note, as of this writing, I still haven’t got around to fixing the width of the text in my blog. My apologies to anyone who’s reading this on a really wide widescreen display. (As another side note, see the definition of “grok” from the Jargon File, if you’re not familiar with the term.)

Making Alike Look Alike

For making alike look alike, they’re just saying that when code blocks that are similar in nature look similar to each other, it’s much easier for the brain to comprehend what’s going on at a glance. They give an example that looks like this:

while( d.diffs == DD_CONF && ( bf->end != bf->lines() ||
                               lf1->end != lf1->lines() ||
                               lf2->end != lf2->lines() ) )

Even if you know nothing about this code (and I don’t), you can easily see how the conditions within that while loop are all similar to each other. Each test looks the same: the end member is compared to the result of the lines() method, never the other way around, and the tests are all indented to the same place. That makes this code easier to comprehend.
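The same idea carries over to Java. The sketch below is only an illustration: FileCursor and both method names are hypothetical stand-ins, and only the shape of the condition mirrors the example above. The two methods return the same result, but the second one writes every test in the same form, so a difference between the tests would stand out immediately.

```java
// Hypothetical stand-in for the objects in the example above:
// an "end" position plus a lines() method.
class FileCursor {
    final int end;
    private final int lineCount;

    FileCursor(int end, int lineCount) {
        this.end = end;
        this.lineCount = lineCount;
    }

    int lines() {
        return lineCount;
    }
}

class AlikeLooksAlike {
    // Three equivalent tests, each written differently: the operand order,
    // the negation style, and the layout all change from one test to the next.
    static boolean anyUnfinishedInconsistent(FileCursor bf, FileCursor lf1, FileCursor lf2) {
        return bf.end != bf.lines() || !(lf1.lines() == lf1.end)
            || lf2.end < lf2.lines() || lf2.end > lf2.lines();
    }

    // The same logic with every test in the same form:
    // "end" on the left, lines() on the right, one test per line.
    static boolean anyUnfinished(FileCursor bf, FileCursor lf1, FileCursor lf2) {
        return bf.end != bf.lines()
            || lf1.end != lf1.lines()
            || lf2.end != lf2.lines();
    }
}
```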

Overcoming Indentation

And finally (or at least, finally for the points that I’m going to mention here), they mention overcoming indentation. By which they are not saying that you shouldn’t indent code! What they’re saying is that you should avoid nested code as much as possible. (Which, in so doing, will reduce indentation. Which also does help, by the way, since it helps with keeping your columns of text narrow, although the main point here is that we’re trying to avoid nested logic as much as possible.) If you can, try to avoid having nested conditionals in your code; they’ve got a bunch of statistics showing how much harder it is to maintain deeply nested code than code that isn’t nested.

For example, suppose I want to write a function to calculate a tip, based on a bill. But I only want to calculate the tip if it hasn’t already been included on the bill; if it has, the tip is 0. If I had some kind of DTO, with details about the bill, I might write a method like this:

float calculateTip(Bill bill) {
    if (!bill.includesTip) {
        // add up the cost of every line item
        float subTotal = 0;
        for (int i = 0; i < bill.lineItems.length; i++) {
            subTotal += bill.lineItems[i].cost;
        }

        return subTotal * 0.15f;
    }

    else {
        return 0.0f;
    }
}

This does exactly what I’d said above, and calculates the tip only if it hasn’t already been included on the bill. Unfortunately, it means that the bulk of the code for this method has to be included within that if statement. And that means that the code is inherently a little bit harder to understand. Although the logic isn’t too complex, in this case, when you’re reading that for loop, you do have to keep in mind that this is within the if statement, meaning that it only happens when the tip is not included on the bill.

But there’s another way of looking at this logic; if the bill includes the tip, we can return 0 and exit the method right away:

float calculateTip(Bill bill) {
    // tip already on the bill: nothing to calculate
    if (bill.includesTip) {
        return 0.0f;
    }

    float subTotal = 0;
    for (int i = 0; i < bill.lineItems.length; i++) {
        subTotal += bill.lineItems[i].cost;
    }

    return subTotal * 0.15f;
}

Although it logically does the same thing as the version above, the fact that it’s got less nested logic makes it inherently a bit easier to read and comprehend. We took care of the logic of determining if the bill already includes the tip, and once that is done, we can carry on with the rest of the code, and not worry about it again.
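For anyone who wants to try the early-return version, here is a rough, self-contained sketch that compiles as plain Java. The Bill and LineItem classes and the TIP_RATE constant are assumptions made for illustration; the real DTO could look quite different.

```java
// Hypothetical DTOs; the field names follow the snippets above.
class LineItem {
    final float cost;

    LineItem(float cost) {
        this.cost = cost;
    }
}

class Bill {
    final LineItem[] lineItems;
    final boolean includesTip;

    Bill(LineItem[] lineItems, boolean includesTip) {
        this.lineItems = lineItems;
        this.includesTip = includesTip;
    }
}

class TipCalculator {
    // Assumed flat 15% rate, matching the hard-coded value above.
    static final float TIP_RATE = 0.15f;

    static float calculateTip(Bill bill) {
        // Guard clause: deal with the special case first, then leave.
        if (bill.includesTip) {
            return 0.0f;
        }

        // From here on, the tip is known not to be on the bill yet.
        float subTotal = 0.0f;
        for (LineItem item : bill.lineItems) {
            subTotal += item.cost;
        }

        return subTotal * TIP_RATE;
    }
}
```

Calling TipCalculator.calculateTip() on a bill whose includesTip flag is set returns 0 without ever reaching the loop, which is the whole point of putting the check first.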

Self-Documenting Code

Interestingly, I was surprised at an aspect of code that wasn’t mentioned in the book, but since I’m talking about pretty code, I’ll mention it here. In Martin Fowler’s excellent book Refactoring: Improving the Design of Existing Code—a book I highly recommend—he introduced me to the concept of “self-documenting code”. Let me give a silly example, using Java-like pseudocode:

/* note that I've hard-coded the amounts for tax and tip,
   but this isn't real code, it's just illustrating a point,
   and in real life I would never do that, blah blah blah */

Bill calculateBill(BillLineItem[] lineItems) {
    Bill bill = new Bill();

    bill.lineItems = lineItems;

    //calculate and set bill subtotal
    float total = 0;
    for(int i = 0; i < lineItems.length; i++) {
        total += lineItems[i].cost;
    }
    bill.subTotal = total;

    //calculate and set tax (PST and GST)
    bill.provincialTax = bill.subTotal * 0.06f;
    bill.federalTax = bill.subTotal * 0.05f;

    //calculate and set tip
    bill.tip = bill.subTotal * 0.15f;

    return bill;
}

To make the code self-documenting, you might refactor it to make it more like this:

Bill calculateBill(BillLineItem[] lineItems) {
    Bill bill = new Bill();

    bill.lineItems = lineItems;

    calculateAndSetBillSubtotal(bill);

    calculateAndSetTax(bill);

    calculateAndSetTip(bill);

    return bill;
}

void calculateAndSetBillSubtotal(Bill bill) {
    float total = 0;
    for(int i = 0; i < bill.lineItems.length; i++) {
        total += bill.lineItems[i].cost;
    }

    bill.subTotal = total;
}

void calculateAndSetTax(Bill bill) {
    bill.provincialTax = bill.subTotal * 0.06f;
    bill.federalTax = bill.subTotal * 0.05f;
}

void calculateAndSetTip(Bill bill) {
    bill.tip = bill.subTotal * 0.15f;
}

In other words, where possible, when you have a comment in front of a block of code, you can break it out into its own function instead, and name the function such that it replaces the comment. In general, this simplifies the calling function, and promotes reuse, since the smaller chunk of code is more likely to be usable somewhere else. (Not guaranteed to be reusable, obviously, but there’s more possibility of reusing a smaller piece of code than a larger piece, in general. The larger a block of code is, the more chance that it will have a specialized side effect that you don’t want, when you’re trying to reuse it.)

Now I know—I know!—that I’m going to get a bunch of comments on this example, talking about execution speed and optimization. Especially since the method in question was so simple in the first place. “Why would you take the performance hit of making that a function call… blah blah blah…” Yes, yes, stay with me folks, there are always multiple tradeoffs to consider. But we’re talking about making the code understandable, and if you can make the code self-documenting, instead of putting in explanatory comments, it’s easier to read, and therefore, easier to maintain. (I don’t have statistics to back that up. If you disagree and would like to comment on the fact, feel free—I’m getting good at separating the wheat from the chaff in my comments.)

If you look at the new calculateBill() method, you can see the high-level logic much more easily than you could with the previous version. It’s true that you can’t see the details of everything that it’s doing all at once, and that’s the point. You don’t need to know every detail of everything this method is doing all at once. If you’re reading the code to get an idea of what it’s doing, you can look at the calculateBill() method, and not get sidetracked with details about how the tip is calculated; conversely, if you’re tasked with fixing a bug in how the code calculates a tip, you can glance at the calculateBill() method, see that you should be looking at the calculateAndSetTip() method, and concentrate your energies there. And, again, not be bogged down with details on how tax is calculated.
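To make the reuse argument a little more concrete, here is a sketch of the refactored code as compilable Java, with one extra caller added. The Bill and BillLineItem field layouts are guesses based on the pseudocode, and applyDiscount() is a purely hypothetical method that is not part of the original example; it is only there to show the small helpers being called from a second place without any changes.

```java
// Hypothetical data classes; the fields are inferred from the pseudocode.
class BillLineItem {
    float cost;

    BillLineItem(float cost) {
        this.cost = cost;
    }
}

class Bill {
    BillLineItem[] lineItems;
    float subTotal;
    float provincialTax;
    float federalTax;
    float tip;
}

class BillCalculator {
    static Bill calculateBill(BillLineItem[] lineItems) {
        Bill bill = new Bill();
        bill.lineItems = lineItems;

        calculateAndSetBillSubtotal(bill);
        calculateAndSetTax(bill);
        calculateAndSetTip(bill);

        return bill;
    }

    // Hypothetical second caller: takes a flat discount off the subtotal,
    // then reuses the tax and tip helpers exactly as they are.
    static void applyDiscount(Bill bill, float discount) {
        bill.subTotal = Math.max(0.0f, bill.subTotal - discount);

        calculateAndSetTax(bill);
        calculateAndSetTip(bill);
    }

    static void calculateAndSetBillSubtotal(Bill bill) {
        float total = 0.0f;
        for (BillLineItem item : bill.lineItems) {
            total += item.cost;
        }
        bill.subTotal = total;
    }

    // Rates are hard-coded only because the original example does the same.
    static void calculateAndSetTax(Bill bill) {
        bill.provincialTax = bill.subTotal * 0.06f;
        bill.federalTax = bill.subTotal * 0.05f;
    }

    static void calculateAndSetTip(Bill bill) {
        bill.tip = bill.subTotal * 0.15f;
    }
}
```

Had the tax and tip calculations stayed inline in calculateBill(), the discount method would have had to copy them, or call calculateBill() again and recompute the subtotal it had just adjusted.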

Code Should Be Concise

Another aspect of beautiful code, which was mentioned numerous times by numerous authors in the book, is that it should be concise. I was reminded of this by another quote from Diomidis Spinellis (who was mentioned in the Computer-Generated Code post):

I always feel elated if, after committing a refactoring change, the lines I add are less than the lines I remove.

And I have to say that I have the same feelings. When I’m modifying some code, I get a great feeling of satisfaction when I can make the code smaller, rather than larger. But it should be noted that conciseness is something you have to work at. Typically, you will write code that is longer, and only with some extra thought can you make it more concise.

Along these lines, my email sig includes a bastardization of a quote from Blaise Pascal; it says:

Sorry this email was so long—I didn’t have time to make it shorter.
