Saturday, July 19, 2008

Beautiful Code: False Optimizations

This post is from the Beautiful Code series.

One of the first concepts that jumped out at me from Beautiful Code concerned false optimizations: something added to the code that is meant to be an optimization but, as it turns out, actually causes the code to run slower.

As an example, we’ll look at some code that implements a binary search, from a chapter written by Tim Bray:

package Binary;

public class Finder {
    public static int find(String[] keys, String target) {
        int high = keys.length;
        int low = -1;
        while (high - low > 1) {
            int probe = (low + high) >>> 1;
            if (keys[probe].compareTo(target) > 0)
                high = probe;
            else
                low = probe;
        }
        if (low == -1 || keys[low].compareTo(target) != 0)
            return -1;
        else
            return low;
    }
}
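
Before getting to Bray’s point, here is a quick usage sketch of my own (not from the book): the keys must already be sorted, and find() returns the index of the match, or -1 if there is none.

public class FinderDemo {
    public static void main(String[] args) {
        String[] keys = { "alpha", "bravo", "charlie", "delta" };
        System.out.println(Binary.Finder.find(keys, "charlie")); // prints 2
        System.out.println(Binary.Finder.find(keys, "echo"));    // prints -1
    }
}
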
As mentioned, this function simply searches through an array of strings, using the binary search algorithm, and reports whether there is a match. I read through the code quickly and didn’t realize, until Bray pointed it out, that there is no explicit check within the loop to see whether the match has been found. One might expect such a check to be added as an optimization, so that when the target is found the loop can exit rather than continue with its comparisons. But as it turns out, that would be a false optimization: mathematically, the match is likely to be found late enough in the loop that paying for the extra if statement on every iteration is more of a performance hit than simply letting the loop run a few more times. Or, as Tim puts it:

Some look at my binary-search algorithm and ask why the loop always runs to the end without checking whether it’s found the target. In fact, this is the correct behavior; the math is beyond the scope of this chapter, but with a little work, you should be able to get an intuitive feeling for it—and this is the kind of intuition I’ve observed in some of the great programmers I’ve worked with.

Let’s think about the progress of the loop. Suppose you have n elements in the array, where n is some really large number. The chance of finding the target the first time through is 1/n, a really small number. The next iteration (after you divide the search set in half) is 1/(n/2)—still small—and so on. In fact, the chance of hitting the target becomes significant only when you’re down to 10 or 20 elements, which is to say maybe the last four times through the loop. And in the case where the search fails (which is common in many applications), those extra tests are pure overhead.

You could do the math to figure out when the probability of hitting the target approaches 50 percent, but qualitatively, ask yourself: does it make sense to add extra complexity to each step of an O(log2 N) algorithm when the chances are it will only save a small number of steps at the end?

The take-away lesson is that binary search, done properly, is a two-step process. First, write an efficient loop that positions your low and high bounds properly, then add a simple check to see whether you hit or missed.
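
For contrast, here is a rough sketch of my own (not from the book) of what that tempting early-exit version would look like; the equality test it adds to every iteration is exactly the overhead Bray is warning about:

package Binary;

public class EagerFinder {
    // The "obvious" optimization: test for equality on every probe and
    // bail out as soon as the target is found. The extra comparison runs
    // on every pass through the loop, but it only pays off when the probe
    // happens to land exactly on the target.
    public static int find(String[] keys, String target) {
        int high = keys.length;
        int low = -1;
        while (high - low > 1) {
            int probe = (low + high) >>> 1;
            int comparison = keys[probe].compareTo(target);
            if (comparison == 0)
                return probe;      // early exit: rarely saves much
            if (comparison > 0)
                high = probe;
            else
                low = probe;
        }
        return -1;
    }
}

As a back-of-the-envelope check of my own: if the target is equally likely to be any of the n keys, about half of them are first hit by the probe only on the final iteration, a quarter of them one iteration earlier, and so on, so the expected number of iterations the early exit saves is roughly the sum of k/2^(k+1) over all k, which converges to 1. Against that single saved iteration, the extra comparison is paid on every one of the roughly log2(n) iterations that do run, and on every failed search.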

Let’s look at another example: a snippet of code from Elliotte Rusty Harold’s XOM, used when verifying XML documents. Once an XML namespace (which is a URI) has been verified, it is cached; when the parser encounters a namespace, it can check the cache to see whether that URI has already been verified and, if so, skip verifying it again. This class is the cache Elliotte uses:

private final static class URICache {

    private final static int LOAD = 6;
    private String[] cache = new String[LOAD];
    private int position = 0;

    synchronized boolean contains(String s) {
        for (int i = 0; i < LOAD; i++) {
            // Here I'm assuming the namespace URIs are interned.
            // This is commonly but not always true. This won't
            // break if they haven't been. Using equals() instead
            // of == is faster when the namespace URIs haven't been
            // interned but slower if they have.
            if (s == cache[i]) {
                return true;
            }
        }
        return false;
    }

    synchronized void put(String s) {
        cache[position] = s;
        position++;
        if (position == LOAD) position = 0;
    }
}

This code is fairly simple: it’s a helper class that maintains a list of namespace URIs that have already been verified, plus a method that tells you whether a given URI is in that list. The surprising thing is that the cache is a statically sized array with exactly six slots; if you add a seventh entry, it simply overwrites the first one (and so on, round-robin). Why not use a hash map or a table, something dynamic that could grow as large as necessary? Why a statically sized array?

Again, the answer is performance: Elliotte Rusty Harold has seen enough XML documents to know that a document will rarely have more than six namespaces defined in it. By using a statically sized array instead of something more dynamic, he gets a performance boost in the majority of cases. When a document does have more than six namespace URIs, validation simply takes a little longer for that document, because some namespace URIs may have to be verified twice: if the parser comes across a URI that has already been verified but has since been overwritten in the cache, it has to re-verify it. (A URI that hasn’t been overwritten still won’t be re-verified.)
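
To make the pattern concrete, here is a hypothetical sketch of my own showing how a parser might consult such a cache before doing the expensive verification; the class and method names here are invented for illustration, not XOM’s real API:

// Illustrative only: assumes URICache is visible from this class, and
// checkNamespaceURI / verifyURISyntax are invented names, not XOM's API.
final class NamespaceVerifierSketch {

    private final URICache cache = new URICache();

    void checkNamespaceURI(String uri) {
        if (cache.contains(uri)) {
            return;               // seen recently and already verified: skip the work
        }
        verifyURISyntax(uri);     // the expensive character-by-character URI check
        cache.put(uri);           // remember it, possibly overwriting an older entry
    }

    private void verifyURISyntax(String uri) {
        // placeholder for the real verification logic
    }
}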

This was counter-intuitive to me at first; it seems like the “Right Thing” would be to always make the size of your list dynamic, so that it can hold as many elements as necessary. In fact, at first glance, using a statically sized list almost looks like a rookie developer’s mistake. But in this case it was done on purpose, and for very good reason.

So again, he avoids doing work he doesn’t have to do, even though, at first glance, the extra work would look like an optimization. Without thinking about it, I would have made the array dynamic so that no URI would ever be verified twice; Elliotte thought more deeply about it and realized that making the code more complex in that way would not be an improvement. And he’s right: the cache is read far more often than it is written to, so he optimized for the majority of cases and accepted a small potential performance hit in the minority.
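
For comparison, here is a sketch of my own of the “dynamic” cache I would have reached for by default; this is just my guess at the alternative, not anything from XOM. It never re-verifies a URI, but every lookup pays for hashing and equals(), and the set can grow without bound, whereas the six-slot array stays tiny and its scan is just a handful of reference comparisons:

import java.util.HashSet;
import java.util.Set;

// A sketch of the "dynamic" alternative: remembers every verified URI,
// at the cost of hashing on each lookup and unbounded growth.
final class DynamicURICache {

    private final Set<String> verified = new HashSet<String>();

    synchronized boolean contains(String s) {
        return verified.contains(s);
    }

    synchronized void put(String s) {
        verified.add(s);
    }
}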
