Sunday, July 20, 2008

Beautiful Code: False Optimization Redux

This post is part of the Beautiful Code series.

Here we’re going to look at the concept that you should know your usage patterns, before you try to optimize for them. (This is pretty much just an addendum to the False Optimizations post.)

Andrew Kuchling wrote a chapter about Python’s dictionary implementation. Why is Python’s dictionary implementation beautiful? One reason is the way that it determines its size: the dictionary always keeps a number of spots open, for new insertions, and any time the dictionary gets too full (that is, too few empty spots), it quadruples in size, to make more room. And what happens when you remove an item from the dictionary? Well… nothing. A Python dictionary expands when necessary, but never bothers to contract.

Any why is this? Because of the way that Python dictionaries are used:

This means that dictionaries are never resized on deletion. If you build a large dictionary and then delete many keys from it, the dictionary’s hash table may be larger than if you’d constructed the smaller dictionary directly. This usage pattern is quite infrequent, though. Keys are almost never deleted from the many small dictionaries used for objects and for passing function arguments. Many Python programs will build a dictionary, work with it for a while, and then discard the whole dictionary. Therefore, very few Python programs will encounter high memory usage because of the no-resize-on-deletion policy.

Honestly, I would have put this in the False Optimizations post, except that I’d already posted it by the time I started writing about this chapter. But the concept is the same: It seems that the right thing to do is to have the dictionary expand and contract as necessary. But since most Python programs don’t spend a lot of time removing items from dictionaries, it’s not worth the effort to build that functionality into the Python dictionary implementation. For a tradeoff of taking up a bit more memory, the Python dictionary implementation means that deleting from a dictionary is quicker (since no shrinking of the dictionary needs to happen), and that the Python dictionary itself is easier to maintain for the people who build Python itself, because less code means easier to maintain code.

2 comments: