Sunday, July 20, 2008

Beautiful Code: False Optimization Redux

This post is part of the Beautiful Code series.

Here we’re going to look at the concept that you should know your usage patterns, before you try to optimize for them. (This is pretty much just an addendum to the False Optimizations post.)

Andrew Kuchling wrote a chapter about Python’s dictionary implementation. Why is Python’s dictionary implementation beautiful? One reason is the way that it determines its size: the dictionary always keeps a number of spots open, for new insertions, and any time the dictionary gets too full (that is, too few empty spots), it quadruples in size, to make more room. And what happens when you remove an item from the dictionary? Well… nothing. A Python dictionary expands when necessary, but never bothers to contract.

Any why is this? Because of the way that Python dictionaries are used:

This means that dictionaries are never resized on deletion. If you build a large dictionary and then delete many keys from it, the dictionary’s hash table may be larger than if you’d constructed the smaller dictionary directly. This usage pattern is quite infrequent, though. Keys are almost never deleted from the many small dictionaries used for objects and for passing function arguments. Many Python programs will build a dictionary, work with it for a while, and then discard the whole dictionary. Therefore, very few Python programs will encounter high memory usage because of the no-resize-on-deletion policy.

Honestly, I would have put this in the False Optimizations post, except that I’d already posted it by the time I started writing about this chapter. But the concept is the same: It seems that the right thing to do is to have the dictionary expand and contract as necessary. But since most Python programs don’t spend a lot of time removing items from dictionaries, it’s not worth the effort to build that functionality into the Python dictionary implementation. For a tradeoff of taking up a bit more memory, the Python dictionary implementation means that deleting from a dictionary is quicker (since no shrinking of the dictionary needs to happen), and that the Python dictionary itself is easier to maintain for the people who build Python itself, because less code means easier to maintain code.


Anonymous said...

"growth without limits is the philosophy of a cancer cell" -- Ed Abby, I think.

I'm not familiar enough with Python to have an informed opinion, BUT... Hopefully weekly reindexing and size limitation relative to volume size is implemented by default on any data store that cannot elegantly manage its own size dynamically.

Otherwise, it's just one developer placing his/her needs on a pedestal and ignoring the possible needs of other users. Like, I don't need to throw out the empty juice cans in the refrigerator until I buy some new juice, because I simply ignore the possibility that my my housemates might have a temporary need for the same space.


Anonymous said...

Seems like a horrible book. It promotes lousy programming practices.

Hash tables are expensive to shrink so no wonder Python chose to never do it. But is that beautiful? No, it's lazy.

A beautiful solution would be to chose the right data structure!
Unfortunately that can't be done with no knowledge how it will be used, which only the programmer knows.

The morale... umm... general purpose blanket types like basically everything in Python or the like (i.e. Ruby...) are never optimal.
Scripting languages make coding a lot easier but they should let and even encourage the programmer to chose the best datatype themselves.

Is there such a language?