Wednesday, September 13, 2006


I’ve been thinking a lot about wikis lately. (I love the irony—or appropriateness, depending on your point of view—of linking to Wikipedia for the definition of “wiki”.)

If you’re not familiar with the concept, a wiki is essentially an “open source encyclopedia”. Articles can be written, and edited, by anyone. Some would argue that “encyclopedia” is too specific a term, but I’m using it in a loose sense; a wiki can be “about” anything. I’ve been thinking about wikis because they can be very useful within an organization / corporation / whatever for internal documentation; instead of setting up an intranet site, or a SharePoint site with lots of Word and PowerPoint docs, or a “knowledge base” or source control system with lots of documents, the organization can set up a wiki. In time, as people edit articles and create new ones, the knowledge which is captured becomes more and more complete.

And, of course, you just can’t have a discussion about wikis these days without mentioning Wikipedia. Many of my posts to this blog contain links to the English version of Wikipedia, for definitions or discussions of various topics.

I’m fascinated by Wikipedia, and, at the same time, have all of the same reservations that everyone else has about a “collaborative encyclopedia”. Namely: Who’s to say that the information I read on Wikipedia is accurate? Or, put another way: Who’s to say the person who wrote an article in Wikipedia really knows what s/he is talking about? I was reading an article about wikis, and it provided a link to an article by Robert McHenry, a former editor in chief of the Encyclopædia Britannica, called The Faith-Based Encyclopedia, which made this point very clearly. (Wikipedia lovers will immediately point out the bias of an article written by a former editor of a “traditional” encyclopedia, criticizing a wiki-based encyclopedia.)

Here is a [rather long] quote, discussing the Wikipedia article for Alexander Hamilton:

I know as well as anyone and better than most what is involved in assessing an encyclopedia. I know, to begin with, that it can’t be done in any thoroughgoing way. The job is just too big. Professional reviewers content themselves with some statistics—so many articles, so many of those newly added, so many index entries, so many pictures, and so forth—and a quick look at a short list of representative topics. Journalists are less stringent. To see what Wikipedia is like I chose a single article, the biography of Alexander Hamilton. I chose that topic because I happen to know that there is a problem with his birth date, and how a reference work deals with that problem tells me something about its standards. The problem is this: While the day and month of Hamilton’s birth are known, there is some uncertainty as to the year, whether it be 1755 or 1757. Hamilton himself used, and most contemporary biographers prefer, the latter year; a reference work ought at least to note the issue.

The Wikipedia article on Hamilton (as of November 4, 2004) uses the 1755 date without comment. Unfortunately, a couple of references within the body of the article that mention his age in certain years are clearly derived from a source that used the 1757 date, creating an internal inconsistency that the reader has no means to resolve. Two different years are cited for the end of his service as secretary of the Treasury; without resorting to another reference work, you can guess that at least one of them is wrong. The article is rife with typographic errors, styling errors, and errors of grammar and diction. No doubt there are other factual errors as well, but I hardly needed to fact-check the piece to form my opinion. The writing is often awkward, and many sentences that are apparently meant to summarize some aspect of Hamilton’s life or work betray the writer’s lack of understanding of the subject matter. A representative one runs thus:

“Arguably, he set the path for American economic and military greatness, though the benefits might be argued.”

All these arguments aside, the article is what might be expected of a high school student, and at that it would be a C paper at best. Yet this article has been “edited” over 150 times. Some of those edits consisted of vandalism, and others were cleanups afterward. But how many Wikipedian editors have read that article and not noticed what I saw on a cursory scan? How long does it take for an article to evolve into a “polished, presentable masterpiece,” or even just into a usable workaday encyclopedia article?

The history page for this article reveals a most interesting story. Originally, the 1757 birth date was used. Thus the internal inconsistencies of ages and dates that I saw are artifacts of editing. Originally, the two citations of the year Hamilton resigned from the Cabinet agreed; editing has changed one but not the other. In fact, the earlier versions of the article are better written overall, with fewer murky passages and sophomoric summaries. Contrary to the faith, the article has, in fact, been edited into mediocrity.

In other words, if you were writing a paper on Alexander Hamilton, and went to Wikipedia to find out when he was born, the information wouldn’t be accurate. That is, as of November 4th, 2004 it wasn’t; as of this writing, September 13th, 2006, there is indeed a mention at the top of the article that Hamilton may have been born in 1755 or 1757. And so the Wikipedia supporters would say that the collaborative essence of Wikipedia has won the day.

If, that is, the article stays that way. But what if some know-nothing decides that he knows better than the “experts”, and figures that it’s just simpler to leave the date as 1757? I found another article, called Why Wikipedia Must Jettison Its Anti-Elitism—by Larry Sanger, one of Wikipedia’s founders—which makes a good case that the culture of Wikipedia might make this likely.

It’s one thing to develop an open-source encyclopedia, where anyone can contribute. But it’s quite another to develop a culture which has no place for “expertise”. Mr. Sanger’s point, if I may sum up, is that expertise should be prized, and that, while the collaborative nature of Wikipedia should be encouraged and made to flourish, there should also be a subset of articles which are “vetted”, meaning that people who actually know what they’re talking about have put their seal of approval on them. As he says in his article:

…as a community, Wikipedia lacks the habit or tradition of respect for expertise. As a community, far from being elitist (which would, in this context, mean excluding the unwashed masses), it is anti-elitist (which, in this context, means that expertise is not accorded any special respect, and snubs and disrespect of expertise is tolerated).

And a bit later on:

Consequently, nearly everyone with much expertise but little patience will avoid editing Wikipedia, because they will—at least if they are editing articles on articles that are subject to any sort of controversy—be forced to defend their edits on article discussion pages against attacks by nonexperts. This is not perhaps so bad in itself. But if the expert should have the gall to complain to the community about the problem, he or she will be shouted down (at worst) or politely asked to “work with” persons who have proven themselves to be unreasonable (at best).

And finally:

I know, of course, that Wikipedia works because it is radically open. I recognized that as soon as anyone; indeed, it was part of the original plan. But I firmly disagree with the notion that that Wikipedia-fertilizing openness requires disrespect toward expertise. The project can both prize and praise its most knowledgeable contributors, and permit contribution by persons with no credentials whatsoever. That, in fact, was my original conception of the project. It is sad that the project did not go in that direction.

This discussion rings very true to me, on a smaller scale, because of my work on Beginning XML. As a technical reference book, there is always, of course, a set of “technical editors” who work on any edition of the book. These are people who are recognized experts in the field, and the purpose of having them is to make sure that my writing—and that of the other authors—is vetted. If I get one of my facts wrong, one of the technical editors should catch it, and it will be fixed before the book goes to print. And let me be clear: the vast majority of the technical editors who have worked on Beginning XML have been good, knowledgeable, and helpful, and the book is much better for their contribution.

That being said, there have been one or two—mostly on the first edition—who were not helpful, and not especially knowledgeable. And those are the people who are frustrating to work with; when you get a guy who keeps insisting that one of your facts is wrong, even though it’s well documented—and the other experts agree that it’s correct—it can be very difficult to work with him. Luckily, a book is not a democracy, and you can ensure that the correct facts get printed (usually), and the incorrect opinions do not.

But in Wikipedia, that’s not the case. If this person had been reading the Wikipedia article on XML, instead of doing technical editing for the book, he could have simply edited the article, and put in whatever he wanted. (Heck, for all I know, maybe he did.) The Wikipedia supporters would tell you that, in theory, people would eventually un-do his edits, or further edit the edits, until the article ended up “correct”. Of course, as the Alexander Hamilton example shows, that might not happen.

So far, I’ve been talking mostly about facts vs. ignorance. But there’s another issue that muddies the waters even further: Controversy. What happens when a topic is controversial, and an “edit war” springs up? The wiki article mentioned this topic, and used the Wikipedia article on George Bush as an example:

It is easy to understand why the George W. Bush page might be a battleground. There are many, many people who love George W. Bush, and there are many, many people who despise him. Those who love him naturally want to emphasize things about George Bush that match their view of the man. In the same way, so do those who despise him. Thus, you can get dozens of people editing and re-editing the article to express their point of view.

The interesting thing about an edit war like this is that, with a controversial topic, it is completely natural and to be expected. Both sides have their unique point of view, and those views are incompatible. However, the outcome of the conflict is interesting, and you can see it if you read the George W. Bush page.

Both parties have to reach consensus on the page, and that eventually causes the page to achieve a neutrality and objectivity that satisfies both parties. Controversial topics, like Bush’s National Guard service, move to separate pages so they can be dealt with separately. In general (and excluding cases of lame edit wars), the process actually works.

But does it? Let’s take a look at some samples, from the aforementioned article on Wikipedia:

Bush won the 2000 presidential election as the Republican candidate in a close and controversial contest. Although he did not secure a majority of the popular vote, he did win the required number of electoral votes after a very close battle in the state of Florida.

Wait a minute… no, he didn’t win the required number of electoral votes. Unfortunately, although this might be a “fact” (and well documented), it’s much too controversial to put into an encyclopedia article, because there are enough conservatives arguing that yes, he did win. So instead, the article has come to this wording through consensus: the people with different “views” on this had to eventually agree on this wording.

Moving on:

During his first term, Bush sought and obtained Congressional approval for two additional tax cuts: the Job Creation and Worker Assistance Act of 2002 and the Jobs and Growth Tax Relief Reconciliation Act of 2003. These acts increased the child tax credit and eliminated the so-called “marriage penalty.” Arguably, cuts were distributed disproportionately to higher income taxpayers through a decrease in marginal rates, but the change in marginal rates was greater for those of lower income, resulting in an income tax structure that was more progressive overall. Complexity was increased with new categories of income taxed at different rates and new deductions and credits, however; at the same time, the number of individuals subject to the alternative minimum tax increased since it had remained unchanged.

Whoa. So the article is arguing that Bush’s tax cuts were “progressive”, and not just beneficial to the rich? Unfortunately, this is just plain wrong—the tax cuts definitely helped out the rich, to the detriment of the less-well-off. However, this is considered controversial, because there are enough conservative voices claiming that it’s not the case. So the best the article does, in coming to consensus, is include the word “arguably”.

In essence, because the articles are created and edited by groups, they don’t settle on truth, so much as they settle on consensus. Or, to put it another way, articles in Wikipedia aren’t “correct”, so much as they’re “agreed upon by the ‘community’”.

These are two examples; I didn’t read the entire article, looking for more. But in reading as much as I did, I did notice a less perceptible slant to the article: although it was usually worded very carefully, to be factually correct, it was also written in such a way that it cast the president in a favourable light. So, for example, when talking about the Iraq war, the article says “the U.S. promoted urgent action in Iraq, stating that Iraqi President Saddam Hussein possessed weapons of mass destruction”—which is true, if by “the U.S.” you mean “Bush”—and “Bush argued Saddam… had tried to acquire nuclear material, had not properly accounted for Iraqi biological weapons and chemical weapons material in violation of U.N. sanctions, and that some Iraqi missiles had a range greater than allowed by the UN sanctions”—which is also true, Bush did argue this, but the article doesn’t mention that this was all complete hogwash.

The lesson to be learned is that you always have to take Wikipedia articles with a grain of salt. Which should be obvious to anyone who’s been on the internet for any length of time; however, because of the authoritative nature of an encyclopedia, I think people tend to be more trusting of Wikipedia than they otherwise might. I know I sometimes am; if I don’t know anything about the Milgram experiment, and look it up on Wikipedia, I just figure “well, I don’t know anything about it, so the person who wrote the article must know more than me!”

So how does all of this affect my thoughts on using wikis for internal documentation? It doesn’t, actually. Having a wiki for your organization, where documentation can be stored and/or collaboratively written, is a much different thing than an online encyclopedia, for the entire internet to see. So, although I find all of this fascinating, I’m still planning to try and set something up, and see how easy it is to get an internal wiki up and running.