Monday, June 21, 2010

Voice Recognition

I read a nice post on Coding Horror today about voice recognition—and the fact that it’s still not here yet. (And may never come.) I’m not sure what I can add to his post, though, other than just saying “me too” over and over.

Atwood mentions the Google app you can get for your iPhone, and its voice recognition feature. My experience has been slightly different from his (I’ve found that it works remarkably well), but at the same time, I think of it more as a novelty than as a really useful thing. It’s fun to pull it out and speak into it, and have it automatically perform a Google search for me, especially if I want to show it off to others, but when I want to actually search for something… I pull up Safari, and type it into the Google search text box. (If I had my dream phone, I’d just do the search right from the phone’s “desktop” and not even pull up the browser until I found a result I needed!) I use Safari instead of the Google app with its voice recognition for a couple of reasons:

  1. It’s faster. Safari loads faster than the Google app on my iPhone, and if I find the result I want, it’s going to end up loading Safari anyway. So it’s much faster to just start with Safari in the first place, and cut out the middle man.
  2. It’s more reliable. When the voice recognition doesn’t work, you have to do the search over and over, vs. just typing it in and getting it right the first time.

Plus there’s the whole speaking at your phone thing. Aside from the coolness factor, is there actually any benefit to saying your Google search at your phone, instead of typing it in? Any benefit whatsoever?

I was also right there with Atwood when it comes to dictation. He mentions that someone had had the idea of having him and Joel Spolsky use voice recognition software to transcribe their podcast, and I was thinking of when I was looking into doing something similar for our church. We were going to start putting our sermons online, and I was thinking that having a textual version of the sermon would be very handy for things like Google searches, so I was playing around with Microsoft Word’s speech recognition. Which, again, is very good. But… not good enough, it turns out. In fact, in my tests, one word that it could just never get right was “verse”. Imagine trying to transcribe a sermon without using the word “verse”! (To get a feel for why this is important, go back through any of the sermons we’ve got online, and see how often the pastor is referring to this or that verse, as he refers to passage after passage.) It’s possible that the speech recognition might have done a good job, and I’d just have to go through and correct it, but I’m with Atwood on this one, too:
Maybe it’s just me, but the friction of the huge error rate inherent in the machine transcript seems far more intimidating than a blank slate human transcription. The humans may not be particularly efficient, but they all add value along the way—collective human judgment can editorially improve the transcript, by removing all the duplication, repetition, and “ums” of a literal, by-the-book transcription.
I actually approached it very optimistically, but in my testing quickly came away with the idea that it wouldn’t work out well in practice. (What we ended up doing is coupling the pastor’s sermon notes with the audio for the sermon. It’s not the best solution—pastors often end up straying from their notes, so the notes won’t always match up with the actual sermon—but I think it’s a good compromise.)

So even though I sometimes find the iPhone’s text input kind of annoying, I’ll still choose it over the Google voice recognition any day. And do—every day.
