Inside Google's Age of Augmented Humanity: Part 1, New Frontiers of Speech Recognition

1/3/11Follow @wroush

(Page 4 of 5)

very important,” he says. That’s mainly because of the user-interface problem: phones are small and it’s inconvenient to type on them.

“At the time, Google had barely any effort in mobile, maybe four people doing part-time stuff,” Cohen says. “In my interviews, I said, ‘I realize you can’t tell me what your next plans are, but if you are not going to be serious about mobile, don’t make me an offer, because I won’t be interested in staying.’ I felt at the time that mobile was going to be a really important area for Google.”

As it turned out, of course, Cohen wasn’t the only one who felt that way. Schmidt and Google co-founders Larry Page and Sergey Brin also believed mobile phones would become key platforms for browsing and other search-related activities, which helped lead to the company’s purchase of mobile operating system startup Android in 2005.

Cohen built a whole R&D group around speech technology. Its first product was goog-411, a voice-driven directory assistance service that debuted in 2007. Callers to 1-800-GOOG-411 could request business listings for all of the United States and Canada simply by speaking to Google’s computers. The main reason for building the service, Cohen says, was to make Google’s local search service available over the phone. But the company also logged all calls to goog-411, which made it “a source of valuable training data,” Cohen says: “Even though goog-411 was a subset of voice search, between the city names and the company names we covered a great deal of phonetic diversity.”

And there was a built-in validation mechanism: if Google’s algorithms correctly interpreted the caller’s prompt, the caller would go ahead and place an actual call. It’s in many such unobtrusive ways (as Schmidt pointed out in his Berlin speech) that Google recruits users themselves to help its algorithms learn.

Google shut down goog-411 in November 2010—but only because it had largely been supplanted by newer products from Cohen’s team such as Voice Search, Voice Input, and Voice Actions. Voice Search made its first appearance in November 2008 as part of the Google Mobile app for the Apple iPhone. (It’s now available on Android phones, BlackBerry devices, and Nokia S60 phones as well.) It allows mobile phone users to enter Google search queries by speaking them into the phone. It’s startlingly accurate, in part because it learns from users. “The initial models were based on goog-411 data and they performed very well,” Cohen says. “Over time, we’ve been able to train with more Voice Search data and get improvements.”

Google isn’t the only company building statistical speech-recognition models that learn from data; Cambridge, MA, startup Vlingo, for example, has built a data-driven virtual assistant for iPhone, Android, BlackBerry, Nokia, and Windows Phone platforms that uses voice recognition to help users with mobile search, text messaging, and other tasks.

But Google has a big advantage: it’s also a search company. Before Cohen joined Google, he says, “they hadn’t done voice search before—but they had done search before, in a big way.” That meant Cohen’s team could use the logs of traditional Web searches at Google.com to help … Next Page »

Wade Roush is a contributing editor at Xconomy. Follow @wroush

Single Page Currently on Page: 1 2 3 4 5 previous page

By posting a comment, you agree to our terms and conditions.