Inside Google's Age of Augmented Humanity: Part 1, New Frontiers of Speech Recognition
Editor’s Note: This is Part 1 of a three-part story that we originally published on January 3, 5, and 6, 2011. We’re highlighting it today because the series was just named by Longform.org as one of its top technology stories of 2011.
Already, it’s hard for anyone with a computer to get through a day without encountering Google, whether that means doing a traditional Web search, visiting your Gmail inbox, calling up a Google map, or just noticing an ad served up by Google AdSense. And as time goes on, it’s going to get a lot harder.
That’s in part because the Mountain View, CA-based search and advertising giant has spent years building and acquiring technologies that extend its understanding beyond Web pages to other genres of information. I’m not just talking about the obvious, high-profile Google product areas such as browsers and operating systems (Chrome, Android), video (YouTube and the nascent Google TV), books (Google Book Search, Google eBooks), maps (Google Maps and Google Earth), images (Google Images, Picasa, Picnik), and cloud utilities (Google Docs). One layer below all of that, Google has also been pouring resources into fundamental technologies that make meaning more machine-tractable—including software that recognizes human speech, translates written text from one language to another, and identifies objects in images. Taken together, these new capabilities promise to make all of Google’s other products more powerful.
The other reason Google will become harder to avoid is that many of the company’s newest capabilities are now being introduced and perfected first on mobile devices rather than the desktop Web. Already, our mobile gadgets are usually closest at hand when we need to find something out. And their ubiquity will only increase: it’s believed that 2011 will be the year when sales of smartphones and tablet devices finally surpass sales of PCs, with many of those new devices running Android.
That means you’ll be able to tap Google’s services in many more situations, from the streets of a foreign city, where Google might keep you oriented and feed you a stream of factoids about the surrounding landmarks, to the restaurant you pick for lunch, where your phone might translate your menu (or even your waiter’s remarks) into English.
Google CEO Eric Schmidt says the company has adopted a “mobile first” strategy. And indeed, many Googlers seem to think of mobile devices and the cameras, microphones, touchscreens, and sensors they carry as extensions of our own awareness. “We like to say a phone has eyes, ears, skin, and a sense of location,” says Katie Watson, head of Google’s communications team for mobile technologies. “It’s always with you in your pocket or purse. It’s next to you when you’re sleeping. We really want to leverage that.”
This is no small vision, no tactical marketing ploy—it’s becoming a key part of Google’s picture of the future. In a speech last September at the IFA consumer electronics fair in Berlin, Schmidt talked about “the age of augmented humanity,” a time when computers remember things for us, when they save us from getting lost, lonely, or bored, and when “you really do have all the world’s information at your fingertips in any language”—finally fulfilling Bill Gates’ famous 1990 forecast. This future, Schmidt says, will soon be accessible to everyone who can afford a smartphone—one billion people now, and as many as four billion by 2020, in his view.
It’s not that phones themselves are all that powerful, at least compared to laptop or desktop machines. But more and more of them are backed up by broadband networks that, in turn, connect to massively distributed computing clouds (some of which, of course, are operated by Google). “It’s like having a supercomputer in your pocket,” Schmidt said in Berlin. “When we do …