Inside Google’s Age of Augmented Humanity: Part 2, Changing the Equation in Machine Translation
When science fiction fans think about language translation, they have two main reference points. One is the Universal Translator, software built into the communicators used by Star Trek crews for simultaneous, two-way translation of alien languages. The other is the Babel fish from Douglas Adams’ The Hitchhiker’s Guide to the Galaxy, which did the same thing from its home in the listener’s auditory canal.
When AltaVista named its Web-based text translation service after the Babel fish in 1997, it was a bit of a stretch: the tool’s translations were often hilariously bad. For a while, in fact, it seemed that the predictions of the Star Trek writers—that the Universal Translator would be invented sometime around the year 2150—might be accurate.
But the once-infant field of machine translation has grown up quite a bit in the last half-decade. It’s been nourished by the same three trends that I wrote about on Monday in the first part of this week’s series about Google’s vision of “augmented humanity.” One is the gradual displacement of rules-based approaches to processing speech and language by statistical, data-driven approaches, which have proved far more effective. Another is the creation of a distributed cloud-computing infrastructure capable of holding the statistical models in active memory and crunching the numbers on a massive scale. Third, and just as important, has been the profusion of real-world data for the models to learn from.
In machine translation, just as in speech recognition, Google has unique assets in all three of these areas—assets that are allowing it to build a product-development lead that may become more and more difficult for competitors to surmount. Already, the search giant offers a “Google Translate” app that lets an Android user speak to his phone in one language and hear speech-synthesized translations in a range of languages almost instantly. In on-stage previews, Google has been showing off a “conversation-mode” version of the app that does the same thing for two people. (Check out Google employees Hugo Barra and Kay Oberbeck carrying out a conversation in English and German in this section of a Google presentation in Berlin last September.)
While still experimental, the conversation app is eerily reminiscent of the fictional Universal Translator. Suddenly, the day seems much closer when anyone with an Internet-connected smartphone will be able to make their way through a foreign city without knowing a word of the local language.
In October, I met with Franz Josef Och, the head of Google’s machine translation research effort behind the Translate app, and learned quite a bit about how Google approaches translation. Och’s long-term vision is similar to that of Michael Cohen, who leads Google’s efforts in speech recognition. Cohen wants to eliminate the speech-text dichotomy as an impediment, so that it’s easier to communicate with and through our mobile devices; Och wants to take away the problem of language incomprehension. “The goal right from the beginning was to say, what can we do to break down the language barrier wherever it appears,” Och says.
This barrier is obviously higher for many Americans than it is for others, present company included—I’m functionally monolingual despite years of Russian, French, and Spanish classes. (“It’s always a shock to Americans,” Google CEO Eric Schmidt quipped during the Berlin presentation, but “people actually don’t all speak English.”) So a Babel fish in my ear—or in my phone, at any rate—would definitely count as a step toward the augmented existence Schmidt describes.
But in the big picture, Google’s machine translation work is really just a subset of its larger effort to make the world’s information “universally accessible and useful.” After all, quite a bit of this information is in languages other than those you or I may understand.
The Magic Is in the Data
Given the importance of language understanding in military affairs, from intelligence-gathering to communicating with local citizens in conflict zones, it isn’t surprising that Och, like Cohen, found his way to Google by way of the U.S. Defense Advanced Research Projects Agency (DARPA). The German native, who had done master’s work in statistical machine translation at the University of Nuremberg and PhD work at the University of Aachen, spent the early 2000s doing DARPA-funded research at USC’s Information Sciences Institute. His work there focused on automated systems for translating Arabic and Chinese records into English, and he entered the software in yearly machine translation “bake-offs” sponsored by DARPA. “I got very good results, and people at Google saw that and said ‘We should invite that guy,’” Och says.
Och was getting his results in part by setting aside the old notion that computers should translate expressions between languages based on rules. In rules-based translation, Och says, “What you write down is dictionaries. This word translates into that. Some words have multiple translations, and based on the context you might have to choose this one or that one. The overall structure might change: the morphology, the extensions, the cases. But you write down the rules for that too. The problem is that language is so enormously complex. It’s not like a computer language like C++ where you can always resolve the ambiguities.”
It was the heady success of British and American cryptographers and cryptanalysts at breaking Japanese and German codes during World War II, Och believes, that set the stage for the early optimism about rules-based translation. “If you look 60 years ago, people said ‘In five years, we’ll have solved that, like we solved cryptography.’” But coming up with rules to capture all the variations in the ways people express things turned out to be a far thornier problem than experts expected. “It didn’t take five years [to start to solve it], it took 60 years,” Och says. “And the way we are doing it is different. For us, it’s a computer science problem, not a linguistics problem.”
The pioneers in statistical machine translation in the 1990s, Och says, came from the field of speech recognition, where it was already clear that it would be easier to bootstrap machine-learning algorithms by feeding them lots of recordings of people actually speaking than to codify all the rules behind speech production.
The more such data researchers have, the faster their systems can learn. “Data changes the equation,” says Och. “The system figures out on its own what is correlated. Because we feed it billions of words, it learns billions of rules. The magic comes from these massive amounts of data.”
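To make the idea of a system that “figures out on its own what is correlated” a little more concrete, here is a minimal sketch in the spirit of IBM Model 1, a textbook word-alignment technique from the statistical machine translation literature. It is not a description of Google’s system: the three-sentence corpus and the function names train_model1 and best_translation are invented for illustration, and a production system would work with phrases, far more data, and many more refinements.

```python
from collections import defaultdict

# Toy parallel corpus (hypothetical data, invented for illustration).
CORPUS = [
    ("the house", "das haus"),
    ("the book", "das buch"),
    ("a book", "ein buch"),
]

def train_model1(pairs, iterations=10):
    """Estimate t(target_word | source_word) with IBM Model 1 style EM."""
    sentence_pairs = [(src.split(), tgt.split()) for src, tgt in pairs]
    target_vocab = {w for _, tgt in sentence_pairs for w in tgt}
    # Start from a uniform guess: every target word is equally likely.
    uniform = 1.0 / len(target_vocab)
    t = defaultdict(lambda: uniform)

    for _ in range(iterations):
        count = defaultdict(float)  # expected count of each (tgt, src) pairing
        total = defaultdict(float)  # expected count of each source word
        for src_words, tgt_words in sentence_pairs:
            for tw in tgt_words:
                # Share one unit of credit for tw among the source words
                # in the same sentence, in proportion to the current guess.
                norm = sum(t[(tw, sw)] for sw in src_words)
                for sw in src_words:
                    share = t[(tw, sw)] / norm
                    count[(tw, sw)] += share
                    total[sw] += share
        # Re-estimate the probabilities from the expected counts.
        for (tw, sw), c in count.items():
            t[(tw, sw)] = c / total[sw]
    return t

def best_translation(t, source_word):
    """Return the target word with the highest learned probability."""
    candidates = {tw: p for (tw, sw), p in t.items() if sw == source_word}
    return max(candidates, key=candidates.get)

table = train_model1(CORPUS)
for word in ("the", "house", "book", "a"):
    print(word, "->", best_translation(table, word))
```

No dictionary is ever written down here. The only inputs are sentence pairs, and the word-level pairings emerge because “das” keeps showing up alongside “the” while “haus” appears only with “house.” On this toy corpus the loop settles on das, haus, buch, and ein for the four English words.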
But back in 2004, when Google was taking a look at Och’s DARPA bake-off entry, the magic was still slow, limited by his team’s computation budget at USC. “We had a few machines, and the goal was to translate a given test sentence, and it would take a few days to translate just that sentence,” he says. Translating random text? Forget it. “Building a real system would have needed much bigger computational resources. We were CPU-constrained, RAM-constrained.”
But Google wasn’t. Access to the search company’s data centers, Och figured, would advance his project by a matter of several years overnight. Then there was Google’s ability to crawl the Web, collecting examples of already-translated texts—which are the key to bootstrapping any statistical machine translation system. But what clinched the deal when Google finally hired Och in early 2004, he says, was the opportunity to work on technology that would reach so many people: “The idea of being able to build a real system that millions of people might use to break down the language barrier, that was a very exciting thought.”
Och and the machine-translation team he started to build at Google began with his existing systems for translating Chinese, Arabic, and Russian into English, fanning out over the Web to find as many examples as possible of human-translated texts. The sheer scope of Google’s areas of interest was a help here—it turns out that many of the millions of books Google was busily scanning for its Book Search project have high-quality translations. But another Google invention called MapReduce was even more important. The more statistical associations that a translation system can remember, the faster and more accurately it can translate new text, Och says—so the best systems are those that can hold hundreds of gigabytes of data in memory all at once. MapReduce, the distributed-computing framework that Google engineers developed to parallelize the work of mapping, sorting, and retrieving Web links, turned out to be perfect for operating on translation data. “It was never built with machine translation in mind, but our training infrastructure uses MapReduce at many many places,” says Och. “It helps us, as part of building those gigantic models, to manage very large amounts of data.”
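For readers curious about why a framework built for Web links suits translation data, the sketch below imitates the map, shuffle, and reduce steps in a single Python process. It is an assumption-laden illustration, not Google’s MapReduce API or Och’s training code: the word pairs are made up, and the names map_phase, shuffle, reduce_phase, and run_job are hypothetical. The point is only that counting how often one word is translated as another breaks into independent pieces that many machines could work on at once.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical, already-aligned translation pairs; a real pipeline would
# extract these from very large collections of human-translated text.
ALIGNED_PAIRS = [
    ("house", "haus"), ("house", "haus"), ("house", "heim"),
    ("book", "buch"), ("book", "buch"),
]

def map_phase(record):
    """Map: emit a (key, 1) pair for one observed translation pair."""
    source, target = record
    yield (source, target), 1

def shuffle(mapped):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: collapse one key's values into a single total."""
    return key, sum(values)

def run_job(records):
    """Run map, shuffle, and reduce in sequence on one machine."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = run_job(ALIGNED_PAIRS)
print(counts)
```

In a distributed setting each call to map_phase and each call to reduce_phase could run on a different machine, because neither depends on any other record or key. That independence is what lets the same pattern scale from five pairs to billions.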
By October of 2007, Och’s team was able to replace the third-party, rules-based translation software Google had been licensing from Systran—the same technology behind AltaVista’s Babel Fish (which now lives at babelfish.yahoo.com)—with its own, wholly statistical models. Today the Web-based Google Translate system works for 58 languages, including at least one—Latin—that isn’t even spoken anymore. “There are not so many Romans out there any more, but there are a lot of Latin books out there,” Och explains. “If you want to organize all the world’s information and make it universally accessible and useful, then the Latin books are certainly part of that.”
And new languages are coming online every month. Human-translated documents are still the starting point—“For languages where those don’t exist, we cannot build systems,” Och says—but over time, the algorithms have gotten smarter, meaning they can produce serviceable translations using less training data. “We have Yiddish, Icelandic, Haitian Creole—a bunch of very small languages for which it’s hard to find those kinds of documents,” Och says.
Today Google Translate turns up in a surprising number of places across the Google universe, starting with the plain old search box. You can translate a phrase at Google.com just by typing a query like “translate The Moon is made of cheese to French” (the translation offered sounds credible to me: La Lune est faite de fromage). For those times when you know the best search results are likely to be in some other language, there’s the “Translated Search” tool. And when results come up that aren’t in your native language, Google can translate entire Web pages on the fly. (If you’re using Google’s Chrome browser or the Google Toolbar for Internet Explorer or Firefox, page translation happens automatically.) Google can translate your Gmail messages, your Google Talk chats, and your Google Docs. It can even render the captions on foreign-language YouTube videos into your native tongue. And, of course, there are mobile-friendly versions of the Google translation tools, including the Android app.
In September, Hugo Barra, Google’s director of mobile products, hinted that the conversation-mode version of the Google Translate Android app would be available “in a few months,” meaning it could arrive any day now. But whenever it appears, Och and his team aren’t likely to stop there. In fact, they’re already thinking about how to get around some of the barriers that remain between speakers of different languages—even if they have an Android smartphone to mediate between them.
“If I were speaking German now, ideally we should just be able to communicate, but there are some user interface questions and issues,” Och points out. There’s the delay, for one thing—while the Google cloud sends back translations quickly, carrying on a conversation is still an awkward process of speaking, pushing a button, waiting while the other person listens and responds, and so on. Then there’s the tinny computer voice that issues the translation. “It would be my voice going in, but isn’t my voice coming out,” Och says. “So no one knows how [true simultaneous machine translation] would work, until we get to the Babel fish or the Universal Translator from Star Trek—where it just works, and language is never an issue when they go to a different planet.”
Coming in Part 3: A talk with Hartmut Neven, leader of the team that created Google Goggles, the company’s eyebrow-raising computer vision tool.