Lost In Translation: Ford Teams With Nuance Communications To Master Human Language

6/27/11

“Call John Smith.”

“I wanna call John Smith.”

At first glance, these sentences look pretty similar. But try telling that to the voice recognition technology behind Ford Motor’s SYNC system. You might as well be speaking Greek.

Voice recognition software has come a long way in recent years. Google’s Android platform, for example, allows users to search for information by speaking into their smart phones. But mastering the subtleties of human language remains beyond the reach of even our most sophisticated technology. (Remember how the IBM supercomputer Watson was kicking some serious butt on Jeopardy! until the final round, when it answered “Toronto” to a question about U.S. cities?)

Voice commands are a key component of Ford’s SYNC system; the company, based in Dearborn, MI, promotes SYNC as a safety feature because it lets drivers perform tasks without taking their hands off the wheel.

With that in mind, Ford recently said it will partner with the appropriately named Nuance Communications, based in Burlington, MA, to develop software that can recognize not only specific words and phrases but also the intent of the person speaking them.

Normally, SYNC relies on what’s called “structured commands.” The company basically records phrases that drivers must speak in order for the car to execute their wishes.

There are two problems with this technique. First, there are an awful lot of commands. Drivers today can order their cars to do everything from making phone calls and finding directions to playing music and adjusting the cabin temperature.

Second, Ford is only really guessing what people will say, which, more often than not, is not what they actually say. For instance, Ford initially programmed SYNC to recognize the command “Play Tracks.” Unless you work in the music business, you probably don’t even know what a track is. A better command would be “Play Songs.”

“You can’t stop someone from saying something,” says Brigitte Richardson, Ford’s lead engineer on its global voice control technology/speech systems.

In other words, Ford can’t force people to adjust their speech to use its commands. People are going to speak how they are going to speak.

Working with Nuance, Ford wants to develop software based on more advanced algorithms called “statistical language modeling” (SLM).

The concept, first developed in the 1980s, estimates the probability that people will group words, phrases, and sentences together in particular ways, according to their natural speech patterns. Nuance specifically wants to organize those words into “semantic classifications” of meanings. Based on that work, the company is developing an “inference engine” that can learn, understand, and interpret voice commands, says Ed Chrumka, Nuance’s senior product manager of connected car services.
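In its simplest textbook form, a statistical language model estimates how likely each word is to follow the one before it, based on counted examples of real speech. The toy bigram model below is a sketch of that idea only; the corpus and code are invented for illustration and have nothing to do with Nuance’s actual software.

```python
from collections import defaultdict

# Invented training corpus of transcribed driver utterances.
corpus = [
    "call john smith",
    "i wanna call john smith",
    "please call john smith",
    "play songs",
    "i wanna play songs",
]

# Count bigrams: how often word B follows word A (or starts a sentence).
bigram_counts = defaultdict(lambda: defaultdict(int))
prev_counts = defaultdict(int)
for sentence in corpus:
    words = ["<s>"] + sentence.split()
    for prev, word in zip(words, words[1:]):
        bigram_counts[prev][word] += 1
        prev_counts[prev] += 1

def bigram_prob(prev, word):
    """P(word | prev), estimated from counts; 0 for unseen pairs."""
    if prev_counts[prev] == 0:
        return 0.0
    return bigram_counts[prev][word] / prev_counts[prev]

def sentence_prob(sentence):
    """Probability of a whole utterance under the bigram model."""
    words = ["<s>"] + sentence.split()
    p = 1.0
    for prev, word in zip(words, words[1:]):
        p *= bigram_prob(prev, word)
    return p
```

Under this model, phrasings the system has heard before (“i wanna call john smith”) score well, while word sequences it has never seen score zero; real systems smooth those zeros rather than ruling the sentence out entirely.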

Thus, the car will better match the driver’s actual words to their actual intent. In theory, a car will learn a driver’s linguistic habits, so eventually there will be no difference between “Call John Smith” (the original command the car recognizes) and “I wanna call John Smith,” a phrase the driver is more likely to use.
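In spirit, the inference step maps many surface phrasings onto one command. The deliberately crude sketch below (invented here; not Ford’s or Nuance’s code) scores an utterance against a hypothetical set of intents by keyword overlap, so the extra words in “I wanna call John Smith” no longer get in the way:

```python
# Hypothetical intent definitions: each intent lists keywords that signal it.
INTENTS = {
    "call_contact": {"call", "dial", "phone"},
    "play_music": {"play", "songs", "music", "tracks"},
}

def infer_intent(utterance):
    """Return the intent whose keywords best overlap the spoken words,
    or None if nothing matches."""
    words = set(utterance.lower().split())
    best_intent, best_score = None, 0
    for intent, keywords in INTENTS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent
```

Both “Call John Smith” and “I wanna call John Smith” land on the same intent here; a production inference engine would use statistical classifiers trained on real utterances rather than a fixed keyword list.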

SLM is “a totally different way of doing things,” Richardson says. “It wants to incorporate more natural ways of talking.”

But SLM is easier said than done (pun intended). Human language is complex and broad and entails an almost infinite number of combinations of phrases, words, and sentences. For a computer, recognizing words is the easy part. Figuring out the speaker’s actual meaning is a whole different ball game.

“Perhaps the most frustrating aspect of statistical language modeling is the contrast between our intuition as speakers of natural language and the over-simplistic nature of our most successful models,” Ronald Rosenfeld, a computer science professor at Carnegie Mellon University, wrote in a research paper. “As native speakers, we feel strongly that language has a deep structure. Yet we are not sure how to articulate that structure, let alone encode it, in a probabilistic framework.”

Ford’s job is even tougher. While Google’s voice recognition technology can draw upon the vast resources of cloud-based Internet servers, Ford wants to program its speech software onto a single, self-contained chip installed in the car.

Relying on outside servers would force drivers to spend more money on data plans, Richardson says. Plus, it would probably take more time for the car to execute the driver’s command, she says.

However, “you are going to have to have adequate [computing] horsepower in the car to deliver the experience we are all looking for,” Chrumka of Nuance says.
