Expect Labs Anticipates a Day when the Computer Is Always Listening

8/20/13

Today’s computers are like distracted middle-school students: you practically have to scream at them to get their attention. To tell a search engine you want a news article, you have to type a few words into the search box on your phone or laptop. To tell Siri or Google Now that you need a map or a phone number, you have to hold down a button first.

But they’re just machines—why should we have to get their attention? Shouldn’t it be the other way around? Why aren’t our devices listening to us all the time, ready to respond the moment we need them?

Well, in fact, they’re starting to do this. Google’s new Moto X phone, which goes on sale this week, contains a low-power chip whose only job is to listen for the phrase “Okay Google Now,” which alerts the device to turn your next words into a command or a search query.

But that’s a small step—in a way, it just substitutes a trigger phrase for a button-push. Soon, your phone or laptop may be able to go much farther: tracking everything you say; searching for related personal data or Web resources; and showing the results to you proactively, just in case you’re interested.

At least, that’s the vision at Expect Labs. The San Francisco startup, which is backed by an array of high-profile investors like Samsung, Google Ventures, Telefonica, Intel, Liberty Global, IDG, and Greylock, is pushing a concept it calls “anticipatory computing”—and sooner or later, it’s likely to become part of everyone’s computing experience.

“In just a few years, the search engine on your phone is not going to be waiting around to be asked questions,” says founder and CEO Tim Tuttle. “You want it to pay attention continuously when something is happening in your life, so that it can anticipate your question before you pull your phone out of your pocket.”

Set aside, if you can, the fact that the National Security Agency may also want to pay attention continuously—and that a world where computers can anticipate our every need would, in effect, be a world of total electronic surveillance. That’s a privacy tradeoff that each individual consumer will have to consider carefully, in light of this summer’s revelations about the startling scope of the federal government’s eavesdropping programs.

Expect Labs' MindMeld app shows results relating to a conversation about restaurants in San Francisco.

The thing you really need to understand about Tuttle and his crew at Expect Labs, who aim to release a showcase mobile app called MindMeld this fall, is that they don’t care so much about whether their software can answer questions or respond to commands, the way “virtual personal assistants” like Siri or Google Now can. Those tasks come down to speech recognition, semantics, and grammar. Most of the computing cycles involved go toward figuring out what the user meant and responding appropriately, not developing a bigger picture of the user’s context.

Anticipatory computing, in Tuttle’s world, is completely different. He says it’s about using signals and data from your devices to construct “a model that represents what is happening in your life,” then taking a more-is-better approach, “proactively searching across all the data sources you care about.” It’s about statistics, relationships, and educated guesses.

“If you have the ability to listen all the time, it dramatically improves usability, because then you can talk to your computer the same way you talk to a person, where you assume they are up to speed on what they’ve heard,” Tuttle says. “That’s what we are building toward.”

The MindMeld app for the iPad and Android tablets will demonstrate the whole idea in the context of teleconferencing. The app works like Skype or any other Voice-over-IP app, except that it’s constantly listening to your side of the conversation in the background and showing a series of appointments, contacts, Web clips, news articles, and other resources related to whatever you’re talking about. Say you’re calling friends to invite them over for dinner, and the conversation veers toward food choices. MindMeld might hear you mention Italian food and show a recipe for fettuccine Alfredo.

But as cool as that sounds, Expect doesn’t expect to stay in the app business. MindMeld is designed mainly just to demonstrate the capabilities of the company’s underlying “Anticipatory Computing Engine.” The real show will get started later this year when Expect Labs gives outside programmers access to the APIs, or application programming interfaces, that will let them use the engine to power their own apps. In other words, Expect Labs wants to provide the smarts that make other companies’ software anticipatory, whether that software is being used by smartphone owners, call-center employees, or drivers in connected cars.

Expect Labs was formed in 2011 by a team of researchers from MIT, Carnegie Mellon, and Stanford, “most of whom have PhDs in statistical search, natural language understanding, and speech recognition,” according to Tuttle. After getting his own computer-science PhD at MIT, Tuttle came west to found Bang Networks, a content distribution network for real-time data, and then Truveo, a video search engine acquired by AOL in 2006.

The insight that grabbed Tuttle, within a couple of years after he left AOL in 2008, was that “search is becoming conversational, real-time, driven by speech and language as a key input.” Smartphones and tablets were the main drivers of this change. “These devices are with us all the time and have access to all sorts of sensor data, including live audio and video,” Tuttle says he realized. “Those could become the inputs to let an intelligent discovery engine find you what you need.”

Tuttle assembled a group of computer geniuses (including former Nexidia researcher Marsal Gavaldà, machine learning expert Simon Handley, and DNAnexus and scalable computing veteran Pete Kocks) and set to work building that engine. In its current form, the Expect platform has three main functions. First, it analyzes the signals coming in from a user’s device: audio, video, GPS, and more. Second, it uses these signals to create and continuously update a statistical model of the user’s situation and likely interests. (In Tuttle’s words, this model is “essentially a collection of all the important concepts, entities, or words that have been said during a conversation; the relationships between them that we can draw from our knowledge graph; and numerical weights that represent our confidence in these relationships.”) Third, the platform acts on the model to proactively search for relevant material across the Web, the user’s social graph, and personal documents.
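Expect Labs has not published how its engine is built, so the following is only a toy sketch of the three steps just described: take in concepts from an utterance, fold them into a weighted model, and turn the strongest concepts into a search query. Every name here (`ContextModel`, `decay`, `build_query`) is hypothetical, and simple co-occurrence counts stand in for the knowledge graph Tuttle mentions.

```python
from collections import defaultdict
from itertools import combinations


class ContextModel:
    """Toy running model of a conversation: weighted concepts plus links.

    Illustrative only; this is not Expect Labs' actual design.
    """

    def __init__(self, decay=0.5):
        self.decay = decay                 # hypothetical: how fast old topics fade
        self.weights = defaultdict(float)  # concept -> confidence-like weight
        self.links = defaultdict(float)    # (concept, concept) -> relationship weight

    def update(self, concepts):
        """Fold the concepts from one utterance into the model."""
        # Fade everything heard earlier so recent topics dominate.
        for key in self.weights:
            self.weights[key] *= self.decay
        for key in self.links:
            self.links[key] *= self.decay
        unique = sorted(set(concepts))
        for concept in unique:
            self.weights[concept] += 1.0
        # Assumption: concepts mentioned together are treated as related.
        for pair in combinations(unique, 2):
            self.links[pair] += 1.0

    def top_concepts(self, k=3):
        """Return the k highest-weighted concepts, strongest first."""
        ranked = sorted(self.weights.items(), key=lambda item: (-item[1], item[0]))
        return [concept for concept, _ in ranked[:k]]


def build_query(model, k=3):
    """Third step: turn the current model into a proactive search query."""
    return " ".join(model.top_concepts(k))


model = ContextModel(decay=0.5)
model.update(["dinner", "friends"])
model.update(["italian", "dinner"])
print(build_query(model, k=2))  # prints "dinner italian"
```

In this sketch "dinner" outranks "italian" because it was mentioned in both utterances, while "friends" fades as the conversation moves on. A real system would presumably extract the concepts from audio itself and use far richer relationship data.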

In MindMeld, you can get together as many as seven friends for an audio conversation, with show-and-tell items automatically provided by the Anticipatory Computing Engine. When Tuttle demonstrated the app for me at Expect’s office in downtown San Francisco, he connected over MindMeld with a colleague, then launched into a rambling monologue about Apple, iOS 7, Tim Cook, a mobile conference he had just attended, the NBA Finals, cooking, and the relative merits of lasagna, clam chowder, and strawberry shortcake. As he spoke, topic headings appeared in a scrolling timeline on the left side of the iPad screen, and links to related material such as news articles and recipes popped up on cards on the right. To show the materials to his colleague, all Tuttle had to do was drag and drop the cards to the sharing area.

“We are trying to extract what we believe are the high-level topics or points based on the language,” Tuttle explained afterward. “The timeline serves as a set of conversational bookmarks, or an annotated thread of what your conversation was about. It’s also a navigational aid, if there are certain things from the past conversation you want to drill down on.”

The screen-share area of the Expect Labs MindMeld app.

MindMeld—which was still in early “alpha” release with a few customers when I visited Expect Labs in June—only cares about topics identified in the last five to 10 minutes. But it would be easy to make the engine go back and prepare a summary of the last hour’s conversation, Tuttle says. “The models are very tunable depending on the use case,” he says. The startup is experimenting with a smartphone version of MindMeld that doesn’t get in the way during a phone call, but presents the user with three salient pieces of information after a call is finished—including, for example, social-networking profiles for the other parties on the call.
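One simple way to picture a tunable recency window like the one Tuttle describes is a queue of timestamped topics that drops anything older than a configurable cutoff. This is an assumption about the general idea, not a description of MindMeld's code; the class name `RecentTopics` and the `window_seconds` parameter are invented for illustration.

```python
from collections import deque


class RecentTopics:
    """Keep only the topics heard within the last `window_seconds` (hypothetical)."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds  # 600 s = 10 minutes
        self._events = deque()                # (timestamp, topic), oldest first

    def add(self, timestamp, topic):
        """Record a topic heard at `timestamp` (seconds since the call began)."""
        self._events.append((timestamp, topic))

    def active(self, now):
        """Drop expired topics, then return the rest in the order first heard."""
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        seen = []
        for _, topic in self._events:
            if topic not in seen:
                seen.append(topic)
        return seen


topics = RecentTopics(window_seconds=600)
topics.add(0, "apple")
topics.add(500, "lasagna")
topics.add(700, "nba")
print(topics.active(now=900))  # prints ['lasagna', 'nba']
```

Passing `window_seconds=3600` instead would keep all three topics at the 900-second mark, which is the kind of per-use-case tuning the quote suggests.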

It’s not hard to imagine how such tools might be useful in business contexts such as sales or customer support—and those are exactly the sorts of application areas Tuttle hopes third-party developers will explore once the Anticipatory Computing Engine’s APIs are released.

Tuttle says Expect Labs has 10 patents covering its continuous, context-driven search engine technology. But none of them have much to do with old-school natural language processing of the sort that Siri, and its parent, the defense-funded CALO project at SRI International, are built around.

“I think we have a philosophical difference with the direction they took,” Tuttle says. “At the CALO project they believed that solving the problem of how you understand conversation can be done by understanding the construction of the English language, the grammar and syntax, and ultimately the meaning of the words. What we have believed from the beginning, and what the industry is starting to come around to understand, is that this approach only gets you so far. You need to be able to complement natural language understanding with large-scale statistical search and information retrieval—what Google has pioneered. If you know the content of every document on the Web, or in your personal document collection, that ends up providing stronger signals than a basic understanding of English.”

As an example, Tuttle cites the sentence “Did you hear about those tornadoes in Oklahoma?” A program parsing that sentence using a traditional natural language processing engine might eventually figure out that Oklahoma is a place and that a tornado is a type of weather pattern. “But it will have no idea that this is an important question because there was massive destruction from tornadoes this month,” Tuttle says. In other words, a search engine doesn’t need a semantic understanding of the word “tornado” to be able to match it with thousands of Web articles containing the same word. “That second signal, you only get from the search side of the situation.”
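The search-side signal Tuttle describes can be illustrated with plain word overlap: no grammar or semantics, just counting which documents share terms with what was said. The sketch below is a deliberately crude stand-in for large-scale information retrieval. The headlines, the stop-word list, and the function names are all made up for the example.

```python
import re

# Hypothetical stop-word list; real engines use much larger ones.
STOP_WORDS = {"did", "you", "hear", "about", "those", "in", "the", "a", "for"}


def tokenize(text):
    """Lowercase the text and return its set of non-stop-word terms."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOP_WORDS


def rank_by_overlap(utterance, documents):
    """Rank documents by how many terms they share with the utterance."""
    query = tokenize(utterance)
    scored = []
    for doc in documents:
        overlap = len(query & tokenize(doc))
        if overlap:
            scored.append((overlap, doc))
    # Highest overlap first; the sort is stable, so ties keep input order.
    scored.sort(key=lambda pair: -pair[0])
    return [doc for _, doc in scored]


# Invented headlines, for illustration only.
headlines = [
    "Tornadoes cause massive destruction in Oklahoma",
    "Recipe for fettuccine Alfredo",
    "Oklahoma schools reopen",
]
results = rank_by_overlap("Did you hear about those tornadoes in Oklahoma?", headlines)
print(results)
```

Here the tornado headline ranks first because it matches both "tornadoes" and "oklahoma", the schools headline follows with one match, and the recipe is dropped. The program never needs to know what a tornado is.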

With 12 employees, Expect Labs has raised somewhere north of $4.8 million in venture funding (it hasn’t reported the exact sum). Backers like Samsung and Telefonica have more than a passing interest in better search technology for mobile devices; the Google Ventures connection is especially interesting, given that Google, like Apple, is investing deeply in technologies to make its mobile devices and operating systems smarter and more responsive.

If it turns out that lots of developers want to tap into the Anticipatory Computing Engine, Expect Labs could quickly end up colliding with (or being courted by) both of these Silicon Valley giants. “We are in a space where there are a lot of very big companies that care a lot about this technology, which is a bit frightening,” says Tuttle. “So we will see. Hopefully we can get there before they do.”

Wade Roush is a contributing editor at Xconomy.