The Story of Siri, from Birth at SRI to Acquisition by Apple—Virtual Personal Assistants Go Mobile

6/14/10

A couple of years ago, a $999 iPhone app called “I Am Rich” made headlines for being the most expensive item in Apple’s iTunes App Store. (It was the ultimate symbol of conspicuous consumption, doing nothing but displaying a glowing red icon.) But compared to Siri—the “virtual personal assistant” app that can make restaurant reservations, book concert tickets, or look up weather forecasts based on spoken commands—I Am Rich was a steal.

The Siri app itself isn’t expensive; in fact, it’s free to iPhone, iPad, or iPod Touch users. The algorithms that make the app work, however, are the product of years of defense-sponsored research at Menlo Park, CA-based SRI International and other institutions that cost taxpayers at least $150 million. After SRI spun out Siri, Inc., to commercialize this work in 2008, Silicon Valley venture capital firms Menlo Ventures and Morgenthaler Ventures poured another $24 million into the technology. And finally, this April, Apple itself acquired the startup for a reported $150 million to $250 million.

How could a single mobile application have caused so much money to change hands?

The answer, of course, is that the fuss isn’t about the Siri app. It’s about the artificial-intelligence insights behind it: the chain of machine-learning, natural-language processing, and Web search algorithms that swing into action with every Siri query. When you can access these algorithms from a mobile device like the iPhone, and prime them with a bit of contextual awareness such as a GPS location reading or an understanding of the user’s preferences, you have a powerful personal tool that Norman Winarsky, SRI’s vice president of ventures, licensing, and strategic programs, likes to describe as a “do engine” rather than a search engine.

Right now, Siri can handle a limited range of jobs, such as checking a flight time, sending a tweet or an e-mail reminder, or finding out when a movie is showing—all things that can be achieved by connecting with existing Web services or tapping the structured information in open Web databases. But as the technology evolves, it could help to change consumers’ expectations of their mobile devices, gradually weaning them away from the keyword-driven thinking inculcated by traditional search engines and allowing them to interact with their gadgets in more conversational ways. So it’s not hard to understand why Apple (NASDAQ: AAPL), which is betting a large part of its future on the iPhone and the iPad, would pay to bring Siri in-house (and, not incidentally, to keep it away from Google).

Now that Siri’s technologists are behind the walls in Cupertino, they aren’t talking with the press. But in a recent conversation with Winarsky and William Mark, the head of SRI’s Information and Computing Sciences Division, I got a deep view of the project that gave birth to Siri—which is still underway and, as it turns out, will soon produce more progeny. The story of Siri’s emergence within SRI reveals quite a lot about the future of mobile technology, the undiminished role of defense spending in Silicon Valley’s success, the art of the spinoff, and the way researchers think inside this legendary Silicon Valley institution.

As a non-profit R&D center doing contract research for the government and other clients, SRI International has a fixation on real-world problems that reaches all the way back to its founding as the Stanford Research Institute in 1946. The automated check-reading technology developed at SRI in the 1950s, for example, is still in use by banks today. And the first computer mouse and other fundamental innovations in human-computer interfaces were loosed upon the world by SRI researcher Douglas Engelbart in 1968, in a lecture in San Francisco that has been called “the mother of all demos.”

“The original concept that Doug was so fond of was human augmentation,” says Winarsky. “He was talking about a different kind of augmentation than Siri. A lot of what we are doing now automates processes that Engelbart never conceived of, such as natural language understanding and real-time machine learning. But the overall theme remains the same.”

In the early 2000s, according to Mark, the Defense Advanced Research Projects Agency (DARPA) asked SRI to lead a project investigating the feasibility of a “personalized assistant that learns”—a system that could help commanders and staff manage information more effectively in military command-and-control environments. Giving such a system learning capabilities was key, Mark says. “There is an enormous burden to get all of the knowledge into the system that it really needs to have, and it is never going to stay current unless there is some kind of learning,” he explains. The project SRI launched to explore many different kinds of computer learning was called CALO, for Cognitive Assistant that Learns and Organizes.

Mark became CALO’s principal investigator, and at DARPA’s request, he focused the work on creating a virtual office assistant. “The reason they asked us to do that was that it was a very large team that included about 20 universities across the country, and the one thing we all had in common is that we were all knowledgeable about working in office environments,” Mark says. “They wanted the team to understand the domain and work in that domain themselves.” In other words, DARPA wanted the CALO researchers to eat their own dog food.

Around the same time, but separate from the CALO project, Winarsky and Mark had launched an internal SRI study that they code-named Vanguard. It was an attempt to understand and improve the puzzling economics of the mobile telephony industry.

“What was happening in the industry was that revenue from voice services was going to zero on a very sad-looking, downward exponential curve, and the hope of the industry was data services revenue, which was projected to have a very happy, upward exponential curve,” Mark says. “The problem was that the revenues from data services were not actually occurring, or were not increasing nearly enough to compensate. Everybody knew that. So Norman and I brainstormed and came up with a thesis, which was that it was because data services were just too hard to use.”

People were eager to use their mobile devices to accomplish more in their lives, but the software that wireless operators and their partners had come up with to that point was simply too clunky, Mark believed. “We went around and talked to a whole lot of people in the industry, the carriers and the handset makers and people in the general industry, and they essentially validated that,” Mark says. “They also started sponsoring some R&D in that space.”

Eventually, Vanguard met CALO. “We knew there was this incredible need for a better way to deal with services in the mobile world,” says Mark. “So we took the CALO concept of an office assistant, and this driving need for assistance in the mobile world, and created the basic Siri concept.”

At this point in the story, a little detour into SRI’s own business model and innovation philosophy is required. More than 80 percent of the research SRI does is funded by the federal government. Thanks to the Bayh-Dole Act of 1980, which gave universities, small businesses, and non-profits the rights to intellectual property arising from federal funding, SRI is able to license the results of its work to other organizations. And while a government agency like DARPA may be the instigator of a specific breakthrough, the government “is almost never the right market” for such innovations, Winarsky says.

So SRI spends a lot of time figuring out what is the right market for each technology it creates. It sifts through the possibilities using a simple formula called NABC, for Need, Approach, Benefits, and Competition. SRI president (and Xconomist) Curt Carlson elaborates on this formula in his book Innovation: The Five Disciplines For Creating What Customers Want, which “everyone at SRI has to read,” according to Winarsky. But it really boils down to what Winarsky calls “venture capital 101—what you always want to see in a venture presentation.”

Through Vanguard, Winarsky and Mark believed they had already identified a need—in this case, for mobile data services that were easier to use. They also had an approach, in the form of the human-computer interaction model developed by the CALO researchers. The core of this model, Mark says, is “this idea of having a contextually aware system, meaning a system that has some knowledge of the user as well as some knowledge of what can be done in a particular space.”

But the “N” and the “A” by themselves aren’t enough; that’s why the 2,000-plus projects launched annually at SRI spawn only three or four spinoffs per year. In late 2006, to make sure they had nailed the expected benefits of a CALO-style personal assistant in the mobile world and that they understood the competition, Winarsky and Mark added CALO chief architect Adam Cheyer to their skunkworks team. “We brainstormed like crazy and iterated and iterated,” Winarsky recalls.

Eventually, convinced they had a good take on the NABCs, Mark, Winarsky, and Cheyer took their plans to SRI’s commercialization board, whose role is to apply the filters all over again. “We look for a quantitative value proposition,” says Winarsky. “We look for weaknesses. How do you find your white space; who are your competitors? Is it a license, is it a venture? Do you have the right patents? The board funds [finding] the answers to those questions.”

The commercialization board liked the overall mobile-assistant idea. In fact, members liked it enough to recommend bringing in an SRI entrepreneur-in-residence named Dag Kittlaus as the candidate CEO for the potential spinoff. (SRI, like many venture firms, keeps plenty of EIRs on hand as CEOs-in-waiting.) Kittlaus was a former executive from the Norwegian telecom giant Telenor Mobile, and had pioneered a push-media system for Motorola phones called Screen3.

It wasn’t until Kittlaus joined the project in mid-2007, Winarsky says, that the idea really began to gel. “Dag was a master of one other capability which we often need, which we call ‘bring it to life,’” he says. “It’s easy to focus on the market opportunity and the project, and leave it all abstract how you are going to solve the problem. Well, Dag put together a living demo of how [our technology] would solve the problem and how competitors, including Google, would solve the problem.”

At that point, the commercialization board was convinced, and “the venture was ready to be launched,” Winarsky says. Cheyer agreed to leave SRI and become the startup’s vice president of engineering. And with the board’s support, he and Kittlaus brought in SRI’s Tom Gruber, a user experience expert, to round out the founding team.

Up to this time, interestingly, the internal code name for the project was HAL, after the sentient computer in Arthur C. Clarke’s 2001: A Space Odyssey. “But we dropped that because of the negative connotations,” says Winarsky. (Probably a good idea, given HAL’s murderous tendencies.) For the next year or so, the operation would go by the intentionally nondescript name Active Technologies.

The next step for Active was the traditional road show, where the team would have the privilege of giving their presentation all over again to dozens of venture firms. But for budding SRI spinoffs, the road show starts somewhere unusual: the “nVention” board, an advisory group of veteran venture capitalists from top-drawer firms like Kleiner Perkins and Draper Fisher Jurvetson. The nVention board members have a real job: providing advice, analysis, and networking help to a project’s founding team. But they get something potentially lucrative in return, namely, an early look at the projects that SRI considers most “venturable,” to use Winarsky’s word.

“They have no special rights, but when they see something outstanding, they have had a chance to see it first,” says Winarsky. In Active Technologies’ case, nVention board members from Menlo Ventures and Morgenthaler Ventures immediately wanted in on a deal. And in October 2008, the startup—which, by this point, had finally settled on the name Siri, in tribute to SRI—closed an $8.5 million Series A financing round, with Menlo managing director Shawn Carolan and Morgenthaler partner Gary Morgenthaler joining the board. (A $15.5 million Series B round followed in November 2009, led by the same firms.)

All the while, Cheyer and the other CALO engineers attached to the project had continued to improve their learning technology and adapt it to the mobile world. And along the way, they caught a couple of lucky breaks that hadn’t been part of the original plan but would prove key to Siri’s future. The first was the iPhone 3GS.

The Siri team had always known that smartphones were the right platform for a virtual personal assistant. The debut of the iPhone in 2007, and of the iTunes App Store in 2008, had provided a natural showcase and a powerful distribution mechanism for the software. But the 3GS, which came out in June 2009, was the first version of the iPhone that had both the internal processing power and the wireless bandwidth the Siri team needed to make their product work the way they wanted.

The second unexpected break was the realization that speech recognition systems from companies like Cambridge, MA-based Vlingo and Burlington, MA-based Nuance (another SRI spinoff, as it happens) had become powerful enough to use as Siri’s primary query interface. That was huge, because it gave the team a way to circumvent consumers’ reluctance to type on small screens.

“There was an estimate that the number of people who would continue to use the service dropped by 50 percent for every click,” says Winarsky. “But in the beginning, Siri did not even expect to do a launch with speech. It was expected to be text-only. I was on the board, and the team came to us and said, ‘Guess what? We’ve got a demo of speech running.’ We used the demo, and we said, ‘This is transformational.’ You don’t even have to click—you just ask.”

So in a way, all of Winarsky and Mark’s careful NABC analysis had only served to win the project a green light within SRI; it was the convergence of several other trends that provided Siri’s real fuel.

“We happened to be at the right place at the right time in many ways,” Winarsky acknowledges. “One, the smartphone. Two, the bandwidth—3G is terribly important. Three, natural language processing had reached a point where [a voice interface] could be done. Fourth, CALO had happened, so the machine learning could be supported in real time. And finally, the iTunes app marketplace. The walled garden was coming down—Apple deserves huge credit for that.”

The rest of Siri’s story has been thoroughly covered by the tech blogs: The company demonstrated the mobile app at several tech conferences in 2009, released it into the wild on February 4, 2010 (for the iPhone 3GS only), and created some serious buzz at the South by Southwest geekfest in March. Then, on April 28, less than three months after the app’s debut, news emerged that Apple had purchased the startup.

The financial terms of the acquisition weren’t disclosed, and Winarsky and Mark say they can’t talk about the deal, even to speculate on what future Apple might see for the technology. Kittlaus, Cheyer, and the rest of the Siri team are now employees at Apple, which didn’t reply to my request for interviews. For now, the Siri app is still available in the App Store—but Apple has a pattern of shuttering the services it acquires, presumably in order to redeploy their technologies elsewhere (see Lala Media, which Apple acquired in early December and closed on May 31).

But even if Siri were to disappear for a time, Winarsky wouldn’t be too worried. When I put it to him that $150 million was a lot for taxpayers to spend on a technology that’s now been taken inside Apple, he corrected my premise on several counts, arguing that acquisitions are a natural outcome of SRI’s spinoff process.

“I think the Bayh-Dole Act is one of the most brilliant acts in the history of Congress,” Winarsky says. “What you call ‘taking the technology inside’ has been responsible in large part for the creation of companies like Intel, Cisco, Apple, and Sun. The government would have had to pay billions of dollars, perhaps, to continue to advance this technology, while instead the commercial marketplace is making it available to everybody. Consumer revenue is what drives future products, rather than our taxes.”

Winarsky also emphasizes that the intellectual property that SRI licensed to Siri—the technology now controlled by Apple—is only a small slice of the IP generated by the CALO project. In fact, he says SRI will have news “within a month or two” about a new CALO spinoff, focused on a different domain from mobile search.

“Apple no more owns all of the technology for the virtual personal assistant than they own all of AI or all of speech or math or physics,” Winarsky sums up. “You can imagine using a virtual personal assistant to support you in your need to deal with your healthcare or your doctors, or to do your shopping online, or to help sales agents. Basically, this is about the creation of the next generation of assistants.” And Siri, he says, is only the start.

Wade Roush is a contributing editor at Xconomy.
