From Smartphones to Smart Spaces: SRI’s Vision of Computer Evolution
If the future is here but unevenly distributed, as William Gibson said, then where is it concentrated?
One place, certainly, is the contract research giant SRI International. Created by Stanford University in 1946, it’s the organization we have to thank for inventions like automated check processing, the computer mouse, hypertext, the ARPANET (which evolved into the Internet), and ultrasound as a medical diagnostic tool. And SRI is still innovating today—one of its recent creations is Siri, the virtual-assistant iPhone app that was spun off as a startup last year and quickly snapped up by Apple for a reported $150 to $250 million.
SRI researchers like the legendary Douglas Engelbart have long had a knack for seeing how the rest of us will be using computers in the future. Eager to hear what SRI is cooking up these days, I talked yesterday with Bill Mark, the head of the institute’s Information and Computing Sciences Division. Aside from directing a staff of 250 scientists, Mark is a software systems designer who studies smart spaces—environments where embedded computers help people work, learn, or communicate more effectively.
Mark argues that despite our culture’s current infatuation with iPhones, iPads, and the like, mobile devices are actually ill-suited for many tasks, especially those involving group interactions. In those situations, he says, it would make more sense to embed computing smarts in the environment, be it a conference room or a classroom.
I asked Mark to lay out some of those ideas as an appetizer for Xconomy’s May 17 forum Beyond Mobile: Computing in 2021. At this evening event on the SRI campus in Menlo Park, CA, Mark will be on stage alongside Calit2 director Larry Smarr, Microsoft eXtreme Computing Group leader Dan Reed, and me to talk about the current trends shaping the way computers will fit into our lives in 10 years’ time. The following outtakes from my conversation with Mark give a partial preview of the topics we’ll unpack at the event. To hear the rest, you’ll have to buy a ticket. (Disclosure: SRI is an Xconomy underwriter.)
Wade Roush: To build the Siri mobile app—which can help users do things like buy concert tickets or book a table at a local restaurant—your scientists drew on years of defense-funded research at SRI on natural language understanding and other aspects of artificial intelligence. But the app is still limited to fairly simple query-response situations. Will we be having full conversations with future versions of Siri?
Bill Mark: Yes, we view Siri as a first step in that direction. When you say something to Siri, it understands your intent and puts together a set of services that fulfill that intent. That is great—I really think Siri did a fantastic job, and we’ll see what Apple does with that core technology. But there is much more to the story than that. One thing is dialogue. In real life, we use dialogue all the time. It’s extremely rare that you say something and your assistant goes off and does it and that’s the entire interaction. Our research right now is pushing into systems that can do that.
Roush: That sounds an order of magnitude harder than just responding to a spoken search query.
Mark: It’s much harder. This sounds obvious, but one challenge is that the system needs to understand what it just told you. People in a dialogue assume that the other person, or in this case the piece of software, understood the previous utterance. Most systems don’t. There are also performance issues. The system has to come back with a reasonable response in a reasonable amount of time, otherwise it’s not dialogue. And the key piece is that the system has to come back with something reasonable every time. It doesn’t have to be brilliant—any more than a human being is brilliantly knowledgeable—but it does have to be intelligent enough to give you confidence that it understands you.
Roush: Do you think these dialogue-based systems will eventually replace Google-style searches?
Mark: I think it depends a lot on why you’re doing a search. True search will remain a very important thing 10 years from now, and that model won’t in any way disappear. But there are areas where we would really like a different paradigm of interaction.
Roush: Maybe a situation like, “I’ve got 45 minutes to cook dinner, can you find me a recipe based on what’s in the fridge right now or what’s on sale at the nearest supermarket?”
Mark: That’s a good example. We are all going to have to decide over time what kind of user experience we want, and it will be different for different people at different times. I could say “Dinner tonight,” and it comes back with a recommendation and I’m done. There is another experience where I want to have a dialogue—maybe I start with “Dinner tonight” and the system comes back with “How about this pasta recipe?” and my response is “You know, kind of, but I’m not in the mood for pasta, but I really like those ingredients, can you give me another dish that is kind of like this?” or “We’re going out to a fancy place tomorrow so can you give me something that’s lower in calories?” I think you’ll want that sort of interaction a lot of the time, and the technology is getting to the point where we will be able to do that more and more.
Roush: The bulk of SRI projects are government-funded, but you also do commercial research, and there’s also an active program to find spinoff opportunities. Which types of projects will lead to real-world applications for these dialogue-based interfaces?
Mark: We think of this as something that will be extremely relevant in military and civilian government applications. Think of a call center motif—to pick something that is topical, think about the IRS. The IRS staffs many, many people to answer taxpayer questions about a very complex set of rules. And most people are very dissatisfied with that service, because there are not enough resources, and you have to wait on the line a long time, and sometimes you don’t get the best answer. So dialogue-based systems could be very important [in speeding up service]. Going beyond that into the commercial world, we are engaged in a pretty large project right now, where the client doesn’t want to be named, that I think will be pretty revolutionary. We are also always looking at whether it’s reasonable to start new ventures, and one of the topics of discussion here is whether or not [dialogue-based systems] should be a venture.
Roush: Let’s talk about robotics—another area that SRI is famous for. What problems are you focusing on there?
Mark: Some of the really challenging problems have to do with real-time vision. When a human enters a room, we very quickly size up the scene and we immediately recognize the objects and decide if there is a place to sit down or a place to do something. That’s what we’re working on in computer vision—all of these things that seem relatively easy for a human. Also manipulation. Robots on the factory floor move very rapidly, but their manipulation capability is very narrow. Certain things that are very easy for humans to do are beyond the realm—I was going to say beyond the grasp, but it’s a terrible pun—of robots. We’re collaborating with researchers at government and university labs to create manipulation with human quality. That’s an example of something that’s government-sponsored, but we see that as having real impact in the commercial world over time.
Roush: Looking at the 10-year time horizon, do you think these problems of computer vision and manipulation are being solved fast enough that we’ll have robot servants in our homes by 2021? And separately from that, will we really want them? One of the old standby scenarios for home robots is elder care in places like Japan with rapidly aging populations. But do you really want to hand your elderly mother over to a robot, and are there other compelling use cases?
Mark: If you take away the 10-year boundary and ask whether we will ever see robots in the home and the workplace helping us out, my answer is unequivocally yes. The need is great. We spend time—or if we have the money, we hire people to do—cleaning and cooking and tasks where, if machines could do it, there would be a decided advantage. But it does have to be at the right price point, and it has to be the kind of machine you would want in your home. You’re definitely right that people focus on care for the elderly, not just in Japan but around the world, and that leads to the question of what kind of help is that, and what embodiment of a robot is acceptable.
The image that some people have is of this robot that assists an elderly person into or out of a chair. That’s not a bad idea, but there’s also the whole question of social interaction. So many elderly people are lonely. They would love to be interacting with their friends and family. We hope to put them in situations where that is exactly what they can do, but that’s not always the case. You’ve seen the first examples of things like robot dogs as companions, but that is a very far cry from what you want. There is this idea of having [a robot] that a human being would be happy to interact with, not just one that silently does things for them but is part of the household. I believe that’s the right vision. Will we see robots like Rosie from the Jetsons in 10 years? Maybe not, but certainly things will be a lot different.
Roush: Let’s move on to smart spaces, which is your own area of research. But before we get into the role they might play in the future—how do you define “smart spaces” in your own mind, and what’s the state of the art? I’ve been following this area for a while, at least since the days of Project Oxygen at the MIT AI Lab, and frankly the first prototype systems seemed pretty clunky and impractical.
Mark: There are all kinds of different ideas of what smart spaces are and what they’re going to be. The fundamental insight was the realization at a number of institutions—and it seems quite obvious now—that computation was becoming pervasive, in the sense that small and relatively inexpensive devices could be embedded in the environment. Today, smart spaces exist and we use them every day, and in some ways we’re not even conscious of that. My favorite trick question is, what’s the most important human-computer interface in your life? And the answer is, the automatic brake pedal in your car. You just push on it, and it brakes the car better than you could. Cars are a lot smarter in other ways too—they have all kinds of sensors to know when somebody is sitting and when to nag them to fasten their seat belt. So that is one example of a smart space.
Most of the work I’ve seen in smart spaces is an extension of that—an environment that senses you and does good things for you. You used the word “clunky.” The way I think of it is, we are still trying to think of the right use cases. Some people’s reaction to being in these environments is “Wow, what a great thing just happened.” But other times, it’s just weird, and you think, “Is this something I really want?” So we have to figure out what people really want in this area. And then there is getting it into the infrastructure. People buy new cars at relatively high frequency, but that is not true of homes and office buildings, which don’t have a smart infrastructure built into them right now. So there is a long cycle of getting this into real environments that’s gating things.
Roush: Say the gates were removed—in what ways could spaces be smarter 10 years from now?
Mark: I’m interested in spaces that understand human interaction. How can my work environment make me more effective in the way I interact with other people? How can my home environment provide a better experience for me and my family, and how can schoolrooms provide better experiences for students and teachers?
In a work environment, to get concrete about it, we have meetings. We draw stuff on the whiteboard. We might be looking at a PowerPoint presentation. What if the environment understood what we were talking about and could make proactive suggestions? It could say “I see you are looking at this PowerPoint and I happen to have noticed that Pam created a newer version of this, would you like to look at that?” Then there’s the whole idea of meeting summarization. I can’t go to every meeting I’m supposed to be in, and I go to meetings that I don’t want to be in. It would be great if I had some way of getting the gist of what went on—not a transcript, but something that said “here are the key points discussed and here is an action item you were given in this meeting.” Then I could get the system to take me just to that part of the meeting, just that four minutes that I’d be happy to listen to.
Those are work examples. But take the point of view of a schoolroom for a moment. A schoolroom has seen kids come and go for years. If it’s seventh-grade algebra, the room has seen kids being taught the same material pretty much year after year. What if the room could help teachers based on what has gone on before in that environment? Imagine that the room could recognize, “Oh, from the mistakes this kid is making on quizzes, this is a pattern I’ve seen and this is a kid who is not understanding this particular principle of algebra, so I’m going to tell the teacher to provide other examples that have helped in the past.” That’s just meant to be a perspective on the problem. I’m not going to say it’s easy and cheap to create environments like that, but I think it’s easier than some of the other smart space concepts, and inexpensive microphones and cameras can go a long way.
Roush: To me, it seems like many of the types of assistance you’re describing could just as well be delivered by a mobile device like a tablet computer. At what point does it make sense to transfer that kind of smarts to the room itself? Which, by the way, is one of the themes we want to explore at the “Beyond Mobile” event.
Mark: It all comes down to the kind of experience we want people to have, and does it feel natural. There has been enormous emphasis on mobile devices, and it’s been great. As many have predicted, computation has become mobile, and our smartphones are getting to be as powerful as our desktop machines. But the fact of the matter is, most of the time, I’m not truly mobile. I’m either at home or I’m in a meeting. If I’m not mobile, why am I using a mobile device? The answer, right now, is because that’s what I have. But mobile devices are distracting and annoying to many people. Using a mobile device or laptop during a meeting can be bad social behavior, because you’re focusing on something other than interacting with other people. So one overarching reason to embed this stuff in the environment is that it’s not intrusive. It hears what people say (with their knowledge, not eavesdropping) and it’s not in the way.
Roush: I’d like to end by rewinding to where we started, with the idea of the virtual assistant. One of the science-fiction ideas that has really stuck with me is from William Gibson’s book Mona Lisa Overdrive, which features a computer named Continuity. It’s this massive, cloud-based artificial-intelligence character who talks to the main character via a little microphone implanted in her ear; it’s her omnipresent advisor and caretaker. Do you think projects like Siri and the dialogue-based interfaces might eventually lead us to that scenario of hyper-personal assistants, which feels like it’s at the opposite end of the spectrum from smart spaces?
Mark: Do I think the personal advisor belongs in the spectrum of the computer help that we are going to get? Absolutely yes. The embodiment of it is the question. We are not doing any research into implanting devices into people. But the fact of the matter is that a lot of people carry a mobile device like a smartphone with them almost all of the time. It’s not an implant, but it’s almost part of their person. And people do get cochlear implants to help them hear, so I would guess that implants will be quite feasible. But if you back off from the implant idea—which I think is pretty scary to a lot of people, including me—and talk about mobile devices, then yeah, I’d love to have something like that.
You were saying that this is at the opposite side of the spectrum from smart spaces, and I agree, but there is a linkage. Think about the schoolroom. Kids walk in with all kinds of mobile devices these days. They’re told to switch them off, and I hope they do. But you could think of the devices that kids walk in with as being part of the computational space of the room. Maybe it belongs to the kid, but it’s using the knowledge that the room is in charge of keeping. So there’s plenty of overlap between these visions.
See Bill Mark, Larry Smarr, and Dan Reed in conversation with Wade Roush at Beyond Mobile: Computing in 2021, May 17 at the SRI International Campus in Menlo Park, CA.