Stephen Wolfram Talks Bing Partnership, Software Strategy, and the Future of Knowledge Computing

1/5/10

There is something oddly human about Stephen Wolfram using his iPhone to look up the mass of the “cascade hyperon,” a subatomic particle with who-knows-what properties. That’s what Wolfram, one of the world’s most distinguished experts in physics and computing, was doing on the day we spoke a few weeks ago.

Maybe it stood out because it means that even Wolfram—whose depth of scientific knowledge seems to exist on a different plane from other humans—needs a smartphone these days. Or maybe it’s just funny that anyone would use an iPhone app to look up such a thing.

In any case, Wolfram, 50, is a renowned scientist, author, and business leader. Born in London, he resides in the Boston area, but his company, Wolfram Research, is global, with headquarters in Champaign, IL, and 600-some employees spread around the U.S., Europe, and Asia. Last May, he launched an ultra-ambitious project called Wolfram Alpha, a kind of “knowledge engine” that answers queries about everything from geography to statistics to finance by “computing” the answer from an extensive database. It’s different from a search engine, which returns a list of links and documents. But the two can work together: in November, Microsoft announced it had formed a partnership to incorporate Wolfram Alpha into some of Bing’s search results.

So it was high time I checked in with Wolfram, whose career I have followed over the years. Interestingly, he calls Wolfram Alpha “the most complicated project I’ve ever done.” That says quite a lot, given that Wolfram spent more than a decade writing A New Kind of Science, the 1,200-page tome he released in 2002 that potentially turns every field of science and technology on its head. He is also the creator of Mathematica, a software program used widely for scientific and technical computing (things like modeling, simulations, and visualizations)—it’s the main reason Wolfram’s company has been profitable since 1988.

We spoke by phone on a quiet December afternoon just before the holidays. I asked him about the technology and strategy behind Wolfram Alpha and the future of search engines and knowledge engines, as well as business lessons learned from building his company and running it remotely. (I also couldn’t resist asking for his take on the massive physics effort at the Large Hadron Collider, the Swiss-based particle accelerator that amounts to the biggest science experiment in history.)

If you’ve ever interviewed Wolfram, you know to choose your questions wisely. It’s not just that he doesn’t suffer fools, but that he answers every question so thoroughly that he will embark on tangents that turn out to be mind-blowing—much more interesting than the path of the original question. Which is a bit like the best queries in science, business, and Wolfram Alpha itself, come to think of it. (You should try the site here if you haven’t yet.)

Here are some edited and slightly condensed highlights from our conversation:

Xconomy: Tell me about the organizational structure of Wolfram Alpha. How big is the project?

Stephen Wolfram: Wolfram Alpha has about 200 people. The parent company is Wolfram Research, and headquarters are in Champaign. It’s quite a distributed operation at this point. There are pieces in Boston and the U.K. We have one or two people in Seattle. Our people are scattered literally all over the world. I set a bad example by being a remote CEO starting in 1991. For many kinds of things, it’s tremendously productive.

X: What are your tips for managing a company remotely?

SW: My theory is the most productive form of meeting is conference calls with Web conferencing. You can have more people in the meeting, and you’re not wasting anyone’s time. They can work on other things, and if you need them, you just say their name. I’ve found that it’s what I spend my life doing. The Wolfram Alpha project is the most complicated project I’ve ever done. It’s remarkable for what it needs to pull in—specific types of knowledge, from linguistics [for example]—to how you make servers work well. That was a tremendous exercise in being able to pull the right resources together and get very disparate groups to communicate well.

We have a “who knows what” database—if we have a question about who in the company knows about mechanical engineering [say]. It has nothing to do with geography. It has to do with what questions you ask of people, and having a company culture where it’s conceivable you might have “random question X.”

X: Let’s talk about the genesis of Wolfram Alpha, which was released in May.

SW: I’d been kind of thinking about what makes knowledge computable for a long time. Like many of these things, the idea is only clear after you’ve built something serious about it. Looking back, I was sort of embarrassed to find things I was doing when I was 12 years old—gathering scientific information and putting it on a typewriter. I’d been thinking about how one makes knowledge systematic for a long time.

In the beginning of the ’80s, when I was starting to work on NKS [A New Kind of Science], I had built a [computing] language called SMP. I was wondering how far you could get formalizing knowledge, and how does this relate to AI-ish things. At the time, I thought making all knowledge formal was too hard, we can’t do it.

After finishing NKS [in 2002], I was thinking—you can get complexity from simple rules. Can we make a large swath of human knowledge computable? I got more serious about that. At the beginning, it was really unclear this would be possible. There’s just too much data in the world, too many topics, you can’t understand the linguistics [of queries], you can’t deliver the stuff fast enough.

In linguistics, we used the NKS system. For years, people were trying to do natural language processing and making computers understand written text. It turns out to be really hard, but what do you mean by “understand”? For us, we have a very clear target: is this related to something we can compute?

X: Can you give more details on how it works? How do you interpret a query?

SW: We’ve had to build our own big edifice of linguistic processing to handle what we want. I wasn’t sure if it was possible. I thought there might be too much ambiguity. You might have to see the person—see if they were dressed in a spacesuit or in surgeon’s garb—to get enough context. As it turns out, it hasn’t been a huge problem. There’s enough sparsity in human expression. By the time someone is asking anything real, you have enough context. The whole thing is full of heuristics. Any sequence [of terms or numbers] could be anything. But if it’s the name of a town with a population of 20, and it’s 6,000 miles away from where the query is being asked, that’s unlikely [to be relevant].
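[The town example can be made concrete with a small sketch. This is not Wolfram Alpha's code, which the interview does not describe; the scoring formula, the function names, and the two made-up places below are all assumptions chosen for illustration. The idea is only that a candidate interpretation scores higher when the place is larger and closer to where the query was asked.]

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_MILES = 3958.8


@dataclass(frozen=True)
class Place:
    """A candidate interpretation of a place name (hypothetical data model)."""
    name: str
    population: int
    lat: float
    lon: float


def distance_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def plausibility(place, user_lat, user_lon):
    """Assumed heuristic: reward population, penalize distance (both on a log scale)."""
    d = distance_miles(place.lat, place.lon, user_lat, user_lon)
    return math.log10(place.population + 1) - math.log10(d + 1)


def best_guess(candidates, user_lat, user_lon):
    """Pick the candidate with the highest plausibility score."""
    return max(candidates, key=lambda p: plausibility(p, user_lat, user_lon))


# Two invented places that share a name; coordinates and populations are made up.
candidates = [
    Place("Exampleton (large, nearby)", 150_000, 40.0, -88.0),
    Place("Exampleton (tiny, far away)", 20, -35.0, 140.0),
]
guess = best_guess(candidates, user_lat=40.1, user_lon=-88.2)
print(guess.name)
```

[A real system would presumably combine many such signals. This sketch uses only the two mentioned in the answer above: population and distance from the person asking.]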

X: Where does Wolfram Alpha get the data with which it computes answers?

SW: The truth is very little of our data comes from the Web. The Web is a great place to know what’s out there, but in the end, for every one of thousands of domains, we’ve gone to the primary data source and gotten the most original, most useful source. One exception to that, I suppose, is what happens with linguistic things. Wikipedia is really useful to us: if we have an entity, a chemical, a movie, what do people actually call this?

The reason that Wolfram Alpha has been at all possible for us is we’re starting with Mathematica. Which has this pretty complete collection of formal algorithms in areas like geometry, image processing [and many others]. We start with Mathematica, and we build onto it. Mathematica deals with pure knowledge, but there’s also very specific knowledge. (What is the actual approximation to the flow over an airfoil, for example.)

Wolfram Alpha is implemented in 7 million lines of code. There’s a lot more work to do. We try to always get to the frontier of what can be computed. We’re going to get it to the best it can be done. But in 2009, computers just aren’t fast enough to compute some pieces. [Compare 7 million lines to the four lines of code that Wolfram thinks might underlie the workings of the entire universe---Eds.]

X: Let’s step back for a minute. How is Wolfram Research doing financially?

SW: We have 600-something employees. I was shocked at how large our company Christmas party was. We’ve been lucky enough to be profitable for 21 years, since 1988. That’s the reason we were able to do Wolfram Alpha. If I had gone to venture capitalist friends of mine, I don’t know what would have happened. [See this account of what happened in his first company, which went the VC route---Eds.] This was all internally funded. This year, Mathematica has been doing really well. Maybe we do well in recessions because people think more then.

X: How is the partnership with Bing going? Will we see more deals with search engines and other websites?

SW: It’s in early days. We’re starting to see some Wolfram Alpha content showing up as part of the big search engine experience. I think there’s a nice complementarity between computable knowledge and what search engines are trying to do. Expect a bunch more things along those lines.

We have an API [application programming interface] starting to get used by a bunch of people. (It’s the basis for the Bing partnership and the Wolfram Alpha iPhone app). One thing coming out soon is the first step in a big arc of taking what we’ve done with Wolfram Alpha and merging it with the precise programming capabilities we have in Mathematica. The first thing will be a widget builder. Like a mortgage calculator, or the distance to the moon, or where I am on some medical distribution curve. Normally you do some work making a Web interface, but then the real hard work is to make the interface connect to something to do something. We’ve done that.

You publish a piece of JavaScript on a page, and there’ll be a widget you can type into and it connects to Wolfram Alpha. There’s a great intellectual problem which I haven’t completely solved: On one hand we have Wolfram Alpha, which is very, very broad and quick (I’ve got one question), and on the other hand we have Mathematica, which is very precise and formulated in a way you can really build on. So the very interesting thing is, what’s between these two extremes? An example is future versions of Mathematica, where you can type free-form linguistics, and it turns into Mathematica formulations you can pick out. You type it in English, and you get back a piece of code.

From a business point of view, there’s the API and deployment on lots of different devices and platforms and use in other programs. The widget builder is for anyone who’s building a business.

X: So what are the next big steps for Wolfram Alpha?

SW: It was a very difficult decision when we should release Wolfram Alpha into the wild. It had to be good enough for people to see where it was going. But we knew after it’s out in the wild, we’d have a better idea. We can analyze hundreds of millions of queries streaming through the thing. In linguistics, we can learn the language people use to make queries. Before that, we used Web corpuses. Now we’re in a position to really learn that. We can see what domains people expect us to have that we don’t yet have. That helps us prioritize our development work. It is notable to me that it’s never a pure “turn the crank” thing. You might think by the time you’ve done a few thousand domains, the next one will be easy. But always, some new issue comes up which requires a domain expert and some thought.

We’ve been expanding the domains we cover. The code base has grown. From a software engineering point of view, we have a smooth thing going. A new code base is released every week. Data feeds are being updated every second. We’re seeing evolution in Wolfram Alpha itself, and in the user base for how to use Wolfram Alpha. We can make the system adapt to the users, but also the users will change and adapt to us. It’s a co-evolution, in different professions—medicine, engineering. We’ll see more drilling [down] of those spaces.

We have a healthy volunteer network of people helping us with all kinds of data questions, helping us analyze anomalies in data. I expect that will grow. It’s a nice way to augment what we have internally.

The iPhone app is doing quite nicely. I was quite pleasantly surprised. It feels very different [from the Web version]. It’s kind of amazing you can get this thing out of your pocket and type in and get stuff back that, if you showed it to a technical person from years ago, they would think it’s a bizarre, impossible object.

Another big direction which has taken off much more quickly than I expected is enterprise Wolfram Alpha stuff. It has been recognized remarkably quickly by CEOs and CIOs: “What can you do with the terabytes of data we have at our company?” There’s not anything announced, but there’s a whole bunch of companies we’re working with. The big challenge for us is ramping up, from the point of view of software engineering. We just delivered the first two Wolfram Alpha “appliances,” little pieces of our data center that can sit in someone else’s data center. From a business point of view, that’s a big growth direction for us.

X: What about from a technology standpoint? What’s next?

SW: In Wolfram Alpha, a lot of what it works out is “old science” based. There is an existing model for such and such economic process [for example]. These models are based on equations and mathematical kinds of things. But can we not only compute on the fly, can we also invent and create on the fly? That brings us into the world of searching programs and NKS. I simply don’t know if today’s computers are fast enough to pull this off in a useful way. We have created musical forms using this, and it has been picked up by serious composers. But there are lots of domains. Until you try it, you really don’t know. There’s a tremendous range of applications and lots of different business directions. My priority right now is trying to ramp up our business.

X: How mainstream will Wolfram Alpha become, compared with search engines like Google or Bing?

SW: These are complementary kinds of things. It’s like asking, how successful is science going to be in the world? It’s saying, what can you compute in the world? How could search engines become so important? When it becomes sufficiently easy to be a reference librarian hundreds of times a day.

I think the set of people for whom Wolfram Alpha is useful is very broad. It’s a sobering comment on the human condition what people are actually typing in [to search engines]. We don’t see the porn, the celebrity gossip, but we do see lots of stuff where people try to figure out, in a machine shop, what size of drill should they use to make a hole of a certain size. Or how far is it from here to there, or how does this compare to that.

I expect in time, the things we’re doing will become commonplace. My children are playing with Wolfram Alpha; it’s trivial to find out things. Gradually, they become well absorbed into the culture, and things become assumed. Even with NKS, in a different direction, I wrote in the preface, all these things that when the book comes out will seem shocking, in time will seem completely obvious and commonplace.

X: Shifting gears to big science: Are physicists at the Large Hadron Collider (LHC) using your computational techniques?

SW: There’s a lot of Mathematica usage. I’d expect LHC people would use [NKS] on their laptops for searching the space of models. It’s for the future of NKS to figure out if something bizarre is seen at LHC.

X: Ultimately, what do you think they will find? (A potential discovery would be the Higgs boson, the so-called “God” particle that gives matter its mass—and the only element of the Standard Model of particle physics yet to be observed.)

SW: I’ll be disappointed if the Higgs boson is discovered. I’ve never been very keen on that theory.

Gregory T. Huang is Xconomy's Deputy Editor, National IT Editor, and the Editor of Xconomy Boston. You can e-mail him at gthuang@xconomy.com.
