Google Gets A Second Brain, Changing Everything About Search
take many new types of relevancy signals into account. The improvements were so dramatic that Singhal was later made a Google Fellow and awarded a prize “in the millions of dollars,” according to journalist Steven Levy’s account of Google’s early years, In the Plex.
Still, despite these accomplishments, Singhal says the history of search is basically one big kludge designed to simulate actual human understanding of language.
“The compute power was not there and various other pieces were not there, and the most effective way to search ended up being what today is known as keyword-based search,” Singhal explains. “You give us a query, we find out what is important in that query, and we find out if those important words are also important in a document, using numerous heuristics. This process worked incredibly well—we built the entire field of search on it, including every search company you know of, Google included. But the dream to actually go farther and get closer to human understanding was always there.”
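The keyword-based approach Singhal describes can be illustrated with a minimal sketch. This is not Google's algorithm: the function names, the whitespace tokenizer, and the TF-IDF-style weighting are all assumptions chosen for illustration. The idea is only that a word counts for more when it is frequent in one document and rare across the collection.

```python
import math
from collections import Counter


def tokenize(text):
    # Crude tokenizer (an assumption): lowercase and split on whitespace.
    return text.lower().split()


def keyword_scores(query, docs):
    """Return one TF-IDF-style relevance score per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)

    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        if not tokens:
            scores.append(0.0)
            continue
        term_freq = Counter(tokens)
        total = 0.0
        for term in tokenize(query):
            if doc_freq[term] == 0:
                continue  # word appears nowhere in the collection
            # Smoothed inverse document frequency: rarer words weigh more.
            idf = math.log((n_docs + 1) / (doc_freq[term] + 1)) + 1.0
            total += (term_freq[term] / len(tokens)) * idf
        scores.append(total)
    return scores


docs = [
    "taj mahal agra mausoleum",
    "taj mahal blues musician",
    "weather forecast today",
]
scores = keyword_scores("taj mahal blues", docs)
```

In this toy collection the second document scores highest because it alone contains the rarer word "blues," while the third shares no words with the query and scores zero. Nothing in the scoring knows what a Taj Mahal is, which is the limitation Singhal goes on to describe.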
After his initial rewrite of Google’s relevance algorithms, Singhal went on to tackle other problems like morphological analysis: figuring out how to reduce words like “runner” and “running” to their roots (“run,” in this case), in order to perform broader searches, while at the same time learning how to sidestep anomalies (apple and Apple obviously come from the same root, but have very different meanings in the real world). Universal search came next, then autocomplete and Google Instant, which begins to return customized search results even before a user finishes typing a query. (Type “wea,” for example, and you’ll get a local weather forecast.)
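As a rough illustration of the morphological analysis described above, here is a deliberately crude suffix-stripping sketch. The suffix list, the doubled-consonant rule, and the "capitalized means proper noun" shortcut are all hypothetical simplifications, not how Google or any production stemmer handles the problem.

```python
SUFFIXES = ("ing", "er", "s")  # hypothetical, far from complete


def stem(word):
    """Reduce a word to a rough root by stripping one common suffix."""
    # Assumed shortcut: a capitalized token is treated as a proper noun
    # (Apple the company) and left untouched.
    if word[:1].isupper():
        return word
    for suffix in SUFFIXES:
        # Only strip if at least three letters would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            root = word[: -len(suffix)]
            # Undo consonant doubling: "runn" becomes "run".
            if len(root) >= 2 and root[-1] == root[-2] and root[-1] not in "aeiou":
                root = root[:-1]
            return root
    return word


roots = [stem(w) for w in ("runner", "running", "runs", "apple", "Apple")]
# roots == ["run", "run", "run", "apple", "Apple"]
```

The capitalization shortcut fails as soon as "Apple" starts a sentence about fruit, and the suffix rules mangle words like "class," which hints at why sidestepping such anomalies took real engineering effort.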
“But throughout this process, one thing always bothered us,” Singhal says. “It was that we didn’t ever represent the real world properly in the computer. It was still all a lot of statistical magic, built on top of runs of letters. Even though it almost looked like an incredibly intelligent computer, and we did it far better than anyone, the truth was it was still working on strings of letters.”
This frustration wasn’t just a matter of intellectual aesthetics. Singhal says that by 2009 or 2010, Google had run up against a serious barrier. The goal of the company’s search engineers had always been to connect users with the information they need as efficiently as possible. But for a large group of ambiguous search terms, statistical correlations alone couldn’t help Google intuit the user’s intent. Take Singhal’s favorite example, Taj Mahal. Is the user who types that query searching for the famous mausoleum in Uttar Pradesh (Singhal’s home state), the Grammy-winning blues musician, or the Indian restaurant down the street? Google’s engineers realized that using statistics alone, “we would never be able to say that one of those [interpretations] was more important than the other,” Singhal says.
“I’m very proud of what we achieved using the statistical method, and we still have huge components of our system that are built upon that,” Singhal says. “But we couldn’t take that to the system that we would all want five years from now. Those statistical matching approaches were starting to hit some fundamental limits.”
What Google needed was a way to know more about all of the world’s Taj Mahals, so that it could get better at guessing which one a user wants based on other contextual clues such as their location. And that’s where Metaweb comes into the story. “They were on this quest to represent real-world things, entities, and what is important, what should be known about them,” says Singhal. When Google came across the startup, it had just 12 million entities in its database, which Singhal calls “a toy” compared to the real world. “But we saw the promise in the representation technology, and the process they had built to scale that to what we really needed to build a representation of the real world.”
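To see why knowing about entities helps with a query like Taj Mahal, consider a small sketch in which each candidate is a distinct record and contextual clues are used to rank them. The `Entity` class, the keyword sets, and the overlap-counting rule are invented for illustration; they are not Metaweb's data model or Google's ranking logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str
    keywords: frozenset  # hypothetical facts associated with the entity


# Illustrative records only; real entries would carry far richer data.
CANDIDATES = [
    Entity("Taj Mahal", "monument", frozenset({"agra", "mausoleum", "travel"})),
    Entity("Taj Mahal", "musician", frozenset({"blues", "album", "concert"})),
    Entity("Taj Mahal", "restaurant", frozenset({"menu", "dinner", "nearby"})),
]


def disambiguate(query, context, candidates=CANDIDATES):
    """Rank same-named entities by how many context clues they match."""
    matches = [e for e in candidates if e.name.lower() == query.lower()]
    clues = {c.lower() for c in context}
    # sorted() is stable, so ties keep their original order.
    return sorted(matches, key=lambda e: len(e.keywords & clues), reverse=True)


best_for_music = disambiguate("taj mahal", ["blues", "concert"])[0]
best_for_food = disambiguate("Taj Mahal", ["dinner", "nearby"])[0]
```

With clues like "blues" and "concert" the musician ranks first, and with "dinner" and "nearby" the restaurant does. The same string resolves to different things once the system holds separate records for each one, which a purely string-based index cannot do.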
The Database of Everything
Metaweb Technologies has a fascinating history of its own. The company was born as a 2005 spinoff of Applied Minds, the Glendale, CA-based consulting firm and invention factory founded five years before by former Disney R&D head Bran Ferren and former Thinking Machines CEO Danny Hillis. John Giannandrea, a director of engineering at Google who was Metaweb’s chief technology officer, says the idea behind the startup was to build “a machine-readable encyclopedia” to help computers mimic human understanding.
“If you and I are having a conversation, we share a vocabulary,” says Giannandrea, who came to Metaweb after CTO roles at Tellme Networks and Netscape/AOL. “If I say ‘fiscal cliff,’ you know what I mean, and the reason is that you have a dictionary in your head about ideas. Computers don’t have that. That’s what we set about doing.”
The knowledge base Metaweb built is called Freebase, and it’s still in operation today. It’s a collaborative database—technically, a semantic graph—that grows through the contributions of …