Google Gets A Second Brain, Changing Everything About Search

12/12/12

(Page 2 of 5)

take many new types of relevancy signals into account. The improvements were so dramatic that Singhal was later made a Google Fellow and awarded a prize “in the millions of dollars,” according to journalist Steve Levy’s account of Google’s early years, In the Plex.

Still, despite these accomplishments, Singhal says the history of search is basically one big kludge designed to simulate actual human understanding of language.

“The compute power was not there and various other pieces were not there, and the most effective way to search ended up being what today is known as keyword-based search,” Singhal explains. “You give us a query, we find out what is important in that query, and we find out if those important words are also important in a document, using numerous heuristics. This process worked incredibly well—we built the entire field of search on it, including every search company you know of, Google included. But the dream to actually go farther and get closer to human understanding was always there.”

After his initial rewrite of Google’s relevance algorithms, Singhal went on to tackle other problems like morphological analysis: figuring out how to reduce words like “runner” and “running” to their roots (“run,” in this case), in order to perform broader searches, while at the same time learning how to sidestep anomalies (apple and Apple obviously come from the same root, but have very different meanings in the real world). Universal search came next, then autocomplete and Google Instant, which begins to return customized search results even before a user finishes typing a query. (Type “wea,” for example, and you’ll get a local weather forecast.)
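To make the idea concrete, here is a toy suffix-stripping stemmer. The suffix list and length guard are invented for illustration and bear no resemblance to Google’s actual morphological analysis, which relies on far richer rules and exception lists:

```python
# Toy morphological analysis: strip a known suffix to find a crude root.
# The suffix list below is illustrative only, checked longest-first.
SUFFIXES = ["ning", "ner", "ing", "er", "s"]

def stem(word: str) -> str:
    """Reduce a word to a crude root by stripping a known suffix."""
    for suffix in SUFFIXES:
        # Length guard keeps short words like "ring" from being gutted.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

# "runner" and "running" both reduce to "run", so a search for one
# can also match documents containing the other.
print(stem("runner"))   # -> run
print(stem("running"))  # -> run

# The anomaly Singhal mentions: lowercasing everything before stemming
# would conflate "apple" (fruit) with "Apple" (company), so naive
# normalization can destroy meaning.
print(stem("apples"), stem("Apple"))  # -> apple Apple
```

The hard part in practice is not the stripping but the exceptions, which is why production stemmers carry long lists of special cases.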

“But throughout this process, one thing always bothered us,” Singhal says. “It was that we didn’t ever represent the real world properly in the computer. It was still all a lot of statistical magic, built on top of runs of letters. Even though it almost looked like an incredibly intelligent computer, and we did it far better than anyone, the truth was it was still working on strings of letters.”

This frustration wasn’t just a matter of intellectual aesthetics. Singhal says that by 2009 or 2010, Google had run up against a serious barrier. The goal of the company’s search engineers had always been to connect users with the information they need as efficiently as possible. But for a large group of ambiguous search terms, statistical correlations alone couldn’t help Google intuit the user’s intent. Take Singhal’s favorite example, Taj Mahal. Is the user who types that query searching for the famous mausoleum in Uttar Pradesh (Singhal’s home state), the Grammy-winning blues musician, or the Indian restaurant down the street? Google’s engineers realized that using statistics alone, “we would never be able to say that one of those [interpretations] was more important than the other,” Singhal says.

“I’m very proud of what we achieved using the statistical method, and we still have huge components of our system that are built upon that,” Singhal says. “But we couldn’t take that to the system that we would all want five years from now. Those statistical matching approaches were starting to hit some fundamental limits.”

What Google needed was a way to know more about all of the world’s Taj Mahals, so that it could get better at guessing which one a user wants based on other contextual clues such as their location. And that’s where Metaweb comes into the story. “They were on this quest to represent real-world things, entities, and what is important, what should be known about them,” says Singhal. When Google came across the startup, it had just 12 million entities in its database, which Singhal calls “a toy” compared to the real world. “But we saw the promise in the representation technology, and the process they had built to scale that to what we really needed to build a representation of the real world.”
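The entity-plus-context idea can be sketched in a few lines. Everything below — the entity records, the context terms, and the overlap scoring — is invented for illustration; it is not how Google or Metaweb actually rank interpretations:

```python
# Illustrative sketch of entity disambiguation: each "Taj Mahal" entity
# carries context terms, and the query's extra words vote for one.
ENTITIES = [
    {"type": "monument",
     "context_terms": {"agra", "india", "mausoleum", "tickets", "history"}},
    {"type": "musician",
     "context_terms": {"blues", "album", "grammy", "tour", "songs"}},
    {"type": "restaurant",
     "context_terms": {"menu", "reservations", "delivery", "near", "me"}},
]

def disambiguate(query: str) -> str:
    """Pick the entity whose context terms best overlap the query words."""
    words = set(query.lower().split())
    best = max(ENTITIES, key=lambda e: len(words & e["context_terms"]))
    return best["type"]

print(disambiguate("taj mahal blues album"))   # -> musician
print(disambiguate("taj mahal menu near me"))  # -> restaurant
```

The point of the sketch is the shift in representation: once “Taj Mahal” is a set of distinct entities rather than a string, contextual signals have something concrete to vote on.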

The Database of Everything

Metaweb Technologies has a fascinating history of its own. The company was born as a 2005 spinoff of Applied Minds, the Glendale, CA-based consulting firm and invention factory founded five years before by former Disney R&D head Bran Ferren and former Thinking Machines CEO Danny Hillis. John Giannandrea, a director of engineering at Google who was Metaweb’s chief technology officer, says the idea behind the startup was to build “a machine-readable encyclopedia” to help computers mimic human understanding.

“If you and I are having a conversation, we share a vocabulary,” says Giannandrea, who came to Metaweb after CTO roles at Tellme Networks and Netscape/AOL. “If I say ‘fiscal cliff,’ you know what I mean, and the reason is that you have a dictionary in your head about ideas. Computers don’t have that. That’s what we set about doing.”

The knowledge base Metaweb built is called Freebase, and it’s still in operation today. It’s a collaborative database—technically, a semantic graph—that grows through the contributions of …
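A semantic graph of this kind is often described in terms of (subject, predicate, object) triples. The sketch below illustrates that idea with invented facts and a pattern-matching query; Freebase’s real schema and its MQL query language are far richer:

```python
# Minimal semantic-graph sketch: facts as (subject, predicate, object)
# triples, with None acting as a wildcard in queries.
triples = [
    ("taj_mahal_monument", "type", "monument"),
    ("taj_mahal_monument", "located_in", "agra"),
    ("agra", "located_in", "uttar_pradesh"),
    ("taj_mahal_musician", "type", "musician"),
    ("taj_mahal_musician", "won", "grammy_award"),
]

def query(subject=None, predicate=None, obj=None):
    """Return all triples matching the pattern (None matches anything)."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

# Everything known about the monument:
print(query(subject="taj_mahal_monument"))
# Every entity typed as a musician:
print(query(predicate="type", obj="musician"))
```

Because relations are first-class data, the graph can answer questions — which Taj Mahal won a Grammy? — that no amount of keyword statistics over raw text can.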

Wade Roush is Chief Correspondent and Editor At Large at Xconomy. You can subscribe to his Google Group or e-mail him at wroush@xconomy.com.



  • Prasad

    To boldly go where no one has gone before… Google is doing it… without waiting for the 24th century…

    • gesster

      and to steal all of your data…

  • foutight

    Good guy google!

    • gesster

      you obviously don’t know a whole lot about google.

      i’d suggest you go to duckduckgo to see some of the evil that your precious google does

  • John Ellis

    Brilliantly researched article, thanks! I had forgotten how much Google had changed. Google’s ability to buy in change marks it as different. It is by far the safest way to advent Artificial Intelligence. When I started on the internet no-one trusted credit card transactions and there weren’t search engines. That’s only 13 years ago!

    I challenge the idea that memory and data storage is necessary: with enough calculation power and enough speed, you could generate any memory string to order in any context.

    That’s difficult philosophically but it’s true.

    So one possible future of google is calculation on vast scales, possible with super-recursive algorithms or quantum computing.

  • Sunil

    Great article. Google really has changed many things, but it is more comfortable for an ordinary person.

  • Jacob Varghese

    Nice article. Makes in-context content creation that much more important for future online health.

  • http://www.about.me/kennethtrueman Kenneth Trueman

    I was contributing to Freebase back in 2007. Thought it was pretty cool then. Went back and did some more just now. Thanks for the article that explains how this stuff is helpful.

  • rahulanand

    Way to go… KG is definitely the future :)

  • gesster

    ai probably isn’t even possible, how does google account for the chinese room? google is evil and terrible… run away, run fast