Lexalytics Digests Wikipedia, Sees Text Analysis Markets Broaden to Include Search, Travel, Law
You can learn a lot from Wikipedia, despite all its faults—and Jeff Catlin’s company has done just that.
Boston-based Lexalytics said today that its latest text-analysis software incorporates insights from combing through Wikipedia’s entire user-generated online encyclopedia for relationships between words, phrases, and their meanings. The company says its new software, which powers products used by big brands and other organizations to quantify the meaning and sentiment behind conversations on the Web, will be available this summer.
First, some business metrics before diving into the technology. Lexalytics is “quite profitable this year,” according to Catlin, the firm’s CEO. It saw 65 percent revenue growth last year and is continuing to grow in 2011 in a number of new markets, he says (more on that in a minute). The company currently has 18 employees in the U.S. and U.K. Catlin himself splits time between offices in Boston and Amherst, MA.
About a year ago, my colleague Wade profiled Lexalytics and its humble beginnings in 2003, when Catlin was running an engineering group at LightSpeed Software, a Woburn, MA-based content management startup. LightSpeed was consolidating and closing its East Coast operation, but Catlin convinced the firm’s investors to let him run his division as a separate company (which became Lexalytics).
Lexalytics has gone on to provide “sentiment analysis” technology for companies that help brands and organizations monitor and manage their reputations online, such as Cymfony and ScoutLabs, and for social-media firms like Bit.ly. Lexalytics recently landed Newton, MA-based TripAdvisor as a customer and partner; TripAdvisor (which is being spun out of Expedia as a separate public company) uses Lexalytics’ software to understand user sentiment, meaning what people like and don’t like, in their online reviews of hotels, restaurants, cruises, and other attractions.
But the opportunity for Lexalytics goes far beyond understanding sentiment in blogs, tweets, and other social media. As I see it, the technology is really about getting a computer to understand the meaning of sentences and the deeper relationships between words and phrases in documents. So it’s about classifying “wonderful day” as positive and “horrible disaster” as negative, sure, but it’s also about identifying names and acronyms; detecting sarcasm or hype amidst praise or insults; and being able to classify things like “cabin” as a type of room on a ship, “chicken tikka masala” as Indian food, “golf club” as having to do with outdoor recreation, and “Red Sox” as a (currently bad) baseball team.
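To make the two tasks in that paragraph concrete, here is a toy sketch in Python. It is emphatically not Lexalytics’ method, which the company has not detailed here; it assumes two tiny hand-written lookup tables (the names `SENTIMENT` and `CATEGORIES` and the functions `sentiment` and `categories` are invented for illustration), whereas a real system would learn such relationships from large text collections like Wikipedia.

```python
# Toy illustration only: hand-written lexicons, not Lexalytics' actual method.
import re

# Hypothetical sentiment lexicon: word -> polarity score.
SENTIMENT = {"wonderful": 1, "horrible": -1, "disaster": -1}

# Hypothetical category lexicon: phrase -> category label (from the examples above).
CATEGORIES = {
    "cabin": "room on a ship",
    "chicken tikka masala": "Indian food",
    "golf club": "outdoor recreation",
    "red sox": "baseball team",
}


def sentiment(text):
    """Sum the polarity of known words and map the total to a label."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(SENTIMENT.get(word, 0) for word in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def categories(text):
    """Return every known phrase found in the text, with its category."""
    lowered = text.lower()
    return {
        phrase: label
        for phrase, label in CATEGORIES.items()
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered)
    }


if __name__ == "__main__":
    print(sentiment("What a wonderful day"))
    print(categories("The Red Sox fan ordered chicken tikka masala in her cabin"))
```

A lookup table like this breaks down quickly: it cannot tell a ship’s cabin from a cabin in the woods, and it has no notion of sarcasm. Those gaps are the hard part of the problem.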
The technologies behind such “semantic” analysis of text—natural language processing, machine learning, and statistical modeling techniques—have been around for more than a decade. But they have continued to improve in recent years, enhanced in part by the availability of big, user-generated databases, like Wikipedia. And, crucially, the market for these technologies has started to open up. Which also opens the door for plenty of competition, of course.
For its part, Lexalytics is doing more in the field of Web search these days, Catlin says, working with companies like Endeca, the Cambridge, MA-based e-commerce search firm. Other emerging markets for Lexalytics’ software include electronic discovery and digital forensics for the legal industry, and electronic medical records in healthcare. “We’re broadening out across a lot of things [besides sentiment analysis],” Catlin says. “The new leads are extremely horizontal.”
That’s because “the way people speak and text is a generic problem,” Catlin says. And his company’s technology is fundamentally about understanding “who’s talking, to whom, and what are they talking about?”
But one longstanding problem with semantic analysis is that it’s hard to apply the same technology across different sectors (travel and law, say), where similar words can have widely different meanings and connotations. Perhaps Lexalytics’ effort with Wikipedia and IBM’s much larger project with Watson, the Jeopardy-playing machine, are both examples of a new, brute-force approach to solving the problem using more computational horsepower. (It worked for chess, after all.)
The question for Lexalytics is how long it can keep its competitive advantage, in terms of what its technology does best, over big players like IBM, Google, and Microsoft, as well as over other hungry startups vying for a piece of the text-analysis pie.
“We’re geeky engineer types. You can’t ‘protect’ your [intellectual property] per se, so you do really good work and you try to build a mousetrap and stay ahead of the game,” Catlin says. “The truth is, what IBM built with Watson took them a long, long time. We’re going to put out something that will enable people to build a good chunk of what they did. But it’s not easy stuff.”
Indeed, Catlin is talking about solving a very deep problem—and one that could pay off in unexpected ways down the road. “If you can take verbal questions or written words and understand what they’re asking,” he says, “then the applications are as wide as your brain.”