The Web’s Last Word on Words

10/26/11

Wordnik (wərd-nik) noun. 1. A person associated with or characterized by a love of words, word usage, linguistics, or lexicography. 2. A startup company headquartered in San Mateo, California. Origin: word + -nik (Yiddish, from Russian). Cf. beatnik, peacenik.

On the Internet, it’s hard to get people to slow down enough to use actual words, rather than just acronyms (LOL, IMHO, WTF). Which makes it all the more unusual to see someone like Erin McKean, the former editor-in-chief of the New Oxford American Dictionary, founding a Silicon Valley startup. Lexicographers are a pretty rare breed to begin with (“You could fit us all in a 737 and still have room left over for copy editors,” McKean jokes), but you’re even less likely to find them patrolling the Web, where crass and careless language has long since overpowered concision and eloquence.

But maybe that’s exactly why the Web needs Wordnik. McKean’s company is dedicated to two interrelated ideas about language. The first is that the best way to help people understand words is to show how they’re used in context, with the real-time Web and social media—everything from Wiktionary to Twitter—comprising a big part of that context. The second is that once you’ve built the technology needed to maintain a living, online repository of words in context—a “dictionary on steroids,” as the startup puts it—you’re also in a position to process, enhance, and illuminate text of all kinds.

So if you go to Wordnik as a reader, you’ll find the world’s largest collection of English words and usage examples—6.7 million and counting, more than six times the size of the Oxford English Dictionary. But if you go there as a publisher or a software developer, you’ll find a growing set of tools for making your own content more interesting and actionable. Barnes & Noble, for example, uses Wordnik’s database to power the Word of the Day app on the Nook Color, its e-reader. Blekko, the geeky yet fashionable new search engine, calls on Wordnik’s database for a definition every time a user does a search using the “/define” slashtag.

Erin McKean, Wordnik Founder

When outside Web services like Blekko or the Nook connect to Wordnik, they tap into the same nexus of words and word associations—McKean calls it the Word Graph—that powers Wordnik’s own online dictionary. Like the Web crawlers at Google, Wordnik’s software is constantly assimilating newly published content from trusted sources and adding it to the Word Graph. (The graph sucks in new words at the rate of 8,000 per second, according to Wordnik’s technical co-founder Tony Tam.) This allows the company to surface word relationships and examples that you’d never find through a traditional dictionary lookup. “The domain of lexicography has traditionally been a very manual one,” says McKean. “But taking the same process and adding computational heft means you can do it for a lot more words, and build a really rich graph”—a kind of social network diagram or neural network of words.

Silicon Valley investors are betting big on Wordnik’s ability to change the way people interact with words. In July the company collected $8 million in Series C funding, with Lucas Venture Group, Mohr Davidow Ventures, Floodgate, Baseline Ventures, and individual investor Roger McNamee contributing; the round brought the startup’s total venture funding to $12.8 million. To go along with all that new money, the company brought in some veteran Silicon Valley management help: former Gaia Interactive sales chief Joe Hyrkin was appointed as CEO.

It’s all evidence that after more than three years of research and development work on the Word Graph, Wordnik is shifting into business-development mode and scouting for partners who’ll pay for access to the company’s reference services. “We are now in the process of taking this set of services and rolling it out to major publishers and building a business,” says Hyrkin. “A lot of the under-the-hood stuff, over the next year, is going to become more apparent.”

To which McKean adds, in a characteristic bit of word play: “It will be over the hood—mounted right on the hood.”

One of the founding precepts at Wordnik was that language now evolves far faster than traditional dictionaries can keep up with. Thanks in part to the Web itself, words can pop up out of nowhere, experience meteoric careers, and flame out just as quickly. Think of “birther,” which used to be slang for “heterosexual” but now connotes someone who questions whether President Obama was born in the United States, or “sheenius,” coined by some anonymous Internet punster this spring to distill Charlie Sheen’s unique brilliance at turning an attention-getting personal meltdown into a career-advancing move. By constantly scouring the Web for fresh text and feeding it into the Word Graph, Wordnik can detect such novel words and meanings well before human lexicographers have a chance to get a grip on them.

But to make sense of it all, the company had to come up with some new technology. “When I joined the company three years ago, what Erin was out to solve was figuring out what things mean automatically, from the relationships in the text,” says Tam. “Almost out of necessity, we came up with this idea of a word graph that relates all of these ideas with other ideas. In this graph we have tens of millions of words and on the order of 50 million relationships.”

It’s raining words: The Wordnik front page displays a cascade of recently searched words.

The graph knows, for example, that “sheenius” can mean “genius,” but that it can also mean “crazy person.” Inside the Word Graph, even simple words like “run” possess a radiating web of probabilistic links to various concepts—running down a street, running out of money, a run on a bank. And the webs are always growing, fed by sources both highbrow and low, such as NPR, Simon & Schuster, Forbes, CNN, the UK’s Guardian newspaper, and Twitter.

“The words go in as fast as we can find them,” says McKean. She calls the graph’s job “supernatural language processing,” a pun on the computer-science discipline of natural language processing.

It’s all fodder for what is perhaps the Web’s most interesting dictionary. When you look up a word at Wordnik.com, you get definitions from sources like WordNet, the Century Dictionary and Cyclopedia, Wiktionary, and the American Heritage Dictionary; etymologies; examples from blogs, newspapers, and magazines; hypernyms (more generic or abstract terms); words found in similar contexts; a reverse dictionary (words that contain your word in their definition); tweets; related images from Flickr; audio pronunciations; lookup statistics; and even tags, comments, and lists generated by Wordnik’s community of 75,000 registered members. It’s all enough to make a word maven swoon.

But what’s the practical import of it all? “We are, right now, in the midst of answering that question,” says Hyrkin. “You can use these definitions and meanings to drive a whole range of activities.”

One early user of the Word Graph is TaskRabbit, the San Francisco-based site where users can farm out small jobs, such as picking up groceries, to vetted errand runners. The two startups have a pair of investors in common (Floodgate and Baseline), but Tam says the partnership actually came about after he ran into TaskRabbit’s vice president of engineering, Brian Leonard, at their kids’ playground. “Brian said, ‘You guys know a lot about words. Well, I have a word problem,’” says Tam.

The problem was that people who post tasks on TaskRabbit aren’t always very good at categorizing them, which makes it harder for errand runners to find suitable tasks on the site. “In TaskRabbit’s case, child care and baby-sitting and day care mean roughly the same thing, so we thought if we ran our classification algorithms against their text, we could do a good job of matching and categorizing,” says Tam.

Using Wordnik’s existing application programming interface, or API, the two companies built the auto-categorization system in a matter of weeks, he says. You can see the technology in action today at the bottom of any TaskRabbit task page, where there’s a section for “Tasks similar to this one.” Using a similar matching system, Wordnik can help TaskRabbit users decide how much money to offer for a given task, by finding examples of similar tasks in TaskRabbit’s database.

There are many companies like TaskRabbit where the words aren’t the main point, but where more efficient communication could make a big difference, Hyrkin and McKean argue. “Companies shouldn’t have to think about hiring an editor,” McKean says. “They should be able to use the content they’ve already got and get more out of it.”

Wordnik makes its API available to third-party developers, and there’s a growing set of Web and mobile apps that tap into the Word Graph, such as Panabee, a “brainstorm engine” that helps startups find available company names and Web domains. Many of these apps hit the market without any direct involvement from Wordnik, McKean says. “The best thing is when we find out about these apps after they’ve launched, and they’ve done well,” she says. Of course, developers get only a limited number of free API calls; once they pass that threshold, they need to contact Wordnik to discuss licensing terms.

Hyrkin and McKean say they’re working on a number of bigger deals with commerce and publishing partners, but they won’t be able to share details until the deals are inked, likely in 2012. “The larger they are, the less you want to talk about them until it’s done,” McKean says.

In a sense, Wordnik wants to grow into the BASF of digital text: it doesn’t make a lot of the words you use, but it wants to make the words you use more interesting. The ultimate sign of success? Seeing Wordnik itself turn up in dictionaries, and as more than just a noun. Says McKean, “We would like to be a verb: ‘Wordnik that.’”

Wade Roush is a contributing editor at Xconomy.