Weaving Words with Wordle: A Talk with IBM’s Jonathan Feinberg

3/16/09

Xconomy chose to set up shop in Kendall Square because we wanted to be at the epicenter of investment and innovation in the Boston area. But it was just luck that we ended up right across the street from IBM’s Cambridge research facility at 1 Rogers Street, which is home to both the Collaborative User Experience (CUE) Research group and the new Center for Social Software. The creative software engineers, user interface designers, and visualization researchers in the group have come up with a whole gallery of fascinating tools for studying the way business people (and average Web surfers) interact with data, including Many Eyes, a community portal for visualization experiments.

But one of the Web-based applications included in Many Eyes has taken on a surprising life of its own outside IBM. It’s a free visualization tool called Wordle, where users can dump a bunch of text into a window and then see it automatically yet artfully arranged into a cloud of words, with the size of each word corresponding to its frequency in the original passage. As the Wordle site puts it, it’s “a toy for generating ‘word clouds’ from text that you provide.”

These aren’t your father’s tag clouds: they’re playful, colorful, and made from attractive fonts. And they fill up the space around their center points in a clever way that often seems to say something profound about the nature of the ideas in the text. In fact, Wordle achieved minor fame during the 2008 presidential campaign, when several newspapers and other media outlets used the tool to analyze publications and speeches by the major candidates. It’s also found its way into popular culture: a Google image search for Wordle diagrams turns up more than 200,000 examples. (The Wordle word cloud on this page is based on the text of this article, and the one on page 2 is based on Lincoln’s Gettysburg Address.)

I dropped by the CUE Research group’s offices several weeks ago for a conversation with Wordle’s creator and shepherd, Jonathan Feinberg. Though Feinberg is a senior software engineer at CUE, he developed Wordle as a personal project, and maintains it on a server outside of IBM. Martin Wattenberg, a mathematician who leads the CUE Research group’s Visual Communication Lab, was also on hand for the talk. Below is an abridged transcription.

Xconomy: Where did Wordle come from?

Jonathan Feinberg: It came from a Lotus project I worked on called Dogear. It was a social bookmarking engine. In every piece of social software there’s a tag cloud, and I have implemented some, but I thought they were ugly and boring. I was prompted by Dogear to think about the ways you could fit words together on a page. The core bit of code I wrote was a Java applet within Dogear called TagExplorer, but when Dogear came out as a product, TagExplorer was not part of it, because Java applets are considered too slow to load. Then two years later, while clearing out one of my workspaces, I stumbled onto the code and thought I’d like to do something with it. That’s when I built Wordle. So, the sand that formed the pearl was a tag cloud, but it’s really divorced from that idea now.

There was no way to put it out there as a branded IBM thing. I didn’t even begin working on the real Web application for Wordle until I had gotten permission to use the code externally. The lawyers and the product managers didn’t want this code for a product, and I got permission to use it non-commercially, which served me well, because I don’t like doing business. So Wordle the Web app belongs to me personally, not ManyEyes.

Martin Wattenberg: But ManyEyes has Wordle in it, and it’s become one of the most popular visualization types on ManyEyes.

X: What’s really wrong with conventional tag clouds?

MW: If you look at how the academic community has viewed things like tag clouds, it’s been with a certain amount of suspicion and skepticism about what they convey. And that skepticism has been backed up by studies. If you are going to try to convey a list of word frequencies and measure what people remember about a text, it’s not clear that tag clouds give you any benefit over, say, a list of words ordered by frequency.

That said, when Wordle went up, Jon positioned it as a toy. To me there is a fine line between a toy and an experiment. We’ve had thousands of people from all over the Web experimenting with Wordle and finding all sorts of things to do with it. There is a blog on the top 25 uses for Wordle. For me that is an indication that there is a lot more to it than just being a toy. There are teachers who have found that it’s extremely useful as part of their teaching. And maybe if you are analyzing a text very seriously, a Wordle diagram has no advantage over a list of word frequencies. But if you are trying to convey something emotionally, Wordle has a huge advantage.

There is this analogy with cameras. A Wordle is almost like a Polaroid camera for text. It lets people take snapshots that mean something to them. We’ve found more than one blog where guys talk about getting “boyfriend points” by making Wordles out of their love letters to their girlfriends. All of these things point to really interesting new uses of visualization.

JF: But how am I ever going to get back all of the husband points I lost by spending all that time making Wordle?

MW: One of the interesting things we’ve seen is that a bunch of fairly high profile media outlets like the Boston Globe and the Washington Post have used Wordles to illustrate articles about speeches. That indicates that it is in some sense more than a toy. People who are professional political analysts find this an interesting way to illustrate their points. Jon looked at tag clouds, thought about what was wrong with them, and created this fantastic alternative. And by putting it on the Web, we’ve seen ways that it’s useful that I don’t think we would have invented ourselves or trusted. It’s a little counterintuitive, and it’s certainly contrary to the usual paradigm of interface design, where you have a task in mind and an audience in mind before you start.

JF: Calling it a toy was a defensive measure. I don’t want to make any claims for Wordle as a visualization tool that gets you an accurate idea of your text. People write to me and say ‘Wow, I made a Wordle and now I see I wasn’t writing about what I thought I was writing about.’ But I feel very resistant to those kinds of analyses.

MW: That raises a research question about what people are getting out of this way of visualizing text and what they think they’re getting. Wordle really is the starting point of an important line of research. We are looking at the way it’s being used, and are in the midst of writing our first paper on it. By putting it out in public without making any strong claims about it, we get to see how people use it. And this fits very well with the goals of the Social Software Center, which is a living laboratory where we put things in public and see how they’re used.

X: But you just said Wordle is maintained outside of the center, by Jon himself.

JF: Yes, but the center is creating a venue inside the company for projects like Wordle. Wordle just predates that effort. I am one of the point people on trying to set up a legal, policy, and technology infrastructure for making things like this happen inside IBM.

X: Were there any big technical hurdles to creating Wordle?

JF: The vast majority of the work was in creating a good user experience. The core layout algorithm is basically the same as when I first made it. In a normal tag cloud, you’re lining up words from the top left to the bottom right. You take up as much vertical space as the words need. You get to the end of the line and you do a line break. I wanted to see how much of the space I could fill in. The basic technique came from Martin, who has used it in many visualizations and artworks. It’s a randomized algorithm where you throw stuff on the screen and if it’s overlapping with something else, you move it around a bit. I had to add a lot more computer science to it to make it fast enough. If you do the purely random algorithm thing, it will take five minutes, and in Wordle it takes a few seconds. It treats the words as graphics. Internally they become shapes, and I manipulate those shapes.

The concurrency is extremely complicated. Managing things that are happening at the same time, without creating inconsistent states in memory, took a lot of thought. Coming up with and exposing different layout options, such as whether words are vertical or horizontal, took a ton of work.
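For readers who want to see the idea in code, here is a minimal Python sketch of the kind of randomized placement Feinberg describes: size each word by its frequency, drop it at a random spot near the center, and move it a bit at a time until it no longer overlaps anything already placed. This is not Wordle's code. The function names (word_sizes, overlaps, layout) and all parameters are hypothetical, words are approximated by plain rectangles rather than real glyph shapes, and the spiral used for "moving it around a bit" is one assumed way of doing that.

```python
import math
import random
import re
from collections import Counter


def word_sizes(text, max_words=50, min_size=10.0, max_size=60.0):
    """Map the most frequent words to font sizes that grow with their counts."""
    counts = Counter(re.findall(r"[a-z']+", text.lower())).most_common(max_words)
    if not counts:
        return []
    top = counts[0][1]  # count of the most frequent word
    return [(w, min_size + (max_size - min_size) * c / top) for w, c in counts]


def overlaps(a, b):
    """True if two axis-aligned boxes (x, y, width, height) intersect."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah


def layout(sized_words, seed=0, step=2.0, max_steps=20000):
    """Place words largest first; nudge each along a spiral until it is clear."""
    rng = random.Random(seed)
    placed = []  # list of (word, (x, y, width, height))
    for word, size in sorted(sized_words, key=lambda ws: -ws[1]):
        # Crude bounding box: assumes each glyph is about 0.6 * size wide.
        w, h = 0.6 * size * len(word), size
        # Random starting point near the center of the canvas.
        x0, y0 = rng.uniform(-50, 50), rng.uniform(-20, 20)
        for i in range(max_steps):
            angle = 0.1 * i
            radius = step * angle / (2 * math.pi)  # spiral widens each turn
            box = (x0 + radius * math.cos(angle) - w / 2,
                   y0 + radius * math.sin(angle) - h / 2, w, h)
            if not any(overlaps(box, other) for _, other in placed):
                placed.append((word, box))
                break
        else:
            raise RuntimeError(f"could not place {word!r}")
    return placed


if __name__ == "__main__":
    sample = ("four score and seven years ago our fathers brought forth "
              "a new nation a nation conceived in liberty")
    for word, (x, y, w, h) in layout(word_sizes(sample)):
        print(f"{word:10s} x={x:7.1f} y={y:7.1f} size={h:4.1f}")
```

The sketch checks every new box against every placed box, which is the slow, naive version of the approach. Feinberg says he needed "a lot more computer science" to get from minutes to seconds, and that Wordle works on the words' actual shapes rather than rectangles, so treat this only as an illustration of the basic idea.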

X: You’re talking about the way the user controls the appearance of a Wordle?

JF: A user has influence but not control. There are a couple of different settings. One is the orientation of the words: horizontal, vertical, or some combination. There is the question of where the words want to be on the screen; two different choices are exposed in the Web app, one where they’re randomly distributed across the center line and the other where they prefer to be in alphabetical order. You have control over the font and the palette, although you don’t have control over how the colors get assigned to words.

X: Who owns the copyright on a finished Wordle? The text probably belongs to the person using the software, but many of the design decisions are built into the software. So the program seems to raise an interesting question about ownership.

JF: There is a little boilerplate language about how you can embed a link to Wordle in your blog. Aside from that you have to either take a screen shot, or print to PDF. You can distribute that however you like. The license says you can take any image you make and do anything you like, as long as you give attribution to the website Wordle.net.

MW: On ManyEyes, one of the ways we handle this is to have you agree to the terms of service when you sign up, and assert that you have the right to reproduce whatever data you are putting up. But yeah, it’s one of the interesting aspects of this new kind of transformation that can take place on the Web. You are mixing authorship.

X: What if you make a Wordle using someone else’s words? What then?

JF: I think you’d be hard pressed to find a lawyer who would assert that a Wordle is an infringement. It’s just a list of words. It’s fair use.

MW: What is amazing to me is the range of uses people are putting it to. You can do a lab study of something like Wordle and get a result where you don’t see any benefit, but you can’t extrapolate from that to the field. It shows how important it is to do this sort of experiment in the world at scale. The world is very creative, and it will figure out things to do with the technology that would never come up at all from just looking at a visualization widget. And you have to embrace that.

A good example of this came from the Boston Globe. They did a beautiful comparison of the Obama and McCain campaign websites, and what they discovered was that McCain was mentioning Obama a whole lot, and Obama was not mentioning McCain very much at all, and that showed up loud and clear in the two Wordles. It was like looking at a photo. It immediately got that point across and made you want to look further.

JF: One of the ways that a Wordle functions is that you get not only a picture of the relative frequency of words but you can get happy random juxtapositions of words that are conducive to associative thinking. It’s generating ideas about something that otherwise wouldn’t have occurred to you. It’s like a data toy. I don’t think that it is fair to call it a “mere” toy, because play is a very powerful thing. In calling it a toy, I’m not saying it’s not something valuable and useful.

X: I wonder what a professor of literature or a rhetorician might have to say about Wordle, and the way it takes texts and mixes up their meanings and suggests new ones that the original author may never have intended.

MW: You can argue that chopping up the text into individual words is an assault on the text. In some sense, there is a natural controversy there. But focusing on what a rhetorician would say about this may not be the right focus. Trying to figure out why it’s so appealing has to do with other things. If you hear a speech read aloud, it’s read with expression; there is a cadence to every expression that is unique. Wordle brings back some of that unique cadence on the page. It’s almost a return to the richness of the spoken word.

JF: You’re going much further than I’d go.

MW: Well, this is just me speaking and speculating. This is an active area of research for us. And this general line of investigation, into how you illustrate text online, is central to what the Visual Communication Lab is doing right now. The traditional visualization question is how do you see a terabyte database all at once? How do you see your customer transactions all at once? The future question may be, how do you see a million blogs at once, or how do you see every book ever written all at once? Those are very important questions for visualization.

X: Do you plan to add any features to Wordle, or eventually come out with Wordle 2.0?

JF: I fix bugs, but I haven’t added any features for a very long time. If there is something interesting and important to do, I’ll do it, sure. But it’s so useful and so deeply enjoyed by so many people right now, why break it? I’ve said no to many features—not just in the interest of protecting my own time, but also in the interest of protecting how easy it is to use for newcomers.

Wade Roush is a contributing editor at Xconomy.
