The Encyclopedia of Life: Can You Build A Wikipedia for Biology Without the Weirdos, Windbags, and Whoppers?
After 16 months in business, Xconomy has published about 3,400 pages of articles. At this pace, we’ll get to 1.8 million pages in about 700 years. But the Encyclopedia of Life—a new scientific and educational website that will have one page for every species on the planet—intends to hit that number in just 10 years. And even then, it will only be getting started: while biologists have named, described, and catalogued some 1.8 million critters, they estimate that another 8 million species of plants, animals, fungi, bacteria, protists, viruses, and archaea remain undiscovered.
That’s a seriously big website. We’re talking Wikipedia big. (The famous free online encyclopedia, launched in 2001, has 2.6 million articles in English alone, and over 10 million all told.) Which means the organizers of the Encyclopedia of Life (EOL for short) are going to have to throw out the old playbook in taxonomy (the slow and meticulous science of species classification, born 250 years ago this year with the publication of the 10th edition of Carl Linnaeus’ Systema Naturae) and turn to the techniques of Web 2.0.
Specifically, they’re going to have to rely on thousands of amateur naturalists to collect and submit data for the encyclopedia. But that creates a fascinating problem: How do you partake of the revolution in “user-generated content,” as Wikipedia has done, while keeping the material you publish wholly factual and stable—as it ought to be if the Encyclopedia of Life is to be a useful resource for scientists, students, and policy makers, and as Wikipedia manifestly is not? I’ve been reading up on the EOL project this week, and as far as I can tell, the organizers haven’t yet worked out a thorough answer to that question.
Of course, not all of the material in EOL will be user-generated. A big part of the concept for the encyclopedia, a $40 million project funded by the John D. and Catherine T. MacArthur Foundation and the Alfred P. Sloan Foundation, is that it will be a classic Web 2.0 “mashup.” You’re probably familiar with RSS news readers, which assemble headlines and stories from hundreds of separate websites; in a similar way, EOL will use Web-based aggregation technology under development at the Marine Biological Laboratory in Woods Hole, MA, to suck in and recompile information from existing online species databases, such as the uBio NameBank, iSpecies, FishBase, AmphibiaWeb, and North American Mammals.
Another big part of EOL involves digitizing millions of print books and journal articles in 10 of the world’s leading natural history libraries, including the Harvard University Botany Libraries and the Ernst Mayr Library of the Museum of Comparative Zoology here in Cambridge. The hope is that once this information has been scanned, run through optical character recognition software, and automatically tagged with the appropriate metadata, readers will be able to reach relevant passages of the scientific literature directly from the corresponding species pages in EOL. Say you’re researching Nicrophorus americanus, the American Burying Beetle, a colorful but critically endangered species for which there’s already a very nice page at EOL. The encyclopedia may lead you to, among other resources, a detailed description published in the Proceedings of the Academy of Natural Sciences of Philadelphia in 1853.
The problem is that scanning, classifying, editing, and mashing together all of that material is going to take years, especially given that it’s all being done on the cheap (EOL and its digitization partner, the Biodiversity Heritage Library, have nothing like the amount of money Google is spending on its Book Search project). But “EOL must show some results and value quickly” if it is to be taken seriously by scientists, funders, and the public at large, as the project’s own planning documents acknowledge.
It’s hard to see how the current plan, spelled out in the planning documents and the project’s FAQ page, will accomplish that. Each species page is to have a volunteer “curator,” a competent scientist responsible for authenticating information submitted by contributors before it’s published. Unfortunately, the world population of trained taxonomists is only about 6,000, according to E. O. Wilson, the famed Harvard biologist who conceived EOL and is the project’s honorary chairman. So if you left the curating to the real experts, they’d each have …