Coalition of Boston Libraries Chooses the Un-Google Route to Digitization

9/28/07

If there’s one thing New England has in great supply, it’s books. And that makes the area one of the battlegrounds in the digital library wars—the competition between commercial entities such as Google and Microsoft and non-profit groups such as the Internet Archive to secure agreements to scan, digitize, and distribute the world’s print literature. This week a large subset of the region’s libraries threw their weight behind the non-profit camp, signaling the growing resistance to Google’s professed goal of organizing “all the world’s information.”

The Boston Library Consortium, a group of 19 state, college, institutional, and university libraries around New England, announced Monday that it has chosen the Open Content Alliance (OCA) as the main technology partner in a massive effort to digitize public-domain books and other materials (that is, those not subject to copyright or no longer covered by copyright). The project—which will result in “a freely accessible library of digital materials from all 19 member institutions,” according to a statement from the consortium—will be based at a new facility at the Boston Public Library, built around automated book-scanning machines developed by the San Francisco-based Internet Archive.

Local library leaders greeted the announcement as a victory for taxpayers. “The Boston Library Consortium includes some pretty significant publicly funded institutions—the Boston Public Library, for instance, as well as the State Library of Massachusetts and the flagship state libraries of Connecticut and New Hampshire,” notes Ann Wolpert, director of libraries at MIT, which is part of the consortium. “These libraries, which are funded by taxpayers, really have a public purpose, and I think that for them, the argument of joining the OCA rather than going with one of the more commercial partners was simply a matter of being true to their funding base. Rather than put their work in a proprietary environment, searchable only by one search engine, [the consortium] felt strongly that it would be important to make sure that whatever they digitize should be available to taxpayers without constraint.” Libraries at the private colleges and universities belonging to the consortium, including MIT, Brandeis, Tufts, Brown, and Williams, “were happy to honor that point of view,” says Wolpert.

At stake for every library considering how to digitize its collections is the question of how to help readers find the material once it’s online. Google is credited by many members of the library community with kick-starting the entire digitization movement in 2004 with its massive Google Print project (now known as Google Book Search). But the same librarians have shied away from working with the search giant, largely due to the company’s requirement, mentioned by Wolpert, that books digitized via its factory-scale scanning effort be searchable exclusively through the Google search engine. Microsoft, which has its own book digitization project, Live Search Books, imposes similar terms.

Yahoo and the Internet Archive, a non-profit created by software entrepreneur and Alexa founder Brewster Kahle, launched the OCA in 2005 as a non-commercial alternative to the Google project. The alliance’s goal is to build a permanent archive of digitized text and multimedia content from collections around the world, using open standards for the “metadata” such as author and title information that makes these materials more discoverable and searchable. With funding from the Alfred P. Sloan Foundation, the OCA has already digitized more than 100,000 volumes from private, national, and university collections, including the personal library of John Adams, held by the Boston Public Library. The materials are indexed by the OCA members’ own search engines and by the Internet Archive.

“Going with OCA was a very concerted effort by the Consortium, and the primary reason is for the material to remain free and open,” says Barbara Preece, the consortium’s executive director. “People weren’t happy with the restrictions that Google and Microsoft put on—that you can only search from the single search engine. These groups are not interested in content. They are interested in search and advertising. And the fear is that this content would be restricted only to people who can afford to get it.”

One local institution conspicuously absent from the Boston Library Consortium and from the agreement with the OCA is the Harvard University Library, which, with its 15.8 million volumes, is the largest academic library in the world. Harvard was among the five institutions partnering with Google when it launched Google Print, and still has an agreement with Google to digitize the public-domain books in its collection.

But the same local library leaders who advocate the OCA’s non-commercial, non-exclusive approach to book digitization stop short of criticizing Harvard for sticking with Google. In fact, some welcome the presence of competing philosophies as libraries go through the digital transition. “It’s a very important time to have lots of experiments running, to test different notions of how to do industrial-scale digitizing,” says MIT’s Wolpert. “And so my point of view on this is the more the merrier, because we’ll learn from every one of these undertakings. Frankly, I think that Google gets enormous credit for having gotten the ball rolling.”

Library professionals in other regions are applauding the Boston consortium’s move. “Given the dispersal of print materials across the North American landscape, it’s good news anytime you get a cluster of institutions coming together” on open digitization efforts, says Nancy Elkington, director of partner relations at RLG Programs, the R&D wing of the Online Computer Library Center (OCLC), a 40-year-old library cooperative that maintains the WorldCat global union catalog.

Wade Roush is a contributing editor at Xconomy.