Pilot eBook Digitization Project with Google
(Via Slashdot) According to an e-mail sent today to Harvard students, Google will collaborate with Harvard's libraries on a pilot project to digitize a substantial number of the 15 million volumes held in the University's extensive library system, which is second only to the Library of Congress in the number of volumes it contains. In related agreements, Google will launch similar projects with Oxford, Stanford, the University of Michigan, and the New York Public Library. Digitized books will be accessible through the Google Print interface.
A FAQ detailing the Harvard pilot program says:
For the Harvard community, we hope to be able to integrate the Harvard and Google systems. Harvard users would be able to use Google as one more way to search for Harvard library content. The full text of public-domain works in the collection would be available by way of the Google digital copy and would be accessible through both Google and the Harvard online HOLLIS Catalog. We would also hope to provide Harvard users with selected information (such as snippets of text or tables of contents) to aid in determining the relevance of a work.
I was discussing this with one of my colleagues, who wondered why exactly any university would want to do this. From a librarian's point of view, the lack of relevance ranking on most Google searches greatly reduces their value for scholars. The fact that students don't usually distinguish between scholarly and non-scholarly search results makes it all the worse.
My response was that if you're doing a generic Google search, you probably don't care that Harvard has a copy of a book on your topic. However, if you're on the Harvard website (especially in the library's domain), you probably DO care. I imagine that library catalog holdings will rank low enough to sink to the bottom of generic search results but, when combined with site searching, will appropriately appear alongside links to university web pages.
The Harvard FAQ also notes that "For users outside of Harvard, the larger project would make accessible the full text of a large number of public-domain books." I'm more skeptical that this could work, since Google ranking, as noted above, will probably send these results to the basement for off-campus users. I guess we'll see how this experiment plays out; I'm glad they're taking a chance.
Addendum: Stories about this ran in the New York Times, Chronicle of Higher Education, and SearchEngineWatch.
Posted by Tom on December 14, 2004