Google's Book Digitization Program

In the September issue of D-Lib Magazine, Lorcan Dempsey, OCLC's Vice President, Research and Chief Strategist, and others provide a preliminary analysis of the Google Print Library Project (GPLP), Google's effort to digitize the collections of several major libraries. The article takes the "Google 5" -- the libraries partnering with Google in the digitization of their collections -- and looks at the characteristics of those collections when treated as a single unit. The analysis is set in the context of WorldCat, which, at 32 million records, is the largest "system-wide" collection of books available. Among the results:

  • The proportion of the system-wide collection covered by GPLP, once duplicate holdings across the five institutions are removed, is about 33 percent, or 10.5 million unique books out of the 32 million in the system-wide collection.
  • The pattern of cross-collection overlap implies that if each collection were fully digitized, about four out of every ten books would be re-digitized at least once. In other words, the GPLP reflects a minimum redundancy rate of about 40 percent.
  • Only 3 percent of the books in the 10.5 million GPLP collection are held by all five libraries.
  • More than 80 percent of the materials in the Google 5 collections are still in copyright.
  • The resource created by the GPLP may be far more culturally diverse than originally anticipated, given that more than 430 languages were identified and that English-language materials make up slightly less than half of the books in the Google 5 combined collection.
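
For readers who want to see how figures like these fall out of overlapping holdings, here is a minimal Python sketch. It is not the method used in the D-Lib article: the function name `overlap_stats`, the toy title IDs, and the reading of "redundancy" as the share of unique titles held by two or more libraries are all assumptions made for illustration.

```python
from collections import Counter


def overlap_stats(collections, system_wide_total):
    """Summarize overlap across several library collections.

    collections: iterable of sets of title identifiers (hypothetical IDs).
    system_wide_total: size of the reference collection (e.g. WorldCat).
    """
    collections = [set(c) for c in collections]
    # Count how many libraries hold each distinct title.
    holders = Counter()
    for holdings in collections:
        holders.update(holdings)

    unique = len(holders)
    if unique == 0:
        raise ValueError("no titles in any collection")

    held_by_many = sum(1 for n in holders.values() if n >= 2)
    held_by_all = sum(1 for n in holders.values() if n == len(collections))

    return {
        "unique_titles": unique,
        # Share of the system-wide collection covered after de-duplication.
        "coverage": unique / system_wide_total,
        # Assumed definition: unique titles held by two or more libraries.
        "redundancy": held_by_many / unique,
        "held_by_all": held_by_all / unique,
    }


# Toy data only: five tiny "libraries" and a made-up system-wide total of 32.
toy = [
    {"a", "b", "c"},
    {"a", "d"},
    {"a", "b", "e"},
    {"a", "f"},
    {"a", "g", "h"},
]
stats = overlap_stats(toy, system_wide_total=32)
print(stats)
```

With the article's real numbers, the coverage line is simply 10.5 million divided by 32 million, which rounds to the 33 percent quoted above.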

(Via OCLC Abstracts)

Posted by Tom on September 20, 2005