

Google Book Search Adds Copyright Renewal Data


Google Book Search recently did a great service for those interested in the public domain by compiling a huge amount of copyright renewal data, for books dating as far back as 1923, into a single downloadable file. From the Inside Google Book Search blog:

How do you find out whether a book was renewed? You have to check the U.S. Copyright Office records. Records from 1978 onward are online (see http://www.copyright.gov/records) but not downloadable in bulk. The Copyright Office hasn’t digitized their earlier records, but Carnegie Mellon scanned them as part of their Universal Library Project, and the tireless folks at Project Gutenberg and the Distributed Proofreaders painstakingly corrected the OCR.

Thanks to the efforts of Google software engineer Jarkko Hietaniemi, we’ve gathered the records from both sources, massaged them a bit for easier parsing, and combined them into a single XML file available for download here.

This allows for a much clearer (though still imperfect) understanding of which books have kept their copyright and which have entered the public domain. Jakob Kramer-Duffield speaks to the implications of Google's effort, pointing out that "there's a danger […] that our great knowledge resources from the past are ignored or left to molder, and the difficulty of determining copyright status has been something of a hurdle to digitization efforts thusfar." Peter Suber puts it more succinctly: "I love the way we can now use free information to free information."

Posted 27 June 2008
