As reported a few weeks ago, OCLC has recommended that its member libraries adopt the Open Data Commons Attribution license (ODC-BY) when they share their library catalog data online. The recommendation to use an open license like ODC-BY is a positive step forward for OCLC because it helps communicate in advance the rights and responsibilities of potential users of bibliographic metadata from library catalogs. But OCLC’s decision to recommend the licensing route, as opposed to releasing bibliographic metadata into the public domain, raises concerns that warrant more discussion.
OCLC says that making library data derived from WorldCat available under an open license like ODC-BY complies with their community norms. There are other options, however, that are equally compliant. Harvard Library, for example, developed an agreement with OCLC earlier this year that makes its metadata available under the CC0 Public Domain Dedication. This means that Harvard relinquishes all its copyright and related rights to that data, thereby enabling the widest variety of downstream reuse. Even though it puts this information into the public domain, Harvard requests that users provide attribution to the source as a best practice without making attribution a legally binding requirement through a license.
There are good reasons for relying on community norms for metadata attribution instead of requiring it as a condition of a licensing agreement. The requirement to provide attribution through a contract like ODC-BY is not well-suited to a world where data are combined and remixed from multiple sources and under a variety of licenses and other use restrictions. For example, the library community is experimenting with new technologies like linked data as a means of getting more value from its decades-long collective investment in cataloging data. And we’re happy to see that OCLC has released a million WorldCat records containing 80 million linked data triples in RDF. However, we believe that requiring attribution as a licensing condition introduces complexity that will make it technically difficult — if not impossible — for users to comply.
Then there is the question of how to properly attach attribution information to a discrete bit of data (e.g. a single field, subfield, or triple). OCLC has helpfully provided guidelines around attribution for its linked data, but how would these work for member libraries that follow OCLC’s recommendation to adopt the ODC-BY license when they publish their own data? Library linked data collections are often derived from small subsets of many large collections and recombined with new relationships, potentially requiring separate attribution for each data element. In the case of OCLC’s data release, imagine that a user downloads the OCLC file containing 80 million linked data triples, extracts the ones she’s interested in, and then links them to her own catalog data to create a new linked dataset. The guidelines for the WorldCat data include the option of considering a WorldCat URI to be sufficient attribution, but how would that work for the library’s own bibliographic data or for additional data drawn from non-OCLC sources? The guidelines do not include recommendations for how libraries should implement their own data in such a way that reusers can comply with the attribution requirements imposed by the ODC-BY license. The community norms and best practices for reusing library linked data are not yet well defined, so relying on them in the context of a legally binding license is troubling.
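To make the bookkeeping problem concrete, here is a minimal Python sketch. It assumes a hypothetical scheme in which every triple carries a source label and a license label; the class, the function, the source names and the example.org URIs are all illustrative inventions, not part of OCLC’s guidelines or of the ODC-BY license, and the rule "ODC-BY means attribution is owed" is a deliberate simplification.

```python
from collections import defaultdict
from typing import NamedTuple


class SourcedTriple(NamedTuple):
    subject: str
    predicate: str
    obj: str
    source: str   # hypothetical label for whoever published the triple
    license: str  # hypothetical license label, e.g. "ODC-BY" or "CC0"


def required_attributions(triples):
    """Group triples by source, keeping only those whose license
    (in this simplified model) makes attribution a legal condition."""
    by_source = defaultdict(list)
    for t in triples:
        if t.license == "ODC-BY":
            by_source[t.source].append(t)
    return dict(by_source)


# One bibliographic description assembled from three made-up sources.
merged = [
    SourcedTriple("http://example.org/work/1", "title", "Moby Dick",
                  "Aggregator A", "ODC-BY"),
    SourcedTriple("http://example.org/work/1", "creator", "Melville, Herman",
                  "Library B", "ODC-BY"),
    SourcedTriple("http://example.org/work/1", "subject", "Whaling",
                  "Library C", "CC0"),
]

owed = required_attributions(merged)
for source, items in sorted(owed.items()):
    print(f"{source}: attribution required for {len(items)} triple(s)")
```

Even in this toy case, a single description of one work carries binding attribution obligations to two different parties, and the reuser has to preserve the source and license labels through every later merge. If all three sources had used CC0, the mapping would be empty and attribution would remain a community norm rather than a license condition.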
Another question concerns the scope of the ODC-BY license, which focuses on European database rights in addition to copyright. Those database rights do not apply in the U.S., and they cover the database in its entirety but not its contents, so it is uncertain whether the license can be applied to a simple file of bibliographic data. Whether copyright applies at all to bibliographic data is also doubtful, given its mainly factual nature, and the answer differs by legal jurisdiction. While the ODC-BY license may make good sense for OCLC to apply to WorldCat itself, it would be a questionable choice for a U.S. library that is looking to share some of its catalog data as a downloadable file.
Moreover, because most countries outside of the European Union — including the United States — do not grant protection to non-creative databases, the ODC-BY license does not operate except at best as a contractual restriction on those downloading directly from the licensor’s website. So this restriction, which is not based on any underlying exclusive property right, is unlikely to bind reusers that do not obtain the data directly from the original data provider. The absence of a binding contract coupled with the lack of any underlying property right means licensors may be surprised to learn they do not have a strong and effective remedy such as a claim of infringement against those downstream users. This is a known concern with the Open Database License, ODC-BY’s sister license that has the same license + contract design feature. Thus, the license in many instances simply will not protect the library that shared the data, or OCLC, in the manner they expect.
Another more general concern about using a license to share bibliographic metadata has to do with its technical feasibility. This is evident in the model language that OCLC recommends, which includes links to the WCRR Record Use Policy (WorldCat Rights and Responsibilities), community norms and an FAQ. Following these links takes readers to pages with yet more information about the requirements expected of members and non-members. The concern is not so much the opaqueness of the rules as the fact that they may become linked to a great number of records which have nothing to do with OCLC. For example, many members may have started to reuse records from OCLC only fairly recently, yet the model language makes no distinction between OCLC and non-OCLC sourced records, again because there is no feasible technical way to differentiate between them. The result: attribution is (wrongly) given to OCLC for the whole database, and a large number of OCLC principles become linked to the library database’s complete contents. While the ODC-BY and WCRR may be well-intentioned instruments to turn the WorldCat data into a “Common Pool Resource” for OCLC members, they lack the technical means to demarcate where that resource begins and ends, potentially resulting in confusion and overreaching requirements for members that try to comply. Fundamentally, this raises the question of whether library records shouldn’t just be public goods released into the public domain.
For all of the above reasons, cultural institutions including The British Library, Europeana, the University of Michigan Library, Harvard and others have adopted the CC0 Public Domain Dedication for publishing their catalog data online. From this, we see that a truly normative approach for the library community would be a public domain dedication such as CC0, coupled with requests to provide attribution to the source (e.g. OCLC) to the extent possible. Such an approach would maximize experimentation and innovation with the cataloging data, in keeping with the mission and values of the library community, while respecting the investment of OCLC and the library community in this valuable resource.
Contributors to this post: Timothy Vollmer, MacKenzie Smith, Paul Keller, Diane Peters.
Are there any real-world applications for bibliographic data? I’ve seen much enthusiasm about these initiatives, but not one mashup service, database or anything useful to me.
Great, detailed post, just two timely pointers:
OCLC hosted a round table on Linked Data at IFLA 2012, moderated by Richard Wallis (Ex-Talis) from OCLC, and including Neil Wilson from The British Library, Emmanuelle Bermès from Centre Pompidou, Paris, and Martin Malmsten from the Royal Library of Sweden. All four presentations are worth a look:
http://www.ifla.org/en/news/presentations-from-oclc-linked-data-round-table-available
Emmanuelle Bermès is herself blogging from IFLA 2012, specifically about Linked Open Data.
http://figoblog.org/node/2010
She points to the OCLC data attribution guidelines, particularly Special case 5, “URI referencing”. She notes: (my translation) “Considering the simple use of an OCLC URI as sufficient attribution is bringing to the heart of Linked Open Data, in legal terms, what we evangelists have always preached: that Linked Open Data should ‘follow its nose’, actively navigate through the links; and so make your institution visible and its URIs valuable.”
http://www.oclc.org/data/attribution.html
This whole having to include attribution for a CATALOG is redonk. This is not the actual book/publication. It’s a CATALOG. I’m not a librarian, so maybe I don’t fully understand the art of the catalog.
I started to think of how to use Google as a metaphor for this situation. But a Google metaphor actually argues in favor of licensing the catalog instead of making it open. Google doesn’t make their entire index downloadable. BUT isn’t there a Google API that allows people access to their index? Oh wait. But the Google API is Google’s property. I don’t think their API is developed as open source. People can make open-source stuff off Google’s API, but the API itself is Google’s.
If there’s a developer or librarian who knows more about this than me, please correct me if I’m wrong.
Maybe I should think of it this way. An owner of multiple websites has the opportunity to submit their websites to Google via sitemaps. Libraries have a choice to submit their records to the CC0 Public Domain Dedication. These library records are then available via this public domain. Just as websites submitted to Google are available through Google’s API. But Google’s API isn’t as open as the CC0 Public Domain Dedication, right?
I share Torsten’s concern. I haven’t seen any actual usage of bibliographic data yet.
I just want to call attention to Klaus Tochtermann’s blog post “CC0 for Library Data – Publish then Perish” and the discussion in the comments about the pros and cons of using CC0 for library data. See http://www.zbw-mediatalk.eu/2012/08/thoughts-beyond-boundaries-cc0-for-library-data-publish-then-perish/.
Bibliographic metadata belongs in the public domain. Licensing is just an attempt to turn library information into a money-making machine.