As reported a few weeks ago, OCLC has recommended that its member libraries adopt the Open Data Commons Attribution license (ODC-BY) when they share their library catalog data online. The recommendation to use an open license like ODC-BY is a positive step forward for OCLC because it helps communicate in advance the rights and responsibilities available to potential users of bibliographic metadata from library catalogs. But the decision by OCLC to recommend the licensing route, as opposed to releasing bibliographic metadata into the public domain, raises concerns that warrant more discussion.
OCLC says that making library data derived from WorldCat available under an open license like ODC-BY complies with their community norms. There are other options, however, that are equally compliant. Harvard Library, for example, developed an agreement with OCLC earlier this year that makes its metadata available under the CC0 Public Domain Dedication. This means that Harvard relinquishes all its copyright and related rights to that data, thereby enabling the widest variety of downstream reuse. Even though it puts this information into the public domain, Harvard requests that users provide attribution to the source as a best practice without making attribution a legally binding requirement through a license.
There are good reasons for relying on community norms for metadata attribution instead of requiring it as a condition of a licensing agreement. The requirement to provide attribution through a contract like ODC-BY is not well-suited to a world where data are combined and remixed from multiple sources and under a variety of licenses and other use restrictions. For example, the library community is experimenting with new technologies like linked data as a means of getting more value from its decades-long collective investment in cataloging data. And we’re happy to see that OCLC has released a million WorldCat records containing 80 million linked data triples in RDF. However, we believe that requiring attribution as a licensing condition introduces complexity that will make it technically difficult — if not impossible — for users to comply.
Then there is the question of how to properly attach attribution information to a discrete bit of data (e.g. a single field, subfield, or triple). OCLC has helpfully provided guidelines around attribution for its linked data, but how would these work for member libraries that follow OCLC’s recommendation to adopt the ODC-BY license when they publish their own data? Library linked data collections are often derived from small subsets of many large collections and recombined with new relationships, potentially requiring separate attribution for each data element. In the case of OCLC’s data release, imagine that a user downloads the OCLC file containing 80 million linked data triples, extracts the ones she’s interested in, and then links them to her own catalog data to create a new linked dataset. The guidelines for the WorldCat data include the option of considering a WorldCat URI to be sufficient attribution, but how would that work for the library’s own bibliographic data or for additional data drawn from non-OCLC sources? The guidelines do not include recommendations for how libraries should implement their own data in such a way that reusers can comply with the attribution requirements imposed by the ODC-BY license. The community norms and best practices for reusing library linked data are not yet well defined, so relying on them in the context of a legally binding license is troubling.
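The bookkeeping burden described above can be made concrete with a minimal sketch. It assumes triples are plain (subject, predicate, object) tuples and that each source is known by a simple label; the URIs, predicates, and source names are invented for illustration and do not come from OCLC's guidelines or any real dataset.

```python
from collections import defaultdict

# Hypothetical triples as (subject, predicate, object) tuples.
# All URIs and values below are made up for illustration.
worldcat_triples = [
    ("http://example.org/book/1", "dc:title", "Example Title"),
    ("http://example.org/book/1", "dc:creator", "Example Author"),
]
local_triples = [
    ("http://example.org/book/1", "dc:creator", "Example Author"),
    ("http://example.org/book/1", "ex:shelfMark", "QA76 .E93"),
]


def merge_with_sources(datasets):
    """Merge triples from several named sources, keeping every source per triple."""
    provenance = defaultdict(set)
    for source, triples in datasets.items():
        for triple in triples:
            provenance[triple].add(source)
    return provenance


merged = merge_with_sources(
    {"OCLC WorldCat": worldcat_triples, "Local catalog": local_triples}
)

# Every triple in the combined dataset now carries its own attribution list.
for triple, sources in sorted(merged.items()):
    print(triple, "attribution owed to:", ", ".join(sorted(sources)))
```

Even in this toy case, one triple ends up owing attribution to two sources, and the record of who is owed what has to travel with every triple through each later remix.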
Another question concerns the scope of the ODC-BY license, which addresses European database rights in addition to copyright. Those database rights do not apply in the U.S., and they cover the database in its entirety but not its contents, so it is uncertain whether the license can be applied to a simple file of bibliographic data. Whether copyright applies at all to bibliographic data, given its mainly factual nature, is doubtful and varies by legal jurisdiction. While the ODC-BY license may make good sense for OCLC to apply to WorldCat itself, it would be a questionable choice for a U.S. library that is looking to share some of its catalog data as a downloadable file.
Moreover, because most countries outside of the European Union — including the United States — do not grant protection to non-creative databases, the ODC-BY license does not operate except at best as a contractual restriction on those downloading directly from the licensor’s website. So this restriction, which is not based on any underlying exclusive property right, is unlikely to bind reusers that do not obtain the data directly from the original data provider. The absence of a binding contract coupled with the lack of any underlying property right means licensors may be surprised to learn they do not have a strong and effective remedy such as a claim of infringement against those downstream users. This is a known concern with the Open Database License, ODC-BY’s sister license that has the same license + contract design feature. Thus, the license in many instances simply will not protect the library that shared the data, or OCLC, in the manner they expect.
Another, more general concern about using a license to share bibliographic metadata has to do with its technical feasibility. This is evident in the model language that OCLC recommends, which includes links to the WCRR Record Use Policy (WorldCat Rights and Responsibilities), community norms, and an FAQ. Following these links takes readers to pages with yet more information about the requirements expected of members and non-members. The concern is not so much the opaqueness of the rules as that they may become attached to a great number of records that have nothing to do with OCLC. For example, many members may have started reusing records from OCLC only fairly recently, yet the model language makes no distinction between OCLC-sourced and non-OCLC-sourced records, again because there is no feasible technical way to differentiate between them. The result: attribution is (wrongly) given to OCLC for the whole database, and a large number of OCLC principles become linked to the library database’s complete contents. While the ODC-BY and WCRR may be well-intentioned instruments for turning the WorldCat data into a “Common Pool Resource” for OCLC members, they lack the technical means to demarcate where that resource begins and ends, potentially resulting in confusion and overreaching requirements for members that try to comply. Fundamentally, this raises the question of whether library records shouldn’t just be public goods released into the public domain.
For all of the above reasons, cultural institutions including The British Library, Europeana, the University of Michigan Library, Harvard and others have adopted the CC0 Public Domain Dedication for publishing their catalog data online. From this, we see that a truly normative approach for the library community would be a public domain dedication such as CC0, coupled with requests to provide attribution to the source (e.g. OCLC) to the extent possible. Such an approach would maximize experimentation and innovation with the cataloging data, in keeping with the mission and values of the library community, while respecting the investment of OCLC and the library community in this valuable resource.
Contributors to this post: Timothy Vollmer, MacKenzie Smith, Paul Keller, Diane Peters.
The last few months have seen a growth in open data, particularly from governments and libraries. Among the more recent open data adopters are the Austrian government; the Italian Ministry of Education, University and Research; the Italian Chamber of Deputies; and Harvard Library.
The Italian Ministry of Education, University and Research launched its Open Data Portal under CC BY, publishing the data of Italian schools (such as address, phone number, web site, administrative code), students (number, gender, performance), and teachers (number, gender, retirement, etc.). The Ministry aims to make all of its data eventually available and open for reuse, in order to improve transparency, aid in the understanding of the Italian scholastic system, and promote the creation of new tools and services for students, teachers and families.
Lastly, Harvard Library in the U.S. has released 12 million catalog records into the public domain using the CC0 public domain dedication tool. The move is in accordance with Harvard Library’s Open Metadata Policy. The policy’s FAQ states,
“With the CC0 public domain designation, Harvard waives any copyright and related rights it holds in the metadata. We believe that this will help foster wide use and yield developments that will benefit the library community and the public.”
Harvard’s press release cites additional motivations for opening its data,
John Palfrey, Chair of the DPLA, said, “With this major contribution, developers will be able to start experimenting with building innovative applications that put to use the vital national resource that consists of our local public and research libraries, museums, archives and cultural collections.” He added that he hoped that this would encourage other institutions to make their own collection metadata publicly available.
We are excited that CC tools are being used for open data. For questions related to CC and data, see our FAQ about data, which also links to many more governments, libraries, and organizations that have opened their data.
CERN Library releases its book catalog into the public domain via CC0, and other bibliographic data news
CERN, the European Organization for Nuclear Research that is home to the Large Hadron Collider and birthplace of the web, has released its book catalog into the public domain using the CC0 public domain dedication. This is not the first time that CERN has used CC tools to open its resources; earlier this year, CERN released the first results of the Large Hadron Collider experiments under CC licenses. In addition, CERN is a strong supporter of CC, having given corporate support at the “creator” level, and is currently featured as a CC Superhero in the campaign, where you can join them in the fight for openness and innovation!
Jens Vigen, the head of CERN Library, says in the press release,
“Books should only be catalogued once. Currently the public purse pays for having the same book catalogued over and over again. Librarians should act as they preach: data sets created through public funding should be made freely available to anyone interested. Open Access is natural for us, here at CERN we believe in openness and reuse… By getting academic libraries worldwide involved in this movement, it will lead to a natural atmosphere of sharing and reusing bibliographic data in a rich landscape of so-called mash-up services, where most of the actors who will be involved, both among the users and the providers, will not even be library users or librarians.”
In related news, the Cologne-based libraries have made the 5.4 million bibliographic records they released into the public domain earlier this year, also via CC0, available in various places. See the hbz wiki, lobid.org (and their files on CKAN), and OpenDATA at the Central Library of Sport Sciences of the German Sports University in Cologne. For more information, see the case study.
The German Wikipedia has also used CC0 to dedicate data into the public domain; specifically, their PND-BEACON files are available for download. Since Wikipedia links out to quite a number of external resources, and since a lot of articles link to the same external resources, PND-BEACON files are the German Wikipedia’s way of organizing the various data. “In short a BEACON file contains a 1-to-1 (or 1-to-n) mapping from identifiers to links. Each link consists of at least an URL with optionally a link title and additional information such as the number of resources that are available behind a link.” Learn more from the English description of the project.
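The mapping described in that quotation can be pictured with a small in-memory sketch. This is not the BEACON file syntax itself, only an illustration of the 1-to-n idea; the `Link` class, the `links_for` helper, the identifiers, and the URLs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    """One link: a URL is required, title and resource count are optional."""
    url: str
    title: Optional[str] = None
    resource_count: Optional[int] = None


# Hypothetical identifiers mapped to one or more links (1-to-1 or 1-to-n).
beacon: Dict[str, List[Link]] = {
    "id-0001": [Link("http://example.org/person/0001", "Example archive", 12)],
    "id-0002": [
        Link("http://example.org/person/0002"),
        Link("http://example.net/authors/0002", "Example library"),
    ],
}


def links_for(identifier: str, mapping: Dict[str, List[Link]]) -> List[Link]:
    """Return every link recorded for an identifier, or an empty list."""
    return mapping.get(identifier, [])


for link in links_for("id-0002", beacon):
    print(link.url, link.title, link.resource_count)
```

Keying on the identifier means many articles that point to the same person can share one set of outbound links instead of each maintaining its own.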
The University of Michigan Library now offers content on its website under the Creative Commons Attribution (CC BY) license. This announcement is significant because the Library had been using the more restrictive Creative Commons Attribution-NonCommercial (CC BY-NC) license. By switching to the Attribution license, the Library has granted more permissions to use, share, and repurpose its research and technology guides, video tutorials, toolkits, copyright education materials, bibliographies, and other resources.
From the press release:
“It seemed that for some people the term ‘noncommercial’ implied ‘anti-commerce.’ That wasn’t the message we wanted to send,” says Melissa Levine, MLibrary’s lead copyright officer. “After some careful consideration, and in consultation with all library personnel, we concluded that dropping the commercial restriction would encourage broader use of our educational resources, which was really our intent when we switched to the Creative Commons license in the first place.”
Mike Linksvayer, vice president of Creative Commons, believes MLibrary to be the first major research library to adopt the CC BY license. “Many other people and projects have dropped the noncommercial condition from their licenses as they’ve gotten more comfortable with and reaped the benefits of openness, but the U-M Library is the most prominent so far. As other institutions follow, this leadership will be seen as an important marker in the history of increasing access to and collaboration around educational and research materials.”
Congratulations to MLibrary on its announcement to increase openness by using the Attribution license.
A couple of years ago, the Lifelong Kindergarten Group at MIT Media Lab developed a Web 2.0 programming platform for kids called Scratch. Scratch allows kids, and virtually anyone else, to create and remix rich media of all kinds: video, video games, even simple photo animations. The programming behind Scratch focuses on building blocks, like Legos, to get kids not only comfortable with, but adept at, the technology that dominates our world. Each user can create a project, whether it be a video or a video game, and upload it to share on the Scratch website. Scratch currently hosts more than 400,000 projects, all licensed CC BY-SA, allowing any youth to flex her creative muscles and enhance a peer’s project by remixing it with her own.
The School Library Journal wrote up an excellent article about them last week, emphasizing that “Literacy in the 21st century encompasses the full range of skills needed to engage in our global society—computer, information technology, media, and information literacy skills.” The SLJ reports that Scratch is now being tested in libraries in the Minneapolis area, “to determine if the workshops and classes for young people are replicable and sustainable for a range of libraries.” Unsurprisingly, library staff are finding that kids quickly learn the program on their own, and are guided more by their own intuitions than an “expert’s” instruction.
I decided to try out Scratch myself, and found some cool projects along the way. One project by “cougars” is a photo animation of a human skateboard. Another is a video game simulation of the Buggers war from Ender’s Game by PetertheGeek. (How cool is that?)
What’s more, the Scratch program is global, available in more than 40 languages, and the code itself is free for anyone to copy, publish, or distribute.
One of the benefits of public domain books is that once they are scanned and made available on the Internet, they are then available for anyone, including other organizations, to use and reuse in other contexts and sites. The Prospector Alliance, the union catalog of Colorado Alliance Research Libraries, did exactly this by enhancing the bibliographic records of the University of Michigan’s giant collection of digitized public domain books. According to the press release,
“Library users in Colorado and Wyoming now have access to tens of thousands of additional open-access digitized books and serials through the Prospector Library Catalog (http://prospector.coalliance.org). The digitized items originate from the University of Michigan, a partner in the Google Books digitization project and a member of a consortium of libraries called Hathi Trust. Last year the University of Michigan made available bibliographic records for many of the out-of-copyright titles that Google digitized from its collections. The University then made available online files for each of the digitized works.
…Now library patrons from across Colorado have access to the online books via the Prospector catalog. Except for the University of Michigan where the books originated, the Auraria Library was the first library in the nation to make these books available to its users.”
In another innovative move, the University of Michigan Library has adopted CC licensing for all of its own content. Any work that is produced by the library itself, and to which the University of Michigan holds the copyrights, will be released under the Creative Commons Attribution Noncommercial license (CC BY-NC). This allows anyone, including you, to access, adapt, remix, reproduce, and redistribute the library’s works for noncommercial purposes. This is fantastic news for educators, researchers, and students, who often dread the laborious task of obtaining permissions to synthesize diverse works with just as diverse (not to mention tricky) rights attached to them. From their press release:
The University of Michigan Library has decided to adopt Creative Common Attribution-Non-Commercial licenses for all works created by the Library for which the Regents of the University of Michigan hold the copyrights. These works include bibliographies, research guides, lesson plans, and technology tutorials. We believe that the adoption of Creative Commons licenses is perfectly aligned with our mission, “to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge.”
University Librarian Paul Courant said, “Using Creative Commons licenses is another way the University Library can act on its commitment to the public good. By marking our copyrighted content as available for reuse, we offer the University community and the public a rich set of educational resources free from traditional permissions barriers.”
Recall that they also recently installed the Espresso Book Machine, which prints on-demand copies of over 2 million public domain books. Now they can add even more works to the mix! What will the Library be up to next? Thanks to Molly Kleinman for alerting us to the good news.