
4 Stars for Metadata: an Open Ranking System for Library, Archive, and Museum Collection Metadata

MacKenzie Smith, June 17th, 2011

This post was written by participants of the LOD-LAM Summit, which was held on June 2nd and 3rd in San Francisco, and is crossposted on the Open Knowledge Foundation blog and the Open bibliography and Open Bibliographic Data blog. For author information see the list at the end of this post.

The library, archives and museums (LAM) community is increasingly interested in the potential of Linked Open Data to enable new ways of leveraging and improving our digital collections, as recently illustrated by the first international Linked Open Data in Libraries, Archives and Museums Summit (LOD-LAM) in San Francisco. The Linked Open Data approach combines knowledge and information in new ways by linking data about cultural heritage and other materials coming from different museums, archives and libraries. This not only allows for the enrichment of metadata describing individual cultural objects, but also makes our collections more accessible to users by supporting new forms of online discovery and data-driven research.

But as cultural institutions start to embrace Linked Open Data practices, the intellectual property rights associated with their digital collections become a more pressing concern. Cultural institutions often struggle with rights issues related to the content in their collections, primarily because they often do not hold the (copy)rights to the works in their collections. Instead, copyrights often rest with the authors or creators of the works, or with intermediaries who have obtained these rights from the authors, so that cultural institutions must get permission before they can make their digital collections available online.

However, the situation with regard to the metadata — individual metadata records and collections of records — to describe these cultural collections is generally less complex. Factual data are not protected by copyright, and where descriptive metadata records or record collections are covered by rights (either because they are not strictly factual, or because they are vested with other rights such as the European Union’s sui generis database right) it is generally the cultural institutions themselves who are the rights holders. This means that in most cases cultural institutions can independently decide how to publish their descriptive metadata records — individually and collectively — allowing them to embrace the Linked Open Data approach if they so choose.

As the word “open” implies, the Linked Open Data approach requires that data be published under a license or other legal tool that allows everyone to freely use and reuse the data. This requirement is one of the most basic elements of the LOD architecture. And, according to Tim Berners-Lee’s 5 star scheme, the most basic way of making data available online is to make it ‘available on the web (whatever format), but with an open licence’. However, there is still considerable confusion in the field as to what exactly qualifies as “open” and as an “open license”.

While there are a number of definitions available such as the Open Knowledge Definition and the Definition of Free Cultural Works, these don’t easily translate into a licensing recommendation for cultural institutions that want to make their descriptive metadata available as Linked Open Data. To address this, participants of the LOD-LAM summit drafted ‘a 4-star classification-scheme for linked open cultural metadata’. The proposed scheme (obviously inspired by Tim Berners-Lee’s Linked Open Data star scheme) ranks the different options for metadata publishing — legal waivers and licenses — by their usefulness in the LOD context.

In line with the Open Knowledge Definition and the Definition of Free Cultural Works, licenses that impose restrictions on the ways the metadata may be used (such as ‘non-commercial only’ or ‘no derivatives’) are not considered truly “open” licenses in this context. This means that metadata made available under a more restrictive license than those proposed in the 4-star system below should not be considered Linked Open Data.

According to the classification there are 4 publishing options suitable for descriptive metadata as Linked Open Data, and libraries, archives and museums trying to maximize the benefits and interoperability of their metadata collections should aim for the approach with the highest number of stars that they’re comfortable with. Ideally the LAM community will come to agreement about the best approach to sharing metadata so that we all do it in a consistent way that makes our ambitions for new research and discovery services achievable.

Finally, it should be noted that the ranking system only addresses metadata licensing (individual records and collections of records) and does not specify how that metadata is made available, e.g., via APIs or downloadable files.

The proposed classification system is described in detail on the International LOD-LAM Summit blog, but to give you a sneak preview, here are the rankings:

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)
★★★ Attribution License (CC-BY / ODC-BY) where the licensor considers linkbacks to meet the attribution requirement
★★ Attribution License (CC-BY / ODC-BY) with another form of attribution defined by the licensor
★ Attribution Share-Alike License (CC-BY-SA / ODC-ODbL)
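
For readers who want to apply the rankings to their own records, the list above can be sketched as a small lookup in Python. This is only an illustration under stated assumptions: the identifier strings, the function name star_rating and the linkback_attribution flag are hypothetical and are not part of the proposal; a result of 0 simply means the license is not listed in the scheme.

```python
# Hypothetical identifiers for the licenses and waivers named in the ranking.
PUBLIC_DOMAIN = {"CC0", "ODC-PDDL", "PUBLIC-DOMAIN-MARK"}
ATTRIBUTION = {"CC-BY", "ODC-BY"}
SHARE_ALIKE = {"CC-BY-SA", "ODC-ODBL"}


def star_rating(license_id: str, linkback_attribution: bool = False) -> int:
    """Return 1 to 4 stars for a listed license, or 0 if it is not listed.

    linkback_attribution (an assumed flag) says whether the licensor
    accepts a link back as sufficient attribution; it only matters for
    the attribution licenses.
    """
    key = license_id.strip().upper()
    if key in PUBLIC_DOMAIN:
        return 4
    if key in ATTRIBUTION:
        return 3 if linkback_attribution else 2
    if key in SHARE_ALIKE:
        return 1
    # Anything else (for example a non-commercial or no-derivatives
    # license) falls outside the four options in the ranking.
    return 0
```

Under these assumptions, star_rating("CC-BY", linkback_attribution=True) gives three stars, while a non-commercial license such as "CC-BY-NC" gets none because it is more restrictive than the listed options.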

We encourage discussion of this proposal as we work towards a final draft this summer, so please take a look and tell us what you think!

Paul Keller, Creative Commons and Knowledgeland (Netherlands)
Adrian Pohl, Open Knowledge Foundation and hbz (Germany)
MacKenzie Smith, MIT Libraries (USA)
John Wilbanks, Creative Commons (USA)

3 Responses to “4 Stars for Metadata: an Open Ranking System for Library, Archive, and Museum Collection Metadata”

  1. Stephen Downes says:

    I sometimes feel like I’m the only person in the world who defines ‘open access’ as non-commercial access. I honestly don’t get how people think charging for a resource somehow makes it ‘more free’, and I’m pretty sure there’s a sizeable foundation-type lobby making sure that the ‘open-as-commercial’ perspective holds sway. Well, I haven’t drunk the commercialism Kool-Aid, and consequently, I reject the proposal coming forth from the so-called ‘LOD-LAM Summit’ to create a 4-star definition of openness. If you can block access to something and demand payment for it, it’s not open. It is certainly not ‘more open’ than the non-commercial form of openness that most people actually want to use.

  2. SJ Klein says:

    MacKenzie, this is an interesting model for illustrating ongoing discussions about sharing and licensing – thank you for presenting it so clearly.

    Stephen, I have read your posts on this topic for years, and appreciate your interest in sharing knowledge and education. While I don’t sympathize with your fondness for ‘non-commercial’ licenses, I can follow your reasoning. But this conspiracy theory version of your view goes too far. Please assume good faith of those who disagree with you, and be moderate in assumptions about what “most people” want. No lobby is required to make people view public domain as the ‘most free’ license; it is, by many rules of thumb.

    I cannot speak for most people, but in the communities I frequent – where highly distributed or multi-contributor reuse and derivative use are common – the problems with ‘NC’ restrictions, or any restriction beyond simple attribution, are well known. Metadata is the sort of knowledge that lends itself naturally to repeated revision, clarification, and expansion over time. I don’t think such data should be placed under copyright at all, and hope that placing it into the public domain becomes standard practice.

  3. Ryan Kaldari says:

    Stephen Downes: Restricting data to non-commercial use doesn’t make it more free. If the data is public domain, who cares if someone is including it in a published book that costs money? You can always get the data for free on the internet. By restricting use to non-commercial, however, you are effectively limiting the data to only be used on the internet since every other use costs money which needs to be recouped.