metadata

4 Stars for Metadata: an Open Ranking System for Library, Archive, and Museum Collection Metadata

MacKenzie Smith, June 17th, 2011

This post was written by participants of the LOD-LAM Summit which was held on June 2nd/3rd in San Francisco and is crossposted on the Open Knowledge Foundation blog and the Open bibliography and Open Bibliographic Data blog. For author information see the list at the end of this post.

The library, archives and museums (i.e. LAM) community is increasingly interested in the potential of Linked Open Data to enable new ways of leveraging and improving our digital collections, as recently illustrated by the first international Linked Open Data in Libraries, Archives, and Museums (LOD-LAM) Summit in San Francisco. The Linked Open Data approach combines knowledge and information in new ways by linking data about cultural heritage and other materials coming from different museums, archives and libraries. This not only allows for the enrichment of metadata describing individual cultural objects, but also makes our collections more accessible to users by supporting new forms of online discovery and data-driven research.

But as cultural institutions start to embrace Linked Open Data practices, the intellectual property rights associated with their digital collections become a more pressing concern. Cultural institutions often struggle with rights issues related to the content in their collections, primarily because they often do not hold the (copy)rights to the works in their collections. Instead, copyrights often rest with the authors or creators of the works, or with intermediaries who have obtained these rights from the authors, so that cultural institutions must get permission before they can make their digital collections available online.

However, the situation with regard to the metadata — individual metadata records and collections of records — to describe these cultural collections is generally less complex. Factual data are not protected by copyright, and where descriptive metadata records or record collections are covered by rights (either because they are not strictly factual, or because they are vested with other rights such as the European Union’s sui generis database right) it is generally the cultural institutions themselves who are the rights holders. This means that in most cases cultural institutions can independently decide how to publish their descriptive metadata records — individually and collectively — allowing them to embrace the Linked Open Data approach if they so choose.

As the word “open” implies, the Linked Open Data approach requires that data be published under a license or other legal tool that allows everyone to freely use and reuse the data. This requirement is one of the most basic elements of the LOD architecture. And, according to Tim Berners-Lee’s 5 star scheme, the most basic way of making data available online is to make it ‘available on the web (whatever format), but with an open licence’. However, there is still considerable confusion in the field as to what exactly qualifies as “open” and as an “open license”.

While there are a number of definitions available such as the Open Knowledge Definition and the Definition of Free Cultural Works, these don’t easily translate into a licensing recommendation for cultural institutions that want to make their descriptive metadata available as Linked Open Data. To address this, participants of the LOD-LAM summit drafted ‘a 4-star classification-scheme for linked open cultural metadata’. The proposed scheme (obviously inspired by Tim Berners-Lee’s Linked Open Data star scheme) ranks the different options for metadata publishing — legal waivers and licenses — by their usefulness in the LOD context.

In line with the Open Knowledge Definition and the Definition of Free Cultural Works, licenses that impose restrictions on the ways the metadata may be used (such as ‘non-commercial only’ or ‘no derivatives’) are not considered truly “open” licenses in this context. This means that metadata made available under a more restrictive license than those proposed in the 4-star system should not be considered Linked Open Data.

According to the classification there are 4 publishing options suitable for descriptive metadata as Linked Open Data, and libraries, archives and museums trying to maximize the benefits and interoperability of their metadata collections should aim for the approach with the highest number of stars that they’re comfortable with. Ideally the LAM community will come to agreement about the best approach to sharing metadata so that we all do it in a consistent way that makes our ambitions for new research and discovery services achievable.

Finally, it should be noted that the ranking system only addresses metadata licensing (individual records and collections of records) and does not specify how that metadata is made available, e.g., via APIs or downloadable files.

The proposed classification system is described in detail on the International LOD-LAM Summit blog but to give you a sneak preview, here are the rankings:

★★★★ Public Domain (CC0 / ODC PDDL / Public Domain Mark)
★★★ Attribution License (CC-BY / ODC-BY) where the licensor considers linkbacks to meet the attribution requirement
★★ Attribution License (CC-BY / ODC-BY) with another form of attribution defined by the licensor
★ Attribution Share-Alike License (CC-BY-SA/ODC-ODbL)
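The ranking above could be encoded as a small lookup, for example when auditing which metadata sets in a catalog qualify as Linked Open Data. This is only a sketch: the license identifier strings, the `metadata_stars` function, and the convention of returning 0 for licenses outside the scheme are assumptions made for illustration, not part of the proposal.

```python
# Hypothetical identifiers for the licenses and waivers named in the ranking.
FIXED_STARS = {
    "CC0": 4,
    "ODC-PDDL": 4,
    "PDM": 4,        # Public Domain Mark
    "CC-BY-SA": 1,
    "ODC-ODbL": 1,
}
ATTRIBUTION_LICENSES = {"CC-BY", "ODC-BY"}


def metadata_stars(license_id: str, linkback_is_attribution: bool = False) -> int:
    """Return the proposed star rating, or 0 if the license is outside the scheme."""
    if license_id in ATTRIBUTION_LICENSES:
        # Three stars only when the licensor accepts a linkback as attribution.
        return 3 if linkback_is_attribution else 2
    return FIXED_STARS.get(license_id, 0)


for name in ("CC0", "CC-BY", "CC-BY-SA", "CC-BY-NC"):
    print(name, metadata_stars(name))
```

A restrictive license such as a non-commercial one falls through to 0 here, which mirrors the point that such metadata should not be considered Linked Open Data.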

We encourage discussion of this proposal as we work towards a final draft this summer, so please take a look and tell us what you think!

Paul Keller, Creative Commons and Knowledgeland (Netherlands)
Adrian Pohl, Open Knowledge Foundation and hbz (Germany)
MacKenzie Smith, MIT Libraries (USA)
John Wilbanks, Creative Commons (USA)


Open Attribute, a simple way to attribute CC-licensed works on the web

Jane Park, February 7th, 2011

Open Attribute, “a suite of tools that makes it ridiculously simple for anyone to copy and paste the correct attribution for any CC licensed work,” launched today with browser add-ons for Mozilla Firefox and Google Chrome. The add-ons “query the metadata around a CC-licensed object and produce a properly formatted attribution that users can copy and paste wherever they need to.”

If you use our license chooser and copy and paste the resulting HTML code into your website, then you’re pretty much good to go. Anyone who uses the Open Attribute browser add-on to query your site will automatically receive a formatted HTML or plain text attribution that they can copy and paste to give you the proper credit.

Open Attribute uses CC REL metadata found in the pages to generate the attribution metadata. You might remember that we developed a guide with real examples to make CC REL metadata much easier to implement: CC REL by Example contains example HTML pages, as well as explanations and links to more information. If you’re curious to see how Open Attribute pulls the metadata, the guide includes a specific section on Attributing Reuses.
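To give a rough idea of the last step, here is a minimal sketch of turning metadata values that have already been read from a page into an HTML or plain text attribution. The function name, its parameters, and the exact wording of the output are assumptions for illustration; this is not Open Attribute's actual code or output format.

```python
from html import escape


def format_attribution(title, author, author_url, license_name, license_url, as_html=True):
    """Build an attribution line from values taken from a page's license metadata."""
    if as_html:
        # Escape every value so that titles or names cannot break the markup.
        return (
            f'{escape(title)} by <a href="{escape(author_url)}">{escape(author)}</a> '
            f'is licensed under <a href="{escape(license_url)}">{escape(license_name)}</a>'
        )
    return (
        f"{title} by {author} ({author_url}) "
        f"is licensed under {license_name} ({license_url})"
    )


print(format_attribution(
    "My Photo", "Jane Doe", "http://example.com/jane",
    "CC BY 3.0", "http://creativecommons.org/licenses/by/3.0/",
    as_html=False,
))
```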

Open Attribute is a direct result of the Mozilla Drumbeat Festival held last year in Barcelona on Learning, Freedom and the Web. See Molly Kleinman’s post for a more comprehensive run-down of the origins and team behind Open Attribute.


CC REL by Example

Nathan Yergler, January 7th, 2011

The following is cross-posted from the CC Labs blog. The Creative Commons technical team blogs at CC Labs about metadata, emerging standards, demos, prototypes, and Creative Commons’ technical infrastructure.

You may have noticed that the copy-and-paste HTML you get from the CC license chooser includes some strange attributes you’re probably not familiar with. That is RDFa metadata, and it allows for the CC license deeds, search engines, Open Attribute, and other tools to discover metadata about your work and generate attribution HTML. Many platforms have implemented CC REL metadata in their CC license marks, such as Connexions and Flickr, and it’s our recommended way to mark works with a CC license.
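As a rough illustration of how a tool might discover that metadata, the sketch below reads the `rel` and `property` attributes from a license mark using only Python's standard library. The sample markup is an assumed, simplified stand-in for what a license chooser might emit, and `LicenseMarkParser` is a hypothetical helper; it is not a full RDFa parser and ignores namespaces, nesting, and many other parts of the specification.

```python
from html.parser import HTMLParser

# Assumed, simplified license mark for illustration only.
SAMPLE = '''
<span property="dct:title">My Photo</span> by
<a rel="cc:attributionURL" property="cc:attributionName"
   href="http://example.com/jane">Jane Doe</a> is licensed under a
<a rel="license" href="http://creativecommons.org/licenses/by/3.0/">CC BY 3.0 license</a>.
'''


class LicenseMarkParser(HTMLParser):
    """Collect rel/href links and property text from a license mark."""

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current_property = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if href and attrs.get("rel"):
            # A rel attribute may hold several space-separated values.
            for rel in attrs["rel"].split():
                self.found[rel] = href
        self._current_property = attrs.get("property")

    def handle_data(self, data):
        # Text inside an element with a property attribute is that property's value.
        if self._current_property and data.strip():
            self.found[self._current_property] = data.strip()

    def handle_endtag(self, tag):
        self._current_property = None


parser = LicenseMarkParser()
parser.feed(SAMPLE)
metadata = parser.found
print(metadata)
```

For real pages, a dedicated RDFa library is a better choice than hand-rolled parsing like this.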

In an effort to make CC license metadata (or CC REL metadata) much easier to implement, we’ve created CC REL by Example. It includes many example HTML pages, as well as explanations and links to more information.

We’re hoping this guide will serve as a useful set of examples for developers and publishers who want to publish metadata for CC licensed works. Even if you just use CC licenses for your own content, now is a great time to take a first step into structured data and include information about how you’d like to be attributed.

You can find the source to the guide in git. Feedback and suggestions can be sent to webmaster@creativecommons.org.


Educational Search and DiscoverEd

Alex Kozak, June 25th, 2010

Last week in the vuDAT building at Michigan State University, a group of developers interested in educational search and discovery got together to contribute code (in what’s commonly called a code sprint) to Creative Commons’ DiscoverEd project. Readers interested in the technical details about our work last week can find daily posts on CC Labs: Day 1, Day 2, and Day 3.

DiscoverEd is a semantically enhanced search prototype. What does that mean practically? Let’s say you’re a ninth grade biology teacher interested in finding educational resources about cell organelles to hand out to students. How would you go about that?

If you’re web savvy, you might open up a search engine like Google, Yahoo, or Bing and search for “cell organelles”. You’d find a lot of resources (Google alone finds over 11 million pages!), but which do you choose to investigate further? It’s time consuming and difficult to sift through search results for resources that have certain properties you might be interested in, like being appropriate for 9th graders, being under a CC license that allows you to modify the resource and share changes, or being written in English or Spanish, for example. As you throw up your hands in dismay, you might think “Can’t someone do this for me?!”

DiscoverEd is an educational search prototype that does exactly that, by searching metadata about educational resources. It provides a way to sift through search results based on specific qualities like what license it’s under, the education level, or subject.
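The idea of sifting results by metadata can be sketched in a few lines. The `Resource` fields, the sample catalog, and `filter_resources` below are all hypothetical and chosen for illustration; they are not DiscoverEd's data model or implementation.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    title: str
    license: str
    education_level: str
    subject: str


def filter_resources(resources, **criteria):
    """Keep only resources whose metadata matches every given criterion."""
    return [
        r for r in resources
        if all(getattr(r, field) == wanted for field, wanted in criteria.items())
    ]


# Made-up sample records standing in for search results.
CATALOG = [
    Resource("Cell Organelles Handout", "CC BY", "grade 9", "biology"),
    Resource("Organelle Function Review", "CC BY-NC-ND", "grade 9", "biology"),
    Resource("Advanced Cell Biology Notes", "CC BY", "undergraduate", "biology"),
]

matches = filter_resources(CATALOG, license="CC BY", education_level="grade 9")
for resource in matches:
    print(resource.title)
```

The filter can only be as good as the metadata that publishers and curators supply for each resource.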

Compare search results for “cell organelles” in Google, Yahoo, Bing, and now in DiscoverEd. You can see that finding CC licensed educational resources is friendlier because of the available metadata accompanying each result.

While most search engines rely solely on algorithmic analyses of resources, DiscoverEd can incorporate data provided by the resource publisher or curator. As long as curators and publishers follow some basic standards, metadata can be consumed and displayed by DiscoverEd. These formats (e.g. RDFa) allow otherwise unrelated educational projects, curators, and repositories to express facts about their resources in the same format so that tools (like DiscoverEd) can use that data for useful purposes (like search and discovery).

Creative Commons believes an open web following open standards leads to better outcomes for everyone. Our vision for the web is that everyone following interoperable standards, whether they be legal standards like the CC licenses or technical standards like CC REL and RDFa, will result in a platform that enables social and technical innovation in the same way that HTTP and HTML enabled change. DiscoverEd is a project that allows us to explore ways to improve search for OER, and simultaneously demonstrate the utility of structured data.

Continued development of DiscoverEd is supported by the AgShare project, funded by a grant from The Gates Foundation. Creative Commons thanks MSU, vuDAT, MSU Global, and the participants in the DiscoverEd sprint last week for their support.


Opening Education–the little things you can do

Jane Park, September 25th, 2009

By now, you’ve heard and/or used the term OER (Open Educational Resources) a ton of times. Whether you’re an advocate for open education, promoting the use, reuse, and adaptation of openly licensed educational materials, or an everyday user of them because you find them convenient and effective for your teaching or learning needs, you have contributed in some way to improving the educational landscape for everyone, everywhere.

But there are a lot of little things you can do to improve education and the educational process no matter who you are and where you’re located. These are things you do all the time as part of your professional or personal routines, such as filling out forms about your job or project, writing up summaries or abstracts on papers you’ve researched, or describing and tagging photos (aka adding metadata). These activities are also integral to the functioning of many open education projects, which depend on efforts from online communities consisting of persons like ourselves. A list of these projects is growing on OpenEd’s volunteer page, which currently points to projects like dScribe and AcaWiki. If your project could use help on a specific activity, please add it here! OpenEd is a wiki; anyone can edit.

dScribe needs descriptions for their medical images
dScribe has created over 200 images to aid instructors in their teaching, but they need to be made discoverable first! You can help by adding tags and short descriptions for one or two images. All images and their accompanying info will be licensed CC BY.

AcaWiki could use those summaries and abstracts you’ve written
AcaWiki makes summaries and literature reviews of peer-reviewed academic research available to the general public via CC BY, allowing people like us to easily find desired information. If you’ve written summaries and reviews for papers before, now’s the time to make them useful by uploading those files to AcaWiki. And if you regularly research and write up abstracts for class or for your own good, you can easily make uploading them a habitual part of the process. It only takes a couple of extra clicks.

We also encourage you to add your project or organization to ODEPO, ccLearn’s Open Database of Educational Projects and Organizations. Not only will this make your project more discoverable, it will enable better research across the landscape of open education related projects.

For other ways to get involved, see OpenEd’s Get Involved space.


New web metadata validator released

Asheesh Laroia, January 6th, 2009

(This was originally published on CC Labs.)

This past summer, Hugo Dworak worked with us (thanks to Google Summer of Code) on a new validator. This work was greatly overdue, and we are very pleased that Google could fund Hugo to work on it. Our previous validator had not been updated to reflect our new metadata standards, so we disabled it some time ago to avoid creating further confusion. The textbook on CC metadata is the “Creative Commons Rights Expression Language”, or ccREL, which specifies the use of RDFa on the web. (If this sounds like keyword soup, rest assured that the License Engine generates HTML that you can copy and paste; that HTML is fully compliant with ccREL.) We hoped Hugo’s work on a new validator would let us offer a validator to the Creative Commons community so that publishers can test their web pages to make sure they encode the information they intended.

Hugo’s work was a success; he announced in August 2008 a test version of the validator. He built on top of the work of others: the new validator uses the Pylons web framework, html5lib for HTML parsing and tokenizing, and RDFlib for working with RDF. He shared his source code under the recent free software license built for network services, AGPLv3.

So I am happy to announce that the test period is complete, and we are now running the new code at http://validator.creativecommons.org/. Our thanks go out to Hugo, and we look forward to the new validator gaining some use as well as hearing your feedback. If you want to contribute to the validator’s development or check it out for any reason, take a look at the documentation on the CC wiki.


CC Talks With: MusicBrainz

Cameron Parkins, October 28th, 2008

We recently had the pleasure of catching up with Robert Kaye, “lead geek” at MusicBrainz, a community music database that “attempts to create a comprehensive music information site.” Kaye fills us in on what is happening at MusicBrainz, including extensive background on the project, how they use CC licenses, and their goal to add broader support for classical music.

Where does MusicBrainz fit in the open content ecology?

MusicBrainz plays an important role in blazing the path for open databases. We know how to play with open source and music, and we have few examples of how to work with open structured data. We work hard to make our data useful and available to people, as we believe that Metcalfe’s law also applies to data. Thus, getting lots of people to use our data makes MusicBrainz vastly more useful and valuable. With that in mind, we want to be the de-facto standard for music metadata in the open content ecology.
Read More…


Adobe continues to do the right thing with XMP

Mike Linksvayer, September 12th, 2008

XMP is the format Creative Commons recommends for embedding metadata (such as licensing information) in most media file types. Frankly there isn’t much competition — embedded metadata is poorly supported, formats are balkanized, and nobody save Adobe (XMP’s developer) has had the willingness to work on a problem that can only be solved over many years (programmers have to build support into software people actually use) and a platform to drive initial adoption.

Fortunately Adobe’s long term efforts are paying off. More and more software supports reading and embedding XMP with more and more file formats. This only makes sense, as more and more people have the need to manage huge media collections that previously only media houses such as ad agencies needed.

Equally fortunately, Adobe continues to make the right moves toward keeping XMP open, ensuring it continues progressing toward being the universal means of embedding metadata in media files. Last year Adobe released the XMP software development kit under the permissive BSD software license. This directly enabled Creative Commons’ liblicense to use some of this code.

Now Adobe’s XMP product manager Gunar Penikis blogs that Adobe has posted a royalty free public patent license for XMP:

This will further remove barriers to the adoption and use of XMP and a metadata standard across our partner solutions and ecosystems. Which is really exciting because better interoperability results in a better customer experience when media is exchanged across applications and services.

This is especially welcome news for the free/open source software world, including again, the code Creative Commons develops — software patents can block development and distribution of open code (e.g., see media codecs), so it is reassuring that Adobe has added a patent license to its openness strategy for XMP.

Thanks to Adobe! Incidentally, Gunar Penikis spoke about XMP at the CC technology summit held in June. See the summit page for slides and video.


RDFa goes to W3C Proposed Recommendation

Mike Linksvayer, September 5th, 2008

Yesterday RDFa reached Proposed Recommendation status at the World Wide Web Consortium, the final stage before becoming a W3C Recommendation.

Using RDFa, one can make data in web pages rendered for humans also readable in a meaningful way by computers. This is important to Creative Commons, as we have always seen the promise of the Semantic Web to describe licenses and make works more findable and reusable. Ironically, it has always been difficult to bring the Semantic Web to the World Wide Web we’re all used to using and loving. RDFa is a crucial bridge to bring these worlds together.

Creative Commons, primarily through the efforts of Ben Adida, our W3C Representative (see a recent interview with him at the Yahoo! Search Blog), has been a major contributor to the development of RDFa since 2004. I strongly suspect the standard would have taken more than four years without CC’s contributions.

You can read an in-depth description of some of the early CC use cases for RDFa in a paper we released earlier this year, including machine-readable attribution and description of images and other resources included in web pages.

CC’s technology team, led by Nathan Yergler, is also a leading implementer of RDFa, which is now used throughout our open source projects, including our license chooser and license deeds.

Check out the RDFa wiki for tutorials, examples, and code.


Semantic Media Wiki Quick Reference Guide

Fred Benenson, August 25th, 2008

Creative Commons uses Semantic Media Wiki for both our external wiki and our internal task and project management system.

As opposed to a normal wiki where text is “flat”, the text and data inside an SMW can be structured in sophisticated ways that allow for meaningful querying of the knowledge statements in the corpus. To give a more concrete example, a list of United States Vice Presidents by longevity must be maintained by humans on Wikipedia, whereas a similar list can be automatically generated via a query inside a semantic media wiki (supposing there are pages about the vice presidents in the first place). Or in the case of Creative Commons’ wiki, we use SMW to store information about case studies, which can then be recalled in interesting ways, such as listing all Creative Commons licensed projects that use text and are based in Australia. You can see the exact query used to generate that list by clicking “edit query” on the page. Try changing the country to something else to get a feel for how the search works.

One final aspect about SMW that makes it relevant to CC’s work is that it automatically creates RDF (the language of the semantic web) statements about pages. This gives any semantic media wiki a machine-readable output that allows for easy parsing by machines.

Sound familiar? That’s because Creative Commons encourages the use of RDFa to express license information about objects in webpages. RDFa is meant to be the “human readable” version of RDF which also contains machine-readable statements. Think of it as extra-fancy XHTML with semantic sparkle dust.

Despite some real leaps in user-interface design for SMWs, editing and querying them remains a little confusing. Yaron Koren, the developer behind the essential Semantic Forms extension, has created a “quick reference guide” that he’s released under Creative Commons’ Attribution license.

Yaron has made the guide available in three formats so that it is easy to print (pdf), remix (svg), and read (png).


