Yesterday, Europeana — Europe’s digital library, museum and archive, and the first major adopter of the Public Domain Mark for works in the worldwide public domain — published and made available The Europeana Licensing Framework using the CC0 public domain dedication. The licensing framework encompasses and is a follow-on to the recent Data Exchange Agreement that Europeana adopted in September, and which Europe’s national librarians publicly supported weeks later.
In Europeana’s own words, the licensing framework “underpins Europeana’s Strategic Plan” for 2011-2015:
“The goal of the Europeana Licensing Framework is to standardize and harmonize rights-related information and practices. Its intention is to bring clarity to a complex area, and make transparent the relationship between the end-users and the institutions that provide data.”
“Users need good and reliable information about what they may do with [content]. Whether they can freely re-use it for their educational, creative or even commercial projects or not. The Europeana Licensing Framework therefore asks data providers to provide structured rights information in the metadata they provide about the content that is accessible through Europeana. Doing so makes it easier for users to filter content by the different re-use options they have – by ‘public domain’, for example and hence easier for users to comply with licensing terms.”
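As a rough illustration of why structured rights information makes filtering easier, the sketch below selects records by a rights label. The record layout, the `rights` field name, and the label strings are assumptions made up for this example; they are not Europeana's actual data model.

```python
# Hypothetical metadata records; field names and values are illustrative only.
records = [
    {"title": "Item A", "rights": "public-domain"},
    {"title": "Item B", "rights": "cc-by-sa"},
    {"title": "Item C", "rights": "rights-reserved"},
]


def filter_by_rights(items, allowed):
    """Return the records whose rights label is in the allowed set."""
    allowed = set(allowed)
    return [item for item in items if item.get("rights") in allowed]


# Keep only the records a user may freely re-use.
public_domain = filter_by_rights(records, {"public-domain"})
print([item["title"] for item in public_domain])
```

Without a consistent, machine-readable rights field, the same filter would have to guess from free-text descriptions.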
The framework supports re-use of data and content through CC legal tools (CC0 public domain dedication, the Public Domain Mark, and CC BY-SA), providing guidelines for their appropriate application. Download the Europeana Licensing Framework (pdf) or peruse the full set of resources at Europeana Connect.
Relatedly, see Europeana’s white paper no. 2 published last month, The Problem of the Yellow Milkmaid: A Business Model Perspective on Open Metadata (pdf). The white paper “explore[s] in detail the risks and rewards of open data from different perspectives” after “extensive consultation with the heritage sector, including dozens of workshops.” It opens:
“‘The Milkmaid’, one of Johannes Vermeer’s most famous pieces, depicts a scene of a woman quietly pouring milk into a bowl. During a survey the Rijksmuseum discovered that there were over 10,000 copies of the image on the internet—mostly poor, yellowish reproductions. As a result of all of these low-quality copies on the web, according to the Rijksmuseum, “people simply didn’t believe the postcards in our museum shop were showing the original painting. This was the trigger for us to put high-resolution images of the original work with open metadata on the web ourselves. Opening up our data is our best defence against the ‘yellow Milkmaid’.””
The Creative Commons 2011 Global Summit was a remarkable success, bringing together CC affiliates, board, staff, alumni, friends and stakeholders from around the world. Among the ~300 attendees was an impressive array of legal experts. Collectively, these experts brought diversity and depth of legal expertise and experience to every facet of the Summit, including knowledge of copyright policy across the government, education, science, culture, and foundation sectors. We designed the Summit’s legal sessions (pdf) to leverage this expertise to discuss our core license suite and the 4.0 license versioning process.
The 3.0 License Suite
The current 3.0 license suite has been in service since 2007, and is faring extraordinarily well for many important adopters. Notably, government adoption and promotion of the licenses for releasing public sector information, content and data has increased in the intervening four years, predominantly leveraging the 3.0 licenses. From the New Zealand Government Open Access and Licensing Framework, to the explicit acceptance of CC BY by the Australian government as the default license for Australian government materials, to the official websites of heads of state, to numerous open data portals, governments are increasingly looking to and depending on CC licenses as the preferred mechanism for sharing information.
As robust as the 3.0 continues (and will continue) to prove for many adopters, we also have learned that limitations exist for other would-be adopters that inhibit use of our licenses. These limitations set the stage in some instances for the creation of custom licenses that are at best confusing to users and at worst incompatible with some of CC’s licenses. One of the more compelling limitations driving the need for versioning now is the existence of sui generis database rights throughout the European Union, and the treatment of those rights in CC’s 3.0 licenses. But other limitations also exist for important categories of those would-be adopters. For example, although 55+ jurisdictions have ported some version of the CC licenses to their jurisdictions, there remain many others that want to leverage CC licenses but are without necessary resources to undertake the time-intensive process porting demands, and do not wish to use the international (unported) suite however suitable those licenses are for adoption worldwide.
As well as our 3.0 licenses operate for many, we recognize as license stewards that there is room to improve if we are to avoid risking a fragmentation of the commons. Of course, it bears emphasizing here and throughout the versioning process that 3.0 license adopters can continue to count on our stewardship and support for that suite, just as we have done with all prior versioning efforts. We are committed to remaining alert to revisions that might undermine or compromise pre-4.0 license implementations and frameworks, and will now more than ever look to the expertise and dedication of our affiliates to assist us with the process and the subsequent adoption efforts.
Beginning the 4.0 Process
Against this backdrop, Professor Mike Carroll, CC board member and founder, led a discussion around CC’s plans for beginning the versioning of its licenses from the current 3.0 version to 4.0. His remarks provided a detailed explanation of the reasons leading CC to version in 2012, given the limitations for several adopters in the existing suite, the many opportunities at hand, and the current environment of accelerating adoption by governments and others.
CC’s goals and those of our affiliate community for 4.0 are ambitious, and include:
- Internationalization — position our licenses to ensure they are well received, readily understood, and easily adopted worldwide;
- Interoperability — maximize interoperability between CC licenses and other licenses to reduce friction within the commons, promote standards and stem license proliferation;
- Long-lasting — anticipate new and changing adoption opportunities and legal challenges, allowing the new suite of licenses to endure for the foreseeable future;
- Data/PSI/Science/Education — recognize and address impediments to adoption of CC by governments as well as other important, publicly-minded institutions in these and other critical arenas; and
- Supporting Existing Adoption Models and Frameworks — remain mindful of and accommodate the needs of our existing community of adopters leveraging pre-4.0 licenses, including governments but also other important constituencies.
These goals for 4.0 are not arbitrary — rather, we have recognized them as important levers for the CC license suite to support achieving CC’s mission and vision.
Addressing Restrictions Beyond Copyright – sui generis database rights and more
By design, CC licenses are intended to operate as copyright licenses, granting conditional permission to reuse licensed content in ways that would otherwise violate copyright. Once applied, wherever copyright exists to restrict reuse, the CC license conditions are triggered, but not otherwise. Yet what about that category of rights that exist close to, or perhaps even overlap with, copyright, making it difficult to exercise rights granted under CC licenses without additional permissions? This question drew the focus of Summit attendees across several of the legal sessions, particularly in the context of sui generis database rights that exist in the European Union and a few other places as a result of free trade and other agreements. Participants evaluated the practical problems associated with continuing CC’s existing policy of waiving CC license conditions (BY, NC, SA and ND, as applicable) in the 3.0 EU ported licenses where only sui generis database rights are implicated. Among others, Judge Jay Yoon of CC Korea provided a practical perspective on the challenges associated with CC’s current policy.
Sui generis database rights are widely criticized as bad policy, and are unproven in practice to deliver the economic benefits originally promised. While these views were shared by the vast majority of affiliates attending the Summit, many also agreed that a reconsideration of CC’s current policy is appropriate, and that we should shift to licensing those rights in 4.0 on the same terms and conditions as copyright. This change in policy would be pursued in the greater interest of facilitating reuse, meeting the expectations of licensors and users, and growing the commons.
As foreshadowed earlier this year, and now with support from CC’s affiliate network, CC intends to pursue this course in 4.0, absent as-of-yet-unidentified, unacceptable consequences. Importantly, we will take great care to ensure that by licensing these rights where they exist we do not create new or additional obligations where such rights do not exist.
As the steward of our licenses and one of several stewards of the greater commons (including the Free Software Foundation and the Open Knowledge Foundation), we remain mindful and take with utmost seriousness the risks associated with shifting course. We fully intend to (and expect to be held accountable for) strengthening our messaging to policymakers about the dangers of maintaining and expanding these rights within the EU and beyond, and of creating new related rights. We also plan to develop ample education for users to help avoid over-compliance with license conditions in cases where they do not apply.
Further Internationalization of the CC Licenses
Until version 3.0, the CC licenses had been drafted against U.S. copyright law and referred to as the “generic” licenses. At version 3.0, that changed as we made our first attempt to draft a license suite utilizing the language of major international copyright treaties and conventions. While a vast improvement over pre-3.0 versions, there remains ample opportunity to improve to reach those who cannot or would prefer not to port. Thus, one of our major objectives with the process will be to engage with CC’s knowledgeable affiliates around the globe with the intention of crafting a license suite that is another step further removed from its U.S. origins, and more reflective of CC’s status as an international organization with a global community and following. This focal point will impact the versioning process in several respects, and will require the engagement and focus of our affiliate network, other legal experts and the broader community. But it will also impact our work post publication, where the legal expertise of our affiliates will become still more relevant to adoption efforts and implementations.
As part of this discussion at the Summit, Paul Keller of CC Netherlands and Kennisland led a robust conversation on the wisdom of the CC license porting process, and Massimo Travostino of CC Italy and the NEXA Center gave a presentation on the legal and drafting issues involved with creating global licenses.
Defining Noncommercial; License Enforceability
The legal program also included a presentation by Mike Linksvayer on the definition and future of noncommercial and an update from Andres Guadamuz on CC license enforceability. While a decision about retaining or modifying the definition of NC in 4.0, and branding thereof, remains open, any change has a high barrier to demonstrate it would be a net benefit to the commons, given the broad use and acceptance of CC licenses containing the NC term. And CC’s licenses in court continue their strong enforceability record, most recently with a favorable decision in September 2011 that enforced BY-SA in Germany. We plan to take caution when drafting 4.0 to avoid making changes that could compromise this record.
Next steps in the versioning process will be announced shortly to this blog and the CC license discuss list. Subscribe to stay apprised of future announcements about the 4.0 process and how you can contribute.
Thanks to everyone who contributed to the license discussions and helped make the Summit a success!
As part of our blog series for the European Public Sector Information Platform (ePSIplatform) on the role of Creative Commons in supporting the re-use of public sector information, we have researched and published the State of Play: Public Sector Information in the United States.
Beth Noveck, former United States deputy CTO of open government and now a Professor of Law at New York Law School, provides an overview of the report, noting that it is “an excellent report on open data in the United States” and “provides a concise and accurate primer (with footnotes) on the legal and policy framework for open government data in the US.” Abstract:
State of Play: Public Sector Information in the United States
This topic report examines the background of public sector information (PSI) policy and re-use in the United States, describing the federal, state and local government PSI environments. It explores the impact of these differences against the European framework, especially in relation to economic effects of open access to particular types of PSI, such as weather data. The report also discusses recent developments in the United States relating to PSI re-use, such as Data.gov, the NIH Public Access Policy, and new open licensing requirements for government funded educational resources.
The report is published on the ePSIplatform and also on our wiki (pdf). It complements our previous report, Creative Commons and Public Sector Information: Flexible tools to support PSI creators and re-users; both are available under CC Attribution.
We’ve been working on a series of blog posts for the European Public Sector Information Platform (ePSIplatform) on the role of Creative Commons in supporting the re-use of public sector information. In addition, we’ve published a topic report. The abstract is posted below.
Creative Commons and Public Sector Information: Flexible tools to support PSI creators and re-users
Public sector information is meant for wide re-use, but this information will only achieve maximum possible impact if users understand how they may use it. Creative Commons tools, which signify availability for re-use to users and require attribution to the releasing authority, are ideal tools for the sharing of public sector information. There is also increasing interest in open licenses and other tools to share publicly funded information, data, and content, including various kinds of cultural resources, educational materials, and research findings; Creative Commons tools are applicable here and recommended for these purposes too.
What does it mean to be open in a data-driven world?
On January 11, 2011, we gathered together four knowledgeable individuals who interact with data in different ways and who each understand the importance of exploring this timely question. The result was a stellar CC Salon at LinkedIn Headquarters.
You can now watch the video from the event, which included brief presentations from Internet Archive’s Peter Brantley, LinkedIn’s DJ Patil, and 3taps’ Karen Gifford, as well as a panel discussion moderated by O’Reilly Media’s Tim O’Reilly. View it now!
Also see our post today on Creative Commons tools, data, and databases.
You may have heard that data is huge — changing the way science is done, enabling new kinds of consumer and business applications, furthering citizen involvement and government transparency, spawning a new class of software for processing big data and new interdisciplinary class of “data scientists” to help utilize all this data — not to mention metadata (data about data), linked data and the semantic web — there’s a whole lot of data, there’s more every day, and it’s potentially extremely valuable.
Much of the potential value of data is to society at large — more data has the potential to facilitate enhanced scientific collaboration and reproducibility, more efficient markets, increased government and corporate transparency, and overall to speed discovery and understanding of solutions to planetary and societal needs.
A big part of the potential value of data, in particular its society-wide value, is realized by use across organizational boundaries. How does this occur (legally)? Facts themselves are not covered by copyright and related restrictions, though the extent to which this is the case (e.g., for compilations of facts) varies considerably across jurisdictions. Many sites give narrow permission to use data via terms of service. Much ad hoc data sharing occurs among researchers. And increasingly, open data is facilitated by sharing under public terms, e.g. CC licenses or the CC0 public domain dedication.
CC tools, data, and databases
Since soon after the release of version 1.0 of the CC license suite (December, 2002) people have published data and databases under CC licenses. MusicBrainz is an early example (note their recognition that parts of the MusicBrainz database are strictly factual, so in the public domain, while other parts are licensable). Other examples include Freebase, DBpedia (structured information extracted from Wikipedia), OpenStreetMap, and various governments (Australia in particular has been a leader).
More recently CC0 has gained wide use for releasing data into the public domain (to the extent it isn’t already), not only in science, as expected, but also for bibliographic, social media, public sector data, and much more.
With the exception of strongly recommending CC0 (public domain) for scientific data, Creative Commons has been relatively quiet about use of our licenses for data and databases. Prior to coming to the public domain recommendation for scientific data, we published a FAQ on CC licenses and databases, which is still informative. It is important to recognize going forward that the two are complementary: one concerns what ought be done in a particular domain in line with that domain’s tradition (and public funding sources), the other what is possible with respect to CC licenses and databases.
This is/ought distinction is not out of line with CC’s general approach — to offer a range (but not an infinity) of tools to enable sharing, while encouraging use of tools that enable more sharing, in particular where institutional missions and community norms align with more sharing. For a number of reasons, now is a good time to make clear and make sure that our approach to data and databases reflects CC’s general approach rather than an exaggerated caricature:
- We occasionally encounter a misimpression that CC licenses can’t be used for data and databases, or that we don’t want CC licenses to be used for data and databases. This is largely our fault: we haven’t actively communicated about CC licenses and data since the aforementioned FAQ (until very recently), meaning our only message has been “public domain for scientific data” — leaving extrapolation to other fields to the imagination.
- Our consolidation of CC education and science “divisions” has facilitated examinations of domain-specific policies, and increased policy coherence.
- Ongoing work and discussions with CC’s global affiliate network; many CC affiliates are deeply involved in promoting open public sector information, including data.
- The existence and increasing number of users of CC licenses for data and databases (see third paragraph above).
- A sense of overwhelming competitive threat from non-open data; the main alternative to public domain is not sharing at all — absence of a strong CC presence, except for a normative one in science, creates a correspondingly large opportunity cost for society due to “failed sharing” (e.g., under custom, non-interoperable terms) and lack of sharing.
- A long-term shift in understanding of CC’s role: from CC as purveyor of a variety of tools and policies to CC as steward of the commons, and thus need to put global maximization, interoperability and standards before any single tool or policy idea that sounds good on its own, and to encourage (and sometimes push) producers of data and databases to do the same.
- We’ve thought and learned a lot about data and databases and CC’s role in open data. In 2002 data was not central to CC’s programs; now (in keeping with the times) it is.
- Ongoing confusion among providers and users of data about the copyrightability of data (it depends) and rights that may or may not exist as a result of how the data is compiled and distributed — the database.
- Later in 2011 we expect to begin a public requirements process for version 4.0 of our license suite. At the top level, we know that an absolute requirement will be to make sure the 4.0 licenses are the best possible tools for legally sharing data (where public domain is not feasible, for whatever reason).
One other subtlety should be understood with respect to current (3.0) CC licenses. Data and databases are often copyrightable. When licensed under any of our licenses, the license terms apply to copyrightable data and databases, requiring adaptations that are distributed be released under the same or compatible license terms, for example, when a ShareAlike license is used.
Databases are covered by additional rights (sometimes called “sui generis” database rights) in Europe (similar database rights exist in a few other places). A few early (2.0) European jurisdiction CC license “ports” licensed database rights along with copyright. Non-EU jurisdiction and international CC licenses have heretofore been silent on database rights. We adopted a policy that version 3.0 EU jurisdiction ports must waive license requirements and prohibitions (attribution, share-alike, etc) for uses triggering database rights — so that if the use of a database published under a CC license implicated only database rights, but not copyright, the CC license requirements and prohibitions would not apply to that use. The license requirements and prohibitions, however, continued to apply to all uses triggering copyright.
CC licenses other than EU jurisdiction 3.0 ports are silent on database rights: databases and data are licensed (i.e., subject to restrictions detailed in the license) to the extent copyrightable, and if data in the database or the database itself are not copyrightable the license restrictions do not apply to those parts (though they still apply to the remainder). Perhaps this differential handling of database rights is not ideal, given that all CC licenses (including jurisdiction ports) apply worldwide and ought be easily understandable. However, those are not the only requirements for CC tools — they are also intended to be legally valid worldwide (for which they have a good track record) and produce outcomes consistent with our mission.
These requirements mandate the caution with which we approach database rights in our license suite. In particular, database rights are widely recognized to be bad policy, and an instance of a general class of additional restrictions that are harmful to the commons, and thus harmful to collaboration, innovation, participation, and the overall health of the Internet, the economy, and society.
If database rights were to be somehow “exported” to non-EU jurisdictions via CC licenses, this would be a bad outcome, contrary not only to our overall mission, but also our policy that CC licenses not effectively introduce restrictions not present by default, e.g., by attempting to make license requirements and prohibitions obviate copyright exceptions and limitations (see “public domain” and “other rights” on our deeds, and the relevant FAQ). Simply licensing database rights, just like copyright, but only to the extent they apply, just like copyright, is an option — but any option we take will be taken very carefully.
What does all this mean right now?
(1) We do recommend CC0 for scientific data — and we’re thrilled to see CC0 used in other domains, for any content and data, wherever the rights holder wants to make clear such is in the public domain worldwide, to the extent that is possible (note that CC0 includes a permissive fallback license, covering jurisdictions where relinquishment is not thought possible).
(2) However, where CC0 is not desired for whatever reason (business requirements, community wishes, institutional policy…) CC licenses can and should be used for data and databases, right now (as they have been for 8 years) — with the important caveat that CC 3.0 license conditions do not extend to “protect” a database that is otherwise uncopyrightable.
(3) We are committed to an open transparent discussion and process around making CC licenses the best possible tools for sharing data (including addressing how they handle database rights), consistent with our overall mission of maximizing the value of the commons, and cognizant of the limitations of voluntary tools such as CC’s in the context of increasingly restrictive policy and overwhelming competitive threat from non-sharing (proprietary data). This will require the expertise of our affiliates and other key stakeholders, including you — we haven’t decided anything yet and will not without taking the time and doing the research that stewards of public infrastructure perform before making changes.
(4) is a corollary of (2) and (3): use CC licenses for data and databases now, participate in the 4.0 process, and upgrade when the 4.0 suite is released, or at least do not foreclose the possibility of doing so.
Regarding discussion: please subscribe to cc-licenses for a very low volume (moderated) list, intended only for specific proposals to improve CC licenses, and announcements of versioning milestones. If you’re interested in a more active, ongoing (unmoderated) discussion, join cc-community. You might also leave a comment on this post or use other means of staying in touch. We’re also taking part in a variety of other open data discussions and conferences.
By the way, what is data and what are databases?
Oh right, those questions. I won’t try to answer too seriously, for that would require legal, technical, and philosophical dissertations. All information (including software and “content”) can be thought of as data; more pertinently, data might be limited to (uncopyrightable) facts, or it may include any arrangement of information, e.g., in rows, tables, or graphs, including with (copyrightable) creativity, and creative (copyrightable) arrangements of information. Some kinds of arrangements and collections of information are characterized as databases.
Data and databases might contain what one would think of as content, e.g., prose contained in a database table. Data and databases might be contained in what one would think of as content, e.g., the structured information in Wikipedia, assertions waiting to be extracted from academic papers, and annotated content on the web, intended first for humans, but also structured for computers.
(Note that CC has been very interested in and worked toward standards for mixing content and data — apparently taking off — because such mixing is a good method for ensuring that content and data are kept accurate, in sync, and usable — for example, licensing and attribution information.)
All of this highlights the need for interoperability across “content” and “data”, which means compatible (or the same) legal tools. That is a good reason, indeed a mandate, for ensuring that CC licenses are the best tools for data, databases and content. Thanks in advance for your help (constructive criticism counts, as does simply using our tools; experience is the best guide) in fulfilling this mandate.
CERN Library releases its book catalog into the public domain via CC0, and other bibliographic data news
CERN, the European Organization for Nuclear Research that is home to the Large Hadron Collider and birthplace of the web, has released its book catalog into the public domain using the CC0 public domain dedication. This is not the first time that CERN has used CC tools to open its resources; earlier this year, CERN released the first results of the Large Hadron Collider experiments under CC licenses. In addition, CERN is a strong supporter of CC, having given corporate support at the “creator” level, and is currently featured as a CC Superhero in the campaign, where you can join them in the fight for openness and innovation!
Jens Vigen, the head of CERN Library, says in the press release,
“Books should only be catalogued once. Currently the public purse pays for having the same book catalogued over and over again. Librarians should act as they preach: data sets created through public funding should be made freely available to anyone interested. Open Access is natural for us, here at CERN we believe in openness and reuse… By getting academic libraries worldwide involved in this movement, it will lead to a natural atmosphere of sharing and reusing bibliographic data in a rich landscape of so-called mash-up services, where most of the actors who will be involved, both among the users and the providers, will not even be library users or librarians.”
In related news, the Cologne-based libraries have made the 5.4 million bibliographic records they released into the public domain earlier this year, also via CC0, available in various places. See the hbz wiki, lobid.org (and their files on CKAN), and OpenDATA at the Central Library of Sport Sciences of the German Sports University in Cologne. For more information, see the case study.
The German Wikipedia has also used CC0 to dedicate data into the public domain; specifically, their PND-BEACON files are available for download. Since Wikipedia links out to quite a number of external resources, and since a lot of articles link to the same external resources, PND-BEACON files are the German Wikipedia’s way of organizing the various data. “In short a BEACON file contains a 1-to-1 (or 1-to-n) mapping from identifiers to links. Each link consists of at least an URL with optionally a link title and additional information such as the number of resources that are available behind a link.” Learn more from the English description of the project.
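The mapping described in the quote can be pictured as a small lookup table. The sketch below is an in-memory model only, assuming that each identifier maps to one or more links carrying a URL, an optional title, and an optional resource count. The class name, field names, identifiers and URLs are all hypothetical, and the code does not reproduce the actual BEACON file syntax.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Link:
    url: str                      # required: every link has at least a URL
    title: Optional[str] = None   # optional link title
    count: Optional[int] = None   # optional number of resources behind the link


# Hypothetical identifiers and URLs showing a 1-to-1 and a 1-to-n mapping.
mapping: Dict[str, List[Link]] = {
    "id-001": [Link("https://example.org/a")],
    "id-002": [
        Link("https://example.org/b", title="Catalogue entry", count=3),
        Link("https://example.net/b"),
    ],
}


def links_for(identifier: str) -> List[Link]:
    """Return all links recorded for an identifier (empty list if unknown)."""
    return mapping.get(identifier, [])


print([link.url for link in links_for("id-002")])
```

Keeping a list of links per identifier covers both the 1-to-1 and the 1-to-n case with the same structure.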
In addition to changing their default licensing policy from CC BY-NC to CC BY, the University of Michigan has enabled even greater sharing and reuse by releasing more than half a million bibliographic records into the public domain using the CC0 public domain dedication. Following on the heels of the British Library, which just released three million bibliographic records into the public domain, the University of Michigan Library has offered its Open Access bibliographic records for download; as of November 17, 2010, the set contains 684,597 records.
The University of Michigan Library has always been particularly advanced with regard to open content licensing, the public domain, and issues of copyright in the digital age. To learn more, see John Wilkin’s post and help to improve the case study.
In addition, ever since we rolled out the CC0 public domain dedication, CC0 use for data has been on the increase. Check out the wiki for all current uses of CC0, and feel free to add case studies of any that are missing.
The British Library has released three million records from the British National Bibliography into the public domain using the CC0 public domain waiver. The British National Bibliography contains data on publishing activity from the United Kingdom and the Republic of Ireland since 1950. JISC OpenBibliography has made this set downloadable at CKAN; in addition, the Internet Archive also offers the data for download.
This is a tremendous move on behalf of the British Library and the JISC OpenBibliography project, and we would like to congratulate them on their contributions to open data. From the JISC OpenBibliography project blog,
“Agreements such as these are crucial to our community, as developments in areas such as Linked Data are only beneficial when there is content on which to operate. We look forward to announcing further releases and developments, and to being part of a community dedicated to the future of open scholarship.”
For more information, see the case study on the British Library, and help us add to and improve it!
The Design for America contest is the Sunlight Foundation’s latest effort to modernize the United States’ information architecture and presentation. Their goal is “to make government data more accessible and comprehensible to the American public” by encouraging designers, artists, and programmers to reimagine government websites and to visualize government data and processes.
Provided you meet eligibility requirements, you can submit work to categories in Data Visualization, Process Transparency, and Redesigning the Government. Contests range from visualizing government data to redesigning government websites. The top prize in each contest is $5,000.