This post was written with invaluable assistance from the CC legal and policy teams.
Text and data mining (TDM) is becoming an increasingly important scientific technique for analyzing large amounts of data. The technique is used to uncover both existing and new insights in unstructured data sets that typically are obtained programmatically from many different sources.
A few of the innovative examples include GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles; improving human curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database; and discovering a new link between genes and osteoporosis.
While the science and technology of TDM are complex enough, involving information retrieval (IR), optical character recognition (OCR), and natural language processing (NLP), the legal complications are, sadly, equally dizzying. The legal status of TDM is unclear at best, both because there is a multitude of techniques to engage in TDM, and because the legal implications of those techniques vary from jurisdiction to jurisdiction. This makes cross-national collaboration, integral to science, difficult. For example, TDM is generally considered not to implicate copyright in the U.S. There are several theories as to why TDM falls outside copyright, but the most obvious is that it uses copyrighted material for a transformative purpose and is therefore a fair use. Judge Baer, writing in Authors Guild, Inc., et al. v. HathiTrust, et al. (Case 1:11-cv-06351-HB), stated:
“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”
Judge Baer goes on to state:
“I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts.”
The clarity, however, is far from universal as the situation outside the U.S. gets muddy. While there have been a few welcome developments in the U.K., the copyright laws of many other countries have little to no clarity on whether TDM falls outside of the reach of copyright and related laws. Where TDM does implicate copyright, the license status of the original material can make automated access and analysis very complicated, requiring additional checks to ensure any material is only being used as permitted by the license. And, even where the relevant licenses are free and open, and conducive to TDM, contractual agreements between research institutions and publishers, who are often the gatekeepers of the corpora, can create significant hurdles.
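To make the "additional checks" concrete, here is a minimal sketch of a license gate that a mining pipeline might run before analyzing a corpus. Everything in it is an assumption made for illustration: the `Document` record, the `split_by_license` helper, and above all the allow-list, which uses SPDX-style identifiers but is not legal advice. Whether a given license (or no license at all) permits mining depends on the jurisdiction and on any contracts in play.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical allow-list for illustration only. Which licenses actually
# permit TDM is a legal question that varies by jurisdiction.
MINABLE_LICENSES = {"CC0-1.0", "CC-BY-4.0", "CC-BY-SA-4.0"}


@dataclass
class Document:
    """A hypothetical corpus record; real metadata will look different."""
    identifier: str
    license_id: Optional[str]  # None means the license is unknown
    text: str


def split_by_license(docs: List[Document]) -> Tuple[List[Document], List[Document]]:
    """Separate documents cleared for mining from those needing human review."""
    allowed, needs_review = [], []
    for doc in docs:
        if doc.license_id in MINABLE_LICENSES:
            allowed.append(doc)
        else:
            # Unknown or unlisted licenses are set aside, not silently mined.
            needs_review.append(doc)
    return allowed, needs_review


if __name__ == "__main__":
    corpus = [
        Document("article-1", "CC-BY-4.0", "Gene A is linked to disease B."),
        Document("article-2", None, "License metadata is missing here."),
    ]
    allowed, needs_review = split_by_license(corpus)
    print([d.identifier for d in allowed])
    print([d.identifier for d in needs_review])
```

The sketch deliberately sends anything unknown to review rather than guessing, which is exactly the kind of friction that clearer law would remove.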
In a comment on the proposed U.K. exception for information mining, both iCommons and the Open Knowledge Foundation (OKFN) supported the U.K. Government’s opinion that it is inappropriate for “Certain activities of public benefit such as medical research obtained through text mining to be in effect subject to veto by the owners of copyrights in the reports of such research, where access to the reports was obtained lawfully.” PLOS opined, “Enabling content mining is a core part of the value offering for Open Access publication services.” In its response to the EU copyright review, LIBER stated, “All exceptions related to education, learning and access to knowledge to be made mandatory. In particular, we would like to see a specific exception for text and data mining for all research purposes.” OKFN’s Working Group on Open Access stated:
“We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes.”
Support for text and data mining under the banner of “The right to read is the right to mine” has been demonstrated by other organizations, including in declarations by Copyright for Creativity (July 2013) and the International Federation of Library Associations and Institutions (December 2013). If we as a society wish to realize the incredible potential of text and data mining, the practice should not be controlled through contractual terms or licensing.
Instead of relying on contractual restrictions or licensing to engage in text and data mining, non-consumptive uses of texts should be expressly eliminated from the reach of copyright and contract. The UK’s Hargreaves Report (PDF, p. 47) suggested the adoption of an exception to copyright law for non-consumptive uses, which are “uses of a work enabled by technology which does not trade on the underlying creative and expressive purpose of the work.”
Most recently, the UK copyright reform legislation introduced changes that make it easier to engage in TDM for non-commercial purposes, allow storing of the corpus locally as long as it remains protected from general public access, and, perhaps most importantly, disallow contractual negotiations that would make it difficult to conduct TDM.
The above sentiments are laudable, and copyright reforms friendly to TDM are very important, and we support such efforts. However, we believe the more knowledgeable potential users of TDM are about the technology and related issues, the better they will be able to negotiate conditions that make their research easy and efficient. Hence, we want to push forward with education and awareness building as a bottom-up effort.
Building Bottom-Up Support
We are working with the ContentMine team to develop an agenda for a workshop that would provide training in TDM and educate the participants regarding the legal considerations through hands-on exercises. We will introduce the topic, the tools and techniques, tackle a specific problem, and then use that to expose researchers to the legal complications that they may encounter in conducting their research and the legal considerations they should keep in mind when choosing a license for their works. We have three objectives for this series of workshops:
- Introduce participants to the basic tools and techniques of text and data mining (TDM);
- Make participants aware of the legal intricacies of TDM and the implications of choosing the right licenses that enable TDM for downstream users;
- Nurture a community of practice whose members may draw upon each other for continued help.
To be clear, we do not intend the workshop to be a detailed and comprehensive training in TDM, and it is certainly not a replacement for expertise in this deep and comprehensive technique. Instead, the workshop is designed to be both an introduction to basic technical and legal concepts and an opportunity to network with experts and novices interested in the field. We hope participants intending to use TDM for their work will be better informed when seeking collaboration with TDM experts.
The first instance of this workshop will be held at the 2014 Open Knowledge Festival. We hope to follow it with one in Nairobi in Aug 2014 at the International Workshop on Open Data for Science and Sustainability in Developing Countries (OpenDataSSDC) organized by the CODATA Task Group on Preservation of and Access to Scientific and Technical Data in Developing Countries (CODATA PASTD), and one possibly at SciDataCon in New Delhi in Nov 2014. We hope to make these workshops a recurring event, building a roster of interesting exercises and problems to solve, and constantly improving the content based on audience feedback and ongoing research.
In cooperation with computing, legal and library experts, we will adapt the workshop agenda to make it more suitable and relatable to the host institutions. Our aim is to reach communities of researchers in countries that are otherwise under-represented in the global conversation on open science and data. We have identified researchers, and will continue to identify more, on both the technical and legal sides, with whom we intend to start building a network. If you are working with TDM or intend to work with TDM, and have expertise either in its technology or in related legal issues specific to your jurisdiction, please contact us.
We also intend to develop a community of practice for TDM, either standalone or via existing platforms such as StackExchange, and will utilize online resources such as forums, mailing lists, and a roster of technical, legal and institutional experts available to provide assistance with TDM.
“Why am I joining CC? Because its success is so vital, and I want to ensure we succeed. Creativity, knowledge, and innovation need a public commons – a collection of works that are free to use, re-use, and build upon – the shared resources of our society. The restrictions we place on copyright, like fair use and the public domain, are an acknowledgement that all creativity and knowledge owe something to what came before.”
We’re excited to announce the launch of the Open Policy Network, a coalition of organizations committed to advancing policies that require open licenses for publicly funded materials. Find out how to get involved.
Last week, Lawrence Lessig won a lifetime achievement Webby Award for his work as co-founder of Creative Commons. Have you heard his five-word acceptance speech?
The French Ministry of Culture and Communications is embracing Creative Commons licenses. Watch the beautiful new video it made with CC France to explain CC licenses.
Who is speaking up for authors who want to see their works disseminated more freely? Enter the Authors Alliance.
- Australia’s premier public scientific research institute just released over 4000 photos under CC BY. Check out the ScienceImage library.
- The US White House’s new Open Data Action Plan embraces CC0 for open data.
- Are there too many video games about zombies? The organizers of the Public Domain Jam think so.
- Learn about Redactor, a new tool to replace All Rights Reserved images with CC-licensed ones.
WindTech TV, a collection of wind turbine technician training materials and simulation modules, is now available under a CC BY license. Developed as part of a National Science Foundation (NSF) Advanced Technological Education project, WindTech TV’s modules are aligned with industry standards and designed to be integrated into two-year college wind technology programs to sustain workforce development in the field of wind power.
Modules are currently being used by community colleges across the United States, and Principal Investigator Phil Pilcher wants to expand that impact through reuse by other grantees, including those that are part of the U.S. Department of Labor’s $2 billion Trade Adjustment Assistance Community College & Career Training (TAACCCT) grant program.
“WindTechTV has always been free, but we think that the CC BY license will increase usage. One of our project goals is to disseminate the materials nationwide. The CC license lets instructors and administrators know that they can use our videos as they wish when they are developing and delivering courses. Also, TAACCCT grantees who are working on alternative energy courses will now be able to reuse our video content, which should speed up development.”
Earlier this week, we kicked off the Open Policy Network. We announced that the first project within the Network is the Institute for Open Leadership. The Institute for Open Leadership is a training program to develop new leaders in education, science, public policy, and other fields on the values and implementation of openness in licensing, policies, and practices. The Institute is looking for passionate public- and private-sector professionals who are interested in learning more about openness and who wish to develop and implement an open policy in their field.
Interested applicants should review the application information and submit an application by June 30, 2014. We plan to invite about 15 fellows to participate in the first round of the Institute for Open Leadership. The in-person portion of the Institute will be held in the San Francisco Bay Area in January 2015 (TBD: either January 12-16 or January 19-23). Applications are open to individuals anywhere in the world.
A central part of the Institute will require fellows to develop and implement a capstone open policy project. The point of this project is for the fellow to transform the concepts learned at the Institute into a practical, actionable, and sustainable initiative within her/his institution. Open policy projects can take a variety of forms depending on the interests of the fellow and the field where the project will be implemented.
Questions about the Institute for Open Leadership should be directed to firstname.lastname@example.org. Our thanks to the William and Flora Hewlett Foundation and the Open Society Foundations for funds to kickstart the Institute for Open Leadership.
Yesterday marked the launch of the Authors Alliance, a nonprofit organization that supports authors who want “to harness the potential of digital networks to share their creations more broadly in order to serve the public good.”
In an interview with Publishers Weekly, Authors Alliance founder Pamela Samuelson explained that the Authors Alliance will have a few different roles. Inwardly, the group will “provide authors with information about copyrights, licensing agreements, alternative contract terms,” and other practical legal information so that they can make their works widely and openly available. And externally, the Alliance will “represent the interests of authors who want to make their works more widely available in public policy debates,” and advocate for these reforms alongside like-minded public interest organizations.
The Authors Alliance was developed by Samuelson and several of her colleagues at the University of California, Berkeley, including Molly Van Houweling, Carla Hesse, and Thomas Leonard. The Alliance also has an advisory board made up of pre-eminent scholars, writers, and public interest advocates, including several members of the Creative Commons board of directors. The Authors Alliance is now accepting new members.
The Alliance has already developed a set of copyright reform principles, outlining its vision for changes to copyright law to support authors who write to be read.
We have formed an Authors Alliance to represent authors who create to be read, to be seen, and to be heard. We believe that these authors have not been well served by misguided efforts to strengthen copyright. These efforts have failed to provide meaningful financial returns to most authors, while instead unacceptably compromising the preservation of our own intellectual legacies and our ability to tap our collective cultural heritage. We want to harness the potential of global digital networks to share knowledge and products of the imagination as broadly as possible. We aim to amplify the voices of authors and creators in all media who write and create not only for pay, but above all to make their discoveries, ideas, and creations accessible to the broadest possible audience.
The principles include:
- Further empower authors to disseminate their works.
- Improve information flows about copyright ownership.
- Affirm the vitality of limits on copyright that enable us to do our work and reach our audiences.
- Ensure that copyright’s remedies and enforcement mechanisms protect our interests.
At the core, the Authors Alliance and Creative Commons share a similar goal: to provide useful resources and tools for creators who aren’t being served well by the existing copyright system. We’re excited to work with the Alliance on issues that support authors who write to be read–and the public interest for whom these authors create.
If you follow this blog with any regularity, you’re likely already familiar with Bassel Khartabil, the Syrian CC community leader who has been imprisoned since March 2012 without having had any charges brought against him. Thursday, May 22, is Bassel’s birthday, and the third birthday he’ll be spending in prison. This Saturday, he will have been in prison for 800 days.
Today, join CC and the open community in honoring our friend Bassel:
- Post a message for Bassel on Twitter with the hashtags #freebassel and #itsaboutallofus. In particular, people are encouraged to tweet between 13:00 and 16:00 GMT.
- Submit a photo or message of support to the #freebassel Tumblr.
- Print one of these beautiful posters by artist Kalie Taylor (CC0), or create your own, and distribute it in your community.
- On May 22, Say Happy Birthday To Bassel (Global Voices)
- Letters for Bassel
- Imprisoned internet pioneer Bassel Khartabil wins Index on Censorship Digital Freedom Award
- Free Bassel, Free Culture
In late 2013, we blogged about a set of initiatives that French minister of culture and communications Aurélie Filippetti had unveiled. Together, the initiatives represented a commitment to a more creative, more open France. And they also represented a strong commitment to helping students, cultural creators, and society as a whole understand and use Creative Commons licenses, in partnership with CC France.
To help educate French-speaking populations on how to use CC licenses and find CC-licensed works, the Ministry and CC France produced this video. Watch it even if you don’t speak French: the excellent design and flow really speak for themselves.
Update: The amendment to Section 303 was adopted.
Can it be salvaged to promote public access to federally funded research?
In March we wrote about the introduction of the Frontiers in Innovation, Research, Science and Technology Act of 2014 (FIRST Act). The aim of the FIRST Act is to promote the dissemination of publicly funded scientific research. But the contentious Section 303 of the bill rolls back some of the most common policies governing existing research investments.
If passed in its current state, the FIRST Act would extend embargoes on federally funded research articles to up to three years after initial publication. This means that commercial publishers would be able to control access to publicly funded research during this time, and the public would not have free access to this research. Even the longstanding NIH Public Access Policy tolerates embargoes no longer than 12 months. We’ve said before that the public should be granted immediate access to the content of peer-reviewed scholarly publications resulting from federally funded research. Immediate access is the ideal method to optimize the scientific and commercial utility of the information contained in the articles.
The FIRST Act would allow grantees to fulfill access requirements by providing a link to a publisher’s site instead of requiring deposit in a federally-approved repository. Currently NIH research grantees must deposit in the PubMed Central repository. The reliance on publishers to make (and keep) the research available jeopardizes the long-term access and preservation of publicly-funded research in the absence of a requirement that those links be permanently preserved.
The FIRST Act would permit affected agencies to spend up to 18 additional months to develop plans to comply with the conditions of the law, thus further delaying the plans that are already being organized by federal agencies under the White House Public Access Directive and Omnibus Appropriations Act.
The bill was previously discussed in the subcommittee of the House Committee on Science, Space, and Technology. The passage of the FIRST Act with the Section 303 language as-is would harm existing as well as proposed public access policies in the United States. Today, during the full committee markup of the bill, Representatives James Sensenbrenner (R-WI) and Zoe Lofgren (D-CA) will introduce an amendment that would improve Section 303.
The Sensenbrenner/Lofgren amendment would change the embargo to 12 months, with the possibility that under certain circumstances the embargo could be extended for an additional 6 months. The amendment still does not require that federally-funded research articles be deposited in an approved repository. But it would shorten the length of time agencies get to develop and implement their public access plans. Affected agencies would need to develop a public access plan and report to Congress within 90 days. And the plans would need to be implemented within a year. One interesting piece of the amended Section 303 is that after an initial three-month planning period, the agencies would be required to submit an analysis on whether covered works should be made available under an open license.
Such report shall include an examination of whether covered works should include a royalty-free copyright license that is available to the public and that permits the reuse of those research papers, on the condition that attribution is given to the author or authors of the research and any others designated by the copyright owner.
There’s still time for you to call members of the House Science, Space and Technology Committee and tell them to support the Sensenbrenner/Lofgren Section 303 amendment. The amendment is a step in the right direction to truly supporting public access to publicly funded research in the United States.
I received a fat packet in the mail, full of seeds with unusual names (Magma Mustard; Flashy Lightning Lettuce; Lemon Pastel Calendula; Cherry Vanilla Quinoa) and an even more unusual but evocative note stuck on the packets.
This Open Source Seed pledge is intended to ensure your freedom to use the seed contained herein in any way you choose, and to make sure those freedoms are enjoyed by all subsequent users. By opening this packet, you pledge that you will not restrict others’ use of these seeds and their derivatives by patents, licenses, or any other means. You pledge that if you transfer these seeds or their derivatives they will also be accompanied by this pledge.
Welcome to the Open Source Seed Initiative, a group that includes scientists, citizens, plant breeders, farmers, seed companies, and gardeners, and has its origins in both the open source software movement and in the realization among plant breeders and social scientists that continued restrictions on seed may hinder our ability to improve our crops and provide access to genetic resources.
Jack Kloppenburg, Professor, Department of Community and Environmental Sociology, and one of the founders of OSSI, contacted me a couple of years ago, just around the time I joined CC full-time. He was hoping for a CC-type license for the seeds. CC’s focus, however, is restricted to copyright. And, at least for now, copyright is an area that keeps our hands full. However, OSSI’s goals are very much in line with CC’s mission: to free information, to make it flow from those who create it to those who want to use it, with the least impedance. And what better example of information than a seed, in which the very blueprint of life is embedded?
Jack’s email signature reads, “Well,” she said, “you have a high tolerance for lunatics, don’t you?” Knowing Jack, that sounds about right. You’ve got to be crazy to be able to change the world.
Yes Jack, let’s talk, heck, let’s not just talk, but let’s actually collaborate and spread the seeds of change.
Today we’re excited to announce the launch of the Open Policy Network. The Open Policy Network, or OPN for short, is a coalition of organizations and individuals working to support the creation, adoption, and implementation of policies that require that publicly funded resources are openly licensed resources. The website of the Open Policy Network is http://openpolicynetwork.org.
Increasingly, governments around the world are sharing huge amounts of publicly funded research, data, and educational materials. The key question is, do the policies governing the procurement and distribution of publicly funded materials ensure the maximum benefits to the citizens those policies are meant to serve? When open licenses are required for publicly funded resources, there is the potential to massively increase access to and reuse of a wide range of materials, from educational content like digital textbooks, to the results of scholarly research, to troves of valuable public sector data. The $2 billion U.S. Department of Labor TAACCCT grant program is an example of a policy whereby publicly funded education and training materials are being made available broadly under an open intellectual property license.
There is a pressing need for education, advocacy, and action to see a positive shift in supporting open licensing for publicly funded materials. The Open Policy Network will share information amongst its members, recruit new advocates, and engage with policymakers worldwide. The OPN members are diverse in content area expertise and geographic location. Creative Commons is a part of the Open Policy Network because we believe that the public deserves free access to, and legal reuse of, the resources it funds. With simple policy changes, such as requiring that publicly funded works be openly licensed and properly marked with easy-to-understand licensing information, the public will be better able to take advantage of its rights to access and reuse the digital materials developed with taxpayer funds.
With today’s launch of the Open Policy Network, we’re announcing our first project, the Institute for Open Leadership. Through a weeklong summit with experts, accepted fellows will get hands-on guidance to develop a capstone project for implementation in their organization or institution. The Institute for Open Leadership will help train new leaders in education, science, and public policy fields on the values and implementation of openness in licensing, policies, and practices.