science

Examining deficiencies of and limitations on data sharing

Puneet Kishor, August 18th, 2014

Whether as patients, as part of traffic, or while exercising or simply walking with one of the behavioral trackers du jour, we are constantly giving data about ourselves and our surroundings to data collectors with few returns. From privacy regulations to bureaucratic barriers to the practice of collecting and locking up information just in case it might create monetary value in the future, there are a multitude of barriers between those who collect information and those who want to use it.

With support from the Robert Wood Johnson Foundation (RWJF), we are launching two projects exploring different obstacles that often get in the way of easy sharing of citizen-sourced information.

Sharing v. Privacy

reports

Original image by Puneet Kishor released under a CC0 Public Domain Dedication

In collaboration with the Institute for Human Genetics and EngageUC at UCSF, and Personal Genome Project at Harvard University, we will explore the practical, ethical and legal implications of emphasizing benefits of sharing over the need for privacy at a workshop planned for Spring 2015 in Washington DC. A few of the questions to be tackled at the workshop: What if, instead of emphasizing the imperative of protecting privacy, we emphasized the potential benefits from sharing? Would most patients agree to let their information be shared?

Sensored City

inverted-model-of-data-collection

Original image by Puneet Kishor released under a CC0 Public Domain Dedication

Partnering with Manylabs, a San Francisco-based sensor tools and education nonprofit, and Urban Matter, Inc., a Brooklyn-based design studio, and in collaboration with the City of Louisville, Kentucky, and Propeller Health, maker of a mobile platform for respiratory health management, we will design, develop and install a network of sensor-based hardware that will collect environmental information at high temporal and spatial scales and store it in a software platform designed explicitly for storing and retrieving such data.

Further, we will design, create and install a public data art installation powered by the data we collect, thereby communicating back to the public what has been collected about them.

Silent-Lights

Silent Lights Image © Urban Matter, Inc., used with permission.

Please follow our progress on Sharing v. Privacy and the Sensored City projects, and get in touch with us if you want to learn more.


CC Signs Bouchout Declaration for Open Biodiversity

Puneet Kishor, July 11th, 2014

CC is supporting the Bouchout Declaration for Open Biodiversity Knowledge Management by becoming a signatory. The Declaration’s objective is to help make biodiversity data openly available to everyone around the world. It offers the biodiversity community a way to demonstrate their commitment to open science, one of the fundamental components of CC’s vision for an open and participatory internet.

In April 2013, CC participated in a workshop on names attribution, rights, and licensing convened by the Global Names Project, which led to a report titled “Scientific names of organisms: attribution, rights, and licensing” that concluded:

“There are no copyright impediments to the sharing of names and related data. The system must reward those who make the contributions upon which we rely. Building an attribution system remains one of the more urgent challenges that we need to address together.”

Many of the attendees of the workshop and authors of the report cited above are among those who met in June in Meise, Belgium, and released the Bouchout Declaration.

Donat Agosti Bouchout Declaration

Donat Agosti introducing the Bouchout Declaration at the OpenDataWeek, RMLL, Montpellier, France, July 11, 2014. Photo by P. Kishor released under CC0 Public Domain Dedication

The declaration calls for free and open use of digital resources about biodiversity and associated access services, and exhorts the use of licenses or waivers that grant or allow all users a free, irrevocable, world-wide right to copy, use, distribute, transmit and display the work publicly as well as to build on the work and to make derivative works, subject to proper attribution consistent with community practices, while recognizing that providers may develop commercial products with more restrictive licensing. This is not only aligned with the vision of CC itself; CC is also the creator and steward of the legal and technical infrastructure that allows open licensing of content.

Phylogeny viewer

Screenshot of phylogeny from PhyLoTA as displayed in BioNames. The user can zoom in and out and pan, as well as change the layout of the tree. From “BioNames: linking taxonomy, texts, and trees” by Roderick D. M. Page, used under a CC BY License.

The declaration also promotes “Tracking the use of identifiers in links and citations to ensure that sources and suppliers of data are assigned credit for their contributions” and “Persistent identifiers for data objects and physical objects such as specimens, images and taxonomic treatments with standard mechanisms to take users directly to content and data.” CC has participated from the beginning in the activities that led to the Joint Declaration of the Data Citation Principles, which promotes the use of persistent identifiers to allow discovery and attribution of resources.
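To make the idea of a persistent identifier with a “standard mechanism to take users directly to content” concrete, here is a minimal sketch in Python using DOIs, one widely used kind of persistent identifier. The function name `doi_to_url` and the list of accepted prefixes are our own illustrative choices; the only external fact relied on is that the public resolver at https://doi.org/ redirects a DOI to its current location.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # public DOI resolver

# Prefixes people commonly write in front of a DOI (illustrative list, not exhaustive).
KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Turn a DOI written in any of the common forms into a resolvable URL."""
    cleaned = doi.strip()
    for prefix in KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):].strip()
            break
    # Every DOI starts with the directory indicator "10."
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return DOI_RESOLVER + quote(cleaned, safe="/")


print(doi_to_url("doi: 10.11646/phytotaxa.163.5.1"))
```

Because the identifier rather than the hosting URL is what gets cited, a link built this way can keep working even if the publisher moves the content.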

Finally, the declaration calls for Policy developments that will foster free and open access to biodiversity data. CC works assiduously on creating, fostering, nurturing and assisting in the promulgation of open policies and practices that advance the public good by supporting open policy advocates, organizations and policy makers.

We have a few concerns. Most copyright laws around the world treat data as not protected by copyright, so data would not require licensing. We are also aware that some cultures wish to preserve and protect traditional knowledge, so we want to make sure information is released only by those who have the right to do so, without impinging on the rights of groups that might otherwise be negatively affected by its release. However, overall we believe that open biodiversity information is crucial for science and society.

Be it heralding the Seeds of Change, participating in the Group on Earth Observations (GEO), or assisting the Paleobiology Database to move to a CC BY license, CC is playing a vital role in the progress of open science in the areas of biodiversity and natural resources. CC has committed to assisting organizations joining Google in the White House Climate Data Initiative. On a personal front, I have released the entire codebase of Earth-Base under the CC0 Public Domain Dedication, making possible applications such as Mancos on the iOS App Store.

bouchout_signatories

Bouchout Signatories. Image by Plazi released under a CC0 Public Domain Dedication

Most of the world’s biodiversity is in developing countries, and ironically, most biodiversity information and collections are in developed countries. Agosti calls this, “Biopiracy: taking biodiversity material from the developing world for profit, without sharing benefit or providing the people who live there with access to this crucial information.” (Agosti, D. 2006. Biodiversity data are out of local taxonomists’ reach. Nature 439, 392) Opening up the data will benefit the developing countries by giving them free and easy access to information about their own biological riches. Friction-free access to and reuse of data, software and APIs is essential to answering pressing questions about biodiversity and furthering the move to better understanding and stewarding our planet and its resources. Signing the Bouchout Declaration strengthens this movement.


Liberating the Haystack for the Needles

Puneet Kishor, June 2nd, 2014

This post was written with invaluable assistance from the CC legal and policy teams.

Text and data mining (TDM) is becoming an increasingly important scientific technique for analyzing large amounts of data. The technique is used to uncover both existing and new insights in unstructured data sets that typically are obtained programmatically from many different sources.

pbdb

PBDB Navigator screenshot released under a CC0 1.0 Public Domain Dedication

A few of the innovative examples include GeoDeepDive, a system that helps geoscientists discover information and knowledge buried in the text, tables, and figures of geology journal articles; improving human curation of chemical-gene-disease networks for the Comparative Toxicogenomics Database; and discovering a new link between genes and osteoporosis.
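To give a flavor of what “mining” means in practice, here is a deliberately tiny sketch in Python. It is not how GeoDeepDive or the Comparative Toxicogenomics Database work; it only illustrates one of the simplest TDM ideas, counting how often two terms are mentioned in the same document, which is the kind of signal that can hint at a link worth investigating. The gene names `ABC1` and `XYZ2` and the three sentences are made up for the example.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrences(documents, terms):
    """Count how many documents mention each pair of terms together."""
    # One case-insensitive whole-word pattern per term.
    patterns = {
        term: re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
        for term in terms
    }
    counts = Counter()
    for text in documents:
        # Terms found in this document, sorted so each pair has one canonical order.
        present = sorted(t for t, p in patterns.items() if p.search(text))
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Hypothetical mini-corpus; real corpora are fetched programmatically from many sources.
docs = [
    "Variants of gene ABC1 were associated with osteoporosis in this cohort.",
    "Bone density was low; osteoporosis was common and ABC1 expression was reduced.",
    "Gene XYZ2 showed no association with bone loss.",
]

links = cooccurrences(docs, ["ABC1", "XYZ2", "osteoporosis"])
for pair, n in links.most_common():
    print(pair, n)
```

Real systems add OCR, entity recognition and statistical testing on top, but the basic shape (read many texts, extract structured facts, aggregate) is the same.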

Legal Uncertainty

While the science and technology of TDM are complex enough, involving information retrieval (IR), optical character recognition (OCR), and natural language processing (NLP), the legal complications are, sadly, equally dizzying. The legal status of TDM is unclear at best, both because there are a multitude of techniques to engage in TDM, and because the legal implications of those techniques vary from jurisdiction to jurisdiction. This makes cross-national collaboration, integral to science, difficult. For example, TDM is generally considered to not implicate copyright in the U.S. There are several theories as to why TDM falls outside copyright, but the most obvious is that it uses copyrighted material for a transformative purpose and is therefore a fair use. Judge Baer, writing in Authors Guild, Inc., et al. v. HathiTrust, et al. (Case 1:11-cv-06351-HB), stated:

“The use to which the works in the HDL are put is transformative because the copies serve an entirely different purpose than the original works: the purpose is superior search capabilities rather than actual access to copyrighted material. The search capabilities of the HDL have already given rise to new methods of academic inquiry such as text mining.”

Judge Baer goes on to state:

“I cannot imagine a definition of fair use that would not encompass the transformative uses made by Defendants’ MDP and would require that I terminate this invaluable contribution to the progress of science and cultivation of the arts.”

That clarity, however, is far from universal; outside the U.S. the situation gets muddy. While there have been a few welcome developments in the U.K., the copyright laws of many other countries have little to no clarity on whether TDM falls outside of the reach of copyright and related laws. Where TDM does implicate copyright, the license status of the original material can make automated access and analysis very complicated, requiring additional checks to ensure any material is only being used as permitted by the license. And, even where the relevant licenses are free and open, and conducive to TDM, contractual agreements between research institutions and publishers, who are often the gatekeepers of the corpora, can create significant hurdles.
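The “additional checks” mentioned above might look something like the following sketch in Python, which splits a corpus into articles that can go straight into a mining pipeline and articles that need a human to look at the license first. Everything here is an assumption made for illustration: the `Article` record, the license identifiers, and above all the allowlist in `OPEN_LICENSES`. Which licenses actually permit a given use in a given jurisdiction is precisely the question this post argues is too murky.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption for this sketch: licenses we treat as clearly permitting mining and reuse.
OPEN_LICENSES = {"CC0-1.0", "CC-BY-3.0", "CC-BY-4.0"}


@dataclass
class Article:
    identifier: str
    license: Optional[str]  # None when the source gives no license information


def partition_by_license(articles: List[Article]) -> Tuple[List[Article], List[Article]]:
    """Split a corpus into (minable, needs_review) based on the allowlist."""
    minable, needs_review = [], []
    for article in articles:
        if article.license in OPEN_LICENSES:
            minable.append(article)
        else:
            # Unknown, missing or more restrictive license: do not mine automatically.
            needs_review.append(article)
    return minable, needs_review


# Hypothetical corpus records.
corpus = [
    Article("article-1", "CC-BY-4.0"),
    Article("article-2", "CC-BY-NC-4.0"),
    Article("article-3", None),
]

minable, needs_review = partition_by_license(corpus)
print([a.identifier for a in minable])       # articles safe to process automatically
print([a.identifier for a in needs_review])  # articles that need a closer look
```

Note how much of the corpus ends up in the second list as soon as licenses are mixed or missing; that manual overhead is the cost the paragraph above describes.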

Public Sentiment

In a comment on the proposed U.K. exception for information mining, both iCommons and the Open Knowledge Foundation (OKFN) supported the UK Government’s opinion that it is inappropriate for “Certain activities of public benefit such as medical research obtained through text mining to be in effect subject to veto by the owners of copyrights in the reports of such research, where access to the reports was obtained lawfully.” PLOS opined, “Enabling content mining is a core part of the value offering for Open Access publication services.” In its response to the EU copyright review, LIBER stated, “All exceptions related to education, learning and access to knowledge to be made mandatory. In particular, we would like to see a specific exception for text and data mining for all research purposes.” OKFN’s Working Group on Open Access stated:

“We assert that there is no legal, ethical or moral reason to refuse to allow legitimate accessors of research content (OA or otherwise) to use machines to analyse the published output of the research community. Researchers expect to access and process the full content of the research literature with their computer programs and should be able to use their machines as they use their eyes.”

Support for text and data mining under the banner of “The right to read is the right to mine” has been demonstrated by other organizations, including in the declarations by Copyright for Creativity (July 2013) and the International Federation of Library Associations and Institutions (December 2013). If we as a society wish to realize the incredible potential of text and data mining, the practice should not be controlled through contractual terms or licensing.

Instead of relying on contractual restrictions or licensing to engage in text and data mining, non-consumptive uses of texts should be expressly eliminated from the reach of copyright and contract. The UK’s Hargreaves Report (PDF, p. 47) suggested the adoption of an exception to copyright law for non-consumptive uses, which are “uses of a work enabled by technology which does not trade on the underlying creative and expressive purpose of the work.”

Most recently, the UK copyright reform legislation introduced changes that make it easier to engage in TDM for non-commercial purposes, allow storing of the corpus locally as long as it remains protected from general public access, and perhaps most importantly, disallow contractual negotiations that would make it difficult to conduct TDM.

The above sentiments are laudable, and copyright reforms friendly to TDM are very important, and we support such efforts. However, we believe the more knowledgeable potential users of TDM are about the technology and related issues, the better they will be able to negotiate conditions that make their research easy and efficient. Hence, we want to push forward with education and awareness building as a bottom-up effort.

Building Bottom-Up Support

Content Mine


Image by R. Mounce extracted from: doi: 10.11646/phytotaxa.163.5.1 licensed under the Creative Commons Attribution Licence (CC-BY) 3.0 license

We are working with the ContentMine team to develop an agenda for a workshop that would provide training in TDM and educate the participants regarding the legal considerations through hands-on exercises. We will introduce the topic, the tools and techniques, tackle a specific problem, and then use that to expose researchers to the legal complications that they may encounter in conducting their research and the legal considerations they should keep in mind when choosing a license for their works. We have three objectives for this series of workshops:

  1. Introduce participants to the basic tools and techniques of text and data mining (TDM);
  2. Make participants aware of the legal intricacies of TDM and the implications of choosing the right licenses that enable TDM for downstream users;
  3. Nurture a community of practice whose members may draw upon each other for continued help.

To be clear, we do not intend the workshop to be a detailed and comprehensive training in TDM, and it is certainly not a replacement for expertise in this deep and wide-ranging technique. Instead, the workshop is designed to be both an introduction to basic technical and legal concepts and an opportunity to network with experts and novices interested in the field. We hope participants intending to use TDM for their work will be better informed when seeking collaboration with TDM experts.

TDM workshops

Original artwork by Puneet Kishor released under CC0 Public Domain Dedication

The first instance of this workshop will be held at the 2014 Open Knowledge Festival. We hope to follow it with one in Nairobi in Aug 2014 at the International Workshop on Open Data for Science and Sustainability in Developing Countries (OpenDataSSDC) organized by the CODATA Task Group on Preservation of and Access to Scientific and Technical Data in Developing Countries (CODATA PASTD), and one possibly at SciDataCon in New Delhi in Nov 2014. We hope to make these workshops a recurring event, building a roster of interesting exercises and problems to solve, and constantly improving the content based on audience feedback and ongoing research.

In cooperation with computing, legal and library experts, we will adapt the workshop agenda to make it more suitable and relatable to the host institutions. Our aim is to reach communities of researchers in countries that are otherwise under-represented in the global conversation on open science and data. We have identified researchers, and will continue to identify more, both on the technical as well as legal side with whom we intend to start building a network. If you are working with TDM, intend to work with TDM, and have expertise either in its technology or in related legal issues specific to your jurisdiction, please contact us.

We also intend to develop a community of practice for TDM, either standalone or via existing platforms such as StackExchange, and will utilize online resources such as forums, mailing lists, and a roster of technical, legal and institutional experts available to provide assistance with TDM.


Seeds of Change

Puneet Kishor, May 21st, 2014

packet of seeds

I received a fat packet in the mail, full of seeds with unusual names (Magma Mustard; Flashy Lightning Lettuce; Lemon Pastel Calendula; Cherry Vanilla Quinoa) and an even more unusual but evocative note stuck on the packets.

fancy seeds

This Open Source Seed pledge is intended to ensure your freedom to use the seed contained herein in any way you choose, and to make sure those freedoms are enjoyed by all subsequent users. By opening this packet, you pledge that you will not restrict others’ use of these seeds and their derivatives by patents, licenses, or any other means. You pledge that if you transfer these seeds or their derivatives they will also be accompanied by this pledge.

pledge

Welcome to the Open Source Seed Initiative, a group that includes scientists, citizens, plant breeders, farmers, seed companies, and gardeners, and has its origins in both the open source software movement and in the realization among plant breeders and social scientists that continued restrictions on seed may hinder our ability to improve our crops and provide access to genetic resources.

Jack Kloppenburg, Professor, Department of Community and Environmental Sociology, and one of the founders of OSSI, contacted me a couple of years ago, just around the time I joined CC full-time. He was hoping for a CC-type license for the seeds. CC’s focus, however, is restricted to copyright. And, at least for now, copyright is an area that keeps our hands full. However, OSSI’s goals are very much in line with CC’s mission: to free information, to make it flow from those who create it to those who want to use it, with the least impedance. And what better example of information than a seed, in which the very blueprint of life is embedded?

note from Jack

Jack’s email signature reads, “Well,” she said, “you have a high tolerance for lunatics, don’t you?” Knowing Jack, that sounds about right. You’ve got to be crazy to be able to change the world.

Yes Jack, let’s talk, heck, let’s not just talk, but let’s actually collaborate and spread the seeds of change.


Precocious One Year Old Turning Academic Publishing On Its Head

Puneet Kishor, February 12th, 2014

 

“If we can set a goal to sequence the Human Genome for $99, then why shouldn’t we demand the same goal for the publication of research?”

 

PeerJ started with that bold challenge. Now, the scrappy startup that dared has done it. One year old today, PeerJ, the peer-reviewed journal, has seen startling growth, having published 232 articles under CC BY 3.0 last year. By the way, per Scimago that number is more than what 90% of other journals publish in a year. Then in April 2013 PeerJ started publishing PeerJ PrePrints, the non-peer-reviewed preprint server, with 186 PrePrints in 2013, all under CC BY 3.0.

Now PeerJ has more than 800 Academic Editors, from a wide variety of countries and institutions. There are also five Nobel Prize winners on the PeerJ Board. PeerJ receives submissions from all over the world, and covers all of the biological, health, and medical sciences. As of the time of this post’s publication, the top subject areas for PeerJ submissions were:

Subject Articles
Ecology 106
Bioinformatics 69
Evolutionary Studies 66
Zoology 54
Computational Biology 49
Microbiology 48
Psychiatry and Psychology 47
Marine Biology 45
Biodiversity 45
Biochemistry 45

Not everything has been easy. Starting an entire publishing company from scratch has been a learning experience for the entire team. Starting with no brand recognition, no history, and no infrastructure, PeerJ has successfully established itself in all the places that a publishing company should be: archiving solutions; DOI issuing services; indexing services; membership of professional bodies; ISSN registrations; and so on. PeerJ has done very well. Last year PeerJ won the ALPSP Award for Publishing Innovation.

PeerJ’s vision/mission are deceptively simple:

  • Keep Innovating
  • Remember Whom We Serve
  • Pass on the Savings
Interpretive drawing of DNHM D2945 Hongshanornis longicresta

PeerJ’s decision-making process is fast, very fast. Authors get their first decision back in a median of 24 days. Being small and non-traditional means they can take risks. They have built interesting functionality and models such as optional open peer review. Their business model is based on individuals purchasing low cost lifetime publication plans, and that has resulted in a lot of their functionality being very individual-centric.

Compared to traditional publishers, PeerJ is a very tech-focused company. They built all the technology themselves, which is quite unusual in the academic publishing world, where publishers normally use third parties for their peer-review software and publication platforms. By doing it themselves they have much more control over their destiny and costs, and can build functionality which suits their unique needs. The high percentage of authors describing their experience with PeerJ as their best publishing experience is arguably a direct result of this. Much of PeerJ’s software is open source, and their techie roots are evident in their engagement with the community via events such as Hack4ac, a hackday to specifically celebrate, ahem, CC BY!

Peter Binfield, Co-Founder, says:

We firmly believe that Open Access publishing is the future of the academic journal publishing system. With the current trends we see in the marketplace (including governmental legislation; institutional mandates; the rapid growth of the major OA publishers; and the increasing education and desire from authors) we believe that Open Access content will easily make up >50% of newly published content in the next 4 or 5 years.

 
Once all academic content is OA and under an appropriate re-use license we believe that significant new opportunities will emerge for people to use this content; to build on it for new discoveries and products; and to accelerate the scientific discovery process.

Binfield continues:

We regard the CC-BY license as the gold standard for OA Publications. Some other publishers provide authors with “NC” options, or try to write their own OA licenses, but we have a firm belief in the CC BY flavor. If there are many different OA licenses in play then it becomes increasingly difficult for users to determine what rights they have for any given piece of work, and so it is cleaner and simpler if everyone agrees on a single (preferably liberal) license. We were pleased to see the license updated to 4.0 and were quick to adopt it.

In Jan 2014, PeerJ moved to CC BY 4.0 for all articles newly submitted from that point onwards (prior articles remain under CC BY 3.0 of course). Today, on PeerJ’s first birthday, we at CC send PeerJ our best wishes, and look forward to ever more courageous, even outrageous innovations from this precocious one year old.


CC is now a Group on Earth Observations (GEO) Participating Organization

Puneet Kishor, January 16th, 2014

GEO logo

Yesterday (January 15, 2014), the Group on Earth Observations approved Creative Commons as a Participating Organization (PO) at its GEO-X Plenary in Geneva.

GEO was launched in response to calls for action by the 2002 World Summit on Sustainable Development and by the G8 (Group of Eight) leading industrialized countries to exploit the growing potential of Earth observations to support decision making in an increasingly complex and environmentally stressed world. GEO is coordinating efforts to build a Global Earth Observation System of Systems (GEOSS).

GEOSS logo

GEOSS provides decision-support tools to a wide variety of users via a global and flexible network of content providers. GEOSS lets decision makers access a range of information by linking together existing and planned observing systems around the world, and supports the development of new systems where gaps exist. GEOSS promotes common technical standards so that data from the thousands of different instruments can be combined into coherent data sets. The GEOPortal offers a single Internet access point for users seeking data, imagery, and analytical software packages relevant to all parts of the globe. For users with limited or no access to the internet, similar information is available via the GEONETCast network of telecommunication satellites.

GEO is a voluntary partnership of governments and international organizations providing a framework to develop new projects and coordinate their strategies and investments. As of 2013, GEO’s Members include 89 Governments and the European Commission. In addition, 67 intergovernmental, international, and regional organizations with a mandate in Earth observation or related issues have been recognized as Participating Organizations (PO).

Dr. Robert Chen, CC’s Science Advisory Board member, was at the Plenary, and he had the following comment, “The GEO Executive Director, Barbara Ryan, pointed out in plenary that there was an extensive discussion in the GEO Executive Committee about making sure that new POs are active contributors to GEO activities. She noted that all of the proposed POs in today’s slate met this criterion.”

Creative Commons has been contributing to the GEO Data Sharing Task Force’s Legal Interoperability Sub-Group and its draft white paper on “Legal Options for the Exchange of Data through the GEOSS Data-CORE” (PDF). (I was a part of the Sub-Group as a Science Fellow, and our Senior Counsel, Sarah Pearson, reviewed the paper.) We intend to continue to be active contributors by guiding GEO and its members on the legal aspects of data sharing.

Thanks to Paul Uhlir of the Board on Research Data and Information, National Academies for making the right introductions; and to John Wilbanks, another Science Advisory Board member, for initially encouraging CC to get involved with GEO.


Paleobiology Database now CC BY

Puneet Kishor, December 19th, 2013

[written in collaboration with Shanan Peters, Professor, Department of GeoScience, University of Wisconsin-Madison and the Principal Investigator of the Paleodb Project]

The Paleobiology Database now available under CC BY

After a year of community feedback and discussion, the Paleobiology Database has taken the decision that “All records are made available to the public based on a Creative Commons license that requires attribution before use.” The Paleobiology Database is now licensed under a CC-BY 4.0 International License.

Paleontology

Paleontology, the description and biological classification of fossils, has spawned countless field expeditions, museum trips, and hundreds of thousands of publications. The construction of databases that aggregate these descriptive data on fossils in a way that allows large-scale, synthetic questions to be addressed, such as the long-term history of biodiversity and rates of biological extinction and origination during global environment change, has greatly expanded the intellectual reach of paleontology and has led to many important new insights into macroevolutionary and macroecological processes.

Paleobiology Database

One of the largest compendia of fossil data assembled to date is the Paleobiology Database (PBDB), founded in 1998 by John Alroy and Charles Marshall. These two pioneers assembled a small team of scientists who were motivated to generate the first geographically explicit, sampling-standardized global biodiversity curve. The PBDB has since grown to include an international group of more than 150 contributing scientists with diverse research agendas. Collectively, this body of volunteer and grant-supported investigators has spent more than 9 continuous person-years entering more than 280,000 taxonomic names, nearly 500,000 published opinions on the status and classification of those names, and over 1.1 million taxonomic occurrences. Some PBDB data derive from the original fieldwork and specimen-based studies of the contributors, but the majority of the data were extracted from the text, figures, and tables of over 48,000 published papers, books, and monographs that span the range of topics covered by paleontology. Their efforts have been well rewarded by enabling new science. As of December 2013, the PBDB had produced almost two hundred official peer-reviewed publications, all of which address scientific questions that cannot be adequately answered without such a database.

Ptyagnostus atavus or Leiopyge calva Zone (Cambrian of the United States)
Olenoides superbus, Late Middle Cambrian, Upper Marjum Formation, House Range, Millard County, Utah, USA - Houston Museum of Natural Science

Photo by Wikipedia user Dwergenpaartje under CC0 Public Domain Dedication

  • Where: Utah (38.9° N, 113.4° W: paleocoordinates 4.1° S, 92.0° W)
  • When: Ptyagnostus atavus or Leiopyge trilobite zone, Marjum Limestone Formation, Marjumian (513.0 – 498.5 Ma)
  • Environment/lithology: offshore ramp; burrowed, peloidal packstone
  • Size classes: macrofossils, mesofossils
  • Primary reference: A. J. Rowell and N. E. Caruso. 1985. The evolutionary significance of Nisusia sulcata, an early articulate brachiopod. Journal of Paleontology 59(5):1227-1242 [A. Hendy/A. Hendy] more details
  • Purpose of describing collection: taxonomic analysis
PaleoDB collection 262: authorized by Jack Sepkoski, entered by Mike Sommers on 20.11.1998

Shift to CC BY

From its inception, the paleontologists who have invested the most effort in entering data have made decisions about data management and access policies, which ultimately brings up the important questions of proper licensing and citation. In the first application of the PBDB licensing policy, the individual contributors chose their own CC license for each fossil collection record. As a result there were three kinds of contributors: those who didn’t know what to do, didn’t care, or didn’t know about the new policy that required them to specify how existing collections should be licensed (55% of the data), those who selected the most restricted option available to them (34% of the data), and those who selected the most unrestricted option available to them (10% of the data).

This approach received a mostly negative response via social media and other outlets, partly because of the increased attention the database was receiving during a leadership and governance transition. Naturally, the governance group responded to the community feedback. The first actual action, however, was taken by individual contributors. Many of the contributors who either didn’t know about CC licenses or who didn’t think fully about their meaning and implications changed their own individual licenses. This always went from a more restrictive license to the least restrictive option available to them: CC BY. That wave of individual choices towards the least restrictive license immediately shifted the balance for records in the database. At that point, only one contributor had a restrictive license, and the governance group quickly moved to adopt one single unifying license for the database: CC BY. Now, all new records are explicitly CC BY as part of database policy, although individual contributors still have the option of placing a moratorium on the public release of their own new data so as to protect their individual scientific interests.

Future of PBDB

In addition to being a scientific asset to the field of paleontology, the PBDB and other databases like it provide an additional means by which to participate in rapidly emerging initiatives and developments in cyberinfrastructure. To increase its reach in this area, the PBDB now has an Application Programming Interface (API), which makes data more easily and transparently accessible, both to individual researchers and to applications, such as the open source web application PBDB Navigator and the Mancos iOS mobile application. Both of these applications are built on the public API and are designed to allow the history of life and environment documented by the PBDB to be more discoverable. These new modes of interactivity and visualization highlight unintended, but potentially useful, aspects of the PBDB. The PBDB API has facilitated a loosely coupled integration with other related but independently managed biological and paleontological database initiatives and online resources, such as the Neotoma Paleoecology Database, Morphobank, and the Encyclopedia of Life. The PBDB API can also be harnessed by geoscientists outside of paleontology, thereby facilitating the integration of paleontological data with diverse types of data and model output, such as paleogeographic plate rotation and geophysical models in GPlates. The liberal CC BY license ensures the interoperability and data access necessary to facilitate fundamentally new science, and it expands the reach of paleontology to a broader community of researchers and educators than is possible via any single website or application.


BioMed Central moves to CC BY 4.0 along with CC0 for data

Puneet Kishor, December 18th, 2013

CC 4.0 at BioMed Central, Chemistry Central and SpringerOpen

BioMed Central (BMC) is one of the largest open access (OA) publishers in the world with 250 peer-reviewed OA journals, and more than 100,000 OA articles published yearly. BMC is also a long-time user of CC licenses to accomplish its mission of husbanding and promoting open science. BMC has been publishing articles under a CC license since 2004.

In June of last year, BMC’s Iain Hrynaszkiewicz and Matthew Cockerill published an editorial titled Open by default in which they proposed a copyright license and waiver agreement for open access research and data in peer-reviewed journals. The gist of the editorial was that

Copyright and licensing of scientific data, internationally, are complex and present legal barriers to data sharing, integration and reuse, and therefore restrict the most efficient transfer and discovery of scientific knowledge, (and that implementing) a combined Creative Commons Attribution license (for copyrightable material) and Creative Commons CC0 waiver (for data) agreement for content published in peer-reviewed open access journals… in science publishing will help clarify what users—people and machines—of the published literature can do, legally, with journal articles and make research using the published literature more efficient.

Starting September 3, 2013, in keeping with its forward-looking mission, BMC started requiring a CC0 Public Domain Dedication for data supporting the published articles.

This is good because CC0 removes impedance to sharing and reuse by placing the work in the public domain. Good scientific practices assure proper credit is given via citation, something scientists have already been doing for centuries. Marking data with CC0 sends a clear signal of zero impedance to reuse. CC0 is a public domain dedication; however, wherever such a dedication is not possible, CC0 has a public license fallback. Either way, the impedance to data reuse is eliminated or minimized. Making CC0 the default removes uncertainty, and speeds up the process of accessible, collaborative, participatory and inclusive science.

But wait, there is more… starting February 3, 2014, BMC, Chemistry Central and the entire SpringerOpen family of journals are also Moving Forward to the latest CC BY 4.0 license. The changes in CC BY version 4.0, released on Nov 25, 2013, represent more than two years of community process, public input and feedback to develop a truly open, global license suitable for copyright, related rights and, where applicable, database rights. By moving to CC 4.0, BMC is not only adopting a reliable, globally recognizable mark of open, it is also setting a high bar for the future of open science.

We at Creative Commons are big fans of BMC, and we applaud their move to creating a stronger, more vibrant open commons of science.


Human Services Taxonomy

Puneet Kishor, December 2nd, 2013

[written in collaboration with Erine A. Gray, founder, Aunt Bertha and the Open Eligibility Project]

Text-based search is powerful. However, as more and more information is digitized and made available on the internet, the effectiveness of text-based search could stand to be supplemented with other technologies.

Aunt Bertha, an Austin, TX–based B Corporation, focuses on helping people to find government and charitable human service programs on the web. In the United States, there are 89,000 governments, a million charities, and more than three hundred thousand congregations. Many of these organizations provide food, health, housing, or education programs to those who need them (the “Seekers”). Aunt Bertha’s goal is to index all these programs so that the Seekers can find help in seconds.

Aunt Bertha launched in the fall of 2010, and its founders learned something very interesting early on. In a medium-sized city, a Seeker can have at least 500 government and charitable programs to choose from. The user experience designer must ensure that the Seekers can easily find the program that fits their need, a task that’s harder than it might seem: not only are the Seekers multi-faceted and complex, so are the programs that serve them. A common language that described both the Seekers and the available human services would go a long way to help, as text-based search alone would not work. Enter the Open Eligibility Project.

Realizing that other organizations were facing the same problem, and that earlier attempts at categorizing these types of programs had used terms and methodologies full of bureaucratic jargon, the Open Eligibility Project set out to simplify the taxonomy, the terms that describe human services.

There are two important facets to human services taxonomy: Human Services and Human Situations. Human Services are simply the services provided by the organization; examples include clothes for school, computer classes and counseling. Human Situations are simply the attributes of the Seeker, for example mothers, ex-offenders or veterans. Here is one example of the use of this taxonomy on Aunt Bertha:

WIC Program
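To make the two facets concrete, here is a minimal sketch in Python of how a program might be tagged with Human Services and Human Situations and then matched to a Seeker. It is an illustration only, not the Open Eligibility Project's actual schema or Aunt Bertha's code: the `Program` class, the `find_programs` function and the program names are all hypothetical, and only the tag values are taken from the examples above.

```python
from dataclasses import dataclass, field


@dataclass
class Program:
    """A human service program tagged along the two taxonomy facets (hypothetical model)."""
    name: str
    services: set = field(default_factory=set)    # Human Services: what the program offers
    situations: set = field(default_factory=set)  # Human Situations: who the program is for


def find_programs(programs, service, situation):
    """Return the names of programs that offer `service` to Seekers in `situation`."""
    return [
        p.name
        for p in programs
        if service in p.services and situation in p.situations
    ]


# Hypothetical programs; the tags reuse the examples given in the text.
programs = [
    Program("Back-to-School Closet", {"clothes for school"}, {"mothers"}),
    Program("Reentry Tech Lab", {"computer classes", "counseling"}, {"ex-offenders"}),
    Program("Veterans Support Line", {"counseling"}, {"veterans", "ex-offenders"}),
]

print(find_programs(programs, "counseling", "ex-offenders"))
```

Filtering on both facets at once is what plain text search struggles to do: the query names a kind of service and a kind of Seeker rather than words that happen to appear in a program description.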

It is not always easy to find the balance between comprehensiveness and ease-of-use. For this project to be successful, a tension should always exist between these two goals. Lean too far one way and it becomes suitable only for the policy wonks. Lean the other way, and it loses specificity and the Seekers cannot find what they are seeking.

Since launching the Open Eligibility Project, there has been some interesting traction in the area of human services taxonomy. Just this year, a new Civic Services Schema was submitted and accepted by Schema.org. The ServiceAudience field of the spec, in particular, is a great fit for Open Eligibility’s Human Situations tags. If government agencies adopt this spec, it will make their programs more findable by people who fit those situations (e.g., programs for veterans, programs for foster children, etc.).

What’s Next

Aunt Bertha seeded the Open Eligibility Project with all of the types of services and situations listed on Aunt Bertha. But there are more out there, and help from others would make the taxonomy even better. That is why the founders were attracted to Creative Commons, and decided to release the taxonomy on GitHub under a CC BY-SA 3.0 license. Hackers, coders, and those concerned generally with human services are invited to join the Google+ community, and to contribute to the project on the GitHub page, or to connect with Aunt Bertha on Facebook or Twitter.


Identifying drug targets one protein at a time

Puneet Kishor, November 19th, 2013

Protein structure

The structure of human proteins defines, in part, what it is to be human. It is very expensive, as much as a couple of million USD, to determine the structure of human membrane proteins. Improvements in methods, computers and access to the complete sequence of our DNA, however, have made it possible to adopt more systematic approaches, and thus reduce the time and cost to determine the shapes of proteins. Structural genomics helps determine the 3D structures of proteins at a rapid rate and in a cost-effective manner. Structural information provides one of the most powerful means to discover how proteins work and to define ligands that modulate their function. Such ligands are starting points for drug discovery.

The Structural Genomics Consortium (SGC) at the Universities of Oxford and Toronto solves the structures of human proteins of medical relevance and places all its findings, reagents and know-how into the public domain without restriction. Using these structures and the reagents generated as part of the structure determination process as well as the chemical probes identified, the SGC works with organizations across the world to further the understanding of the biological roles of these proteins. The SGC is particularly interested in human protein kinases, metabolism-associated proteins, integral membrane proteins, and proteins associated with epigenetics and rare diseases.

Drug discovery tends to be a crapshoot. As we are not good at target validation that essentially occurs in patients, more than 90% of the pioneer targets fail in Phase 2. Nevertheless, many academics and pharmas work on the same, small group of targets in competition with each other, wasting resources and careers, and needlessly exposing patients to molecules destined for failure. The SGC chooses not to work under the lamp post, focusing on those targets for which there is little or no literature. This is because it is such pioneer targets that will deliver pioneering, breakthrough medicines.

The SGC is a not-for-profit, public-private partnership funded by public and charitable funders in Canada and the UK, and by eight large pharmaceutical companies (GSK, Pfizer, Novartis, Lilly, Boehringer Ingelheim, Janssen, Takeda and AbbVie). Its mandate is to promote the development of new medicines by determining 3D structures on a large scale and cost-effectively, targeting human proteins of biomedical importance and proteins from human parasites that represent potential drug targets.

The SGC is now responsible for between a quarter and half of all structures deposited into the Protein Data Bank (PDB) each year. The SGC has released the structures of nearly 1500 proteins with implications for the development of new therapies for cancer, diabetes, obesity, and psychiatric disorders. As evident from the chart, SGC has published as many protein kinase structures as the rest of academia combined.

The SGC’s structural biology insights have allowed us to make significant progress toward the understanding of signal transduction, epigenetics and chromatin biology, and metabolic disease. The SGC has adopted the following Open Access policy—the SGC and its scientists are committed to making their research outputs (materials and knowledge) available without restriction on use. This means that the SGC promptly places its results in the public domain and agrees to not file for patent protection on any of its research outputs. This not only provides the public with this fundamental knowledge, but also allows commercial efforts and other academics to utilize the data freely and without any delay. The SGC seeks the same commitment from any research collaborator. The structural information is made available to everyone either when the structure is released by the PDB, or pre-released on www.thesgc.org.

Prof. Chas Bountra at the University of Oxford says:

“Society desperately needs new treatments for many chronic (AD, bipolar disorder, pain…) or rare diseases. This need is growing because of aging societies and diseases of modern living. As a biomedical community, we have yet to deliver truly novel treatments for many such conditions. This is not for lack of effort or resources. It is simply that these disorders are complex and there are too many variables or unknowns. It is clear that no one group or organisation can do this on their own. What we are trying to do is to bring together the best scientists from across the world, irrespective of affiliation, pooling resources and infrastructures, reducing wasteful duplicative activity to catalyse the creation of new medicines for patients. Secrecy and competition in early phases of target identification/discovery are slowing down drug discovery, making the process more difficult and more expensive.”

We at CC applaud the SGC’s commitment to open access and look to them for leadership in this arena. We believe the SGC’s findings would be a great candidate for the CC0 Public Domain Dedication because of the CC0 mark’s global recognition and common legal status.


