Paleontology, the description and biological classification of fossils, has spawned countless field expeditions, museum trips, and hundreds of thousands of publications. Databases that aggregate these descriptive data make it possible to address large-scale, synthetic questions, such as the long-term history of biodiversity and rates of biological extinction and origination during global environmental change. Their construction has greatly expanded the intellectual reach of paleontology and has led to many important new insights into macroevolutionary and macroecological processes.
One of the largest compendia of fossil data assembled to date is the Paleobiology Database (PBDB), founded in 1998 by John Alroy and Charles Marshall. These two pioneers assembled a small team of scientists who were motivated to generate the first geographically explicit, sampling-standardized global biodiversity curve. The PBDB has since grown to include an international group of more than 150 contributing scientists with diverse research agendas. Collectively, this body of volunteer and grant-supported investigators has spent more than 9 continuous person-years entering more than 280,000 taxonomic names, nearly 500,000 published opinions on the status and classification of those names, and over 1.1 million taxonomic occurrences. Some PBDB data derive from the original fieldwork and specimen-based studies of the contributors, but the majority of the data were extracted from the text, figures, and tables of over 48,000 published papers, books, and monographs that span the range of topics covered by paleontology. Their efforts have been well rewarded by enabling new science. As of December 2013, the PBDB had produced almost two hundred official peer-reviewed publications, all of which address scientific questions that cannot be adequately answered without such a database.
|Ptyagnostus atavus or Leiopyge calva Zone (Cambrian of the United States)|
|PaleoDB collection 262: authorized by Jack Sepkoski, entered by Mike Sommers on 20.11.1998|
Shift to CC BY
From its inception, the paleontologists who have invested the most effort in entering data have made decisions about data management and access policies, which ultimately brings up the important questions of proper licensing and citation. In the first application of the PBDB licensing policy, the individual contributors chose their own CC license for each fossil collection record. As a result there were three kinds of contributors: those who didn’t know what to do, didn’t care, or didn’t know about the new policy that required them to specify how existing collections should be licensed (55% of the data), those who selected the most restricted option available to them (34% of the data), and those who selected the most unrestricted option available to them (10% of the data).
This outcome drew a mostly negative response via social media and other outlets, partly because of the increased attention the database was receiving during a leadership and governance transition. Naturally, the governance group responded to the community feedback, but the first concrete action came from individual contributors. Many of the contributors who either didn’t know about CC licenses or hadn’t thought fully about their meaning and implications changed their own individual licenses. These changes always went from a more restrictive license to the least restrictive option available to them: CC BY. That wave of individual choices toward the least restrictive license immediately shifted the balance for records in the database. At that point, only one contributor had a restrictive license, and the governance group quickly moved to adopt one single unifying license for the database: CC BY. Now, all new records are explicitly CC BY as part of database policy, although individual contributors still have the option of placing a moratorium on the public release of their own new data so as to protect their individual scientific interests.
Future of PBDB
In addition to being a scientific asset to the field of paleontology, the PBDB and other databases like it provide an additional means by which to participate in rapidly emerging initiatives and developments in cyberinfrastructure. To increase its reach in this area, the PBDB now has an Application Programming Interface (API), which makes data more easily and transparently accessible, both to individual researchers and to applications, such as the open source web application PBDB Navigator and the Mancos iOS mobile application. Both of these applications are built on the public API and are designed to allow the history of life and environment documented by the PBDB to be more discoverable. These new modes of interactivity and visualization highlight unintended, but potentially useful, aspects of the PBDB. The PBDB API has facilitated a loosely coupled integration with other related but independently managed biological and paleontological database initiatives and online resources, such as the Neotoma Paleoecology Database, Morphobank, and the Encyclopedia of Life. The PBDB API can also be harnessed by geoscientists outside of paleontology, thereby facilitating the integration of paleontological data with diverse types of data and model output, such as paleogeographic plate rotation and geophysical models in GPlates. The liberal CC BY license ensures the interoperability and data access necessary to facilitate fundamentally new science, and it expands the reach of paleontology to a broader community of researchers and educators than is possible via any single website or application.
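As a rough illustration of what programmatic access through such an API could look like, the sketch below builds a query URL for fossil occurrences. The base URL, the `occs/list` endpoint, and the `base_name`, `interval`, and `show` parameters are assumptions drawn from the PBDB data service as publicly documented, not from this post, so check the current API documentation before relying on them. The helper `occurrences_url` is hypothetical; it only constructs the URL and makes no network request.

```python
from urllib.parse import urlencode

# Assumed base URL of the public PBDB data service (not stated in the post).
PBDB_BASE = "https://paleobiodb.org/data1.2"


def occurrences_url(base_name, interval=None, fmt="json"):
    """Build a URL requesting fossil occurrences for a taxon and its subtaxa.

    `base_name` and `interval` are assumed query parameters of the
    occurrence-listing endpoint; `show=coords` is assumed to request
    present-day coordinates with each record.
    """
    params = {"base_name": base_name}
    if interval:
        params["interval"] = interval
    params["show"] = "coords"
    return f"{PBDB_BASE}/occs/list.{fmt}?{urlencode(params)}"


# Example: occurrences of trilobites from the Cambrian.
url = occurrences_url("Trilobita", interval="Cambrian")
print(url)
```

Because the records are CC BY, a script that fetches and republishes results from such a URL should carry attribution to the PBDB and its contributors along with the data.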
BioMed Central (BMC) is one of the largest open access (OA) publishers in the world, with 250 peer-reviewed OA journals and more than 100,000 OA articles published yearly. BMC is also a long-time user of CC licenses to accomplish its mission of husbanding and promoting open science. BMC has been publishing articles under a CC license since 2004.
In June of last year, BMC’s Iain Hrynaszkiewicz and Matthew Cockerill published an editorial titled Open by default in which they proposed a copyright license and waiver agreement for open access research and data in peer-reviewed journals. The gist of the editorial was that
Copyright and licensing of scientific data, internationally, are complex and present legal barriers to data sharing, integration and reuse, and therefore restrict the most efficient transfer and discovery of scientific knowledge, (and that implementing) a combined Creative Commons Attribution license (for copyrightable material) and Creative Commons CC0 waiver (for data) agreement for content published in peer-reviewed open access journals… in science publishing will help clarify what users—people and machines—of the published literature can do, legally, with journal articles and make research using the published literature more efficient.
On September 3, 2013, in keeping with its forward-looking mission, BMC began requiring a CC0 Public Domain Dedication for data supporting the published articles.
This is good because CC0 removes all impedance to sharing and reuse by placing the work in the public domain. Good scientific practices assure proper credit is given via citation, something scientists have already been doing for centuries. Marking data with CC0 sends a clear signal of zero impedance to reuse. CC0 is a public domain dedication; wherever such a dedication is not possible, CC0 has a public license fallback. Either way, the impedance to data reuse is eliminated or minimized. Making CC0 the default removes uncertainty, and speeds up the process of accessible, collaborative, participatory and inclusive science.
But wait, there is more… starting February 3, 2014, BMC, Chemistry Central and all of the SpringerOpen family of journals are also Moving Forward to the latest CC BY 4.0 license. Version 4.0 of CC BY, released on Nov 25, 2013, represents more than two years of community process, public input and feedback to develop a truly open, global license suitable for copyright, related rights and, where applicable, database rights. By moving to CC BY 4.0, BMC is not only getting set for a reliable, globally recognizable mark of open, it is also setting a high bar for the future of open science.
We at Creative Commons are big fans of BMC, and we applaud their move to creating a stronger, more vibrant open commons of science.
Text-based search is powerful. However, as more and more information is digitized and made available on the internet, the effectiveness of text-based search could stand to be supplemented with other technologies.
Aunt Bertha, an Austin, TX–based B Corporation, focuses on helping people find government and charitable human service programs on the web. In the United States, there are 89,000 governments, a million charities, and more than three hundred thousand congregations. Many of these organizations provide food, health, housing, or education programs to those who need them (the “Seekers”). Aunt Bertha’s goal is to index all these programs so that the Seekers can find help in seconds.
Soon after launching in the fall of 2010, Aunt Bertha’s founders learned something very interesting. In a medium-sized city, a Seeker can have at least 500 government and charitable programs to choose from. The user experience designer must ensure that the Seekers can easily find the program that fits their need, a task that’s harder than it might seem: not only are the Seekers multi-faceted and complex; so are the programs that serve them. Because text-based search alone would not work, a common language that described both the Seekers and the available human services would go a long way to help. Enter the Open Eligibility Project.
Realizing that other organizations were facing the same problem — and that there had been attempts at categorizing these types of programs before, but the terms and methodologies used were full of bureaucratic jargon — the Open Eligibility Project set out to simplify the taxonomy, the terms that describe human services.
There are two important facets to human services taxonomy: Human Services and Human Situations. Human Services are simply the services provided by the organization; examples include clothes for school, computer classes and counseling. Human Situations are simply the attributes of the Seeker, for example, mothers, ex-offenders or veterans. Here is one example of the use of this taxonomy on Aunt Bertha:
It is not always easy to find the balance between comprehensiveness and ease-of-use. For this project to be successful, a tension should always exist between these two goals. Lean too far one way and it becomes suitable only for the policy wonks. Lean the other way, and it loses specificity and the Seekers cannot find what they are seeking.
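The two facets, Human Services and Human Situations, lend themselves to a simple tagging model. The following is a minimal sketch, not the Open Eligibility Project's actual data format: the `Program` class, the `find_programs` helper, and the three program names are all hypothetical, while the tag values are the examples mentioned in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Program:
    """A human service program tagged along both facets (hypothetical model)."""
    name: str
    services: frozenset    # Human Services tags: what the program provides
    situations: frozenset  # Human Situations tags: whom the program serves


def find_programs(programs, service, situation=None):
    """Return names of programs offering `service`, optionally narrowed to a `situation`."""
    matches = []
    for program in programs:
        if service not in program.services:
            continue
        if situation is not None and situation not in program.situations:
            continue
        matches.append(program.name)
    return matches


# Hypothetical programs, tagged with the example terms from the text.
PROGRAMS = [
    Program("Back-to-School Closet", frozenset({"clothes for school"}), frozenset({"mothers"})),
    Program("Vets Learn Tech", frozenset({"computer classes", "counseling"}), frozenset({"veterans"})),
    Program("Fresh Start Counseling", frozenset({"counseling"}), frozenset({"ex-offenders", "veterans"})),
]

print(find_programs(PROGRAMS, "counseling"))
print(find_programs(PROGRAMS, "counseling", situation="ex-offenders"))
```

Filtering on shared tags rather than free text is what lets a Seeker narrow hundreds of programs quickly; the tension described above shows up here as the choice of how many distinct tags to allow in each facet.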
Since the launch of the Open Eligibility Project, there has been some interesting traction in the area of human services taxonomy. Just this year, a new Civic Services Schema was submitted to and accepted by Schema.org. The ServiceAudience field of the spec, in particular, is a great fit for Open Eligibility’s Human Situations tags. If government agencies adopt this spec, it will make their programs more findable by people who fit those situations (e.g., programs for veterans, programs for foster children, etc.).
Aunt Bertha seeded the Open Eligibility Project with all of the types of services and situations listed on Aunt Bertha. But there are more out there, and help from others would make the taxonomy even better. That is why the founders were attracted to Creative Commons, and decided to release the taxonomy on GitHub under a CC BY-SA 3.0 license. Hackers, coders, and those concerned generally with human services are invited to join the Google+ community, and to contribute to the project on the GitHub page, or to connect with Aunt Bertha on Facebook or Twitter.
The structure of human proteins defines, in part, what it is to be human. It is very expensive, as much as a couple of million USD, to determine the structure of human membrane proteins. Improvements in methods, computers and access to the complete sequence of our DNA, however, have made it possible to adopt more systematic approaches, and thus reduce the time and cost to determine the shapes of proteins. Structural genomics helps determine the 3D structures of proteins at a rapid rate and in a cost-effective manner. Structural information provides one of the most powerful means to discover how proteins work and to define ligands that modulate their function. Such ligands are starting points for drug discovery.
The Structural Genomics Consortium (SGC) at the Universities of Oxford and Toronto solves the structures of human proteins of medical relevance and places all its findings, reagents and know-how into the public domain without restriction. Using these structures and the reagents generated as part of the structure determination process as well as the chemical probes identified, the SGC works with organizations across the world to further the understanding of the biological roles of these proteins. The SGC is particularly interested in human protein kinases, metabolism-associated proteins, integral membrane proteins, and proteins associated with epigenetics and rare diseases.
Drug discovery tends to be a crapshoot. Because target validation essentially occurs in patients, and we are not good at it, more than 90% of the pioneer targets fail in Phase 2. Nevertheless, many academics and pharmas work on the same, small group of targets in competition with each other, wasting resources and careers, and needlessly exposing patients to molecules destined for failure. The SGC chooses not to work under the lamp post, focusing on those targets for which there is little or no literature. This is because it is such pioneer targets that will deliver pioneer, breakthrough medicines.
The SGC is a not-for-profit, public-private partnership, funded by public and charitable funders in Canada and the UK, and eight large pharmaceutical companies: GSK, Pfizer, Novartis, Lilly, Boehringer Ingelheim, Janssen, Takeda and Abbvie. Its mandate is to promote the development of new medicines by determining 3D structures on a large scale and cost-effectively, targeting human proteins of biomedical importance and proteins from human parasites that represent potential drug targets.
The SGC is now responsible for between a quarter and half of all structures deposited into the Protein Data Bank (PDB) each year. The SGC has released the structures of nearly 1500 proteins with implications for the development of new therapies for cancer, diabetes, obesity, and psychiatric disorders. As evident from the chart, SGC has published as many protein kinase structures as the rest of academia combined.
The SGC’s structural biology insights have allowed us to make significant progress toward the understanding of signal transduction, epigenetics and chromatin biology, and metabolic disease. The SGC has adopted the following Open Access policy: the SGC and its scientists are committed to making their research outputs (materials and knowledge) available without restriction on use. This means that the SGC promptly places its results in the public domain and agrees to not file for patent protection on any of its research outputs. This not only provides the public with this fundamental knowledge, but also allows commercial efforts and other academics to utilize the data freely and without any delay. The SGC seeks the same commitment from any research collaborator. The structural information is made available to everyone either when the structure is released by the PDB, or pre-released on www.thesgc.org.
Prof. Chas Bountra at the University of Oxford says:
“Society desperately needs new treatments for many chronic (AD, bipolar disorder, pain…) or rare diseases. This need is growing because of aging societies and diseases of modern living. As a biomedical community, we have yet to deliver truly novel treatments for many such conditions. This is not for lack of effort or resources. It is simply that these disorders are complex and there are too many variables or unknowns. It is clear that no one group or organisation can do this on their own. What we are trying to do is to bring together the best scientists from across the world, irrespective of affiliation, pooling resources and infrastructures, reducing wasteful duplicative activity to catalyse the creation of new medicines for patients. Secrecy and competition in early phases of target identification/discovery are slowing down drug discovery, making the process more difficult and more expensive.”
We at CC applaud the SGC’s commitment to open access and look to them for leadership in this arena. We believe the SGC’s findings would be a great candidate for the CC0 Public Domain Dedication because of the CC0 mark’s global recognition and common legal status.
This past August, I facilitated an online peer-learning course in the School of Open introducing open science to newcomers, and Michelle Sidler worked behind the scenes to keep things glued together. This guest post was written by Michelle, and gives a look at how things went teaching an entirely free course on open science over the web. It’s pretty cool.
Guiding Students through the Course
During last month’s round of School of Open courses, I helped out with a facilitated version of the Open Science course supported by Creative Commons, the Open Knowledge Foundation, and PLOS. On four Tuesdays in August, Billy Meinke hosted online discussions with a handful of well-known members of the open science community while participants from around the world completed course modules and blogged about their experiences. Here’s how things went down.
Note: The course materials and online discussions are available on the Open Science P2PU course page, and will continue to grow over the next few weeks as participants share blog posts about their experiences working with aspects of science that are either open or not.
While completing course units, participants blogged their experiences, offering reflections and insights about open science and sharing online resources they found. Participants were researchers and scientists from around the world, including biologists, climatologists, librarians, and even musicians.
Though we are still working through many of the blog posts, here are some examples of people learning about open access, open data, and open research for free through the School of Open:
The first of three modules introduced the topic of open access (OA), and after browsing through content about OA, learners were to report on the openness of published research articles they found on the web. A learner named Peter Desmet provided a fine overview of the history of open access and the different “flavours” of open access in an entry on his blog. The second module led folks to the topic of open data for science, where a peer by the name Odon shared her process of learning through her blog, Odonlife. Her writings offered definitions and descriptions of open data and assessed the openness of datasets she found online. Drawing from these lessons, she also described her experiences contributing to open data crowdsourcing projects and how they inspired her to start a similar project. For the third unit on open research, a peer in the course named Nicki Clarkson described the work of Jon Tennant, a paleontologist and open science advocate who deposited the data from his PhD research into the Paleontology Database, a repository for similar data. Jon even commented on her post, thanking her for the shout-out—another example of the ways in which open information brings researchers together!
In addition to supporting the online course participants, Billy Meinke hosted online discussions with open science friends and advocates from around the world, representing many locales and kinds of involvement with science. Guests from a variety of organizations joined open, broadcast Google Hangouts and shared their experiences in open science with dozens of learners watching each stream. Thanks to all the guests who took the time to chat with us about open science! Links to the video and etherpad notes (taken during the live sessions) can be found on the Open Science course page.
Taking the Open Science course further
The Open Science course doesn’t end when we complete the units and assignments. Continue the conversation by spreading the word to other scientists about this resource and encouraging them to participate. There has been interest in volunteer translation efforts and other adaptations of the material. Anyone is free to do so, in compliance with the CC BY-SA license on the course. Much of the material is licensed CC BY or CC0, which give even more open reuse rights!
If you’d like to find out more about what’s happening with this course and others in the School of Open, head on over to the School of Open Google Group and join the discussion! You can also sign up to be notified when the next facilitated course launches, likely in Spring 2014.
I met Peter Sand a few months ago at a #Sensored meetup in SoMa. The setting was exactly like the hardware labs from my undergraduate engineering days, and Peter was there exactly like one of my buddies showing kits and circuits cobbled together to do science (except, Peter is quieter and more polite than most of my buddies). Peter founded ManyLabs, a San Francisco-based nonprofit that wants:
students of any age to become comfortable with data, scientific processes, and mathematical representations of the world. We want people to learn about the strengths and limitations of using math and data to address real-world problems.
Hmmmm… think about that for a minute. Peter is thinking really long-term. He wants to invest in kids today (although ManyLabs kits are suitable for and to be enjoyed by anyone of any age) so they become good at using math and data in the future. Now, that is my kind of guy.
ManyLabs has released a collection of interactive science activities and projects under the Creative Commons BY-SA license. Many of these activities and projects are based on Arduino, an open-source microcontroller board. While most Arduino-based education projects are focused on electronics, programming, or robotics, ManyLabs is instead aiming for compatibility with the existing curricula of biology, physics, math, data, and my favorite, environment classrooms.
Previously ManyLabs was using a CC BY-NC-SA license. “We moved away from a non-commercial license because we want to make usage of the content more flexible. We want the materials to make the widest possible contribution to education,” explained Peter.
While the initial content has been seeded by a small group of contributors, ManyLabs hopes to make the site more community-driven by releasing authoring tools that will allow anyone to create, share, and modify interactive lessons. They also plan to release a platform for CC-licensed data that will allow students, teachers, and others in the community to share data gathered from sensors and manual observations. Together these tools aim to promote scientific reasoning and data literacy, both in schools and in the world at large.
We are fully behind Peter and his mission. So, go ahead, share, sign in or sign up, and create a lesson. What better way to make the world more open than by teaching kids today about Open to ensure that tomorrow’s world will be full of young people who will have known nothing else.
What do you get when you write software that becomes the basis of just about every geospatial application out there? You get perspective. Frank Warmerdam has been authoring, improving, supporting, and shepherding Shapelib, libtiff, GDAL and OGR for the past 15 years. Frank believes that by sharing effort, by adopting open, cooperatively developed standards, and avoiding proprietary licenses, adoption of open technologies could be supercharged. And lucky for us, he is right. To paraphrase him, open standards facilitate communication, capture common practice, and externalize arbitrary decisions.
Frank has done it all: worked as an independent consultant, for a proprietary remote sensing company, for a large search engine and mapping company, and now for a small, innovative space hardware maker. But most importantly, he has been a leader in the open geospatial world, at the helm of the Open Source Geospatial Foundation (OSGeo), which I myself have been involved with for as long as I have personally known Frank, that is, for a good part of the past decade.
While OSGeo has faced a number of challenges, it has also enjoyed tremendous success through a growing number of projects and chapters, local conferences, being perceived as a legitimate player, and recently, getting representation in its Charter Membership from 37 countries.
Frank says working on data libraries is a grungy job. Everyone wants ‘em but no one wants to work on ‘em. We relate to that, as licenses are kinda like that: an essential infrastructure play that requires getting the legal and technical details right, yet they are most effective when they recede into the background and let us enjoy the content to the fullest.
Per Frank, the next set of challenges revolves around getting open geodata with easy to understand, interoperable license terms. As micro-satellite imagery becomes ubiquitous with frequent imagery collects, the resulting flood of imagery may lead to more ready adoption of open terms, perhaps even a current, live, or almost-live global, medium resolution basemap for OpenStreetMap. We can dream, and with my friend Frank to lead us with his quiet actions and measured wisdom, our dreams will come true.
About 400 map makers, coders, cartographers, designers, business services providers and data mungers of chiefly spatial persuasion gathered in San Francisco to “talk OpenStreetMap, learn from each other, and move the project forward.” These conference attendees are the tip of an iceberg composed of 1.1 million registered users who have collectively gathered 3.2 billion GPS points around the world since OpenStreetMap was launched in 2004 as a free, editable map of the whole world. Unlike proprietary datasets, OpenStreetMap allows free access to the full map dataset. About 28 GB of data representing the entire planet can be downloaded in full, and the data is also available in immediately useful forms like maps and commercial services. OpenStreetMap is open data licensed under the Open Data Commons Open Database License (ODbL) with the cartography in its tiles and its documentation licensed under a CC BY-SA 2.0 license.
The program ranged from building and nurturing OSM communities, to technical wizardry, to improving infrastructure. Martijn van Exel provided an insight into the OSM community in the United States (see table below). Big countries and large areas pose challenges already in the queue to be tackled.
|land area||3.7 million sq miles|
|casual (< 100 edits)||71.0%|
|active (>100 edits, active in last 3M)||6.8%|
|power (>1000 edits, active in last 3M, active for >1Y)||2.6%|
|total edits, all time||723,000,000|
|edits by top 10 mappers (incl bots and import accounts)||69.8%|
|edits by power mappers (excl most bots and import accounts)||57.3%|
Scientific authoring workflow is a beast. You keep notes on paper (hopefully, a notebook, and not just loose pages), in word-processing documents unhelpfully named “notes” followed by “notes1,” “notes2” or worse, “notes_old,” “notes_old1.” You manage your bibliography on your desktop or on the web, you have a directory folder full of images, charts, photos and other media, and you collaborate with your co-authors by emailing attachments back and forth.
Sooner or later you start doubting your sanity but you soldier on. Finally you publish your paper, heave a sigh of relief, and move on, thereby ensuring your data can’t be reused and your work can’t be reproduced easily.
Several coders, designers, scientists, and publishers met at PLOS to brainstorm toward a better, more modern way. The Markdown for Science workshop was organized by Martin Fenner and Stian Håklev and supported by a 1K Challenge Grant from FORCE11.
Photos by Puneet Kishor, CC0 PD Dedication
While a lot of good ideas were generated, we have a long way to go. Keep an eye on this project, and better yet, pitch in with your ideas and code. Together we can tame this beast.
Today the Public Library of Science announced the Accelerating Science Award Program (ASAP). The award program seeks nominations of individuals who have used, applied, or remixed scientific research — published through open access — in order to realize innovations in science, medicine, and technology. The goal of ASAP is to build awareness of and encourage the use of scientific research published through open access. Major sponsors include the Wellcome Trust and Google.
Three winners will each receive $30,000. The nomination period opens today and runs through June 15, 2013. Potential nominees may include individuals, teams, or groups of collaborators (such as scientists, researchers, educators, social services, technology leaders, entrepreneurs, policy makers, patient advocates, public health workers, and students) who have used scientific research in transformative ways. The winners will be announced in Washington, DC, in October 2013 at an Open Access Week event hosted by SPARC and the World Bank.
Creative Commons is a supporter of ASAP, along with several other library organizations, publishers, and research organizations.
For more information, including the full details of the ASAP program, nomination process, and the award specifics, go to http://asap.plos.org/. For program rules visit http://asap.plos.org/nominate/rules/.