Paleobiology Database now CC BY
Puneet Kishor, December 19th, 2013
now available under
Paleontology, the description and biological classification of fossils, has spawned countless field expeditions, museum trips, and hundreds of thousands of publications. The construction of databases that aggregate these descriptive data on fossils in a way that allows large-scale, synthetic questions to be addressed, such as the long-term history of biodiversity and rates of biological extinction and origination during global environment change, has greatly expanded the intellectual reach of paleontology and has led to many important new insights into macroevolutionary and macroecological processes.
One of the largest compendia of fossil data assembled to date is the Paleobiology Database (PBDB), founded in 1998 by John Alroy and Charles Marshall. These two pioneers assembled a small team of scientists who were motivated to generate the first geographically-explicit, sampling standardized global biodiversity curve. The PBDB has since grown to include an international group of more than 150 contributing scientists with diverse research agendas. Collectively, this body of volunteer and grant-supported investigators have spent more than 9 continuous person years entering more than 280,000 taxonomic names, nearly 500,000 published opinions on the status and classification of those names, and over 1.1 million taxonomic occurrences. Some PBDB data derive from the original fieldwork and specimen-based studies of the contributors, but the majority of the data were extracted from the text, figures, and tables of over 48,000 published papers, books, and monographs that span the range of topics covered by paleontology. Their efforts have been well rewarded by enabling new science. As of December 2013, the PBDB had produced almost two hundred official peer reviewed publications, all of which address scientific questions that cannot be adequately answered without such a database.
|Ptyagnostus atavus or Leiopyge calva Zone (Cambrian of the United States)|
|PaleoDB collection 262: authorized by Jack Sepkoski, entered by Mike Sommers on 20.11.1998|
Shift to CC BY
From its inception, the paleontologists who have invested the most effort in entering data have made decisions about data management and access policies, which ultimately brings up the important questions of proper licensing and citation. In the first application of the PBDB licensing policy, the individual contributors chose their own CC license for each fossil collection record. As a result there were three kinds of contributors: those who didn’t know what to do, didn’t care, or didn’t know about the new policy that required them to specify how existing collections should be licensed (55% of the data), those who selected the most restricted option available to them (34% of the data), and those who selected the most unrestricted option available to them (10% of the data).
This received mostly negative response via social media and other outlets, partly because of the increased attention the database was receiving during a leadership and governance transition. Naturally, the governance group responded to the community feedback. The first actual action was by individual contributors. Many of the contributors who either didn’t know about CC licenses or who didn’t think fully about their meaning and implications changed their own individual licenses. This always went from a more restrictive license to the least restrictive option available to them: CC BY. That wave of individual choices towards the least restrictive license immediately shifted the balance for records in the database. At that point, only one contributor had a restrictive license, and the governance group quickly moved to adopt one single unifying license for the database: CC BY. Now, all new records are explicitly CC BY as part of database policy, although individual contributors still have the option of placing a moratorium on the public release of their own new data so as to protect their individual scientific interests.
Future of PBDB
In addition to being a scientific asset to the field of paleontology, the PBDB and other databases like it provide an addition means by which to participate in rapidly emerging initiatives and developments in cyberinfrastructure. To increase its reach in this area, the PBDB now has an Application Programming Interface (API), which makes data more easily and transparently accessible, both to individual researchers and to applications, such as the open source web application PBDB Navigator and the Mancos iOS mobile application. Both of these applications are built on the public API and are designed to allow the history of life and environment documented by the PBDB to be more discoverable. These new modes of interactivity and visualization highlight unintended, but potentially useful, aspects of the PBDB. The PBDB API has facilitated a loosely coupled integration with other related but independently managed biological and paleontological database initiatives and online resources, such as the Neotoma Paleoecology Database, Morphobank, and the Encyclopedia of Life. The PBDB API can also be harnessed by geoscientists outside of paleontology, thereby facilitating the integration of paleontological data with diverse types of data and model output, such as paleogeographic plate rotation and geophysical models in GPlates. The liberal CC BY license ensures interoperability and data access necessary to facilitate fundamentally new science and because it expands the reach of paleontology to a broader community of researchers and educators than is possible via any single website or application.