Today, we’ve released a significant update to our working beta of the CC Search product. We launched the project in February 2017 to provide a new “front door” to the Commons, with the ultimate goal of finding and indexing all 1.4 billion+ CC licensed works on the web. Since then, our newly formed tech team (myself, Alden Page, Sophine Clachar, and Steven Bellamy) has been working to move this project toward its next iteration, which I am proud to share today.
More providers, better metadata
This is a work in progress: it has great new features and also a few bugs, which we’re working on as we go (you can leave feedback here or file issues on GitHub). This iteration of CC Search integrates access to more than 10 million images across 13 content providers. The data was obtained by processing 36 months of web crawl data from the Common Crawl corpus (an open repository of web crawl data maintained by the Common Crawl Foundation).
The full list of providers:
| Provider | Domain | # CC Licensed Works |
|---|---|---|
| Animal Diversity Web | https://animaldiversity.org/ | 14,839 |
| Encyclopedia of Life | http://eol.org/ | 547,488 |
| IHA Holiday Ads | http://www.iha.com/ | 2,058,272 |
| The Metropolitan Museum of Art | https://www.metmuseum.org/ | 96,260 |
| Science Museum – UK | https://www.sciencemuseum.org.uk/ | 14,280 |
In addition, the new release contains several new features, including AI image tags generated by our collaborator, Clarifai. Clarifai is best-in-class image classification software that provides tagging support and visual recognition. Clarifai’s API was integrated into our processing flow to automatically generate tags for new and existing images. This means that CC Search has machine-generated tags, user-defined tags, and platform-defined tags that were obtained from the web crawl data. Collectively, these will enhance the user’s search experience and improve the quality of the results. Currently, 10.3 million images have Clarifai tags, and the remaining images will be tagged on an ongoing basis. Thank you to Clarifai for their support.
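As a rough illustration of how the three tag sources described above could sit side by side on one image, here is a minimal Python sketch. It is not the CC Search implementation: the `ImageRecord` class, its field names, and the `merged_tags` helper are all hypothetical, and normalizing tags by trimming and lowercasing is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ImageRecord:
    """Hypothetical record holding tags from each source separately."""
    identifier: str
    machine_tags: List[str] = field(default_factory=list)   # e.g. from an image classifier
    user_tags: List[str] = field(default_factory=list)      # added by people using the site
    platform_tags: List[str] = field(default_factory=list)  # found in the crawl data


def merged_tags(record: ImageRecord) -> Dict[str, Set[str]]:
    """Map each distinct tag to the set of sources that supplied it."""
    sources = {
        "machine": record.machine_tags,
        "user": record.user_tags,
        "platform": record.platform_tags,
    }
    merged: Dict[str, Set[str]] = {}
    for source, tags in sources.items():
        for tag in tags:
            # Assumption: tags are compared case-insensitively, ignoring
            # surrounding whitespace, so "Cat" and "cat " count as one tag.
            key = tag.strip().lower()
            if key:
                merged.setdefault(key, set()).add(source)
    return merged
```

Keeping the source alongside each tag means a search could still tell a machine-generated tag from one a person added, even after duplicates are collapsed.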
A New Look
Users can now also share content and create public lists of images without an account, using an anonymous authentication scheme. Shares.cc is a new link-shortening system that makes it easy to share cool stuff you find on our platform on social media: users can share both images and lists, no login required. In addition, the new platform provides the ability to filter by provider, license, creator, tag (including those generated by Clarifai), or title.
(Please note: If you made private lists in the previous system, they will not carry over to this release. We’re sorry for any inconvenience this may have caused. If there is a list you would like us to recover, please email us at firstname.lastname@example.org.)
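To make the filtering options mentioned above (provider, license, creator, tag, title) concrete, here is a small Python sketch of how such filters might combine. It is an assumption-laden illustration, not the platform's actual code or API: the `filter_images` function, the dictionary keys, and the matching rules (exact match for provider, license, and creator; case-insensitive match for tags; substring match for titles) are all hypothetical.

```python
from typing import Dict, Iterable, List, Optional


def filter_images(
    images: Iterable[Dict],
    provider: Optional[str] = None,
    license: Optional[str] = None,
    creator: Optional[str] = None,
    tag: Optional[str] = None,
    title: Optional[str] = None,
) -> List[Dict]:
    """Return the images that satisfy every filter that was supplied."""

    def matches(image: Dict) -> bool:
        # Exact-match filters (an assumption made for this sketch).
        if provider and image.get("provider") != provider:
            return False
        if license and image.get("license") != license:
            return False
        if creator and image.get("creator") != creator:
            return False
        # Tags are compared case-insensitively.
        if tag and tag.lower() not in {t.lower() for t in image.get("tags", [])}:
            return False
        # Titles match on a case-insensitive substring.
        if title and title.lower() not in image.get("title", "").lower():
            return False
        return True

    return [image for image in images if matches(image)]
```

Filters left as `None` are skipped, so a call such as `filter_images(images, provider="met", tag="portrait")` narrows by those two fields only.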
CC Search is made possible by a number of institutional and individual sponsors. Specifically, we would like to thank Arcadia (a charitable fund of Lisbet Rausing and Peter Baldwin), Mozilla, and the Brin Wojcicki Foundation for their support. With the generous support of our funders, Creative Commons is able to significantly advance its work in pursuit of a more open and sharing world that illuminates the Commons and recognizes the major potential of transformative human knowledge.
Full release notes available here.