This is part of a series of posts introducing the projects built by open source contributors mentored by Creative Commons during Google Summer of Code (GSoC) 2020 and Outreachy. Subham Sahu was one of those contributors and we are grateful for his work on this project.
The CC Catalog data visualization, the Linked Commons 2.0, is a web application that uses graphs to show how the millions of data points of CC-licensed content relate to one another. In this post, I’ll discuss the motivation for this visualization and explore the new features in the latest edition of the Linked Commons.
The number of websites using CC-licensed content is enormous and growing rapidly. The CC Catalog collects and stores these millions of data points, and each node (a unit in a data structure) contains the URL of a website and the licenses it uses. Rigorous data analysis can reveal how these nodes are interconnected and what trends they follow, but that kind of analysis is accessible only to people with a technical background. Visualizing the data makes broad patterns and trends easier for everyone to identify.
For example, by identifying the websites that link to your content, you can run a targeted outreach program or collaborate with them. Out of the billions of webpages on the web, you can then focus your effort on the ones most likely to help you grow.
Let’s look at some of the new features in the Linked Commons 2.0.
- Filtering based on the node name
The Linked Commons 2.0 allows users to search for their favorite node and then explore all of that node’s neighbors among the thousands present in the database. We color-code both the links to the root node and the neighboring nodes themselves according to how each neighbor is connected to the root node. This makes it easy for users to classify the neighbors into two categories.
- A sleek and revamped design
The Linked Commons 2.0 has a sleek design with a clean, refreshing look and both light and dark themes.
- Tools for smooth interaction with the canvas
The Linked Commons 2.0 ships with a few tools that allow the user to zoom in, zoom out, and reset the zoom with just one tap. These tools are especially useful to users who are on touch devices or using a trackpad.
- Autocomplete feature
The current database of the Linked Commons 2.0 contains around 240 thousand nodes and 4.14 million links. Unfortunately, some of the node names are uncommon and lengthy. To spare users the tedious work of typing complete node names, this version ships with an autocomplete feature: with every keystroke, it suggests node names that match what the user has typed so far.
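To make the autocomplete idea concrete, here is a minimal sketch of prefix matching over a sorted list of node names. It is an illustration only, not the Linked Commons server code: the class name `NodeNameIndex`, the `suggest` method, the `limit` parameter, and the sample node names are all assumptions made for this example.

```python
from bisect import bisect_left


class NodeNameIndex:
    """Hypothetical prefix index over node names (not the project's actual code)."""

    def __init__(self, names):
        # Keep (lowercased, original) pairs sorted so matching is case-insensitive
        # and all names sharing a prefix sit next to each other.
        self._entries = sorted((name.lower(), name) for name in set(names))
        self._keys = [key for key, _ in self._entries]

    def suggest(self, prefix, limit=5):
        """Return up to `limit` node names that start with `prefix`."""
        prefix = prefix.lower()
        if not prefix:
            return []
        # Binary search for the first key that is >= the prefix.
        i = bisect_left(self._keys, prefix)
        suggestions = []
        # Walk forward while the keys still share the prefix.
        while (
            i < len(self._keys)
            and len(suggestions) < limit
            and self._keys[i].startswith(prefix)
        ):
            suggestions.append(self._entries[i][1])
            i += 1
        return suggestions


# Sample node names, made up for illustration.
index = NodeNameIndex(["wikipedia", "wikimedia", "wordpress", "flickr", "Wikibooks"])
print(index.suggest("wiki"))
```

Because the names are sorted once up front, each keystroke costs only a binary search plus a short scan, which stays cheap even with a few hundred thousand node names.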
What’s next for the Linked Commons?
In the current version, some nodes are very densely connected. For example, the node “Wikipedia” has around 89k nodes and 102k links as neighbors. This is too much for web browsers to render, so we need to find a way to reduce it to a more reasonable number.
During preprocessing, we removed more than 3 million nodes that didn’t have CC license information. In general, the current version shows only those nodes that are well connected to other domains and for which license information is available. However, to provide a more complete picture of the CC Catalog, the Linked Commons needs additional filtering methods and other tools. These potentially include:
- filtering based on top-level domain
- filtering based on the number of web links associated with a node
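As a rough sketch of what these two filters could look like, the example below works on a graph stored as a plain list of `(source, target)` domain pairs. Everything here is an assumption for illustration: the function names, the data layout, and the sample domains are hypothetical, and the top-level domain check is deliberately naive (it takes the text after the last dot, so it does not handle multi-part suffixes such as `co.uk`).

```python
from collections import Counter


def top_level_domain(domain):
    # Naive assumption: the TLD is whatever follows the last dot.
    return domain.rsplit(".", 1)[-1].lower()


def link_counts(links):
    """Count how many links touch each node, as source or as target."""
    counts = Counter()
    for source, target in links:
        counts[source] += 1
        counts[target] += 1
    return counts


def filter_graph(links, tld=None, min_links=0):
    """Keep only links whose two endpoints both pass the filters.

    Link counts are computed on the unfiltered graph.
    """
    counts = link_counts(links)
    wanted_tld = tld.lower().lstrip(".") if tld is not None else None

    def keep(node):
        if wanted_tld is not None and top_level_domain(node) != wanted_tld:
            return False
        return counts[node] >= min_links

    return [(s, t) for s, t in links if keep(s) and keep(t)]


# Made-up sample data for illustration.
links = [
    ("a.org", "b.org"),
    ("a.org", "c.com"),
    ("b.org", "c.com"),
    ("d.org", "a.org"),
]
print(filter_graph(links, tld="org"))
print(filter_graph(links, min_links=2))
```

A threshold on link count could also be one way to tame very dense nodes, though choosing sensible cutoffs would need experimentation with the real data.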
We plan to continue working on the Linked Commons. You can follow the project’s development by visiting our GitHub repo. We encourage you to contribute to the Linked Commons by reporting bugs, suggesting features, or helping us write code. The new Linked Commons makes it easy for anyone to set up the development environment.
The project consists of a dedicated server, which powers the filtering by node name and the query autocompletion, and a frontend built with ReactJS for smooth rendering performance. So, it doesn’t matter whether you’re a frontend developer, a backend developer, or a designer: there is some part of the Linked Commons that you can work on and improve. We look forward to seeing you on board with sparkling ideas!
We are extremely proud of and grateful for the work done by Subham Sahu throughout his 2020 Google Summer of Code internship. We look forward to his continued contributions to the Linked Commons as a project core committer in the CC Open Source Community!