How fast is your internet? How M-Lab uses CC0 data for the public interest
Though the idea of the internet as infrastructure may have seemed radical only a short while ago, many technologists are now taking a different tack: as a vital part of modern life, access to reliable internet is essential to the development of a just and equitable society. Built in response to proprietary measurement datasets, M-Lab has assembled the world’s largest collection of open internet measurement data, all under a CC0 license.
A collaborative project from New America’s Open Technology Institute, Google Open Source Research, Princeton University’s Planet Lab, and many others, M-Lab owes its success to its insistence on open data and an open web, and to maintaining the tests that keep the web free and open. M-Lab’s data is the backbone of the internet, an example of open collaboration that benefits consumers, researchers, and the future of the web.
To read M-Lab’s reports and try their tools, visit the website. Thanks to Chris Ritzo, Georgia Bullen, Alison Yost, Collin Anderson, and Stephen Stuart for taking the time to answer these questions.
Why does Internet measurement matter? What is the ultimate goal of this project?
Measurement Lab’s goal is to provide an open, publicly available dataset and the platform on which to gather it. There have always been proprietary data sources about the quality of consumer broadband connections, but those were and are the intellectual property of companies like Ookla, Akamai, Google, and network operators themselves. New America’s Open Technology Institute, Google, and Princeton University’s Planet Lab formed a consortium to build a data collection platform that could host a common base of internet measurement experiments developed and vetted by the academic research community, be deployed globally, and over time provide what is now the largest open, publicly available internet measurement dataset in the world. Today we run over 100 measurement points around the world and collect an average of over 9 million tests per month worldwide.
From a consumer perspective, are you getting the speed and quality of service you purchased from an ISP? Using a speed test or internet health test provides data to help answer that question. For regulatory agencies, measurement is a means of keeping track of broadband speeds, health, consumer protections, anti-competitive practices and more. For network operators, measurement is paramount to understanding how to provision infrastructure and services. For civil society groups and human rights advocates, it is a means of assessing disparities in internet access and in the quality of available internet services, whether internet traffic is surveilled by state actors or others, and whether and where the internet is censored or blocked. The research community is also keenly interested in openly available internet measurement data, both to understand and address many of these issues and, in many cases, to devise ways to make the internet function better.
How did you make the decision to use CC0 data? How does your organization support the commons?
M-Lab uses a CC0 license on the data for experiments that we maintain or contribute to: NDT, Paris Traceroute and Sidestream. We don’t require researchers hosting other experiments to use the same license, but we do require data to be provided openly. In some cases M-Lab will agree to embargo data for an agreed upon period of time such that the researcher can be the first to publish on the data their test collects. But the most popular tests we maintain on our platform are licensed with CC0 because we think that this data should be in the public domain, and using a CC0 license allows anyone to freely use it without restriction, particularly those in the academic community.
The choice to use a CC0 license goes back to our beginning. The academic community interested in researching the internet needed a data source and couldn’t get that from private companies. Providing that data would have violated companies’ terms of service with their users, and even if it was legally possible, anonymizing it had been proven questionable, if not ineffective. Initiatives like Planet Lab at Princeton University had made some progress toward the idea of a research platform that could be used to collect such data, but didn’t necessarily measure at the scale of the consumer internet. Instead the M-Lab core team engaged with academics, company reps and others to map out what an internet measurement platform might look like to support the work of the research community, that would situate infrastructure to measure the consumer internet, and would provide open data in the service of the public interest. This was the genesis of M-Lab. So from the very beginning we’ve always supported the commons.
On your “About” page, you write that “transparency and review are key to good science.” Can you elaborate on that? How do you feel that your project participates in the scientific process to make the Web better for everyone?
M-Lab was created as a platform to produce open data about the health of consumer internet connections. Everything from the submission of proposed tests to the hosting of resulting data mirrors the process of submitting a paper to an academic journal. M-Lab defines the parameters that an experiment must adhere to, and academic or regulatory researchers apply to host their tests with us. Applications are reviewed by an experiment review committee, which confirms that the researcher has ethical approval from their Institutional Review Board and that the proposed test conforms to M-Lab’s data privacy policy, determines whether the test overlaps with existing tests, and assesses the researcher’s capacity for long-term support of the test. M-Lab wants to encourage ongoing longitudinal research, not one-off projects, and to make the data openly available for broad analysis and research.
We regularly support researchers interested in secondary data analysis with documentation, sample queries and tools to access, visualize and use M-Lab data, and where possible we produce our own analysis and research. This support ranges from individual researchers and graduate students, to civil society and research organizations, to national regulatory agencies. In the United States, the FCC’s contractor, SamKnows, uses the M-Lab platform to host a portion of the tests for the annual Measuring Broadband America program. In Canada, the Canadian Internet Registration Authority (CIRA) hosts three M-Lab sites throughout the country and has built its own national data portal using M-Lab’s data, which also integrates our test.
Additionally, because our tests are open source, we support their integration into other websites, software, or other platforms. These developer integrations are key to our expansion and impact in new areas of the world and with new audiences. Most recently, Google’s Search team integrated Internet2’s Network Diagnostic Tool (NDT) as a top-level answer in their Search product. When you search for “how fast is my internet” or similar, the Google version of our test can be run immediately in your browser.
What kinds of results have you seen that are particularly exciting, surprising, or troubling from this project? What steps can people take to improve the Web? How can they use your project to do so?
M-Lab initially focused on providing the platform and data, leaving analysis to the research or regulatory community. As we’ve grown in size and interest, we have focused on building more accessible tools to run tests and to visualize and download our data, as well as on supporting individuals and groups interested in using our data in their work.
The M-Lab team is also now working on our own research as well as supporting new inquiries into our data. In October 2014 the M-Lab research and operations team published a technical research report: ISP Interconnection and its Impact on Consumer Internet Performance. The data in this report helped to inform the FCC and supported its historic ruling in favor of net neutrality in 2015. Our data and analysis showed clear indicators of congestion and poor performance at the interconnection points between consumer ISPs and transit providers. We’ve since presented the report to the FCC, NANOG, and numerous international network operator gatherings. Before the M-Lab report, interconnection wasn’t even on the FCC’s radar.
We’ve also supported individual researchers interested in using M-Lab data, both through our support email and directly. In 2015, M-Lab hosted two research fellows who examined our network performance data in new ways. One fellow examined the economic geography of access by using M-Lab data and US Census data. Another worked on a machine learning algorithm that identifies anomalies in normalized M-Lab data, attempting to identify patterns in our data where known internet shutdowns had occurred.
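To give a sense of what flagging anomalies in performance data can look like, here is a minimal sketch. It is not the fellow's algorithm, which is not described here; it swaps in a much simpler statistical stand-in, a modified z-score based on the median absolute deviation. The function name `flag_anomalies`, the threshold of 3.5, and the daily figures are all hypothetical and chosen only for illustration.

```python
from statistics import median


def flag_anomalies(values, threshold=3.5):
    """Return the indices whose modified z-score exceeds the threshold.

    Assumption: a median/MAD outlier rule, used here as a simple stand-in
    for the machine learning approach mentioned in the text.
    """
    med = median(values)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread at all, so nothing can be ranked as unusual
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical daily median download speeds (Mbit/s) for one region.
# These numbers are invented for illustration; they are not M-Lab data.
daily_mbps = [24.1, 23.8, 24.5, 23.9, 24.3, 2.1, 1.8, 24.0, 24.2, 23.7]

print(flag_anomalies(daily_mbps))  # prints [5, 6]
```

In this made-up series, the two days on which speeds collapse stand out against an otherwise steady baseline. Real analysis would also need to account for sample sizes, time-of-day effects, and differences between networks.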
Anyone can use M-Lab’s public data, tools and open visualizations for free.
M-Lab operates in the public interest, providing open data, open source tools, visualizations and documentation to support our own research, and yours.
People can test the speed and latency of their connection using our site: https://speed.measurementlab.net/. We also have an extension for Google’s Chrome browser, M-Lab Measure, that allows you to schedule tests to be run regularly.
Because M-Lab data is open and all of our tests are open source, developers can integrate our data or our tests into their own applications, services, web mashups and more. We provide source code, documentation, and implementation examples to enable you to leverage our data, tests and infrastructure. Learn more about the project and how to get involved on our website, and contact us for more information.
Posted 30 November 2016