On Openness & Copyright, EU AI Act Final Version Appears to Include Promising Changes

by Creative Commons | Better Internet, Policy

Throughout the past year, Creative Commons has actively engaged in the EU’s development of an AI Act. We welcomed its overall approach, focused on ensuring high-risk systems that use AI are trustworthy and safe. At the same time, we had concerns about the way it might impede better sharing and collaboration on the development of AI systems, and we joined with a coalition of AI developers and advocates offering suggestions for how to improve it. Rather than advocating for blanket exemptions, we supported a graduated, tailored approach – differentiating between merely creating, sharing, and doing limited testing of new tools, on the one hand, and offering a commercial service or otherwise putting powerful AI models into service, particularly at broad scale and impact, on the other.

We also raised concerns about late additions to the text related to copyright. While we generally support more transparency around the training data for regulated AI systems, the Parliament’s text included an unclear and impractical obligation to provide information specifically about use of copyrighted works.

This week, the EU’s political institutions announced that they have reached a tentative final agreement. We’re still awaiting a final text, and there are many other issues at stake related to the specific regulations on high-risk systems; a number of civil society organizations have raised concerns with, for example, changes to rules around predictive policing and biometric recognition, among other things.

At the same time, based on the initial reported details (including a draft compromise text published by POLITICO), the final agreement appears promising relative to the recent Parliament text, both in its support for open source and open science and in its treatment of copyright. The devil is in the details, and we will update our views based on further review of the final text.

Open Source & Open Science

Consistent with our advocacy, the final version appears to clarify that merely providing and collaborating on AI systems under an open license is not covered by the Act, unless the system in question is one regulated by the Act (e.g., a defined “high-risk” system) that is commercially available or put into service.

As the AI Act progressed, focus shifted from particular high-risk systems to general-purpose AI (GPAI) models, sometimes referred to as “foundation models.” This is a tricky issue, because regulating these models could have unintended consequences for a wide variety of beneficial uses of AI. In light of the Parliament’s proposed inclusion of these models, we had advocated for a tiered approach, requiring transparency and documentation of all models while reserving stricter requirements for commercial deployments and those put into service at some level of broad scale and impact.

On the one hand, the final Act also takes a tiered approach, reserving the strict requirements for models of “high impact” and “systemic risk.” On the other hand, the initial tiering is based on an arbitrary technical threshold, which at best has only a limited relationship to actual real-world impact. Fortunately, it appears that regulators in the to-be-created AI Office will be able to update this tiering in the future based on other quantitative and qualitative measures. We hope that the final rules also appropriately distinguish between development of the pre-trained model and “fine-tuning” of a model by follow-on, third-party developers.

Interestingly, the draft text will exempt models that do not have “systemic risk” and are “made accessible to the public under a free and open-source license whose parameters, including the weights, the information on the model architecture, and the information on model usage,” are made publicly available, with the exception of certain transparency requirements around training data and respect for copyright (see below). This provides further breathing room for open source developers, although the definition of what constitutes an “open source license” in this context is still a matter of some debate. We hope those continuing discussions will help ensure these protections in the law are applied to those models that, by virtue of their openness, do provide critical transparency that facilitates robust accountability and trustworthy systems.

The exact rules will continue to evolve as the AI Act is implemented in the coming years, and other countries are also considering the role of openness. For instance, the U.S. Department of Commerce is soliciting input on “dual-use foundation models with widely available weights,” pursuant to the White House’s recent Executive Order.

As AI development and regulation evolve over the next year, we will keep working with a broad coalition to ensure better support for open source and open science. This fall, we were proud to join with a wide range of organizations and individuals in an additional joint statement emphasizing the importance of openness and transparency in AI – not only because it helps make the technology more accessible, but also because it can support trust, safety and security. We look forward to working with all stakeholders to make this a reality.

Copyright & Transparency

The final Act appears to take a more flexible approach to transparency around use of training data. Rather than expecting GPAI providers to list every specific work used for training and determine whether it is under copyright, it instead indicates that a summary of the collections and sources of data is enough (for example, it might be sufficient to state that one uses data from the web contained in Common Crawl’s dataset). The AI Office will create a template for meeting these transparency requirements. We welcome the new wording, which clarifies that the transparency requirement applies to any training data, not only to copyright-protected works. We will continue to engage on this topic to ensure the requirement is implemented in a flexible, proportionate way, free of overreaching copyright restrictions.

The Act also requires that foundation model providers have policies in place to adhere to the copyright framework. It’s unclear exactly what this means besides restating that they must comply with existing law, including the opt-out stipulated in Article 4(3) of the DSM Directive. If that’s the intent, then it is an appropriate approach. As we said previously:

“We also believe that the existing copyright flexibilities for the use of copyrighted materials as training data must be upheld. The 2019 Directive on Copyright in the Digital Single Market and specifically its provisions on text-and-data mining exceptions for scientific research purposes and for general purposes provide a suitable framework for AI training. They offer legal certainty and strike the right balance between the rights of rightsholders and the freedoms necessary to stimulate scientific research and further creativity and innovation.”

The draft does create some uncertainty here, however. It states that models must comply with these provisions if put into service in the EU market, even if the training takes place elsewhere. On the one hand, the EU wants to avoid situations of “regulatory arbitrage,” where models are trained in a more permissive jurisdiction and then brought into the EU without complying with EU rules. On the other hand, this threatens to create a situation where the most restrictive rules set a global standard; to the extent that simply putting a model into service on a globally accessible website could put a provider in legal jeopardy, it could create uncertainty for developers.

Posted 11 December 2023
