The News/Media Alliance (N/MA) commends the Copyright Office for releasing a pre-publication version of its report on Generative AI Training, the third part of its AI series. The report analyzes whether the use of copyrighted content in training and retrieval augmented generation (RAG) constitutes infringement.
We appreciate the Copyright Office’s careful and nuanced analysis, issued as more than 40 lawsuits work their way through the courts, and its recognition that differences in technology and uses of AI models can affect liability.
News/Media Alliance President and CEO Danielle Coffey stated, “Today’s report constitutes an important and timely recognition of copyright owners’ right to protect their works and prosper in the digital ecosystem, especially for the use of real-time news media content that requires tremendous investment and human reporters. The report states clearly what we already knew: U.S. copyright law is capable of handling new technology, the primary issue we continue to face is effective enforcement and AI developers’ respect for the law. While we would have preferred the Copyright Office had issued a stronger conclusion on training of news media content, the report as a whole makes a compelling case that AI companies must rein in their excesses, respect content creators, and fall in line with copyright law.”
The report highlights the special risks that RAG poses to news media stakeholders, noting that “the use of RAG is less likely to be transformative where the purpose is to generate outputs that summarize or provide abridged versions of retrieved copyrighted works, such as news articles, as opposed to hyperlinks.”
While fair use is a contextual analysis, we wish the report could have offered similar clarity when it comes to AI training. On the first factor, the Office declined to make definitive statements on the transformative use defense deployed by many AI companies, noting that models may simultaneously serve transformative and non-transformative purposes. The report also notes that AI uses may encroach upon emerging licensing markets and compete with other uses for copyrighted content. In N/MA’s experience, most AI models do not consistently employ sufficient guardrails or mitigations against competitive uses for professional content, and we wish the Office had come to a broader conclusion here to incentivize more responsible development.
We are encouraged, however, that the Office recognizes that in many cases training LLMs “threatens significant potential harm to the market for or value of copyrighted works” and that “where licensing options exist or are likely to be feasible,” the fourth factor will disfavor fair use. Here, the value and nature of journalism businesses must be recognized. News media publishers were early to engage in AI licensing partnerships, and N/MA recently offered a voluntary collective license to its membership. For these reasons, N/MA remains a champion of voluntary licensing, and believes it is premature to consider regulatory interventions in licensing markets.
Before considering the defense of fair use, the Copyright Office also analyzed AI training processes and concluded that “creating and deploying a generative AI system using copyright-protected material involves multiple acts that, absent a license or other defense, may infringe one or more rights.” Although LLMs train on a vast quantity of data, the report notes that model performance “also depends heavily on the quality of the data used to train them,” and explains how training can result in the memorization and retention of valuable content within the model itself, a topic of an N/MA White Paper.
The Copyright Office launched its study on AI and copyright in August 2023. The first part – focusing on digital replicas – was published in July 2024 while the second part on copyrightability came out in January 2025. N/MA submitted initial and reply comments to the Copyright Office, and met ex parte with the Office to highlight special considerations related to retrieval augmented generation. N/MA also published a White Paper detailing how the pervasive copying of expressive works to train and fuel generative AI systems is copyright infringement and not a fair use.
Read the pre-publication version of the third part of the study here.