The OpenAI logo is seen on a phone screen held in front of a computer screen displaying output from ChatGPT. Michael Dwyer/AP
A federal judge on Wednesday rejected OpenAI’s request to dismiss The New York Times’ copyright lawsuit, which alleges that the technology company used the newspaper’s content without permission or payment.
In an order allowing the lawsuit to go forward, Judge Sidney Stein of the Southern District of New York narrowed its scope but let the main copyright infringement claims proceed.
Stein did not immediately issue an opinion, but he promised that one would come “quickly.”
The decision is a victory for the newspaper, which has joined forces with other publishers, including the New York Daily News and the Center for Investigative Reporting, to challenge the way OpenAI collected vast amounts of data to train ChatGPT, its popular artificial intelligence service.
Attorney Stephen Lieberman, who is on the legal team representing the news publishers, celebrated the judge’s decision.
“We are grateful for the opportunity to present to the jury the fact that OpenAI and Microsoft are profiting wildly from stealing the original content of newspapers across the country,” Lieberman said in a statement to NPR.
Lawyers for The New York Times consider the paper’s articles to be one of the biggest sources of copyrighted text that OpenAI used to build ChatGPT into the premier AI chatbot. They claim OpenAI violated copyright law by siphoning the newspaper’s journalism.
In a statement to NPR, OpenAI spokesman Jason Deutrom said the company welcomed the judge’s narrowing of the case and that the ChatGPT maker’s lawyers look forward to making clear that the company builds its AI models using publicly available data, in a way that is grounded in fair use and supports innovation.
OpenAI leaders argue that the company’s large-scale data scraping, including of articles from The Times, is protected under a legal doctrine known as “fair use.”
This allows for the reuse of materials in certain instances, such as research, education, commentary, and more, without permission.
The judge’s decision means that the case can proceed to trial, but no trial date has been set. The collection of evidence, including depositions of executives on both sides of the case, is expected to take place in private, alongside public hearings to resolve disputes over evidence and other issues.
The legal battle between one of the world’s most influential news outlets and the leading Silicon Valley AI company carries high stakes for both the news industry and the future of AI tools.
For publishers, the fear is that powerful chatbots will cut into industry revenue: readers who query a chatbot visit news websites less often, which could drive a decline in advertising.
The suit names only OpenAI and its financial backer Microsoft, but other AI companies also scrape the web to train their models. In many ways, the AI industry has followed OpenAI’s lead when it comes to training chatbots and other AI services, and it operates on the premise that data found on the open web can be processed and used in chatbot answers.
However, the law on the issue remains unsettled.
Courts have said that a fair use of a copyrighted work must produce something new that is “transformative,” or that comments on or refers back to the original work. The Times argues that this does not apply to how OpenAI reproduces the paper’s original reporting.
Another part of the legal analysis involves an idea known as “market substitution.” It refers to whether chatbot answers substitute for, say, reading The Times’ website, or whether chatbots and newspapers operate in different markets.
At a New York hearing in January, the publishers’ lawyers claimed that the chatbot reproduced the paper’s text verbatim when asked about subjects covered by The Times.
However, OpenAI’s legal team pushed back, saying the news outlet appeared to be using specially crafted prompts to force the chatbot to spit out large chunks of text lifted from the paper’s website. This isn’t how most people interact with the service, nor how the chatbot operates, says Joseph Gratz, an attorney for OpenAI.
“This is not a document search system. It’s a large language model,” Gratz says.