“Independent Journalism is vital to our democracy. It is also increasingly rare and valuable.”
These were the opening lines of the New York Times's complaint in its case against OpenAI's ChatGPT and Microsoft's Bing Chat. The suit was brought by the New York Times against the defendants 'for copying and using millions of the Times's copyrighted news articles, in-depth investigations, opinion pieces, reviews, how-to guides and more.' This might sound very similar to the claims brought by ANI against OpenAI. Another accusation levelled by the affected news agencies was the attribution of false information to them. Many people today are oblivious to the enormous amount of false information they may be consuming on a day-to-day basis. While Gen Z has coined the term 'WhatsApp University students' for the millennials and boomers consuming false information, they themselves remain unaware of the heaps of misinformation they devour through AI models.
Victims of such fake information were attorneys Peter LoDuca and Steven A. Schwartz, who cited fake AI-generated cases before the United States District Court for the Southern District of New York and later received a show-cause notice for their actions. Generative AI models have advanced so much that they can create poems and artworks in the style of past poets and artists. These poems are becoming increasingly indistinguishable from human-written ones, and reports suggest they are even rated more favourably. It is easy to conclude that if generating poems is not hard for ChatGPT, neither would be the generation of fake news reports.
Imagine a future where robots are writing news articles, generating viral social media posts, and even drafting bestselling novels. Sounds crazy, doesn't it? Think again. In 2020, The Guardian published an op-ed written entirely by GPT-3, and in 2018, an AI-generated artwork sold for over $430,000 at Christie's. As AI continues to revolutionize industries, it is also sparking fierce debates: Who owns the content it creates? Can AI be an author? And is it stealing jobs, or creating new ones?
The numbers are staggering. A McKinsey report estimates that automation could displace up to 375 million workers globally by 2030, with creative fields like marketing, journalism, and design increasingly at risk. In 2023, BuzzFeed, a digital media company known for its quizzes, listicles, videos, and news content, laid off 12% of its workforce while turning to AI tools to generate listicles and quizzes. Meanwhile, AI-powered platforms like ChatGPT and Gemini are generating content at an unprecedented scale, raising pressing questions about ownership, copyright, the very nature of creativity, and how AI will shape the future of journalism and media.
‘Artificial intelligence and copyright law intersect when expressive data is used to train machines to learn, reason, and act as humans do.’ This article explores this complex relationship by delving into the recent case of ANI vs. OpenAI. Three of the four questions taken into consideration by the court deal with training and output generation by generative AI using copyrighted data, and the unclear stance on whether such usage would qualify as ‘fair use’ under Section 52 of the Copyright Act. The authors have attempted to analyze these questions, examining their broader implications for copyright law in the age of AI.
Can AI be an Author?
When we think of AI, some of the most well-known models, like ChatGPT and Gemini, come to mind. These large language models function by processing user instructions and generating outputs, whether a well-drafted email, an essay, a college assignment or an image. The question then arises: can the AI be considered the author of the content it generates? In most jurisdictions around the world, including India, it cannot. AI, being a tool rather than a natural person or the original creator of the work, cannot be recognised as an author.
The Copyright Act of 1957 does not recognise AI as an author within its statutory framework. Section 2(d) of the Act defines an author, in essence, as the person responsible for creating a work. The Indian courts have, through various decisions, consistently held that an artificial entity cannot be considered an author and would not possess any copyright. Furthermore, the structure of the Copyright Act is fundamentally designed around human involvement. For example, the registration process under Form XIV mandates that the claimant disclose their name, nationality, and address, implying that the law assumes authorship to be limited to natural persons or legally recognized entities. This underscores the notion that copyright protection is intended for human creators, not AI.
One might also think that the prompt-giver could be considered the generator of the work, making the AI a co-author. This issue has been raised a number of times before courts in different countries, and most have held that the prompt-giver cannot be considered the author. The US Copyright Office, while dealing with Zarya of the Dawn and the SURYAST case, held that ‘the mastermind behind the images generated is the AI. This is because while the information in the prompt may influence generated images, the prompt text does not dictate a specific result. As a result of the significant distance between what a user may direct and what might be the result, there is a lack of sufficient control on the part of the prompt giver.’
However, a slightly different approach was taken by the Beijing Internet Court in the Li vs Liu case, wherein copyright was granted to the prompt-giver. The court stated that ‘the prompt given reflects the plaintiff’s choice and management. It also reflects the plaintiff’s aesthetic choice and personal judgment, thereby making him the author and providing him copyright.’ This demonstrates the ongoing ethical and legal debate regarding the copyrightability and authorship of AI-generated work.
Can Training AI Models Constitute Copyright Infringement?
How are AI models trained?
AI models have existed far longer than we generally perceive. For example, the ‘chatbots’ or the ‘you might like these’ recommendations on various apps are all products of artificial intelligence. However, today’s generative AI models function quite differently from their predecessors. These models work and learn through a process called deep learning, a form of machine learning inspired by how our brain functions. Large Language Models (LLMs) employ neural networks that mimic the way biological neurons work together to identify phenomena. These neural networks consist of layers of interconnected nodes. The early layers recognise basic patterns, shapes, edges, or simple words. The later layers build on these simple patterns to understand complex structures, like faces in an image or full sentences in a paragraph. For example, if you train an AI to recognise cats, it will first identify basic features like fur texture and ear shapes. As it moves through deeper layers, it starts recognising full cat images, distinguishing them from other animals.
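For readers who prefer a concrete illustration, the layered structure described above can be sketched in a few lines of code. This is purely an illustrative toy: the weights, the two "edge" detectors, and the "cat score" are invented for demonstration, whereas real models learn millions of such parameters automatically from training data.

```python
import math

def neuron(inputs, weights, bias):
    """A single artificial neuron: weighted sum passed through a sigmoid."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Input: two hypothetical pixel-level measurements of an image.
pixels = [0.9, 0.8]

# Early layer: each neuron responds to a simple pattern (e.g. an edge).
edge_a = neuron(pixels, [1.5, -0.5], 0.0)
edge_b = neuron(pixels, [-0.5, 1.5], 0.0)

# Later layer: combines the simple patterns into a more complex feature
# (here, an invented "looks like a cat" score), mirroring how deeper
# layers build on earlier ones.
cat_score = neuron([edge_a, edge_b], [2.0, 2.0], -2.0)

print(round(cat_score, 3))
```

The sigmoid keeps every neuron's output between 0 and 1, and each layer consumes the previous layer's outputs; training consists of adjusting the weights so that the final score matches the desired answer.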
To develop these early and later layers, AI models rely on vast amounts of data sourced from the internet. While AI primarily learns patterns from this data to generate outputs, there are instances where it may reproduce parts of its training data almost exactly as they originally appeared. This raises serious copyright concerns, as AI developers often use copyrighted works without permission from the original authors.
Whether the use of data available online to train AI models results in copyright infringement
Section 51 of the Copyright Act deals with copyright infringement. The section states that ‘when any person without a licence granted by the owner of the copyright or the Registrar of Copyrights under this act or in contravention of the licence granted does anything for which they do not possess the exclusive right, it shall amount to copyright infringement.’
A common defence in copyright infringement claims is the availability of the information in the ‘public domain.’ Many assume that anything publicly accessible or found on the internet automatically falls into this category. However, this is a misconception. In reality, ‘the public domain refers specifically to creative works that are not protected by intellectual property laws such as copyright, trademark or patent law and not simply content that is easy to find online.’ As mentioned earlier, AI developers train models using vast amounts of internet data, regardless of whether it is freely accessible or behind paywalls. While this data might be readily available, it is not necessarily in the public domain. As a result, there is always a risk of copyright infringement.
In recent years, AI developers have also become less transparent about the specific datasets used for training. For example, OpenAI openly disclosed the primary data sources used for GPT-3 but took a different approach with GPT-4. Instead of providing detailed information, they simply stated that the model was trained on a mix of ‘publicly available data’ and content licensed from third-party providers, leaving the exact nature of the data largely undisclosed. These developing companies usually adopt the practice of Text and Data Mining (TDM) for training their AI models.
Indian laws do not explicitly address the practice of Text and Data Mining (TDM), nor is there a specific provision regulating it. However, the judiciary has made attempts to tackle the issue. In a recent ruling, the Delhi High Court, in OLX BV & Ors. vs. Padawan Ltd., sided with OLX and issued a permanent restraining order against Padawan Ltd., prohibiting the company from using automated or manual methods to scrape any data, including commercial information from OLX’s website.
The issue for ANI, much like the neural networks of AI, is multilayered. Indian law also provides the defence of fair use under Section 52 of the Copyright Act. The Indian courts have recognised the crucial four-factor test as developed by the American courts in Campbell v. Acuff-Rose Music. One of the factors of this test is the nature of the copyrighted work. The US Supreme Court, in Sony Corp. of America v. Universal City Studios Inc., while commenting on the nature of the copyrighted material, stated that “copying a news broadcast may have a stronger claim to fair use than copying a motion picture. This is because copying from informational works encourages the free spread of ideas and encourages the creation of new scientific or educational works, all of which benefit the public.”
Indian law follows a similar principle wherein facts and information are seen as valuable resources that should remain accessible to everyone. The Delhi High Court, in Akuate Internet Services Pvt. Ltd. vs. Star India Pvt. Ltd., ruled that ‘even if a work is protected by copyright, the facts and information within it cannot be exclusively controlled.’ The court emphasized that restricting access to such information, even under claims of unfair competition, would limit the public’s right to share and use knowledge. ‘This, in turn, would go against the fundamental right to freedom of speech and expression under Article 19(1)(a) of the Indian Constitution.’ The court noted that information about current events isn’t something an author creates but rather a report on facts that are already public knowledge, essentially, the history of the day.
In the ongoing case of ANI vs OpenAI, the use of informational and factual data by AI developers exists in a legal grey area. On one hand, using someone’s work without a license could constitute copyright infringement, while on the other hand, the data in question consists of factual information which is meant for public benefit. This makes the issue at hand highly ambiguous.
Whether the use of copyrighted data by AI developers to generate responses for users amounts to copyright infringement
While ‘the development of an AI system, including data mining, machine learning and training programmes, is normally supervised by the AI developers, they do not necessarily exercise the same control when it comes to output generation. The black box of AI is where developers of AI networks are unable to explain why their AI programs have produced a certain result.’ The proposition that makes this question even more complex when it comes to the present case is the terms of use provided by OpenAI. OpenAI states that “As between you and OpenAI, and to the extent permitted by applicable law, you (a) retain your ownership rights in Input and (b) own the output. We hereby assign to you all our rights, title, and interest, if any, in and to Output.”
A similar question was put forth before the French Competition Authority (FCA), wherein Google was accused of using articles from different media outlets to train its AI technology and generate responses for the users without informing the media outlets. FCA fined Google €250 million for the unfair practice. While the Indian courts may also take a similar stand in the present case, the legal precedent in the copyright law takes a different approach, i.e., assessment through the lens of fair use. Another limb of the four-factor test of fair use is ‘the transformative character of the use.’ In Chancellor Masters & Scholars of the University of Oxford v Narendra Publishing House, the Delhi HC held, “the subsequent work must be different in character. It must not be a mere substitute; in that, it is not sufficient that only superficial changes are made while the basic character remains the same to be called transformative.”
An important precedent in this regard is Authors Guild v. Google, wherein Google was accused of scanning and digitising excerpts from books to build Google Books. The court ruled in favour of Google, holding that the usage was transformative: Google did not merely copy books but created a new, beneficial service that enhanced the public’s access. This doctrine was also relied upon by OpenAI in NYT v. OpenAI. OpenAI emphasised the ability of its generative AI to transform the work of NYT into a new form not substantially similar to the original. It also argued that any exact reproduction of NYT articles was the result of a bug that OpenAI intended to resolve. Therefore, in the present case, if OpenAI is able to prove that its use of copyrighted work is transformative, it might be able to escape liability.
The Fourth Limb of the Four-Factor Test
The Four-Factor Test, as the name suggests, includes four key elements, and the final one is “the effect of the use on the potential market or value of the copyrighted work.” This means that if someone uses a copyrighted work in a way that financially harms the copyright owner or puts their work at risk, the court will take this factor into account when deciding whether the use qualifies as fair use. In Blackwood and Sons Ltd v. AN Parasuraman, the Madras HC held that “the possibility of competition is all that is necessary for determining infringement of copyright.”
In the present case, ANI has accused ChatGPT of generating responses that closely resemble, or even directly copy, its original content. The alleged copying extends not only to publicly available work but also to content behind paywalls, accessible only through subscription. Such content is typically a major source of revenue for news agencies like ANI, which argues that this threatens its financial stability since subscription-based content is a key part of its business model. ANI also points out that ChatGPT is producing responses that may be misleading or false, thereby hampering the agency’s credibility and reputation.
News agencies and many other creative industries have always relied on such structured financial support, and if these structures break down due to AI malpractice in the name of fair use, creators will suffer unprecedented losses. Parallels may be drawn from Harper & Row v. Nation Enterprises, wherein the Nation magazine obtained an unpublished copy of former President Gerald Ford’s memoir and printed an article that included a 300-word passage taken directly from the book. Although this excerpt made up only 0.15% of the entire work, it focused on a key section. The Supreme Court ruled that this portion was central to the memoir’s purpose and that publishing it without permission likely impacted book sales. As a result, the Court decided that this use did not qualify as fair use.
Applying this reasoning to the present case, OpenAI’s use of work produced by ANI, whether as a verbatim copy or a substantially similar one, undermines ANI’s business model. If OpenAI generates responses based on ANI’s reports, even in small portions, it is effectively providing ANI’s content without requiring users to visit ANI’s platform or pay for access. This diminishes the incentive to subscribe, thereby impacting ANI’s financial viability. Given this clear financial harm and the impact on ANI’s market, OpenAI’s use may not qualify as fair use under copyright law.
Conclusion and Way Forward
The ANI vs OpenAI case has the potential to be a landmark decision for the Indian judiciary, shaping the legal framework for AI-related copyright disputes in the country. The case also raises a critical question about the ability of India’s legal system to regulate artificial intelligence effectively. While the Indian government maintains that existing Indian laws are adequate to address the challenges posed by artificial intelligence, several examples suggest otherwise.
Similar to the present case, in January 2024, the Digital News Publishers Association (DNPA) urged the Central Government to amend the Information Technology Rules to ensure fair compensation for news publishers whose content is used to train generative AI models. The Union IT Minister responded to this by stating that legislation in this direction must be discussed. Legal experts have highlighted that India’s current legislative framework does not explicitly address AI training, leaving significant gaps in the regulation of AI-generated content.
The Indian legislature could look to the European Union for guidance in regulating artificial intelligence. In 2023, the EU introduced the world’s first comprehensive AI law, setting a global precedent for transparency and accountability in AI development. ‘The legislation mandates AI models to disclose AI-generated content, prevent the creation of illegal material, and publish summaries of copyrighted data used for training. Additionally, it grants companies the right to “opt out” of having their data used for AI training. This means copyright holders can explicitly prohibit AI developers from extracting and utilizing their text and data, offering stronger protections for intellectual property in the age of AI.’
AI developers can themselves foster a more cooperative and transparent ecosystem by obtaining licenses and entering into agreements with content owners. A notable example is the Associated Press, which, in July 2023, signed a two-year deal with OpenAI permitting the use of select portions of its text archive dating back to 1985 for training ChatGPT. Similarly, in December 2023, Axel Springer entered into an agreement with OpenAI ensuring compensation for the use of its publications to refine ChatGPT’s responses and train AI models. Such collaborations give companies greater control over their intellectual property and enable AI developers to access high-quality data and train their models without infringing copyright laws.
This brings us back to the question: who owns the news in the age of AI? The answer is complicated and may be two-pronged. While news itself cannot be subjected to copyright, the way it is presented, along with its additional expressive content, can be. A better question, therefore, would be: what is the future of journalism in an AI-driven world? The answer depends upon the present case, which not only challenges OpenAI but will also help determine the future of journalism in India. The ANI vs OpenAI case is not merely a copyright dispute; it is a litmus test of how legal systems will adapt to the ever-evolving landscape of artificial intelligence.
Author: Sakshi Tiwari; Co-author: Tushar Gaur. In case of any queries, please contact/write back to us via email at chhavi@khuranaandkhurana.com or at Khurana & Khurana, Advocates and IP Attorneys.
References
- Kaigeng Li, Hong Wu and Yupeng Dong, ‘Copyright protection during the training stage of generative AI: Industry-oriented U.S. law, rights-oriented EU law, and fair remuneration rights for generative AI training under the UN’s international governance regime for AI’ (Science Direct, November 2024) <https://www.sciencedirect.com/science/article/abs/pii/S0267364924001225#:~:text=Regarding%20whether%20using%20copyrighted%20works,does%20not%20infringe%20on%20copyright.>.
- Shivam Vikram Singh and Vanshika Mittal, ‘Training AI Models: Intersection Between AI and Copyright’ (Mondaq, 20 February 2025) <https://www.mondaq.com/india/copyright/1587932/training-ai-models-intersection-between-ai-and-copyright>.
- Audrey Pope, ‘NYT v. OpenAI: The Times’s About-Face’ (Harvard Law Review, 10 April 2024) <https://harvardlawreview.org/blog/2024/04/nyt-v-openai-the-timess-about-face/>.
- Lucy Rana and Shubham Raj, ‘Data Scraping and Legal Issues in India’ (Mondaq, 4 March 2020) <https://www.mondaq.com/india/copyright/900156/data-scraping-and-legal-issues-in-india>.
- ‘Related Rights: The Autorité fines Google €250 million’ (Autorité de la concurrence, 20 March 2024) <https://www.autoritedelaconcurrence.fr/en/article/related-rights-autorite-fines-google-eu250-million>.
- Anuj Jain Kumar, Gunjan Jadiya and Hriday Chokshi, ‘Text and Data Mining- Decoding Copyright Challenges in India’ (Veritas Legal) <https://www.veritaslegal.in/legal-update-text-and-data-mining-decoding-copyright-challenges-in-india/#:~:text=Indian%20copyright%20law%20does%20not,under%20the%20law%20are%20limited.>.
- Arpit Gupta, Aman Taneja and Nedhaa Chaudhari, ‘Legality of Data Scraping in India’ (Ikigai Law, 2 July 2020) <https://www.ikigailaw.com/article/263/legality-of-data-scraping-in-india>.
- Sneha Jain and Akshat Agrawal, ‘AI and Copyright: Legal Perspectives on Transformative and Extractive Uses of Copyrighted Works’ (Medianama, 2 July 2024) <https://www.medianama.com/2024/07/223-ai-copyright-legal-perspectives-transformative-extractive-uses-copyrighted-works/>.
- Aklovya Panwar, ‘Generative AI and Copyright Issues Globally: ANI Media v OpenAI’ (Tech Policy Press, 8 January 2025) <https://www.techpolicy.press/generative-ai-and-copyright-issues-globally-ani-media-v-openai/>.
- Mira T Sundara Rajan, ‘Is Generative AI Fair Use of Copyright Works? NYT v. OpenAI’ (Kluwer Copyright Blog, 29 February 2024) <https://copyrightblog.kluweriplaw.com/2024/02/29/is-generative-ai-fair-use-of-copyright-works-nyt-v-openai/>.
- Vaishali Mittal, ‘ANI v OpenAI: A Copyright, AI training and false attribution dispute’ (Law Asia, 5 December 2024) <https://law.asia/ani-vs-openai-legal-case/>.
- Vallari Sanzgiri, ‘News Publishers Rally for Copyright Protection Against AI’s Rising Tide’ (Medianama, 8 February 2024) <https://www.medianama.com/2024/02/223-news-publishers-gen-ai-copyright-centre/>.
- Annapurna Roy, ‘Indian Publishers seek rules for Copyright Protection against Generative AI Models’ (Economic Times, 26 January 2024) <https://economictimes.indiatimes.com/tech/technology/indian-publishers-seek-rules-for-copyright-protection-against-generative-ai-models/articleshow/107154425.cms?from=mdr>.
- Wouter Van Wengen and Radboud Ribber, ‘EU AI Act’s Opt-Out Trend may Limit Data Use for Training AI Models’ (Greenberg Traurig, 3 July 2024) <https://www.gtlaw.com/en/insights/2024/7/eu-ai-acts-opt-out-trend-may-limit-data-use-for-training-ai-models>.
- Brian Porter and Edouard Machery, ‘AI-generated poetry is indistinguishable from human-written poetry and is rated more favorably’ (Nature, 14 November 2024) <https://www.nature.com/articles/s41598-024-76900-1>.