Perplexity AI Before the Courts: A Comparative Analysis of Recent Copyright Litigation in the United States and Italy

Dec 11, 2025
Updated: Dec 12, 2025

In the final months of 2025, Perplexity AI has become one of the most prominent defendants in the emerging field of artificial‑intelligence and copyright litigation. In just a few weeks, a series of actions has been filed in different fora against the company behind the Perplexity “answer engine” and its Comet browser. Encyclopaedia and dictionary publishers, newspaper groups, a social‑media platform and an Italian broadcasting and film conglomerate have all chosen to frame their disputes with Perplexity through the language of copyright, technological protection measures and, in some instances, trademark law.

Although these actions arise from diverse sectors and jurisdictions, they share a central narrative: Perplexity is alleged to have built and operated its services by relying, on a large scale and without adequate authorisation, on high‑value protected content produced and monetised by others. At the same time, the cases differ in their factual focus (training versus output), in the specific statutory grounds invoked, and in the economic interests each claimant seeks to defend. What follows is a detailed, case‑by‑case account of these proceedings, followed by a comparative synthesis of their convergences and divergences.

Encyclopaedia Britannica and Merriam‑Webster v. Perplexity AI (Southern District of New York)

The first major action in this sequence was brought in September 2025 by Encyclopaedia Britannica, Inc. and Merriam‑Webster, Inc. before the United States District Court for the Southern District of New York. The dispute centres on Perplexity’s use of reference material: encyclopaedic entries and dictionary definitions which the plaintiffs present as the product of long‑term, human‑verified editorial work.

According to the complaint and accompanying public statements, Perplexity’s answer engine reproduces, in response to user queries, content taken from Britannica and Merriam‑Webster “verbatim or near‑verbatim”. In the plaintiffs’ description, these responses are not mere high‑level summaries but extensive passages that closely track the wording of the original entries. The system is alleged to deliver this content without authorisation and without directing users back to the publishers’ websites, thereby depriving those sites of the traffic and advertising revenue they would otherwise receive.

This factual description is coupled with a trademark dimension. In the eyes of Britannica and Merriam‑Webster, Perplexity does not simply reuse their text; it also capitalises on their brands. The complaint alleges that the interface presents AI‑generated material alongside the names and marks of the two publishers in a manner capable of suggesting that the text originates from Britannica or Merriam‑Webster, or is endorsed by them. The allegation is that users are likely to understand such responses as “Britannica’s answer” or “Merriam‑Webster’s definition”, when in fact they are outputs of Perplexity’s generative system which may include inaccuracies or hallucinations.

On this basis, the action combines traditional copyright claims with causes of action under the Lanham Act (the principal U.S. federal trademark statute). On the copyright side, the plaintiffs invoke unauthorised reproduction and public display of their reference works, and ask for statutory and actual damages for copying which they allege to be systemic, not occasional. On the trademark side, they plead infringement and false designation of origin, arguing that Perplexity’s presentation of outputs under their marks creates confusion as to source and risks diluting the reputation for accuracy and reliability that these brands have built. Injunctive relief is sought both to stop further use of their content and to restrain Perplexity from presenting any material as if it came from the plaintiffs’ publications.

From a procedural standpoint, the case remains at a very early stage. The complaint was filed in early September 2025 and Perplexity has been called to respond, but no decision on the merits has yet been rendered. Public statements by Encyclopaedia Britannica’s leadership, including its chief executive, frame the lawsuit as a necessary step to protect its “trusted and human‑verified” data from unauthorised exploitation by AI services.

Reddit v. Perplexity AI and Scraping Intermediaries (Southern District of New York)

A different configuration emerges in the action filed in October 2025 by Reddit, Inc., again in the Southern District of New York, this time not only against Perplexity AI, Inc. but also against three companies that provide web‑scraping and proxy services (SerpApi LLC, Oxylabs UAB and AWMProxy). This case shifts attention away from curated reference texts and towards user‑generated content and the infrastructure of automated data collection.

Reddit’s complaint presents Perplexity as the ultimate beneficiary of what it calls “industrial‑scale” scraping of posts and comments from its platform. The factual narrative is that the scraping is carried out largely through Google search results and caches: by querying Google and collecting its results and cached pages, the defendants allegedly gain access to Reddit content without respecting Reddit’s own rate limits, bot‑detection systems and access conditions. The result, in Reddit’s account, is the construction of a very large training and retrieval corpus composed of Reddit users’ contributions, obtained outside any licensing agreement.

Particularly notable in this pleading is the emphasis on technological measures and their circumvention. Reddit claims that its anti‑scraping protections, together with Google’s, qualify as “effective technological measures” in the sense of section 1201 of the Digital Millennium Copyright Act (DMCA). Scraping via search results and caches is portrayed as a way to bypass these measures: instead of accessing Reddit directly under its terms, the defendants are said to exploit an indirect route through Google’s indexing infrastructure.

In doctrinal terms, this gives rise to a set of anti‑circumvention claims under the US Digital Millennium Copyright Act, which sit alongside causes of action based on breach of contract (violation of Reddit’s terms of use and robots.txt policies) and unjust enrichment. The claim is that Perplexity, as a customer of the scraping providers, enjoys the economic benefits of these practices: it can offer answers and summaries shaped by community discussions and niche expertise, without paying the platform that hosts those discussions. The remedies sought include damages and, crucially, injunctive orders aimed at preventing further scraping of Reddit content and further use of material obtained through the alleged circumvention.

Reddit also uses the complaint to situate this lawsuit within a broader strategy. The company has already entered into licensing agreements with some AI developers, and presents litigation as a mechanism to enforce the difference between licensed and unlicensed use. In public commentary around the action, the practice of large‑scale scraping for AI is compared to a form of “data laundering” in which valuable user‑generated content is extracted and transformed into training data with little regard for the originating communities.

Newspaper Actions: Chicago Tribune and The New York Times in New York

The early days of December 2025 saw two relevant actions by newspaper publishers, again in New York. Both pick up themes already visible in the Britannica and Reddit cases, but adapt them to the particularities of contemporary journalism and its business models, notably subscription‑based access and paywalled archives.

The first of these actions involves Tribune Publishing and MediaNews Group, which act, among others, on behalf of the Chicago Tribune. Their complaint, filed in a federal court in New York and reported as assigned to the Southern District, describes a pattern of conduct centred on the use of Chicago Tribune articles, including those available only to paying subscribers. According to the publishers, Perplexity’s answer engine and its Comet browser systematically ingest Tribune content, whether directly or through intermediaries, and then use it as retrieval‑augmented generation (RAG) material.

The alleged effect is that users can obtain detailed summaries and substantial portions of Tribune articles simply by querying Perplexity, without visiting the newspaper’s website. From the plaintiffs’ perspective, this effectively neutralises the economic role of the paywall: reporting that should be accessible only behind a subscription barrier becomes, in functional terms, accessible via the AI interface. The complaint reportedly includes examples of responses that mirror the wording of Tribune pieces closely enough to be considered near‑verbatim reproductions.

An additional element in this action concerns the pre-litigation exchanges between the parties. According to the complaint, Tribune’s lawyers contacted Perplexity in mid-October to ask whether its models were using Tribune content. Perplexity’s legal representatives allegedly replied that the company did not train its models with the Tribune’s work, but that its systems “may receive non-verbatim factual summaries” of that reporting. The complaint then seeks to show, by way of concrete examples, that the outputs delivered to users go significantly beyond such non-verbatim summaries and amount to verbatim or near-verbatim reproductions. Legally, the action is framed primarily as a copyright infringement case, focused on unauthorised reproduction and public display, with arguments about harmful substitution of the original service.

The lawsuit filed the next day by The New York Times Company combines similar concerns with an even broader description of the alleged copying. The Times contends that Perplexity has copied, distributed and displayed millions of its articles without permission, and that this has occurred both at the training stage and at the stage of generating answers for users. In addition, the complaint emphasises the paywalled nature of much NYT content: scraping and reusing that material is presented as a direct attack on the business model which supports the reporting.

The New York Times’ pleading is also notable for its treatment of trademarks and reputational harm. It does not merely complain that Perplexity reproduces its articles; it also alleges that Perplexity’s systems produce fabricated or inaccurate statements and present them next to the New York Times name and logo. In other words, hallucinated outputs are visually coupled with one of the most recognisable brands in news publishing. On this basis, the complaint asserts not only copyright infringement but also claims under the Lanham Act for trademark infringement and false designation of origin: the concern is that users may think they are reading NYT journalism when in reality they are reading AI‑generated content which may be erroneous.

Public reporting indicates that this lawsuit did not come out of the blue. The Times had previously sent a cease‑and‑desist letter to Perplexity in 2024, objecting to the unlicensed use of its content. According to the complaint, Perplexity did not modify its practices in response, which reinforces the characterisation of the alleged conduct as knowing and systematic rather than inadvertent. As in the other New York actions, the Times seeks both damages and an injunction requiring Perplexity to stop using its content and to remove such material from its systems.

Perplexity’s public reaction to these newspaper suits has been combative. Its representatives have compared the actions brought by publishers against AI firms to earlier waves of litigation against radio, television, the internet and social media, suggesting that attempts to block new technologies through lawsuits have historically failed and will continue to do so. The message is that the company views these claims as part of a familiar pattern, rather than as an existential legal threat.

RTI and Medusa v. Perplexity AI Before the Courts of Rome

While the New York proceedings unfold within a US statutory and doctrinal framework, the claim filed in Rome by RTI – Reti Televisive Italiane S.p.A. – and Medusa Film S.p.A. opens a European front. Both RTI and Medusa belong to the Mediaset group, and together they control a very large catalogue of television programmes and cinematographic works. Their lawsuit, announced publicly in early December 2025, is presented as the first Italian action specifically targeting alleged copyright violations committed through the training of generative AI systems.

From the information made public through Mediaset’s statements and press coverage, the factual core of the action is that Perplexity has used, without permission and on a large scale, audiovisual and cinematographic works from RTI’s and Medusa’s catalogues to train its generative AI systems. The plaintiffs argue that these works have been ingested and processed to improve the quality and breadth of Perplexity’s services, without any form of licence or compensation. Public commentary often links these allegations to Perplexity’s “Sonar” family of models, which the company presents as its main answer-engine technology, but the publicly available materials on the case do not themselves single out any specific model name as the object of the claim.

The focus here is very clearly on the training phase, not on specific output that reproduces scenes or dialogues from films or programmes. In this respect, the Italian action differs from the newspaper cases which rely heavily on examples of near‑verbatim reproduction in user‑facing answers. RTI and Medusa present the mere act of using their broadcasts and films as training data, in the absence of authorisation, as a violation of their exclusive rights under Italian law.

Doctrinally, the case rests on the structure of Italian copyright law as harmonised by EU directives, including the DSM Directive. RTI and Medusa invoke their rights as producers and broadcasters over the audiovisual works in question and portray AI training as a form of exploitation that cannot be justified by any exception. In the European context, the question inevitably arises whether and how the text‑and‑data‑mining exceptions introduced by the DSM Directive might apply. Commentary suggests that the claimants are likely to argue either that they have effectively opted out of any such exception by appropriate machine‑readable reservations, or that training of this breadth and scale falls outside the scope of what those provisions were designed to permit.

The remedies requested, as described in public sources, combine several elements. RTI and Medusa seek a judicial declaration that Perplexity’s conduct is unlawful and a corresponding order requiring the company to cease any further use of their content for both training and operation of its services. They also claim damages for the economic harm suffered and, according to some reports, ask the court to impose a recurring financial penalty for continued violations after the judgment, thereby strengthening the deterrent effect of the order. Mediaset has framed the action as a test case for the Italian market and, more generally, as a signal that European audiovisual groups do not intend to tolerate unlicensed use of their catalogues as fuel for AI.

At the time of writing, the case before the Tribunale civile di Roma is still at an early procedural stage. The ricorso has been filed and entered on the court’s docket, and publicly available sources focus primarily on the filing and on the remedies requested. No interim measures or judicial assessments on the merits have yet been reported.

Common Themes Across the Perplexity Litigation Wave

If one steps back from the details of each individual case, several common threads become visible. The first and most obvious is the narrative of “free riding”. Each claimant, in its own language, presents Perplexity as a service that extracts value from costly inputs without bearing the corresponding costs of production or licensing. Encyclopaedia Britannica and Merriam‑Webster stress the labour‑intensive nature of their reference works, which Perplexity is said to appropriate to make its answers more authoritative. Newspaper publishers underline how fragile subscription and advertising models become when paywalled reporting can be reconstructed through AI interfaces that sit outside the paywall altogether. Reddit insists that community discussions and user contributions are not just ambient internet chatter but assets that it has begun to license to some AI developers on an exclusive or semi‑exclusive basis. RTI and Medusa point to the capital‑intensive nature of audiovisual production and broadcasting, stressing that their catalogues are the result of substantial investment and ongoing rights management.

In this sense, copyright functions not only as the substantive legal basis for the claims but also as a bargaining tool. Litigation is being used to push AI developers towards licence‑based relationships. The implicit message is that generative AI systems can draw on high‑quality content only if the owners of that content are brought into the value chain, either through direct licences or through broader industry agreements.

A second theme concerns the relationship between scraping, paywalls and technological protection measures. Reddit’s lawsuit foregrounds this issue most explicitly by invoking the anti‑circumvention provisions of the DMCA and arguing that the defendants have designed their reliance on Google search and cache infrastructure precisely to avoid the constraints of Reddit’s own access controls. The Chicago Tribune and New York Times actions, while framed primarily in terms of copyright infringement, also revolve around the idea that paywalls and subscription mechanisms are being rendered ineffective: detailed reporting that should be accessible only to subscribers becomes, in practice, accessible via Perplexity’s products.

In the Italian context, the issue of “technological measures” takes a different doctrinal shape. Here, the focus is not on DMCA anti‑circumvention but on the architecture of the DSM Directive’s text‑and‑data‑mining provisions, which require rightholders to express any opt‑out in machine‑readable form. RTI and Medusa’s case is thus read, in commentary, as a test of how those provisions will be interpreted when the mining at stake is not a one‑off academic project but the continuous training of commercial AI models on large audiovisual archives.
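What a “machine‑readable” reservation can look like in practice may be easier to see with a small illustration. The sketch below uses Python’s standard `urllib.robotparser` module to read a robots.txt file and check whether a given crawler is permitted to fetch a page. Everything specific in it is an assumption made for illustration: the crawler name `ExampleAIBot`, the `example.com` URLs and the rules themselves are invented, and nothing in the public materials on the RTI and Medusa case indicates which mechanism, if any, the claimants rely on. Whether a robots.txt entry is legally sufficient as an opt‑out is also not something this sketch can settle.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a rightholder might publish.
# The crawler name "ExampleAIBot" and the paths are invented for illustration.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /archive/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named AI crawler is excluded from the whole site.
    print(is_allowed("ExampleAIBot", "https://example.com/news/story.html"))
    # Other crawlers may read ordinary pages...
    print(is_allowed("OtherBot", "https://example.com/news/story.html"))
    # ...but not the archive section.
    print(is_allowed("OtherBot", "https://example.com/archive/film.html"))
```

The point of the illustration is only that such a reservation is a declaration which a well‑behaved crawler can read and honour automatically. It does not technically prevent access, which is one reason the US actions rely on separate anti‑circumvention and contract theories.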

A third recurring element is the concern with trademarks, false attribution and reputational harm. Britannica and Merriam‑Webster argue that the presentation of AI‑generated responses under their marks threatens to erode the trust they have built as authoritative sources. The New York Times similarly contends that hallucinated statements displayed next to its name and logo may mislead users and undermine its reputation for accuracy. In these instances, trademark and unfair‑competition law are used to capture a dimension of harm that pure copyright doctrine does not fully address: the risk that AI systems, by blending scraped content with their own generated text, will blur the distinction between vetted journalism or reference material and unverified machine output.

Despite these commonalities, there are also important differences between the cases, particularly with respect to the stage of the AI lifecycle on which they focus. Some actions, notably those brought by Britannica, the Chicago Tribune and the New York Times, concentrate heavily on the output stage. They compile examples of responses and pages where Perplexity appears to reproduce substantial portions of the claimants’ texts, treating the visibility of those outputs as the primary locus of infringement. Others, such as the Reddit and RTI/Medusa cases, direct their attention primarily to the training and data‑acquisition stage, suggesting that the core legal and economic problem lies in the construction of the training corpus itself, regardless of whether any given user‑facing output reproduces a specific underlying work.

The statutory frameworks invoked also vary. In the United States, the litigation unfolds at the intersection of the Copyright Act, the DMCA and the Lanham Act, supplemented by state‑law doctrines of unfair competition and unjust enrichment. In Italy, the relevant instruments are national copyright rules as harmonised by EU directives, the DSM Directive’s text‑and‑data‑mining provisions and the general rules on unfair competition in the Italian Civil Code. These differences will shape not only the arguments available to the parties but also the types of remedies realistically obtainable, especially in relation to training datasets that may already be deeply interwoven into model parameters.

Finally, the variety of plaintiffs involved in the Perplexity litigation – from reference publishers to newspapers, from a social‑media platform to audiovisual producers – illustrates the breadth of sectors that now perceive their catalogues as “AI‑relevant” and in need of active legal protection. Each of these actors is seeking, through litigation, to assert control over whether and how their content can be used as input for generative systems and retrieval‑augmented engines. The cluster of proceedings against Perplexity offers a particularly concentrated view of this broader phenomenon, in which courts are being asked to delineate, within existing legal frameworks, the boundaries of permissible data use for AI development and deployment.
