Japan Issues Clarification on AI, Copyright, and Machine Learning

Japan’s New Draft Guidelines on AI and Copyright: Is It Really OK to Train AI Using Pirated Materials?

by: Scott A. Warren, Joseph Grasser of Squire Patton Boggs (US) LLP - Privacy World

Tuesday, March 12, 2024

On January 23, 2024, the Japan Agency for Cultural Affairs (ACA) released its draft “Approach to AI and Copyright” for public comment, to clarify how ingestion and output of copyrighted materials in Japan should be considered. On February 29, 2024, after considering nearly 25,000 comments, additional changes were made. This document, created by an ACA committee, will likely be adopted by the ACA in the next few weeks. This article provides a summary of the key points of the draft itself and as modified.

By way of background, on January 1, 2019, Japan’s revised Copyright Act came into effect, making Japan one of the world’s most AI-friendly countries. Article 30-4 was implemented, allowing broad rights to ingest and use copyrighted works for any type of information analysis, including for the purpose of training AI models.

Unlike the UK and the EU, which allow the ingestion of copyrighted works only for non-commercial purposes, Japan also allows it for commercial use and for acts other than reproduction, apparently including the ingestion of illegally obtained content, such as pirated copyright material. According to reports, in a committee meeting, Japan’s Minister of Education, Culture, Sports, Science and Technology, Keiko Nagaoka, indicated that AI companies in Japan can use “whatever they want” for AI training “regardless of whether it is for non-profit or commercial purposes, whether it is an act other than reproduction, or whether it is content obtained from illegal sites or otherwise.” This position led to Japan being called a “machine learning paradise.” The only exceptions imposed under the law are when such ingestion is for the “enjoyment” of the thoughts or sentiments expressed in a work, or when it “unjustly harms” the interests of the copyright holder. The law, however, provides little detail as to how these exceptions are to be interpreted by courts.

Why Japan Passed the 2019 Provisions and Why the Clarification Now

So, why would Japan, generally a legally conservative country, implement the 2019 provisions, which allow for far broader uses of copyrighted materials in “information analysis” than the provisions of many other countries? Some suggest it is because AI is generally seen in Japan as a potential solution to the challenges of a rapidly aging population. In addition, there are currently no real local Japanese AI providers, so to quickly develop AI capabilities, Japan implemented a flexible AI approach. Furthermore, Japan has long been a bastion of robot development, and AI can be critical to improving robots’ functionality.

However, especially given the dramatic improvement in the output of new generative AI tools like ChatGPT, there has been increasing pushback by Japan-based content creators, such as the developers of manga (comics) and anime (animation), movies/TV shows and music, as well as a broad range of international content creators, who voiced concern over the lack of protection they were receiving under the law. For this reason, the ACA assembled a committee to address the concerns, clarify the limitations of the law and highlight the areas that have yet to be determined. After six months of study, the committee released the January 23 draft “Approach to AI and Copyright Report,” which is now expected to be formalized in the next few weeks.

What Is New in the “Approach to AI and Copyright”?

After exhaustively reviewing the rationale for implementing Article 30-4 and summarizing the concerns of AI developers, users and copyright holders, the ACA sets out its position on the main issues.

Use in the Learning/Development Stage

The committee essentially embraced Article 30-4, which allows the ingestion and analysis of copyrighted materials for AI learning to promote creative innovations in AI. It removes the need to acquire consent from copyright holders, as long as the use would not have a “material impact on the relevant markets” and the AI usage does not “violate the interests of the copyright holders.”

The Enjoyment Prong

In teasing out this apparent dichotomy, the committee first focuses on distinguishing between digesting copyrighted content for “information analysis” (which is allowed) versus the use for a “purpose of enjoyment of the thoughts or sentiments expressed in the copyrighted work” (which is not allowed). The committee’s report states that “enjoyment” consists of satisfying the intellectual and mental needs of the individual through viewing or experiencing the works. By contrast, information analysis might include a scholarly assessment of movie themes across various genres, while “enjoyment” would likely include the ability to play and view all or a significant part of the individual movie.

The legality of the ingestion of copyrighted material thus essentially appears to be tied to its intended generative AI output. If the AI creator uses the copyrighted material solely for training the database and doing pure data analytics, Article 30-4 allows such use without consent of the copyright owner. However, where the AI creator has a joint purpose of non-enjoyment (such as the above pure data analysis) and enjoyment (such as outputting creative expressions of creative works), such use without the consent of the copyright holder is not allowed under Article 30-4.

Accordingly, the focus is that ingestion of copyrighted material is prohibited if the intention is to output products that can be perceived as creative expressions of copyrighted works, including mimicking the style of specific creators. The committee notes that an AI service provider could also help remove this concern by taking “practical and technological measures” to prevent the generation of infringing works.

The committee notes that a single or infrequent copyright violation by a generative AI output does not by itself establish that the intended use is for enjoyment, but concedes that frequent violations might. However, it also states that if such frequent cases are due to the AI user’s input/instructions requesting such material, it cannot be presumed the AI service provider intended a use for enjoyment.

The Unjust Harm Prong

The committee then analyzed what it means to unjustly harm the interest of the copyright holder, finding this occurs when the AI output “conflicts with the copyright holder’s market or inhibits potential marketing channels for such copyrighted works in the future.” The committee explored various aspects of databases used for data analysis, generally finding that such use is allowed under Japan’s Copyright Act, but noted there may be exceptions, such as where the database owner applies technical protection measures to prevent AI crawlers from ingesting their work. Such exceptions would need to be determined on a case-by-case basis.
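
As one illustration of how a crawler might respect a site owner's opt-out before collecting a page, the following is a minimal Python sketch that checks a robots.txt file. Treating robots.txt as the relevant measure is an assumption made here for illustration; the article does not name a specific mechanism. The crawler name ExampleAIBot, the sample rules and the URL are all hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a publisher might serve to opt out of one AI crawler
# while leaving other crawlers unrestricted.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow:
"""


def may_collect(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://publisher.example/manga/chapter-1"
    print(may_collect(SAMPLE_ROBOTS_TXT, "ExampleAIBot", page))
    print(may_collect(SAMPLE_ROBOTS_TXT, "OtherBot", page))
```

Whether honoring or ignoring such a signal has legal consequences under the Copyright Act is, as the committee says, a case-by-case question; the sketch only shows the mechanical check.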

Besides encouraging copyright owners to implement technical protection measures, the committee noted some newspaper publishers and other copyright owners are licensing their content for AI ingestion. Such steps can help resolve issues of copyright infringement and show the copyright holder has a market for such copyrighted works, which is an important element of establishing the “unjust harm” element. For example, if other AI developers/service providers fail to take a license when one is available, it is easy to prove the direct impact on the copyright holder. The committee notes that the merits of any copyright violation claim will need to be determined on a case-by-case basis.

As for AI training from pirated copyright content, the committee noted the difficulty of the AI developer/service provider knowing whether the content its AI crawlers ingest is pirated or legitimate. In the end, the committee found that if the AI developer/service provider knew (or should have known) that it had ingested pirated/infringing materials, that is one factor in considering it a contributory infringer, increasing the likelihood of liability. If the AI provider knew or should have known that it ingested pirated materials, it should take steps to prohibit infringing copyright output, which could assist in defeating a claim for contributory infringement.

The committee further encourages the ACA to continue to engage in countermeasures against piracy, perhaps without recognizing that many of these pirate sites are abroad, making it more difficult to prohibit access to the sites in Japan. In the revisions made in response to the public comments, the committee adds, “it is desirable to realize a state in which piracy is not encouraged, such as by enabling right holders to provide information on known websites carrying pirated copies to these operators and other related parties in advance to an appropriate extent, so that operators can recognize websites carrying pirated copies and exclude these websites from the collection of learning data.”
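
The exclusion step described in the quoted revision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the idea that the list arrives as a set of domain names, and the example domains are all hypothetical and do not come from the ACA document.

```python
from urllib.parse import urlsplit


def is_blocked(url: str, blocked_domains: set[str]) -> bool:
    """Return True if the URL's host is a blocked domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    return any(
        host == d.lower() or host.endswith("." + d.lower())
        for d in blocked_domains
    )


def filter_training_urls(urls: list[str], blocked_domains: set[str]) -> list[str]:
    """Keep only URLs whose host is not on the rights-holder-supplied list."""
    return [u for u in urls if not is_blocked(u, blocked_domains)]


if __name__ == "__main__":
    # Hypothetical list of known pirate sites provided in advance by rights holders.
    reported = {"pirate.example"}
    candidates = [
        "https://pirate.example/scan/1",
        "https://cdn.pirate.example/x.png",
        "https://licensed.example/a",
    ]
    print(filter_training_urls(candidates, reported))
```

A list-based filter like this only helps with sites that rights holders have already identified, which is why the committee frames it as something operators and rights holders would need to coordinate on in advance.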

The committee notes the removal of a copyrighted work from the AI database may be ordered where there is a “high probability” that an infringing reproduction will occur in the future. However, damage claims would only be allowed where there is willful intent or negligence. Further, criminal liability should only be considered where there is willful intent.

Use in the Generation/Utilization Stages

Turning to the output phase, in order to determine whether a copyright infringement occurs, it is necessary to prove both “similarity” and a “reliance” on an existing copyrighted work. “Similarity” may be found even if only parts of the work are essential features of the output. “Reliance” is shown when the creator of the new work is aware of the prior work.

Accordingly, when the AI user requests the AI to generate a specific copyrighted work, such as by referencing it in a prompt, reliance is established, and the AI user may be found to be the infringer. However, if the AI user is not aware of an existing copyrighted work, and an infringing output is generated, then the AI developer/service provider may be found a contributory infringer, especially if there are frequent infringements by the tool. To avoid this, AI developers/service providers are encouraged to implement technical measures to ensure that ingested copyright content is not generated. The committee also suggests measures to prevent AI users from requesting infringing material.

The Copyrightability of the AI Output

Under Japan’s Copyright Act, Article 2(1), a copyright-protected work is defined as a creation expressing “human thoughts and emotions.” Thus, it appears difficult for AI to become the author of its own creations under the current law. However, the committee considered that a joint work, with human input and AI-generated content, may be eligible for copyright protection as a whole, based on certain factors. These include:

The amount and content of the instructions and input prompts by the AI user
The number of generation attempts the AI user makes while modifying the output toward a desired result
The AI user selecting the work from multiple generated works
The subsequent human modifications to the AI generated work

Differences from US, UK and EU Laws on AI Training

The US has no specific law regulating the use of copyrighted material for training of AI models, either for commercial or non-commercial purposes. Rather, in the US, the issue is currently being litigated in a number of lawsuits that pit content creators and copyright holders against the creators and operators of generative AI tools. While there are a number of novel issues raised in the various cases, the most critical issues will be determining (1) whether the collection and use of copyrighted materials to develop the generative AI tools was a “fair use” under the US Copyright Act, and (2) whether (a) certain outputs, generated by third-party input, are substantially similar to a registered copyrighted work such that they constitute copyright infringement and (b) the AI creator/operator is liable even though it did not input the information that generated the purportedly infringing output.

In both the EU and the UK, the laws allow for the harvesting/scraping of data for purposes of training an AI model only for noncommercial use, in general. Some jurisdictions recognize the right to use it for commercial purposes in very specific instances. For example, the German Copyright Act (Sections 44b and 60d) allows the reproduction of lawfully accessible works in order to carry out text and data mining even in commercial situations in the absence of a reservation of such use by the right owner or for scientific research purposes. Our colleagues Sandra Mueller and Julia Jacobson have noted a recent case in the Hamburg District Court brought by photographer Robert Kneschke against LAION e.V. regarding the use of his photograph in training AI image generators and claiming copyright infringement. According to the photographer’s legal representatives, one of the main claims the defendant relies upon relates to the above provisions. The oral hearing in this case is reportedly scheduled for 25 April 2024.

As reported by our colleague there, Charmian Aw, Singapore has released its proposed Model AI Governance for Generative AI guidelines, which identify data as a core element of model development that significantly affects the quality of the model output. This makes it necessary to ensure data quality, such as through the use of trusted data sources. Where the use of data for model training is potentially contentious, such as personal data and copyright material, the guidelines state it is “important to give business clarity, ensure fair treatment, and to do so in a pragmatic way.”

As to the copyright issues, the Singapore guidelines suggest doing so through (1) creating an open dialogue with all stakeholders, (2) encouraging AI developers to “undertake data quality control measures” using “data analysis tools to facilitate data cleaning,” (3) more globally expanding the available pool of trusted data sets and (4) governments “working with their local communities to curate a repository of representative training data sets for their specific context (e.g. in low resource languages).”

Conclusions

The ACA committee concludes that the relationship between AI and copyright will need to be determined on a case-by-case basis, including precedents and judicial decisions against the backdrop of the unpredictable and fast-moving development of technology, and the progress of studies in other countries. For this report, the committee focused on the exceptions for acquiring consent of the copyright owner under Article 30-4 of the Copyright Act. In the future, the committee indicates that it will need to consider the impact of an author’s moral rights and neighboring rights that are personal to the author and protected under Japanese law.

What can be gleaned from this detailed study is that Japan wants to aggressively support the development of AI tools and the market for AI solutions by allowing broad rights to ingest copyrighted works. However, it is struggling to do so in a way that properly rewards content-holders for their creations. As AI tools continue to improve, this will certainly be an interesting effort to watch, especially against the backdrop of the various approaches to the issue by the US, the EU, the UK and other countries across the globe.