In December 2024, the Centre for Information Policy Leadership (“CIPL”) at Hunton Andrews Kurth published a discussion paper titled, “Applying Data Protection Principles to Generative AI: Practical Approaches for Organizations and Regulators” (“Discussion Paper”).
The Discussion Paper considers the following privacy and data protection concepts and explores how they can be effectively applied to the development and deployment of generative AI (“genAI”) models and systems:
- fairness;
- collection limitation;
- purpose specification;
- use limitation;
- individual rights;
- transparency;
- organizational accountability; and
- cross-border data transfers.
CIPL presents the following recommendations, discussed in greater detail in the Discussion Paper, to organizations and regulators:
- To enable beneficial development and use of AI technologies in the modern information age, laws and regulatory guidance should facilitate lawful mechanisms for the use of personal data in model training. Lawmakers and regulators should avoid legal interpretations that are unduly restrictive regarding the use of personal data in AI model training, development and deployment.
- Different data privacy rules, considerations and mitigations apply in different phases of the AI lifecycle—data collection, model training, fine-tuning and deployment. Regulators and organizations should interpret data protection principles separately in the context of each relevant phase of the AI lifecycle.
- Organizations should be able to rely on the “legitimate interests” legal basis for processing publicly available personal data collected through web scraping and personal data that they already have in their possession and control (first-party data) for genAI model training, as long as the interest concerned (which could be the controller’s, users’ or society’s at large) is not outweighed by the fundamental rights of individuals and appropriate, risk-based mitigation measures are put in place.
- Laws and regulatory guidance should be drafted or interpreted to recognize and enable the processing and retention of sensitive personal data for AI model training, as this is necessary to avoid algorithmic bias or discrimination and ensure content safety. In addition, sensitive personal data may be necessary for the training and development of certain AI systems whose sole purpose is based on the processing of sensitive personal data or to deliver benefits to protected categories of individuals (such as accessibility tools, or health systems).
- Developers should explore opportunities to employ privacy-enhancing and privacy-preserving technologies (“PETs/PPTs”), such as synthetic data and differential privacy. This would enable genAI models to have the rich datasets they need during training while reducing the risks associated with the use of personal data. Laws and regulatory guidance should encourage the use of and acknowledge the need for continued research and investment in PETs/PPTs.
- The fairness principle is useful in the genAI context and should be interpreted to facilitate personal data processing in genAI model development to train accurate and accessible models that do not unjustly discriminate. Fairness considerations should also account for the impact on individuals or society of not developing a particular AI application.
- Data minimization should be understood contextually as limiting the collection and use of data to what is necessary for the intended purpose (e.g., model training, model fine-tuning, or model deployment for a particular purpose). Data minimization should not stand in the way of enabling the collection and use of data that is necessary and appropriate for achieving a robust and high-quality genAI model. As such, this principle does not prohibit or conflict with the collection and use of large volumes of data.
- Training general-purpose AI models should be recognized as a legitimate and permissible purpose in itself, so long as appropriate accountability measures and safeguards are reasonably and sufficiently implemented.
- Purpose or use limitation principles should be sufficiently flexible. In the context of genAI, purpose limitation principles in laws and regulations should allow organizations to articulate data processing purposes that are sufficiently flexible for the range of potentially useful applications for which genAI models may be used. Furthermore, processing personal data for the development of a genAI model should be treated as a separate purpose from processing personal data for the development, deployment or improvement of a specific application that uses a genAI model.
- The responsibility to inform individuals about the use of their data should fall to the entity closest to the individual from whom the data is collected. Where data is not collected directly from individuals, organizations should be able to fulfill transparency requirements through public disclosures or other informational resources.
- Where appropriate and practicable, individuals should be able to request that their input prompts and model output responses not be included in genAI model fine-tuning, especially if such prompts include personal or sensitive data.
- Transparency in the context of genAI models should be contextually appropriate and meaningful, while also fulfilling transparency requirements under applicable laws and regulations. Transparency should not come at the expense of other important factors, such as usability, functionality, and security, or create additional burdens for users. Organizations should also consider transparency in the wider sense, beyond individuals and users—to regulators, auditors and red-team experts.
- Lawmakers and regulators should consult with developers and deployers of genAI systems to clarify the distinctions in duties and responsibilities across the phases of genAI development.
- Organizations developing and deploying genAI models and systems must invest in comprehensive and risk-based AI and data privacy programs, continually improving and evolving their controls and best practices. Lawmakers and regulators should encourage and reward organizational accountability in development and deployment of AI, including the existence and demonstration of AI and data privacy management programs.