Modern artificial intelligence (AI) models typically require data for training and fine-tuning. An AI can use data, including personal information, to recognize patterns and predict results.
Companies that use personal information to train an AI may act as either a controller or a processor. Which role a company plays depends on how much discretion it exercises in deciding how the AI will function, what type of personal information will be used to train the AI, how the AI will be allowed to process the training data, and the conditions under which the AI will be allowed to retain or share that data.
If a company is considered a controller, it must satisfy the following requirements under the GDPR with respect to training data:
| GDPR Requirement | GDPR Citation | Impact on Controller’s Use of Training Data |
|---|---|---|
| 1. Lawful basis of processing | Art. 6 | Controllers are required to identify one of the six lawful bases for processing.[1] Some supervisory authorities have suggested that if a company uses publicly sourced data to train an AI (e.g., data scraped from the internet), the only plausible lawful bases would be either (1) the consent of the individuals whose personal information is being used or (2) the legitimate interest of the controller.[2] |
| 2. Record of processing activities | Art. 30(1) | Controllers are required to document in their records of processing activities, among other things, the categories of personal information used to train an AI, the categories of individuals to whom that personal information relates, the purposes for which the data was used, and any restrictions imposed on the AI’s use or retention of that data. |
| 3. Data minimization and storage limitation | Art. 5(1)(c), (e) | Controllers are required to limit the personal information they use to what is necessary and to keep it in identifiable form for no longer than necessary. In the context of training an AI, a controller should consider how to minimize the type and amount of data used to train the AI, as well as the length of time for which the AI will have access to that data. |
| 4. Privacy notice | Arts. 12–14 | Controllers are required to provide individuals with information about how their personal information is processed.[3] Some supervisory authorities have specifically taken the position that companies that use personal information to train an AI must draft and publish a privacy notice that provides “data subjects whose data have been collected and processed for the purposes of training algorithms . . . with information on how the processing is carried out, the logic underlying the processing . . . , [and] the rights to which they are entitled.”[4] If a controller is using publicly sourced data (e.g., data scraped from the internet), some supervisory authorities have suggested that it may be appropriate for the controller to inform the public through mass media (e.g., radio, television, newspapers) about the scraping and about where to find the company’s privacy notice.[5] |
| 5. Access rights | Art. 15 | Controllers are required to permit individuals to access the personal information held about them. In the context of training an AI, a controller should be prepared to respond to an individual’s request for access to personal information about them that may have been used in training the AI. |
| 6. Correction rights | Art. 16 | Controllers are required to permit individuals to request that inaccurate personal information about them be corrected. In the context of training an AI, some supervisory authorities have taken the position that companies that use publicly sourced data (e.g., data scraped from the internet) should create an online tool “by which to request and obtain rectification of any personal data relating to them,” covering both data used to train an AI and any data generated by the AI.[6] |
| 7. Erasure rights | Art. 17 | Controllers are required to permit individuals to request the deletion of personal information about them where one of the grounds in Article 17(1) applies, for example, where the personal information is no longer necessary in relation to the purposes for which it was collected. In the context of training an AI, a controller that receives a deletion request should consider whether the requester’s personal information can be deleted from the training set.[7] |
| 8. Right to withdraw consent/object | Arts. 7(3), 21 | If a controller relies on individuals’ consent as the lawful basis for its use of training data, the GDPR requires that it allow individuals to withdraw that consent. Similarly, if a controller relies on its legitimate interest, the GDPR requires that it allow individuals to object to the continued use of their data.[8] |
| 9. Data protection impact assessments | Art. 35 | The GDPR requires controllers to conduct a data protection impact assessment (DPIA) where a type of processing, in particular processing using new technologies, is “likely to result in a high risk” to the rights and freedoms of individuals. As a result, a controller should consider whether to conduct a DPIA before using personal information to train an AI. |
| 10. Cross-border data transfers | Arts. 44–50 | To the extent that personal information will be sent to an AI hosted outside the European Economic Area, a controller may need to take steps to ensure that the data is adequately protected in the destination jurisdiction. |
| 11. Vendor management | Art. 28 | To the extent that a controller relies on a third party to host personal information used to train an AI (e.g., a third-party hosted AI product), the GDPR may require the controller to bind that third party to the specific contract provisions that the GDPR requires of processors. |
[1] EDPB-EDPS Joint Opinion 5/2021 on the proposal for a Regulation of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (Artificial Intelligence Act) at para. 60 (June 18, 2021).
[2] Garante Per La Protezione Dei Dati Personali, Provision of April 11, 2023 [9874702] (English translation).
[3] EDPB-EDPS Joint Opinion 5/2021 on the proposal for a Regulation of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (Artificial Intelligence Act) at para. 60 (June 18, 2021) (stating that data subjects should be informed when their data is used for AI training).
[4] Garante Per La Protezione Dei Dati Personali, Provision of April 11, 2023 [9874702] (English translation).
[5] Garante Per La Protezione Dei Dati Personali, Provision of April 11, 2023 [9874702] (English translation).
[6] Garante Per La Protezione Dei Dati Personali, Provision of April 11, 2023 [9874702] (English translation).
[7] See EDPB-EDPS Joint Opinion 5/2021 on the proposal for a Regulation of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (Artificial Intelligence Act) at para. 60 (June 18, 2021) (stating that data subjects have a right to deletion/erasure in connection with their personal data being used to train an AI).
[8] See EDPB-EDPS Joint Opinion 5/2021 on the proposal for a Regulation of the European Parliament and of the Council laying down harmonized rules on artificial intelligence (Artificial Intelligence Act) at para. 60 (June 18, 2021) (stating that data subjects have a right to restriction in connection with their personal data being used to train an AI).