Many companies are sitting on a trove of customer data and are realizing that this data can be valuable for training AI models. What some companies have not thought through, however, is whether they can actually use that data for this purpose. Often the data was collected over many years, long before the company contemplated using it to train AI. The potential problem is that the privacy policies in effect when the data was collected may not have contemplated this use. Using customer data in a manner that exceeds, or is otherwise not permitted by, the privacy policy in effect at the time the data was collected can be problematic. It has led to class action lawsuits and FTC enforcement actions. In some cases, the FTC has imposed a remedy known as “algorithmic disgorgement” on companies that used data to train AI models without proper authorization. This remedy is severe: it requires deletion not only of the improperly used data, but also of the models and algorithms built with it. That can be an incredibly costly result.
The rapid growth of generative AI has led to a flurry of activity, including the training of AI models on many types of content. Whether you are training models on content you already possess or on content you are newly acquiring, it is important to ensure you have the right to use that content for the intended purposes. The issues in each situation are fact dependent, including the nature of the content, how it was obtained, any agreements or policies relevant to such use, and what the AI tool will be used for. With AI-based medical tools, for example, additional regulatory issues may be relevant; see ChatGPT And Healthcare Privacy Risks. Similarly, if you are dealing with the government, other considerations may apply; see ChatUSG: What government contractors need to know about AI. Training AI models for use in other regulated industries may implicate still other considerations.