For AI companies in the health care space, data is everything. It fuels model performance, drives product differentiation, and can make or break scalability. Yet too often, data rights are vaguely defined or completely overlooked in commercial agreements. That is a critical mistake.
Whether you are contracting with a health system, integrating into a digital health platform, or partnering with an enterprise vendor, your data strategy needs to be reflected clearly and precisely in your contracts. If it is not, you may find yourself locked out of the very assets you need to grow — or worse, liable for regulatory violations you never anticipated.
This article outlines three areas where we see health AI companies exposed to the most risk: training rights, revocation and retention terms, and shared liability. These are not just technical contract points. They are foundational to your valuation, your compliance posture, and your long-term defensibility.
1. Training Rights: Define Them with Precision
Most health AI vendors want some right to use client data to improve or train their models. That is expected. The problem is that many agreements use imprecise language like “improving services” or “analytics purposes” to describe what the vendor can actually do with the data. In the health care context, this can be legally problematic.
Under HIPAA, a covered entity's use or disclosure of protected health information (PHI) for purposes beyond treatment, payment, or health care operations generally requires patient authorization. As a result, use cases like model training or product improvement require a strong case that the activity qualifies as the covered entity's health care operations; otherwise, patient authorization will be required. Even so-called "anonymized" or "de-identified" data is not always safe if it has not been properly and fully de-identified under HIPAA's de-identification standard.
If your business model relies on using client data for generalized model training, you need to be explicit about that in your contract. This includes clarifying whether the data is identifiable or de-identified, what de-identification method is being used, and whether the outputs of that training are limited to the specific client’s model or can be used across your entire platform.
On the flip side, if you are offering a model-as-a-service that is customized to each client and does not rely on pooled training data, you should say that clearly and ensure the contract supports that structure. Otherwise, you risk a dispute down the road about how data can be used or whether outputs were improperly shared across clients.
It is also critical to align this provision with your business associate agreement (BAA) if you are processing PHI. A mismatch between the commercial terms and the BAA can create compliance issues and raise red flags, especially during diligence. Remember: if your commercial agreement states that you may de-identify data but your BAA prohibits de-identification, the BAA will likely govern, because typical conflict provisions give the BAA precedence whenever PHI is at issue.
2. Revocation and Retention: Address What Happens When the Contract Ends
Too many AI contracts are silent on what happens to data, models, and outputs after a contract is terminated. That silence creates risk on both sides.
From a client’s perspective, allowing a vendor to continue using the client’s data after termination to train future models can be problematic under HIPAA and even feel like a breach of trust. From the vendor’s perspective, losing rights to previously accessed data or trained outputs can disrupt product continuity or future sales.
The key is to define whether rights to use data or model outputs survive termination, and under what conditions. If you want to retain the ability to use data or trained models post-termination, that should be an express, bargained-for right. If not, you must be prepared to unwind that access, destroy any retained data, and potentially retrain models from scratch.
This is particularly important when derivative models are involved. A common trap is allowing the vendor to claim that once data is used to train a model, the model is no longer tied to the underlying data. Courts and regulators may not agree if that model continues to reflect sensitive patient information.
At minimum, your agreement should answer three questions:
- Can the vendor retain and continue to use data accessed during the contract?
- Do any rights to trained models or outputs survive termination?
- Is there an obligation to destroy, de-identify, or return data after the relationship ends?
For higher-value relationships, consider negotiating a license that survives termination, coupled with appropriate compensation and confidentiality obligations. In the health care context, this post-termination license will generally involve only de-identified data. Otherwise, build in a data return or destruction clause that outlines how and when data must be returned or destroyed.
3. Shared Liability: Clarify Responsibility for Downstream Harms
One of the most overlooked issues in health AI contracting is liability allocation. AI-generated recommendations can influence medical decisions, billing practices, and patient communications. When something goes wrong, the question becomes: who is responsible?
AI vendors typically position their products as mere tools for health care providers and try to disclaim liability for any downstream use. Health care providers, on the other hand, increasingly expect vendors to stand behind their products. This is especially true if those products generate clinical notes, diagnostic suggestions, or other regulated outputs.
The reality is that both sides carry risk, and your contract needs to reflect that. Disclaimers are important, but they do not replace a thoughtful risk allocation strategy.
First, require clients to represent that they have obtained the authorizations or consents required under applicable law, including HIPAA, the FTC Act, and state privacy laws, for all of the data processing the agreement contemplates. Your use of client data, whether for training or otherwise, should be clearly articulated in the agreement. This helps protect you if a client later claims it did not know how its data was used, or that the data was used improperly.
Second, consider indemnity provisions tied to misuse of third-party intellectual property, violation of patient privacy, or regulatory enforcement actions. For example, if your model relies on PHI that was not properly authorized or de-identified under HIPAA, and that triggers an investigation by the Office for Civil Rights, you may be on the hook.
Third, be thoughtful about limitations on liability. Many Software as a Service (SaaS) agreements use a standard cap tied to fees paid in the prior 12 months. That may be insufficient in the health AI context, especially if the model is being used in a clinical setting. A tiered cap based on use case (e.g., documentation vs. clinical decision support) may be more appropriate.
The Takeaway
Data terms are not boilerplate in health AI contracts. They are a core part of your business model, your compliance posture, and your defensibility in the market. If you do not define your rights around training, revocation, and liability, someone else will, typically in a lawsuit or regulatory action.
For AI vendors, the goal should be to build trust through transparency. That means clear language, reasonable limitations, and defensible use cases. The companies that succeed in this space will not just build good models; they will also build good contracts around them.
If you are looking to operationalize this strategy, audit your top five contracts this quarter. Flag any vague data rights, mismatched BAAs, or termination gaps. And if needed, renegotiate before your model performance, client trust, or legal exposure gets tested.