Key Takeaways:
- The Customer Data definition is the single most important provision in your DSA because every security obligation, notification trigger, and financial remedy in the document is only as broad as the data it attaches to.
- Derived data, AI Output, metadata, and telemetry are the categories vendors most aggressively try to exclude from DSA definitions, and they are often the most sensitive data in the relationship.
- A vendor’s unilateral characterization of data as anonymized, de-identified, or aggregated should never, by itself, remove that data from the security framework of your DSA.

Ask ten lawyers to define “Customer Data” and you’ll get ten different answers. Ask the same question about “Personal Data,” “Confidential Information,” and “De-identified Data” and the variation gets wider, while the consequences get more serious.
Data definitions are the foundation of every Data Security Addendum. They determine what the vendor must protect, what triggers notification obligations, what falls within your indemnification provisions, and what falls outside them. Getting these definitions right and understanding how they relate to each other is not a technical exercise. It is a strategic one.
This article breaks down the key data types you’ll encounter in DSA negotiations, explains the legal distinctions between them, and gives you a reference framework, including pro-customer sample language, for making sure your definitions are working for you, not against you.
Every security obligation in your DSA attaches to a defined category of data. If the definition is narrow, the obligation is narrow. For example, a vendor who processes your customer transaction records, derives behavioral profiles from them, generates model outputs using them, and stores telemetry data about how your users interact with their platform may argue that only the raw transaction records fall within “Customer Data” as defined. Everything else sits in a gap. And in that gap, their security obligations disappear.
The goal of this article is to close that gap through the DSA itself. I’ll break down each common type of data, the risks it carries, and how the categories relate to one another, and provide sample pro-customer language for each one.
The Core Data Categories and Their Legal Distinctions
Think of data definitions in a DSA as a system of concentric circles with imperfect edges that create both overlap and gaps. Understanding each category means understanding not just what it covers, but where it touches the others and where it doesn’t.
Free Download: You can download this Data Types Reference Table, which summarizes what each category covers, the key regulatory frameworks, the obligations that attach, and the primary risk to the customer.
Customer Data: The Anchor
Customer Data is, or should be, the broadest operational category in your DSA. It is the master definition that security obligations anchor to. In theory, it covers everything the vendor touches in connection with your relationship. In practice, how it’s defined determines whether that theory holds up.
The most common drafting mistake is defining Customer Data as information “provided by Customer to Vendor in connection with the Services.” Every word in that phrase is a limitation. “Provided by Customer” excludes data the vendor generates or derives. “In connection with the Services” invites a narrow reading that excludes background processing, metadata generation, and incidental data collection. Your definition needs to capture not just what you hand the vendor, but everything they touch, generate, or derive throughout the relationship.
Pro-Customer Sample Language: Customer Data Definition

Watch out for this type of vendor-proposed language: “Customer Data means data provided by Customer to Vendor for processing under this Agreement.” This version excludes derived data, metadata, telemetry, and anything Vendor generates during performance. Redline it to the broader definition above.
Personal Data: The Regulated Subset
Personal Data is a legally defined subset of Customer Data. It refers to information that identifies or could reasonably identify a natural person. The definition varies significantly by jurisdiction. GDPR uses “personal data” broadly, HIPAA uses “protected health information” with specific enumerated categories, and CCPA uses “personal information” with its own scope.
This jurisdictional variation creates a specific drafting risk. A Personal Data definition that tracks only one regulatory framework may be compliant under that framework while leaving you exposed under others. If your vendor processes data about EU residents, California consumers, and HIPAA-covered individuals, and many enterprise vendors do, your definition needs to be expansive enough to capture all of them. The way to accomplish this is to draft a jurisdiction-neutral definition: one that does not import any single framework’s language, but instead sets a functional standard broad enough to encompass all of them. Rather than copying GDPR’s recitals or HIPAA’s Safe Harbor list, you anchor the definition to identifiability as the operative concept and then enumerate the specific categories that trigger the heaviest obligations across frameworks.
Pro-Customer Sample Language: Personal Data Definition

Vendor-proposed language to watch for: “Personal Data means personal data as defined under applicable data protection laws.” This is framework-dependent and circular: it requires you to determine which law applies before you know what is protected, and it fails to cover data types that are regulated under one framework but not another. The pro-customer version above sets a floor that travels across jurisdictions without that ambiguity.
Confidential Information: The Commercial Layer
Confidential Information is the broadest of the three primary categories, and it exists primarily in the commercial rather than regulatory context. It typically covers all non-public information exchanged between the parties, such as trade secrets, business strategies, financial data, technical architecture, and pricing, in addition to Customer Data and Personal Data.
But confidentiality obligations and security obligations are not the same thing, and they should not be treated as interchangeable. A vendor bound only by confidentiality must keep your information secret. A vendor bound by security obligations must actively protect it from unauthorized access. Those are different duties, and the gap between them is where breaches live.
Confidential Information definitions almost always include carve-outs for publicly available information, independently developed information, and information received from third parties. If your security obligations attach only to Confidential Information and not independently to Customer Data, a vendor can potentially argue that a carve-out removes a category of data from their security duties entirely. Your DSA must make clear that security obligations operate independently of confidentiality obligations and are not limited by any carve-out applicable to the latter.
Pro-Customer Sample Language: Security Obligations Independent of Confidentiality

DSAs that define security obligations only by reference to “Confidential Information” without a separate Customer Data definition are not sufficient. If security obligations attach only to Confidential Information, the vendor can argue that any data falling within a standard carve-out, such as information received from a third party, is outside their security duties. Redline to add independent Customer Data and Personal Data definitions with security obligations that stand alone.
Derived Data
Derived data includes things like model outputs, behavioral profiles, risk scores, and AI-generated analytics. This data type is typically characterized by vendors as their own intellectual property, developed independently using their own systems and expertise. The counterargument is straightforward: if it was built using your data, it is your data.
The re-identification risk here is real, well-documented, and a growing concern for regulators on both sides of the Atlantic. Academic research has demonstrated that anonymized datasets can be re-identified with high accuracy using a small number of demographic attributes, and both the FTC and the European Data Protection Board have concluded that data labeled as anonymous does not lose its protected status if re-identification remains technically feasible, regardless of how a vendor characterizes it.[1]
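To make the re-identification point concrete, here is a minimal, illustrative sketch in Python. The dataset is entirely invented; the point it demonstrates is the well-documented one above: once a handful of quasi-identifiers (ZIP code, birth year, gender) survive “anonymization,” most records are unique, and a unique record can be matched back to a named individual by anyone holding those same attributes.

```python
from collections import Counter

# Hypothetical "anonymized" dataset: names removed, but quasi-identifiers
# (ZIP code, birth year, gender) retained. All values are invented.
records = [
    ("60614", 1984, "F"),
    ("60614", 1984, "M"),
    ("60615", 1990, "F"),
    ("60615", 1990, "F"),   # two people share this combination
    ("60616", 1975, "M"),
    ("60617", 1988, "F"),
]

# Count how many records share each quasi-identifier combination.
counts = Counter(records)

# A record is re-identifiable if its combination is unique in the dataset:
# anyone who already knows those three attributes can single the person out.
unique = sum(1 for r in records if counts[r] == 1)
print(f"{unique} of {len(records)} records are unique on (ZIP, birth year, gender)")
# -> 4 of 6 records are unique on (ZIP, birth year, gender)
```

Even in this six-record toy, two thirds of the “anonymized” records are unique; the Rocher study cited below found near-total uniqueness at realistic scale with fifteen attributes.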
Your DSA should include derived data within the Customer Data definition and prohibit its use for any purpose beyond providing the contracted services.
Pro-Customer Sample Language: Derived Data

Don’t let vendors get away with this type of language: “Vendor may use anonymized, aggregated, or de-identified data derived from Customer Data to improve its products and services.” This clause, often buried in a permitted use section rather than the definitions section, strips Derived Data of all Customer Data protections by relabeling it. Push back on the permitted use entirely, or at minimum require that any use of Derived Data be subject to the same security obligations as Customer Data and require Customer’s express prior written consent.
Anonymized and De-identified Data
Anonymized and de-identified data occupy genuinely ambiguous legal territory. Under GDPR, if there is any realistic possibility of re-identification, the data remains personal data and all obligations attach. Under HIPAA, de-identification under the Safe Harbor or Expert Determination standard removes the data from HIPAA’s scope, but not necessarily from CCPA or state law obligations.
The practical consequence for your DSA is that vendor characterization of data as anonymized or de-identified should not, by itself, remove that data from your security protections. Require a defined technical standard, documentation of the method used, and security obligations that apply regardless of the label.
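This is what “a defined technical standard” looks like in practice. The sketch below is a simplified, illustrative Python rendering of a HIPAA Safe Harbor-style generalization; the field names, records, and restricted-ZIP set are invented, and real Safe Harbor compliance requires removing all eighteen enumerated identifier categories, not just the few shown here. The point is that a standard like this is specific, documentable, and verifiable, which is exactly what a vendor’s bare “we anonymized it” label is not.

```python
# Illustrative sketch of HIPAA Safe Harbor-style generalization -- the kind of
# defined, documentable standard a DSA should require before any data is
# treated as "de-identified." Records and the restricted-ZIP set are invented;
# the real rule draws the restricted list from Census population data.
RESTRICTED_ZIP3 = {"036", "059", "102", "203"}

def safe_harbor(record: dict) -> dict:
    """Apply a few Safe Harbor-style generalization rules to one record."""
    out = dict(record)
    out.pop("name", None)                    # direct identifiers are removed
    out.pop("ssn", None)
    zip3 = record["zip"][:3]                 # ZIP truncated to first 3 digits;
    out["zip"] = "000" if zip3 in RESTRICTED_ZIP3 else zip3  # zeroed if area too small
    out["birth_date"] = record["birth_date"][:4]             # dates reduced to year
    out["age"] = "90+" if record["age"] >= 90 else record["age"]  # ages 90+ binned
    return out

patient = {"name": "Jane Roe", "ssn": "000-00-0000", "zip": "60614",
           "birth_date": "1931-05-02", "age": 93}
print(safe_harbor(patient))
# -> {'zip': '606', 'birth_date': '1931', 'age': '90+'}
```

A DSA that requires the vendor to document which rules were applied, to which fields, under which standard gives you something auditable; a DSA that accepts the label “de-identified” gives you nothing.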
Pro-Customer Sample Language: Anonymized and De-identified Data

Vendor-proposed language to watch for: “Vendor’s obligations under this Agreement do not apply to data that Vendor has anonymized or de-identified.” This single sentence gives Vendor unilateral authority to remove any data from DSA protections simply by labeling it. There is no standard, no verification, no audit right, and no Customer consent required. Redline to the provision above, which requires demonstration, documentation, and a recognized standard before any data loses its protected status.
Metadata and Log Data
Metadata and log data are perhaps the most frequently overlooked category. This includes access logs, telemetry, usage patterns, and session records. Vendors treat these as their own operational data and routinely exclude them from Customer Data definitions. But this data can be extraordinarily sensitive: it reveals who in your organization accessed what, when, and from where. Under GDPR and many state laws, it is increasingly treated as personal data. Your Customer Data definition should expressly capture it, as the pro-customer Customer Data definition above does.
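To see why, consider a single, invented access-log line of the kind vendors routinely exclude as “operational data.” The minimal Python sketch below (hypothetical format and values) pulls out its fields: who, from where, did what, to which document, and when. Each field relates to an identifiable person, which is precisely why this data increasingly qualifies as personal data.

```python
# One invented access-log line in a hypothetical key=value format.
log_line = ("2024-03-11T09:42:17Z user=j.small@example.com ip=203.0.113.7 "
            "action=download resource=/board/minutes-q1.pdf")

# Parse the key=value fields, then attach the leading timestamp.
entry = dict(field.split("=", 1) for field in log_line.split()[1:])
entry["timestamp"] = log_line.split()[0]

print(entry)  # who, from where, did what, to which document, and when
```

One line of “telemetry” already identifies an employee, their location, and their interest in board minutes. Multiply by millions of lines and the vendor is holding a behavioral record of your organization.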
AI Output
AI Output is an emerging defined term that appears with increasing frequency in modern DSAs, and it deserves its own definition precisely because it does not fit cleanly within the existing data categories. AI Output refers to content, predictions, recommendations, classifications, or other results generated by an artificial intelligence or machine learning system in the course of processing Customer Data or performing the contracted services.
AI Output is a subset of Derived Data (it is produced using Customer Data). It may constitute Personal Data if it identifies or can reasonably identify an individual. Either way, it should be expressly included within the Customer Data definition so that all security obligations, permitted use restrictions, and data return requirements attach to it as fully as they attach to the underlying inputs.
Without an express AI Output definition, vendors routinely argue that model-generated content is their own proprietary product, placing it entirely outside the DSA’s protective framework at exactly the moment when re-identification risk and downstream misuse risk are highest.
Pro-Customer Sample Language: AI Output Definition

Learn More: Learn how to draft and review AI Addendums and get access to a free, attorney-drafted template that you can get started with today!
Build Definitions as a System
The most important takeaway from this framework is that data definitions should work together as a system, not as isolated provisions. A well-drafted DSA establishes that:
- Security obligations attach to all Customer Data as the broadest operational category
- Heightened obligations apply specifically to Personal Data within that category
- Confidential Information obligations supplement but do not replace security obligations
- Derived data, metadata, and log data are expressly included in Customer Data so there is no gap
- Anonymization or de-identification does not, by itself, remove data from the security framework
When you review a vendor’s proposed DSA, read the definitions section first and read it carefully. Map each defined term against the scope of security obligations and ask: is there any data the vendor touches that falls outside these definitions? If the answer is yes, you’ve found your negotiating agenda before you’ve reached page two.
The vendor who resists a broad Customer Data definition in the face of reasonable arguments grounded in risk, regulatory reality, and their own security practices is telling you something important about how they intend to use your data. That conversation is worth having before you sign, not after something goes wrong. Because as we all know, the data is in the details.
Check out my new column, The Data is in the Details, where I dive into the details of drafting and negotiating better data security agreements.
[1] Luc Rocher, Julien M. Hendrickx, and Yves-Alexandre de Montjoye, “Estimating the success of re-identifications in incomplete datasets using generative models,” Nature Communications 10, no. 3069 (2019), https://doi.org/10.1038/s41467-019-10933-3 (finding that 99.98% of Americans could be correctly re-identified using fifteen demographic attributes). Federal Trade Commission, Protecting Consumer Privacy in an Era of Rapid Change: Recommendations for Businesses and Policymakers (March 2012), https://www.ftc.gov/reports/protecting-consumer-privacy-era-rapid-change (concluding that data is not truly anonymous if it can be combined with other available information to identify an individual). Article 29 Data Protection Working Party, Opinion 05/2014 on Anonymisation Techniques, WP216 (April 10, 2014), https://ec.europa.eu/justice/article-29/documentation/opinion-recommendation/files/2014/wp216_en.pdf (finding that most common anonymization techniques carry residual re-identification risk that organizations cannot fully eliminate).
The post Data Definitions, Types, and Distinctions appeared first on Contract Nerds.