If you are training a voice or music AI model on EU data, Article 9 of the General Data Protection Regulation is one of the most frequently cited and most frequently misunderstood provisions in the entire regulation. It is cited because "voice is biometric" has become shorthand for a compliance concern. It is misunderstood because the actual scope of Article 9 depends on a purpose test that most marketing copy ignores.
This post walks through what Article 9 actually says, when voice data falls inside it, when it doesn't, what "explicit consent" really requires, and what an audit-ready consent chain looks like in practice. It draws directly from the GDPR text, the European Data Protection Board's guidelines, and the ICO's published guidance. Your legal team will still want to review specific implementations, but by the end of this post you should be able to ask them precise questions.
What Article 9 actually says
Article 9(1) of the GDPR prohibits processing of "special categories" of personal data. The list includes racial or ethnic origin, political opinions, religious beliefs, trade union membership, genetic data, health data, data concerning sex life or sexual orientation, and, crucially for our purposes, "biometric data for the purpose of uniquely identifying a natural person."
The definition of biometric data is in Article 4(14): "personal data resulting from specific technical processing relating to the physical, physiological or behavioural characteristics of a natural person, which allow or confirm the unique identification of that natural person."
Read those two provisions together and a specific structure emerges. Biometric data is a category of personal data. Article 9 applies to biometric data only when the processing purpose is to uniquely identify a natural person. If the processing purpose is something else (training a generative voice model, for example, or synthesizing new vocals), Article 9 may not apply even though the raw material is a voice recording.
This is not a loophole. It is the written structure of the regulation. But it is also not a free pass, because the downstream use of a voice model can drag you back into Article 9 territory if the resulting system is used for identification.
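The purpose test that emerges from Articles 4(14) and 9(1) can be sketched as a simple classification. The purpose labels and function below are illustrative assumptions for a voice-data pipeline, not terms defined in the GDPR:

```python
from enum import Enum, auto

class ProcessingPurpose(Enum):
    """Hypothetical purpose labels for a voice-data processing activity."""
    GENERATIVE_TRAINING = auto()     # teach a model what singing sounds like
    VOICE_SYNTHESIS = auto()         # synthesize new vocals at inference time
    SPEAKER_IDENTIFICATION = auto()  # "is this the same person?" questions
    SPEAKER_VERIFICATION = auto()    # voice authentication, diarization

# Purposes whose aim is uniquely identifying a natural person fall under
# Article 9; everything else is assessed under Article 6 (assuming the
# other GDPR requirements are met).
ARTICLE_9_PURPOSES = {
    ProcessingPurpose.SPEAKER_IDENTIFICATION,
    ProcessingPurpose.SPEAKER_VERIFICATION,
}

def requires_article_9_consent(purpose: ProcessingPurpose) -> bool:
    """Apply the Article 4(14) / Article 9(1) purpose test."""
    return purpose in ARTICLE_9_PURPOSES
```

The point of modelling it this way is that the trigger is the purpose label attached to the processing, not the nature of the raw material.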
The EDPB position: purpose matters
The European Data Protection Board's Guidelines 02/2021 on virtual voice assistants address this question directly. The EDPB acknowledges that voice data is "inherently biometric personal data" in the sense that voices are unique physical characteristics. But the guidelines then draw the Article 9 line at purpose: the special category regime applies "where the processing… has the purpose of identifying an individual."
For a B2B vocal dataset used to train a generative singing model, the dataset provider's purpose is typically not unique identification. The purpose is to teach a model what singing sounds like, statistically, across a corpus of voices. The enterprise buyer's purpose is typically the same. Neither party is using the data to answer the question "is this the same person who sang in that other recording?" In that case, the processing needs an ordinary Article 6 lawful basis rather than an Article 9(2) exception, provided the other GDPR requirements are met.
Where Article 9 does bite
If the downstream use of the model is speaker identification (voice authentication for a banking app, for example, or speaker diarization for a surveillance product), Article 9 applies to both the training and the operation. Enterprise buyers building those products need explicit consent under Article 9(2)(a), not ordinary Article 6(1)(a) consent, and they need it specifically for the identification purpose.
This is why enterprise vocal dataset agreements often include purpose limitation clauses: the buyer can train generative or synthesis models, but cannot use the data to train identification systems without separate written consent.
Article 9(2)(a): explicit consent
When Article 9 does apply, the processing is only lawful if one of the ten exceptions in Article 9(2) is satisfied. For commercial AI training, the only realistic exception is Article 9(2)(a): "the data subject has given explicit consent to the processing of those personal data for one or more specified purposes."
"Explicit consent" is a higher bar than regular GDPR consent. It is defined through a combination of the general Article 4(11) consent standard, the Article 7 conditions, and the additional requirement of an express confirmation.
The EDPB's Guidelines 05/2020 on consent
The EDPB has published detailed guidance on what counts as valid consent, and the guidance on explicit consent is stricter than the guidance on ordinary consent. To qualify as explicit, the consent must:
- Meet all the ordinary Article 4(11) requirements: freely given, specific, informed, and unambiguous
- Be "expressly confirmed in a clear statement" — this can be written, oral, or electronic, but the key word is "statement"
- Reference the specific purpose, not a blanket category; "we may use your data for AI training" is probably too broad, while "we will use your recordings to train generative voice synthesis models" is more defensible
- Be separable from other consents; bundling Article 9 consent with other terms and conditions is a classic failure mode
The EDPB recommends a two-stage verification for electronic explicit consent: the user actively provides the statement, and then the controller confirms it via a second channel (typically email). This is recommended, not strictly required, but it has become the de facto standard for audit-ready Article 9 consent.
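The two-stage flow can be sketched as a small state machine: the data subject actively submits the express statement, then confirms it through a second channel. This is a minimal illustration; the class, field, and method names are assumptions, not a schema prescribed by the EDPB:

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ExplicitConsentFlow:
    """Two-stage electronic explicit consent, modelled as: (1) the user
    submits an express statement, (2) the user confirms via a one-time
    token delivered over a second channel (typically email)."""
    user_id: str
    purpose_text: str                            # exact purpose language shown
    statement_at: Optional[datetime] = None      # stage 1 timestamp
    confirm_token: Optional[str] = None
    confirmed_at: Optional[datetime] = None      # stage 2 timestamp

    def record_statement(self) -> str:
        """Stage 1: the user actively provides the consent statement;
        a one-time token is generated for the confirmation email."""
        self.statement_at = datetime.now(timezone.utc)
        self.confirm_token = secrets.token_urlsafe(16)
        return self.confirm_token

    def confirm(self, token: str) -> bool:
        """Stage 2: the user confirms via the second channel."""
        if self.statement_at is not None and token == self.confirm_token:
            self.confirmed_at = datetime.now(timezone.utc)
            return True
        return False

    @property
    def is_explicit(self) -> bool:
        """Consent counts as two-stage explicit only once both stages exist."""
        return self.statement_at is not None and self.confirmed_at is not None
```

Note that consent is not treated as explicit until both timestamps exist, which is exactly the property an auditor would check.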
What "specific purpose" means
One of the most common compliance failures in vocal data licensing is purpose over-reach. A consent form that says "I agree to the use of my recordings for AI training" is specific at first glance but collapses on closer reading. AI training for what? By whom? For which class of models? For which downstream uses?
The GDPR's purpose limitation principle in Article 5(1)(b) requires that personal data be "collected for specified, explicit and legitimate purposes and not further processed in a manner that is incompatible with those purposes." When the original consent was for "AI training" in the abstract, a subsequent use for emotion recognition or speaker identification may constitute "further processing" that was not authorized, and the legal basis for the new use is gone.
Enterprise-grade vocal dataset agreements address this by being specific about what kind of AI training is authorized, what kind is not, and what the mechanism is for expanding the scope later if needed.
What an audit-ready consent record contains
There is no single statutory definition of "audit-ready." The requirement flows from Article 7(1), which states that when processing is based on consent, the controller must be able to demonstrate that the data subject consented. In practice, that means keeping records that would satisfy a Data Protection Authority auditor reviewing the controller's compliance posture.
The minimum elements of an audit-ready consent record, assembled from the GDPR text, the ICO's guidance, and the EDPB's guidelines, are:
| Element | What it proves |
|---|---|
| User identifier | Who consented |
| Precise timestamp | When the consent was given |
| Exact purpose language shown | What they agreed to |
| Privacy notice version | What context they had |
| Consent mechanism | How they consented (checkbox, signature, click-through) |
| IP address | Verification trail (strongly recommended by DPAs) |
| Withdrawal log | Proof of ongoing ability to withdraw |
| Affirmative action proof | Active opt-in, not a pre-ticked checkbox |
For Article 9 explicit consent specifically, add an express statement acknowledging the special category nature of the data (for example, "I understand my voice recordings may be processed as biometric data and I explicitly consent to their use for training generative voice synthesis models"), and ideally a second confirmation channel.
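The elements above map naturally onto a record type. A minimal sketch, assuming field names of our own choosing rather than any statutory schema:

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional, Tuple

@dataclass
class ConsentRecord:
    """Illustrative audit-ready consent record mirroring the table above."""
    user_id: str                      # who consented
    consented_at: datetime            # precise timestamp
    purpose_text: str                 # exact purpose language shown
    privacy_notice_version: str       # what context they had
    mechanism: str                    # "checkbox", "signature", "click-through"
    ip_address: str                   # verification trail
    affirmative_action: bool          # active opt-in, never a pre-ticked box
    article_9_statement: Optional[str] = None        # express biometric acknowledgment
    second_channel_confirmed_at: Optional[datetime] = None
    withdrawal_log: List[Tuple[datetime, str]] = field(default_factory=list)

    def is_article_9_ready(self) -> bool:
        """Explicit-consent check: the ordinary record plus an express
        special-category statement and a second-channel confirmation."""
        return (self.affirmative_action
                and self.article_9_statement is not None
                and self.second_channel_confirmed_at is not None)
```

A record like this answers the auditor's questions (who, when, to what, in what context, how) from a single row, without reconstructing state from application logs.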
The right to withdraw
Article 7(3) of the GDPR is deceptively simple: "The data subject shall have the right to withdraw his or her consent at any time… It shall be as easy to withdraw as to give consent."
In the context of a vocal dataset, this provision has significant operational implications. If a vocalist withdraws consent, the dataset provider is required to stop processing the data. "Processing" includes continued use of the data by downstream enterprise buyers. That means the dataset provider must either:
- Contractually require enterprise buyers to delete the withdrawn recordings from their training corpora
- Ensure that the recordings were only used for statistical training (so that deletion of the raw data is sufficient and the model weights do not "memorize" individual contributions)
- Retain the data under a different lawful basis (legitimate interests, for example) only if that basis actually applies and the withdrawal does not override it
The first option is the cleanest but most operationally painful. The second option is only defensible if you have technical evidence that individual contributions cannot be extracted from the trained model. The third option is legally risky and should not be relied on without explicit legal review.
Withdrawal propagation
The practical compliance question is: when a vocalist withdraws consent, how does that withdrawal propagate to the downstream enterprise buyer whose model is already trained? Enterprise vocal dataset licensing agreements typically handle this by including:
- A notification obligation: the provider must notify the enterprise buyer within a defined window (usually 30 days) of any withdrawal
- A deletion obligation: the buyer must delete the affected recordings from training corpora
- A model-impact clause: the buyer must either retrain the affected model or document a technical argument for why the model no longer contains traces of the withdrawn data
- An audit right: the provider retains the right to verify that the buyer complied
These clauses are painful to negotiate but they are non-negotiable if you want an Article 9-compliant program.
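The clause mechanics above can be sketched as a propagation function that opens one compliance obligation per downstream buyer. The 30-day window, function names, and fields are assumptions for illustration, not language from any real agreement:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List

NOTIFICATION_WINDOW = timedelta(days=30)  # assumed contractual notice window

@dataclass
class WithdrawalEvent:
    """A vocalist's withdrawal, as received by the dataset provider."""
    vocalist_id: str
    recording_ids: List[str]
    withdrawn_at: datetime

@dataclass
class BuyerObligation:
    """Per-buyer compliance task opened when a withdrawal propagates."""
    buyer: str
    event: WithdrawalEvent
    notify_by: datetime            # notification obligation deadline
    deleted: bool = False          # deletion obligation satisfied?
    model_impact_doc: str = ""     # retraining note or technical argument
    audited: bool = False          # provider's audit right exercised?

def propagate_withdrawal(event: WithdrawalEvent, buyers: List[str]) -> List[BuyerObligation]:
    """Open one tracked obligation for every buyer holding the recordings."""
    return [
        BuyerObligation(
            buyer=b,
            event=event,
            notify_by=event.withdrawn_at + NOTIFICATION_WINDOW,
        )
        for b in buyers
    ]
```

Tracking each obligation as a record (rather than an email thread) is what makes the provider's audit right exercisable later: the open `deleted` and `model_impact_doc` fields show exactly which buyers still owe evidence.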
The purpose limitation trap
One of the subtler compliance failures in voice AI is the purpose limitation trap. A dataset was collected for training a singing synthesis model. The enterprise buyer decides two years later to fine-tune the model for speaker identification. The original consent did not mention speaker identification. The new use is "further processing for a purpose incompatible with the original purpose" under Article 5(1)(b).
At that point, the buyer has three options: obtain fresh consent from every vocalist in the dataset (operationally impossible for most datasets), establish a new lawful basis that covers the new purpose (difficult for Article 9 data), or abandon the new use.
The way to prevent this trap is to be honest about intended use at the collection stage. If there is any reasonable chance that the dataset will be used for identification-adjacent purposes down the line, the original consent should cover that use explicitly. If the consent cannot reasonably be written to cover it, the dataset was collected for the wrong use case and should not be extended.
How The Vocal Market handles Article 9 compliance
Our enterprise vocal dataset licensing program was designed around the Article 9 framework. Every vocalist who contributes to the dataset signs an explicit consent agreement that includes:
- An acknowledgment that voice data may be processed as biometric data under GDPR
- An explicit consent to the use of recordings for training generative voice synthesis and music generation models
- A specific exclusion of speaker identification and emotion recognition training unless separate consent is obtained
- A timestamped digital signature with IP address and consent text version logged
- A withdrawal mechanism accessible directly from the vocalist's account, meeting the "as easy as giving" standard
Enterprise buyers receive a copy of the consent log alongside the dataset, with vocalist personal information redacted but the consent chain intact. If your legal team wants to verify the Article 9 basis for a specific recording before signing a licensing agreement, we can produce the full audit trail (including withdrawal history) within 48 hours. Request a sample dataset and ask for the accompanying compliance documentation.