VOICE CLONING · TRAINING DATA

The only ethical way to source voice cloning data at scale.

Voice cloning is the highest-risk use case in speech AI, and the one most exposed to lawsuits, right-of-publicity claims, and platform bans. We license speaker-specific audio under separate, explicit, written voice-cloning consent from every speaker, and we sell it only to customers who agree to the elevated terms below.

Request a voice-cloning consultation → or email partnerships@aipodcast.io

Per-speaker written consent · Compensation flow-through · Watermarking required · Right-of-revocation honored

Why this is a separate product

Read this before you buy anything else.

Standard catalog and custom licenses do not permit voice cloning. Cloning a speaker's voice raises distinct legal issues that the standard release does not address: right of publicity, biometric privacy under laws such as Illinois's BIPA, defamation risk if the cloned voice is misused, and much broader litigation exposure overall. We refuse to bundle voice-cloning rights into a generic dataset license, because doing so would be unfair to speakers and dangerous for buyers.

How it works

What an aipodcast voice cloning license looks like.

⌁

Per-speaker consent

We obtain a separate, explicit, written voice-cloning consent from every speaker in the licensed dataset, naming the customer.

⌁

Per-customer disclosure

The speaker is told who is licensing their voice and for what purpose. No anonymous resale.

⌁

Compensation flow-through

A defined share of the elevated fee flows directly to the speaker, separate from base licensing payments.

⌁

Use restrictions

No non-consensual intimate content, defamation, election interference, deceptive impersonation, or any practice prohibited by EU AI Act Article 5.

⌁

Watermarking

Customer agrees to participate in a recognized voice provenance / watermarking scheme. We can recommend a stack.

⌁

Right of revocation

If a speaker revokes consent, the customer must stop producing new outputs derived from the cloned voice within 30 days. Outputs already distributed are not retroactively recalled.
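For teams wiring the 30-day revocation window into a generation pipeline, here is a minimal Python sketch of the date check. The function names are hypothetical, and it assumes the window is counted in calendar days with the 30th day still inside it; the signed addendum, not this sketch, defines how the window is actually counted.

```python
from datetime import date, timedelta
from typing import Optional

# From the license terms: new outputs must stop within 30 days of revocation.
REVOCATION_WINDOW_DAYS = 30


def generation_cutoff(revoked_on: date) -> date:
    """Last calendar day on which new outputs may be produced (assumed inclusive)."""
    return revoked_on + timedelta(days=REVOCATION_WINDOW_DAYS)


def may_generate(today: date, revoked_on: Optional[date]) -> bool:
    """True if new outputs from this voice are still allowed on `today`."""
    if revoked_on is None:
        # Consent has not been revoked.
        return True
    return today <= generation_cutoff(revoked_on)
```

A pipeline could call `may_generate` before each synthesis job and refuse the job once it returns `False`.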

Specs

What we deliver.

Audio

Format: 48 kHz / 24-bit WAV
Per speaker: Typically several hours to several dozen hours
Tracks: Multi-track separation where available
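Buyers who want to confirm that a delivered file matches the 48 kHz / 24-bit spec could run a check along these lines. This is a rough sketch using Python's standard `wave` module; the function name `check_wav_format` is hypothetical, and it assumes the files are plain PCM WAV, which is the encoding `wave` is designed to read.

```python
import wave

EXPECTED_RATE_HZ = 48_000         # 48 kHz, per the delivery spec
EXPECTED_SAMPLE_WIDTH_BYTES = 3   # 24-bit samples are 3 bytes wide


def check_wav_format(path):
    """Return a list of mismatches against the spec (empty if the file conforms)."""
    problems = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
    if rate != EXPECTED_RATE_HZ:
        problems.append(f"sample rate is {rate} Hz, expected {EXPECTED_RATE_HZ} Hz")
    if width != EXPECTED_SAMPLE_WIDTH_BYTES:
        problems.append(f"sample width is {width * 8}-bit, expected 24-bit")
    return problems
```

An empty list means the header matches the spec; the check reads only the WAV header and does not inspect the audio content.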

Transcripts

Alignment: Word-level standard, phoneme-level on request
Formats: JSON, WebVTT, SRT, TextGrid

Metadata

Speaker: Age range, gender, first language (L1), accent, microphone, recording environment
Per file: Consent record IDs

Legal

Consent: Signed voice-cloning consent record from the speaker
Addendum: Dual-party MSA addendum
Indemnification: IP & consent claims arising from source audio

Fit

Who this is right for.

✓ Right fit

Companies building branded voice products with the licensed speaker’s consent and participation
Accessibility products that clone a user’s voice with their explicit consent (e.g., voice banking for ALS patients)
Audiobook and dubbing platforms cloning licensed performers
Voice agent products needing brand-consistent synthesized voices

✕ Not a fit

Anyone wanting to clone a celebrity, public figure, or third party without their consent — we will not work with you
Anyone wanting to build deepfake or impersonation tools — we will not work with you
Anyone unwilling to participate in voice provenance / watermarking — we will not work with you

Building a voice cloning product the right way?

Voice cloning is priced significantly above catalog and custom rates. Talk to sales for a project-specific quote and a walkthrough of the consent flow.

Request a consultation → or email partnerships@aipodcast.io