§ Solutions

Datasets for every
voice AI problem.

From speech recognition to text-to-speech, voice cloning, conversational LLMs, and coverage of 40+ languages, every catalog is built on the same foundation: consented, studio-grade audio with documented per-file provenance.

Request a sample → · See pricing

48 kHz / 24-bit WAV · Word-level aligned transcripts · Verified consent · Commercial training rights

350+ hours licensed · 40+ languages & locales · 2,400+ unique speakers · 100% written consent

§ 01 — Catalog by use case

Pick the dataset
that matches the model.

Six specialised catalogs, all drawn from the same cleared source library. Click any dataset for full specs, sample manifests, and pricing context.

ASR training data

Two- and three-speaker conversational English with word-level alignment, diarization, and speaker metadata. Built for fine-tuning Whisper, Conformer, and frontier ASR stacks.

120 hrs · EN-US, EN-GB, EN-AU →

TTS training data

Phonetically balanced studio reads from named speakers, recorded on broadcast-grade chains. Pitch, prosody, and pronunciation tagged for high-fidelity neural TTS.

Phonetic balance · 48 kHz →

Voice cloning data

Single-speaker, multi-take, multi-emotion sets with explicit cloning rights in the speaker release. The only catalog on the market that names voice cloning in the consent form.

Cloning rights granted →

Conversational AI

Long-form, multi-turn dialogue across casual, interview, and debate registers. Disfluencies, overlap, and turn-taking preserved — exactly what conversational LLMs need.

Multi-turn · turn-taking →

Multilingual speech

40+ languages and regional locales with native-speaker verification and parallel topic coverage. Built for global ASR/TTS expansion without scraping local platforms.

40+ locales · native verified →

Podcast audio licensing

Bulk licensing of professionally produced podcast back catalogs — direct from the studios that own them, on a contract you can take to legal.

Catalog licensing →

§ 02 — By model architecture

What we ship,
by what you're training.

| Model type | Recommended catalog | Typical hours | Format |
| --- | --- | --- | --- |
| ASR (Whisper, Conformer, USM) | Conversational Core | 50–200 hrs | WAV + JSON + CTM |
| TTS / neural vocoder | TTS Studio Reads | 10–40 hrs / speaker | WAV + phoneme + prosody |
| Voice cloning | Cloning-cleared singles | 2–8 hrs / speaker | WAV + release ID |
| Conversational LLM (audio in/out) | Multi-turn dialogue | 100–400 hrs | WAV + diarized JSON |
| Multilingual ASR | Multilingual Expansion | 20–80 hrs / locale | WAV + per-locale JSON |
| Speech evaluation / benchmarks | Custom commission | Scoped to brief | Negotiated |
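
Every catalog above ships WAV, and the spec line on this page states 48 kHz / 24-bit. As a rough sketch of how a team might sanity-check a delivered file against that spec, the snippet below reads the header with Python's standard-library `wave` module. The function name `check_wav_format` and the returned dictionary keys are illustrative assumptions, not part of any aipodcast tooling, and the sketch only handles uncompressed PCM WAV.

```python
import wave

# Values taken from the spec line on this page (48 kHz / 24-bit WAV).
EXPECTED_RATE_HZ = 48_000
EXPECTED_SAMPLE_BYTES = 3  # 24-bit PCM is 3 bytes per sample


def check_wav_format(path):
    """Return (ok, details) for an uncompressed PCM WAV file.

    Hypothetical helper for illustration: `ok` is True only when the
    header reports the expected sample rate and sample width.
    """
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        channels = wav.getnchannels()
        frames = wav.getnframes()

    details = {
        "sample_rate_hz": rate,
        "bit_depth": width * 8,
        "channels": channels,
        "duration_s": frames / rate if rate else 0.0,
    }
    ok = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_BYTES
    return ok, details
```

Running this over a sample pack before ingestion is a cheap way to catch resampled or down-converted files early. It checks the header only and says nothing about recording quality.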

§ 03 — Why teams choose aipodcast

The same source,
licensed every way.

One source, six packages

Every catalog draws from the same studio-grade pipeline. Switch use cases without switching suppliers, contracts, or quality bars.

Per-file provenance

SHA-256 manifest, signed release ID, and speaker metadata travel with every WAV. Audit any file in your training set back to a named, contactable speaker.
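
A minimal sketch of what an audit against a SHA-256 manifest could look like, assuming the manifest is a JSON list of entries with `file`, `sha256`, and `release_id` fields. That layout, the field names, and the function names are hypothetical; the actual manifest format comes with the datasheet. The sketch checks file integrity and the presence of a release ID only. It does not verify the signature on the release.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large WAVs are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path, audio_root):
    """Return a list of (file, problem) tuples; an empty list means all matched.

    Assumed manifest layout (hypothetical):
    [{"file": "a.wav", "sha256": "<hex>", "release_id": "<id>"}, ...]
    """
    entries = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    problems = []
    for entry in entries:
        wav_path = Path(audio_root) / entry["file"]
        if not wav_path.is_file():
            problems.append((entry["file"], "missing file"))
        elif sha256_of(wav_path) != entry["sha256"].lower():
            problems.append((entry["file"], "hash mismatch"))
        elif not entry.get("release_id"):
            problems.append((entry["file"], "no release ID"))
    return problems
```

An empty result means every listed file exists, matches its recorded hash, and carries a release ID that can be followed back to the speaker record.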

Cleared for commercial training

The release language explicitly grants AI model training rights. No "implied consent," no platform ToS ambiguity, no retroactive opt-outs.

Real studios, real engineers

Recorded on Shure SM7B, Rode NT1, and MKH 416 chains in treated rooms — not bedroom USB mics. The catalog is broadcast-grade by default.

Speakers retain ownership

We license; we do not buy. Speakers can revoke at any time and we honor it on a defined SLA — a feature, not a bug, for legal review.

Built for the legal review

Master agreement, DPA, security questionnaire responses, and provenance manifest are pre-built and Fortune-500-tested.

§ 04 — Frequently asked

Solutions, in plain English.

Can I license multiple catalogs under a single contract?

Yes. Most engagements that touch more than one model end up bundling two or three catalogs into a single master agreement with a single named contact for the lifetime of the deal. You only sign once.

Do you offer exclusive licensing on any of these datasets?

Custom commissions can be licensed exclusively. Off-the-shelf catalogs are non-exclusive by default, but exclusive carve-outs (by language, by speaker, by domain) are available. Talk to us about scope.

What if my use case isn't on this page?

Most of what we deliver lives somewhere on the spectrum between these six catalogs. If you're training something more exotic — emotion classification, age estimation, accent ID, speaker verification — we can scope a custom commission against the same supply network.

How long from first email to delivered data?

Sample pack within one business day. Off-the-shelf catalog delivery within one to two weeks of signed MSA. Custom commissions run four to eight weeks depending on language, hours, and casting requirements.

What's the smallest engagement you'll take?

We've shipped useful work at 10 hours and we've shipped at 400+. The floor is set by the cost of the legal review and onboarding, not the audio itself — talk to us if you're not sure where you land.

§ 05 — Get a sample pack

Hear it before
you decide anything.

Tell us which catalog you're evaluating and we'll send a free 60-second WAV, the matching datasheet, and a short note on which package fits your training run. We reply within one business day.

Request a sample pack → or email partnerships@aipodcast.io
