§ Solutions

Datasets for every
voice AI problem.

From speech recognition to text-to-speech, voice cloning, and conversational LLMs, across 40+ languages, every catalog is built on the same foundation: consented, studio-grade audio with documented per-file provenance.

48 kHz / 24-bit WAV · Word-level aligned transcripts · Verified consent · Commercial training rights
350+ hours licensed · 40+ languages & locales · 2,400+ unique speakers · 100% written consent
§ 01 — Catalog by use case

Pick the dataset
that matches the model.

Six specialised catalogs, all drawn from the same cleared source library. Click any dataset for full specs, sample manifests, and pricing context.

§ 02 — By model architecture

What we ship,
by what you're training.

Model type | Recommended catalog | Typical hours | Format
ASR (Whisper, Conformer, USM) | Conversational Core | 50–200 hrs | WAV + JSON + CTM
TTS / neural vocoder | TTS Studio Reads | 10–40 hrs / speaker | WAV + phoneme + prosody
Voice cloning | Cloning-cleared singles | 2–8 hrs / speaker | WAV + release ID
Conversational LLM (audio in/out) | Multi-turn dialogue | 100–400 hrs | WAV + diarized JSON
Multilingual ASR | Multilingual Expansion | 20–80 hrs / locale | WAV + per-locale JSON
Speech evaluation / benchmarks | Custom commission | Scoped to brief | Negotiated
§ 03 — Why teams choose aipodcast

The same source,
licensed every way.

One source, six packages

Every catalog draws from the same studio-grade pipeline. Switch use cases without switching suppliers, contracts, or quality bars.

Per-file provenance

SHA-256 manifest, signed release ID, and speaker metadata travel with every WAV. Audit any file in your training set back to a named, contactable speaker.
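
As a rough illustration of what auditing a file against its manifest could look like on the buyer's side, here is a minimal Python sketch. The manifest layout is an assumption made for this example (a JSON array of objects with `path`, `sha256`, and `release_id` keys, with paths relative to the manifest's folder); the delivered manifest may be structured differently, and the function names `sha256_of` and `audit_manifest` are hypothetical.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_manifest(manifest_path: Path) -> list[dict]:
    """Compare each manifest entry with the file on disk.

    Assumed (hypothetical) layout: a JSON array of objects with
    "path", "sha256" and "release_id" keys; paths are resolved
    relative to the directory that contains the manifest.
    """
    root = manifest_path.parent
    entries = json.loads(manifest_path.read_text(encoding="utf-8"))
    report = []
    for entry in entries:
        wav = root / entry["path"]
        if not wav.is_file():
            status = "missing"
        elif sha256_of(wav) != entry["sha256"].lower():
            status = "hash_mismatch"
        else:
            status = "ok"
        report.append(
            {
                "path": entry["path"],
                "release_id": entry.get("release_id"),
                "status": status,
            }
        )
    return report
```

Calling `audit_manifest(Path("delivery/manifest.json"))` (an illustrative path) returns one record per file, so any entry whose status is not `ok` can be traced by its release ID. This checks file integrity only; verifying the signature on a release ID would be a separate step that depends on how the releases are signed.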

Cleared for commercial training

The release language explicitly grants AI model training rights. No "implied consent," no platform ToS ambiguity, no retroactive opt-outs.

Real studios, real engineers

Recorded on Shure SM7B, Rode NT1, and Sennheiser MKH 416 chains in treated rooms, not on bedroom USB mics. The catalog is broadcast-grade by default.

Speakers retain ownership

We license; we do not buy. Speakers can revoke at any time and we honor it on a defined SLA — a feature, not a bug, for legal review.

Built for the legal review

Master agreement, DPA, security questionnaire responses, and provenance manifest are pre-built and Fortune-500-tested.

§ 04 — Frequently asked

Solutions, in plain English.

Can I license multiple catalogs under a single contract?

Yes. Most engagements that touch more than one model end up bundling two or three catalogs into a single master agreement with a single named contact for the lifetime of the deal. You only sign once.

Do you offer exclusive licensing on any of these datasets?

Custom commissions can be licensed exclusively. Off-the-shelf catalogs are non-exclusive by default but exclusive carve-outs (by language, by speaker, by domain) are available — talk to us about scope.

What if my use case isn't on this page?

Most of what we deliver lives somewhere on the spectrum between these six catalogs. If you're training something more exotic — emotion classification, age estimation, accent ID, speaker verification — we can scope a custom commission against the same supply network.

How long from first email to delivered data?

Sample pack within one business day. Off-the-shelf catalog delivery within one to two weeks of signed MSA. Custom commissions run four to eight weeks depending on language, hours, and casting requirements.

What's the smallest engagement you'll take?

We've shipped useful work at 10 hours and we've shipped at 400+. The floor is set by the cost of the legal review and onboarding, not the audio itself — talk to us if you're not sure where you land.

§ 05 — Get a sample pack

Hear it before
you decide anything.

Tell us which catalog you're evaluating and we'll send a free 60-second WAV, the matching datasheet, and a short note on which package fits your training run. We reply within one business day.