CATALOG

What's available now.

Real hours, in real languages, ready to license today. Updated weekly. Each row links to a dataset detail page with a downloadable sample, full spec, and a request-quote button.

350+ hours of conversational English in catalog · 12 languages across our network · Studio-grade by default
350+
Hours of conversational English in catalog
12
Languages across our podcaster network
48 hr
Sample delivery after NDA signature
2–6 wk
Custom collection turnaround
Catalog

Browse what's licensable today.

“In collection” means recording is in flight and will be available within 4–6 weeks. “On request” means we have the supplier network in place and begin collection on a per-order basis.

Dataset | Language | Accent | Domain | Hours | Status
Conversational EN-US (long-form interviews) | English | US General American | Interview | 350+ | Available
Conversational EN-US (panel discussions) | English | US Mixed | Panel / multi-speaker | | In collection
Conversational EN-UK | English | UK RP, regional | Interview | | In collection
Conversational EN-AU | English | Australian | Interview | | In collection
Conversational ES-LATAM | Spanish | LATAM Mixed | Interview | | On request
Conversational ES-EU | Spanish | EU Castilian | Interview | | On request
Conversational PT-BR | Portuguese | Brazilian | Interview | | On request
Conversational FR-FR | French | Metropolitan | Interview | | On request
Conversational DE-DE | German | High German | Interview | | On request
Conversational JA-JP | Japanese | Tokyo Standard | Interview | | On request
Conversational KO-KR | Korean | Seoul Standard | Interview | | On request
Conversational HI-IN | Hindi | Standard, regional | Interview | | On request
Conversational AR | Arabic | MSA, Egyptian, Gulf, Levantine | Interview | | On request
Conversational ZH-CN | Mandarin | Standard, regional | Interview | | On request
§ 02 — Featured datasets

A closer look at three catalogs we ship today.

These are the datasets most teams start with. Each one ships under a single master agreement with a per-file provenance manifest, signed releases, and a named contact for the lifetime of the deal.

Dataset 01

Conversational EN-US

Long-form interview and panel dialogue from professional U.S. podcasts. The flagship catalog — most ASR and conversational LLM customers start here.
  • 350+ hours · available today
  • 2,400+ unique named speakers
  • 2- and 3-speaker, studio-recorded
  • Word-level CTM + diarized JSON
  • 48 kHz / 24-bit WAV, −23 LUFS
Licensed per project · request a quote
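Word-level CTM is a plain-text format, so a delivered transcript can be loaded with a few lines of code. The sketch below assumes the common NIST CTM column layout (`<file> <channel> <start> <duration> <word> [<confidence>]`, times in seconds). The exact columns in this catalog's CTM files, and whether speaker IDs appear there or only in the diarized JSON, are assumptions to confirm against the datasheet. `CtmWord` and `parse_ctm` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CtmWord:
    """One word token from a CTM line (column layout assumed, see above)."""
    file_id: str
    channel: str
    start: float                        # begin time in seconds (assumed unit)
    duration: float                     # duration in seconds (assumed unit)
    word: str
    confidence: Optional[float] = None  # sixth column is optional in CTM


def parse_ctm(lines: Iterable[str]) -> List[CtmWord]:
    """Parse CTM lines, skipping blank lines and ';;' comment lines."""
    words: List[CtmWord] = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {raw!r}")
        confidence = float(fields[5]) if len(fields) > 5 else None
        words.append(CtmWord(
            file_id=fields[0],
            channel=fields[1],
            start=float(fields[2]),
            duration=float(fields[3]),
            word=fields[4],
            confidence=confidence,
        ))
    return words


if __name__ == "__main__":
    sample = [
        "ep001 A 0.50 0.25 hello 0.98",
        "ep001 A 1.00 0.50 world 0.95",
    ]
    words = parse_ctm(sample)
    last = words[-1]
    print(len(words), last.start + last.duration)  # 2 1.5
```

Keeping the confidence column optional lets the same reader handle CTM files with or without scores.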
Dataset 02

Multilingual Expansion

40+ languages and regional locales drawn from native-speaker podcast networks. Parallel topic coverage where the supply allows it.
  • 20–80 hrs per locale typical
  • Native-speaker verified
  • Per-language transcripts + metadata
  • Same studio quality bar as EN catalog
  • 4–6 week ramp on new locales
Licensed per project · request a quote
Dataset 03

Custom Commission

We brief, cast, and record to your spec — domain, accents, scenarios, hours. You own the result, optionally on an exclusive license.
  • Scoped to brief
  • 4–8 week turnaround
  • Exclusive licensing available
  • Direct studio and casting access
  • Same provenance pipeline
Scoped to brief · request a quote
§ 03 — Spec sheet

What every file ships with.

Specification | Standard | Notes
Audio format | WAV, 48 kHz / 24-bit | Lossless. FLAC + 16-bit re-encodes on request.
Loudness target | −23 LUFS integrated | EBU R128 broadcast standard.
Channel layout | Mono per speaker + stereo mix | Per-speaker isolation where the studio supports it.
Transcript format | JSON + CTM + plain text | Word-level alignment, speaker IDs, confidence scores.
Diarization | Speaker-labeled JSON | Manual QC pass on every file.
Metadata | Per-file JSON sidecar | Speaker demographics, recording date, studio, mic chain.
Provenance manifest | SHA-256 + signed release ID | Audit any WAV back to a named, contactable speaker.
License grant | Perpetual, commercial AI training | Explicit grant in the speaker release. Not implied.
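The audio-format and provenance rows of the spec sheet can be checked automatically when a delivery lands. The sketch below is a minimal intake check using only the Python standard library. It assumes the manifest gives each WAV a hex SHA-256 digest; `sha256_of` and `check_delivery_file` are hypothetical helpers, not part of any vendor tooling. Python's `wave` module only reads PCM headers (`WAVE_FORMAT_EXTENSIBLE` support arrived in Python 3.12), and it does not measure loudness, so the −23 LUFS target would need a separate EBU R128 meter.

```python
import hashlib
import wave
from pathlib import Path
from typing import List


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream the file through SHA-256 so large WAVs need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_delivery_file(path: Path, expected_sha256: str,
                        rate: int = 48_000, sample_bytes: int = 3) -> List[str]:
    """Return spec violations for one delivered WAV (empty list means OK).

    Defaults mirror the spec sheet: 48 kHz sample rate, 24-bit (3-byte) samples.
    Channel count is not checked because deliveries mix mono and stereo files.
    """
    problems: List[str] = []
    # Compare against the digest listed in the (assumed) manifest entry.
    if sha256_of(path) != expected_sha256.lower():
        problems.append("checksum mismatch")
    # wave.open is given a str because older Pythons do not accept Path here.
    with wave.open(str(path), "rb") as wav:
        if wav.getframerate() != rate:
            problems.append(f"sample rate {wav.getframerate()} != {rate}")
        if wav.getsampwidth() != sample_bytes:
            problems.append(
                f"sample width {wav.getsampwidth() * 8}-bit != {sample_bytes * 8}-bit"
            )
    return problems
```

Running this over every file in a delivery and collecting the non-empty results gives a short list of anything to raise with your named contact.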
§ 04 — How a dataset ships

From sample pack to signed contract.

01

Sample pack

Tell us the catalog and use case. We send a 60-second WAV, the matching datasheet, and a short note on which package fits — within one business day.

02

NDA + 5-minute sample

Sign a one-page NDA and we deliver a 5-minute representative sample with full transcript, metadata, and a draft provenance manifest for legal review.

03

MSA + first delivery

Master agreement, DPA, and security questionnaire are pre-built. First delivery ships within one to two weeks of signature, in the format you specify.

04

Per-file manifest

Every WAV arrives with a SHA-256 checksum, signed release ID, and speaker metadata. Audit any file in your training set back to a named speaker at any time.

05

Named contact for life

One person owns your account end-to-end — sourcing, contracts, deliveries, audits. No ticket queues, no rotating success managers, no escalation paths.

06

Revocation SLA

Speakers retain ownership and can revoke their release. We honor revocations within a defined SLA and re-issue manifests so your training set stays clean.
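When a re-issued manifest arrives after a revocation, the local copy of the dataset has to be reconciled against it. A minimal sketch, assuming each manifest entry is a dict with hypothetical `file`, `sha256`, and optional `revoked` fields; the real manifest schema is not specified here, and `index_manifest` and `reconcile` are illustrative names.

```python
from typing import Dict, List, Tuple


def index_manifest(entries: List[dict]) -> Dict[str, dict]:
    """Index manifest entries by file name (field names are hypothetical)."""
    return {entry["file"]: entry for entry in entries}


def reconcile(old: List[dict], new: List[dict]) -> Tuple[List[str], List[str]]:
    """Compare the local manifest with a re-issued one.

    Returns (to_remove, to_fetch):
      to_remove: files flagged as revoked, or missing, in the new manifest
      to_fetch:  non-revoked files that are new or whose checksum changed,
                 e.g. replacement audio
    """
    old_idx, new_idx = index_manifest(old), index_manifest(new)
    to_remove = sorted(
        name for name in old_idx
        if name not in new_idx or new_idx[name].get("revoked", False)
    )
    to_fetch = sorted(
        name for name, entry in new_idx.items()
        if not entry.get("revoked", False)
        and (name not in old_idx or old_idx[name]["sha256"] != entry["sha256"])
    )
    return to_remove, to_fetch
```

`to_remove` lists files to drop from training data; `to_fetch` covers replacement or changed audio, which would go through the same SHA-256 checksum verification as any other delivery.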

§ 05 — Don't see it?

We source under-represented languages too.

Through our podcaster network we've sourced audio for several lower-resource languages. Talk to us about Swahili, Tagalog, Vietnamese, Bengali, Yoruba, or any language you don't see listed; most languages with a working podcast scene are reachable.

How to read "Hours available"

Hours currently in catalog and licensable today. "In collection" ships in 4–6 weeks. "On request" ships as a custom collection in 3–6 weeks.

How to read "Speakers"

The number of distinct named speakers contributing to the dataset. Every speaker has a signed model-training release on file in the consent vault.

How to read "Sample"

A 5-minute representative sample with transcript and metadata, delivered under NDA so you can evaluate fit before requesting a full quote.

§ 06 — FAQ

Catalog questions.

How often is the catalog updated?

Weekly. New hours land as our partner studios deliver back catalog and as commissioned recordings clear QC. Major catalog releases (new languages, new domains) ship monthly.

Can I license a single language without taking the whole multilingual bundle?

Yes. Every locale is licensable individually. Most multilingual customers start with one or two locales and expand once their training pipeline is proven.

Do you offer evaluation-only licenses?

Yes — short-term evaluation licenses are available for benchmark and ablation work with a smaller commitment than a full training license.

What happens when a speaker revokes?

We notify you within the contractual SLA, re-issue the provenance manifest with the affected files flagged, and provide replacement audio of equivalent length and demographics where possible.

Can I get the source releases for an audit?

Yes. Signed releases are stored in our consent vault and available for inspection under NDA, typically as part of a Fortune 500 vendor review or a government procurement audit.

Do you ship to on-prem / air-gapped environments?

Yes. Datasets can be delivered to S3, GCS, or Azure Blob Storage, or shipped on encrypted physical media for air-gapped training environments.

Need a custom collection?

Any language with podcast infrastructure. Tell us the language, accent mix, demographics, and hours, and we'll come back with a scoped proposal within 2 business days.