CATALOG

What's available now.

Real hours, in real languages, ready to license today. Updated weekly. Each row links to a dataset detail page with a downloadable sample, full spec, and a request-quote button.

Request a sample → · Talk to sales

350+ hours of conversational English in catalog · 12 languages across our network · Studio-grade by default

350+

Hours of conversational English in catalog

12

Languages across our podcaster network

48 hr

Sample delivery from NDA

2–6 wk

Custom collection turnaround

Catalog

Browse what's licensable today.

“In collection” means recording is in flight and will be available within 4–6 weeks. “On request” means we have the supplier network in place and begin collection on a per-order basis.

Dataset	Language	Accent	Domain	Hours	Status
Conversational EN-US (long-form interviews)	English	US General American	Interview	350+	Available
Conversational EN-US (panel discussions)	English	US Mixed	Panel / multi-speaker	—	In collection
Conversational EN-UK	English	UK RP, regional	Interview	—	In collection
Conversational EN-AU	English	Australian	Interview	—	In collection
Conversational ES-LATAM	Spanish	LATAM Mixed	Interview	—	On request
Conversational ES-EU	Spanish	EU Castilian	Interview	—	On request
Conversational PT-BR	Portuguese	Brazilian	Interview	—	On request
Conversational FR-FR	French	Metropolitan	Interview	—	On request
Conversational DE-DE	German	High German	Interview	—	On request
Conversational JA-JP	Japanese	Tokyo Standard	Interview	—	On request
Conversational KO-KR	Korean	Seoul Standard	Interview	—	On request
Conversational HI-IN	Hindi	Standard, regional	Interview	—	On request
Conversational AR	Arabic	MSA, Egyptian, Gulf, Levantine	Interview	—	On request
Conversational ZH-CN	Mandarin	Standard, regional	Interview	—	On request

§ 02 — Featured datasets

A closer look at three catalogs we ship today.

These are the datasets most teams start with. Each one ships under a single master agreement with the per-file provenance manifest, signed releases, and a named contact for the lifetime of the deal.

Dataset 01

Conversational EN-US

Long-form interview and panel dialogue from professional U.S. podcasts. The flagship catalog — most ASR and conversational LLM customers start here.

sourced to spec · available today
Named, contactable speakers
2- and 3-speaker, studio-recorded
Word-level CTM + diarized JSON
48 kHz / 24-bit WAV, −23 LUFS

Licensed per project · request a quote

Dataset 02

Multilingual Expansion

Multiple languages and regional locales drawn from native-speaker podcast networks. Parallel topic coverage where the supply allows it.

20–80 hrs per locale typical
Native-speaker verified
Per-language transcripts + metadata
Same studio quality bar as EN catalog
4–6 week ramp on new locales

Licensed per project · request a quote

Dataset 03

Custom Commission

We brief, cast, and record to your spec — domain, accents, scenarios, hours. You own the result, optionally on an exclusive license.

Scoped to brief
4–8 week turnaround
Exclusive licensing available
Direct studio and casting access
Same provenance pipeline

Scoped to brief · request a quote

§ 03 — Spec sheet

What every file ships with.

Specification	Standard	Notes
Audio format	WAV, 48 kHz / 24-bit	Lossless. FLAC and 16-bit re-encodes on request.
Loudness target	−23 LUFS integrated	EBU R128 broadcast standard.
Channel layout	Mono per speaker + stereo mix	Per-speaker isolation where the studio supports it.
Transcript format	JSON + CTM + plain text	Word-level alignment, speaker IDs, confidence scores.
Diarization	Speaker-labeled JSON	Manual QC pass on every file.
Metadata	Per-file JSON sidecar	Speaker demographics, recording date, studio, mic chain.
Provenance manifest	SHA-256 + signed release ID	Audit any WAV back to a named, contactable speaker.
License grant	Perpetual, commercial AI training	Explicit grant in the speaker release. Not implied.
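
For teams wiring the transcripts into a pipeline, here is a minimal Python sketch of reading the word-level CTM alignment listed in the spec sheet. It assumes the conventional CTM column order (recording ID, channel, start time, duration, word, optional confidence); the exact columns in a delivered file should be confirmed against the datasheet. `CtmWord`, `parse_ctm`, and the sample rows are hypothetical and used only for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class CtmWord:
    recording: str
    channel: str
    start: float      # seconds from the start of the file
    duration: float   # seconds
    word: str
    confidence: Optional[float] = None


def parse_ctm(lines: Iterable[str]) -> List[CtmWord]:
    """Parse CTM text, skipping blank lines and ';;' comment lines."""
    words = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(";;"):
            continue
        fields = line.split()
        if len(fields) < 5:
            raise ValueError(f"malformed CTM line: {raw!r}")
        # The confidence column is optional in this assumed layout.
        conf = float(fields[5]) if len(fields) > 5 else None
        words.append(CtmWord(fields[0], fields[1], float(fields[2]),
                             float(fields[3]), fields[4], conf))
    return words


sample = [
    ";; hypothetical example rows, not real catalog data",
    "ep001 1 0.50 0.32 hello 0.98",
    "ep001 1 0.82 0.41 world",
]
for w in parse_ctm(sample):
    print(w.word, w.start, w.duration, w.confidence)
```

Keeping the confidence optional means the same parser can read files with or without that column.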

§ 04 — How a dataset ships

From sample to signed contract.

Sample pack

Tell us the catalog and use case. We send a 60-second WAV, the matching datasheet, and a short note on which package fits — within one business day.

NDA + 5-minute sample

Sign a one-page NDA and we deliver a 5-minute representative sample with full transcript, metadata, and a draft provenance manifest for legal review.

MSA + first delivery

Master agreement, DPA, and security questionnaire are pre-built. First delivery ships within one to two weeks of signature, in the format you specify.

Per-file manifest

Every WAV arrives with a SHA-256 checksum, signed release ID, and speaker metadata. Audit any file in your training set back to a named speaker at any time.
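
On the receiving side, the checksum step above could be sketched in Python as follows. The manifest layout (a JSON file with a `files` list whose entries carry `file`, `sha256`, and `release_id`) is an assumption made for this example, not the delivered schema, and `verify_manifest` is a hypothetical helper name.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large WAVs are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path, audio_dir: Path) -> list:
    """Return the names of files that are missing or whose hash differs."""
    # Assumed layout: {"files": [{"file": ..., "sha256": ..., "release_id": ...}]}
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    failures = []
    for entry in manifest["files"]:
        wav = audio_dir / entry["file"]
        if not wav.is_file() or sha256_of(wav) != entry["sha256"].lower():
            failures.append(entry["file"])
    return failures
```

An empty result from `verify_manifest` means every listed file is present and matches its recorded checksum.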

Named contact for life

One person owns your account end-to-end — sourcing, contracts, deliveries, audits. No ticket queues, no rotating success managers, no escalation paths.

Revocation handling

Speakers retain ownership and can revoke. We honor revocations on a defined SLA and re-issue manifests so your training set stays clean.

§ 05 — Don't see it?

We source under-represented languages too.

Through our podcaster network we've sourced audio for several smaller-population languages. Talk to us about Swahili, Tagalog, Vietnamese, Bengali, Yoruba, or any language you don't see listed — most languages with a working podcast scene are reachable.

How to read "Hours"

Hours currently in catalog and licensable today. "In collection" ships in 4–6 weeks. "On request" ships as a custom collection in 2–6 weeks.

How to read "Speakers"

The number of distinct named speakers contributing to the dataset. Every speaker has a signed model-training release on file in the consent vault.

How to read "Sample"

A 5-minute representative sample with transcript and metadata, delivered under NDA so you can evaluate fit before requesting a full quote.
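
One quick fit check on a delivered sample is to confirm that the WAV header matches the published spec (48 kHz, 24-bit, mono or stereo). The sketch below uses only Python's standard `wave` module; `check_wav` is a hypothetical helper, and it inspects the header only, so it does not measure loudness.

```python
import wave

EXPECTED_RATE_HZ = 48_000   # 48 kHz, from the spec sheet
EXPECTED_SAMPLE_BYTES = 3   # 24-bit PCM


def check_wav(path: str) -> list:
    """Return human-readable mismatches; an empty list means the header fits."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz")
        if wav.getsampwidth() != EXPECTED_SAMPLE_BYTES:
            problems.append(f"bit depth is {8 * wav.getsampwidth()}-bit")
        if wav.getnchannels() not in (1, 2):  # mono per speaker or stereo mix
            problems.append(f"{wav.getnchannels()} channels")
    return problems
```

The `wave` module reads uncompressed PCM only, and some WAV header variants may raise `wave.Error` on older Python versions, so treat an exception as "inspect with another tool" rather than as a failed spec check.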

§ 06 — FAQ

Catalog questions.

How often is the catalog updated?

Weekly. New hours land as our partner studios deliver back catalog and as commissioned recordings clear QC. Major catalog releases (new languages, new domains) ship monthly.

Can I license a single language without taking the whole multilingual bundle?

Yes. Every locale is licensable individually. Most multilingual customers start with one or two locales and expand once their training pipeline is proven.

Do you offer evaluation-only licenses?

Yes — short-term evaluation licenses are available for benchmark and ablation work with a smaller commitment than a full training license.

What happens when a speaker revokes?

We notify you within the contractual SLA, re-issue the provenance manifest with the affected files flagged, and provide replacement audio of equivalent length and demographics where possible.
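
On the customer side, applying a re-issued manifest might look like the Python sketch below. The `revoked` flag and the `files` list are assumed field names for illustration only; the actual manifest schema is whatever your delivery specifies.

```python
def split_revoked(manifest: dict) -> tuple:
    """Split manifest entries into (kept, revoked) lists of file names."""
    kept, revoked = [], []
    for entry in manifest["files"]:
        # "revoked" is an assumed flag name; a missing flag means still licensed.
        (revoked if entry.get("revoked", False) else kept).append(entry["file"])
    return kept, revoked


# Hypothetical re-issued manifest with one flagged file.
reissued = {
    "files": [
        {"file": "ep001_spk1.wav", "release_id": "R-0001"},
        {"file": "ep002_spk1.wav", "release_id": "R-0002", "revoked": True},
    ]
}
kept, revoked = split_revoked(reissued)
training_files = [name for name in kept]  # only files still under release
```

Rebuilding the training file list from the kept entries, rather than deleting flagged files by hand, keeps the training set aligned with the latest manifest.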

Can I get the source releases for an audit?

Yes. Signed releases are stored in our consent vault and available for inspection under NDA — typically as part of a Fortune-500 vendor review or a government procurement audit.

Do you ship to on-prem / air-gapped environments?

Yes. Datasets can be delivered to S3, GCS, Azure Blob, or shipped on encrypted physical media for air-gapped training environments.

Need a custom collection?

Any language with podcast infrastructure. Tell us the language, accent mix, demographics, and hours, and we'll come back with a scoped proposal within 2 business days.

Scope a custom collection → or email jaeden@fiund.com