Conversational AI data that sounds like people, not scripts.
Conversational AI needs more than tidy turns. It needs interruptions, overlap, backchannels, topic drift, register shifts, and disfluencies: the things most datasets silently scrub out. Our catalogue is built from working podcasts where all of that happens naturally, with consent.
Built for the work.
Multi-speaker dialogue
Real interviews, panels, and round-tables with two to six speakers per recording.
Natural turn-taking
Backchannels, hedges, interruptions, and overlap captured in context with timestamped boundaries.
Disfluencies preserved
Ums, uhs, false starts, repetitions, self-corrections, and laughter — kept in the transcript and tagged. Audio LLMs need them.
Speaker metadata
Per-speaker language, dialect, age range, and role (host, guest, expert, caller).
Long sessions
30–120 minute continuous conversations — not isolated turns. Context windows your model can actually use.
Aligned transcripts
Word-level alignment, RTTM diarization, overlap regions, and backchannel tags included by default.
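The RTTM and overlap tagging above can be worked with directly. Here is a minimal sketch of parsing standard RTTM SPEAKER lines and recovering overlap regions; the file IDs and speaker names are illustrative assumptions, not our delivery schema.

```python
def parse_rttm_line(line):
    """Parse one SPEAKER line of a standard RTTM file into a segment dict."""
    parts = line.split()
    # RTTM field order: TYPE FILE CHAN TBEG TDUR ORTHO STYPE NAME CONF SLAT
    return {
        "file": parts[1],
        "start": float(parts[3]),
        "end": float(parts[3]) + float(parts[4]),
        "speaker": parts[7],
    }

def overlap_regions(segments):
    """Return (start, end, speakers) spans where two segments overlap."""
    regions = []
    segs = sorted(segments, key=lambda s: s["start"])
    for i, a in enumerate(segs):
        for b in segs[i + 1:]:
            if b["start"] >= a["end"]:
                break  # later segments start even later; no overlap with a
            regions.append((b["start"], min(a["end"], b["end"]),
                            (a["speaker"], b["speaker"])))
    return regions

# Hypothetical two-line RTTM: guest starts talking before the host finishes.
rttm = [
    "SPEAKER ep001 1 0.00 5.00 <NA> <NA> host <NA> <NA>",
    "SPEAKER ep001 1 4.20 3.00 <NA> <NA> guest <NA> <NA>",
]
segments = [parse_rttm_line(l) for l in rttm]
print(overlap_regions(segments))  # [(4.2, 5.0, ('host', 'guest'))]
```

The same pattern extends to counting overlap rate per file, which is how a target like the 5–18% figure quoted below can be checked on your side.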
What's in the manifest.
The things scripted data never captures.
Turn-taking
Real conversational floor management — gaps, latching, smooth handoffs, and the rare clean turn.
Overlap
5–18% of speech overlaps. Your full-duplex agent has to handle it. Ours is tagged by region.
Disfluencies
“So, um, the — the thing is” is how humans talk. We keep every um and uh, tagged.
Backchannels
Mhm. Yeah. Right. Wow. The acknowledgments that keep dialogue alive. Tagged inline.
Topic shifts
Annotated segment boundaries so dialog-state models learn how humans actually pivot subjects.
Register shifts
Casual to technical and back inside the same conversation. The thing audio LLMs are weakest on.
From email to first manifest.
Sample request
Tell us the model and target overlap rate. We return a 30-min sample with full disfluency and overlap tagging within 48 hours.
Mutual NDA
A standard one-page mutual NDA.
MSA + data licence
A perpetual commercial training licence, a named consent contact for life, and a written speaker release on every voice.
First delivery
Pilot shard with 30–120 min recordings, RTTM, word-level JSON, disfluency and overlap tags, register labels.
Manifest & provenance
Per-file lineage: speakers, recording date, mic chains, room, consent version, jurisdiction, SHA-256.
Ongoing delivery
Monthly increments, locale expansion, custom overlap targets, written revocation SLA.
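The per-file lineage described above is easy to verify locally. This is a minimal sketch of checking a delivered audio file against its manifest's SHA-256; the field names in the entry are illustrative assumptions, not the actual manifest schema.

```python
import hashlib
import json

def sha256_of(path):
    """Stream a file in 1 MiB chunks and return its SHA-256 hex digest."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_entry(entry, audio_path):
    """True if the on-disk file matches the hash recorded in the manifest."""
    return sha256_of(audio_path) == entry["sha256"]

# Hypothetical manifest entry, for illustration only.
entry = {
    "file": "ep001.wav",
    "speakers": ["host", "guest"],
    "recording_date": "2024-05-01",
    "consent_version": "v3",
    "jurisdiction": "GB",
    "sha256": "filled-in-at-delivery",
}
print(json.dumps(entry, indent=2))
```

Running `verify_entry` over every row of a shard is a one-loop integrity check before the data touches a training pipeline.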
Common questions.
What is a conversational AI dataset?
A dataset of natural multi-speaker dialogue used to train models that understand or generate conversation — dialog systems, audio-in/audio-out LLMs, meeting summarisers, and full-duplex voice agents.
Are interruptions, overlap, and backchannels labelled?
Yes. Diarization captures overlap regions; backchannels (mhm, yeah, right) are tagged; interruptions are marked at the turn boundary. Overlap is preserved, not edited out.
Are disfluencies preserved?
Yes. Ums, uhs, false starts, repetitions, self-corrections, and laughter are preserved in the transcript and tagged. Most datasets silently clean these out — we do not, because audio LLMs need them.
How long are recordings?
Typical recordings run 30–120 minutes of continuous dialogue. Some go to 180+. Long-context conversation is the gap most datasets leave open.
How many speakers per recording?
Two to six speakers per recording. Two-person interviews dominate; panels and round-tables are available for crosstalk-heavy training.
Is this suitable for audio-in/audio-out LLMs?
Yes. This is the cleanest source of long-form, full-duplex, naturalistic dialogue with consent — exactly what GPT-4o-class audio LLMs and Moshi-style full-duplex stacks need.
Can I filter by topic, register, or speaker role?
Yes. Recordings are tagged by domain (tech, health, finance, sports, politics, culture, science), register (casual, formal, technical), and per-speaker role (host, guest, expert, caller).
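The domain, register, and role tags above lend themselves to simple client-side filtering. A minimal sketch, assuming a flat per-recording tag structure (the structure itself is illustrative, not the shipped schema):

```python
# Hypothetical recording tags mirroring the categories listed above.
recordings = [
    {"id": "ep001", "domain": "tech", "register": "casual",
     "roles": ["host", "guest"]},
    {"id": "ep002", "domain": "health", "register": "technical",
     "roles": ["host", "expert", "caller"]},
]

def filter_recordings(recs, domain=None, register=None, role=None):
    """Keep recordings that match every filter that is set."""
    out = []
    for r in recs:
        if domain and r["domain"] != domain:
            continue
        if register and r["register"] != register:
            continue
        if role and role not in r["roles"]:
            continue
        out.append(r)
    return out

print([r["id"] for r in filter_recordings(recordings, role="expert")])
# ['ep002']
```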
Is the dataset suitable for evaluation?
Yes. Held-out benchmark slices with human-reviewed transcripts are available, balanced for speaker diversity and overlap rate.
Can I get a sample?
Yes. Email partnerships@aipodcast.io and we will send a 30-minute representative sample with audio, alignment, diarization, and disfluency tags within 48 hours of NDA.
Want a representative sample?
30 minutes of audio + transcripts + metadata, delivered within 48 hours of NDA.