VOICE AGENTS · CONVERSATIONAL AI

Train voice agents that sound like people, not IVR menus.

Most “conversational” datasets are scripted reads with two voice actors taking polite turns. Real conversation has interruptions, overlapping speech, backchannels, false starts, repair sequences, and pacing that varies with topic. We license multi-speaker dialogue from real podcasters who do this every day for a living.

Multi-speaker dialogue · Verified turn boundaries · Backchannels & repairs preserved · Multi-track where available

Specs

What we deliver.

Multi-speaker conversations

Verified turn boundaries and per-speaker labels. Two-, three-, and four-way dialogue available.

Natural turn-taking

Overlap regions preserved, not edited out. The social signal that makes voice agents feel human.

Backchannel events

“Uh-huh,” “right,” “mm-hmm” — annotated where available.

Repair sequences

False starts, restarts, and self-corrections preserved, not cleaned up to look pretty.

Topic & domain metadata

Filter and sample conversations by topic or scenario, for example when training retrieval-augmented agents.

Multi-track audio

One channel per speaker where the original recording supports it, which gives diarization training clean per-speaker reference audio.
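
To make "verified turn boundaries" and "overlap regions preserved" concrete, here is a minimal Python sketch of how overlap could be recovered from per-speaker turn annotations. The `Turn` record and its field names are assumptions made for illustration; they are not a published delivery schema.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Turn:
    """Hypothetical turn record: one speaker label plus start/end times in seconds."""
    speaker: str
    start: float
    end: float


def overlap_regions(turns):
    """Return (start, end, speaker_a, speaker_b) for every pair of turns
    from different speakers that overlap in time."""
    regions = []
    ordered = sorted(turns, key=lambda t: t.start)
    # Pairwise comparison is O(n^2); fine for a sketch, slow for long files.
    for a, b in combinations(ordered, 2):
        if a.speaker == b.speaker:
            continue
        start, end = max(a.start, b.start), min(a.end, b.end)
        if start < end:
            regions.append((start, end, a.speaker, b.speaker))
    return regions


# Illustrative data only, not taken from a real file.
turns = [
    Turn("A", 0.0, 4.2),
    Turn("B", 3.9, 4.4),   # short "mm-hmm" that overlaps the end of A's turn
    Turn("B", 5.0, 9.1),
    Turn("A", 8.8, 12.0),  # A starts before B has finished
]

for region in overlap_regions(turns):
    print(region)
```

In a corpus where overlap has been edited out, a check like this returns nothing, which makes it a quick sanity test when evaluating any conversational dataset.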

Use cases

Common voice-agent use cases we support.

Foundation training

For: Full-duplex speech agents
Why: The kind that can interrupt and be interrupted

Turn-taking modeling

For: Backchannel & gap detection
Why: Social signals that make agents feel human

Domain adaptation

Interview: Support agents
Panel: Multi-party agents
Narrative: Monologue agents

Latency benchmarking

Source: Natural conversations
Reference: Real human-baseline turn latencies

Evaluation corpora

For: End-to-end voice agent quality scoring
Held-out: Designed to mirror production traffic

Diarization

Source: Multi-track audio
Labels: Verified speaker boundaries per file
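
For the latency benchmarking use case, a human-baseline turn latency can be estimated as the gap between one speaker finishing and the next speaker starting. The sketch below assumes a hypothetical `Turn` record with a speaker label and start/end times in seconds. It is a simplified illustration, not a description of how any delivered files are structured.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Turn:
    """Hypothetical turn record: one speaker label plus start/end times in seconds."""
    speaker: str
    start: float
    end: float


def turn_latencies(turns):
    """Gap in seconds at each speaker change, in start-time order.

    Positive values are silences between speakers; negative values mean
    the next speaker started before the previous one finished.
    """
    ordered = sorted(turns, key=lambda t: t.start)
    return [
        nxt.start - prev.end
        for prev, nxt in zip(ordered, ordered[1:])
        if nxt.speaker != prev.speaker
    ]


# Illustrative data only, not taken from a real file.
turns = [
    Turn("A", 0.0, 2.0),
    Turn("B", 2.25, 5.0),  # B answers after a 0.25 s gap
    Turn("A", 4.75, 7.0),  # A cuts in 0.25 s before B finishes
    Turn("A", 7.5, 9.0),   # same speaker continues, so no latency is recorded
]

print(turn_latencies(turns))          # [0.25, -0.25]
print(median(turn_latencies(turns)))  # 0.0
```

Short backchannels that sit entirely inside another speaker's turn would skew this simple adjacent-pair calculation, so a real benchmark would likely filter them out first using the backchannel annotations.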

Compare

Why this is hard to source elsewhere.

Most “conversational” datasets in the open and commercial markets fall into one of these traps. We don't.

Source | What's wrong with it
Two-speaker scripted reads | Clean but unnatural: no interruptions, no overlap, no real pacing
Telephone customer service recordings | Natural but legally unusable: no consent for AI training
Single-speaker podcasts | Wrong shape: no dialogue dynamics
YouTube-scraped interviews | Legally unusable: no consent, no provenance, no contactable speakers
aipodcast multi-speaker conversations | Real dialogue, signed releases, contactable speakers, per-file provenance: what you actually need

Want a multi-speaker sample with diarization?

Get a representative conversational sample with full diarization within 48 hours of signing an NDA. Real overlap, real backchannels, real repairs.