Datasets

Speech datasets you can license today

Conversational, read, and domain-specific recordings across multiple languages. Every dataset is human-recorded with verified transcripts. Request free samples to evaluate before you license.

New datasets are coming soon. Talk to us if you have something specific in mind.

FAQ

Licensing, custom datasets, and how to buy

What does a Spirelight dataset license cover?

The price shown on each dataset page is for the Spirelight Standard License: a non-exclusive commercial license to use the dataset for training, evaluating, and shipping speech and language models. Because the license is non-exclusive, the same dataset may also be licensed to other customers. Your trained models and their outputs remain yours. Exclusive licenses, restricted redistribution, and any other custom terms are priced per project. Book a call to discuss.

Can I get an exclusive license?

Yes. Exclusive licenses, where the dataset is licensed only to you, are negotiated per project. Book a call and tell us which dataset and what window of exclusivity you need, and we will come back with terms and pricing.

Do you build custom datasets?

Yes, often. Send us the language, dialect, recording conditions, speaker mix, hours, and intended use. We coordinate recording with our contributor network, transcribe and verify the audio, and deliver in the format you need. Book a call for a timeline and a quote.

How do the samples relate to the full dataset?

The sample bundle is a representative slice of the full dataset: same speakers where applicable, same recording conditions, same transcript style. You can validate audio quality, transcription accuracy, and speaker variety before committing.
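One way to check transcription accuracy on the sample bundle is to transcribe a handful of clips yourself and compute the word error rate (WER) of the supplied transcripts against your own. The Python sketch below is illustrative only and is not a Spirelight tool. The function name and the simple lowercase, whitespace-based tokenization are assumptions; you would likely want to normalize punctuation to match the transcript style first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  your own careful transcription of a clip
    hypothesis: the transcript supplied with the sample
    Tokenization (lowercase + whitespace split) is a simplifying assumption.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # (one row back) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

A WER of 0.0 means the two transcripts agree word for word. Averaging the score over several clips gives a rough picture of accuracy, not a guarantee for the full dataset.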

What languages do you cover?

Active datasets are listed above. For languages, dialects, or domains that are not in the catalogue, we build to spec. Talk to us about your requirements and we will scope a custom recording.

What audio and transcript formats do you ship?

Each dataset lists its default formats. On request we can deliver audio at alternate sample rates (16 kHz, 44.1 kHz, 48 kHz), encoded as MP3 or FLAC, in mono or stereo, and transcripts as JSON, SRT, VTT, or CSV.
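If a delivery includes uncompressed WAV files (an assumption; check each dataset's listed default formats), you can confirm the sample rate and channel count with Python's standard `wave` module. This is a minimal sketch, and the helper names `inspect_wav` and `matches_request` are hypothetical.

```python
import wave
from dataclasses import dataclass


@dataclass
class AudioInfo:
    sample_rate_hz: int
    channels: int
    duration_s: float


def inspect_wav(path: str) -> AudioInfo:
    """Read header fields from a WAV file (WAV delivery is an assumption)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        return AudioInfo(
            sample_rate_hz=rate,
            channels=wav.getnchannels(),
            duration_s=wav.getnframes() / rate,
        )


def matches_request(info: AudioInfo, sample_rate_hz: int, channels: int) -> bool:
    """True if the file has the sample rate and channel count you asked for."""
    return info.sample_rate_hz == sample_rate_hz and info.channels == channels
```

For example, `matches_request(inspect_wav("clip.wav"), 16000, 1)` checks that a clip is 16 kHz mono. MP3 and FLAC files need a third-party decoder and are not covered by this sketch.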

How do I license a dataset?

Click "Request samples" on the dataset page to receive download links and the price by email. To finalize, reply with your team and intended use, and we will send the Spirelight Standard License and the invoice. For custom terms, exclusivity, or volume pricing, book a call instead.

Have a question that is not on this list? Book a call and tell us what you are building.