Real conversations build real AI
We provide truly usable voice data from people's natural conversations.
Why overly clean data doesn't cut it
AI trained on data created in ideal environments, such as synthetic speech or studio recordings, cannot handle real-world noise and unpredictable situations.
The key lies in the "authenticity" of training data.
Kataro!! Natural Data
Real everyday conversations
- Real environments with ambient sounds and noise
- Unscripted natural conversations with emotions
- Natural hesitations and overlapping speech
Our data is collected naturally as users use our product as an everyday conversation tool. It captures everyday life as it is: ambient sounds you won't find in a studio, and hesitations and emotional shifts you won't find in a script. AI trained on this data can function even in complex real-world environments.
Data Types
We can collect and provide data in three formats tailored to your needs.
Free Talk (2 people)
Data from two users freely conversing. Back-channel responses, laughter, overlapping speech, and self-corrections are recorded as-is. Ideal for conversational AI and emotion analysis.
Topic Talk (1-2 people)
Data in which users speak freely on a given theme, such as "something fun that happened recently." Useful for collecting vocabulary and expressions on specific topics.
Scenario / Task (1 person)
Conversation data for specific scenarios like "ask AI about the weather" or "give instructions to a robot." Recreates actual usage situations.
Use Cases
Our data can be used for developing and training various AI products.
Request Sample Data
Sample datasets are available on request through our contact form.
Start by checking the data format and file structure.