Paid Dataset
Every dataset is fully licensed, vetted, and ready for immediate commercial use.
Expert Q&A
- 3.8M+ public Q&A pairs with expert answers, categorized by domain (e.g., Legal, Medical, Finance)

News & Media
- Partnerships with major South Korean press outlets and specialized media (Legal, Economy, etc.)
- Text, video + text, multimodal, and more

Problem-Solution
- Problem-solution datasets covering the core curricula of Korean elementary, middle, and high schools, including KMO*-level content
*KMO: The Korean Mathematical Olympiad

Broadcast Video / Audio
- Partnerships with major domestic broadcasting companies
- A range of broadcast video and radio data available

Harmless AI Eval
- LLM safety evaluation questions
- Covers bias, hate speech, illegality, sensitive topics, and timeliness (whether responses reflect current information)

* New partnerships and data sourcing can be arranged based on specific client requirements.
* Available dataset types include multilingual (dialogue/translation), image (photo/illustration/synthetic), and coding test datasets.
Free Dataset
DATUMO has been at the forefront of empowering the Korean AI ecosystem through our 'AI Dataset Support Project.'
We are proud to release our high-quality data assets to the public via 'OPEN DATASETS.' Available at no cost, these resources are designed to accelerate the work of researchers and companies alike.
Cochl.
Background noise
Datumo enhanced Cochl’s sound recognition AI by collecting diverse real-world audio data.
POSTECH
InstaOrder
Approximately 1.45 million combinations of relative positions between pairs of objects
Marqvision
Merchandise images
Merchandise images collected across ten categories and annotated with bounding boxes
GIST
Gesture
Korean gestures used as a supplementary means of communication for people with disabilities
Computer Vision Lab
Food image
A food image dataset with labeled ingredient information
Magentarobotics
Posterior body image
Images of the back of the human body, essential for massage robot development
AI-Ready Datasets
The perfect data for your AI
Leveraging expertise in data collection and processing, Datumo guarantees premium quality and legal compliance. Develop your AI with confidence, knowing every dataset is fully licensed.
LLM Evaluation
From Question Generation to Analysis
Enhance the performance of your LLM-based services with Datumo Eval. Create questions tailored to your industry and intent, and systematically analyze model performance using custom metrics.