...

Premium Datasets

Train your AI with premium licensed data, delivered instantly

Paid Dataset

Every dataset is fully licensed and vetted, ready for immediate commercial use

Expert Q&A

  • 3.8M+ public Q&A pairs with expert answers, categorized by domain (e.g., Legal, Medical, Finance)

Book

  • Comprehensive collection of books distributed within South Korea

News & Media

  • Partnerships with major South Korean press and specialized media (legal, economy, etc.)

  • Text, video+text, multimodal, and more

Problem-Solution

  • Problem-solution datasets covering the Korean elementary, middle, and high school core curricula, including KMO*-level content

*KMO: The Korean Mathematical Olympiad

Broadcast Video / Audio

  • Partnerships with major domestic broadcasting companies
  • Various broadcast video and radio data available

Harmless AI Eval

  • LLM safety evaluation questions

  • Covers bias, hate speech, illegality, sensitive topics, and timeliness

* New partnerships and data sourcing can be arranged based on specific client requirements

* Available dataset types include multilingual (dialogue/translation), image (photo/illustration/synthetic), and coding test datasets

Free Dataset

Datumo has been at the forefront of empowering the Korean AI ecosystem through our 'AI Dataset Support Project.'

We are proud to release our high-quality data assets to the public via 'OPEN DATASETS.' Available at no cost, these resources are designed to accelerate the work of researchers and companies alike.

Upstage

KLUE

Korean Language Understanding Evaluation Benchmark

Cochl.

Background noise

Datumo enhanced Cochl’s sound recognition AI by collecting diverse real-world audio data.

POSTECH

InstaOrder

Approximately 1.45 million pairwise combinations of object positions

Wesee

Crosswalks and Currency

AI algorithm development based on video recognition

Fitogether

Soccer field images

Beyond the Pitch: A New Era of Sports Ecosystems

Marqvision

Merchandise images

Collection and bounding-box annotation of item images across ten categories

GIST

Gesture

Korean gestures, a supplementary means of communication for people with disabilities

TILDE

Digital numeric

Biometric information display labeling and transcription

Computer Vision Lab

Food image

A food image dataset with labeled ingredient information

RebuilderAI

Real-world Dataset

A real-world dataset for object material recognition

Magentarobotics

Posterior body image

Essential posterior image datasets for massage robot development

AI-Ready Datasets

The perfect data for your AI

Leveraging expertise in data collection and processing, Datumo guarantees premium quality and legal compliance. Develop your AI with confidence, knowing every dataset is fully licensed.

LLM Evaluation

From Question Generation to Analysis

Enhance the performance of your LLM-based services with Datumo Eval. Create questions tailored to your industry and intent, and systematically analyze model performance using custom metrics.

Generate Questions
Evaluate Answers
Adjust Metrics