AI for The Underdogs #3

Technology to tear down communication barriers

*This dataset was collected and annotated as part of Datumo’s 2021 AI Training Data Sponsorship Program and is downloadable from Datumo’s Open Datasets website.

Datasets for those with visual and mobility impairments

How is the development of AI helping people with disabilities?

The third episode of “AI for The Underdogs” will cover GIST’s gesture dataset.

Who would benefit from this dataset?

Creating this dataset has been part of GIST’s research project to enable smoother communication between people with and without disabilities through gestures.

People with certain disabilities often face challenges in communicating with others. For those with severe disabilities, using sign language itself may be difficult, which makes it hard to communicate even with family members or guardians. To ease this process, GIST and Datumo jointly created a dataset of Korean symbolic gestures, based on sign language, to supplement communication for people with severe disabilities.

How did Datumo create this dataset?

All data were collected and labeled using Datumo’s crowd-sourcing platform, Cash Mission.

A total of 21,020 gesture samples across 205 categories were created; each gesture was repeated five times per video.

The most important factor in training AI models is having a dataset that is both accurate and diverse. Thanks to Cash Mission’s 200K crowd-workers and strict guidelines, Datumo was able to collect data at both high accuracy and large scale.

  • Main tool: Datumo’s crowd-sourcing platform “Cash Mission”
  • Data collection standards, methods, and guidance set according to the client’s needs
  • Project designed and guidelines (including “Sondam” video tutorials) prepared for crowd-workers
  • Pilot test carried out before the actual project for quality assurance
  • Collected “Sondam” video data via crowd-workers
  • Metadata tagging carried out by crowd-workers
  • Final data delivered after modification and inspection by in-house workers
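The metadata-tagging step above can be pictured as a simple per-video record. This is only an illustrative sketch; the field names below are assumptions, not Datumo’s actual schema.

```python
from dataclasses import dataclass

# Hypothetical per-video metadata record; field names are assumptions
# for illustration, not Datumo's actual Cash Mission schema.
@dataclass
class GestureVideoMeta:
    video_id: str
    category: str            # one of the 205 gesture categories
    worker_id: str           # anonymized crowd-worker identifier
    repeats: int = 5         # each gesture is performed five times per video
    inspected: bool = False  # set True after in-house review

# A crowd-worker submits a video; in-house staff later mark it inspected.
meta = GestureVideoMeta(video_id="v0001", category="hello", worker_id="w042")
meta.inspected = True
print(meta.category, meta.repeats, meta.inspected)  # hello 5 True
```
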

Dataset Specification

 

  • 4,204 videos covering 205 gesture categories
  • A total of 21,020 gesture samples; each gesture repeated five times per video by a crowd-worker
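Reading the specification as 4,204 videos that each contain five repetitions of a gesture, the two totals above are consistent, as a quick check shows:

```python
# Counts taken from the dataset specification above.
NUM_CATEGORIES = 205
NUM_VIDEOS = 4_204       # one video = one worker performing one gesture
REPEATS_PER_VIDEO = 5    # each gesture repeated five times per video

total_samples = NUM_VIDEOS * REPEATS_PER_VIDEO
print(total_samples)                          # 21020 gesture samples
print(round(NUM_VIDEOS / NUM_CATEGORIES, 1))  # ~20.5 videos per category
```
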

Great opportunity to secure high-quality data

The gesture dataset was created based on Datumo’s four core values:

[ ACCURACY / CONSISTENCY / BALANCE / COVERAGE ]

The dataset consists of a large number of videos of people performing the gestures, so that the relevant AI model can be trained reliably.

How would you use this dataset in the real world?

“Sondam”, the name given to this gesture set, could be used to train gesture-recognition AI models or in educational games that teach the gestures to those who need them.
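As a toy illustration of gesture recognition (not GIST’s actual model, which is trained on video), a classifier can compare a pose-keypoint sequence against per-gesture references. Everything below, including the gesture names and keypoint values, is invented for the example.

```python
import math

def seq_distance(a, b):
    # Mean Euclidean distance between frame-wise keypoint vectors.
    return sum(math.dist(x, y) for x, y in zip(a, b)) / min(len(a), len(b))

def classify(query, references):
    # references: {gesture_name: keypoint sequence}; pick the nearest one.
    return min(references, key=lambda g: seq_distance(query, references[g]))

# Made-up 2D keypoint trajectories standing in for real pose data.
refs = {
    "hello":     [(0.0, 0.0), (0.5, 0.5), (1.0, 1.0)],
    "thank_you": [(1.0, 0.0), (0.5, 0.5), (0.0, 1.0)],
}
print(classify([(0.1, 0.0), (0.5, 0.6), (0.9, 1.0)], refs))  # hello
```

A production system would replace the nearest-neighbour comparison with a model trained on the labelled videos, but the interface is the same: a movement sequence in, a gesture category out.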

This is also a meaningful attempt, as it is the first benchmark dataset of its kind in Korea.

The video below shows an AI model developed using the Sondam gestures. The model recognizes the user’s movements and teaches particular gestures.
