Open Datasets – Marqvision


Have you ever spotted counterfeits while shopping online? Counterfeits infringe on both the intellectual and the physical property of those who make the original product. Marqvision, founded by a customer who received a counterfeit toothbrush from an online purchase, came to us to tackle this problem. The project set out to automate the detection, reporting, and analysis of counterfeits through machine learning, and to reduce the resources, time, and cost involved.

To provide a monitoring solution for online counterfeits, a vast amount of image data covering numerous items was needed, because AI models perform best when trained on large amounts of accurate data.

For more effective and accurate monitoring of counterfeits, DATUMO took part in data crawling and in designing the data preparation process. By meticulously pre- and post-processing the data, we were also able to make the project suitable for Cash Mission, DATUMO’s mobile crowd-sourcing platform, which immensely sped up the whole data preparation process.

The three projects carried out using Cash Mission were:

  1. Collect images of merchandise
  2. Draw bounding boxes around the merchandise and label accordingly
  3. Validate the labeled boxes
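As a rough illustration of the third step, the sketch below runs a few automatic sanity checks on a labeled box before it reaches human validation. The `BoxLabel` schema, the `validate_box` helper, and the specific checks are all hypothetical and are not taken from Cash Mission itself.

```python
from dataclasses import dataclass


@dataclass
class BoxLabel:
    """One labeled bounding box in pixel coordinates (hypothetical schema)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    label: str


def validate_box(box: BoxLabel, image_width: int, image_height: int) -> list[str]:
    """Return the problems found in a box; an empty list means it passes."""
    problems = []
    # A box without a class label cannot be used for training.
    if not box.label.strip():
        problems.append("missing label")
    # The corners must describe a rectangle with positive area.
    if box.x_min >= box.x_max or box.y_min >= box.y_max:
        problems.append("box has zero or negative area")
    # The box must lie fully inside the image.
    if (box.x_min < 0 or box.y_min < 0
            or box.x_max > image_width or box.y_max > image_height):
        problems.append("box extends outside the image")
    return problems


good = BoxLabel(10, 20, 110, 220, "toothbrush")
bad = BoxLabel(50, 50, 40, 900, "")

print(validate_box(good, 640, 480))
print(validate_box(bad, 640, 480))
```

Checks like these only catch mechanical errors. Whether a box actually encloses the right merchandise still has to be judged by a human validator.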

One of the main reasons Datumo is able to collect and label data with high precision is the User Guidelines Team, which focuses solely on writing detailed guidelines that help crowd-workers understand each project better.

Let’s take a look at how the guidelines were written in order to provide quality datasets for a smarter counterfeit monitoring solution.

After reading the guidelines, all crowd-workers are required to take a quiz. We sometimes wish we could let everyone participate, but we cannot compromise on data quality.

 

 

Collected images are first inspected by going through the project.

 

To achieve a high-performing AI model, consistency in the training dataset is essential. To deliver datasets that meet the requirements with consistently high quality, Datumo runs a strict final validation on every single data item.

Marqvision has reduced the time and cost of monitoring and discovering counterfeits by 98 and 96 percent, respectively, compared with doing the same work manually. They currently monitor counterfeits in twenty-five global online marketplaces across ten countries. Some of the clients are Amazon, eBay, Alibaba, Taobao, Coupang, and Naver. In 2021, Marqvision was selected as YC 21 by Y Combinator and raised a $3.2M seed round.

 

Let’s hear what Marqvision has to say about Datumo:

“Datumo provided us with very high quality data. We have worked on the same project with other data crowd-sourcing companies and yet Datumo’s data inaccuracy rate was less than 10% compared to others.

Datumo excels at setting and adhering to the timeline. They have set the timeline considering our necessities and provided us with data on time. Other partners repeatedly postponed deadlines, but Datumo delivered everything on time which helped us immensely on setting up a timeline for the development of our own AI model. Also, easy and quick communication via mobile messenger/email/Slack and so on was much appreciated.”

Open Datasets for Data-Centric AI

The above datasets can be downloaded for free through DATUMO ‘OPEN DATASETS’.

DATUMO would like to support the AI industry by sharing these datasets.

Learn More About DATUMO Open Datasets

CC BY-SA

Reusers are allowed to distribute, remix, adapt, and build upon the material in any medium or format, even commercially, so long as attribution is given to the creator. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.

https://creativecommons.org/licenses/by-sa/3.0/deed.en
