Improving label validation with AI automation

How AI can improve quality and efficiency

Suho Cho

Today, we're going to look at how we use AI task validation at DataHunt, what information it provides, and how much it has improved our validation efficiency. We're not going to get too theoretical, so let's get straight to the action.


What is AI validation?


At DataHunt, we go through a meticulous process of work and review during data processing. It's a human-in-the-loop process where humans and AI work together to complement each other.

DataHunt's data processing pipeline


In this process, AI task validation runs primarily on data that has already been labeled; its role is to flag work products that are likely to contain errors and to suggest corrections. In short, it assesses the reliability of the work, which naturally raises a few questions.


Q How does the model assess the trustworthiness of my work?

A We'll share the specifics when we're ready.


Q No AI is perfect, but isn't every piece of work reviewed by a human reviewer anyway?

A Yes, but by alerting you to work that is likely to be incorrect, it lets you review that work more carefully.


Q Does the AI review process take much time? It would be rather inconvenient if I had to wait for a long time...

A We're applying a number of techniques to make it work quickly while remaining highly accurate.

The bottom line is that by increasing the accuracy of the review, you can create more accurate results and save time.


See AI task validation in action

At DataHunt, we currently offer validation features for classification, detection, and segmentation of objects in images. Let's take a look at how AI can help us with the detection task.

The detection task is to give a bounding box (a rectangle representing the area of an object) and the attributes of that object. For the sake of simplicity, we'll define it as "bounding box = area + attributes". Within a single image, there may be no objects at all, or conversely, there may be a lot of them. In either case, the worker's output needs to be inspected, and for the bounding box of every object the inspector faces one of the following cases.

  1. The area is properly specified and the attributes are correct.
  2. The area is properly specified, but the attributes are incorrect.
  3. The area is somewhat misspecified, but the attributes are correct.
  4. The area is somewhat misspecified and the attributes are incorrect.
  5. An object exists, but there is no bounding box.
  6. There is no object, but there is a bounding box.


In all of these cases, except for number 1, corrections should be made during the review process.
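The case breakdown above can be sketched in code. This is a minimal illustration, not DataHunt's implementation: boxes are `(x1, y1, x2, y2)` tuples, and the `IOU_OK` threshold for "area properly specified" is an assumed value.

```python
IOU_OK = 0.8  # assumed threshold for "area properly specified"

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def review_case(worker_box, worker_attr, gt_box, gt_attr):
    """Return the review case number (1-6) for one object."""
    if worker_box is None and gt_box is not None:
        return 5  # object exists but no bounding box
    if worker_box is not None and gt_box is None:
        return 6  # bounding box but no object
    area_ok = iou(worker_box, gt_box) >= IOU_OK
    attr_ok = worker_attr == gt_attr
    if area_ok:
        return 1 if attr_ok else 2
    return 3 if attr_ok else 4
```

For example, a box with the right area but the wrong attribute ("rug" instead of "dog") falls into case 2 and should be flagged for correction.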

The screen before reviewing a worker's work. The dog is incorrectly labeled as a rug, the sofa is unlabeled, and the area of the picture frame is incorrectly specified.

Suggestions from the AI auto-check feature. The AI found the incorrect work in the previous screen and suggested corrections.

The result of the review after accepting the AI's suggestions.


The above illustration shows a few examples of validation. When the AI proactively alerts you to areas or attributes of the work that are unreliable, the reviewer can accept or reject the AI's suggestions and continue the review. To accomplish this, the DataHunt platform trains its model on the data processed so far, as requested by the customer. It's a familiar story, but training has the following characteristics:

  • The more accurate the work, the better the model performs.
  • The more training data you have, the better the model performs.


As a customer, it's worth understanding the above and training at the right time. There's a trade-off: if you try to make the work too accurate from the start, you'll slow things down, and if you wait to train until you've accumulated a large amount of data, you reduce how much of the work the model can actually help check.

Of course, there are a few other factors that can affect the performance of your model: an uneven distribution of attributes in the training data, the type of model, different training techniques, and so on. Currently, we don't allow customers to tweak these details, because while it's certainly possible to tune them yourself to build a better model, in many cases it isn't worth the added monetary and time costs of training.

Once you've trained your model, you can utilize the validation features. We'll show you how to use them in our platform user guide, which we're currently working on, but in the meantime, let's analyze how much AI task validation improved the quality of our work with some quick test results.


Test results for AI validation

To evaluate the performance of this feature, we conducted a hands-on experiment with data from one of our customers whose tasks and inspections had already been completed. To measure inspection performance, we modified some of the task results to introduce errors. There were four types of errors:

  • Attribute errors: an object's attributes are incorrectly assigned.
  • Region errors: an object's area is set inaccurately.
  • Undetected objects: an existing object is not labeled.
  • Object false positives: a bounding box is drawn where no object exists.

Distribution of errors for a typical worker

The distribution of errors reflects the frequency of errors in real-world tasks. For each error type, we counted how many the AI found and summed them to calculate overall inspection performance.

Improving error detection with AI automated inspection


Overall, the AI found 57 out of 70 errors, a detection rate of about 81%. The detection rates for each error type were as follows:

  • Attribute errors: 87%
  • Region errors: 50%
  • Object undetected: 90%
  • Object false positives: 80%
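The arithmetic behind these figures can be reproduced with a short script. Note that the per-type counts below are hypothetical; they are chosen only to be consistent with the reported rates and the 57/70 total, since the article gives rates rather than raw counts.

```python
# Hypothetical per-type counts, consistent with the reported rates
# (the article reports only rates and the 57/70 overall result).
results = {
    "attribute":      {"total": 30, "found": 26},  # ~87%
    "region":         {"total": 10, "found": 5},   # 50%
    "undetected":     {"total": 20, "found": 18},  # 90%
    "false_positive": {"total": 10, "found": 8},   # 80%
}

for name, r in results.items():
    print(f"{name}: {r['found']}/{r['total']} = {r['found'] / r['total']:.0%}")

found = sum(r["found"] for r in results.values())
total = sum(r["total"] for r in results.values())
print(f"overall: {found}/{total} = {found / total:.0%}")  # 57/70 = 81%
```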


We found that the detection rate for region errors was relatively low, which we believe is due to the ambiguity of the criteria for region errors: when a correct answer is defined as a box that fits the object's area, there is no absolute standard for how far a box must deviate from the correct answer before it counts as an error.

In this experiment, we introduced region errors whose IoU (Intersection over Union) with the correct answer was above 0.8, which is quite close to correct and probably why they were hard to catch. (In other words, we injected errors, but they were close enough to the correct answer that it was hard to tell whether they were errors at all.)
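A quick numeric illustration (not taken from the article's data) shows how subtle such errors are: shifting a 100 x 100 pixel box 10 pixels to the right still leaves the IoU above 0.8.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

gt = (0, 0, 100, 100)        # ground-truth box
shifted = (10, 0, 110, 100)  # worker's box, 10 px off
print(round(iou(gt, shifted), 3))  # 9000 / 11000, about 0.818
```

Even this visibly shifted box clears the 0.8 threshold, so an inspector (human or AI) can reasonably disagree about whether it is an error.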


AI task validation, conclusion?

As you can see, we have implemented AI automated validation for several tasks at DataHunt, and our experiments with real data show that it can be a great help during review. You might think that if the work is done painstakingly and accurately from the labeling stage, review becomes a little less important, but no one can guarantee error-free work.

AI plays a big role in catching the human mistakes that will always happen, and as a result we can reduce review time and therefore cost. One of DataHunt's strengths is that this significantly improves the quality of the final data labels, allowing us to deliver quality data.

