Hello, in this article, I'd like to share with you a real-world example of how Datahunt utilized OCR technology for data processing and how it worked for KEPCO.
Optical Character Recognition (OCR) is a technology that recognizes and extracts text from photos, scanned documents, videos, and more. In effect it works as an optical character reader, converting characters visible to the human eye into a data format that a computer can read.
OCR is widely used across industries and applications, and plays an important role in document management, automated data entry, print digitization, translation, text mining, robotic process automation, and more. For example, it can extract content from scanned images of various documents and store it as digital information, or it can enhance security by identifying sensitive personal information on documents such as licenses and social security cards.
The main steps of OCR technology are as follows
At Datahunt, we are conducting R&D on all of the above processes to create better OCR models.
Last year, Datahunt worked with KEPCO on an OCR project, which was expected to be very labor-intensive as tens of thousands of images needed to be labeled.
The task was to mark the region of every piece of text in each image and assign attributes to everything that fell into a given category. This was difficult work that took at least five minutes per image, even for one of Datahunt's experienced operators. At 100,000 images, that would come to more than 8,000 hours in total, so this was a task where AI really proved its value.
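As a quick check of that estimate, the arithmetic can be sketched as below. The function name is made up for this illustration, and the inputs are the figures quoted above (100,000 images at five minutes each).

```python
def total_labeling_hours(num_images: int, minutes_per_image: float) -> float:
    """Total manual labeling time in hours for a batch of images."""
    return num_images * minutes_per_image / 60

# Figures from the article: 100,000 images, at least 5 minutes per image.
hours = total_labeling_hours(100_000, 5)
print(f"{hours:,.0f} hours")  # prints "8,333 hours"
```

Since five minutes is a lower bound per image, the real total would be at least this large.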
At DataHunt, we took several steps to improve our OCR model. The steps were as follows
This led to faster and more accurate work.
While the model's own performance matters, preprocessing was more important for the OCR task than for other tasks. The reason is that we draw either a bounding box or a polygon around each piece of text, and in general a polygon is harder to work with than a bounding box. Perhaps that is because polygons are harder for humans to work with too. Slanted letters yield lower polygon accuracy and lower transcription accuracy than letters that stand upright.
To summarize
Therefore, if you can straighten the text before asking the AI to do the work, it will be able to create more accurate labels.
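To make the idea of straightening concrete, here is a minimal geometric sketch. It is not Datahunt's actual preprocessing. It assumes a slanted text region is given as four corner points, with the first two points forming the top edge, and it only rotates the coordinates; a real pipeline would rotate the image pixels as well. The names `rotate` and `straighten` are hypothetical.

```python
import math

def rotate(points, angle, center):
    """Rotate (x, y) points by `angle` radians around `center`."""
    cx, cy = center
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [
        (cx + (x - cx) * cos_a - (y - cy) * sin_a,
         cy + (x - cx) * sin_a + (y - cy) * cos_a)
        for x, y in points
    ]

def straighten(quad):
    """Turn a slanted four-point text region into an upright box.

    Assumption for this sketch: quad[0] -> quad[1] is the top edge of the text.
    Returns (x_min, y_min, x_max, y_max) in the straightened frame.
    """
    (x0, y0), (x1, y1) = quad[0], quad[1]
    skew = math.atan2(y1 - y0, x1 - x0)  # slant of the top edge
    center = (sum(x for x, _ in quad) / 4, sum(y for _, y in quad) / 4)
    upright = rotate(quad, -skew, center)  # undo the slant
    xs = [x for x, _ in upright]
    ys = [y for _, y in upright]
    return (min(xs), min(ys), max(xs), max(ys))

# A 4 x 2 text box tilted by 30 degrees, then straightened again.
tilted = rotate([(0, 0), (4, 0), (4, 2), (0, 2)], math.radians(30), (2, 1))
box = straighten(tilted)
print(f"width={box[2] - box[0]:.2f}, height={box[3] - box[1]:.2f}")
```

After straightening, the region is again a 4 by 2 upright box, so a simple bounding box describes it as tightly as the polygon did.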
While this can definitely improve your results, you may still have questions about this process, which is not fully automatic. To answer in advance: there are several improvements beyond preprocessing, but I've only outlined them here.
To see how much the above process helped our OCR work, we measured efficiency by directly comparing the working time with and without AI. For reference, the OCR work was performed on Korean characters.
The results were surprising: every worker saw a significant reduction in work time. While there was some variation, the average reduction was around 40%, which means that with a little extra effort you can process about five images in the time it used to take to process three.
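To see what a 40% reduction in time per image means for throughput, the conversion can be sketched as follows. The function name is invented for this example, and 0.40 is the approximate average reported above.

```python
def throughput_gain(time_reduction: float) -> float:
    """Images finished in the time one image used to take.

    `time_reduction` is the fraction of time saved per image (0.40 = 40%).
    """
    if not 0 <= time_reduction < 1:
        raise ValueError("time_reduction must be in [0, 1)")
    return 1 / (1 - time_reduction)

gain = throughput_gain(0.40)
print(f"{gain:.2f}x")  # prints "1.67x"
```

In other words, each image takes 60% of the original time, so throughput rises by roughly two thirds. Doubling throughput would require a 50% reduction.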
This conclusion has implications not only for cost and time but also for accuracy. Even with plenty of breaks, workers who handle a large volume of work tire quickly and make more mistakes. As mentioned above, easier work can be expected to produce more accurate results.
In this article, we've explained what OCR is, walked through the process and a real-world example, and shown how we use AI to streamline our work at DataHunt, including quantitative results.