Guide for YOLO - Concept, Principles, Use Cases, Know-How

You Only Look Once, Real-Time Object Detection

Sangsun Moon
Object detection is a technique used in image processing and computer vision. It is a broad umbrella term that also covers more recent developments in computer vision such as multi-label and keypoint detection. Among object detection techniques, the YOLO model is the most representative.

Understanding the many uses of YOLO can help you gain insights for your business using computer vision. In this article, we'll cover the definition, principles, and use cases of the YOLO model, as well as DataHunt's know-how to make it more accurate.

What is YOLO? (YOLO definition and concepts)

Object Detection

Object detection is a computer vision task that involves identifying and locating objects in an image or video. The detected objects can be people, cars, buildings, or animals.

Object detection technology exists to answer two questions.

  1. What is this? Identify the objects in a given image.
  2. Where is it? Determine the exact location of each object within the image.

Traditional object detection methods tried to work around data limitations and modeling issues in a variety of ways, but they struggled to detect objects in a single run of the algorithm. The YOLO algorithms emerged to address this difficulty.

How to improve object detection accuracy with YOLO - definition, principles, use cases, know-how
Source: Joseph Redmon, YOLO: Real-Time Object Detection

You Only Look Once

YOLO (You Only Look Once) is a state-of-the-art real-time object detection system. It's been making waves for its ability to process data faster and more accurately than traditional models. YOLO is a deep learning-based approach to object detection.

In simple terms, YOLO divides the input image into a grid and passes it through a neural network, which generates bounding boxes and class predictions that determine the final detection output. Before it is tested on real images and videos, the model is first trained on the entire training dataset.

YOLO was first introduced by Joseph Redmon et al. in a 2015 paper and has been updated several times since. Various developers have gone on to release new versions, up to v8.

YOLO is a very fast model because it doesn't deal with complex pipelines, which makes it particularly good for applications that require real-time decision making.

The YOLO Principle (How YOLO works, how to use it)


  • As mentioned above, YOLO passes an image divided into a grid through a neural network and then generates the final detection output from the predicted bounding boxes (Bbox) and class predictions.
  • To arrive at the final Bbox, YOLO relies on two main post-processing steps: intersection over union (IoU) and non-maximum suppression (NMS).
  • IoU measures how well the bounding box predicted by the model matches the actual object. It is the area where the two boxes overlap divided by the area of their union.
  • Object detection algorithms often detect the same object more than once. During localization of the object of interest, several groups of detections are created near its actual location, which is an inherent side effect of imperfect detection algorithms.
  • NMS is used in many areas of computer vision to prevent this. In face detection, for example, NMS identifies the best box among all the candidates that belong to one face. Rather than concluding that there are multiple faces in the image, NMS keeps the box with the highest probability among the boxes that cover the same object.
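The two post-processing steps above can be sketched in a few lines of plain Python. This is a minimal illustration and not the code of any particular YOLO release: the corner-based box format, the function names `iou` and `nms`, and the 0.5 overlap threshold are all assumptions chosen for the example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    # Corners of the overlapping rectangle (it may be empty).
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], scores: List[float],
        iou_threshold: float = 0.5) -> List[int]:
    """Greedy NMS: return the indices of the boxes to keep, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two overlapping candidates for one object, plus one separate object.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.80]
kept = nms(boxes, scores)  # indices of the surviving boxes
```

Here the second box overlaps the first with an IoU of about 0.68, so it is suppressed in favor of the higher-scoring box, while the third box is kept because it does not overlap either of the others.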

YOLO Structure, Principles
Source: Joseph Redmon, YOLO: Real-Time Object Detection

How To

YOLO utilizes both IoU and NMS to quickly predict different objects in an image. During training, the model looks at the entire image, so it implicitly encodes contextual information about each class as well as its appearance.

YOLO first looks at the input image and divides it into an N×N grid. For each grid cell, it classifies and localizes what it sees: it determines where the objects are and draws a Bbox around each object it needs to identify. The YOLO algorithm then estimates the bounding box and the class probabilities for each object.
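As a rough sketch of the grid step, the snippet below finds the cell that contains the center of an object's box and computes the size of the prediction tensor. It follows the design of the original YOLO paper, where the cell containing an object's center is responsible for it and each box carries five values (x, y, width, height, confidence). The function names are hypothetical, and later YOLO versions organize their outputs differently.

```python
from typing import Tuple


def responsible_cell(box: Tuple[float, float, float, float],
                     image_w: int, image_h: int, n: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the box center."""
    x_min, y_min, x_max, y_max = box
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Clamp so a center on the right or bottom edge maps to the last cell.
    col = min(int(cx / image_w * n), n - 1)
    row = min(int(cy / image_h * n), n - 1)
    return row, col


def output_values_per_image(n: int, boxes_per_cell: int, num_classes: int) -> int:
    """Number of values in an N x N x (B * 5 + C) prediction tensor."""
    return n * n * (boxes_per_cell * 5 + num_classes)


# Settings from the original paper: 7 x 7 grid, 2 boxes per cell, 20 classes.
cell = responsible_cell((100, 200, 300, 400), image_w=448, image_h=448, n=7)
size = output_values_per_image(n=7, boxes_per_cell=2, num_classes=20)
```

With these settings the object falls in row 4, column 3 of the grid, and the network outputs 7 × 7 × 30 = 1470 values per image.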

If you want to try training a YOLO model with a custom dataset, it's easier than you might think. Clone YOLO's repository, install the necessary files, and load your custom annotated dataset. After that, you just need to provide the model configuration and architecture definitions and variables, and you're ready to start your own Object Detection project.
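Loading a custom annotated dataset usually means converting each annotation into the label format the chosen repository expects. The sketch below assumes the plain-text format used by Darknet and the Ultralytics repositories: one line per object, holding a class index followed by the box center, width, and height, all normalized by the image size. The function name `to_yolo_label` is hypothetical, so check the documentation of the repository you clone before relying on it.

```python
from typing import Tuple


def to_yolo_label(class_id: int, box: Tuple[float, float, float, float],
                  image_w: int, image_h: int) -> str:
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one label line."""
    x_min, y_min, x_max, y_max = box
    # Normalize the center and size to the range [0, 1].
    cx = (x_min + x_max) / 2.0 / image_w
    cy = (y_min + y_max) / 2.0 / image_h
    w = (x_max - x_min) / image_w
    h = (y_max - y_min) / image_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# One object of class 0 in a 640 x 480 image.
line = to_yolo_label(0, (100, 200, 300, 400), image_w=640, image_h=480)
```

Each image then gets a text file with one such line per annotated object.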

Why Pay Attention to YOLO

YOLO utilization

YOLO is fast and relatively accurate. There are many models that are more accurate than YOLO, but at the end of the day, real-time is what matters most in business, so we don't see YOLO being dethroned as the king of object detection anytime soon.

That said, here's why the YOLO algorithm is important.

  • Speed: Predicts objects in real time, which speeds up detection
  • High accuracy: Delivers accurate results with minimal background error
  • Learning capabilities: YOLO has excellent learning capabilities to learn representations of objects and apply them to object detection.


YOLO computes the location and boundaries of objects in a single pass over a given image. Earlier approaches such as R-CNN first had to propose regions where an object might be and then extract features from them with a convolutional network. YOLO simplified these steps, which made real-time processing possible. As a result, it became an object detection deep learning model that can be applied to a wide variety of services with significant results.

Compare YOLO version differences

YOLO's update history by version is shown below.

YOLO comparison by version (v1 to v4)
Source: Comparison of YOLO model versions based on MS-COCO (test-dev) mAP (%) and FPS (V100)

  • YOLO v1 (2016): A deep learning-based network for real-time object detection
  • YOLO v2 (2017): Performance improvements and speed improvements from v1
  • YOLO v3 (2018): Improves object detection accuracy and speed by improving network structure and learning methods.
  • YOLO v4 (2020. 04): Improved object detection accuracy and speed by applying SPP and PAN technology
  • YOLO v5 (2020. 06): Over 10% more accurate than its predecessor, smaller model size
  • YOLO v6 (2022. 09): Improved the efficiency of the algorithm and introduced quantization and distillation methods for deployment on target systems.
  • YOLO v7 (2022. 07): Optimization of the training process, proposal of a trainable bag-of-freebies
  • YOLO v8 (2023. 01): Released in a new repository as a unified framework for training object detection, instance segmentation, and image classification models

When comparing YOLO versions, the choice is usually between v5 and the newest version, v8. YOLOv5 has the advantage of being easier to use, while YOLOv8 is faster and more accurate. Ultimately, deciding which model to use depends on the requirements of your application, but generally speaking, you'll want to choose YOLOv8 if you need real-time object detection.

YOLO Use Case

Leveraging real-time - detecting marine distress from CCTV and drone footage

The Shipbuilding and Offshore Engineering Research Institute utilized a deep learning object detection method for its marine distress detection model. To make detection more efficient and accurate, they developed a real-time survivor search solution based on AI. By leveraging the real-time nature of the YOLO model, the deep learning model was able to detect objects and people in distress based on drone video data.

YOLO models have also been used to analyze CCTV footage to identify traffic or crowded areas. During the COVID-19 pandemic, YOLO's real-time object detection made it easy to identify mask wearers and fever patients.

Defect Detection and Quality Assessment - Agriculture, Manufacturing

Assessing the quality of harvested crops and classifying defects is very important, but it is not easy when capital and manpower are scarce. A defect detection model was therefore created using AI-based deep learning algorithms. Built on the YOLOv3 algorithm, the model automated the whole process, from data collection through quality assessment of the analyzed data to defect detection.

As such, YOLO models are increasingly trained on customized data to meet the needs of different users. YOLO can be trained freely on user-created datasets. Training a model on a large dataset typically takes a long time and a large amount of computation. Starting from a pre-trained model file not only exposes the model to a wider range of training distributions but also reduces the training time. In other words, transfer learning makes YOLO faster to train and more accurate.

However, when no pre-trained model fits your needs, training on a large dataset is unavoidable. In that case, the quality of the training data must be guaranteed for YOLO to be as accurate as expected.

YOLO know-how: The key to optimizing YOLO is high-quality training data

YOLO running on sample artwork and natural images from the internet.
Source: Joseph Redmon, YOLO: Real-Time Object Detection

Limitations of YOLO

According to the original paper that introduced YOLO, the model's shortcomings can be summarized in four points.

  1. Each grid cell predicts only two bounding boxes and a single class. This spatial constraint makes it difficult to detect objects that are close together.
  2. Because the network uses multiple downsampling layers, bounding boxes are predicted from relatively coarse features.
  3. Incorrect localization is the main source of error.
  4. Since the model learns to predict bounding boxes from data, it struggles when the test data contains objects in shapes or configurations that were not in the training data.

The biggest hurdle for deep learning is training data. However, there is only so much data you can apply in your own application area.

Despite their limitations, YOLO models are still popular because they are extremely fast to compute compared to traditional models. However, to achieve high accuracy, models need complete training data. If the model has to detect things it hasn't learned due to incomplete training data, it will be less accurate, which can lead to a loss of business confidence.

Building high-quality training data: how does DataHunt do it?

In one project, a team implementing a parking robot needed photos of vehicles taken at ground level. They also needed data for learning vehicle height, a challenging task that required devising a method to measure the height of a large number of vehicles and taking photos from the robot's perspective. The training data that DataHunt annotated with high accuracy helped them build a more complete model for the parking robot.

In addition, we carried out lane and road boundary polyline labeling on road data and autonomous driving data collected from around the world. The task was particularly challenging and required meticulous work, so we collaborated with experienced domestic labelers and gave them intensive training on the data processing tasks. We were able to build 300,000 pieces of data and reduce the error rate to around 5%.

How to improve object detection accuracy at DataHunt

To develop a deep learning-based object classification model, training data needs to be prepared, filtered, labeled, etc. It is necessary to remove duplicate data or carefully check for incorrect data so that the model can learn well. Also, too little training data can lead to poor model quality.
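Two of the checks mentioned above, removing duplicate data and catching incorrect labels, can be partly automated. The following is a minimal sketch and not DataHunt's actual pipeline: it drops only byte-identical images, and it assumes labels are stored as `class cx cy w h` lines with coordinates normalized to [0, 1]. The function names are hypothetical.

```python
import hashlib
from typing import Dict, List


def drop_exact_duplicates(images: Dict[str, bytes]) -> List[str]:
    """Return the names of images to keep, dropping byte-identical copies."""
    seen = set()
    kept = []
    for name in sorted(images):
        digest = hashlib.sha256(images[name]).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


def label_problems(line: str, num_classes: int) -> List[str]:
    """List the problems found in one 'class cx cy w h' label line."""
    parts = line.split()
    if len(parts) != 5:
        return ["expected 5 fields"]
    try:
        class_id = int(parts[0])
        values = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append("class id out of range")
    if any(not 0.0 <= v <= 1.0 for v in values):
        problems.append("coordinate outside [0, 1]")
    if values[2] <= 0 or values[3] <= 0:
        problems.append("empty box")
    return problems


# Toy data: b.jpg is a byte-for-byte copy of a.jpg.
images = {"a.jpg": b"\x01\x02", "b.jpg": b"\x01\x02", "c.jpg": b"\x03"}
kept = drop_exact_duplicates(images)
bad = label_problems("3 0.5 0.5 1.2 0.1", num_classes=2)
good = label_problems("1 0.5 0.5 0.2 0.1", num_classes=2)
```

Checks like these catch only mechanical errors. Whether a box actually fits the object tightly, or carries the right class, still has to be reviewed by a person.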

Whereas in the past the focus was on developing high-performance models, these days the first step and keyword for any AI-powered endeavor is data accuracy. That's because high-quality data is critical to unlocking the full power of your model. At DataHunt, we identify the data our clients need most and specialize in processing it.

It's easy to think that any data processing partner will do, but for models like YOLO the choice of partner can make all the difference in how well your model performs. That's why you need data processing and training data building experts like DataHunt.


  • The YOLO (You Only Look Once) model is a real-time object detection system introduced by Joseph Redmon and others. As of June 2023, versions up to v8 have been released.
  • The YOLO model is characterized by its speed and accuracy, as it passes images segmented on a grid through a neural network and then uses the Bbox technique to generate the final detection output.
  • Models like YOLO are driven by what they learn from data, so building high-quality training data is necessary to improve performance and accuracy.
