What is Computer vision?

Computer vision definitions, principles, and use cases in a nutshell

Sangsun Moon

What is computer vision?


Computer vision definition

Computer vision is a branch of computer science that enables a computer to recognize features in a digital image or video, extracting and interpreting information to perform specific tasks. In layman's terms, it's the art of putting "eyes" in a machine: it's how computers analyze and interpret image and video data.

Computer vision categories

Computer vision is revolutionizing many fields. It's the underlying principle behind familiar tasks such as image processing, model learning, and pattern recognition. Image processing covers enhancements and transformations, for example resizing an image, color-correcting it, and removing noise. Pattern recognition, which looks for specific patterns or features in a given image, has led to advances in technologies such as face recognition and character recognition.

In addition, object detection and tracking is the task of identifying specific objects in images or videos and following them over time. Segmentation refers to dividing the pixels that make up an image into objects or regions. The principles behind all of these techniques lie in computer vision, which is playing a key role in areas such as autonomous driving, security systems, and medical diagnostics.
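To make segmentation and detection a little more concrete, here is a minimal sketch in plain Python. It assumes a tiny made-up grayscale image and a hand-picked brightness threshold; the function names are hypothetical, and real systems usually rely on learned models rather than a fixed threshold.

```python
# Toy grayscale image: 0 is black, 255 is white (values invented for illustration).
image = [
    [10, 12, 11, 13],
    [14, 200, 210, 12],
    [11, 205, 220, 10],
    [13, 12, 11, 14],
]


def segment_by_threshold(img, threshold):
    """Label each pixel 1 (foreground) or 0 (background)."""
    return [[1 if px > threshold else 0 for px in row] for row in img]


def bounding_box(mask):
    """Return (top, left, bottom, right) of the foreground pixels, or None."""
    coords = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not coords:
        return None
    rows = [r for r, _ in coords]
    cols = [c for _, c in coords]
    return min(rows), min(cols), max(rows), max(cols)


mask = segment_by_threshold(image, threshold=128)  # segmentation: pixels -> regions
box = bounding_box(mask)                           # detection: where the object is
print(mask)
print(box)
```

The mask divides the pixels into two regions, and the bounding box turns that region into a rough object location.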


Computer vision history

Computer vision is a technical field with a long history. Let's summarize it in chronological order based on key milestones.

  • Pre-1960s: Research on recognizing image patterns began, based on Frank Rosenblatt's "perceptron" concept. Computer vision at this time was focused on simple pixel-level processing and pattern recognition.
  • 1970s-80s: David Marr's computational framework for vision laid the theoretical groundwork for image processing algorithms and led to technical advances such as edge detection and histogram analysis.
  • 1990s: More complex and sophisticated algorithms and methodologies were introduced, and object detection, tracking, and segmentation technologies developed.
  • 2000s onward: Deep learning and the rise of neural network algorithms accelerated innovation in computer vision. The accuracy and performance of models improved dramatically, and a variety of applications began to emerge, including image recognition, face recognition, and object classification.

Current computer vision technology is constantly evolving, and advances in deep learning and computer vision models keep delivering better results. In the past, model-centric development was prioritized, with an emphasis on real-time processing power. In recent years, however, we've seen great progress not only in real-time performance but also in accuracy, through improved training data quality.

Why is computer vision important?

Computer vision provides accurate data analysis and recognition through image processing and pattern recognition. In addition, the structure of deep learning algorithms has helped solve some of the key tasks that computer vision must address. For example, the structural properties of a Convolutional Neural Network (CNN) allow it to detect and extract spatial patterns and features present in visual data. Experts say that this new reality is possible thanks to the popularization of computer vision and artificial intelligence technologies. With increasingly affordable hardware, as well as improvements in depth perception and machine learning techniques, state-of-the-art CV/AI systems are becoming deployable in real solutions.

If machines can process and understand images and video, the potential for automation and productivity gains across a wide range of industries is huge. Computer vision is also rapidly transforming industries like automotive, healthcare, and robotics. With its benefits of accuracy, efficiency, and automation, it makes it possible to analyze and recognize image and video data. It is now clear that computer vision will be a driving force behind innovations and advancements that will change much of the future.

What's the most important thing about computer vision?

With that in mind, it's important to know the key technologies that will shape the future of computer vision development and will themselves continue to evolve. Below are some of the technology areas cited as most important to computer vision development.

Cloud computing

Cloud computing provides the vast amounts of computing power and data storage needed to train and deploy computer vision models. This is especially important for large-scale applications such as self-driving cars and facial recognition systems.

In recent years, there has been a marked increase in the number of devices and systems capable of computer vision, including pose estimation for gait analysis, face recognition for smartphones, and lane detection for autonomous vehicles. To support this, many optimizations have been made to accelerate computer vision solutions, such as increasing GPU speed, but this has not been enough to meet the scalability and uptime needs of these applications when serving thousands or millions of consumers. Cloud computing is the answer: it can provide the resources needed to close these gaps in your infrastructure.

Auto ML

Auto ML refers to the process of automatically building and optimizing machine learning models. Building a machine learning model involves various tasks, such as hyperparameter tuning, feature extraction, and model selection, and automating these tasks reduces the need for developers to do them manually. Auto ML can automatically find the best machine learning model for a given dataset by combining a variety of algorithms and techniques. This allows developers to build high-performance machine learning models without specialized knowledge, and improves the accuracy and efficiency of their models.

For example, hyperparameter tuning is the act of adjusting a model's hyperparameters (settings chosen before training, such as the learning rate) to get the best performance. Auto ML automates this process to help find the best model. These techniques have a lot of potential in the field of computer vision, helping developers save time and effort while still developing high-quality models.
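As a rough illustration of what automated hyperparameter tuning does, the sketch below runs a plain grid search in standard-library Python. Everything in it is an assumption made for the example: the toy dataset, the one-weight model, and the candidate values. For brevity it scores candidates on the training data, whereas a real Auto ML tool would use held-out validation data and far smarter search strategies.

```python
from itertools import product

# Tiny made-up dataset that follows y = 2x (an assumption for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]


def train(learning_rate, epochs):
    """Fit y = w * x with gradient descent and return the learned weight."""
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * grad
    return w


def mse(w):
    """Mean squared error of the model y = w * x on the dataset."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical search space: every combination is trained and scored.
search_space = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [5, 50]}


def grid_search(space):
    """Try every hyperparameter combination and keep the lowest-loss one."""
    best = None
    for lr, epochs in product(space["learning_rate"], space["epochs"]):
        loss = mse(train(lr, epochs))
        if best is None or loss < best["loss"]:
            best = {"learning_rate": lr, "epochs": epochs, "loss": loss}
    return best


best = grid_search(search_space)
print(best)
```

The developer only declares the search space; the loop that trains and compares candidates is the part Auto ML takes over.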

Machine learning libraries and frameworks

Machine learning libraries and frameworks support developers' computer vision work with a variety of algorithms and models, data preprocessing capabilities, and performance evaluation tools. Major deep learning libraries such as TensorFlow, PyTorch, Keras, and MXNet have become key tools in computer vision through constant updates and modifications.

Deep learning strategies have long been used to solve computer vision problems. Neural network architectures based on CNNs have been used in tasks such as detecting faces and lanes or estimating pose. More recently, Transformers, a newer architecture for computer vision algorithms, have gained traction.

The Transformer is a deep learning architecture used primarily in natural language processing and computer vision. Unlike conventional recurrent neural networks (RNNs), which process a sequence one step at a time, it uses an "attention mechanism" that can process an entire sequence of information at once, improving performance and efficiency.
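The idea of attending to a whole sequence at once can be sketched with scaled dot-product attention. This is a simplified illustration in plain Python, not a full Transformer: it assumes the inputs are used directly as queries, keys, and values, with no learned projections, and the three two-dimensional "tokens" are made up.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: each query looks at every key at once."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity between this query and every position in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Output is the weighted average of all value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
    return outputs


# Three made-up token vectors (in vision, these could be image patches).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = attention(tokens, tokens, tokens)
print(out)
```

Unlike an RNN, no position has to wait for the previous one: every output is computed from the full sequence in a single pass.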

Principles and limitations of computer vision

How it works

So let's take a look at the principles of how computer vision works. As a first step, let's assume that an image or video has been acquired from a camera or other sensor. Images on a computer are typically stored as a large grid of pixels. Each pixel is defined by a color, stored as a combination of the three additive primary colors: red, green, and blue (RGB). These are combined in varying intensities to represent different colors. The type of camera used, lighting conditions, and distance from the object can all affect image quality.
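The pixel grid described above can be sketched directly in Python. The example assumes 8-bit channels (0 to 255) and a made-up 2x3 image; the grayscale conversion uses the common ITU-R BT.601 luma weights, which is one convention among several.

```python
# A 2x3 image stored as rows of (red, green, blue) tuples, each channel 0-255.
image = [
    [(255, 0, 0), (0, 255, 0), (0, 0, 255)],
    [(255, 255, 0), (255, 255, 255), (0, 0, 0)],
]

height = len(image)
width = len(image[0])


def to_grayscale(pixel):
    """Collapse an RGB pixel to one brightness value (BT.601 luma weights)."""
    r, g, b = pixel
    return round(0.299 * r + 0.587 * g + 0.114 * b)


gray = [[to_grayscale(px) for px in row] for row in image]
print(height, width)
print(gray)
```

Mixing full red and full green gives yellow, and full intensity on all three channels gives white, which is what "combined in varying intensities" means in practice.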

The computer then processes the image to remove noise and other artifacts. This might include resizing or cropping, and adjusting brightness and contrast. Improving the quality of the image with these image processing techniques makes it easier to extract features and identify objects. Next, features are extracted from the image: anything that helps identify what is in it, from edges to shapes to objects. Simple features correspond to edges or corners, while complex features refer to objects or parts of objects.
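A minimal sketch of these two steps, assuming a tiny made-up grayscale image: a contrast stretch stands in for preprocessing, and a difference between neighbouring pixels stands in for edge extraction. The function names are hypothetical, and production code would typically use an image library instead.

```python
def stretch_contrast(img):
    """Rescale pixel values so the darkest becomes 0 and the brightest 255."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [[0 for _ in row] for row in img]
    return [[round((px - lo) * 255 / (hi - lo)) for px in row] for row in img]


def horizontal_gradient(img):
    """Absolute difference between each pixel and its right-hand neighbour."""
    return [[abs(row[c + 1] - row[c]) for c in range(len(row) - 1)] for row in img]


# Low-contrast image with a dim region on the left and a brighter one on the right.
image = [
    [100, 100, 150, 150],
    [100, 100, 150, 150],
]

enhanced = stretch_contrast(image)     # preprocessing: improve image quality
edges = horizontal_gradient(enhanced)  # feature extraction: find the boundary
print(enhanced)
print(edges)
```

After the stretch, the boundary between the two regions produces a much stronger response, which is why preprocessing comes before feature extraction.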

Finally, we use these features to identify objects in the image. This step, also known as object detection, can be done using a variety of techniques, such as machine learning or template matching. Once the objects in an image are identified, the scene can be understood: the results can be used to understand the relationships between objects or the overall context.
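Template matching, one of the techniques mentioned above, can be sketched as follows. This version assumes small grayscale grids and scores each position with the sum of absolute differences, which is only one of several possible similarity measures; the image and template values are invented.

```python
def match_template(image, template):
    """Return ((row, col), score) of the best match; lower score is better."""
    th, tw = len(template), len(template[0])
    best_pos, best_score = None, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            # Sum of absolute differences between the template and this window.
            score = sum(
                abs(image[r + i][c + j] - template[i][j])
                for i in range(th)
                for j in range(tw)
            )
            if best_score is None or score < best_score:
                best_pos, best_score = (r, c), score
    return best_pos, best_score


image = [
    [0, 0, 0, 0, 0],
    [0, 0, 9, 8, 0],
    [0, 0, 7, 9, 0],
    [0, 0, 0, 0, 0],
]
template = [
    [9, 8],
    [7, 9],
]

position, score = match_template(image, template)
print(position, score)
```

A score of zero means the template appears exactly at that position; in real images the best score is rarely zero, so a tolerance is usually needed.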

For example, consider an algorithm that tracks the movement of athletes. The model stores the RGB values of the pixels that it believes are athletes on the field. Once the values are stored, you can feed the program an image and ask it to find the pixels with the closest color match. The algorithm works by checking one pixel at a time to see how much it differs from the target color. However, tracking the movement of a player is not as simple as tracking the movement of a ball, so algorithms have adopted methods that consider small regions of pixels or combine multiple kernels to avoid confusion. More recently, the advent of convolutional neural networks (CNNs) has made tracking even more accurate.
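The pixel-by-pixel color matching described here might look like the sketch below. The colors, frames, and function names are all made up for illustration, and the approach is deliberately naive: it finds a single best pixel per frame, which is exactly the limitation that region-based and CNN-based methods address.

```python
def color_distance(a, b):
    """Squared distance between two RGB colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def closest_pixel(frame, target):
    """Check every pixel and return the (row, col) closest to the target color."""
    best = None
    for r, row in enumerate(frame):
        for c, px in enumerate(row):
            d = color_distance(px, target)
            if best is None or d < best[0]:
                best = (d, (r, c))
    return best[1]


GRASS = (30, 140, 40)    # hypothetical field color
JERSEY = (200, 30, 30)   # hypothetical jersey color as it appears in the frames
target = (210, 25, 35)   # stored color the tracker believes is the athlete

# Two tiny frames in which the athlete moves from top-middle to bottom-right.
frame1 = [[GRASS, JERSEY, GRASS], [GRASS, GRASS, GRASS]]
frame2 = [[GRASS, GRASS, GRASS], [GRASS, GRASS, JERSEY]]

track = [closest_pixel(frame, target) for frame in (frame1, frame2)]
print(track)
```

The stored color does not need to match exactly; the nearest color wins, which tolerates small lighting changes but is easily confused by similar-colored objects.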

What is a Convolutional Neural Network (CNN)?

A CNN is a type of artificial neural network specifically designed to process data with a grid-like structure, such as images. They are commonly utilized in computer vision tasks such as image classification, object detection, and image segmentation.

CNNs work by learning to identify patterns in images. They do this by applying a series of convolution operations, which are mathematical operations that take two signals as input and produce a third signal that is a combination of the two. The convolution operation applies a filter to the image by sliding it across one pixel at a time, multiplying each filter value by the corresponding pixel at each step and summing the results. This creates a new signal that represents the image features extracted by the filter. By repeating this several times with different filters, the CNN learns to identify different features of the image. The features it learns can then be used to classify the image or detect objects in the image.
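The sliding, multiplying, and summing can be written out in a few lines. This sketch assumes a small grayscale image and a hand-written vertical-edge filter; in a real CNN the filter values are learned during training rather than chosen by hand. Like most deep learning libraries, it does not flip the filter, so strictly speaking it computes a cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image one pixel at a time (no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(len(image) - kh + 1):
        row = []
        for c in range(len(image[0]) - kw + 1):
            # Multiply each filter value by the pixel under it, then sum.
            row.append(sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            ))
        out.append(row)
    return out


# Dark region on the left, bright region on the right.
image = [
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
    [0, 0, 0, 10, 10],
]

# Hand-written filter that responds to dark-to-bright vertical edges.
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = convolve2d(image, vertical_edge)
print(feature_map)
```

The output is zero over the flat region and large where the filter overlaps the edge, so the feature map records where that particular pattern occurs.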

CNNs have been very successful in a variety of computer vision tasks. They are particularly suited to tasks that require identifying local features in an image, such as object detection and image segmentation. CNNs are also relatively efficient, making them ideal for real-time applications.


Computer vision is one of the hottest areas of technology today, but some hurdles have yet to be overcome, especially since it's very difficult to fully replicate the human eye through technology. Below are the challenges that computer vision has yet to solve.

  • Data quantity and quality issues
    Computer vision technology must analyze large amounts of data, and it needs high-quality data. However, there are still many challenges in obtaining such data. In particular, advances in sensor technology have created massive amounts of data, and building the systems and infrastructure needed to manage and analyze it has become a major challenge.
  • Generalization issues
    Computer vision models learn only from their training datasets. Unlike the human eye, a model may see the same object differently when it appears in a new situation, so it can be difficult to generalize. To solve this problem, you need to train a model that can generalize by learning from different angles, unwanted noise, and different regions.
  • Interpretability issues with models whose predictions are not intuitive
    Because models in the deep learning family are so complex, the output of these models can be difficult to interpret. For example, if a misclassification occurs, it can be difficult to understand why.
  • Computational resource limitations
    Computer vision techniques based on deep learning require a lot of computational resources. In particular, as the depth of a deep learning model increases and the size of the dataset it trains on increases, the computational resources required for model training and prediction increase even more. For this reason, if the hardware used to train or predict the model is limited, you may experience poor performance.

To overcome these limitations and challenges, a variety of studies are currently underway. Examples include the development of lightweight models that perform well with less data and fewer resources, research on how to efficiently collect and manage datasets, and various techniques and algorithms to improve the interpretability of models.

Computer vision trends and use cases


Computer vision market size

The AI in computer vision market was valued at $15.9 billion in 2021 and is expected to expand to $51.3 billion by 2026, a compound annual growth rate of approximately 26.3%, driven by the increase in video data from smartphones, according to Research and Markets. Here's a look at the trends in computer vision that are poised to dominate the industry in the near future.

  • Deep learning
    Deep learning is a type of machine learning that uses artificial neural networks to learn from data. Deep learning has had great success in the field of computer vision, and is now the dominant approach for many tasks, such as image classification and object detection.
  • 3D Vision
    3D vision, the ability to see in three dimensions, has long been a challenging area of computer vision, but has recently made significant progress. We now see 3D vision in a variety of applications, including augmented reality, self-driving cars, and robotics.
  • Multi-Modal vision
    The AI algorithms behind computer vision solutions are becoming more powerful. Computer vision is ultimately about computers having visual intelligence that mimics and exceeds human visual perception. Multi-modal technologies that combine computer vision and natural language processing have also emerged.
  • Edge Computing
    Edge computing is a distributed computing paradigm that brings computing and data storage closer to the edge of the network. This is important for computer vision applications because it can improve performance and reduce latency for applications that require real-time processing of images and video.

Computer vision is increasingly being used in real-world applications such as self-driving cars, medical imaging, and retail. When deployed in business settings, computer vision systems are susceptible to real-world problems such as noise and occlusion. To ensure that they work robustly, it's important to pay close attention during the build phase.

Use cases and applications

So let's take a closer look at how computer vision is being used in practice.

Self-driving cars

Autonomous vehicles combine a variety of sensors with computer vision technology to perform functions such as recognizing and avoiding objects that the car needs to pay attention to. In autonomous driving, computer vision plays a crucial role in recognizing and assessing the surrounding environment. You can find computer vision in action in the following features.

  • Object Recognition
    Object recognition in the field of view is performed by combining information from sensors such as CCTV, radar, and LIDAR, which detect objects in the environment, with images taken by cameras (computer vision). This allows the car to recognize objects such as lanes, vehicles, and pedestrians, and to be aware of its surroundings.
  • Vehicle Recognition
    Cars use a combination of CCTV, radar, and LIDAR sensor data and cameras to detect and recognize other cars, buses, trucks, and other vehicles. Cars use this information to determine their distance from other vehicles, their speed, and more, and to drive safely.
  • Lane Detection
    The car recognizes lanes on the road by analyzing camera footage. Based on this, it determines its own position and whether it is drifting out of its lane, which supports safe driving.
  • Traffic Sign Recognition
    Cameras on cars can collect traffic sign information and recognize things like the type of sign and its location. This allows the car to understand the information and communicate it to the driver, for example as an audible warning.


Augmented reality

Augmented reality is a technology that adds virtual elements to the existing real world. AR works with computer vision technology to display virtual elements on the screen based on images recognized by the camera, paths taken, and more.

Computer vision and AR are being used to provide translations of written text, or to apply filters directly to objects in the world we see. For example, styleAR, from the virtual fitting startup DeepPixel, is an AI-powered virtual fitting solution for jewelry, beauty, and fashion products based on next-generation virtual/augmented reality. It provides a fitting solution for products such as rings, watches, earrings, and bracelets through computer vision that precisely recognizes and tracks specific areas of the body in real time.

Medical imaging

Medical imaging is an application area of computer vision that uses X-ray and 3D scan images, such as MRIs, to classify diseases like pneumonia and cancer. With computer vision, early diagnosis of diseases that could not be identified by the human eye has become possible. It can also monitor a patient's condition in real time, helping to predict problems and take action.

Intelligent video analytics

Computer vision has helped develop cutting-edge algorithms for security camera monitoring through methods such as pose estimation, face and person detection, and object tracking. Object detection, for example, can show how shoppers interact with products in retail stores, and it is also applied in factories, airports, and transportation hubs. It has also made intelligent video analytics (IVA) possible wherever CCTV cameras are present, such as watching queue lengths or tracking people approaching restricted areas.

Manufacturing and construction

The development of computer vision systems, such as defect detection and safety inspection, has allowed for process improvement and quality enhancement of manufactured products. Additionally, 3D vision allows for efficient inspection on production lines where humans are not available.


Optical character recognition (OCR)

The oldest application of computer vision, and one of its core technologies, is Optical Character Recognition (OCR). As early as 1974, a simple optical character recognition algorithm was developed, and today it has been further developed with deep learning. OCR can now detect and translate text in natural environments and random places, without human supervision.


Retail

With AI stores like "Amazon Go" popping up across the U.S., retail is turning to computer vision to achieve goals like increased productivity. The term "retail tech" began to be used around 2018, as the need for personalized shopping meant that the days of just selling more were over for retailers, so it became important to use technology to improve "behind-the-scenes systems" such as fulfillment, curation, and logistics.

What can we do going forward?

We've covered some of the latest trends and applications of computer vision above. To summarize the future value of computer vision, here's what we know.

Computer vision applications can be used in a myriad of ways in a variety of fields. The day may not be far off when computer vision systems surpass the human eye.

Bottom line: Computer vision is making headway in many areas, and training data quality is critical for accurate performance.

What you need to know before adopting and deploying computer vision

Training data quality plays a critical role in the performance and accuracy of computer vision. This is why it's important to consider the variety and quality of data when collecting training data, and to process it with appropriate procedures to ensure that the model is not biased.

Regular data updates and data governance are also critical to maintaining the performance of your model. Data is constantly changing, and new patterns and situations arise, so training a model solely on existing data will inevitably lead to poor performance. Especially for data that occurs in real time, not incorporating new data can lead to predictive errors in the model. Continuously adding new data increases the flexibility and scalability of your model. This increases the predictive power of your model, enables efficient analysis and reproduction, and helps you deliver greater business value.

Computer vision is a revolutionary technology for analyzing and interpreting image and video data. It is already in active use in a variety of fields, and as adoption grows, training data quality is becoming increasingly important. Therefore, when adopting computer vision in your business, you should always pay attention to the quality and updating of your data, which will help maximize the accuracy and performance of your models.
