What is Image Segmentation? - Definition, Types, Use case, Deep Learning, Trends

From the definition of image segmentation to Segment Anything

Sangsun Moon

What is Image Segmentation?

Definition of Image Segmentation

Image segmentation is a computer vision technique that extends image classification: in addition to classifying what an image contains, it determines what is present at the pixel level. It outlines the boundaries of objects to establish what they are and where they are, and it labels the different regions of an image individually.

Image segmentation is one of the most important fields in computer vision. In machine learning more broadly, segmentation refers to the process of separating data into discrete groups. In deep learning, image segmentation is all about creating a segment map that associates a label or category with every pixel in an image, for example so that a self-driving car can identify vehicles, pedestrians, traffic signs, and other features of the road.

Input Image to Segmentation Labels

Image segmentation and video segmentation

Image segmentation is a subdomain of computer vision and digital image processing. It aims to group similar regions or segments of an image into their respective class labels.

Image classification vs Object detection vs Image Segmentation | Deep Learning Tutorial 28

Image segmentation also extends image classification: in addition to classifying objects, it pinpoints where they are in the image. To summarize:

  • Image Classification: Gives only one piece of information, a single label, about the entire image.
  • Image Segmentation: Draws boundaries within an image and labels each bounded region independently.

Video segmentation is the task of segmenting a video into different regions based on characteristics such as object boundaries, motion, color, texture, or other visual features. The goal of video segmentation is to identify and separate different objects from the background and temporal events in a video to provide a more detailed and structured representation of the visual content.

Video segmentation is a very important task in the field of computer vision and multimedia because it allows for the identification and characterization of individual objects and events in a video, as well as the organization and classification of video content. Various techniques for video segmentation are constantly being developed to maximize accuracy and efficiency.

Image Segmentation Types

Segmentation is a key technique in machine learning for dividing data into groups with similar characteristics. Used well, it can make AI models more accurate and efficient. The types of segmentation, their principles, and the models for each type can be thought of as an evolution from A to D.

Source: Computer vision algorithms and hardware implementations

Semantic segmentation

Semantic segmentation is the task of assigning a class label to every single pixel in an input image. The resources below will help you understand the differences between the different ways of working with computer vision.

Semantic segmentation is a type of image segmentation that assigns a semantic label to each pixel in an image, meaning that each pixel is assigned a label that describes its content, such as "road", "tree", or "building".
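
As a minimal sketch of what "a label for each pixel" means in practice, the snippet below turns per-pixel class scores into a label map by picking the highest-scoring class at each position. The class names and the score values are made up for illustration; a real model would produce the scores.

```python
import numpy as np

# Hypothetical class names, for illustration only.
CLASS_NAMES = ["road", "tree", "building"]


def scores_to_label_map(scores: np.ndarray) -> np.ndarray:
    """Turn per-pixel class scores of shape (C, H, W) into a label map (H, W)."""
    return np.argmax(scores, axis=0)


# Toy 2x3 "image" with made-up scores for the three classes.
scores = np.array([
    [[0.90, 0.80, 0.10], [0.70, 0.20, 0.10]],  # road
    [[0.05, 0.10, 0.80], [0.20, 0.70, 0.10]],  # tree
    [[0.05, 0.10, 0.10], [0.10, 0.10, 0.80]],  # building
])

label_map = scores_to_label_map(scores)
named = [[CLASS_NAMES[i] for i in row] for row in label_map]
print(label_map)
print(named)
```

Every pixel ends up with exactly one class index, which is the defining property of a semantic segmentation output.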

Semantic segmentation

Semantic segmentation is based on the principle that the content of an image can be divided into several semantic classes. These semantic classes can then be used to identify and track objects in the image. Semantic segmentation can be trained with supervised or unsupervised learning. In the former, you train a machine learning model on a dataset of images that have been manually labeled with semantic classes, and the trained model then labels new images. The latter doesn't require a labeled image dataset; instead, it uses a variety of methods to learn image labels without any prior knowledge.

Semantic segmentation networks started out as slight modifications of image classification models. The first successful semantic segmentation network was the fully convolutional network (FCN), and there are now many different architectures with improvements large and small. Some of the most talked-about semantic segmentation models in recent years are DeepLab, FastFCN, DeepLabV3, and Transformer-based models.

Instance segmentation

Instance segmentation is a type of image segmentation that identifies and segments individual objects in an image, meaning it assigns a unique label to each object in the image and can also identify the boundaries of each object.

Instance segmentation

Instance segmentation identifies and segments individual objects in an image based on their appearance or context. Its range of applications has recently grown across many fields, and it increasingly affects our daily lives. Many instance segmentation systems are built on the Mask R-CNN architecture.

Techniques for performing instance segmentation can be divided into two categories.

  • Bottom-up method: Starts from individual pixels in an image, then groups them together to form objects.
  • Top-down method: Starts from the entire scene, identifies individual objects in it, and then segments each one.
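
The bottom-up idea can be sketched with connected-component labeling: start from foreground pixels and group neighbors into objects. This is only an illustration under the assumption that a binary foreground mask is already available and that touching pixels belong to the same object; production instance segmentation models learn the grouping instead.

```python
from collections import deque

import numpy as np


def label_instances(mask: np.ndarray) -> np.ndarray:
    """Group nonzero pixels into 4-connected instances.

    Returns an integer map where 0 is background and 1..K are instance ids.
    """
    h, w = mask.shape
    instances = np.zeros((h, w), dtype=int)
    next_id = 0
    for r in range(h):
        for c in range(w):
            if mask[r, c] == 0 or instances[r, c] != 0:
                continue  # background, or already assigned to an instance
            next_id += 1
            instances[r, c] = next_id
            queue = deque([(r, c)])
            while queue:  # breadth-first flood fill from the starting pixel
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and mask[ny, nx] != 0 and instances[ny, nx] == 0:
                        instances[ny, nx] = next_id
                        queue.append((ny, nx))
    return instances


mask = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
])
instances = label_instances(mask)
print(instances)
```

Each separate blob receives its own id, which is the "unique label per object" that distinguishes instance segmentation from semantic segmentation.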

Panoptic segmentation

Computer vision has allowed artificial intelligence to mimic our eyes; researchers are now trying to mimic our senses and our mind's ability to evaluate and understand the world around us. Panoptic segmentation can be thought of as a merger of semantic segmentation, which simply distinguishes classes, and instance segmentation, which understands the boundaries of individual objects.

Panoptic segmentation models can not only identify and segment individual objects in an image, but can also identify the semantic content of the scene. This is based on the principle that semantic segmentation and instance segmentation are complementary: semantic segmentation allows you to identify the semantic content of a scene, while instance segmentation is used to identify and segment individual objects in the scene. By combining these two tasks, panoptic segmentation provides a higher level of accuracy and detail than either one alone.

What sets panoptic segmentation apart from other image segmentation techniques is that its output can accurately represent both "things" and "stuff". For example, people, vehicles, and trees are countable things, while roads, skies, and the like are stuff that is harder to quantify. Semantic segmentation is typically oriented toward stuff, and instance segmentation focuses on things, but panoptic segmentation is more detailed because it excludes neither.
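
One way to picture the merge is to combine a semantic map and an instance map into a single panoptic map. The sketch below assumes hypothetical class ids and an encoding of `class_id * 1000 + instance_id`; neither comes from this article, and different datasets use different encodings.

```python
import numpy as np

# Hypothetical class ids: 0 = sky, 1 = road, 2 = car (the only countable class here).
COUNTABLE_CLASSES = [2]
OFFSET = 1000  # assumed encoding: panoptic_id = class_id * OFFSET + instance_id


def merge_panoptic(semantic: np.ndarray, instance: np.ndarray) -> np.ndarray:
    """Combine a semantic map and an instance map into one panoptic map.

    Pixels of uncountable classes get instance id 0; pixels of countable
    classes keep the instance id they were given.
    """
    is_countable = np.isin(semantic, COUNTABLE_CLASSES)
    instance_ids = np.where(is_countable, instance, 0)
    return semantic * OFFSET + instance_ids


semantic = np.array([[0, 0, 0], [2, 1, 2], [1, 1, 1]])
instance = np.array([[0, 0, 0], [1, 0, 2], [0, 0, 0]])
panoptic = merge_panoptic(semantic, instance)
print(panoptic)
```

The two cars receive different ids (2001 and 2002) while all road pixels share one id (1000), so neither kind of region is dropped from the output.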

Semantic segmentation vs. Instance segmentation

Semantic segmentation vs. instance segmentation
Panoptic Segmentation: A Review

The key distinction between the two is that semantic segmentation does not distinguish between different objects belonging to the same class, whereas instance segmentation does. Some notable differences include:

  • Semantic segmentation cannot distinguish between different instances of the same category, meaning it shows all the same objects in the same color. Instance segmentation, on the other hand, can distinguish between different instances of the same category, so different chairs are separated by different colors.
  • Semantic segmentation labels every pixel directly, without a separate detection step. Instance segmentation, on the other hand, combines object detection with per-pixel segmentation.

Image Segmentation Applications

Segmentation is the foundation and core of computer vision technology projects. Image segmentation has undergone many improvements, both large and small, that have had a profound impact on our lives. Let's take a look at some use cases that utilize segmentation models below.

Object detection and scene understanding

Segmentation improves the accuracy of object detection models by identifying and segmenting individual objects in an image. Identifying the semantic content of a scene improves scene understanding, and the ability to segment objects can be used for tracking, recognition, and understanding the relationships between objects.

Image Segmentation - Use Cases for 2D and 3D Computer Vision

For example, images from cameras mounted on drones, satellites, and airplanes can be analyzed to identify buildings, land, and streets. This is used for military purposes, or in agriculture and flood monitoring. In particular, the output of the segmentation algorithm can be used to map a 3D scene to a 2D bird's-eye view.

Medical imaging

In medical image analysis, segmentation serves to identify and delineate different organs and tissues. This information can be used to diagnose diseases and create treatment plans. In fact, image segmentation is one of the most widely used techniques in medical AI today.

Dental radiography is an important part of clinical diagnosis, treatment, and surgery. In recent years, the development of systems to analyze X-ray images using clinical computational dentistry has become a hot topic. Image segmentation techniques have also been applied to automated dental radiography and analysis algorithms.

Automatically Segmenting Brain Tumors with AI | NVIDIA Technical Blog

This image shows a segmentation technique applied to a brain tumor image. A glioma, a type of malignant brain tumor, is challenging to diagnose. The medical community has struggled with manually delineating images to determine the exact location of tumor borders. Doing so not only required anatomical expertise, but was also costly and prone to human error. Once a 3D automatic segmentation model was published, an encoder-decoder convolutional neural network applied to MRI scans was able to produce predicted segmentation results that were similar to the ground truth.

Autonomous driving

Autonomous vehicles rely primarily on semantic segmentation techniques. By associating each pixel in an image with a predefined class, the vehicle interprets the ambient data collected from its sensors and cameras.

Semantic segmentation in autonomous driving
Supercharging One of the World’s Fastest Segmentation Models by 3X - Deci

Semantic segmentation provides a clear picture of every single input. As such, it can provide an excellent understanding of sensory input. In short, the result is easy for humans to understand. This information also has the advantage that autonomous vehicles can use it to understand their surroundings and plan a safe route.

Image Segmentation Structure and Principles

Recognition + Localization

The image segmentation model first extracts features from the image and uses them for two tasks: recognition (identifying what an object is) and localization (determining where it is). The features extracted here are things like color, texture, and shape. The model uses these features to partition the image into regions, creating a segment map and defining information based on pixel-by-pixel boundaries.

After the image is segmented, the model classifies each segment as either an object or background. This is done with a classifier trained on a dataset of images in which the objects they contain have been labeled.

Finally, the model locates the object in the image by identifying the bounding box around the object. A bounding box is a set of four coordinates that define the boundaries of an object.
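
As an illustration of how a bounding box relates to a segment, the sketch below derives the four coordinates from a binary mask. The `(x_min, y_min, x_max, y_max)` ordering is an assumption made for this example; other conventions, such as `(x, y, width, height)`, are also common.

```python
import numpy as np


def mask_to_bbox(mask: np.ndarray):
    """Return (x_min, y_min, x_max, y_max) of the nonzero pixels, or None if empty."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())


# Toy mask: an object occupying rows 1-3 and columns 2-4.
mask = np.zeros((5, 6), dtype=np.uint8)
mask[1:4, 2:5] = 1

bbox = mask_to_bbox(mask)
print(bbox)
```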

Encoder + Decoder

In the field of computer vision, most image segmentation models use an encoder-decoder structure. The segment map that comes out of the decoder can be thought of as a kind of map that represents the location of each object in the image.

  • Encoder: A stack of layers that extracts features from the image through a series of increasingly narrow and deep filters.
  • Decoder: A stack of layers that scales the output of the encoder up to a segmentation mask at a pixel resolution similar to that of the input image.
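
The change in resolution through an encoder and decoder can be sketched without any learned weights. The example below stands in max pooling for the encoder and nearest-neighbor upsampling for the decoder; real models interleave these steps with trained convolutions, so treat this purely as a shape walk-through.

```python
import numpy as np


def downsample(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling over an (H, W) map with even H and W."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def upsample(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling of an (H, W) map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)


image = np.arange(64, dtype=float).reshape(8, 8)

encoded = downsample(downsample(image))  # 8x8 -> 4x4 -> 2x2
decoded = upsample(upsample(encoded))    # 2x2 -> 4x4 -> 8x8

print(image.shape, encoded.shape, decoded.shape)
```

The decoder output has the same height and width as the input, which is what lets a segmentation model assign one prediction to each input pixel.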

Auto-encoder Architecture

Image Segmentation with Deep learning

Traditional Image Segmentation Techniques

The principle of how AI performs segmentation on image data is based on the idea that objects in an image or video have similar properties. By extracting features from raw data and clustering pixels based on their similarity, AI models can learn to identify objects and track their movement.
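
Clustering pixels by similarity can be illustrated with a small k-means loop over pixel intensities. This is a sketch under simplifying assumptions: a grayscale image, intensity as the only feature, and evenly spaced initial centers so the result is deterministic.

```python
import numpy as np


def kmeans_intensity(image: np.ndarray, k: int = 2, iters: int = 10):
    """Cluster pixel intensities with a plain k-means loop.

    Returns a label map with the image's shape and the final cluster centers.
    """
    pixels = image.reshape(-1).astype(float)
    # Deterministic initialisation: spread centers evenly over the value range.
    centers = np.linspace(pixels.min(), pixels.max(), k)
    labels = np.zeros(pixels.shape, dtype=int)
    for _ in range(iters):
        # Assign every pixel to its nearest center.
        labels = np.argmin(np.abs(pixels[:, None] - centers[None, :]), axis=1)
        # Move each center to the mean of the pixels assigned to it.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean()
    return labels.reshape(image.shape), centers


image = np.array([
    [10, 12, 200],
    [11, 198, 205],
    [9, 13, 202],
])
labels, centers = kmeans_intensity(image, k=2)
print(labels)
print(centers)
```

Dark and bright pixels end up in separate groups without any labeled training data, which is the core idea behind similarity-based segmentation.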

Several object recognition and localization techniques can be applied in an image segmentation model. Which technique to apply will depend on your specific application and the type of images you want to segment. Each has advantages and disadvantages, so it's worth considering them separately.

Image Segmentation Methods

  • Threshold Method: Divides an image into two regions, foreground and background, based on a threshold value. Best for images with high contrast between foreground and background.
  • Region Based Method: Finds similarities between neighboring pixels and groups them into a common class. The watershed method can be considered similar: it starts from seeds at the local maxima of the Euclidean distance map, with the constraint that no two seeds may belong to the same region of the segment map. Best for images with a wide range of features; slower on large images.
  • Edge Based Method: Also known as edge detection. Classifies which pixels in an image are edge pixels and assigns those pixels to a separate class. Best for low-contrast images; sensitive to noise.
  • Clustering Based Method: Modern image segmentation often uses a clustering algorithm, which starts from seed pixels and then expands regions around those pixels based on similarity criteria. Best for images with many objects; can be computationally expensive.
  • Deep Learning-based Method: Uses machine learning algorithms to learn the features of objects in an image, then uses the trained model to classify segments as object or background. Suitable for images with a wide variety of features; requires a large amount of training data.
Deep learning based Method

Integrating deep learning into the traditional techniques of image segmentation can be viewed as a way of improving both performance and accuracy.

Image segmentation models using deep learning are very diverse, but we'll cover a few of the most popular ones here.

Convolutional Encoder-Decoder Architecture: The encoder-decoder architecture described earlier became popular in 2015 with research such as SegNet (Badrinarayanan et al.). SegNet uses a combination of convolutional blocks and downsampling blocks to squeeze the information through a bottleneck and form a compact representation of the input. The decoder then reconstructs the input information to form a segment map that highlights regions of the input and groups them according to their classes.

SegNet Explained | Papers With Code

U-Net Model: U-Net was originally developed for the segmentation of biological microscope images. It uses data augmentation to learn efficiently from the available annotated images. The U-Net architecture consists of two parts, a contracting path and a symmetric expanding path, which capture context and enable precise localization, respectively.
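
A rough sketch of how the contracting and expanding paths fit together is shown below. It tracks feature-map shapes only: pooling and nearest-neighbor upsampling stand in for the learned convolutions, and the channel counts are arbitrary. The concatenation step reflects U-Net's skip connections, which pass features from the contracting path to the expanding path at the same resolution.

```python
import numpy as np


def pool(x: np.ndarray) -> np.ndarray:
    """2x2 max pooling on a (C, H, W) feature map with even H and W."""
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))


def up(x: np.ndarray) -> np.ndarray:
    """Nearest-neighbor 2x upsampling on a (C, H, W) feature map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


# Random values stand in for learned features.
x = np.random.default_rng(0).random((4, 16, 16))

# Contracting path: keep the features at each resolution for later reuse.
skip1 = x             # (4, 16, 16)
skip2 = pool(skip1)   # (4, 8, 8)
bottom = pool(skip2)  # (4, 4, 4)

# Expanding path: upsample, then concatenate the matching skip features.
d2 = np.concatenate([up(bottom), skip2], axis=0)  # (8, 8, 8)
d1 = np.concatenate([up(d2), skip1], axis=0)      # (12, 16, 16)

print(bottom.shape, d2.shape, d1.shape)
```

The final feature map is back at the input resolution and carries both coarse context and fine detail, which is what allows precise localization.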

Segment Anything Model - SAM

Segment Anything is an image segmentation model from Meta. Basically, it's built to segment images automatically, without human intervention.

SAM, inspired by large-scale language models, was first introduced in the Segment Anything paper by Alexander Kirillov et al. The authors wanted to open up image segmentation and expand its possibilities by open-sourcing a deep learning model that could run in real time in the browser.

Universal segmentation model - Segment Anything

Previous deep learning approaches required collecting massive amounts of training data, manually labeling it, and then taking the time to train a model on it. While this approach has resulted in well-performing models, a change in the dataset requires a significant amount of retraining. With SAM, however, users can segment almost anything in an image, and do so in real time.

DataHunt's know-how - How to improve image segmentation accuracy

The metrics for evaluating segmentation algorithms can be broadly described as pixel accuracy, the Dice coefficient, and the Jaccard index (IoU). Improving segmentation accuracy can lead to better detection in medical imaging, object detection, and many other applications. Below, we introduce suggestions for improving segmentation accuracy.

  • There are many different deep learning models for image segmentation tasks, each with their own advantages and disadvantages. It's important to choose the right model for your application to get the best results.
  • The quality of your training dataset has a huge impact on the performance of your deep learning model. It's important that your dataset is large enough to capture the diversity of the objects you're segmenting, and that your image dataset is well-labeled so that the model knows which pixels belong to each object.
  • After the model is trained, you can fine-tune the model to improve performance. Tweaking the parameters of the model to better fit your specific application should be done in conjunction with monitoring.
  • Developing new segmentation algorithms: Continuous development of new algorithms to improve accuracy is essential. Recent examples include adversarial machine learning and reinforcement learning, which we were able to use to refine our segmentation results.
  • Refine evaluation metrics: Explore new evaluation metrics beyond the traditional Dice coefficient and Jaccard index, such as the Boundary F1 score, which can better capture the quality of object boundaries.
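
The evaluation metrics named in this section (pixel accuracy, the Dice coefficient, and the Jaccard index) can be computed for binary masks as sketched below. Returning 1.0 when both masks are empty is a convention assumed here; libraries differ on how they handle that case.

```python
import numpy as np


def pixel_accuracy(pred: np.ndarray, target: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the target."""
    return float((pred == target).mean())


def iou(pred: np.ndarray, target: np.ndarray) -> float:
    """Jaccard index for binary masks: intersection over union."""
    pred, target = pred.astype(bool), target.astype(bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        return 1.0  # assumed convention when both masks are empty
    return float(np.logical_and(pred, target).sum() / union)


def dice(pred: np.ndarray, target: np.ndarray) -> float:
    """Dice coefficient for binary masks: 2 * intersection / (|pred| + |target|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    total = pred.sum() + target.sum()
    if total == 0:
        return 1.0  # assumed convention when both masks are empty
    return float(2 * np.logical_and(pred, target).sum() / total)


target = np.array([[1, 1, 0, 0], [1, 1, 0, 0]])
pred = np.array([[1, 1, 1, 0], [1, 0, 0, 0]])

print(pixel_accuracy(pred, target), iou(pred, target), dice(pred, target))
```

For the same prediction the Dice coefficient is never lower than the Jaccard index, so the two should not be compared with each other directly.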

DataHunt set out to improve segmentation accuracy with two proprietary algorithms. As a result, it was able to reduce work time by over 50% and improve accuracy.

  • Slope-based filtering: Walks through the points one by one from an arbitrary starting position, comparing slopes, and deletes points where the slope changes little.
  • Ratio-based filtering: Directly reduces the number of points by taking into account the size and curvature of the object.
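
DataHunt's algorithms are proprietary and are not reproduced here. Purely as an illustration of the general idea behind slope-based filtering, the sketch below drops polyline vertices where the direction changes by less than a chosen angle. The function name, the angle threshold, and the treatment of the outline as an open polyline are all assumptions made for this example.

```python
import math


def slope_filter(points, min_angle_deg=5.0):
    """Drop vertices where the direction barely changes.

    points: list of (x, y) vertices of an open polyline; both endpoints are kept.
    """
    if len(points) < 3:
        return list(points)
    kept = [points[0]]
    for cur, nxt in zip(points[1:-1], points[2:]):
        prev = kept[-1]  # compare against the last vertex that was kept
        angle_in = math.atan2(cur[1] - prev[1], cur[0] - prev[0])
        angle_out = math.atan2(nxt[1] - cur[1], nxt[0] - cur[0])
        turn = abs(math.degrees(angle_out - angle_in))
        turn = min(turn, 360.0 - turn)  # wrap the turn angle into [0, 180]
        if turn >= min_angle_deg:
            kept.append(cur)
    kept.append(points[-1])
    return kept


outline = [(0, 0), (1, 0), (2, 0), (3, 0), (3, 1), (3, 2), (0, 2)]
simplified = slope_filter(outline)
print(simplified)
```

Points along straight runs are removed while corners survive, so the polygon keeps its shape with fewer vertices for an annotator to adjust.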

Using the approach above, DataHunt generated polygons and measured the efficiency. On average, we've been able to achieve up to a 50% reduction in work time by applying our own AI models to tens of thousands of images, combined with thorough secondary and tertiary checks on our work. You can read more about DataHunt's know-how in this article.

Conclusion: Model performance is improving, so quality of training data continues to matter

Modern image segmentation models are based on deep learning and learn from large datasets. This allows the models to be widely generalizable, making them more versatile.

Deep learning-based image segmentation models also have limitations. Because the models require a large amount of training data, it can be difficult to get a project off the ground if you don't have access to a large labeled image dataset. Also, if a model fits its training data too closely, it may perform poorly on new data (overfitting). In this case, appropriate data augmentation can help overcome the problem.
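
For segmentation, any geometric augmentation has to be applied to the image and its mask together so that the labels stay aligned with the pixels. The sketch below shows this with random flips and 90-degree rotations; the choice of transforms and the fixed seed are assumptions made for the example.

```python
import numpy as np


def augment_pair(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random flip and rotation to an image and its mask."""
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    k = int(rng.integers(0, 4))  # number of 90-degree rotations
    return np.rot90(image, k), np.rot90(mask, k)


rng = np.random.default_rng(seed=0)
image = np.arange(16).reshape(4, 4)
mask = (image > 7).astype(np.uint8)  # toy label: "bright" pixels

aug_image, aug_mask = augment_pair(image, mask, rng)
print(aug_image)
print(aug_mask)
```

Because both arrays go through identical transforms, every pixel in the augmented image still carries the label it had in the original.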

Of course, the underlying data is more accurate than in the past because model performance is constantly improving. However, if you leave everything to AI, a learning error in the model causes a lot of confusion, and the downside is that you have to start again from scratch. DataHunt has succeeded in building 99% accurate data by refining model output with its own algorithms and by having intermediate PMs and workers carry out secondary and tertiary checks. Rather than leaving it all to the models, using them in an AI-assisted form allowed us to create a much more finished product.

