Guide for GAN - concepts, understanding, use cases, training, latest trends

GAN concepts, understanding, and know-how

2023.06.02 by Sangsun Moon

What is a GAN?

GAN is short for Generative Adversarial Network, a type of generative model. A GAN is a machine learning method in which a generative model and a discriminative model compete with each other to automatically create images, video, audio, and other data that are close to the real thing. The goal is for the AI to learn from real-world examples, infer what they have in common, and create highly sophisticated forgeries.

GAN principles and structure

A GAN is composed of a Generator, which learns the probability distribution of the training data and produces data that looks real, and a Discriminator, which tries to tell real data from generated data. The Generator constantly produces fake examples, and the Discriminator learns to distinguish between real data and fake data. Training the Generator to fool the Discriminator in this way is called an adversarial process.

By competing and learning from each other, the Generator produces more and more realistic data, and the Discriminator becomes better at distinguishing between real and fake data. Ultimately, the goal is for the Generator to become a generative model that produces data resembling the real data.

At some point, the Generator ideally creates fake data so convincing that the Discriminator can no longer distinguish real from fake. In this idealized picture, learning ends when the probability the Discriminator assigns to a sample being real converges to 50%.
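The 50% figure can be read off the standard GAN objective. The block below is a sketch in the usual notation; the symbols $D$, $G$, $p_{\text{data}}$, $p_g$, and $p_z$ do not appear in the text above and are introduced here only for illustration. The first line is the minimax objective, and the second is the best possible Discriminator for a fixed Generator.

```latex
\min_G \max_D V(D, G)
  = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_z}\left[\log\left(1 - D(G(z))\right)\right]

D^*(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}
\quad\Longrightarrow\quad
D^*(x) = \tfrac{1}{2} \ \text{when} \ p_g = p_{\text{data}}
```

When the Generator's distribution matches the real data distribution, the best the Discriminator can do is output 1/2 everywhere. This describes an idealized equilibrium and is not a stopping rule that is checked literally during training.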

Figure: GAN algorithm concepts

When training a GAN model, the Discriminator is trained first and the Generator is trained afterwards. During the Generator's turn, the Discriminator's weights are not updated; instead, the Generator's weights are updated using the Discriminator's error. As a result, the Generator with updated weights can generate fake data that the Discriminator is more likely to judge as real.
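The alternating update can be sketched with a deliberately tiny, one-dimensional GAN in plain Python. Everything here is an assumption made for illustration: the Generator is a linear map `a * z + b`, the Discriminator is a logistic unit `sigmoid(w * x + c)`, the real data come from a Gaussian centered at 4, and the gradients are derived by hand. Real GANs use neural networks and a deep learning framework.

```python
import math
import random


def sigmoid(s):
    """Numerically stable logistic function."""
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def discriminator_step(d, g, x_real, z, lr):
    """Update only the Discriminator; the Generator is used but not changed."""
    x_fake = g["a"] * z + g["b"]
    p_real = sigmoid(d["w"] * x_real + d["c"])
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Gradients of -[log D(x_real) + log(1 - D(x_fake))] with respect to w and c.
    grad_w = (p_real - 1.0) * x_real + p_fake * x_fake
    grad_c = (p_real - 1.0) + p_fake
    d["w"] -= lr * grad_w
    d["c"] -= lr * grad_c


def generator_step(d, g, z, lr):
    """Update only the Generator, using the error of a frozen Discriminator."""
    x_fake = g["a"] * z + g["b"]
    p_fake = sigmoid(d["w"] * x_fake + d["c"])
    # Gradient of -log D(G(z)) flows through D (weights held fixed) into a and b.
    dloss_dx = (p_fake - 1.0) * d["w"]
    g["a"] -= lr * dloss_dx * z
    g["b"] -= lr * dloss_dx


def train(steps=200, lr=0.05, seed=0):
    """Alternate one Discriminator step and one Generator step per iteration."""
    rng = random.Random(seed)
    d = {"w": 0.1, "c": 0.0}  # Discriminator parameters (illustrative start values)
    g = {"a": 1.0, "b": 0.0}  # Generator parameters (illustrative start values)
    for _ in range(steps):
        x_real = rng.gauss(4.0, 1.0)  # assumed real data distribution
        discriminator_step(d, g, x_real, rng.gauss(0.0, 1.0), lr)
        generator_step(d, g, rng.gauss(0.0, 1.0), lr)
    return d, g
```

Calling `train()` returns the final parameters of both models. The point of the sketch is the update order and which parameters each step is allowed to touch, not the quality of the result.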

GAN architecture

With the invention of GANs, the field of generative models took a concrete direction towards generating realistic images. GANs have since shown promising results in computer vision and, more recently, significant progress not only in images but also in audio and video.

Before we dive into examples of how GANs have been utilized, here's an introduction to the architecture of popular GANs.

CycleGAN

CycleGAN is often used to learn translations between images of different styles, for example converting an artistic image to a realistic one, or a horse to a zebra.
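What makes this work without paired examples is a cycle-consistency term: translating an image to the other style and back should return the original. A minimal sketch follows, assuming images are flat lists of floats and using toy functions in place of the two translator networks. The names `horse_to_zebra`, `zebra_to_horse`, and `lossy_zebra_to_horse` are hypothetical stand-ins, not part of any library.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length sequences."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, g, f):
    """L1 cycle loss: x -> g -> f should return x, and y -> f -> g should return y."""
    forward = l1_distance(f(g(x)), x)
    backward = l1_distance(g(f(y)), y)
    return forward + backward


# Toy stand-ins for the two translators (real ones are neural networks).
def horse_to_zebra(img):
    return [p + 1.0 for p in img]


def zebra_to_horse(img):
    return [p - 1.0 for p in img]


def lossy_zebra_to_horse(img):
    # Throws away all information, so the round trip cannot recover the input.
    return [0.0 for _ in img]
```

A pair of translators that invert each other gives a loss of zero, while a translator that discards information is penalized. In full CycleGAN training this term is added to the usual adversarial losses.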

StyleGAN

StyleGAN is able to learn features of the human face and generate new images of human faces that don't exist in the real world. Its generator has 26.2 million trainable parameters, so it takes a lot of training data to get the best out of this model.

You can see the human face images created with StyleGAN here.

Text2image

In this architecture, a text description is converted into an embedding, which is then combined with a noise vector and fed into the generator to produce the image. This lets you generate images for your business from a text description.
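The conditioning step can be sketched as follows. Two assumptions are made here: the embedding and the noise vector are joined by simple concatenation, and `embed_text` is a hypothetical bag-of-words stand-in for a learned text encoder. The sketch only shows how the generator's input is assembled, not the generator itself.

```python
import random


def embed_text(text, dim=8):
    """Hypothetical stand-in for a learned text encoder: a bag-of-words hash embedding."""
    vec = [0.0] * dim
    for word in text.lower().split():
        idx = sum(ord(ch) for ch in word) % dim
        vec[idx] += 1.0
    return vec


def generator_input(text, noise_dim=4, seed=None):
    """Concatenate the text embedding with a noise vector to form the generator input."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return embed_text(text) + noise
```

The same description with different noise yields different inputs, which is what would let one caption map to many images. Changing the description changes only the embedding part.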

In addition to the architectures above, the following are also worth knowing:

  • PixelRNN: An autoregressive generative model rather than a GAN. It models the discrete probability distribution of an image and predicts its pixels sequentially along the two spatial dimensions.
  • DiscoGAN: When images come from two different domains but are not paired, machine learning has struggled to capture the relationship between them. DiscoGAN has shown promising results in learning such cross-domain relationships.
  • LSGAN: Generates high-quality images by using a least-squares loss function for the discriminator.
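The least-squares loss in the last item (the model is known in the literature as LSGAN, Least Squares GAN) can be sketched in a few lines. This assumes the common label choice of 1 for real and 0 for fake, and takes plain lists of raw discriminator outputs; the function names are illustrative.

```python
def lsgan_discriminator_loss(d_real, d_fake, real_label=1.0, fake_label=0.0):
    """Least-squares discriminator loss over batches of raw discriminator outputs."""
    real_term = sum((d - real_label) ** 2 for d in d_real) / len(d_real)
    fake_term = sum((d - fake_label) ** 2 for d in d_fake) / len(d_fake)
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(d_fake, real_label=1.0):
    """Least-squares generator loss: push D's output on fakes toward the real label."""
    return 0.5 * sum((d - real_label) ** 2 for d in d_fake) / len(d_fake)
```

Unlike the log-based loss, the squared error keeps penalizing fake samples that are classified as real but still lie far from the real label.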

More useful architectures of GANs can be found in this article.

GAN Use Cases (GAN Technology, Trends)

Drag Your GAN

Maybe you've tried to use an image in your content, but the pose or expression just isn't quite right. Or maybe you want to see what happens when you make a small change to your product design. Drag Your GAN (DragGAN) is a model that lets you select any point in an image and drag it to a target position.

DragGAN consists of two components.

  • Feature-based motion supervision that drives a handle point toward its target position
  • A point-tracking approach that leverages discriminative generator features to keep locating the handle points as they move

With DragGAN, anyone can deform an image with precise control over where its pixels go. You can manipulate pose, shape, expression, and layout by moving points in the image.

Why GANs are important for building datasets

GANs play two important roles for AI training data.

First, you can solve the problem of sparse data by improving the quality and quantity of your data. Building a dataset for AI training involves collecting and processing data. If this process isn't done well, the model won't perform as well as it could.

However, when you need data for unusual situations or specific cases, data collection can be challenging. For example, you can't deliberately damage cars to train an AI model that generates accident estimates, so it would be valuable to obtain damaged-car data from a generative model. Similarly, you may have difficulty preparing training data when the data is hard to process.

Second, you can address data usability issues that arise from privacy protection. Even when data collection and processing are easy, some data is difficult to use. For example, data that contains personal information must be anonymized at scale before it can be used in enterprises, schools, and research labs. However, common privacy removal methods such as blurring and mosaics are not suitable for AI research, so techniques are needed that make the data irreversibly unidentifiable while minimizing performance degradation in AI training and testing.

In these cases, GANs can be used to build training datasets by augmenting small datasets that have already been collected and processed. A GAN-based image synthesis technique called DeepFake can also de-identify personal information by superimposing a virtual face onto an original photo or video. The idea is to retain key information that may be relevant to your research, such as the original person's facial expressions or posture, while replacing the person's identity with a new person who doesn't exist.

Figure: Facebook's Real Eye Opener, a more natural way to create an open-eyed photo than compositing eyes into a photo.

Facebook

Facebook's Real Eye Opener is a technology that generates eyes for photos in which the subject's eyes are closed, making them appear open. The idea is that the desired eye shape is reflected in a latent vector to generate an entirely new face with the eye shape replaced. The result looks much more natural and closer to a real photo of the person than simply Photoshopping eyes in.

NAVER

NAVER's webtoon <Encountered> allowed users to upload a selfie and have their own face appear in the comic. The model learned the art style of the webtoon and used the reader's facial features to generate new images in that style.

Alessio, a startup housed at NAVER D2SF, unveiled a model that uses deep learning to read and extract the features that make up a baby's face from a three-dimensional ultrasound photo of a fetus, and then predicts the baby's face after birth. The model depends on removing noise from the ultrasound photo. To do so, the company used the GAN algorithm, which is used for image generation and restoration.

Figure: OpenAI Generative Model Logic

OpenAI

OpenAI uses artificial intelligence to create art and inspire users. OpenAI's DALL-E 2 generates original, realistic visuals and art from textual descriptions. It can also modify existing photos from natural language captions, adding and removing items while taking shadows, reflections, and textures into account. (DALL-E 2 itself is built on a diffusion model rather than a GAN.)

Trained on massive image datasets, generative models have been a stepping stone to advances in computer vision. OpenAI has also worked on Optimal Transport GAN (OT-GAN), a variant of the GAN algorithm whose training objective is based on optimal transport.

Conclusion: As multimodal AI development broadens, GAN use cases will become more diverse.

Limitations of GANs

GAN training is still not very stable. In particular, a phenomenon called mode collapse makes the generated output less diverse. Once the Generator finds data that the Discriminator accepts as real, it stops producing diverse data and instead mass-produces those same samples. The Generator's objective does not require its output to be more realistic or aesthetic; it only needs to fool the Discriminator.

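For example, suppose the data consists of the digits 0 to 9 and the Generator has managed to fool the Discriminator perfectly with the digit 1. A collapsed Generator keeps producing 1s, whereas a well-built Generator should produce all digits from 0 to 9 roughly uniformly to be considered a good model. One way to encourage this is to keep the Generator from fooling the Discriminator too quickly, which can lead to more varied results.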
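One rough way to check for this kind of collapse is to count how many of the ten classes actually show up among generated samples. The sketch below assumes each generated sample has already been assigned a digit label, for instance by a separate classifier, which is a hypothetical step not shown here. It reports the number of distinct classes and a normalized entropy, where 0 means a single class and 1 means a perfectly uniform spread.

```python
import math
from collections import Counter


def mode_coverage(labels, num_modes=10):
    """Return (distinct modes seen, normalized entropy of the label histogram)."""
    counts = Counter(labels)
    total = len(labels)
    # Shannon entropy of the empirical label distribution.
    entropy = sum((c / total) * math.log(total / c) for c in counts.values())
    return len(counts), entropy / math.log(num_modes)


collapsed = [1] * 100             # Generator that only ever produces the digit 1
diverse = list(range(10)) * 10    # Generator that covers 0 to 9 uniformly
```

Here `mode_coverage(collapsed)` reports a single mode with zero entropy, while `mode_coverage(diverse)` reports all ten modes with a normalized entropy close to 1.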

GAN Trends

Training a model often requires examples of bad data, such as defective or abnormal cases, which tend to be scarce. You can train a GAN on the real bad data you have acquired and then use it to create more bad data.

The use cases for GANs will become more and more diverse. Not only can you create images for your business directly from text input, but you can also transform them freely. Image transformation in particular is closely tied to the development of AI models that affect our daily lives and convenience, in fields such as healthcare and autonomous driving.

At DataHunt, we actively utilize GANs to build data for model training. This allows us to build datasets that are difficult to collect and process by scaling up from small amounts of data.

We also utilize GAN models, which are constantly being researched and developed, to create new business opportunities. As the flexibility of the generative model evolves day by day, we believe there will be many more opportunities to utilize the GAN algorithm.

Summary

  1. A GAN is a generative model in which a Generator produces fake data to fool a Discriminator, a discriminative model that tries to tell real data from fake.
  2. In the ideal case, training ends when the Discriminator can no longer distinguish real data from the data created by the Generator and gets the correct answer only 50% of the time.
  3. GANs are being actively researched by various companies dealing with AI, such as Facebook, NAVER, and OpenAI. Although they have clear limitations, they stand out in the field of data construction through image creation and conversion.
