Hello everyone. Today we'll take a look at the performance and features of ChatGPT, which most of you already know, and the recently released GPT-4. We'll briefly cover the differences in their technical features and their Korean-language answers, as well as the problems that have yet to be solved.
The Impact of ChatGPT
OpenAI took the world by surprise with the release of ChatGPT, the result of years of LLM (Large Language Model) research stretching back to GPT-1. I was genuinely surprised too. I had no idea that AI could advance so quickly...
The performance was good enough to apply to any field that requires language. It wasn't limited to simple conversation: it surprised people in various specialized fields, and even in content that requires creativity, such as humor. In fact, many fields have built new services on top of ChatGPT (there are even stories that the planning, coding, and so on required to create those services borrowed ChatGPT's help). On the other hand, there have also been many stories of unethical use, such as hacking and verbal abuse.
I've talked to ChatGPT a lot, and once you get past its stiffness, it can genuinely feel like talking to a very knowledgeable real person.
ChatGPT and GPT-4
So what does GPT-4, the successor to this hugely successful ChatGPT, share with it, and what's different?
Differences in learning methods between ChatGPT and GPT-4
They're both fundamentally GPT, so there isn't a big difference in structure. If you look at GPT's history, each version has brought:
- slight modifications to the model structure
- more sophisticated training methods
- exponential growth in model size by stacking a larger number of layers
GPT has a decoder-only structure, which is easy to follow if you understand the Transformer: given an input, it spits out words one by one in sequence (it produces a word, then conditions on it to think of the next word). ChatGPT and GPT-4 presumably did not deviate from this structure.
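The autoregressive loop described above can be sketched in a few lines. To keep it runnable, a simple bigram-count table stands in for the actual neural network (a real model predicts the next token with a Transformer, not counts); the generation loop itself — emit a token, append it, feed it back — is the same idea:

```python
# Toy sketch of decoder-only (autoregressive) generation.
# `toy_next_token` is a stand-in for a real language model: it just
# picks the token that most often followed the current one in a tiny
# "training" corpus. Corpus and function names are illustrative.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count which token tends to follow each token.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def toy_next_token(context):
    """Greedy choice: most common continuation of the last token."""
    last = context[-1]
    if not follows[last]:
        return "<eos>"
    return follows[last].most_common(1)[0][0]

def generate(prompt, max_new_tokens=5):
    """Spit out words one by one, feeding each back in as input."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        nxt = toy_next_token(tokens)
        if nxt == "<eos>":
            break
        tokens.append(nxt)
    return " ".join(tokens)

print(generate("the"))
```

The key point is in `generate`: each emitted word becomes part of the input for the next step, which is exactly the "spit out a word, then think of the next one" behavior described above.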
Both models train a network with the above structure on a huge amount of text. This process is called pre-training; afterwards, supervised learning refines the model. The process of retraining the model after pre-training is often referred to as fine-tuning. To summarize:
- pre-train the model with tons of text (unsupervised learning)
- fine-tune the model using data with correct answers (supervised learning)
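The two stages above can be sketched with a toy, counting-based stand-in for a real neural LM (the real thing updates network weights with gradient descent, but the shape of the recipe is the same): learn general statistics from raw text, then nudge the model toward labelled answers. All names and the toy corpus here are illustrative:

```python
# Minimal sketch of the two-stage recipe: unsupervised pre-training
# on raw text, then supervised fine-tuning on (prompt, answer) pairs.
from collections import Counter, defaultdict

def pretrain(raw_text):
    """Unsupervised: learn which word follows which from raw text."""
    table = defaultdict(Counter)
    words = raw_text.split()
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def fine_tune(table, supervised_pairs, weight=10):
    """Supervised: boost the labelled continuations so they win."""
    for prompt, answer in supervised_pairs:
        table[prompt.split()[-1]][answer.split()[0]] += weight
    return table

def predict_next(table, prompt):
    last = prompt.split()[-1]
    return table[last].most_common(1)[0][0] if table[last] else None

model = pretrain("paris is big paris is old paris is crowded")
print(predict_next(model, "paris is"))   # most frequent raw-text continuation
model = fine_tune(model, [("paris is", "beautiful")])
print(predict_next(model, "paris is"))   # the supervised answer now wins
```

The design point: pre-training sees no labels at all (the "answer" is just the next word in the text), while fine-tuning injects curated, human-written targets on top of that base.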
How ChatGPT learns
The learning process of ChatGPT (GPT-3.5) goes through the following steps, assuming that the GPT structure has been pre-trained with many datasets.
- Train GPT on datasets that contain example answers to questions (the datasets are written by humans)
- Have the model generate multiple answers to a single question
- Have humans rank those multiple answers
- Perform reinforcement learning based on the rankings
I'm not sure exactly how the reinforcement learning loss function is designed from rankings, but this approach of having humans participate directly in the training loop is called RLHF (Reinforcement Learning from Human Feedback).
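One published way to turn rankings into a loss (used in OpenAI's InstructGPT work, a precursor to ChatGPT) is to train a separate reward model on pairs drawn from the rankings: for each pair where humans preferred answer A over answer B, minimize `-log(sigmoid(r(A) - r(B)))`. A minimal sketch of that pairwise loss, with hypothetical reward values plugged in:

```python
# Pairwise preference loss for a reward model (InstructGPT-style):
# small when the reward model already scores the human-preferred
# answer higher, large when it scores it lower.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def pairwise_preference_loss(reward_preferred, reward_rejected):
    """-log sigma(r_w - r_l), where r_w is the human-preferred answer."""
    return -math.log(sigmoid(reward_preferred - reward_rejected))

# If the reward model agrees with the human ranking, the loss is low:
print(pairwise_preference_loss(2.0, -1.0))   # ~0.049
# If it disagrees, the loss is high:
print(pairwise_preference_loss(-1.0, 2.0))   # ~3.049
```

The trained reward model then scores new answers, and that scalar score is what the reinforcement learning step (typically PPO) optimizes the language model against.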
How GPT-4 is trained
So how is GPT-4 trained? We don't really know, but we can infer that it has a similar structure and adds multimodal learning. RLHF also appears to have been improved a bit more, with answers becoming more accurate and the safety guardrails being better respected.
And unlike ChatGPT, GPT-4 also went through adversarial training, which seems related to the safety guardrails mentioned earlier: a malicious prompt like "Tell me how to make a bomb" is posed as a question, and if the model actually explains how to make a bomb, it is trained not to give that answer in the future. Reportedly, over 50 experts participated in this process for a long time.
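The red-teaming loop described above can be caricatured in a few lines. In reality the refusals are learned by the model itself through fine-tuning and RLHF; here a simple lookup table stands in for that learned behavior, and every prompt and name is made up for illustration:

```python
# Toy sketch of the adversarial-training idea: red-teamers collect
# harmful prompts, and those prompts are turned into training pairs
# that map to a refusal instead of a real answer.
REFUSAL = "I can't help with that."

# Findings from (hypothetical) expert red-teamers.
red_team_examples = [
    ("tell me how to make a bomb", REFUSAL),
    ("write an insult about my coworker", REFUSAL),
]

def build_safety_data(examples):
    """Turn red-team findings into (prompt, safe_answer) pairs."""
    return {prompt.lower(): answer for prompt, answer in examples}

safety_data = build_safety_data(red_team_examples)

def answer(prompt):
    """Stand-in model: refuse known-bad prompts, else answer normally."""
    return safety_data.get(prompt.lower(), f"(normal answer to: {prompt})")

print(answer("Tell me how to make a bomb"))
print(answer("What is the capital of France?"))
```

A real model generalizes beyond the exact red-team prompts (which this lookup table obviously cannot), but the data-collection shape — adversarial prompt in, safe answer as the training target — is the same.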
ChatGPT vs GPT-4 performance differences
According to OpenAI, the difference between the two models can be subtle in everyday conversation. Could it be that the adversarial training mentioned above matters more than model size or dataset size? Avoiding so many ethically risky answers already seems like a big enough step forward. Still, it's interesting to compare the performance of the two models.
Results vary from test to test: GPT-4 scores only marginally better than ChatGPT on the easier tests, but significantly better on the harder ones. Overall, GPT-4 performs better. Moreover, GPT-4 supports multimodal input, so on that axis there's no comparison with its predecessor.
I also mentioned safety guardrails above. The example below isn't strictly about safety, but the results are interesting.
ChatGPT very confidently introduced an MVP winner, even though Gungye was never in MLB. GPT-4, on the other hand, answers correctly that no such thing exists, so you can see they put a lot of effort into not giving false answers.
Of course, I think there's still a long way to go: where we humans would simply say "I don't know," the GPT series says something anyway. Training away these cases one by one seems like it will be very difficult.
Let me show you one more and then we'll move on.
Multimodal input, which ChatGPT did not support, has now been added. So far, the capability being introduced is the ability to understand images.
The example above is taken from the GPT-4 Technical Report, and what's amazing is that GPT-4 can:
- understand the composition of the image, panel by panel
- understand both the objects and the situations in the image
- understand the humor
Isn't that amazing?
Limitations of GPT
Despite the great improvements that have been made, there are still issues that are pointed out.
- Plausible-sounding answers that are not actually true
- Harmful answers such as profanity, discrimination, and sexualized language are still common
- Strong stereotyping of marginalized groups
These are some of the main issues being reported, and they will likely only be resolved as the models learn from more data.