Learning
In the context of AI training, active learning is the process of selecting, from a pool of unlabeled data, the samples that are likely to be most effective for training a model. Here, "effective" means: if the expected performance of a well-trained model on your entire dataset is 100, you might ask, "How can I get to 100 faster?" or "Can I get beyond that?" in terms of sampling speed.
From the AI model's point of view, these are problems (data) it hasn't studied yet, so how can it know which ones will help it most? In fact, humans often know this intuitively: which problems are more difficult, and which chapters are most important to study right now.
At DataHunt, we use an AI model to do the pre-labeling and a human worker to correct its output. To illustrate, let's set up a scenario.
I (the human worker) have injured my hand, so solving problems by myself takes a long time. Ahead of a school exam, I asked my friend (the AI) to pre-label a cheat sheet for me. The better my friend studies, the fewer problems I have to solve myself, and the better my exam grade will be. So I, being a good student, decided to give my friend a lesson.
Cheat Sheet Version 1
At first glance, this sounds great, but the model will likely be a not-so-smart one trained on a small initial amount of data. It's hard to trust a cheat sheet from a friend who hasn't studied properly... it's more likely to mess with your head.
So I decided to tutor my friend periodically. It's hard to sit with someone 24 hours a day to help them study, so I'll do it at a reasonable interval.
Cheat Sheet Version 2
The more we work, the more the model learns. If we hold a lesson once a day, we can build a model that gets smarter every day, and the cheat sheets it produces will get closer and closer to the right answers and become more reliable.
However, even after studying everything once, it's unlikely my friend will be equally skilled at every topic; some things he understands well, others he doesn't. So it would be more effective to focus on the topics he doesn't know well.
Cheat Sheet Version 3
I'm proud of my friend for identifying his weaknesses and studying them further. I look forward to taking the test with the cheat sheet he created and seeing whether I can get a perfect score.
Here's an illustration of this process in Active Learning.
In this diagram, there are several elements of active learning.
Let's take a look at the most researched and important of these sampling methods, and how we at DataHunt have used Active Learning to effectively speed up our sampling efforts.
There are actually several different methodologies within Active Learning.
Membership Query Synthesis (where you synthesize the data you want the current model to learn), Stream-based Selective Sampling (where you look at the data as it arrives sequentially and decide whether to learn from it), and Pool-based Sampling (where you look at all the data you have and decide what to learn first). Today we're going to talk about Pool-based Sampling, and I'll explain why below.
At DataHunt, we usually apply pool-based sampling among the active learning strategies, because we typically start with a full dataset in hand, and pool-based methods are the most studied in real-world services, where you usually have a lot of data to begin with.
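To make the pool-based idea concrete, here is a minimal, runnable sketch of the loop: retrain, score the whole pool, move the most uncertain batch to the labeled set. The `train`, `uncertainty`, and `annotate` functions below are toy stand-ins of our own invention, not DataHunt's actual pipeline.

```python
def train(labeled):
    # Stub "model": just the mean of the labeled inputs.
    return sum(x for x, _ in labeled) / len(labeled)

def uncertainty(model, x):
    # Stub score: inputs far from what the model has seen count as harder.
    return abs(x - model)

def annotate(x):
    # Stand-in for the human worker correcting a pre-label.
    return x >= 5

def pool_based_loop(labeled, pool, rounds=3, batch_size=2):
    """Each round: retrain, score every pool sample, then hand the
    most uncertain batch to the annotator and fold it into training."""
    for _ in range(rounds):
        model = train(labeled)
        pool.sort(key=lambda x: uncertainty(model, x), reverse=True)
        batch, pool = pool[:batch_size], pool[batch_size:]
        labeled += [(x, annotate(x)) for x in batch]
    return labeled, pool

labeled, pool = pool_based_loop([(4, False), (6, True)], list(range(10)))
```

The point of the sketch is the shape of the loop, not the stubs: the entire pool is rescored each round, which is exactly why scoring speed matters below.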
There are a number of ways to do this, but Uncertainty Sampling is gaining traction because it's easy to implement, broadly applicable, and fast at sampling. (Speed matters: even while the model is resting, work is still going on, so you need to deliver pre-labels as quickly as possible.)
Uncertainty Sampling is exactly what it sounds like: a way to pick data based on uncertainty. The premise is that the more confused the model is about the answer, the harder the problem. There are three main ways to measure uncertainty, the most famous being an entropy-based calculation called Maximum Entropy.
Let's look at the simplest classification problem, where the model classifies 3 classes and produces the results shown below for each input.
Least Confident: select the data with the lowest probability for the most likely (Top-1) class.
Margin Sampling: select the data with the smallest difference in probability between the most likely (Top-1) class and the next most likely (Top-2) class.
Maximum Entropy: select the data with the highest entropy of the predicted distribution.
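The three criteria can be sketched in a few lines each. The probability vectors `x1`–`x3` below are hypothetical 3-class predictions chosen for illustration, not data from our experiments; each scoring function is written so that a higher score means more uncertain.

```python
import math

def least_confident(probs):
    # Higher = more uncertain: 1 minus the Top-1 probability.
    return 1.0 - max(probs)

def margin(probs):
    # A smaller Top-1/Top-2 gap means more uncertain, so negate the gap.
    top1, top2 = sorted(probs, reverse=True)[:2]
    return -(top1 - top2)

def entropy(probs):
    # Maximum Entropy: highest for a near-uniform distribution.
    return -sum(p * math.log(p) for p in probs if p > 0)

# Three hypothetical 3-class predictions (each row sums to 1).
preds = {
    "x1": [0.90, 0.08, 0.02],
    "x2": [0.40, 0.35, 0.25],
    "x3": [0.45, 0.44, 0.11],
}

picks = {fn.__name__: max(preds, key=lambda k: fn(preds[k]))
         for fn in (least_confident, margin, entropy)}
print(picks)  # {'least_confident': 'x2', 'margin': 'x3', 'entropy': 'x2'}
```

Note that Margin Sampling picks `x3` (tiny Top-1/Top-2 gap) while the other two pick `x2` (flattest overall distribution), which shows how the criteria can genuinely disagree.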
Each method can give different results, but the Maximum Entropy method is known to work best (information theory emphasizes entropy for a reason).
In our example we used just three data points, but in practice you have to pick what to learn first from a large dataset of anywhere from 10,000 to 100,000 samples. It's a time-consuming process, and there is a reason we don't score all of our data; below I'll explain what we did at DataHunt and how the experiment turned out.
A study by Beluch et al. states that selecting the K most uncertain samples from the whole unlabeled set is not very effective: when the number or complexity of classes is small relative to the dataset size, the selected samples easily duplicate information. The key, therefore, is to sample from a smaller subset.
The size of the subset can vary by experiment; in ours, we set it to 10% of the total data.
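Put together, the idea is: draw a random subset first, then take the K most uncertain samples from that subset only. Here is a minimal sketch under that assumption; `subset_uncertainty_sample` and the toy score table are our own illustrative names, and the 10% fraction follows the setting described above.

```python
import random

def subset_uncertainty_sample(pool_scores, subset_frac=0.10, k=5, seed=0):
    """Pick the K most uncertain samples, but only from a random subset
    of the pool, so near-duplicate hard samples don't crowd the batch.
    `pool_scores` maps sample id -> uncertainty score (higher = harder)."""
    rng = random.Random(seed)
    ids = list(pool_scores)
    subset = rng.sample(ids, max(1, int(len(ids) * subset_frac)))
    return sorted(subset, key=lambda i: pool_scores[i], reverse=True)[:k]

# Toy pool of 100 samples with made-up uncertainty scores.
scores = {f"img_{i}": i / 100 for i in range(100)}
picked = subset_uncertainty_sample(scores, subset_frac=0.10, k=3)
```

With `subset_frac=0.10` the top-K search runs over only 10 of the 100 samples, which is also why this scales to pools of 10,000–100,000.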
How to experiment
With this setup, mAP did not improve: 0.41 → 0.41, where the comparison group is a model trained without active learning, on randomly selected data.
Why did the algorithm, which was clearly effective in the paper, not work in this experiment? There are a number of possible reasons.
We decided to tweak the third of these - the sampling method - because the first can have a significant impact on task scheduling, and the second is harder to control.
If you think about it, what are the implications of this method? Let me give you an example.
Imagine a school with 1,000 students: you arbitrarily divide them into 5 classes, and then select 100 students from each class in order of their grades.
Rather than collecting only the data the model currently struggles with, wouldn't it be more effective to train on a slightly wider range? Picking only the hardest samples biases the training data toward the decision boundary, which leads to learning that is skewed toward certain classes, or to less generalized learning that ignores the distribution of other data that will need to be learned later. So we proposed a different method to solve this problem.
Experiment method
In sampling the data, we also perform truly random sampling from the rest of the pool, to build a generalized subset that is not overly concentrated on the decision boundary. The ratio of uncertainty-sampled to randomly-sampled data could be 50:50, or it could be 30:70; for this experiment, we didn't want to go too far out on a limb and went with 50:50.
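The mixed batch can be sketched as follows. This is an illustrative implementation of the 50:50 idea, not our production code; `mixed_sample` and the toy score table are hypothetical names.

```python
import random

def mixed_sample(pool_scores, k=10, uncertain_ratio=0.5, seed=0):
    """Fill part of the batch with the most uncertain samples and the
    rest with uniform random picks from the remainder of the pool, so
    the batch is not concentrated on the decision boundary."""
    rng = random.Random(seed)
    n_uncertain = int(k * uncertain_ratio)
    ranked = sorted(pool_scores, key=pool_scores.get, reverse=True)
    uncertain_part = ranked[:n_uncertain]          # hardest samples
    random_part = rng.sample(ranked[n_uncertain:], k - n_uncertain)
    return uncertain_part + random_part

# Toy pool of 100 samples with made-up uncertainty scores.
scores = {f"img_{i}": i / 100 for i in range(100)}
batch = mixed_sample(scores, k=10, uncertain_ratio=0.5)
```

Changing `uncertain_ratio` to 0.3 would give the 30:70 split mentioned above; the rest of the loop stays the same.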
The result improved from 0.41 → 0.43: a meaningful gain in sampling performance over multiple iterations, despite training on the same amount of data.
In this article, we described Active Learning and showed how we applied it to speed up sampling operations in DataHunt.
The model that delivers pre-labels to improve workers' efficiency was not trained without a strategy: although the strategy is not very complicated, we were able to build a better model through our own experimentation.