I'm a graduate student with a strong interest in Machine Learning and Data Analysis. I also study Deep Learning (especially computer vision) in my free time.
The clumsy bird flies first (make up for less talent with an early start) & patience
The paper presents a semi-supervised learning approach, Noisy Student Training, that works well even when labeled data is abundant. Noisy Student Training extends the ideas of self-training and distillation by using equal-or-larger student models and adding noise to the student during learning. On ImageNet, (1) a teacher EfficientNet model is trained on labeled data; (2) the teacher is then used to generate pseudo labels for unlabeled images; (3) a larger EfficientNet model is trained as the student on the combination of labeled and pseudo-labeled images; (4) the process is iterated by making the student the new teacher. During student training, noise is injected via dropout, stochastic depth, and data augmentation with RandAugment. The loop is sketched below.
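A minimal sketch of that iterative loop, assuming the data already lives in tensors; the helper names (`train_model`, `pseudo_label`, `make_student`) and the single-batch training are simplifications for illustration, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F

def train_model(model, images, targets, epochs=1, lr=1e-3):
    """Standard supervised training on (possibly pseudo-labeled) data."""
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    model.train()  # noise such as dropout is only active in train mode
    for _ in range(epochs):
        loss = F.cross_entropy(model(images), targets)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

@torch.no_grad()
def pseudo_label(teacher, unlabeled_images):
    """The teacher predicts in eval mode (no noise) to produce pseudo labels."""
    teacher.eval()
    probs = teacher(unlabeled_images).softmax(dim=-1)
    confidences, labels = probs.max(dim=-1)
    return labels, confidences

def noisy_student(labeled_x, labeled_y, unlabeled_x, make_student, teacher, iterations=3):
    """Iterate: teacher labels the unlabeled data, a noised student learns from both."""
    for _ in range(iterations):
        pl_y, _ = pseudo_label(teacher, unlabeled_x)
        x = torch.cat([labeled_x, unlabeled_x])
        y = torch.cat([labeled_y, pl_y])
        student = make_student()   # equal-or-larger capacity than the teacher
        student = train_model(student, x, y)
        teacher = student          # the student becomes the next teacher
    return teacher
```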
Noise is added when training the student, forcing it to stay consistent with the teacher's predictions under perturbation. It is, however, not added when the teacher predicts pseudo labels. Two types of noise are introduced in the paper: input noise and model noise.
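For concreteness, here is one way the two noise types could be wired up with a recent torchvision: RandAugment supplies input noise, while dropout and stochastic depth act as model noise inside the student. The specific magnitudes and the use of `efficientnet_b0` are illustrative assumptions, not the paper's settings.

```python
import torchvision.transforms as T
from torchvision.models import efficientnet_b0

# Input noise: RandAugment applied to the images the student sees.
student_transform = T.Compose([
    T.RandAugment(num_ops=2, magnitude=9),  # illustrative magnitude
    T.ToTensor(),
])

# Model noise: torchvision's EfficientNet already contains dropout and
# stochastic depth; both are only active in train() mode.
student = efficientnet_b0(dropout=0.3, stochastic_depth_prob=0.2)
student.train()   # noise on while learning from pseudo-labeled data

teacher = efficientnet_b0()
teacher.eval()    # no noise when the teacher generates pseudo labels
```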
The student model is given much larger capacity so that it can surpass the teacher. Thus, the method can also be called Knowledge Expansion.
Data filtering and balancing. Images whose teacher predictions have low confidence are filtered out. To keep the class distribution of the pseudo-labeled data balanced, images in under-represented classes are duplicated, while in over-represented classes only the images with the highest teacher confidence are kept.
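A rough sketch of this filtering-and-balancing step, assuming the pseudo labels and teacher confidences are already available as tensors; the confidence threshold and the per-class target count are illustrative values, not the paper's exact numbers.

```python
import torch

def filter_and_balance(images, pseudo_labels, confidences,
                       threshold=0.3, per_class=1300):
    """Keep confident pseudo-labeled images and balance the class distribution."""
    # 1) Filter: drop images the teacher is not confident about.
    keep = confidences >= threshold
    images, pseudo_labels, confidences = images[keep], pseudo_labels[keep], confidences[keep]

    balanced_idx = []
    for c in pseudo_labels.unique():
        idx = (pseudo_labels == c).nonzero(as_tuple=True)[0]
        if len(idx) >= per_class:
            # 2a) Too many samples: keep only the most confident ones.
            top = confidences[idx].argsort(descending=True)[:per_class]
            balanced_idx.append(idx[top])
        else:
            # 2b) Too few samples: duplicate images up to the target count.
            repeats = (per_class + len(idx) - 1) // len(idx)
            balanced_idx.append(idx.repeat(repeats)[:per_class])
    balanced_idx = torch.cat(balanced_idx)
    return images[balanced_idx], pseudo_labels[balanced_idx]
```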