Awesome Semi-Supervised Learning

A curated list of awesome Semi-Supervised Learning resources. Inspired by awesome-deep-vision, awesome-deep-learning-papers, and awesome-self-supervised-learning.

Background

What is Semi-Supervised Learning?

It is a special form of classification. Traditional classifiers use only labeled data (feature / label pairs) to train. Labeled instances, however, are often difficult, expensive, or time-consuming to obtain, as they require the efforts of experienced human annotators. Meanwhile, unlabeled data may be relatively easy to collect, but there have been few ways to use it. Semi-supervised learning addresses this problem by using a large amount of unlabeled data, together with the labeled data, to build better classifiers. Because semi-supervised learning can require less human effort and can give higher accuracy, it is of great interest both in theory and in practice.

How many semi-supervised learning methods are there?

Many. Some often-used methods include: EM with generative mixture models, self-training, consistency regularization, co-training, transductive support vector machines, and graph-based methods. With the advent of deep learning, the majority of these methods were adapted and integrated into existing deep learning frameworks to take advantage of unlabeled data.
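To give a feel for one of the methods listed above, here is a minimal sketch of self-training in plain Python. It is an illustration under stated assumptions, not a reference implementation: the base learner is a toy 1-D nearest-centroid classifier, and the names `fit_centroids`, `predict_with_margin`, `self_train`, and the `min_margin` confidence threshold are all hypothetical choices made for this example.

```python
from statistics import mean


def fit_centroids(points, labels):
    """Fit a nearest-centroid model: one mean per class (toy base learner)."""
    classes = sorted(set(labels))
    return {c: mean(x for x, y in zip(points, labels) if y == c) for c in classes}


def predict_with_margin(centroids, x):
    """Return (predicted class, margin); assumes at least two classes.

    The margin is the distance gap between the two closest centroids and
    serves as a crude confidence score.
    """
    ranked = sorted(centroids, key=lambda c: abs(x - centroids[c]))
    best, runner_up = ranked[0], ranked[1]
    margin = abs(x - centroids[runner_up]) - abs(x - centroids[best])
    return best, margin


def self_train(labeled_x, labeled_y, unlabeled_x, min_margin=1.0, max_rounds=10):
    """Self-training loop: pseudo-label confident points, then refit."""
    xs, ys = list(labeled_x), list(labeled_y)
    pool = list(unlabeled_x)
    for _ in range(max_rounds):
        centroids = fit_centroids(xs, ys)
        still_unlabeled = []
        added = 0
        for x in pool:
            label, margin = predict_with_margin(centroids, x)
            if margin >= min_margin:
                # Confident prediction: treat it as a (pseudo) label.
                xs.append(x)
                ys.append(label)
                added += 1
            else:
                still_unlabeled.append(x)
        pool = still_unlabeled
        if added == 0:
            break  # nothing confident left to add
    return fit_centroids(xs, ys), pool


# Two labeled points and seven unlabeled ones (made-up toy data).
centroids, leftover = self_train(
    [0.0, 10.0], [0, 1], [1.0, 2.0, 3.0, 7.0, 8.0, 9.0, 5.2]
)
print(centroids, leftover)
```

On this toy data the class centroids move from the two labeled points toward the centers of the unlabeled clusters, while the ambiguous point near the boundary stays unlabeled. Deep learning variants follow the same loop but replace the centroid model with a neural network and the margin with a softmax confidence threshold.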

How do semi-supervised learning methods use unlabeled data?

Semi-supervised learning methods use unlabeled data to either modify or reprioritize hypotheses obtained from labeled data alone. Although not all methods are probabilistic, it is easier to look at methods that represent hypotheses by p(y|x), and unlabeled data by p(x). Generative models have common parameters for the joint distribution p(x,y). It is easy to see that p(x) influences p(y|x). Mixture models with EM are in this category, and to some extent so is self-training. Many other methods are discriminative, including transductive SVM, Gaussian processes, information regularization, graph-based methods, and the majority of deep learning based methods. Original discriminative training cannot be used for semi-supervised learning, since p(y|x) is estimated ignoring p(x). To solve the problem, p(x)-dependent terms are often brought into the objective function, which amounts to assuming p(y|x) and p(x) share parameters.

(source: SSL Literature Survey.)
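As an illustration of the two cases described above (not taken from the survey), the objectives can be sketched as follows. The notation is an assumption made for this example: θ denotes the model parameters, examples 1..l are labeled, examples l+1..l+u are unlabeled, λ is a weighting coefficient, and H is the Shannon entropy. The discriminative case shows entropy regularization as one possible p(x)-dependent term; other methods use different terms.

```latex
% Generative case: labeled and unlabeled data share the parameters \theta
% of the joint distribution; unlabeled points enter through the marginal p(x).
\ell_{\mathrm{gen}}(\theta)
  = \sum_{i=1}^{l} \log p(x_i, y_i \mid \theta)
  + \sum_{j=l+1}^{l+u} \log \sum_{y} p(x_j, y \mid \theta)

% Discriminative case: the labeled-data likelihood is augmented with a term
% that depends on the unlabeled inputs (here, the prediction entropy).
\ell_{\mathrm{disc}}(\theta)
  = \sum_{i=1}^{l} \log p(y_i \mid x_i, \theta)
  - \lambda \sum_{j=l+1}^{l+u} H\big(p(y \mid x_j, \theta)\big)
```

In the first objective the inner sum over y is just p(x_j | θ), which is why changing the unlabeled sample changes the fitted p(y|x). In the second, setting λ = 0 recovers purely supervised training that ignores p(x).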

An example of the influence of unlabeled data in semi-supervised learning. (Image source: Wikipedia)

Contributing

If you find any errors, or you wish to add some papers, please feel free to contribute to this list by contacting me or by creating a pull request using the following Markdown format:

- Paper Name. 
  [[pdf]](link) 
  [[code]](link)
  - Author 1, Author 2, and Author 3. *Conference Year*

and adding them to the corresponding markdown file in files/.

Books

Semi-Supervised Learning Book. Olivier Chapelle, Bernhard Schölkopf, Alexander Zien. IEEE Transactions on Neural Networks 2009

Codebase

Surveys & Overview

Realistic Evaluation of Deep Semi-Supervised Learning Algorithms. Avital Oliver, Augustus Odena, Colin Raffel, Ekin D. Cubuk, Ian J. Goodfellow. NeurIPS 2018
Semi-Supervised Learning Literature Survey. Xiaojin Zhu. 2008
An Overview of Deep Semi-Supervised Learning. Yassine Ouali, Céline Hudelot, Myriam Tami. 2020
A survey on semi-supervised learning. Jesper E Van Engelen, Holger H Hoos. 2020
A Survey on Deep Semi-Supervised Learning. Xiangli Yang, Zixing Song, Irwin King. 2021

Computer Vision

Image Classification: list of papers here
Semantic and Instance Segmentation: list of papers here
Object Detection: list of papers here
Other tasks: list of papers here

Note that for image and object segmentation tasks, we also include weakly-supervised learning methods that use weak labels (e.g., image classes) for detection and segmentation.

NLP

List of papers here

Generative Models & Tasks

List of papers here

Graph Based SSL

List of papers here

Theory

List of papers here

Reinforcement Learning, Meta-Learning & Robotics

List of papers here

Regression

List of papers here

Other

List of papers here

Talks

Semi-Supervised Learning and Unsupervised Distribution Alignment. CS294-158-SP20 UC Berkeley.
Semi-supervised learning with GANs. Pydata, Andreas Merentitis, Carmine Paolino, Vaibhav Singh.
Overview of Unsupervised & Semi-supervised learning. AISC, Shazia Akbar.
Semi-Supervised Learning, [slides]. CMU Machine Learning 10-701, Tom M. Mitchell.

Thesis

Fundamental limitations of semi-supervised learning. Tyler Tian Lu.
Semi-Supervised Learning with Graphs. Xiaojin Zhu.
Semi-Supervised Learning for Natural Language. Percy Liang.

Blogs

Learning with not Enough Data Part 1: Semi-Supervised Learning. Lilian Weng.
An overview of proxy-label approaches for semi-supervised learning. Sebastian Ruder.
The Illustrated FixMatch for Semi-Supervised Learning. Amit Chaudhary.
An Overview of Deep Semi-Supervised Learning. Yassine Ouali.
Semi-Supervised Learning in Computer Vision. Amit Chaudhary.
