Charting the Path to Autonomous AI
In the paradigm of supervised learning, the available dataset consists of input data points and their associated labels, represented as a set of $n$ samples $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ denotes a data point and $y_i$ its corresponding label. However, obtaining a completely labeled dataset is often impractical due to the labor-intensive and costly nature of the labeling process.
In the supervised learning framework, the assigned task is well-defined: given an input $x_i$, the aim is to predict its corresponding output $y_i$. This characteristic, while providing clear guidance, imposes limitations on the flexibility of task definition.
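The supervised setup can be illustrated with a minimal sketch: a toy 1-nearest-neighbour classifier that, given an input $x$, predicts a label $y$ from a small labeled set. The feature values and classes here are made up purely for illustration.

```python
import numpy as np

# Hypothetical labeled dataset {(x_i, y_i)}: four 2-D points, two classes.
X = np.array([[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])

def predict_1nn(X_train, y_train, x):
    """Predict the label of x as the label of its nearest training point."""
    dists = np.linalg.norm(X_train - x, axis=1)
    return y_train[np.argmin(dists)]

pred = predict_1nn(X, y, np.array([0.95, 0.9]))  # nearest neighbour is class 1
```

The task is fully specified by the labels: nothing needs to be invented, but nothing beyond predicting $y_i$ from $x_i$ can be learned either.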
In contrast to supervised learning, the unsupervised learning framework presents us with data points $\{x_i\}$ but does not furnish the corresponding labels $\{y_i\}$. This scenario necessitates redefining the learning task, which in turn allows greater flexibility and more intricate forms of processing. Noteworthy instances of unsupervised learning include:
Generative Modelling: This approach aims to produce synthetic yet plausible data samples drawn from the same data distribution that generated the original data instances. Various methodologies, including but not limited to Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), and flow-based models, are used to achieve this objective.
Other tasks, such as clustering and anomaly detection, also fall under the umbrella of unsupervised learning.
Semi-supervised learning is a machine learning strategy that combines a small amount of labeled data with a large quantity of unlabeled data during training. This approach enhances the model's learning efficiency and prediction accuracy, particularly when obtaining fully labeled data is expensive or impractical. Examples of semi-supervised learning applications include image classification and natural language processing.
In self-supervised learning, the model is trained to predict or fill in missing or hidden parts of the input data, learning a representation of the data in the process. The goal of this approach is to find good embeddings or powerful representations of the data samples. These representations are expected to help solve a downstream task more efficiently.
For instance, given an unlabeled data sample $x$, we learn a transformation $f(x)$ such that substituting $f(x)$ for $x$ in a downstream task like classification yields better efficiency than using the raw input. A common way to evaluate this is the linear probe: a linear classifier is trained on top of the frozen representations $f(x)$, and its performance indicates how effective the representation is. Importantly, during the process of learning these representations, knowledge of the downstream task may not be available; in other words, we lack the labels needed to tailor the embeddings to it. Nevertheless, the two objectives must be interconnected, so determining a transformation $f(x)$ that transfers well to downstream tasks becomes a crucial undertaking.
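The linear-probe evaluation described above can be sketched as follows. Everything here is synthetic: the encoder $f$ is a fixed random projection standing in for a real pretrained network, and the data and labels are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a learned encoder f(x): a frozen random projection followed
# by a tanh nonlinearity. In practice f would come from self-supervised
# pretraining; here it is purely illustrative.
W_enc = 0.3 * rng.normal(size=(10, 4))
def f(x):
    return np.tanh(x @ W_enc)

# Toy labeled data for the downstream task (hypothetical).
X = rng.normal(size=(200, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Linear probe: logistic regression trained on the frozen embeddings f(x)
# by plain gradient descent. The encoder weights are never updated.
Z = f(X)
w, b = np.zeros(Z.shape[1]), 0.0
for _ in range(500):
    p = 1 / (1 + np.exp(-(Z @ w + b)))   # predicted probabilities
    w -= 0.5 * (Z.T @ (p - y) / len(y))  # gradient step on weights
    b -= 0.5 * (p - y).mean()            # gradient step on bias

acc = ((Z @ w + b > 0).astype(int) == y).mean()
```

The probe's accuracy `acc` serves as a proxy for representation quality: a better $f$ makes the classes more linearly separable in embedding space.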
The advantage of self-supervised learning is that it doesn’t require manual labeling of data, which is often labor-intensive and costly. As a result, self-supervised learning can make use of large amounts of unlabeled data available, making it a very active area of research in machine learning.
A concrete example is the paper "Unsupervised Representation Learning by Predicting Image Rotations" by Spyros Gidaris, Praveer Singh, and Nikos Komodakis (ICLR 2018).
The central idea of their approach is to pre-train convolutional neural networks (CNNs) to recognize the rotation degree of an image. The image is rotated by 0, 90, 180, or 270 degrees, and the CNN is trained to predict the rotation angle. The task of predicting the rotation angle forces the network to understand certain important features and semantics of the image, even without explicit label information.
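The data side of this pretext task is easy to sketch: each image yields four training pairs, one per rotation, with the rotation index as the pseudo-label. The images below are random arrays standing in for real photos, and the CNN itself is omitted.

```python
import numpy as np

def make_rotation_batch(images):
    """Expand each image into four (rotated image, label) pairs.

    Labels 0..3 correspond to rotations of 0, 90, 180, and 270 degrees;
    no human annotation is needed, since the labels are generated by the
    transformation itself.
    """
    rotated, labels = [], []
    for img in images:
        for k in range(4):                    # k quarter-turns counterclockwise
            rotated.append(np.rot90(img, k))  # rotates the first two axes
            labels.append(k)
    return np.stack(rotated), np.array(labels)

# Hypothetical batch of 8 random "images", shape (H, W, C) = (32, 32, 3).
imgs = np.random.default_rng(1).normal(size=(8, 32, 32, 3))
X_rot, y_rot = make_rotation_batch(imgs)
```

A CNN trained to predict `y_rot` from `X_rot` must attend to object orientation and layout, which is what makes the learned features transferable.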
The primary motivation behind this approach is to learn a representation that captures the semantics of the images in an unsupervised manner, i.e., without the need for hand-labeled data. After this pre-training phase, the CNN can be fine-tuned for specific tasks using smaller amounts of labeled data.
The authors show that their method of unsupervised pre-training provides a substantial boost in performance on different recognition tasks and datasets, including ImageNet and Places, demonstrating the power of their self-supervised learning approach. They also show empirically that their approach learns representations that are on par with, or even better than, those learned with supervised pre-training.
This paper is a significant contribution to the field of self-supervised learning, showing that simple tasks like rotation prediction can lead to useful learned representations.