t-SNE
CompSoc
by Chenna S [CompSoc]
published on Feb. 15, 2018, 11:32 p.m.

It’s practically impossible for a human to understand the structure of a dataset with 100 dimensions, and it’s fairly common to come across data with that many features. Exploratory analysis is a preliminary step before proceeding to other aspects of data wrangling and modelling. This is where t-SNE comes in.

t-SNE (t-distributed Stochastic Neighbor Embedding) is a non-linear dimensionality reduction algorithm. It takes each high-dimensional point and embeds it into a 2D/3D map, ensuring that similar objects end up nearby and dissimilar points are kept apart.
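As a minimal sketch of what that looks like in practice, here is scikit-learn's TSNE applied to the digits dataset (the dataset is an illustrative stand-in for any high-dimensional data):

```python
# Minimal t-SNE sketch: embed 64-dimensional digit images into a 2D map.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)   # X has shape (1797, 64)
X_2d = TSNE(n_components=2, random_state=0).fit_transform(X)
print(X_2d.shape)                     # (1797, 2): one 2D point per sample
```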

Uses of t-SNE

Here's a nice example of t-SNE's usage in last year's IEEE project (a GoT chatbot): https://gist.github.com/kaushiksk/ea0c0f25c082acb8604ad84466f85ca8

It’s necessary to know about other dimensionality reduction techniques like Principal Component Analysis (PCA) to truly appreciate the distinctive nature of t-SNE.

PCA

PCA tries to retain the maximum information about high-dimensional data even after embedding it into a low dimension. It does this by finding a few components (directions) in the high-dimensional space along which the variation of the data is maximised, so the variation in the data is preserved even after reducing it to a lower dimension. Mathematically, PCA minimises the squared error between points: it essentially ensures that points that are far apart in the original dataset stay far apart in the 2D map as well. So PCA preserves large pairwise distances (the global structure), but these distances do not necessarily correspond to any meaningful property, and PCA loses the low-variance deviations between neighbours. t-SNE fills this gap by accounting for local properties, which carry a lot of significance irrespective of the overall structure/topology of the data.

Moreover, PCA is not very useful for unlabelled data. This is quite evident from the visualizations produced by PCA and t-SNE on the MNIST dataset.

Output of PCA over the MNIST dataset

PCA does not produce distinct clusters: the groups are evident only because of the color-coding, so PCA is not very helpful for unlabelled data.

Output of t-SNE over the MNIST dataset

On the other hand, with t-SNE the presence of clusters is striking even for unlabelled data. At the bottom right, we can see the labelled version of the same map. Hence t-SNE finds a lot of utility with unlabelled data, over and above PCA.
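A sketch of how such a comparison can be reproduced, using scikit-learn's small digits dataset as a stand-in for MNIST (the dataset choice and plotting details are illustrative assumptions, not the exact code behind the figures above):

```python
# Side-by-side PCA vs t-SNE embeddings of the digits dataset.
# Colours show the true labels for interpretation only.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_pca = PCA(n_components=2).fit_transform(X)
X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.scatter(X_pca[:, 0], X_pca[:, 1], c=y, cmap='tab10', s=5)
ax1.set_title('PCA')
ax2.scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='tab10', s=5)
ax2.set_title('t-SNE')
plt.show()
```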

Some interesting properties of t-SNE

t-SNE is designed around the "crowding problem": in a 2D map there simply isn't enough room to place all the moderately distant neighbours of a point at faithful distances, so they get squeezed together, and t-SNE's heavy-tailed distribution counteracts this. Hence the effectiveness/relevance of t-SNE is put into question if the required output is in a fairly high dimension, where the possibility of crowding is low. PCA, on the other hand, always gives the best components along which the variance of the data is maximised.

Hence, the results of t-SNE are much better if the degrees of freedom are increased, because the crowding problem is less severe. Consequently, visualising data in 3D rather than 2D gives noticeable gains with t-SNE.
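As a quick sketch, going from 2D to 3D is a one-parameter change in scikit-learn (again using the digits dataset as an illustrative stand-in):

```python
# t-SNE into 3D instead of 2D: the extra degree of freedom eases crowding.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)
X_3d = TSNE(n_components=3, random_state=0).fit_transform(X)
print(X_3d.shape)  # (1797, 3)
```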

Do not apply clustering upon the output of t-SNE

Multiple runs with the same parameters might not give the same result

Consider a dataset drawn from two gaussian clusters: 250 points centred around (-2, 0) and 500 points centred around (0, 2).

Original Distribution

Using the EM (Expectation Maximisation) clustering algorithm produces this result:

Clustering using EM
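A sketch of this setup, assuming isotropic gaussians with a standard deviation of 0.5 (the spread is an assumption; the post doesn't specify it), with EM performed by scikit-learn's GaussianMixture:

```python
# Two gaussian clusters of unequal size, then EM clustering via a
# gaussian mixture model (which is fit by expectation maximisation).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
small = rng.normal(loc=(-2, 0), scale=0.5, size=(250, 2))
large = rng.normal(loc=(0, 2), scale=0.5, size=(500, 2))
X = np.vstack([small, large])

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
labels = gmm.predict(X)  # EM recovers the two clusters in the original space
```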

The following are the visualizations produced by t-SNE:

Perplexity = 20

It’s easy to conclude that there are 4 clusters here, while in fact there are only two.

Perplexity = 40 (default)

Here, the map shows too many clusters, and it would be hard for any clustering algorithm run on this output to recover the true two. (Note: the color coding is only for our interpretation.)

Perplexity = 80 (optimum)

This result comes from the optimum perplexity of 80. However, perplexity is a global parameter that needs to be tuned, and its optimum value is not the same for every dataset.
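A sketch of such a perplexity sweep on the same two-gaussian data (the seed and figure layout are illustrative; as noted above, a different random_state can change the picture):

```python
# Sweep perplexity on the two-gaussian data; the apparent cluster structure
# changes with this single global parameter (and with the random seed).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
X = np.vstack([rng.normal((-2, 0), 0.5, (250, 2)),
               rng.normal((0, 2), 0.5, (500, 2))])
labels = np.array([0] * 250 + [1] * 500)  # true cluster ids, for colouring only

fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, perp in zip(axes, (20, 40, 80)):
    emb = TSNE(perplexity=perp, random_state=0).fit_transform(X)
    ax.scatter(emb[:, 0], emb[:, 1], c=labels, cmap='coolwarm', s=5)
    ax.set_title(f'Perplexity = {perp}')
plt.show()
```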

The results are even more stark when these visualizations are obtained from a single gaussian blob, as illustrated at https://distill.pub/2016/misread-tsne/.

t-SNE is also computationally expensive: the original algorithm runs in O(n²) time in the number of data points. But later implementations, such as Barnes-Hut t-SNE, have better time complexity (O(n log n)).
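In scikit-learn, for instance, the Barnes-Hut approximation is the default and can be compared against the exact algorithm via the method parameter (a sketch; expect the exact run to be noticeably slower):

```python
# The exact algorithm is O(n^2); the Barnes-Hut approximation (the default
# in scikit-learn) runs in O(n log n), at some cost in accuracy.
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
emb_fast = TSNE(method='barnes_hut', random_state=0).fit_transform(X)  # default
emb_exact = TSNE(method='exact', random_state=0).fit_transform(X)      # slower
```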

To summarise, there are certain cases where t-SNE could be sub-optimal: when the output needs to be in a high dimension, when repeatable results are needed across runs, when no single perplexity suits the dataset, and when the dataset is too large for the O(n²) implementation.

Further Reading:
1. t-SNE
2. Visualizing Representations: Deep Learning and Human Beings, Christopher Olah's blog, 2015
3. https://distill.pub/2016/misread-tsne/
4. Visualizing Data Using t-SNE

Despite these caveats, t-SNE may well be the first tool a data scientist reaches for in the morning! Have fun exploring it. Do comment below for any clarifications or suggestions.
