Data visualization and decomposition methods

Scripts with custom implementations of the PCA, EVD, and MDS methods, used here for visualization.

Visualization

I compared the results of several decomposition and embedding methods:

Custom MDS
Sklearn MDS
Sklearn Isomap
Sklearn t-SNE
Sklearn Locally Linear Embedding
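
As an illustration of the first entry, a custom MDS could look roughly like the sketch below. It assumes the classical (Torgerson) variant, which double-centers the squared distance matrix and takes its eigendecomposition; the repository's own script may differ, and the name `classical_mds` and the toy data are made up for this example.

```python
import numpy as np


def classical_mds(distances, n_components=2):
    """Embed points from a symmetric (n x n) distance matrix via EVD."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J * D^2 * J
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # B is symmetric, so eigh applies; it returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(b)
    top = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale


# Toy data (an assumption for the example): four corners of a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(distances)
```

The embedding is only defined up to rotation and reflection, so it should be compared with the input through pairwise distances rather than raw coordinates.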

I used Kaggle's "Weedle's Cave" dataset to visualize distances between Pokémon. To make use of Pokémon types, I replaced the raw type strings such as "Grass" or "Poison" with strengths and weaknesses against all other types. As a result, water and fire types, for example, end up close together because Pokémon of these types are strong against fire and ground but weak against grass.
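
One way to picture this encoding is the sketch below. It is not the repository's code: the three-type `CHART` excerpt, the `encode` helper, and the rule for combining dual types (best attack multiplier, product of defensive multipliers) are all assumptions chosen to keep the example small.

```python
# Hypothetical three-type excerpt of the attack chart: CHART[attacker][defender]
CHART = {
    "Fire":  {"Fire": 0.5, "Water": 0.5, "Grass": 2.0},
    "Water": {"Fire": 2.0, "Water": 0.5, "Grass": 0.5},
    "Grass": {"Fire": 0.5, "Water": 2.0, "Grass": 0.5},
}
TYPES = list(CHART)


def encode(pokemon_types):
    """Turn a list of type names into a numeric strengths + weaknesses vector."""
    # Strength against each type: best multiplier among the Pokemon's own types
    strengths = [max(CHART[own][other] for own in pokemon_types) for other in TYPES]
    # Weakness to each attacking type: multipliers of all own types multiply
    weaknesses = []
    for attacker in TYPES:
        multiplier = 1.0
        for own in pokemon_types:
            multiplier *= CHART[attacker][own]
        weaknesses.append(multiplier)
    return strengths + weaknesses


fire = encode(["Fire"])
water_grass = encode(["Water", "Grass"])
```

Vectors like these can be compared with any numeric distance, which is what the decomposition methods need and what raw strings such as "Grass" cannot provide.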

The visualization shows that successive evolutions of a Pokémon often lie near each other. "Mega" Pokémon also end up close together, even though the dataset contains no direct information about them, which is really fascinating.

Result

Image compression

SVD can also be used to compress images. We first decompose the image matrix A into the matrices U, T and Vt such that:

A = U T Vt

where each matrix has the following size:

A (m x n)
U (m x m)
T (m x n)
Vt (n x n)

To compress the data, we keep only the first k columns of U, the first k singular values in T and the first k rows of Vt. The resulting sizes are:

A (m x n)
U (m x k)
T (k x k) (diagonal matrix)
Vt (k x n)
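
The truncation can be sketched with NumPy as follows. This is a minimal illustration, not the code in compress.py; the helper name `truncate_svd` and the random stand-in for an image channel are assumptions made for the example.

```python
import numpy as np


def truncate_svd(a, k):
    """Return the rank-k factors (U_k, s_k, Vt_k) of a 2-D array."""
    # full_matrices=False gives the economy-size SVD; singular values come sorted
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


rng = np.random.default_rng(0)
channel = rng.random((40, 30))  # stand-in for one m x n image channel
u_k, s_k, vt_k = truncate_svd(channel, k=5)

# Only the k diagonal entries of T are kept; rebuild A_k = U_k * diag(s_k) * Vt_k
approx = (u_k * s_k) @ vt_k
```

The reconstruction `approx` has the same shape as the input channel, while only the three smaller factors need to be stored.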

To summarize, an image with 3 channels can be stored as:

3 · (m·k + k + k·n) values

which is less than the original 3 · m · n values when k is small enough.
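
The saving can be checked with a little arithmetic. The sketch below assumes that each channel stores U (m x k), only the k diagonal entries of T, and Vt (k x n); the function name `stored_values` and the 512 x 512 image with k = 50 are illustrative choices, not figures from the repository.

```python
def stored_values(m, n, k, channels=3):
    """Count the numbers kept after a rank-k truncation of each channel."""
    # U_k has m*k entries, the diagonal of T has k, Vt_k has k*n
    return channels * (m * k + k + k * n)


m, n, k = 512, 512, 50
original = 3 * m * n                 # 786432 values
compressed = stored_values(m, n, k)  # 153750 values
ratio = compressed / original        # a bit under 0.2
```

Under these assumptions the truncation only pays off while k is below m·n / (m + n + 1); beyond that the factors take more room than the image itself.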

Keep in mind that dedicated image compression algorithms such as JPEG can do this better.

Usage

usage: compress.py [-h] -f INPUT_FILE [-out OUTPUT_FILE]
                   [-svd {sklearn,custom,numpy}] [-k K]
compress.py: error: the following arguments are required: -f
