/data_visualization

Scripts with custom implementations of PCA, EVD, and MDS methods used for visualization.

Primary language: Python. License: MIT.

Data visualization and decomposition methods

Visualization

I compared the results of several decomposition and embedding methods:

  • Custom MDS
  • Sklearn MDS
  • Sklearn Isomap
  • Sklearn t-SNE
  • Sklearn Locally Linear Embedding
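
As a rough illustration of what a custom MDS can look like, here is a minimal sketch of classical MDS built on an eigenvalue decomposition. It is not the code from this repository: the function name `classical_mds` and the toy data are assumptions made for the example, and only NumPy is used.

```python
import numpy as np

def classical_mds(distances, n_components=2):
    """Embed points from a symmetric (n x n) distance matrix via classical MDS."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - 11^T / n.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Eigenvalue decomposition of the symmetric matrix B (eigh returns ascending order).
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    order = np.argsort(eigenvalues)[::-1][:n_components]
    # Clip small negative eigenvalues caused by rounding or non-Euclidean input.
    top_values = np.clip(eigenvalues[order], 0.0, None)
    return eigenvectors[:, order] * np.sqrt(top_values)

# Toy check (hypothetical data): the four corners of a unit square.
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
distances = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
embedding = classical_mds(distances)
recovered = np.linalg.norm(embedding[:, None, :] - embedding[None, :, :], axis=-1)
print(np.allclose(recovered, distances))
```

The embedding is only determined up to rotation and reflection, so the check compares pairwise distances rather than coordinates.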

I used Kaggle's "Weedle's Cave" dataset to visualize distances between Pokémon. To make use of Pokémon types, I replaced the raw type strings such as "Grass" and "Poison" with strengths and weaknesses against all other types. As a result, Water and Fire types, for example, end up close together because Pokémon of these types are strong against Fire and Ground but weak against Grass.
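
The type encoding could be sketched roughly as follows. This is an assumption-heavy illustration, not the repository's code: the three-type chart is a small excerpt, only offensive multipliers are used, and averaging the rows of dual types is my own choice for the example. The names `TYPES`, `EFFECTIVENESS` and `encode_types` are hypothetical.

```python
import numpy as np

# Illustrative excerpt of the type chart: attacker -> defender -> damage multiplier.
TYPES = ["Fire", "Water", "Grass"]
EFFECTIVENESS = {
    "Fire":  {"Fire": 0.5, "Water": 0.5, "Grass": 2.0},
    "Water": {"Fire": 2.0, "Water": 0.5, "Grass": 0.5},
    "Grass": {"Fire": 0.5, "Water": 2.0, "Grass": 0.5},
}

def encode_types(pokemon_types):
    """Turn a list of type names into a vector of multipliers against every type."""
    rows = [[EFFECTIVENESS[t][defender] for defender in TYPES] for t in pokemon_types]
    # Assumption: dual types are combined by averaging their rows.
    return np.mean(rows, axis=0)

water = encode_types(["Water"])
fire_water = encode_types(["Fire", "Water"])
print(water, fire_water)
```

Numeric vectors like these can be fed to any of the embedding methods listed above, whereas raw strings such as "Grass" cannot.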

The visualization shows that successive evolutions of a Pokémon often lie near each other. "Mega" Pokémon are also close together in the visualization even though the dataset contains no direct information about that, which is really fascinating.

Result

Pokémon

Image compression

SVD can also be used to compress images. We first decompose the image matrix A into matrices U, T, and Vt such that:

A = U T Vt

where each matrix has the following size:

A (m x n)
U (m x m)
T (m x n)
Vt (n x n)

To compress the data, we keep only the first k columns of U, the first k singular values in T, and the first k rows of Vt. The resulting sizes are:

A (m x n)
U (m x k)
T (k x k) (diagonal matrix)
Vt (k x n)
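
The shapes above can be checked with a short NumPy sketch. The random 6 x 4 matrix is a stand-in for one image channel and k = 2 is an arbitrary choice; both are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 4))  # stand-in for one image channel, m = 6, n = 4
m, n = A.shape

# Full SVD: U is (m x m), s holds min(m, n) singular values, Vt is (n x n).
U, s, Vt = np.linalg.svd(A, full_matrices=True)
T = np.zeros((m, n))
T[:len(s), :len(s)] = np.diag(s)  # place singular values on the diagonal of T
full_ok = np.allclose(U @ T @ Vt, A)

# Truncation: keep k columns of U, k singular values, and k rows of Vt.
k = 2
U_k, T_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
A_k = U_k @ T_k @ Vt_k  # rank-k approximation, still (m x n)
error = np.linalg.norm(A - A_k)  # Frobenius norm of what was discarded

print(U_k.shape, T_k.shape, Vt_k.shape, A_k.shape, full_ok)
```

The approximation error depends only on the discarded singular values, so a larger k gives a better image at the cost of more stored numbers.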

To summarize, if the image has 3 channels, the compressed representation takes:

compressed size

which, for small enough k, is less than the original size:

original size
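
The size comparison can be worked through numerically. The sketch below assumes that only the k diagonal entries of T are stored (rather than the full k x k matrix) and uses a hypothetical 1080 x 1920 image with k = 50; the function names are also made up for the example.

```python
def compressed_size(m, n, k, channels=3):
    """Values stored for U (m x k), the k diagonal entries of T, and Vt (k x n)."""
    return channels * (m * k + k + k * n)

def original_size(m, n, channels=3):
    """Values stored for the uncompressed image."""
    return channels * m * n

def max_useful_k(m, n):
    """Largest k for which k * (m + n + 1) < m * n, i.e. compression saves space."""
    return (m * n - 1) // (m + n + 1)

m, n, k = 1080, 1920, 50  # hypothetical image dimensions and rank
print(compressed_size(m, n, k))  # 450150
print(original_size(m, n))       # 6220800
print(max_useful_k(m, n))        # 690
```

Under these assumptions, the compressed form stops being smaller than the original once k exceeds m * n / (m + n + 1).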

Keep in mind that dedicated image compression algorithms such as JPEG can do this better.

Usage

usage: compress.py [-h] -f INPUT_FILE [-out OUTPUT_FILE]
                   [-svd {sklearn,custom,numpy}] [-k K]
compress.py: error: the following arguments are required: -f
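
For orientation, an argparse setup that would accept the same flags might look like the sketch below. It is a guess reconstructed from the usage line, not the actual contents of compress.py: the `dest` names, default values, help strings and the example file name are all assumptions.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the flags shown in the usage line."""
    parser = argparse.ArgumentParser(prog="compress.py")
    parser.add_argument("-f", dest="input_file", required=True,
                        help="image to compress")
    parser.add_argument("-out", dest="output_file", default="compressed.png",
                        help="where to write the result (default is an assumption)")
    parser.add_argument("-svd", choices=["sklearn", "custom", "numpy"],
                        default="numpy",
                        help="SVD implementation to use (default is an assumption)")
    parser.add_argument("-k", type=int, default=50,
                        help="number of singular values to keep (default is an assumption)")
    return parser

# Example invocation with a hypothetical file name.
args = build_parser().parse_args(["-f", "mountains.jpg", "-k", "20"])
print(args.input_file, args.svd, args.k)
```

Only `-f` is required, which matches the error message printed when the script is run without arguments.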

Results

mountains

mountains compressed

Sources