Cryptocurrencies

Overview:

Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio to its customers. The company, however, is lost in the vast universe of cryptocurrencies. My job is to create a report that lists which cryptocurrencies are on the trading market and shows how they could be grouped to create a classification system for this new investment. The data I was given is not in an ideal format for my algorithms, so it needs to be processed to fit the machine learning models. Since there is no known output for what the company is looking for, I use unsupervised learning.

  • Using Pandas, I preprocessed the dataset so that PCA could be performed on it.
  • Using the PCA (Principal Component Analysis) algorithm, I reduced the dimensions of the X DataFrame to three principal components and placed these components into a new DataFrame.
  • Next, I clustered the cryptocurrencies using the K-Means algorithm.
  • Finally, using scatter plots created with Plotly Express and hvPlot, I visualized the distinct groups in the space of the three principal components.

Resources:

  • Software: Python 3.9.7 and Jupyter Notebook
  • Data: crpto_data.csv

Definitions:

  • K-Means: The K-Means algorithm groups the data into K clusters, where each data point is assigned to the cluster whose centroid is closest under a chosen similarity or distance measure.
  • PCA: PCA is a statistical technique for reducing the number of input features (or dimensions) when that number is too high, which can also speed up machine learning algorithms. It transforms a large set of variables into a smaller one that still contains most of the information in the original set.
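To make the two definitions concrete, here is a minimal sketch of PCA followed by K-Means using scikit-learn. It is not the notebook's actual code: the data is synthetic, the feature names `f0`..`f7` are made up, standardizing with `StandardScaler` is an assumption, and K=3 is picked only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic stand-in data: 60 rows, 8 numeric features (NOT the crypto dataset).
X = pd.DataFrame(rng.normal(size=(60, 8)), columns=[f"f{i}" for i in range(8)])

# Assumption: standardize first so that no single column dominates the components.
X_scaled = StandardScaler().fit_transform(X)

# Reduce the 8 features to three principal components.
pca = PCA(n_components=3)
pcs_df = pd.DataFrame(
    pca.fit_transform(X_scaled),
    columns=["PC 1", "PC 2", "PC 3"],
    index=X.index,
)

# Cluster in the reduced space. K=3 is an illustrative choice only.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
pcs_df["Class"] = model.fit_predict(pcs_df[["PC 1", "PC 2", "PC 3"]])

print(pcs_df.head())
# Fraction of the original variance kept by the three components.
print(pca.explained_variance_ratio_.sum())
```

The explained variance ratio is one way to check how much of the original information the three components retain.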

Results:

After removing non-tradable currencies, rows with null values, and the "IsTrading" column, I created a new DataFrame that holds all of the cryptocurrency names. The new DataFrame consisted of 532 rows, that is, 532 tradable cryptocurrencies on the market at that time.
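The cleaning steps described above could look roughly like the following Pandas sketch. The function name `preprocess_crypto` and the four-row sample are hypothetical, and the column names are taken from the columns listed in this README; the real notebook may apply additional filters that are not shown here.

```python
import pandas as pd

def preprocess_crypto(df: pd.DataFrame):
    """Return (features, names) for currencies that are trading and have no nulls."""
    # Keep only rows flagged as trading, then drop the now-constant flag column.
    tradable = df[df["IsTrading"] == True].drop(columns=["IsTrading"])
    # Drop every row that has at least one missing value.
    tradable = tradable.dropna()
    # Hold the names separately so they can be re-attached after clustering.
    names = tradable[["CoinName"]].copy()
    features = tradable.drop(columns=["CoinName"])
    return features, names

# Tiny made-up sample, for illustration only (not the real dataset).
sample = pd.DataFrame(
    {
        "CoinName": ["AlphaCoin", "BetaCoin", "GammaCoin", "DeltaCoin"],
        "Algorithm": ["SHA-256", "Scrypt", "X11", "Scrypt"],
        "IsTrading": [True, False, True, True],
        "ProofType": ["PoW", "PoS", "PoW", "PoW/PoS"],
        "TotalCoinsMined": [100.0, 50.0, None, 42.0],
        "TotalCoinSupply": ["1000", "500", "300", "420"],
    },
    index=["ALP", "BET", "GAM", "DEL"],
)

features, names = preprocess_crypto(sample)
print(features.shape)
print(list(names["CoinName"]))
```

In this sample, "BetaCoin" is dropped because it is not trading and "GammaCoin" is dropped because of its missing value, so two rows remain.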

Screen Shot 2022-09-06 at 6 33 25 PM

Then I used the K-means algorithm to cluster the cryptocurrencies using the PCA data. The following steps took place:

  • An elbow curve was created using hvPlot to find the best value for K.
  • Predictions were made to assign each cryptocurrency to one of the K clusters.
  • A new DataFrame was created with the same index as the crypto_df DataFrame and had the following columns: Algorithm, ProofType, TotalCoinsMined, TotalCoinSupply, PC 1, PC 2, PC 3, CoinName, and Class.
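The elbow-curve step can be sketched as follows: fit K-Means for a range of K values and record the inertia (the within-cluster sum of squared distances) for each. This is an illustrative sketch under assumptions, not the notebook's code: the helper `elbow_table` is hypothetical, and the three synthetic blobs stand in for the real PCA output.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)

# Synthetic stand-in for the PCA output: three well-separated blobs in 3-D.
centers = np.array([[0, 0, 0], [6, 6, 0], [0, 6, 6]], dtype=float)
points = np.vstack([c + rng.normal(scale=0.5, size=(30, 3)) for c in centers])
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

def elbow_table(data: pd.DataFrame, k_values) -> pd.DataFrame:
    """Fit K-Means once per k and collect the resulting inertia values."""
    rows = []
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
        rows.append({"k": k, "inertia": km.inertia_})
    return pd.DataFrame(rows)

elbow_df = elbow_table(pcs_df, range(1, 11))
print(elbow_df)

# With hvPlot installed, the curve could be drawn in a notebook with:
#   import hvplot.pandas
#   elbow_df.hvplot.line(x="k", y="inertia", title="Elbow Curve")
```

The "elbow" is the K after which inertia stops dropping sharply; for this synthetic data that happens at K=3 by construction.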

Screen Shot 2022-09-06 at 6 15 28 PM

Elbow Curve:

Screen Shot 2022-09-06 at 6 16 21 PM

New DataFrame:

Screen Shot 2022-09-06 at 6 16 47 PM

3D-Scatter plot with the PCA data and the clusters:

Screen Shot 2022-09-06 at 6 17 46 PM

Finally, I created a new table with the tradable cryptocurrencies. The total number of tradable cryptocurrencies is 532.

Screen Shot 2022-09-06 at 6 20 25 PM

Screen Shot 2022-09-06 at 6 21 00 PM

Summary:

As you can see from the final 2D scatter plot, most of the clusters overlap and do not quite form the distinct groups we had hoped for. That is why I also created a 3D graph to better visualize each group; it shows three distinct groups that correspond to the three clusters we expect the model to break the data into.

Screen Shot 2022-09-06 at 6 27 58 PM

- The 3D scatter plot can be rotated by clicking and dragging with the mouse and zoomed with the scroll wheel. Try hovering over a point to see information such as the value of each principal component.