Data preprocessing involves the following cleaning steps:
- Remove all cryptocurrencies that aren’t trading.
- Remove all cryptocurrencies that don’t have an algorithm defined.
- Remove the IsTrading column.
- Remove all cryptocurrencies with at least one null value.
- Remove all cryptocurrencies without coins mined.
- Store the names of all cryptocurrencies in a DataFrame named coins_name, and use crypto_df.index as the index for this new DataFrame.
- Remove the CoinName column.
- Create dummy variables for all of the text features, and store the resulting data in a DataFrame named X.
- Use the StandardScaler from sklearn to standardize all of the data from the X DataFrame. Remember, this is important prior to using the PCA and K-means algorithms.
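The cleaning steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the project's actual code: the toy rows, the ticker-style index, and the column names Algorithm, ProofType, and TotalCoinsMined are assumptions (only IsTrading and CoinName are named in the steps above).

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data standing in for the real dataset (not shown in the source).
crypto_df = pd.DataFrame(
    {
        "CoinName": ["AlphaCoin", "BetaCoin", "GammaCoin", "DeltaCoin",
                     "EpsilonCoin", "ZetaCoin", "EtaCoin"],
        "Algorithm": ["SHA-256", "Scrypt", None, "X11", "Scrypt", "Scrypt", "X11"],
        "IsTrading": [True, True, True, False, True, True, True],
        "ProofType": ["PoW", "PoS", "PoW", "PoW", None, "PoS", "PoW"],
        "TotalCoinsMined": [21.0, 0.0, 5.0, 10.0, 42.0, 100.0, 7.0],
    },
    index=["ALP", "BET", "GAM", "DEL", "EPS", "ZET", "ETA"],
)

# Keep only coins that are trading and have an algorithm defined.
crypto_df = crypto_df[crypto_df["IsTrading"]]
crypto_df = crypto_df[crypto_df["Algorithm"].notna()]

# The IsTrading column carries no information after filtering.
crypto_df = crypto_df.drop(columns=["IsTrading"])

# Drop rows with any null value, then rows without coins mined.
crypto_df = crypto_df.dropna()
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

# Keep the coin names aside, indexed like crypto_df, then drop the column.
coins_name = crypto_df[["CoinName"]].copy()
crypto_df = crypto_df.drop(columns=["CoinName"])

# One-hot encode the text features (assumed here to be Algorithm and ProofType).
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"], dtype=float)

# Standardize every column to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
print(X.columns.tolist())
print(X_scaled.shape)
```

In this toy example only three coins survive the filters; on the real data the same sequence would apply with whatever text columns the dataset actually contains.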
PCA was used to reduce the dimensions of the X DataFrame down to three principal components.
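A minimal sketch of this reduction step, assuming scikit-learn's PCA is the implementation used. The random input data, the pcs_df name, and the "PC 1" to "PC 3" column labels are hypothetical placeholders for the real standardized X.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the dummy-encoded X DataFrame.
rng = np.random.default_rng(0)
index = [f"COIN{i}" for i in range(20)]
X = pd.DataFrame(
    rng.normal(size=(20, 6)),
    columns=[f"feature_{i}" for i in range(6)],
    index=index,
)
X_scaled = StandardScaler().fit_transform(X)

# Project the standardized features onto three principal components.
pca = PCA(n_components=3)
X_pca = pca.fit_transform(X_scaled)

# Keep the original index so rows can be matched back to their coins.
pcs_df = pd.DataFrame(X_pca, columns=["PC 1", "PC 2", "PC 3"], index=X.index)

# Fraction of the total variance captured by each component.
print(pca.explained_variance_ratio_)
```

Checking explained_variance_ratio_ is a quick way to see how much information the three components retain.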
An elbow curve was used to determine the best K value, which was 5.
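One common way to build an elbow curve is to fit K-means for a range of K values and record the inertia of each fit. The sketch below assumes that approach; the synthetic blobs, the range of 1 to 10, and the elbow_df name are illustrative choices, not taken from the project.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-in for the three principal components.
points, _ = make_blobs(n_samples=200, centers=5, n_features=3, random_state=0)
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for each candidate K and record the inertia
# (sum of squared distances from each point to its cluster center).
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(pcs_df)
    inertia.append(model.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})
print(elbow_df)
```

Plotting inertia against k and looking for the point where the curve flattens gives the elbow; the exact plotting library is left open here.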
The KMeans algorithm was used to predict the K clusters and to create a new DataFrame, clustered_df.
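A rough sketch of this step, assuming clustered_df combines the cleaned features, the principal components, and the coin names with the predicted cluster labels. The exact columns of clustered_df are not stated above, so the layout, the "Class" column name, and the synthetic inputs are all assumptions.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Hypothetical stand-ins for pcs_df, crypto_df, and coins_name.
index = [f"COIN{i}" for i in range(50)]
points, _ = make_blobs(n_samples=50, centers=5, n_features=3, random_state=1)
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"], index=index)
crypto_df = pd.DataFrame({"TotalCoinsMined": range(1, 51)}, index=index)
coins_name = pd.DataFrame({"CoinName": [f"Coin {i}" for i in range(50)]}, index=index)

# Fit K-means with K = 5 and predict a cluster label for every coin.
model = KMeans(n_clusters=5, n_init=10, random_state=0)
predictions = model.fit_predict(pcs_df)

# Join everything on the shared index and attach the predicted labels.
clustered_df = pd.concat([crypto_df, pcs_df, coins_name], axis=1)
clustered_df["Class"] = predictions
print(clustered_df.head())
```

Because all three DataFrames share the same index, the concatenation lines up each coin with its components and its cluster label.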
A 3D scatter plot shows the clusters along the three principal components.
A table and a scatter plot show the coins that are mined, with the majority of points concentrated on the left side of the plot. The most mined coin was BitTorrent.