TODO: Keep only English songs
List of hyperparameters:
- unigrams/bigrams/trigrams
- number of features for the BoW
- TF-IDF
- lemmatize? stem?
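
Rough sketch of how the n-gram range, the BoW feature cap, and the TF-IDF switch could fit together as knobs. This is stdlib-only and just for discussion; the function names (`ngrams`, `build_vocab`, `vectorize`) and the whitespace tokenizer are assumptions, not what we actually have in the code yet. The IDF here is one common smoothed variant, without any normalization.

```python
import math
from collections import Counter


def ngrams(tokens, n_min=1, n_max=1):
    """Return all n-grams (space-joined) for n in [n_min, n_max]."""
    out = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            out.append(" ".join(tokens[i:i + n]))
    return out


def build_vocab(docs, n_min=1, n_max=1, max_features=None):
    """Keep the max_features most frequent n-grams across all docs."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc.lower().split(), n_min, n_max))
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_features]]


def vectorize(docs, vocab, n_min=1, n_max=1, use_tfidf=False):
    """Turn each doc into a count vector, or a TF-IDF vector if requested."""
    doc_counts = [Counter(ngrams(d.lower().split(), n_min, n_max)) for d in docs]
    if not use_tfidf:
        return [[c[t] for t in vocab] for c in doc_counts]
    n_docs = len(docs)
    # Document frequency: number of docs containing each term.
    df = {t: sum(1 for c in doc_counts if c[t] > 0) for t in vocab}
    # Smoothed IDF (assumed variant): ln((1 + N) / (1 + df)) + 1
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    return [[c[t] * idf[t] for t in vocab] for c in doc_counts]
```

Usage would be something like `vocab = build_vocab(lyrics, 1, 2, max_features=5000)` followed by `vectorize(lyrics, vocab, 1, 2, use_tfidf=True)`. Lemmatizing or stemming would slot in where the tokens are produced, before `ngrams` is called.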
Sequential:
-
Keep lines separated with an end-of-line tag?
-
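
To make the end-of-line question concrete, here is a minimal sketch of what keeping line breaks as a token could look like for the sequential models. The tag string `<eol>` and the function name `tokenize_lyrics` are made up for illustration; nothing is decided yet.

```python
EOL_TAG = "<eol>"  # hypothetical marker; any string that cannot be a real word works


def tokenize_lyrics(lyrics, keep_eol=True):
    """Split lyrics into lowercase tokens, optionally marking the end of each line."""
    tokens = []
    for line in lyrics.splitlines():
        words = line.lower().split()
        if not words:
            continue  # skip blank lines (e.g. gaps between verses)
        tokens.extend(words)
        if keep_eol:
            tokens.append(EOL_TAG)
    return tokens
```

With `keep_eol=False` the same function gives a flat token stream, so the tag becomes one more on/off hyperparameter to compare.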
hard categorization vs. soft categorization
:christian - For my christian.py, just make sure to change the data loader path to wherever you have the Kaggle lyrics data on your computer. My Python file is in a 'code' folder and my data is in a 'data' folder, and both folders are in the same directory, so if your setup matches, it should work fine. Also, I am only pulling the first 100 songs since we are still in the data preprocessing stage.
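
For anyone setting this up, a minimal sketch of the layout described above, assuming the lyrics data is a CSV file. The helper names (`default_data_dir`, `load_songs`) and the file name in the usage note are placeholders, not necessarily what christian.py uses.

```python
import csv
from itertools import islice
from pathlib import Path


def default_data_dir(script_path):
    """Return the sibling 'data' folder for a script that lives in 'code'."""
    # <project>/code/christian.py  ->  <project>/data
    return Path(script_path).resolve().parent.parent / "data"


def load_songs(csv_path, limit=100):
    """Read at most `limit` rows from the lyrics CSV as dictionaries."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        # islice stops after `limit` rows, so the whole file is never loaded.
        return list(islice(csv.DictReader(f), limit))
```

Inside the script this would be called as `load_songs(default_data_dir(__file__) / "lyrics.csv")`, where `lyrics.csv` stands in for whatever the downloaded Kaggle file is actually called. Passing `limit=None` reads every row once we are past the preprocessing stage.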