# musical-genre-identification

Dan O'Connor
oconnord6@gmail.com
https://github.com/oconnord12

Musical genre identification: identifying musical genres based solely on acoustic features.

In this project I endeavored to predict musical genres from a wide array of pre-extracted acoustic features using various ML classification models. The acoustic features were extracted from 30-second clips of over 100,000 songs on the Free Music Archive (FMA). The feature extraction was completed by Michaël Defferrard et al. in a paper titled 'FMA: A Dataset For Music Analysis' (2017) (arXiv link: https://arxiv.org/pdf/1612.01840.pdf, GitHub link: https://github.com/mdeff/fma).

This project was inspired by the idea of creating more genre-specific playlists on streaming services. While I was certainly not able to accomplish that, I learned a substantial amount about feature extraction from audio samples, multi-class classification, and the process of creating and testing models.

Documents included in this project folder:

Data: included at this Google Drive link: https://drive.google.com/drive/folders/1-kxpI6UU6_g4QpmqSZ4TjdMH7LcrNN2y?usp=share_link

In the Google Drive you will find the following:

- Original Data:
  - features.csv
  - tracks.csv
  - genres.csv
- Added Data:
  - scores_df(1-4): updated from notebook to notebook; see scores_df4.csv for the final success metrics.
  - accuracy_df(1-4): updated from notebook to notebook; see accuracy_df4.csv for the model accuracies.
  - filtered_model_df.csv: the data used for all modeling
  - features_single_header.csv: a modified version of features.csv where the multi-level header is compressed to a single level
- Models: all of the trained, optimized models were saved
  - decision_tree_optimized.joblib
  - final_nn_model.h5
  - final_smote_nn_model.h5
  - KNN.joblib
  - logistic_regression_optimized.joblib
  - random_forest.joblib
- Audio: these are songs from FMA
  - Bird Names - Referents
  - Delta Dreambox - Without A Sound
  - Felipe Sarro - Bach, Prelude, BWV 855a, Siloti transcription
  - Nonima - Frenetic (ft. theAudiologist)
- Images: used in notebooks
  - Genre_hierachy
  - feature tree

Jupyter Notebooks: the notebooks are ordered by the number at the start of each name

- 1_Dan_OConnor_GenreIdentification_Data_Cleaning: Load and clean the original data
- 2_Dan_OConnor_GenreIdentification_Feature_Extraction: Demonstrate and explain the feature extraction process
- 3_Dan_OConnor_GenreIdentification_EDA: Basic EDA
- 4_Dan_OConnor_GenreIdentification_Baseline_RandomForest_DecisionTree: Create the baseline, random forest (RF), and decision tree (DT) models
- 5_Dan_OConnor_GenreIdentifation_LogReg_KNN: Create the logistic regression (LR) and KNN models
- 6_Dan_OConnor_GenreIdentification_NN: Create the neural network
- 7_Dan_OConnor_GenreIdentification_SMOTE_Modeling: Train a neural network on SMOTE-resampled training data
- 8_Dan_OConnor_GenreIdentification_Model_Analysis: Analyze the accuracy and other success metrics of the models
- 9_Dan_OConnor_ConfMatrix_ModelingFun: Inspect the best model's confusion matrix and try the models on some unseen samples

Python files:

- myfunction: stores a single function used throughout the notebooks

PDFs:

- Final capstone report
- Final presentation slides

Required Python Libraries:

- pandas
- TensorFlow (NN modeling)
- Seaborn
- Librosa (feature extraction)
- Matplotlib
- os.path (used to properly load the original .csv files)
- ast (used to properly load the original .csv files)
- NumPy
- IPython.display (play audio files)
- scikit-learn

A special thank you to my mentors and peers at BrainStation for all the support and help throughout this project.
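
As a rough illustration of how a file like features_single_header.csv could be derived from features.csv, here is a minimal sketch using pandas. It assumes the three header rows (feature, statistics, number) used by the FMA features file. The sample values, the `flatten_header` helper, and the underscore separator are hypothetical and are not taken from the project notebooks.

```python
import io

import pandas as pd

# Tiny made-up stand-in for features.csv: three header rows, an index-name
# row, then one row per track. The numbers are illustrative only.
SAMPLE_CSV = """feature,mfcc,mfcc,zcr
statistics,mean,std,mean
number,01,01,01
track_id,,,
2,-163.7,24.1,0.08
5,-159.0,30.5,0.05
"""


def flatten_header(df, sep="_"):
    """Return a copy of df whose multi-level column labels are joined into one string each.

    Hypothetical helper for illustration; the separator is an assumption.
    """
    flat = df.copy()
    flat.columns = [sep.join(str(level) for level in col) for col in df.columns]
    return flat


# header=[0, 1, 2] tells pandas to read the first three rows as column levels.
features = pd.read_csv(io.StringIO(SAMPLE_CSV), index_col=0, header=[0, 1, 2])
features_single_header = flatten_header(features)

print(list(features_single_header.columns))
# For the real file, replace io.StringIO(SAMPLE_CSV) with the path to
# features.csv, then save with features_single_header.to_csv(...).
```

Joining the levels keeps every column name unique while making the table easier to pass to libraries that expect flat column labels.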