rasbt/musicmood

About running the tests and webapp.

upsteer opened this issue · 7 comments

I am just starting with Machine Learning, so please bear with me :).

I was just wondering if you could give a list of steps to be done for the tests and acquire the results and also to implement the acquired result to the webapp.

Much appreciated.

rasbt commented

Hi there. As far as I remember, I put all the experimental procedures into Jupyter notebooks (in the code subdirectory). For example, https://github.com/rasbt/musicmood/blob/master/code/classify_lyrics/nb_init_model.ipynb

The webapp code itself is in the code/webapp subdirectory, but I haven't written up a step-by-step procedure of how I set it up. However, I have described a very similar web app approach in my Python Machine Learning book, which explains it step by step. Maybe write me an email and I can share the relevant part of the chapter with you.

Is there some way I can input the song lyrics in the Jupyter notebook itself and get the classification?

rasbt commented

I would say that the notebooks are more elaborate and go through the whole data fitting pipeline. For just using the classifiers, you could use the pickle objects in https://github.com/rasbt/musicmood/tree/master/code/webapp-lyricsonly_py27/pkl_objects

E.g.,

import os
import pickle

import numpy as np

pkl_dir = 'link/to/pkl_objects'

# Load the fitted label encoder, count vectorizer, and classifier
with open(os.path.join(pkl_dir, 'label_encoder.p'), 'rb') as f:
    le = pickle.load(f)

with open(os.path.join(pkl_dir, 'countv.p'), 'rb') as f:
    vect = pickle.load(f)

with open(os.path.join(pkl_dir, 'clf_countv.p'), 'rb') as f:
    clf = pickle.load(f)


def classify(document):
    # Vectorize the raw text, then predict the label and its probability
    x_vect = vect.transform([document])
    proba = np.max(clf.predict_proba(x_vect))
    pred = clf.predict(x_vect)
    # inverse_transform expects an array, so index after decoding
    label = le.inverse_transform(pred)[0]
    return label, proba

classify('this is some text to classify')

It is possible, though, that the pickle files don't work on your system. In that case, it's probably easiest to train the classifiers yourself, following the flow in the Jupyter notebook at https://github.com/rasbt/musicmood/blob/master/code/classify_lyrics/nb_init_model.ipynb

rasbt commented

Hm, this could be because the original models were created with Python 2.7 (I did this because, as far as I remember, the server I was using for the webapp only supported Python 2.7 at that time). In that case, you probably need to use Python 2.7, or you would have to regenerate the pickle files using the Jupyter notebook (it should run fine in Python 3). The notebook should be this one: https://github.com/rasbt/musicmood/blob/master/code/classify_lyrics/nb_init_model.ipynb

rasbt commented

You are right ... and this is weird! It's been almost 4 years, and I don't remember the details, but looking at the code, I think the last section of the notebook was not completely saved and is missing the pickling of the vectorizer. (Judging by the cell numbers in the last section, where the objects are created for the webapp, it looks like something went out of order.)

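You could add something like this to the notebook to pickle the fitted vectorizer:

import pickle

# Replace your_fitted_count_vectorizer with the fitted vectorizer object
with open('./countv.p', 'wb') as pickle_out:
    pickle.dump(your_fitted_count_vectorizer, pickle_out)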
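
As a quick sanity check after writing the file, the object can be loaded back right away. The following is only a minimal sketch: `dump_and_verify` is a hypothetical helper name (not something from the repository), and the dictionary in the usage line is a stand-in for the fitted vectorizer.

```python
import pickle


def dump_and_verify(obj, path):
    """Pickle obj to path, then load it back and return the reloaded copy.

    Hypothetical helper for illustration; not part of the musicmood code.
    """
    with open(path, "wb") as pickle_out:
        pickle.dump(obj, pickle_out)

    # Reading the file back confirms it is complete and loadable
    with open(path, "rb") as pickle_in:
        return pickle.load(pickle_in)


# Stand-in object; in the notebook this would be the fitted vectorizer
reloaded = dump_and_verify({"happy": 0, "sad": 1}, "./roundtrip_check.p")
```

If the reloaded object behaves like the original (for a vectorizer, for example, by transforming the same text to the same output), the file is likely fine to use with the `classify` function shown earlier.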

Regarding the error you got earlier:

'ascii' codec can't decode byte 0x83 in position 28: ordinal not in range(128)

The following SO thread might be helpful:

https://stackoverflow.com/questions/28218466/unpickling-a-python-2-object-with-python-3
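
For reference, the usual workaround for this kind of error is to pass an `encoding` argument to `pickle.load` in Python 3. Here is a minimal sketch; `load_py2_pickle` is a hypothetical helper name, and I haven't verified that it is sufficient for these particular files, since scikit-learn objects pickled with an old version may still fail to load or behave differently under a newer one.

```python
import pickle


def load_py2_pickle(path, encoding="latin1"):
    """Load a pickle that was written by Python 2 from Python 3.

    By default, Python 3 decodes Python 2 byte strings as ASCII, which
    fails on any byte >= 0x80. 'latin1' maps every byte to a code point,
    so the decoding step itself cannot fail.
    Hypothetical helper for illustration; not part of the musicmood code.
    """
    with open(path, "rb") as f:
        return pickle.load(f, encoding=encoding)
```

It could then replace the plain `pickle.load` calls, e.g. `vect = load_py2_pickle(os.path.join(pkl_dir, 'countv.p'))`. If that still does not work, regenerating the pickle files from the notebook is probably the more reliable route.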