SVM model for domains detection

Using an SVM model with a linear kernel, we classify reviews into one of two domains: hotels or electronics. The main details of each stage are:

Training:
- Corpus of 80,000 vectors, 40,000 per domain (matrix shape (80000, 23789), i.e. 23,789 features)
- SVM with 6,833 support vectors (shape (6833, 23789))
- Label 0 for hotels and label 1 for electronics
- K-fold validation (30%) accuracy: 0.9573 (95.73%)
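The training stage above can be sketched with scikit-learn as follows. This is a minimal sketch under stated assumptions, not the author's script: the toy corpus is hypothetical, the vectorizer type is not given in the original (a CountVectorizer is assumed), and "30%" is read as a held-out validation split. SVC(kernel="linear") is used because it exposes support_vectors_, which matches the support-vector count reported above.

```python
import pickle
from pathlib import Path

import joblib
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Hypothetical toy corpus; the real corpus has 40,000 reviews per domain.
texts = [
    "the room was clean and the staff friendly",
    "great hotel breakfast and a quiet room",
    "the reception desk lost our booking",
    "pool and spa were lovely during our stay",
    "battery life of this laptop is short",
    "the headphones have great sound quality",
    "screen cracked after a week, poor phone",
    "charger stopped working with my tablet",
] * 5
labels = [0, 0, 0, 0, 1, 1, 1, 1] * 5  # 0 = hotels, 1 = electronics

# Assumption: "30%" means 30% of the data is held out for validation.
X_train_txt, X_test_txt, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, random_state=0, stratify=labels
)

# Assumption: a bag-of-words CountVectorizer (the original type is not stated).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(X_train_txt)
X_test = vectorizer.transform(X_test_txt)

# Linear-kernel SVM; SVC keeps the support vectors after fitting.
clf = SVC(kernel="linear")
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)

print("training matrix:", X_train.shape)
print("support vectors:", clf.support_vectors_.shape)
print("validation accuracy:", accuracy)

# Save both artifacts under the paths used by the loading code.
out_dir = Path("models/en")
out_dir.mkdir(parents=True, exist_ok=True)
joblib.dump(clf, out_dir / "svm_model.pkl")
with open(out_dir / "vectorizer.pkl", "wb") as o_file:
    pickle.dump(vectorizer, o_file)
```

The model is saved with joblib and the vectorizer with pickle so that both files can be read back with the same calls used in the Classification section.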

Classification: in this stage we load the previously saved models, with the following elapsed times:
- Loading the SVM model: 0.0060 s
- Loading the vectorizer: 0.1760 s
- Loading the whole model (SVM + vectorizer): 0.1830 s
- Text classification: 0.002480 s
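Elapsed times like these can be measured with a small wrapper around time.perf_counter. In this sketch the timed and load_pickle helpers and the stand-in artifact are hypothetical; with the real files you would time joblib.load('models/en/svm_model.pkl') and the pickle load of models/en/vectorizer.pkl instead. Figures will vary by machine.

```python
import pickle
import tempfile
import time
from pathlib import Path


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def load_pickle(path):
    """Load one pickled object from path."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Hypothetical stand-in artifact so the sketch runs on its own.
tmp_dir = Path(tempfile.mkdtemp())
artifact = tmp_dir / "vectorizer.pkl"
with open(artifact, "wb") as f:
    pickle.dump({"vocabulary": {"hotel": 0, "laptop": 1}}, f)

obj, elapsed = timed(load_pickle, artifact)
print(f"Vectorizer load elapsed time {elapsed:.4f} s.")
```

time.perf_counter is used rather than time.time because it is a monotonic, high-resolution clock intended for measuring short intervals.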

Classification

First, load the SVM model and the vectorizer:

import pickle
import joblib  # older scikit-learn: from sklearn.externals import joblib

# Load the previously trained SVM model for a specific language
clf = joblib.load('models/en/svm_model.pkl')

# Load the fitted vectorizer for the same language
with open('models/en/vectorizer.pkl', 'rb') as i_file:
    vectorizer = pickle.load(i_file)

With the SVM model and the vectorizer loaded, convert the sample text into a feature vector. The vectorizer expects a list of documents, so a single text is wrapped in a list:

vector = vectorizer.transform(["some text to classify"]).toarray()
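scikit-learn text vectorizers such as CountVectorizer take an iterable of documents, so a single text has to be passed as a one-element list; recent versions reject a bare string with a ValueError. The sketch below assumes a CountVectorizer fitted on a hypothetical two-document vocabulary and shows both cases.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical two-document vocabulary, just to have a fitted vectorizer.
vectorizer = CountVectorizer().fit(["clean hotel room", "fast laptop charger"])

# Correct: a list holding one document gives a 1 x n_features matrix.
vector = vectorizer.transform(["some text to classify"]).toarray()
print(vector.shape)  # (1, 6)

# Incorrect: a bare string is not an iterable of documents and is rejected.
try:
    vectorizer.transform("some text to classify")
except ValueError as err:
    print("ValueError:", err)
```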

Then predict the label with the trained SVM model (0 = hotels, 1 = electronics):

label = clf.predict(vector)
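To turn the numeric prediction into a domain name, a small helper can apply the label convention from the training stage (0 = hotels, 1 = electronics). The classify_texts function and the toy model below are hypothetical, added only so the sketch runs without the saved .pkl files; with the real models, pass the clf and vectorizer loaded earlier.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.svm import SVC

# Label convention from the training stage: 0 = hotels, 1 = electronics.
DOMAINS = {0: "hotels", 1: "electronics"}


def classify_texts(texts, clf, vectorizer):
    """Hypothetical helper: vectorize a list of texts and return domain names."""
    vectors = vectorizer.transform(texts)
    return [DOMAINS[int(label)] for label in clf.predict(vectors)]


# Hypothetical toy model so the sketch runs without the saved .pkl files.
train_texts = ["clean hotel room", "friendly hotel staff",
               "fast laptop charger", "cheap phone battery"]
vectorizer = CountVectorizer().fit(train_texts)
clf = SVC(kernel="linear").fit(vectorizer.transform(train_texts), [0, 0, 1, 1])

print(classify_texts(["quiet hotel room", "new laptop battery"], clf, vectorizer))
```

Passing a whole list to a single transform and predict call classifies many texts at once, which avoids repeating the per-call overhead for each text.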

rmaestre/SVM-for-domains-detection