DigiMag fasttext

Text classification with fasttext and digikala magazine dataset.

Model classifies texts in 7 categories :

Video Games
Shopping Guide
Health Beauty
Science Technology
General
Art Cinema
Books Literature

Requirements

fasttext
numpy
pandas

$ pip install -r requirements.txt

Training and testing model

At first you should download digimag dataset from this link and extract it into src directory.

Now enter following command to preprocess dataset, train and save model(model will saved in model.bin file) :

$ python preprocess.py && python train.py

Test model :

$ python test.py

Example

import fasttext

categories = {
    1 : 'Video Games',
    2 : 'Shopping Guide',
    3 : 'Health Beauty',
    4 : 'Science Technology',
    5 : 'General',
    6 : 'Art Cinema',
    7 : 'Books Literature',
}

classifier = fasttext.load_model('model.bin')
input_txt = "your text"
result = classifier.predict(input_txt)
print("Predicted label :", categories[int(result[0][0].replace("__label__", ""))])

See example.py.

License

MIT

bistcuite/digimag-fasttext

DigiMag fasttext

Requirements

Training and testing model

Example

License