nyaaCategorizer

A classifier for Nyaa torrents: it predicts a torrent's category from its description text.
Example input:
(image) The predicted category here is Literature - English-translated.
Possible outputs:
(image)

The data is located in the nyaaCategorizer_database repo.

Models


Model 1
trainModel.zip, nyaaCategorizer.ipynb
accuracy: 0.5989, cr accuracy: 0.28

Model 2
trainModel_balanced.zip, nyaaCategorizer_balanced.ipynb
accuracy: 0.1159, cr accuracy: 0.03

  • balanced classes (class_weights)
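
As a rough illustration of what class balancing with class weights means, the sketch below computes weights that are inversely proportional to class frequency. This is an assumption about the weighting scheme: the notebook may compute its `class_weights` differently. The function name `balanced_class_weights` and the toy label counts are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class, inversely proportional to its frequency.

    Uses the common "balanced" heuristic n_samples / (n_classes * count),
    which is an assumption here, not necessarily what the notebook does.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Toy, made-up label distribution: one frequent class and two rare ones.
labels = ["Literature"] * 6 + ["Pictures"] * 3 + ["Software"] * 1
weights = balanced_class_weights(labels)

for label, weight in sorted(weights.items()):
    print(f"{label}: {weight:.3f}")
```

Rare classes receive larger weights, so their errors count more during training. A dictionary of this shape, keyed by integer class index rather than by name, is what Keras accepts as the `class_weight` argument of `model.fit`.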

Model 3
trainedModel_balanced_nosub_extra.zip, nyaaCategorizer_balanced_nosub_extra.ipynb
accuracy: 0.987, cr accuracy: 0.57

  • balanced classes (class_weights)
  • only look at the main category (subcategories converted)
  • extra, meaning:
    • shuffled the dataset at the beginning (irrelevant),
    • removed the fileAmount feature,
    • removed the more100File feature,
    • increased epochs from 5 to 10,
    • decreased batch size from 100 to 64,
    • removed the Dropout layers,
    • set learning_rate=0.0001
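
The main-category conversion listed for Model 3 can be sketched as follows. This assumes that labels follow the "Main - Sub" pattern seen in the example output (Literature - English-translated). The helper name `main_category` and the separator default are hypothetical, not taken from the notebook.

```python
def main_category(label, separator=" - "):
    """Collapse a 'Main - Sub' label to its main category.

    Labels without the separator are returned unchanged (whitespace stripped).
    """
    return label.split(separator, 1)[0].strip()


# The first label is the example output from this README; the second is a
# label that has no subcategory part.
examples = ["Literature - English-translated", "Software"]
converted = [main_category(label) for label in examples]
print(converted)
```

Collapsing subcategories leaves fewer target classes, each with more examples.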

Model 4
trainedModel_balanced_nosub_extra_maincat.zip, nyaaCategorizer_balanced_nosub_maincats_extra.ipynb
accuracy: 0.9919, cr accuracy: 0.56

  • balanced classes (class_weights)
  • only look at the main category (subcategories converted)
  • discard all data that is 'Software' or 'Pictures', since there isn't much of it
  • extra
  • use only one mid layer instead of 3
  • removed "from_logits=True"
  • changed accuracy to categorical_accuracy
  • versions:
    1. the above, plus removed Software and Pictures data
    2. added back fileAmount: same result
    3. only one mid layer (instead of 3): best performance (f1: 0.59)
    4. removed "from_logits=True" and changed accuracy to categorical_accuracy: same result as before
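
Discarding the 'Software' and 'Pictures' data, as listed for Model 4, could look roughly like the sketch below. The row layout (dictionaries with `description` and `category` keys), the sample rows, and the helper name `drop_categories` are all assumptions for illustration. The notebook likely works on a different data structure.

```python
EXCLUDED_CATEGORIES = {"Software", "Pictures"}


def drop_categories(rows, excluded=EXCLUDED_CATEGORIES):
    """Keep only rows whose main category is not in the excluded set."""
    return [row for row in rows if row["category"] not in excluded]


# Made-up rows; real descriptions and categories come from the dataset.
rows = [
    {"description": "some novel, translated", "category": "Literature"},
    {"description": "some installer", "category": "Software"},
    {"description": "some photo set", "category": "Pictures"},
]
kept = drop_categories(rows)
print([row["category"] for row in kept])
```

This filter should run after subcategories have been converted to main categories, so that every subcategory of an excluded main category is dropped as well.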

Model 5
trainedModel_final.zip, nyaaCategorizer_final.ipynb
accuracy: 0.9907, cr accuracy: 0.58

  • no class balancing
  • only look at the main category (subcategories converted)
  • discard all data that is 'Software' or 'Pictures', since there isn't much of it
  • back to 3 Dense layers
  • added metrics

Model 6
trainedModel_LSTM.zip, nyaaCategorizer_lstm.ipynb
accuracy: 0.9860, cr accuracy: 0.57

  • LSTM with 2 dense layers of 128 and 64 nodes

Model 7
nyaaCategorizer_final_allcats.zip, nyaaCategorizer_final_allcats.ipynb
accuracy: 0.9189, cr accuracy: 0.27

  • same as the final model (Model 5), but trained on all categories, with class balancing

Model 8
nyaaCategorizer_final_allmaincats.zip, nyaaCategorizer_final_allmaincats.ipynb
accuracy: 0.9903, cr accuracy: 0.57

  • same as the final model (Model 5), but trained on all main categories

Figures


Logarithm of sorted file sizes:

Model 1 evaluation:

Model 1 classification report:

Model 1 confusion matrix:

Refer to /images for other figures.