/Text-Classification-NLP

Text Classification

* The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
* To the best of my knowledge, it was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews", though he does not explicitly mention this collection.
* The 20 Newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering.

Classification Report

-------------------------------------------------------
       class  precision    recall  f1-score   support
-------------------------------------------------------
          0       0.55      0.77      0.64       168
          1       0.68      0.69      0.69       171
          2       0.66      0.69      0.67       192
          3       0.60      0.68      0.64       190
          4       0.66      0.69      0.67       176
          5       0.77      0.75      0.76       175
          6       0.74      0.76      0.75       177
          7       0.69      0.72      0.71       174
          8       0.73      0.78      0.76       182
          9       0.90      0.86      0.88       198
         10       0.92      0.89      0.90       200
         11       0.88      0.87      0.87       171
         12       0.74      0.68      0.71       207
         13       0.84      0.80      0.82       175
         14       0.94      0.78      0.85       198
         15       0.79      0.75      0.77       200
         16       0.73      0.82      0.77       170
         17       0.88      0.81      0.85       175
         18       0.66      0.62      0.64       174
         19       0.55      0.44      0.49       209
-------------------------------------------------------
avg / total       0.75      0.74      0.74      3682
-------------------------------------------------------
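
For readers unfamiliar with the columns in the report above, the following is a minimal sketch of how per-class precision, recall, F1, and support can be computed from true and predicted labels. It is not the notebook's code: the function name `per_class_report` and the toy labels are hypothetical, and the report itself was most likely produced by a library routine rather than by hand.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for each true label."""
    tp = Counter()         # correct predictions per class
    predicted = Counter()  # how often each class was predicted
    support = Counter()    # how often each class truly occurs
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        predicted[p] += 1
        if t == p:
            tp[t] += 1

    report = {}
    for label in sorted(support):
        precision = tp[label] / predicted[label] if predicted[label] else 0.0
        recall = tp[label] / support[label]
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, support[label])
    return report


# Hypothetical toy labels, not the notebook's data.
y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 2, 2]
for label, (p, r, f1, n) in per_class_report(y_true, y_pred).items():
    print(f"{label:>3} {p:10.2f} {r:9.2f} {f1:9.2f} {n:9d}")
```

The same harmonic-mean relation can be checked against the first row of the report: a precision of 0.55 and a recall of 0.77 give an F1 of about 0.64.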

Cross-Validation Scores

-----------|------------|----------
0.65746753 | 0.69543974 | 0.6685761
-----------|------------|----------
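
Individual cross-validation scores are usually summarized by their mean and spread. The sketch below does that for the three values listed above using only the standard library; the variable names are illustrative, and the source does not say which metric or how many folds produced these scores.

```python
from statistics import mean, stdev

# Scores copied from the table above.
cv_scores = [0.65746753, 0.69543974, 0.6685761]

cv_mean = mean(cv_scores)
cv_std = stdev(cv_scores)  # sample standard deviation across the scores

print(f"mean score: {cv_mean:.3f} +/- {cv_std:.3f}")
```

The mean comes out at roughly 0.674, somewhat below the 0.74 average in the classification report.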

File Runtime

  • 29.51 mins
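
One way to obtain a runtime figure like this is to wrap the work in a wall-clock timer. This is a hedged sketch, not the notebook's actual timing code: `run_timed` is a hypothetical helper, and the `sum` call is only a stand-in workload.

```python
import time


def run_timed(fn, *args, **kwargs):
    """Call fn and return (result, elapsed wall-clock time in minutes)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed_min = (time.perf_counter() - start) / 60
    return result, elapsed_min


# Stand-in workload; replace with the training and evaluation call.
result, minutes = run_timed(sum, range(1_000_000))
print(f"{minutes:.2f} mins")
```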