19th Century English Fiction Books: Genre Identification on the Gutenberg Corpus
The task has two main parts:
- Extracting fiction-related features from the books using feature engineering techniques
- Applying supervised learning to classify each book into a single genre class
1. Features extracted from each fiction book:
- Sentiment analysis - Beginning of book
- Sentiment analysis - End of book
- Sentence count
- Average sentence length
- Flesch reading score
- Word count
- Proper noun count
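As a rough illustration of the text-statistics features in the list above, the sketch below computes sentence count, word count, average sentence length, and a Flesch score. It is not taken from feature_extractor.py: the function names are hypothetical, the sentence and word splitting is regex-based, the syllable counter is a crude vowel-group heuristic, and the score is assumed to be the standard Flesch Reading Ease formula. Sentiment and proper noun counts need an NLP library and are left out.

```python
import re


def count_syllables(word):
    """Rough syllable estimate: count groups of consecutive vowels."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Heuristic: treat a trailing 'e' as silent ("scene" -> 1 syllable).
    if lowered.endswith("e") and count > 1:
        count -= 1
    return max(count, 1)


def text_features(text):
    """Return basic readability features for one book's plain text."""
    # Assumption: a sentence ends at '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Assumption: a word is a run of letters, optionally with one apostrophe part.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)

    sentence_count = len(sentences)
    word_count = len(words)
    syllable_count = sum(count_syllables(w) for w in words)

    avg_sentence_length = word_count / sentence_count if sentence_count else 0.0
    if word_count:
        # Flesch Reading Ease (assumed variant of the "Flesch reading score").
        flesch = (206.835
                  - 1.015 * avg_sentence_length
                  - 84.6 * (syllable_count / word_count))
    else:
        flesch = 0.0

    return {
        "sentence_count": sentence_count,
        "word_count": word_count,
        "avg_sentence_length": avg_sentence_length,
        "flesch_reading_score": flesch,
    }


features = text_features("It was a dark night. The rain fell!")
print(features)
```

A regex-based splitter miscounts abbreviations such as "Mr." as sentence ends, so a proper tokenizer would likely give more reliable counts on full novels.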
2. Supervised Learning techniques used:
- SVM
- Naive Bayes
- Random Forest
1. Feature Extraction:
- Run feature_extractor.py to create features from the books
- Input: books folder containing the books in HTML format
- Output: features.csv containing the extracted features
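The input/output flow described above (HTML books in, one CSV row of features per book out) could look roughly like the sketch below. It uses only the Python standard library and is not the actual code of feature_extractor.py: the class and function names, the CSV column names, and the single placeholder feature (a whitespace-based word count) are all assumptions made for illustration.

```python
import csv
import io
from html.parser import HTMLParser
from pathlib import Path


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Strip tags and collapse whitespace."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())


def write_features(rows, out):
    """Write a non-empty list of dicts (identical keys) as CSV to a text stream."""
    writer = csv.DictWriter(out, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


def extract_folder(folder, out_path):
    """Hypothetical driver: one row per *.html file in `folder`."""
    rows = []
    for path in sorted(Path(folder).glob("*.html")):
        text = html_to_text(path.read_text(encoding="utf-8", errors="replace"))
        # Placeholder feature; the real script extracts the features listed earlier.
        rows.append({"book": path.name, "word_count": len(text.split())})
    if rows:
        with open(out_path, "w", newline="", encoding="utf-8") as out:
            write_features(rows, out)


# In-memory demonstration that does not touch the disk.
sample = ("<html><head><style>p{}</style></head>"
          "<body><h1>Title</h1><p>One. Two words.</p></body></html>")
text = html_to_text(sample)
buffer = io.StringIO()
write_features([{"book": "example.html", "word_count": len(text.split())}], buffer)
print(buffer.getvalue())
```

With real data, a call such as `extract_folder("books", "features.csv")` would mirror the input and output named above, assuming the books are UTF-8 encoded.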
2. Supervised Learning Algorithms:
- Simple_NB_SVM.py: SVM and Naive Bayes classifiers
- Leave_One_Out_SVM_NB.py: SVM and Naive Bayes evaluated with the leave-one-out method
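To make the leave-one-out method concrete: each book is held out once, the classifier is trained on all the other books, and the held-out book is then predicted. The sketch below shows that loop with a small Gaussian Naive Bayes written from scratch so that it has no dependencies. It is not the code in Leave_One_Out_SVM_NB.py, which may well use a library implementation, and it covers only Naive Bayes, not SVM. The function names, the toy feature values, and the genre labels are all made up for illustration.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate per-class prior, feature means and feature variances."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # eps keeps the variance positive when a feature is constant in a class.
        variances = [sum((v - m) ** 2 for v in col) / n + eps
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the label with the highest log posterior for one feature row."""
    def log_posterior(label):
        prior, means, variances = model[label]
        score = math.log(prior)
        for v, m, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        return score

    return max(model, key=log_posterior)


def leave_one_out_accuracy(X, y):
    """Hold out each sample once, train on the rest, score the held-out one."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        model = fit_gaussian_nb(train_X, train_y)
        if predict_gaussian_nb(model, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data (invented): [avg_sentence_length, flesch_reading_score] per book.
X = [[1.0, 10.0], [1.2, 11.0], [0.8, 9.5],
     [5.0, 30.0], [5.5, 29.0], [4.8, 31.0]]
y = ["genre_a", "genre_a", "genre_a", "genre_b", "genre_b", "genre_b"]

print(leave_one_out_accuracy(X, y))
```

Leave-one-out trains one model per book, so it is costly on large corpora but makes good use of a small labelled set, which is presumably why it is offered alongside the simple variant.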