Program to analyze transfer loss across domains, using book and electronics reviews as the example. Reviews are converted to TF-IDF vectors, the k best features are selected with the chi-squared test, and a logistic regression model is trained on the result.
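The feature extraction step can be sketched as follows. This is a minimal illustration using scikit-learn, not the program's actual code: the toy reviews, the labels, and k = 5 are assumptions made for the example. The shapes listed below suggest that 4500 features are kept in the real run, but that reading is also an assumption.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy stand-ins for the review texts (hypothetical data, not the real corpus).
reviews = [
    "great book, loved the story",
    "wonderful characters and a great plot",
    "boring book, terrible plot",
    "awful writing, a terrible waste of time",
]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Turn raw text into TF-IDF vectors, one row per review.
vectorizer = TfidfVectorizer()
X_tfidf = vectorizer.fit_transform(reviews)

# Keep the k features with the highest chi-squared score against the labels.
# k = 5 fits this toy corpus; it is not the value used by the program.
selector = SelectKBest(chi2, k=5)
X_selected = selector.fit_transform(X_tfidf, labels)

print(X_tfidf.shape, X_selected.shape)
```

The same fitted vectorizer and selector would then be applied with transform() to any data the classifier is later evaluated on, so that all matrices share one feature space.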
-
Source Domain: Books
- number of positive reviews = 1000
- number of negative reviews = 1000
- source domain training set matrix, shape (samples, features): (2000, 4500)
-
Target Domain: Electronics
- number of positive reviews = 1000
- number of negative reviews = 1000
- target domain training set matrix, shape (samples, features): (1600, 4500)
- target domain test set matrix, shape (samples, features): (400, 4500)
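The 1600/400 shapes correspond to an 80/20 split of the 2000 Electronics reviews. How the program performs the split is not stated here; one possible way, assuming scikit-learn's train_test_split with stratification and an empty placeholder matrix in place of the real features, is:

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.model_selection import train_test_split

# Placeholder feature matrix for 2000 Electronics reviews with 4500 features.
# It is empty; only the shapes matter for this illustration.
X = csr_matrix((2000, 4500))
y = np.array([1] * 1000 + [0] * 1000)  # 1000 positive, 1000 negative

# Hold out 20% for testing; stratify keeps the classes balanced in both parts.
# random_state is an arbitrary choice made for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

print(X_train.shape, X_test.shape)  # (1600, 4500) (400, 4500)
```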
-
Direct Transfer:
- Training a logistic regression classifier on the Electronics training dataset.
- Evaluating it on the Electronics test dataset.
-
Cross-domain Transfer:
- Training a logistic regression classifier on the Books training dataset.
- Evaluating it on the Electronics test dataset.
-
Transfer Loss Across Domains:
- LOSS = direct_transfer_accuracy - cross_domain_transfer_accuracy = 0.39
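The comparison above can be sketched end to end as follows. This is a simplified illustration under stated assumptions, not the program's implementation: the function name transfer_loss, the toy reviews, and the choice to fit one shared TF-IDF vocabulary on both training sets are all hypothetical, and the chi-squared selection step is left out for brevity. With such a tiny corpus the resulting numbers say nothing about the 0.39 reported here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score


def transfer_loss(src_train, tgt_train, tgt_test):
    """Each argument is a (texts, labels) pair. Returns (direct, cross, loss)."""
    # Assumption: one vocabulary shared by both domains, so that both
    # classifiers and the test set live in the same feature space.
    vectorizer = TfidfVectorizer().fit(src_train[0] + tgt_train[0])
    X_test = vectorizer.transform(tgt_test[0])

    def accuracy_when_trained_on(train):
        clf = LogisticRegression(max_iter=1000)
        clf.fit(vectorizer.transform(train[0]), train[1])
        return accuracy_score(tgt_test[1], clf.predict(X_test))

    direct = accuracy_when_trained_on(tgt_train)  # Electronics -> Electronics
    cross = accuracy_when_trained_on(src_train)   # Books -> Electronics
    return direct, cross, direct - cross


# Hypothetical toy data standing in for the Books and Electronics reviews.
books_train = (
    ["a moving story with great characters", "dull plot and flat characters",
     "great writing, a wonderful read", "a dull and tedious read"],
    [1, 0, 1, 0],
)
electronics_train = (
    ["great battery life and fast charging", "the battery died after a week",
     "fast shipping and a wonderful screen", "screen broke, poor quality"],
    [1, 0, 1, 0],
)
electronics_test = (
    ["wonderful screen and great battery", "poor battery, it died quickly"],
    [1, 0],
)

direct, cross, loss = transfer_loss(books_train, electronics_train, electronics_test)
print(f"direct={direct:.2f} cross={cross:.2f} loss={loss:.2f}")
```

A positive loss means the classifier trained on Books does worse on Electronics than the one trained on Electronics itself; a value near zero would mean the source domain transfers well.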