
Program to analyze transfer loss across domains using TF-IDF vectors, chi-squared feature selection, and a logistic regression model.

Primary language: HTML. License: MIT.

Transfer Loss Across Domains

Program to analyze transfer loss across domains, using book and electronics reviews as the example domains. Reviews are converted to TF-IDF vectors, and the k best features are selected with the chi-squared test. A logistic regression model is then trained on the selected features.
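A minimal sketch of that pipeline, assuming scikit-learn (the README does not name a library). `build_model` and `train_and_score` are hypothetical helpers, and the default `k=4500` is only an assumption based on the 4500 columns reported in the Dataset section; `k` should not exceed the TF-IDF vocabulary size.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.pipeline import make_pipeline


def build_model(k=4500):
    """TF-IDF -> chi-squared k-best selection -> logistic regression.

    Hypothetical helper; k=4500 is an assumption taken from the column
    count in the README, not a confirmed setting of the original program.
    """
    return make_pipeline(
        TfidfVectorizer(),                  # raw review text -> sparse TF-IDF matrix
        SelectKBest(chi2, k=k),             # keep the k terms with the highest chi-squared score vs. the label
        LogisticRegression(max_iter=1000),  # higher cap than the default 100 iterations to help convergence
    )


def train_and_score(train_texts, train_labels, test_texts, test_labels, k=4500):
    """Fit on one domain's training reviews and return accuracy on a test set.

    The vectorizer and the selector are fitted on the training texts only,
    so test reviews are mapped into the training domain's vocabulary.
    """
    model = build_model(k)
    model.fit(train_texts, train_labels)
    return accuracy_score(test_labels, model.predict(test_texts))
```

For the direct-transfer setting, the Electronics training reviews would be the training data; for the cross-domain setting, the Books reviews; the Electronics test reviews are the evaluation data in both cases. Because the whole pipeline is fitted on the training domain, a Books-trained model simply ignores terms that appear only in Electronics reviews.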

Dataset:

  1. Source Domain: Books

    • number of positive reviews = 1000
    • number of negative reviews = 1000
    • source domain training set shape: (2000, 4500), i.e. 2000 reviews by 4500 TF-IDF features
  2. Target Domain: Electronics

    • number of positive reviews = 1000
    • number of negative reviews = 1000
    • target domain training set shape: (1600, 4500)
    • target domain test set shape: (400, 4500)
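The README does not state how the 2000 Electronics reviews were divided into 1600 training and 400 test reviews. One way to obtain those sizes, assumed here purely for illustration, is an 80/20 split within each class (800 + 800 training, 200 + 200 test). `stratified_split` is a hypothetical helper that uses only the standard library.

```python
import random


def stratified_split(reviews, labels, test_fraction=0.2, seed=0):
    """Split (review, label) pairs so every class keeps the same train/test ratio.

    Assumption: the original program's splitting method is unknown; this
    per-class 80/20 split merely reproduces the reported 1600/400 sizes.
    """
    rng = random.Random(seed)  # fixed seed -> reproducible split
    by_label = {}
    for review, label in zip(reviews, labels):
        by_label.setdefault(label, []).append(review)

    train, test = [], []
    for label, items in by_label.items():
        rng.shuffle(items)  # shuffle within the class before cutting
        n_test = round(len(items) * test_fraction)
        test.extend((r, label) for r in items[:n_test])
        train.extend((r, label) for r in items[n_test:])
    return train, test
```

With 1000 positive and 1000 negative reviews this yields 1600 training pairs and 400 test pairs, each side balanced between the two classes.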

Results:

  1. Direct Transfer:
    • Training a logistic regression classifier on the Electronics training dataset.
    • Evaluating it on the Electronics test dataset.


  2. Cross-domain Transfer:
    • Training a logistic regression classifier on the Books training dataset.
    • Evaluating it on the Electronics test dataset.


  3. Transfer Loss Across Domains:
    • LOSS = direct_transfer_accuracy - cross_domain_transfer_accuracy = 0.39
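The loss is plain arithmetic on two accuracies. A small standard-library sketch follows; `accuracy` and `transfer_loss` are illustrative names, not functions from the original program. Because both accuracies are measured on the same Electronics test set, a positive value means the Books-trained classifier does worse on Electronics than the in-domain one.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def transfer_loss(direct_accuracy, cross_domain_accuracy):
    """LOSS = direct_transfer_accuracy - cross_domain_transfer_accuracy."""
    return direct_accuracy - cross_domain_accuracy
```

A loss of 0.39 therefore corresponds to a 39-percentage-point drop in accuracy when the classifier is trained on Books instead of Electronics.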