Given a text and a reason, predict whether the text satisfies the reason. You can use the train file for training and report metrics on the evaluation file.
- The CSV files have 3 columns:
  - text
  - reason: a short description
  - label:
    - 0: text does not satisfy the reason
    - 1: text satisfies the reason
- The dataset has been cleaned to a certain extent. You are free to probe it further.
Note: The small train dataset with only positive samples is intentional.
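
As a quick illustration of the file layout described above, here is a minimal sketch that reads rows with the `text`, `reason`, and `label` columns and counts the labels. The sample rows and the `load_rows` helper are hypothetical and only stand in for the real CSV files; the standard-library `csv` module is used so that the snippet has no dependencies.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; the real train/evaluation files are not reproduced here.
SAMPLE_CSV = """text,reason,label
"The app crashes when I open settings","app crashes",1
"Delivery was late by two days","late delivery",1
"I love the new design","app crashes",0
"""


def load_rows(handle):
    """Read rows with text, reason, label columns and cast label to int."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append(
            {
                "text": row["text"].strip(),
                "reason": row["reason"].strip(),
                "label": int(row["label"]),
            }
        )
    return rows


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count how many rows carry each label to spot class imbalance early.
label_counts = Counter(r["label"] for r in rows)
print(label_counts)
```

To use it on a real file, pass an open file handle (for example `open(path, newline="", encoding="utf-8")`) instead of the `io.StringIO` wrapper. On a train file that contains only positive samples, the counter would show a single key, `1`.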
https://drive.google.com/drive/folders/1HInfR5Sspv-k3rMPgJyXjXiJJEoCyOtY?usp=sharing
The Python scripts in this repository address the issues below. They run on Google Colab; the script can be found here.
- Required packages
- Label class Imbalance
- Data insights:
- Baseline approach (uses only transformer models)
- Training approach (uses only transformer models)
- Artificial negative sample generation techniques
- Metrics
- Ablation study table (comparison of results across different model architectures)
- Fine-tuned the learning rate.
- Used a learning rate scheduler.
- Used a pre-trained model specifically designed for semantic similarity, such as sentence-transformers/bert-base-nli-mean-tokens.
- Insufficient data, as indicated by the data insights analysis
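
Because the train file contains only positive samples, some form of artificial negative generation is needed before a classifier can be trained. The sketch below shows one common option, pairing each text with a reason taken from a different positive pair and labelling the result 0. It is an assumption for illustration and not necessarily the technique used in this repository; the `make_negatives` helper and the sample pairs are hypothetical.

```python
import random


def make_negatives(positives, seed=0):
    """Build label-0 samples by pairing each text with a mismatched reason.

    `positives` is a list of (text, reason) pairs that all have label 1.
    Assumption: a reason drawn from a different pair does not apply to the
    text. This can be wrong for overlapping reasons, so results need review.
    """
    rng = random.Random(seed)
    # Sort the distinct reasons so the sampling is reproducible for a given seed.
    reasons = sorted({reason for _, reason in positives})
    negatives = []
    for text, reason in positives:
        candidates = [r for r in reasons if r != reason]
        if not candidates:
            continue  # only one distinct reason exists, so no mismatch is possible
        negatives.append((text, rng.choice(candidates), 0))
    return negatives


# Hypothetical positive pairs standing in for the train file.
positives = [
    ("The app crashes when I open settings", "app crashes"),
    ("Delivery was late by two days", "late delivery"),
    ("The refund never arrived", "refund not received"),
]

negatives = make_negatives(positives)
# Combine positives (label 1) and generated negatives (label 0) into one set.
dataset = [(text, reason, 1) for text, reason in positives] + negatives
```

This yields one negative per positive, which gives a balanced 1:1 ratio. Random mismatches tend to be easy negatives; harder ones (for example, reasons that are semantically close to the true reason) would need a similarity model and are outside this sketch.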