This repository contains a TensorFlow implementation that demonstrates how to fine-tune BERT-based models for the sentence-pair classification task. By leveraging the Hugging Face transformers and datasets libraries, this project enables researchers and developers to quickly experiment and evaluate various BERT models.
The sentence-pair classification task is a crucial aspect of the semantic textual similarity/duplicate problem. The goal is to model the textual interaction between two pairs of questions and determine their relationship.
The dataset used in this project is the Quora Question Pair (QQP) dataset, which forms a part of the GLUE benchmark. The QQP dataset consists of question pairs, and the objective is to predict whether one sentence is the paraphrase of the other. The evaluation metrics for this task include F1 score and accuracy.
To get started, follow these steps to set up the environment and install the necessary dependencies:
- Clone this repository
- Install the required libraries:
pip install -r requirements.txt
Follow the instructions below to run the fine-tuning process and experiment with different BERT-based models:
- Download and prepare the QQP dataset.
- Use the provided Jupyter notebook to fine-tune the BERT model.
- Evaluate the performance using the F1 score and accuracy metrics.
- Compare the results with a model that has not undergone fine-tuning.
Contributions are welcome! If you have any ideas or improvements, feel free to create a pull request. For major changes, please open an issue to discuss the proposed changes.
This project is licensed under the MIT License.
Special thanks to the creators and maintainers of the Hugging Face transformers and datasets libraries, which made this work possible.
If you have any questions or inquiries, feel free to contact the project maintainers:
- Piyush Sukhija - Project Maintainer
Happy fine-tuning!