/BERT-on-Quora

Fine tuning BERT for text classification with TensorFlow on Quora QA dataset

Primary LanguageJupyter Notebook

Fine-Tuning BERT for Text Classification and Sentiment Analysis with TensorFlow

Overview

This guided project demonstrates how to fine-tune BERT (Bidirectional Encoder Representations from Transformers) for text classification and sentiment analysis using TensorFlow and TensorFlow Hub. The project aims to predict whether sentences are toxic or sincere by leveraging the power of BERT, a state-of-the-art language representation model.

Project Objectives

  1. Build TensorFlow input pipelines for text data and natural language processing using the tf.data API.
  2. Tokenize and preprocess text as input for BERT models.
  3. Fine-tune BERT for text classification on a custom dataset using TensorFlow and TensorFlow Hub.

Dataset

The dataset used in this project is sourced from a question-and-answering website called Quora. It includes labeled questions, with insincere questions marked as toxic (label 1) and sincere questions marked as non-toxic (label 0).

Project Structure

The project code is organized as follows:

  • notebook: Jupyter notebooks for each project step.
  • README.md: Project documentation (you're reading it).

Dependencies

Ensure you have the following libraries installed:

  • TensorFlow
  • TensorFlow Hub
  • Pandas
  • NumPy
  • Matplotlib

Fine-Tuning Process

The fine-tuning process involves:

  1. Preprocessing the dataset and converting raw text data into a format suitable for BERT.
  2. Using TensorFlow Hub to import the pre-trained BERT model as a Keras layer.
  3. Fine-tuning BERT on the custom dataset for text classification.
  4. Evaluating the model's performance on the test data.

Acknowledgements

Special thanks to Snehan Kekre, the instructor at Coursera.org, for providing this project and valuable resources for understanding BERT.

Certificate of Completion: link