STAT154

Course Project for STAT 154 at UC Berkeley

Trade-off: Lightweight BERT for QA

Overview

"Trade-off" explores a lightweight BERT model for Chinese Question Answering (QA), demonstrating its effectiveness and efficiency compared to Large Language Models (LLMs).

Dataset

  • DRCD: Delta Reading Comprehension Dataset, a Traditional Chinese extractive QA benchmark.
  • ODSQA: Open-Domain Spoken Question Answering Dataset, a Chinese spoken QA benchmark (a minimal loading sketch follows this list).
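
DRCD is distributed in SQuAD-style JSON, and ODSQA is commonly converted to the same layout. As a minimal loading sketch, assuming that format and a hypothetical local file name:

```python
import json

def load_squad_style(path):
    """Flatten a SQuAD-style JSON file (as used by DRCD) into QA examples."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    examples = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                examples.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Each answer gives the text and its character offset in the context.
                    "answers": qa["answers"],
                })
    return examples

# Hypothetical local path; adjust to wherever the dataset is stored.
train_examples = load_squad_style("DRCD_training.json")
```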

Models

  • BERT and its variants (ALBERT, RoBERTa); a loading sketch follows this list.
  • Comparative analysis against larger LLMs (Qwen-7B, Baichuan 2).
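
The list above does not pin exact checkpoints; as a sketch, a Chinese RoBERTa model with a QA head can be loaded from the Hugging Face Hub, assuming the commonly used hfl/chinese-roberta-wwm-ext checkpoint:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# hfl/chinese-roberta-wwm-ext is a widely used Chinese RoBERTa checkpoint;
# the project's actual checkpoints may differ. The QA head is randomly
# initialized until fine-tuned, which is what the next section covers.
checkpoint = "hfl/chinese-roberta-wwm-ext"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
```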

Methodology

The BERT variants are fine-tuned for extractive QA, with emphasis on preprocessing, training, and postprocessing techniques; a preprocessing and training sketch follows.
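
As a minimal sketch of that pipeline (not the project's exact code): extractive QA fine-tuning requires converting each character-level answer span into start/end token positions, which the fast tokenizer's offset mapping makes possible. `tokenizer`, `model`, and `train_examples` are reused from the sketches above.

```python
import torch

def preprocess(example, tokenizer, max_length=384):
    """Tokenize one QA example and map its character-level answer span to
    start/end token positions (a simplified single-example version; real
    pipelines batch examples and use a sliding window over long contexts)."""
    enc = tokenizer(
        example["question"],
        example["context"],
        max_length=max_length,
        truncation="only_second",   # truncate the context, never the question
        return_offsets_mapping=True,
        return_tensors="pt",
    )
    answer = example["answers"][0]
    start_char = answer["answer_start"]
    end_char = start_char + len(answer["text"])

    offsets = enc["offset_mapping"][0].tolist()
    seq_ids = enc.sequence_ids(0)   # None/0 = special/question tokens, 1 = context
    start_tok = end_tok = 0         # falls back to [CLS] if the answer was truncated
    for i, (s, e) in enumerate(offsets):
        if seq_ids[i] != 1:
            continue
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i
    enc["start_positions"] = torch.tensor([start_tok])
    enc["end_positions"] = torch.tensor([end_tok])
    del enc["offset_mapping"]       # the model's forward() does not accept this key
    return enc

# One training step: the model returns a span-prediction loss when labels are given.
batch = preprocess(train_examples[0], tokenizer)
loss = model(**batch).loss
loss.backward()
```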

Results

BERT variants, particularly RoBERTa, achieved high accuracy, outperforming some larger LLMs on specific tasks.
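
The section does not name the metric; exact match (EM) is the standard score for DRCD-style extractive QA, so a minimal scorer might look like this (the ids and answers below are hypothetical):

```python
def exact_match(predictions, references):
    """Fraction of questions whose predicted answer string equals the gold
    answer. Both arguments map question id -> answer string."""
    hits = sum(
        predictions[qid].strip() == references[qid].strip()
        for qid in references
    )
    return hits / len(references)

# Toy usage with hypothetical ids and answers:
print(exact_match({"q1": "台北"}, {"q1": "台北"}))  # 1.0
```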

Future Work

Expanding the dataset size, diversifying the QA tasks, and adjusting language settings for a more comprehensive analysis.