STAT154

Course Project for STAT 154 at UC Berkeley

Trade-off: Lightweight BERT for QA

Overview

"Trade-off" explores a lightweight BERT model for Chinese Question Answering (QA), demonstrating its effectiveness and efficiency compared to Large Language Models (LLMs).

Dataset

  • DRCD: Delta Reading Comprehension Dataset, a Traditional Chinese extractive QA benchmark.
  • ODSQA: Open-Domain Spoken Question Answering Dataset, a Chinese spoken QA benchmark (a minimal loading sketch follows this list).
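
DRCD is distributed in SQuAD-style JSON, and ODSQA is commonly converted to the same layout. As a minimal loading sketch, assuming that format and a hypothetical local file name:

```python
import json

def load_squad_style(path):
    """Flatten a SQuAD-style JSON file (as used by DRCD) into QA examples."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)
    examples = []
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                examples.append({
                    "id": qa["id"],
                    "question": qa["question"],
                    "context": context,
                    # Each answer gives the text and its character offset in the context.
                    "answers": qa["answers"],
                })
    return examples

# Hypothetical local path; adjust to wherever the dataset is stored.
train_examples = load_squad_style("DRCD_training.json")
```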

Models

  • BERT and its variants (ALBERT, RoBERTa); a loading sketch follows this list.
  • Comparative analysis against larger LLMs (Qwen-7B, Baichuan 2).
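
The list above does not pin exact checkpoints; as a sketch, a Chinese RoBERTa model with a QA head can be loaded from the Hugging Face Hub, assuming the commonly used hfl/chinese-roberta-wwm-ext checkpoint:

```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

# hfl/chinese-roberta-wwm-ext is a widely used Chinese RoBERTa checkpoint;
# the project's actual checkpoints may differ. The QA head is randomly
# initialized until fine-tuned, which is what the next section covers.
checkpoint = "hfl/chinese-roberta-wwm-ext"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForQuestionAnswering.from_pretrained(checkpoint)
```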

Methodology

The BERT variants are fine-tuned for extractive QA, with emphasis on preprocessing, training, and postprocessing techniques; a preprocessing and training sketch follows.
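
As a minimal sketch of that pipeline (not the project's exact code): extractive QA fine-tuning requires converting each character-level answer span into start/end token positions, which the fast tokenizer's offset mapping makes possible. `tokenizer`, `model`, and `train_examples` are reused from the sketches above.

```python
import torch

def preprocess(example, tokenizer, max_length=384):
    """Tokenize one QA example and map its character-level answer span to
    start/end token positions (a simplified single-example version; real
    pipelines batch examples and use a sliding window over long contexts)."""
    enc = tokenizer(
        example["question"],
        example["context"],
        max_length=max_length,
        truncation="only_second",   # truncate the context, never the question
        return_offsets_mapping=True,
        return_tensors="pt",
    )
    answer = example["answers"][0]
    start_char = answer["answer_start"]
    end_char = start_char + len(answer["text"])

    offsets = enc["offset_mapping"][0].tolist()
    seq_ids = enc.sequence_ids(0)   # None/0 = special/question tokens, 1 = context
    start_tok = end_tok = 0         # falls back to [CLS] if the answer was truncated
    for i, (s, e) in enumerate(offsets):
        if seq_ids[i] != 1:
            continue
        if s <= start_char < e:
            start_tok = i
        if s < end_char <= e:
            end_tok = i
    enc["start_positions"] = torch.tensor([start_tok])
    enc["end_positions"] = torch.tensor([end_tok])
    del enc["offset_mapping"]       # the model's forward() does not accept this key
    return enc

# One training step: the model returns a span-prediction loss when labels are given.
batch = preprocess(train_examples[0], tokenizer)
loss = model(**batch).loss
loss.backward()
```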

Results

BERT variants, particularly RoBERTa, achieved high accuracy, outperforming some larger LLMs on specific tasks.
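
The section does not name the metric; exact match (EM) is the standard score for DRCD-style extractive QA, so a minimal scorer might look like this (the ids and answers below are hypothetical):

```python
def exact_match(predictions, references):
    """Fraction of questions whose predicted answer string equals the gold
    answer. Both arguments map question id -> answer string."""
    hits = sum(
        predictions[qid].strip() == references[qid].strip()
        for qid in references
    )
    return hits / len(references)

# Toy usage with hypothetical ids and answers:
print(exact_match({"q1": "台北"}, {"q1": "台北"}))  # 1.0
```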

Future Work

Expanding the dataset size, diversifying the QA tasks, and adjusting language settings for a more comprehensive analysis.