Human Attention for Text Classification

Re-implementation of the paper "Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?" (ACL 2020).

Install requirements

$ poetry install

Download and Split Yelp dataset

Download from Yelp.com

https://www.yelp.com/dataset/download

Split the dataset

The Yelp dataset is large, so split it into subsets before training. The command below writes tng.jsonl, val.jsonl, and tst.jsonl (the training, validation, and test sets) to the data directory.

$ allennlp split-dataset \
    --input-file data/yelp_academic_dataset_review.json \
    --output-dir data/ \
    --tng-ratio 0.8 \
    --val-ratio 0.1 \
    --tst_ratio 0.1
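
For readers who want to see what a split with these ratios amounts to, here is a minimal Python sketch. It is not the code behind the `split-dataset` subcommand. It assumes the review file holds one JSON object per line, and the function names `split_lines` and `split_file`, the `seed` parameter, and the shuffle-then-cut strategy are all illustrative choices.

```python
import random
from pathlib import Path


def split_lines(lines, tng_ratio=0.8, val_ratio=0.1, seed=0):
    """Shuffle lines and cut them into training, validation, and test lists."""
    lines = list(lines)
    # A fixed seed keeps the split reproducible (hypothetical parameter).
    random.Random(seed).shuffle(lines)
    n_tng = int(len(lines) * tng_ratio)
    n_val = int(len(lines) * val_ratio)
    # Whatever remains after the first two cuts becomes the test split.
    return {
        "tng": lines[:n_tng],
        "val": lines[n_tng:n_tng + n_val],
        "tst": lines[n_tng + n_val:],
    }


def split_file(input_file, output_dir, **kwargs):
    """Write tng.jsonl, val.jsonl, and tst.jsonl into output_dir."""
    with open(input_file, encoding="utf-8") as f:
        # Drop blank lines and make sure every record ends with one newline.
        lines = [line.rstrip("\n") + "\n" for line in f if line.strip()]
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, subset in split_lines(lines, **kwargs).items():
        with open(output_dir / f"{name}.jsonl", "w", encoding="utf-8") as out:
            out.writelines(subset)
```

Calling `split_file("data/yelp_academic_dataset_review.json", "data/")` would produce the three files named above. The sketch reads the whole input into memory, which may be impractical for the full review file; a streaming variant would assign each line to a split as it is read.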

Preprocess HAM dataset

$ allennlp preprocess-ham-dataset \
    --ham-dataset-dir data/ham-dataset/raw_data/ \
    --output-dir data/

Train RNN model

$ CUDA_VISIBLE_DEVICES=0 allennlp train config/base.jsonnet -s outputs -o '{"trainer": {"cuda_device": 0}}'

Reference

Sen, Cansu, et al. "Human Attention Maps for Text Classification: Do Humans and Neural Networks Focus on the Same Words?" Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020.
