
SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling

PyTorch implementation of SLIM: Explicit Slot-Intent Mapping with BERT for Joint Multi-Intent Detection and Slot Filling

Update 24 May:

Thanks to Feifan Song, who pointed out that all the baseline methods are evaluated on the MixATIS and MixSNIPS datasets, while our method is evaluated on MixATIS_clean and MixSNIPS_clean. We have decided to re-evaluate our method on MixATIS and MixSNIPS and will update the results soon.

Insight

Previous multi-intent works assist slot prediction by feeding every token the same coarse-grained intent information. However, in the multi-intent setting, different slots are mapped to different intents; for example, in an utterance that asks to play some jazz and check the weather in Boston, the slot "Boston" belongs to the weather intent while "jazz" belongs to the music intent. We therefore exploit the explicit slot-intent mapping to guide both intent detection and slot filling.

Model Architecture

  • Intents and slots are predicted at the same time from a single BERT model (= joint model)
  • total_loss = intent_loss + slot_coef * slot_loss + slot_intent_coef * slot_intent_loss (see the loss sketch below)
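
A minimal sketch of how the three losses could be combined, assuming a multi-label intent head, a token-level slot head, and a slot-level intent head; the loss choices and names here are illustrative, not taken from the repository code:

import torch.nn as nn

# Illustrative loss heads (assumptions, not the repository's exact choices):
# multi-label intent detection, token-level slot tagging, slot-level intent mapping.
intent_criterion = nn.BCEWithLogitsLoss()
slot_criterion = nn.CrossEntropyLoss(ignore_index=0)        # index 0 assumed to be PAD
slot_intent_criterion = nn.CrossEntropyLoss(ignore_index=0)

def joint_loss(intent_logits, intent_labels,
               slot_logits, slot_labels,
               slot_intent_logits, slot_intent_labels,
               slot_coef=2.0, slot_intent_coef=1.0):
    # total_loss = intent_loss + slot_coef * slot_loss + slot_intent_coef * slot_intent_loss
    intent_loss = intent_criterion(intent_logits, intent_labels.float())
    slot_loss = slot_criterion(slot_logits.view(-1, slot_logits.size(-1)),
                               slot_labels.view(-1))
    slot_intent_loss = slot_intent_criterion(
        slot_intent_logits.view(-1, slot_intent_logits.size(-1)),
        slot_intent_labels.view(-1))
    return intent_loss + slot_coef * slot_loss + slot_intent_coef * slot_intent_loss

Here slot_coef presumably corresponds to the --slot_loss_coef flag used in the training commands below.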

Dependencies

Please refer to requirements.txt
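
For example, the dependencies can typically be installed with:

$ pip3 install -r requirements.txt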

Dataset

  • The following table shows the train/dev/test split of MixATIS and MixSNIPS, along with the number of intent labels and slot labels in the training set. Based on how MixATIS / MixSNIPS are constructed, we also label the slot-level intent.
              Train    Dev    Test   Intent Labels   Slot Labels
    MixATIS   13,161   759    828    21              118
    MixSNIPS  39,776   2,198  2,199  7               71
  • We also use DSTC4; the data contact is Teo Poh Heng.
  • The numbers of labels are counted on the training set.
  • A UNK label is added for intent and slot labels that appear only in the dev and test sets.
  • A PAD label is added for slot labels (a minimal vocabulary sketch follows this list).
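
A minimal sketch of how the label vocabulary could be built with the extra PAD and UNK entries; the helper names and example labels below are made up for illustration and are not the repository's actual loader:

# Illustrative helper: build a slot-label vocabulary from the training split,
# reserving PAD (index 0) and UNK for labels that never occur in training.
def build_slot_vocab(train_slot_labels):
    vocab = {"PAD": 0, "UNK": 1}
    for label in train_slot_labels:
        if label not in vocab:
            vocab[label] = len(vocab)
    return vocab

def encode_slots(labels, vocab):
    # Labels seen only in dev/test fall back to UNK.
    return [vocab.get(label, vocab["UNK"]) for label in labels]

vocab = build_slot_vocab(["O", "B-city", "I-city", "B-airline"])
print(encode_slots(["O", "B-airport"], vocab))   # B-airport is unseen -> UNK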

Training & Evaluation

All experiments are conducted using a single GeForce GTX TITAN X GPU.

$ python3 main.py --task {task_name} \
                  --model_type {model_type} \
                  --model_dir {model_dir_name} \
                  --do_train

# For MixSNIPS
$ python3 main.py --task mixsnips \
                --model_type multibert \
                --model_dir mixsnips_model \
                --multi_intent 1 \
                --intent_seq 0 \
                --tag_intent 1 \
                --BI_tag 1 \
                --intent_attn 1 \
                --cls_token_cat 1 \
                --num_mask 6 \
                --slot_loss_coef 2 \
                --patience 0 \
                --seed 25 \
                --do_train

# For MixATIS
$ python3 main.py --task mixatis \
                --model_type multibert \
                --model_dir mixatis_model \
                --multi_intent 1 \
                --intent_seq 0 \
                --tag_intent 1 \
                --BI_tag 1 \
                --intent_attn 1 \
                --cls_token_cat 1 \
                --num_mask 6 \
                --slot_loss_coef 2 \
                --patience 0 \
                --seed 12 \
                --do_train

Prediction

$ python3 main.py --task {task_name} \
                  --model_type {model_type} \
                  --model_dir {model_dir_name} \
                  --do_eval

# For MixSNIPS
$ python3 main.py --task mixsnips \
                --model_type multibert \
                --model_dir mixsnips_model \
                --multi_intent 1 \
                --intent_seq 0 \
                --tag_intent 1 \
                --BI_tag 1 \
                --intent_attn 1 \
                --cls_token_cat 1 \
                --num_mask 6 \
                --slot_loss_coef 2 \
                --patience 0 \
                --seed 25 \
                --do_eval

# For MixATIS
$ python3 main.py --task mixatis \
                --model_type multibert \
                --model_dir mixatis_model \
                --multi_intent 1 \
                --intent_seq 0 \
                --tag_intent 1 \
                --BI_tag 1 \
                --intent_attn 1 \
                --cls_token_cat 1 \
                --num_mask 6 \
                --slot_loss_coef 2 \
                --patience 0 \
                --seed 12 \
                --do_eval

Results

  • Detailed hyperparameter settings will be provided later.
  • Only the uncased BERT model has been tested.

References