SFA

[IJCAI 2024] This paper has been accepted by IJCAI 2024.

Primary language: Python. License: Apache License 2.0.

Modeling Selective Feature Attention for Lightweight Text Matching

This repository contains the implementation of the model presented in the paper "Modeling Selective Feature Attention for Lightweight Text Matching".

Install the package

To use the model defined in this repository, first install PyTorch by following the steps described on the package's official page (this step is only necessary if you use Windows). Then install the remaining dependencies by running the command pip install --upgrade . from the root of the cloned repository, preferably inside a virtual environment.

Load the data

The load_data.py script located in the scripts/ folder of this repository can be used to download an NLI dataset and pretrained word embeddings. By default, the script fetches the SNLI corpus and the GloVe 840B 300d embeddings.

The script's usage is the following:

python load_data.py

The script also accepts an optional target directory in which the downloaded data is saved (defaults to ../data/).
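The download step can be pictured roughly as follows. This is only a minimal sketch and not the repository's load_data.py: the function name fetch_and_unzip and its parameters are hypothetical, and it assumes the resource is distributed as a zip archive. Only the default target directory ../data comes from the description above.

```python
import urllib.request
import zipfile
from pathlib import Path


def fetch_and_unzip(url: str, target_dir: str = "../data") -> Path:
    """Download a zip archive from `url` and extract it under `target_dir`.

    Hypothetical helper for illustration; the actual script may differ.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)

    # Name the local archive after the last component of the URL.
    archive = target / url.rsplit("/", 1)[-1]

    # Skip the download when the archive is already present locally.
    if not archive.exists():
        urllib.request.urlretrieve(url, str(archive))

    # Extract every member of the archive into the target directory.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target
```

Keeping the archive next to the extracted files means a second call with the same URL skips the download and only repeats the extraction.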

Preprocess the data

Before the downloaded corpus and embeddings can be used in the base model, they need to be preprocessed. This can be done with the preprocess_snli.py script in the scripts/preprocessing folder of this repository.

The script's usage is the following:

python preprocess_snli.py

where config is the path to a configuration file defining the parameters to be used for preprocessing. Default configuration files can be found in the config/preprocessing folder of this repository.
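As an illustration of how a script might read such a configuration file, here is a minimal sketch. It assumes the configuration is stored as JSON, which this README does not state; the function load_config and its required_keys parameter are hypothetical and are not part of the repository.

```python
import json
from pathlib import Path


def load_config(config_path: str, required_keys=()) -> dict:
    """Read a JSON configuration file and check that expected keys exist.

    Assumption: the file is JSON. The keys to require are chosen by the caller.
    """
    path = Path(config_path)
    with path.open(encoding="utf-8") as handle:
        config = json.load(handle)

    # Fail early with a readable message if a parameter is missing.
    missing = [key for key in required_keys if key not in config]
    if missing:
        raise KeyError(f"Missing keys in {path.name}: {missing}")
    return config
```

Checking the keys up front surfaces a misconfigured file before any long-running preprocessing starts.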

Train the model

The train_snli.py script in the scripts/training folder can be used to train the model on the training data and validate it on the validation data.

The script's usage is the following:

python train_snli.py [-h] [--config CONFIG] [--checkpoint CHECKPOINT]

where config is a configuration file (default ones are located in the config/training folder), and checkpoint is an optional checkpoint from which training can be resumed. Checkpoints are created by the script after each training epoch, with the name esim_*.pth.tar, where '*' indicates the epoch's number.
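Because checkpoints follow the esim_*.pth.tar naming scheme, the most recent one can be located by its epoch number before passing it to --checkpoint. The sketch below is a hypothetical helper (latest_checkpoint is not part of the repository) and assumes that the '*' part of the name is a plain integer.

```python
import re
from pathlib import Path
from typing import Optional

# Matches names such as "esim_3.pth.tar" and captures the epoch number.
CHECKPOINT_PATTERN = re.compile(r"^esim_(\d+)\.pth\.tar$")


def latest_checkpoint(checkpoint_dir: str) -> Optional[Path]:
    """Return the checkpoint with the highest epoch number, or None if absent."""
    best_epoch, best_path = -1, None
    for path in Path(checkpoint_dir).glob("esim_*.pth.tar"):
        match = CHECKPOINT_PATTERN.match(path.name)
        # Ignore files whose suffix is not a plain integer epoch.
        if match and int(match.group(1)) > best_epoch:
            best_epoch, best_path = int(match.group(1)), path
    return best_path
```

Comparing epochs as integers rather than as strings keeps epoch 10 ahead of epoch 2, which a plain alphabetical sort of the file names would get wrong.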

Citation

If you find this code helpful, please cite:

@article{zangmodeling,
  title={Modeling Selective Feature Attention for Lightweight Text Matching},
  author={Zang, Jianxiang and Liu, Hui}
}