Authors: Pitts
This repository contains the implementation of the system described in the paper "XXX" on XXX at the XXX conference.
ACD
└── src
    ├── commons
    │   ├── globals.py
    │   └── utils.py
    ├── data              # implementation of dataset class
    ├── modeling
    │   ├── layers.py     # implementation of neural layers
    │   ├── model.py      # implementation of neural networks
    │   └── train.py      # functions to build, train, and predict with a neural network
    ├── experiment.py     # entire pipeline of experiments
    └── main.py           # entire pipeline of our system
We have updated the code to work with Python 3.10, PyTorch 2.0.1, and CUDA 11.7. If you use conda, you can set up the environment as follows:
conda create -n ACD python=3.10
conda activate ACD
conda install pytorch==2.0.1 torchvision==0.15.2 torchaudio==2.0.2 pytorch-cuda=11.7 -c pytorch -c nvidia
conda install -y transformers tqdm matplotlib pandas pylint
conda install -c conda-forge sklearn-contrib-lightning
Then install the remaining dependencies listed in requirements.txt:
pip install -r requirements.txt
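After installation, it can help to confirm that the pinned versions are actually the ones in the active environment. The following is a minimal sketch, not part of this repository: the helper name `check_versions` is hypothetical, and it assumes the packages are registered under the distribution names `torch`, `torchvision`, and `torchaudio`.

```python
from importlib import metadata

# Versions taken from the conda install command above.
EXPECTED = {
    "torch": "2.0.1",
    "torchvision": "0.15.2",
    "torchaudio": "2.0.2",
}


def check_versions(expected=EXPECTED):
    """Return {package: (installed_version_or_None, matches_expected)}.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import the packages themselves.
    """
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        # A local build tag such as "2.0.1+cu117" still counts as a match.
        ok = installed is not None and installed.split("+")[0] == wanted
        report[name] = (installed, ok)
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_versions().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{name}: installed={installed} expected={EXPECTED[name]} [{status}]")
```

A package that is not installed is reported with `installed=None` rather than raising an error, so the script can be run in a partially configured environment.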
This repository includes a few toy examples for trying out the code. Due to policy restrictions, we are not allowed to release the full data. If you need it, please email Shuguang Chen (schen52@uh.edu), and we will provide the following data:
XXX
We use config files to specify the details of every experiment (e.g., hyper-parameters and datasets). You can modify the config files in the configs directory and run experiments with the following command:
CUDA_VISIBLE_DEVICES=[gpu_id] python src/main.py --config /path/to/config
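For readers writing their own entry point, the config-driven pattern can be sketched as below. This is an illustration under assumptions, not the actual contents of src/main.py: it assumes the config files are JSON (as the config fragment in this README suggests), and the function names `parse_args`, `load_config`, and `main` are hypothetical.

```python
import argparse
import json


def parse_args(argv=None):
    """Parse the --config flag used in the command above."""
    parser = argparse.ArgumentParser(description="Run an experiment from a config file.")
    parser.add_argument("--config", required=True, help="path to a JSON config file")
    return parser.parse_args(argv)


def load_config(path):
    """Read a JSON config file into a plain dictionary."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def main(argv=None):
    """Load the config and return its "model" section (empty dict if absent)."""
    args = parse_args(argv)
    config = load_config(args.config)
    return config.get("model", {})
```

Calling `main(["--config", "/path/to/config"])` returns the model section of the given file; passing `argv=None` falls back to the real command-line arguments.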
To run experiments with VisualBERT, download the pretrained weights from VisualBERT and set pretrained_weights in the config file to their path:
...
"model": {
    "name": "mner",
    "model_name_or_path": "bert-base-uncased",
    "pretrained_weights": "path/to/pretrained_weights",
    "do_lower_case": true,
    "output_attentions": false,
    "output_hidden_states": false
},
...
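If you prefer not to edit the file by hand, the same change can be scripted. The sketch below is a hypothetical helper, not part of this repository; it assumes the config is a JSON file with a top-level "model" object, as in the fragment above, and it rewrites the file with 4-space indentation.

```python
import json
from pathlib import Path


def set_pretrained_weights(config_path, weights_path):
    """Point model.pretrained_weights in a JSON config at weights_path.

    Hypothetical helper for illustration. All other keys are preserved;
    a missing "model" section is created.
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config.setdefault("model", {})["pretrained_weights"] = str(weights_path)
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config
```

For example, `set_pretrained_weights("/path/to/config", "path/to/pretrained_weights")` updates the file in place and returns the modified dictionary.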
@inproceedings{chen-etal-2021-images,
    title = "Can images help recognize entities? A study of the role of images for Multimodal {NER}",
    author = "Chen, Shuguang and
      Aguilar, Gustavo and
      Neves, Leonardo and
      Solorio, Thamar",
    booktitle = "Proceedings of the Seventh Workshop on Noisy User-generated Text (W-NUT 2021)",
    month = nov,
    year = "2021",
    address = "Online",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2021.wnut-1.11",
    pages = "87--96",
    abstract = "Multimodal named entity recognition (MNER) requires to bridge the gap between language understanding and visual context. While many multimodal neural techniques have been proposed to incorporate images into the MNER task, the model{'}s ability to leverage multimodal interactions remains poorly understood. In this work, we conduct in-depth analyses of existing multimodal fusion techniques from different perspectives and describe the scenarios where adding information from the image does not always boost performance. We also study the use of captions as a way to enrich the context for MNER. Experiments on three datasets from popular social platforms expose the bottleneck of existing multimodal models and the situations where using captions is beneficial.",
}
Feel free to get in touch by email at schen52@uh.edu.