MORE : a Metric learning based framework for Open-domain Relation Extraction

This repository is built for the source code of paper -- MORE: A METRIC LEARNING BASED FRAMEWORK FOR OPEN-DOMAIN RELATION EXTRACTION . You can follow the steps below to use our code.

1. Configure the Environment

To run the code, you need ：

torch>=1.3.0
tensorflow>=1.9.0
keras>=2.2.5
transformers==3.2.0

You can run the following command to set up a new Anaconda environment:

conda create -n more python=3.6
pip install -r ./requirements.txt

We suggest that you use the same environment as ours to avoid any problems.

2. Prepare the Datasets

In this code, we use two real-world RE datasets:

FewRel : We follow RSNs. The processed dataset is already in ./data/datasets/fewrel_ori/ .
NYT+FB-sup: We use the original NYT+FB and process it to NYT+FB-sup. The dataset is not open source, but you can get the sample if you need.

To process nyt_ori.txt (suppose you already own it and store it in the ./data/datasets/nyt_su/ ), run the following command:
```
python ./data/datasets/nyt_su/process2json.py
python ./data/datasets/nyt_su/nyt_divide_supervision.py
```
then the original .txt file will be processed into .json format and be divided into train\dev\test(6:2:2).

3. Run it

In our experiments, we use CNN and BERT for our extractor. The architecture of CNN is same as RSNs used, and the pre-trained language model we exploit is huggingface transformers.

On FewRel:

MORE(CNN)

python main_cmd.py --dataset fewrel

MORE(CNN)+VAT

python main_cmd.py --dataset fewrel --VAT 1 --epoch_num 4 --warm_up 3 --power_iterations 1 --p_mult 0.03 --lambda_V 1

MORE(BERT)

python main_cmd.py --dataset fewrel --learning_rate 0.00001 --batch_num 1000 --BERT 1

On NYT+FB-sup:

MORE(CNN)

python main_cmd.py --dataset nyt

MORE(CNN)+VAT

python main_cmd.py --dataset nyt --VAT 1 --epoch_num 6 --warm_up 4 --power_iterations 1 --p_mult 0.5 --lambda_V 1.5

MORE(BERT)

python main_cmd.py --dataset nyt --learning_rate 0.00001 --batch_num 1000 --BERT 1

Note that if you have enough computing resources, you can try to use MORE(BERT)+VAT (We didn't list this result on paper due to the limitation of GPU memory) :

python main_cmd.py --dataset fewrel --VAT 1 --epoch_num 4 --warm_up 0 --power_iterations 1 --p_mult 0.03 --lambda_V 1 --learning_rate 0.00001 --batch_num 1000 --BERT 1
python main_cmd.py --dataset nyt --VAT 1 --epoch_num 4 --warm_up 0 --power_iterations 1 --p_mult 0.5 --lambda_V 1.5 --learning_rate 0.00001 --batch_num 1000 --BERT 1

4. Future Work

Optimize virtual adversarial training.
Complete MORE(BERT)+VAT.

RenzeLou/MORE

MORE : a Metric learning based framework for Open-domain Relation Extraction

1. Configure the Environment

2. Prepare the Datasets

3. Run it

4. Future Work