PlusLabNLP/Plot-guided-Coherence-Evaluation

Problems in use

HappyGu0524 opened this issue · 2 comments

Hi,
I want to use your work as an evaluation metric, but I have run into some problems.

When evaluating stories with run_glue.py, I cannot load pytorch_model.bin. Could you provide a detailed argument setting?

In the meantime, I wrote a simple implementation myself, but it does not reach the performance reported in the paper.

import argparse

import torch
from tqdm import tqdm
from transformers import (
    RobertaConfig,
    RobertaForSequenceClassification,
    RobertaTokenizer,
    LongformerConfig,
    LongformerForSequenceClassification,
    LongformerTokenizer,
)


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', type=str, default='RoBERTa')
    parser.add_argument('--input_file', type=str, default='Data/AMT/Pred_advROC_roberta.txt')
    parser.add_argument('--output_file', type=str, default='roberta_ROC_predict.txt')
    parser.add_argument('--cuda', action='store_true')
    args = parser.parse_args()

    # Build the classifier from the pretrained config; the fine-tuned weights are loaded below.
    if args.model == 'RoBERTa':
        args.model_path = "Models/Ft_RoBERTa/pytorch_model.bin"
        model = RobertaForSequenceClassification(RobertaConfig.from_pretrained('roberta-large'))
        tokenizer = RobertaTokenizer.from_pretrained('roberta-large')
    elif args.model == 'Longformer':
        args.model_path = "Models/Ft_Longformer/pytorch_model.bin"
        model = LongformerForSequenceClassification(LongformerConfig.from_pretrained('allenai/longformer-base-4096'))
        tokenizer = LongformerTokenizer.from_pretrained('allenai/longformer-base-4096')
    else:
        raise ValueError('Unknown model: {}'.format(args.model))

    model.load_state_dict(torch.load(args.model_path, map_location='cpu'))
    model.eval()

    # Run on GPU only when --cuda is passed and a GPU is actually available.
    device = torch.device('cuda' if args.cuda and torch.cuda.is_available() else 'cpu')
    model.to(device)

    with open(args.input_file, 'r') as f, open(args.output_file, 'w') as output_file:
        for line in tqdm(f.readlines()):
            # The first tab-separated field holds the story text.
            line = line.split('\t')[0].strip()
            inputs = {k: v.to(device) for k, v in tokenizer(line, return_tensors="pt").items()}
            with torch.no_grad():
                outputs = model(**inputs)
            # Use the softmax score of label index 1 as the coherence score.
            probs = torch.softmax(outputs[0], dim=-1)
            output_file.write(line + '\t' + str(probs[0][1].item()) + '\n')


if __name__ == "__main__":
    main()
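
I run the script with something like the following (the filename is simply what I saved the snippet as):

python coherence_predict.py --model RoBERTa --input_file Data/AMT/Pred_advROC_roberta.txt --output_file roberta_ROC_predict.txt --cuda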

I evaluate on the ROCStories data and get:

The Spearman correlation is 0.5322315059606276 (2.413935997251995e-23)
The Kendall correlation is 0.38673689381499476 (4.993793099715013e-21)

which is slightly lower than the results reported in the paper.
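For reference, I compute the correlations above roughly like this (a minimal sketch; I am assuming the AMT file keeps the human coherence rating in its second tab-separated column, which may not match the actual layout):

from scipy.stats import spearmanr, kendalltau

human, model_scores = [], []
with open('Data/AMT/Pred_advROC_roberta.txt') as f_human, open('roberta_ROC_predict.txt') as f_model:
    for h_line, m_line in zip(f_human, f_model):
        # Assumed layout: story \t human coherence score in the AMT file,
        # story \t model score in the prediction file written above.
        human.append(float(h_line.strip().split('\t')[1]))
        model_scores.append(float(m_line.strip().split('\t')[1]))

rho, rho_p = spearmanr(model_scores, human)
tau, tau_p = kendalltau(model_scores, human)
print('The Spearman correlation is {} ({})'.format(rho, rho_p))
print('The Kendall correlation is {} ({})'.format(tau, tau_p))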
Would you mind giving me some help?

Hi,

Did you manage to figure out how to use run_glue.py and reproduce the results from the paper?
Could you provide some guidance on how to use the provided pre-trained models to evaluate new stories?
Thank you in advance!

Hi,

I have added run_glue.py from transformers==3.1.0 with some modifications.

To run the model for predictions:
python run_glue.py --data_dir=DATA_DIR  --test_fname=TEST_FILE --model_type=roberta --model_name_or_path=MODEL_PATH --task_name=CoLA --output_dir=MODEL_PATH --do_lower_case --do_pred_textscores
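
For example, to score the ROC test file with the fine-tuned RoBERTa checkpoint (the paths below are only placeholders; point them at wherever your data and model actually live):

python run_glue.py --data_dir=Data/AMT --test_fname=Pred_advROC_roberta.txt --model_type=roberta --model_name_or_path=Models/Ft_RoBERTa --task_name=CoLA --output_dir=Models/Ft_RoBERTa --do_lower_case --do_pred_textscores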