RUCAIBox/TextBox

ModuleNotFoundError: No module named 'bert_score'

Howene opened this issue · 14 comments

Hi, thanks for providing such a powerful tool. After I cloned TextBox from source, I tried to run the command "python run_textbox.py", and it reported: ModuleNotFoundError: No module named 'bert_score'. Is this a bug, and how do I run the code correctly?

You need to install the packages in requirements.txt.
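
Typically that is just standard pip usage from the TextBox root directory:

pip install -r requirements.txt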

Hi, after I installed the packages from requirements.txt, it reported "ModuleNotFoundError: No module named 'files2rouge'".

You can run bash install.sh if you don't have files2rouge.
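
If install.sh is not convenient, files2rouge can also be installed manually; roughly, following the files2rouge README (steps may have changed since):

git clone https://github.com/pltrdy/files2rouge.git
cd files2rouge
python setup_rouge.py
python setup.py install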

Thanks. Then I downloaded the IMDB dataset from Google Drive, installed files2rouge, and ran the following script:

import argparse

from textbox.quick_start import run_textbox

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--model', '-m', type=str, default='TransformerEncDec', help='name of models')
    parser.add_argument('--dataset', '-d', type=str, default='IMDB', help='name of datasets')
    parser.add_argument('--config_files', type=str, default=None, help='config files')

    args, _ = parser.parse_known_args()

    config_file_list = args.config_files.strip().split(' ') if args.config_files else None
    run_textbox(model=args.model, dataset=args.dataset, config_file_list=config_file_list, config_dict={})
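
(The traceback below shows this script was saved as run_IMDBTransformer.py, so since all the arguments have defaults it would be launched simply as:

python run_IMDBTransformer.py )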

it would report that:
Traceback (most recent call last):
  File "run_IMDBTransformer.py", line 18, in <module>
    run_textbox(model=args.model, dataset=args.dataset, config_file_list=config_file_list, config_dict={})
  File "/TextBox/textbox/quick_start/quick_start.py", line 82, in run_textbox
    best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, saved=saved)
  File "TextBox/textbox/trainer/trainer.py", line 339, in fit
    train_loss = self._train_epoch(train_data, epoch_idx)
  File "TextBox/textbox/trainer/trainer.py", line 183, in _train_epoch
    losses = self.model(data, epoch_idx=epoch_idx)
  File "anaconda3/envs/torchforgpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "TextBox/textbox/model/Seq2Seq/transformerencdec.py", line 168, in forward
    source_text = corpus['source_idx']
KeyError: 'source_idx'

If you use the source code, please follow this instruction and run it from the command line.

If you want to use the API, please first run pip install -e . in the TextBox folder, and then follow this instruction.
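
For example, after the editable install, the API call mirrors the script above (a minimal sketch based on the run_textbox signature shown in this thread; RNN and IMDB are illustrative choices):

from textbox.quick_start import run_textbox

run_textbox(model='RNN', dataset='IMDB', config_file_list=None, config_dict={})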

I used the source code and followed the "Quick-Start" section. I guess that the "IMDB.yaml" file needs to be changed as follows:
max_vocab_size: 30000
max_seq_length: 100
split_strategy: "by_ratio"
split_ratio: [0.8,0.1,0.1]
overlength_strategy: "truncate"
language: "English"
task_type: "unconditional"
source_suffix: bin
target_suffix: bin

right?
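
As an aside, judging by the --model and --dataset flags used elsewhere in this thread, dataset settings like these can presumably also be overridden directly on the command line in the same --key=value style (an assumption, not something confirmed in this thread), e.g.:

python run_textbox.py --model=RNN --dataset=IMDB --max_seq_length=100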

If you use the source code, please follow this instruction and run it from the command line.

If you want to use the API, please first run pip install -e . in the TextBox folder, and then follow this instruction.

I am confused about the meaning of the parameter 'source_idx'. What should I do to make the "Quick-Start" work?

  1. Clone the latest repository.
  2. Download the IMDB dataset (raw data), and put corpus.txt in the folder TextBox/dataset/IMDB (see the layout sketch after this list).
  3. Run the command python run_textbox.py --model=TransformerEncDec --dataset=IMDB
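
For step 2, the expected layout would be as follows (only corpus.txt comes from the download; the rest of the tree already exists in the repo):

TextBox/
  dataset/
    IMDB/
      corpus.txt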

I've done all the above steps and it reported that:
File "run_textbox.py", line 18, in
run_textbox(model=args.model, dataset=args.dataset, config_file_list=config_file_list, config_dict={})
File "TextBox/textbox/quick_start/quick_start.py", line 82, in run_textbox
best_valid_score, best_valid_result = trainer.fit(train_data, valid_data, saved=saved)
File "TextBox/textbox/trainer/trainer.py", line 339, in fit
train_loss = self._train_epoch(train_data, epoch_idx)
File "/data/home/yangzuoxi/HPCC/TextBox/textbox/trainer/trainer.py", line 183, in _train_epoch
losses = self.model(data, epoch_idx=epoch_idx)
File "anaconda3/envs/tfgpu/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
return forward_call(*input, **kwargs)
File "TextBox/textbox/model/Seq2Seq/transformerencdec.py", line 168, in forward
source_text = corpus['source_idx']
KeyError: 'source_idx'

Sorry, I didn't notice before: IMDB is a dataset for unconditional generation, and we do not support unconditional generation with Transformer.
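
That also explains the KeyError above: a seq2seq model such as TransformerEncDec reads corpus['source_idx'], but an unconditional dataset like IMDB yields batches with no source side. A minimal sketch of the mismatch (the field names other than 'source_idx' are assumptions for illustration):

# Batch from an unconditional dataset: target side only (sketch; the
# 'target_idx' field name is an assumption for illustration).
batch = {'target_idx': [[2, 17, 9, 3]]}

# What a conditional model's forward() effectively does with such a batch:
source_text = batch['source_idx']  # raises KeyError: 'source_idx'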

I think Transformer could be used on IMDB. Do you plan to make it support unconditional generation with Transformer?

Yes, we plan to. For now, we only support unconditional generation with RNN.
Thanks for your suggestion.
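
So, for anyone hitting this, the unconditional quick-start that should work under the current constraint is (assuming the same CLI flags as above):

python run_textbox.py --model=RNN --dataset=IMDB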

And, if possible, I suggest providing a table showing which models can be used with which datasets.

OK, I will provide it in the next version. Thank you!