Test PTB Dependency Parsing Model
woshiyyya opened this issue · 9 comments
Hi there!
I am trying to test your pretrained dependency parsing model, but I cannot find your processed PTB dataset. Could you share a link to it?
Also, how can I run inference on my own data? For example, how can I feed in a single sentence and get its tagging result?
I have just uploaded the PTB dataset to OneDrive.
For inference, you can create a file like the one below (with dummy tags in the 7th, 8th, and 9th columns) and follow the instructions:
1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root
2\tI\t_\t_\t_\t_\t_\t0\troot\t0:root
3\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
4\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
5\tlocation\t_\t_\t_\t_\t_\t0\troot\t0:root
6\twonderful\t_\t_\t_\t_\t_\t0\troot\t0:root
7\tand\t_\t_\t_\t_\t_\t0\troot\t0:root
7.1\tfound\t_\t_\t_\t_\t_\t0\troot\t0:root
8\tthe\t_\t_\t_\t_\t_\t0\troot\t0:root
9\tneighbors\t_\t_\t_\t_\t_\t0\troot\t0:root
10\tvery\t_\t_\t_\t_\t_\t0\troot\t0:root
11\tkind\t_\t_\t_\t_\t_\t0\troot\t0:root
12\t.\t_\t_\t_\t_\t_\t0\troot\t0:root
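If you want to build such a file from raw text, here is a minimal sketch. It assumes the sentence is already tokenized and uses the standard 10-column CoNLL-U order (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC), so index 6 holds HEAD. That matches the loader line quoted later in this thread, `head_id=int(fields[6])`, rather than the exact row layout posted above, which has "_" at index 6. The helper names `to_dummy_conllu` and `write_input_file` are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List


def to_dummy_conllu(tokens: List[str]) -> str:
    """Format one pre-tokenized sentence as CoNLL-U rows with dummy parse tags."""
    rows = []
    for idx, form in enumerate(tokens, start=1):
        # Columns: ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC.
        # HEAD=0, DEPREL=root and DEPS=0:root are placeholders, not predictions.
        fields = [str(idx), form, "_", "_", "_", "_", "0", "root", "0:root", "_"]
        rows.append("\t".join(fields))
    # A blank line ends each sentence.
    return "\n".join(rows) + "\n\n"


def write_input_file(sentences: List[List[str]], path: Path) -> None:
    """Write several sentences to one file, creating the parent folder if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("".join(to_dummy_conllu(s) for s in sentences), encoding="utf-8")
```

For example, `write_input_file(["But I found the location wonderful .".split()], Path("data/train.tsv"))` writes one sentence. The sketch only emits integer IDs; it does not produce empty-node rows such as the `7.1` row in the example above.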
Hi Xinyu,
Thanks for uploading the data!
I created a folder named data and put a train.tsv file in it containing the demo case you provided.
Then I ran:
CUDA_VISIBLE_DEVICES=0 python train.py --config config/ptb_parsing_model.yaml --parse --target_dir data --keep_order
But I still got an error:
2022-09-07 02:59:16,391 Reading data from /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified
2022-09-07 02:59:16,391 Train: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
2022-09-07 02:59:16,391 Test: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/test.conllu
2022-09-07 02:59:16,391 Dev: /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/dev.conllu
Traceback (most recent call last):
File "train.py", line 85, in <module>
config = ConfigParser(config,all=args.all,zero_shot=args.zeroshot,other_shot=args.other,predict=args.predict)
File "/projects/clio1/probing/ACE/flair/config_parser.py", line 63, in __init__
self.corpus: ListCorpus=self.get_corpus
File "/projects/clio1/probing/ACE/flair/config_parser.py", line 329, in get_corpus
current_dataset=getattr(datasets,corpus)(tag_to_bioes=self.target)
File "/projects/clio1/probing/ACE/flair/datasets.py", line 360, in __init__
train = UniversalDependenciesDataset(data_folder/'train_modified.conllu', in_memory=in_memory, add_root=True)
File "/projects/clio1/probing/ACE/flair/datasets.py", line 1006, in __init__
assert path_to_conll_file.exists()
AssertionError
Do you know how to fix that?
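The assertion is raised when the loader checks that `data_folder/'train_modified.conllu'` exists, and the log shows it also expects `dev.conllu` and `test.conllu` in the same folder. Note that this folder is the flair dataset cache, not `--target_dir`. A quick check, assuming the default `~/.flair/datasets/ptb_3.3.0_modified` location shown in the log; `missing_ptb_files` is a hypothetical helper:

```python
from pathlib import Path
from typing import Dict


def missing_ptb_files(data_folder: Path) -> Dict[str, Path]:
    """Return the expected PTB split files that do not exist under data_folder."""
    # File names taken from the "Train/Test/Dev" lines of the log above.
    expected = {
        "train": data_folder / "train_modified.conllu",
        "dev": data_folder / "dev.conllu",
        "test": data_folder / "test.conllu",
    }
    return {split: path for split, path in expected.items() if not path.exists()}


if __name__ == "__main__":
    # "~" expands to the current user's home directory.
    folder = Path("~/.flair/datasets/ptb_3.3.0_modified").expanduser()
    for split, path in missing_ptb_files(folder).items():
        print(f"missing {split} file: {path}")
```

If anything is reported missing, the downloaded PTB files need to be placed there under exactly those names.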
Have you checked whether the dataset is in the correct place?
Hi Xinyu,
Is there something wrong with the provided data format?
I just found that the line token = Token(fields[1], head_id=int(fields[6])) raises ValueError: invalid literal for int() with base 10: '_'.
So my guess is (counting columns from 0):
the 0th column is the token id,
the 1st column is the token,
the 2nd to 5th columns are "_",
the 6th column is 0 (dummy tag),
the 7th column is "_",
the 8th column is "root" (dummy tag),
the 9th column is "0:root" (dummy tag).
Is that right?
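The error can be reproduced in isolation. The sketch below mimics only the quoted lookup, `int(fields[6])`, on the row posted earlier and on a row with an integer at index 6. The `head_id` function is a hypothetical stand-in for the loader, and the "fixed" row uses standard CoNLL-U order; whether the loader reads DEPREL and DEPS from indices 7 and 8 is not confirmed in this thread.

```python
def head_id(line: str) -> int:
    """Mimic the loader's head lookup: split on tabs and parse fields[6]."""
    fields = line.rstrip("\n").split("\t")
    return int(fields[6])


posted = "1\tBut\t_\t_\t_\t_\t_\t0\troot\t0:root"  # row as posted above
fixed = "1\tBut\t_\t_\t_\t_\t0\troot\t0:root\t_"   # HEAD moved to index 6

try:
    head_id(posted)
except ValueError as err:
    print(err)  # invalid literal for int() with base 10: '_'

print(head_id(fixed))  # 0
```

In the posted row, index 6 is the fifth "_", which is why `int()` fails.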
I ran into the same problem after changing the data format.
Have you resolved it?
Have you made sure the path /home/yunxuan2/.flair/datasets/ptb_3.3.0_modified/train_modified.conllu
exists? If not, you can download the data above and put it at this path.
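The next comment reports that, besides the PTB files, the `--target_dir` folder also needs dev and test files, not only `train.tsv`. A minimal sketch that fills them by copying the inference file; the names `dev.tsv` and `test.tsv` are a hypothetical guess mirroring `train.tsv`, since the thread does not say which file names the code expects, and `fill_target_dir` is not a repository function.

```python
import shutil
from pathlib import Path


def fill_target_dir(target_dir: Path, source_name: str = "train.tsv") -> None:
    """Copy the inference file so target_dir also has dev/test files.

    The dev.tsv/test.tsv names are an assumption, not confirmed by the repo.
    """
    source = target_dir / source_name
    if not source.exists():
        raise FileNotFoundError(source)
    for name in ("dev.tsv", "test.tsv"):
        dest = target_dir / name
        # Never overwrite a split file that already exists.
        if not dest.exists():
            shutil.copyfile(source, dest)
```

Copying the same sentences into every split is only meant to satisfy the loader when you care about predictions, not evaluation scores.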
Yes! I did that and it solved the problem. The target_dir also needs to contain dev/test datasets.
But now I can only parse the dataset on the CPU (very slow); running it on the GPU fails with:
Traceback (most recent call last):
File "train.py", line 378, in <module>
train_eval_result, train_loss = student.evaluate(loader,out_path=Path('outputs/train.'+'.'+tar_file_name+'.conllu'),embeddings_storage_mode="none",prediction_mode=True)
File "/DM_parser/ACE/flair/models/dependency_model.py", line 1174, in evaluate
arc_scores, rel_scores = self.forward(batch, prediction_mode=prediction_mode)
File "/DM_parser/ACE/flair/models/dependency_model.py", line 597, in forward
self.embeddings.embed(sentences,embedding_mask=self.selection)
File "/DM_parser/ACE/flair/embeddings.py", line 185, in embed
embedding.embed(sentences)
File "/DM_parser/ACE/flair/embeddings.py", line 97, in embed
self._add_embeddings_internal(sentences)
File "/DM_parser/ACE/flair/embeddings.py", line 2960, in _add_embeddings_internal
self._add_embeddings_to_sentences(sentences)
File "/DM_parser/ACE/flair/embeddings.py", line 3155, in _add_embeddings_to_sentences
sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 753, in forward
input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_roberta.py", line 68, in forward
input_ids, token_type_ids=token_type_ids, position_ids=position_ids, inputs_embeds=inputs_embeds
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/transformers/modeling_bert.py", line 178, in forward
inputs_embeds = self.word_embeddings(input_ids)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/modules/sparse.py", line 114, in forward
self.norm_type, self.scale_grad_by_freq, self.sparse)
File "/home/anaconda3/envs/ACE_parser/lib/python3.6/site-packages/torch/nn/functional.py", line 1484, in embedding
return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
RuntimeError: Expected object of device type cuda but got device type cpu for argument #1 'self' in call to _th_index_select
I tried changing
sequence_output, pooled_output, hidden_states = self.model(input_ids, attention_mask=mask, inputs_embeds = inputs_embeds)
to
sequence_output, pooled_output, hidden_states = self.model(input_ids.cuda(), attention_mask=mask.cuda(), inputs_embeds = inputs_embeds)
but I still get the same error.
T T,
You could try uncommenting these lines:
Lines 226 to 238 in 7033e91
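The referenced lines are not reproduced here, so the following only illustrates the general PyTorch rule behind the error. An embedding lookup is an `index_select` on the weight matrix, so argument #1 'self' in the message is most likely the word-embedding weight. That suggests the transformer weights stayed on the CPU while the indices were on the GPU, which would also explain why calling `.cuda()` on `input_ids` did not help: the module itself has to be moved. The sketch uses a toy `nn.Embedding` as a stand-in for the BERT/RoBERTa word embeddings; the helper names are hypothetical.

```python
import torch
import torch.nn as nn


def lookup_on_one_device(embedding: nn.Embedding, input_ids: torch.Tensor) -> torch.Tensor:
    """Run an embedding lookup with weights and indices on the same device."""
    # Prefer the GPU when one is visible, otherwise fall back to the CPU.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # nn.Module.to() moves the parameters (the weight matrix) in place.
    embedding.to(device)
    # Tensor.to() is not in place: it returns a tensor on the target device.
    return embedding(input_ids.to(device))


def parameter_device(module: nn.Module) -> torch.device:
    """Report where a module's first parameter lives, useful for debugging."""
    return next(module.parameters()).device


if __name__ == "__main__":
    emb = nn.Embedding(10, 4)
    ids = torch.tensor([[1, 2, 3]])
    out = lookup_on_one_device(emb, ids)
    print(out.shape)  # torch.Size([1, 3, 4])
    print(parameter_device(emb) == out.device)  # True
```

Printing `parameter_device(self.model)` just before the failing call is a quick way to confirm whether the model was left on the CPU.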