relatio-nlp/relatio

Error in using build narrative model

Closed this issue · 3 comments

Error when running build_narrative_model as specified in the tutorial.

The code used is similar to the tutorial and works up to downloading the embeddings model:

# Build narrative model
# (imports added for completeness; spacy_stopwords is assumed to be
# spaCy's English stop-word set, as in the tutorial)
import pickle

from spacy.lang.en.stop_words import STOP_WORDS as spacy_stopwords
from relatio.wrappers import build_narrative_model

if run_narrative:
    with open('labour_contracts/data/srl.pkl', 'rb') as f:
        file_load = pickle.load(f)

    srl_res = file_load['srl']
    sentences = file_load['sentences']

    print(srl_res[0])
    print(sentences[0])
    print(len(sentences), len(srl_res))

    narrative_model = build_narrative_model(
        srl_res=srl_res,
        sentences=sentences,
        embeddings_type="gensim_keyed_vectors",
        embeddings_path="glove-wiki-gigaword-100",
        n_clusters=[[100]],
        top_n_entities=100,
        stop_words=spacy_stopwords,
        remove_n_letter_words=1,
        progress_bar=True,
    )

    print(narrative_model['entities'].most_common()[:20])

    with open('labour_contracts/data/model.pkl', 'wb') as sent_file:
        pickle.dump({'model': narrative_model}, sent_file,
                    protocol=pickle.HIGHEST_PROTOCOL)

Full output trace:

length of srl :10573 length of sentences :10573
Processing SRL...
100%|██████████| 10573/10573 [00:00<00:00, 497496.90it/s]
Cleaning SRL...
100%|██████████| 10573/10573 [00:00<00:00, 59586.84it/s]
Computing role frequencies...
100%|██████████| 10573/10573 [00:00<00:00, 1281909.47it/s]
Mining named entities...
100%|██████████| 10573/10573 [00:31<00:00, 331.83it/s]
Mapping named entities...
100%|██████████| 10573/10573 [00:00<00:00, 1617182.42it/s]
Loading embeddings model...
[==================================================] 100.0% 128.1/128.1MB downloaded

Traceback (most recent call last):
  File "labour_contracts/run_relatio.py", line 83, in <module>
    narrative_model = build_narrative_model(
  File "/cluster/work/lawecon/Projects/Siddhant_Ray/Scratch-LawEcon/venv/lib64/python3.8/site-packages/relatio/wrappers.py", line 340, in build_narrative_model
    vecs = get_vectors(postproc_roles, model, used_roles=roles)
  File "/cluster/work/lawecon/Projects/Siddhant_Ray/Scratch-LawEcon/venv/lib64/python3.8/site-packages/relatio/clustering.py", line 202, in get_vectors
    vecs = np.concatenate(vecs)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
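For context, this error is exactly what happens if no vector gets collected at all: if every extracted role maps only to out-of-vocabulary tokens, the list handed to np.concatenate is empty. A minimal sketch of that failure mode:

```python
import numpy as np

# If no role yields an in-vocabulary token, no vector is ever collected,
# so the list passed to np.concatenate is empty and NumPy raises.
vecs = []
try:
    np.concatenate(vecs)
except ValueError as err:
    print(err)  # need at least one array to concatenate
```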

Any ideas on why this may be the case? It seems all the returned vectors are empty even though the SRL works. Could it have something to do with the quality of the SRL?

It could be that labor contracts use specific language (with words not part of the vocabulary of "glove-wiki-gigaword-100").
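One quick way to test this hypothesis is to measure vocabulary coverage before building the model. A minimal sketch, where the `check_coverage` helper and the toy vocabulary are illustrative (not part of relatio; with gensim >= 4 you could pass the model's `key_to_index` mapping as the vocabulary):

```python
def check_coverage(tokens, vocab):
    """Return the fraction of tokens found in the embedding vocabulary,
    plus the list of out-of-vocabulary tokens."""
    oov = [t for t in tokens if t not in vocab]
    coverage = 1 - len(oov) / len(tokens) if tokens else 0.0
    return coverage, oov

# Toy vocabulary standing in for the GloVe model's word list.
vocab = {"employee", "contract", "wage"}
tokens = ["employee", "remuneration", "wage", "forthwith"]

coverage, oov = check_coverage(tokens, vocab)
print(coverage)  # 0.5
print(oov)       # ['remuneration', 'forthwith']
```

If coverage on the role tokens is near zero, that would explain why no vectors survive to the clustering step.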

Did you try to use Universal Sentence Encoders instead? They will always provide you with an embedding.

You could also use Phrase-Bert or a spaCy model on the dev branch. It works, but it's still in development.

I see, thanks for the tip. I'll try with USE first.

Okay, the issue was what you mentioned: words not being part of the vocabulary. The error was on my side, though. I had accidentally loaded the ids instead of the sentences, which I had saved after SRL. Closing this issue now.