aimclub/AutoTM

KeyError: ('_SERVICE_', 'total_pairs_count')

Closed this issue · 3 comments

Lameus commented

I got this error during Stage 1 of full_pipeline_example.py while analyzing a custom dataset with the columns [level_0, index, idxs, peer_id, message_date, message, text]. The text is in Russian and already preprocessed. The pipeline creates "ppp.csv" but not "dataset_processed.csv".

Lameus commented

More detailed error log:
Stage 1: Dataset preparation
Saved to ../data/processed_sample_corpora/ppp.csv
Starting...
part 1/1
batches ../data/processed_sample_corpora/batches
vocabulary ../data/processed_sample_corpora/test_set_data_voc.txt
are ready
E0614 11:48:22.641057 13605 dictionary_operations.cc:381] Error at line 1, file ../data/processed_sample_corpora/test_set_data_voc.txt. Expected format: [<class_id>], dictionary will be gathered in random token order
multiprocessing.pool.RemoteTraceback:
"""
Traceback (most recent call last):
  File "./miniconda3/envs/name/lib/python3.9/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
  File "./miniconda3/envs/name/lib/python3.9/multiprocessing/pool.py", line 48, in mapstar
    return list(map(*args))
  File "./AutoTM/src/autotm/preprocessing/dictionaries_preparation.py", line 81, in _calculate_cooc_tf_dict
    cooc_tf_dict[RESERVED_TUPLE] += 2
KeyError: ('_SERVICE_', 'total_pairs_count')
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "./AutoTM/src/full_pipeline_example.py", line 48, in <module>
    prepare_all_artifacts(SAVE_PATH)
  File "./AutoTM/src/autotm/preprocessing/dictionaries_preparation.py", line 324, in prepare_all_artifacts
    prepearing_cooc_dict(
  File "./AutoTM/src/autotm/preprocessing/dictionaries_preparation.py", line 194, in prepearing_cooc_dict
    df_dicts, tf_dicts = calculate_cooc_dicts(data, n_cores=n_cores, window=cooc_window)
  File "./AutoTM/src/autotm/preprocessing/dictionaries_preparation.py", line 114, in calculate_cooc_dicts
    cooc_tf_dict = parallelize_dataframe(
  File "./AutoTM/src/autotm/utils.py", line 89, in parallelize_dataframe
    map_res = pool.map(func_with_args, df_split)
  File "./miniconda3/envs/name/lib/python3.9/multiprocessing/pool.py", line 364, in map
    return self._map_async(func, iterable, mapstar, chunksize).get()
  File "./miniconda3/envs/name/lib/python3.9/multiprocessing/pool.py", line 771, in get
    raise self._value
KeyError: ('_SERVICE_', 'total_pairs_count')

ngc436 commented

Fixed the problem by correcting the dictionary update in the _calculate_cooc_tf_dict function.
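
For readers who hit the same traceback: the sketch below is not the actual AutoTM patch, only an illustration of this class of bug and one common way to avoid it. It assumes the counter was a plain dict in which the reserved key had never been initialised before `+= 2` ran. The function names `count_pairs_buggy` and `count_pairs_fixed` and the windowed pair-counting logic are hypothetical; only `RESERVED_TUPLE`, its value, and the `+= 2` update are taken from the traceback above.

```python
from collections import defaultdict

# Key value copied from the KeyError message; in AutoTM it is a module constant.
RESERVED_TUPLE = ("_SERVICE_", "total_pairs_count")


def count_pairs_buggy(tokens, window=2):
    """Hypothetical reproduction: `+=` on a key that was never initialised."""
    cooc_tf_dict = {}
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pair = tuple(sorted((left, right)))
            cooc_tf_dict[pair] = cooc_tf_dict.get(pair, 0) + 1
            # Raises KeyError on the first pair, as in the traceback.
            cooc_tf_dict[RESERVED_TUPLE] += 2
    return cooc_tf_dict


def count_pairs_fixed(tokens, window=2):
    """Hypothetical correction: missing keys start at 0 via defaultdict(int)."""
    cooc_tf_dict = defaultdict(int)
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pair = tuple(sorted((left, right)))
            cooc_tf_dict[pair] += 1
            cooc_tf_dict[RESERVED_TUPLE] += 2
    # Convert back to a plain dict so later lookups of absent keys still fail loudly.
    return dict(cooc_tf_dict)


if __name__ == "__main__":
    print(count_pairs_fixed(["a", "b", "c"], window=1))
```

An equivalent alternative is to keep the plain dict and write `cooc_tf_dict[RESERVED_TUPLE] = cooc_tf_dict.get(RESERVED_TUPLE, 0) + 2`. Either way, the reserved key no longer has to exist before the first update.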

ngc436 commented

Closing the issue