Long-form audio speaker diarization OOM in clustering
remenberl opened this issue · 1 comment
remenberl commented
Hi,
Thanks for the recent development of long-form audio speaker diarization in NVIDIA/NeMo#7737. I recently ran it on a 4-hour-long audio file and hit an out-of-memory (OOM) error on RAM (not VRAM).
It happens after the last iteration of "Extracting embeddings for Diarization" is printed to the screen. The program was consuming more than 64 GB of memory when I saw the job get killed. For reference, the last log lines were:
[NeMo I 2023-11-19 20:54:29 clustering_diarizer:343] Extracting embeddings for Diarization
[NeMo I 2023-11-19 20:54:29 collections:445] Filtered duration for loading collection is 0.00 hours.
[NeMo I 2023-11-19 20:54:29 collections:446] Dataset loaded with 52949 items, total duration of 7.25 hours.
[NeMo I 2023-11-19 20:54:29 collections:448] # 52949 files loaded accounting to # 1 labels
My telephonic config file:
clustering:
  parameters:
    oracle_num_speakers: False
    max_num_speakers: 8
    enhanced_count_thres: 80
    max_rp_threshold: 0.25
    sparse_search_volume: 30
    maj_vote_spk_count: False
    chunk_cluster_count: 50
    embeddings_per_chunk: 10000
msdd_model:
  model_path: diar_msdd_telephonic
  parameters:
    use_speaker_model_from_ckpt: True
    infer_batch_size: 25
    sigmoid_threshold: [0.7]
    seq_eval_mode: False
    split_infer: True
    diar_window_length: 50
    overlap_infer_spk_limit: 5
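For a rough sense of scale, here is a back-of-the-envelope sketch of how much RAM a single dense pairwise affinity matrix would need. This is only arithmetic: I am assuming, without having confirmed it in the NeMo code, that some step holds a float32 N x N matrix over all embeddings at once. The helper names below are hypothetical. The two sizes come from the log (52949 items) and from `embeddings_per_chunk` in the config.

```python
# Back-of-the-envelope estimate only. Whether NeMo actually materializes a
# dense N x N matrix at any point is an assumption, not a confirmed fact.

def dense_affinity_bytes(num_embeddings: int, bytes_per_value: int = 4) -> int:
    """Bytes for one dense num_embeddings x num_embeddings matrix (float32 by default)."""
    return num_embeddings * num_embeddings * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


full = dense_affinity_bytes(52949)   # item count taken from the log
chunk = dense_affinity_bytes(10000)  # embeddings_per_chunk from the config

print(f"all items at once: {to_gib(full):.1f} GiB per matrix")
print(f"one chunk:         {to_gib(chunk):.2f} GiB per matrix")
```

Under that assumption, one matrix over all 52949 items is about 10.4 GiB, so a handful of temporary copies would be enough to pass 64 GB, while a 10000-embedding chunk stays well under 1 GiB per matrix.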
remenberl commented
Submitted to the wrong repo.