piotrkawa/deepfake-whisper-features

When I run the program, it shows an error like this

python train_and_test.py --asv_path ./datasets/ASVspoof2021_DF_eval/ --in_the_wild_path ./datasets/release_in_the_wild --config configs/finetuning/whisper_frontend_mesonet.yaml --batch_size 8 --epochs 5  --train_amount 100000 --valid_amount 25000

/home/khalid/anaconda3/lib/python3.11/site-packages/torchaudio/functional/functional.py:584: UserWarning: At least one mel filterbank has all zero values. The value for `n_mels` (128) may be set too high. Or, the value for `n_freqs` (257) may be set too low.
  warnings.warn(
2023-11-28 16:41:07,747 - INFO - Loading data...
Traceback (most recent call last):
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/train_and_test.py", line 125, in <module>
    evaluation_config_path, model_path = train_models.train_nn(
                                         ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/train_models.py", line 65, in train_nn
    data_train, data_test = get_datasets(
                            ^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/train_models.py", line 31, in get_datasets
    data_train = DetectionDataset(
                 ^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/src/datasets/detection_dataset.py", line 38, in __init__
    datasets = self._init_datasets(
               ^^^^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/src/datasets/detection_dataset.py", line 70, in _init_datasets
    asvspoof_dataset = DeepFakeASVSpoofDataset(asvspoof_path, subset=subset)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/src/datasets/deepfake_asvspoof_dataset.py", line 29, in __init__
    self.samples = self.read_protocol()
                   ^^^^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/src/datasets/deepfake_asvspoof_dataset.py", line 67, in read_protocol
    samples = self.add_line_to_samples(samples, line)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/khalid/Desktop/temp/deepfake-whisper-features/src/datasets/deepfake_asvspoof_dataset.py", line 81, in add_line_to_samples
    sample_path = self.flac_paths[sample_name]
                  ~~~~~~~~~~~~~~~^^^^^^^^^^^^^
KeyError: 'DF_E_2015779'

But when I search for this file in the dataset, the file exists:

find ./datasets/ASVspoof2021_DF_eval/flac -name 'DF_E_2015779.flac' -type f
./datasets/ASVspoof2021_DF_eval/flac/DF_E_2015779.flac

Hi!
Do you have all parts of the DF eval subset, and are you using this keys & metadata file?

Yes, I downloaded all parts of the dataset (around 60k+ files), and I also downloaded the keys & metadata file.

To make this clearer, here are the contents of the ASVspoof2021_DF_eval directory:

ls -al
total 32388
drwxrwxr-x 4 khalid khalid     4096 Nov 28 17:34 .
drwxrwxr-x 4 khalid khalid     4096 Nov 28 14:41 ..
-rw-r--r-- 1 khalid khalid  7953777 May 27  2021 ASVspoof2021.DF.cm.eval.trl.txt
drwxrwxr-x 2 khalid khalid 25182208 Nov 28 17:34 flac
drwxr-xr-x 3 khalid khalid     4096 Dec  1  2021 keys
-rw-r--r-- 1 khalid khalid     2374 May 28  2021 LICENSE.DF.txt
-rw-r--r-- 1 khalid khalid     2227 May 28  2021 README.DF.txt
-rw-rw-r-- 1 khalid khalid      410 Nov 28 15:10 remove_those_which_not_exist.py

remove_those_which_not_exist.py is a Python script I wrote to see which files are actually missing:

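import os


def is_existing_file(file_path):
    # True only when the path points to a regular file (not a directory).
    return os.path.isfile(file_path)


# Read the protocol; run this from inside ASVspoof2021_DF_eval.
with open('keys/CM/trial_metadata.txt', "r") as protocol:
    lines = protocol.readlines()

# Keep the text that follows the first "- " separator on each line.
names = [line.split('- ')[1].strip() for line in lines]

# Report which of those names have a matching .flac file.
for name in names:
    if not is_existing_file("flac/" + name + ".flac"):
        print(name)
    else:
        print("OK: " + name)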
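
A variant of the check above that may be easier to read on a large protocol: it indexes the flac directory once and returns only the missing IDs. This is a minimal sketch, not code from the repository. The function name `missing_ids` and the `id_column` parameter are hypothetical, and it assumes the utterance ID is the second whitespace-separated field of each protocol line, so adjust the column if your file differs. The demo uses made-up IDs in a temporary directory.

```python
import tempfile
from pathlib import Path


def missing_ids(protocol_lines, flac_dir, id_column=1):
    """Return the protocol IDs that have no matching .flac file in flac_dir."""
    # Index the directory once instead of touching the disk for every line.
    present = {p.stem for p in Path(flac_dir).glob("*.flac")}
    missing = []
    for line in protocol_lines:
        fields = line.split()
        if len(fields) <= id_column:
            continue  # skip blank or short lines
        # ASSUMPTION: the utterance ID sits in this whitespace-separated column.
        if fields[id_column] not in present:
            missing.append(fields[id_column])
    return missing


# Self-contained demo on a throwaway directory with made-up IDs.
with tempfile.TemporaryDirectory() as tmp:
    flac_dir = Path(tmp) / "flac"
    flac_dir.mkdir()
    (flac_dir / "DF_E_0000001.flac").touch()
    demo_lines = ["spk1 DF_E_0000001 x\n", "spk2 DF_E_0000002 x\n"]
    demo_missing = missing_ids(demo_lines, flac_dir)
    print(demo_missing)
```

On the real data you would pass the lines of keys/CM/trial_metadata.txt and the flac directory. An empty result means every listed ID has a file in that one directory, which says nothing about whether the training script looks in that directory.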

I recreated the dataset preparation from scratch. Please check that you follow every step exactly; the structure below works.

  1. Download all tar files from here (I downloaded them earlier, but I checked and the md5sums match).
  2. Download the keys and metadata - https://www.asvspoof.org/resources/DF-keys-stage-1.tar.gz
  3. Untar all archives and move them to a single directory, e.g. DF.
  4. Move the keys dir from 2) to the DF dir.
  5. The final dataset structure should look as follows:
/path/to/DF# tree -L 3
.
├── ASVspoof2021_DF_eval_part00
│   └── ASVspoof2021_DF_eval
│       ├── ASVspoof2021.DF.cm.eval.trl.txt
│       ├── LICENSE.DF.txt
│       ├── README.DF.txt
│       └── flac
├── ASVspoof2021_DF_eval_part01
│   └── ASVspoof2021_DF_eval
│       └── flac
├── ASVspoof2021_DF_eval_part02
│   └── ASVspoof2021_DF_eval
│       └── flac
├── ASVspoof2021_DF_eval_part03
│   └── ASVspoof2021_DF_eval
│       └── flac
└── keys
    └── CM
        └── trial_metadata.txt
  6. Run the script, providing the ASV path like this: --asv_path /path/to/DF
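
To see why a file can exist on disk and still raise a KeyError, here is a minimal sketch. It assumes the loader finds audio through the per-part directories shown in the tree above. That is a guess from the layout, not something confirmed from the repository code, and `PART_PATTERN`, `index_flac_parts`, and `make_file` are hypothetical names. Under that assumption, a flat flac directory produces an empty index, while the part layout resolves the ID.

```python
import tempfile
from pathlib import Path

# ASSUMPTION: hypothetical pattern mirroring the tree above, not taken from the repository.
PART_PATTERN = "ASVspoof2021_DF_eval_part*/ASVspoof2021_DF_eval/flac/*.flac"


def index_flac_parts(root):
    """Map each file stem (e.g. 'DF_E_2015779') to its path under the part directories."""
    return {p.stem: p for p in Path(root).glob(PART_PATTERN)}


def make_file(path):
    """Create an empty file, including any missing parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.touch()


with tempfile.TemporaryDirectory() as tmp:
    # Flat layout: every file directly under <root>/flac.
    flat_root = Path(tmp) / "flat"
    make_file(flat_root / "flac" / "DF_E_2015779.flac")

    # Part layout: the structure from the tree listing.
    parts_root = Path(tmp) / "DF"
    make_file(
        parts_root / "ASVspoof2021_DF_eval_part00" / "ASVspoof2021_DF_eval"
        / "flac" / "DF_E_2015779.flac"
    )

    flat_index = index_flac_parts(flat_root)
    parts_index = index_flac_parts(parts_root)
    print("DF_E_2015779" in flat_index)   # file exists, but the pattern never sees it
    print("DF_E_2015779" in parts_index)  # found through the part directory
```

Running `index_flac_parts` on your own --asv_path and checking the size of the result is a quick way to tell whether the directory layout, and not the download, is the problem.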