jackyjsy/SAM-SLR-v2

File name and directory structure for WLASL dataset preparation

enrico310786 opened this issue · 3 comments

Hi, I am interested in applying the SL-GCN algorithm to the WLASL dataset, but I have not understood which steps to take or which folder structure your code expects. First, I downloaded the WLASL dataset following the instructions at https://github.com/dxli94/WLASL. After processing it with the preprocess.py script, I have all the videos in a single folder named 'videos'. Each file name has the form unique_id.mp4. At this point I should use the demo.py script in data-prepare/wholepose/ https://github.com/jackyjsy/data-prepare/tree/89b556b0cb49a5a401ed939e3977c101df912257/wholepose, but how should I rename the files? Should the sign performed in the video appear in the file name? Moreover, should I separate the files into train, test and validation folders and apply the demo.py script to each folder separately?

Could you explain in more detail the steps to take, how to structure the directories, how to name the files, and which commands to run?

Thanks

Thanks for your interest. The WLASL dataset is hosted on YouTube, and many of the videos have been deleted, so you should have seen many error/warning messages about that when downloading the dataset. Please follow #requesting-missing--pre-processed-videos to request the preprocessed videos from the dataset authors. After that, you can use my data processing code to obtain the skeletons. For your convenience, I have uploaded the preprocessed WLASL skeletons at #data-preparation. Please check it out.

Best,
Songyao

Thank you for your answer. Yes, I noticed that some videos gave errors. In any case, I am interested in retracing the entire skeleton extraction pipeline, because I would like to apply it to a different dataset later.
How do you suggest structuring the directories and file names before applying the demo.py script? Do I have to split the videos into train, val and test? Inside each of these directories, do I have to create one subdirectory per sign, or should the sign label be included in the file name?

Thanks for your suggestions!
Enrico

Here's my pipeline to generate the skeletons for the WLASL-2000 dataset.

  1. Put all videos of the WLASL-2000 dataset in the same folder.
  2. Use data-prepare/wholepose/demo.py to extract the skeletons, saving the result for each video as a separate .npy file in an npy/ folder.
  3. Read WLASL_v0.3.json. Use the official data split to generate train_labels.csv and val_labels.csv. Each line of the csv file has the format "[video_name], [class_id]\n". Here are the Google Drive links to the csv files I generated. [link]
  4. Use SAM-SLR-v2/SL-GCN/data_gen/sign_gendata.py to generate the 27-point skeleton data (.npy) for all videos. Set --data_path to the npy/ folder from Step 2 and --label_path to the csv file from Step 3.
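For anyone reproducing Step 3 on their own copy of the data, here is a minimal Python sketch of how the label files could be generated. It is not the script used to produce the csv files linked above. It assumes that WLASL_v0.3.json is a list of gloss entries, each with an "instances" list whose items carry "video_id" and "split" fields, and that the class id is simply the index of the gloss entry in that list; check both assumptions against the linked csv files before training. The function names are made up for this example. Videos missing from the video folder (for instance, those deleted from YouTube) are skipped.

```python
import csv
import json
from pathlib import Path


def build_label_rows(entries, available_ids=None):
    """Group (video_id, class_id) pairs by split.

    Assumption: class_id is the position of the gloss entry in the JSON list.
    If available_ids is given, videos not in that set are skipped.
    """
    rows = {}
    for class_id, entry in enumerate(entries):
        for inst in entry["instances"]:
            video_id = inst["video_id"]
            if available_ids is not None and video_id not in available_ids:
                continue  # e.g. a video that could not be downloaded
            rows.setdefault(inst["split"], []).append((video_id, class_id))
    return rows


def write_label_csvs(json_path, video_dir, out_dir):
    """Write one <split>_labels.csv per split found in the JSON file."""
    entries = json.loads(Path(json_path).read_text())
    # Video names are the .mp4 file names without the extension.
    available = {p.stem for p in Path(video_dir).glob("*.mp4")}
    rows = build_label_rows(entries, available)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for split, split_rows in rows.items():
        with open(out_dir / f"{split}_labels.csv", "w", newline="") as f:
            # One "video_name,class_id" pair per line, terminated by "\n".
            csv.writer(f, lineterminator="\n").writerows(split_rows)
    return rows
```

Calling `write_label_csvs("WLASL_v0.3.json", "videos", "labels")` (hypothetical paths) would write one file per split present in the JSON, such as labels/train_labels.csv and labels/val_labels.csv. The video folder itself stays flat: the split and the sign label live only in the csv files, not in subdirectories or file names.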

Hope this answers your questions.