sign-language-processing/datasets

Question about the RWTH-PHOENIX dataset you have

ShesterG opened this issue · 4 comments

Hey @AmitMY

  1. Are the videos passed as (i) consecutive image frames (.jpeg) or (ii) as video files (.mp4/.avi)? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py. I think it's (i), but could you confirm?

  2. Is there a particular reason for the `[:-7]` at the end? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py

  3. I couldn't find a notebook where a dataset loaded this way is fed into training a Sign Language Translation model (e.g. https://github.com/neccam/slt, or any sign translation model at all). Can you share one?

THANK YOU SO MUCH

  1. If you set load_video=True, you get a string with the video path. If you set process_video=True, you get a tensor of the image data (frames * width * height * 3); see the sketch after this list.
  2. That is a string from the file's path. As it is, the last 7 characters are useless for us, so we remove them before reading the directory (also illustrated below).
  3. Here you will find models using the datasets, for example.
    (Additionally, note that the repository you linked does not take frames; it takes a vector representing those images. If you want to work with Phoenix (also not recommended), you should use whatever preprocessing they do there.)
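
To make point 1 concrete, here is a rough sketch (not code from this repository) of loading the dataset with the two flags. It assumes the TFDS-based loader and `SignDatasetConfig` from this repo, with the parameter names `include_video` / `process_video` as used in the README (corresponding to the flags described above); the exact config arguments, feature keys, and tensor layout should be checked against the current API before running.

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # noqa: F401 -- registers the datasets with TFDS
from sign_language_datasets.datasets.config import SignDatasetConfig

# Video kept, but not processed: each example carries the video as a path string.
paths_config = SignDatasetConfig(name="only-paths", include_video=True, process_video=False)

# Video kept and processed: each example carries the decoded frames tensor
# (frames * width * height * 3, as described above).
frames_config = SignDatasetConfig(name="with-frames", include_video=True, process_video=True)

phoenix = tfds.load("rwth_phoenix2014_t", builder_kwargs={"config": frames_config})

for datum in phoenix["train"].take(1):
    # Feature keys here are illustrative -- inspect datum.keys() for the actual schema.
    print(datum["gloss"], datum["text"])
    print(datum["video"].shape)
```

For point 2, a purely hypothetical illustration of the `[:-7]` slice; the path below is invented for illustration, the real values come from the Phoenix annotation files.

```python
# Hypothetical annotation entry: a glob-style frames pattern.
video_field = "01April_2010_Thursday_heute-1/1/*.png"
frames_dir = video_field[:-7]  # drops the trailing "1/*.png" (7 characters)
print(frames_dir)              # -> "01April_2010_Thursday_heute-1/"
```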

Copying some of my comments from Slack here as well, for posterity:

(none of these models use the Phoenix dataset, because, as Amit mentioned, we don't recommend it)

@ShesterG did we answer your questions for now / can this issue be closed?

sure.