Question about the rwth_phoenix dataset you have
ShesterG opened this issue · 4 comments
Hey @AmitMY
- Are the videos passed in the form of (i) consecutive image frames (.jpeg) or (ii) as a video itself (.mp4/.avi)? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py. I think it's (i), but I would like you to confirm.
- Is there a particular reason why there is the `[:-7]` at the end? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py
- I couldn't find a notebook where the dataset you load is then fed into training a Sign Language Translation model (e.g. https://github.com/neccam/slt, or any sign translation model at all). Can you share one?
THANK YOU SO MUCH
- If you set `load_video=True`, you get a string with the path to the video. If you set `process_video=True`, you get a tensor of the image data (frames * width * height * 3). (See the loading sketch after this list.)
- That is a string for a file path. As it is, it has 7 characters that are useless for us, so we remove them before reading the directory. (See the small illustration after this list.)
- Here you would find models using the datasets, for example.
(Additionally, note that the repository you linked does not get frames, it gets a vector representing these images. If you want to work with Phoenix (also not recommended), you should use whatever preprocessing they do there.)
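For reference, a minimal loading sketch. I'm going from memory of the library README, so treat the exact flag names (`include_video` / `process_video` on `SignDatasetConfig`) and the `"video"` feature key as assumptions to double-check rather than the definitive API:

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # noqa: F401 -- registers the datasets with tfds
from sign_language_datasets.datasets.config import SignDatasetConfig

# Assumption: include_video=True with process_video=False yields the frame-directory
# path as a string; process_video=True yields a decoded uint8 tensor of shape
# (frames, height, width, 3) instead.
config = SignDatasetConfig(name="videos", include_video=True, process_video=False)

phoenix = tfds.load("rwth_phoenix2014_t", builder_kwargs=dict(config=config))

for example in phoenix["train"].take(1):
    print(example["video"])  # path string with this config; a frame tensor if process_video=True
```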
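As for the `[:-7]`, a toy illustration. The path below is hypothetical (the real value comes from the Phoenix annotation file), but the idea is that the stored string ends in a 7-character glob-like suffix that is not part of the frame directory, so it is cut off before the directory is read:

```python
# Hypothetical annotation-style entry; the exact suffix in the real corpus is an assumption here.
raw = "fullFrame-210x260px/train/01April_2010_Thursday_heute-1/1/*.png"

frames_dir = raw[:-7]  # drop the last 7 characters ("1/*.png") before listing the directory
print(frames_dir)      # fullFrame-210x260px/train/01April_2010_Thursday_heute-1/
```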
Copying some of my comments from Slack here as well, for posterity:
- There is no notebook that shows how to feed data into https://github.com/neccam/slt. For this particular model, the paper and its author (Cihan Camgöz) do not give enough information to explain how the data is processed. Some people have tried to reproduce this model, but as far as I know, nobody has succeeded.
- Here are a couple of our own repos that use the datasets library: https://github.com/bricksdont/easier-gloss-translation, see e.g. https://github.com/bricksdont/easier-gloss-translation/blob/main/scripts/download/extract_uhh.py#L175. This is for gloss translation models.
- Here is an example of loading the DGS corpus for a model that uses videos (see the sketch after this list): https://gist.github.com/bricksdont/f950a8319ab662dfea08d25b261c3cff
- A model that does sign language segmentation (= not translation): https://github.com/bricksdont/sign-segmentation
- Sign language detection: https://github.com/google-research/google-research/tree/master/sign_language_detection
(none of these models use the Phoenix dataset, because, as Amit mentioned, we don't recommend it)
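And the same loading pattern applied to the DGS corpus, as a rough sketch of what the gist above does in full; the config options are again assumptions based on the library README, not a drop-in replacement for the gist:

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # noqa: F401 -- registers the datasets with tfds
from sign_language_datasets.datasets.config import SignDatasetConfig

# Assumption: include_video=True downloads/loads the corpus videos for a video-based model.
config = SignDatasetConfig(name="videos", include_video=True)
dgs_corpus = tfds.load("dgs_corpus", builder_kwargs=dict(config=config))
```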
@ShesterG did we answer your questions for now / can this issue be closed?
sure.