Question about the rwth_phoenix dataset you have
ShesterG opened this issue · 4 comments
Hey @AmitMY
- Are the videos passed in the form of (i) consecutive image frames (.jpeg) or (ii) as a video itself (.mp4/.avi)? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py. I think it's (i), but I would like you to confirm.
- Is there a particular reason why there is the `[:-7]` at the end? https://github.com/sign-language-processing/datasets/blob/9daa7d94088f9af702dafd37[…]uage_datasets/datasets/rwth_phoenix2014_t/rwth_phoenix2014_t.py
- I couldn't find a notebook where the dataset you load is then fed into training a Sign Language Translation model (e.g. https://github.com/neccam/slt, or any sign translation model at all). Can you share one?
THANK YOU SO MUCH
- If you set `load_video=True`, you get a string with the path to the video. If you set `process_video=True`, you get a tensor of the image data (frames * width * height * 3). (See the loading sketch after this list.)
- That is a string for a file path. As it is, it has 7 characters that are useless for us, so we remove them before reading the directory. (See the small illustration after this list.)
- Here you would find models using the datasets, for example.
(Additionally, note that the repository you linked does not get frames, it gets a vector representing these images. If you want to work with Phoenix (also not recommended), you should use whatever preprocessing they do there.)
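For reference, a minimal loading sketch. I'm going from memory of the library README, so treat the exact flag names (`include_video` / `process_video` on `SignDatasetConfig`) and the `"video"` feature key as assumptions to double-check rather than the definitive API:

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # noqa: F401 -- registers the datasets with tfds
from sign_language_datasets.datasets.config import SignDatasetConfig

# Assumption: include_video=True with process_video=False yields the frame-directory
# path as a string; process_video=True yields a decoded uint8 tensor of shape
# (frames, height, width, 3) instead.
config = SignDatasetConfig(name="videos", include_video=True, process_video=False)

phoenix = tfds.load("rwth_phoenix2014_t", builder_kwargs=dict(config=config))

for example in phoenix["train"].take(1):
    print(example["video"])  # path string with this config; a frame tensor if process_video=True
```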
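As for the `[:-7]`, a toy illustration. The path below is hypothetical (the real value comes from the Phoenix annotation file), but the idea is that the stored string ends in a 7-character glob-like suffix that is not part of the frame directory, so it is cut off before the directory is read:

```python
# Hypothetical annotation-style entry; the exact suffix in the real corpus is an assumption here.
raw = "fullFrame-210x260px/train/01April_2010_Thursday_heute-1/1/*.png"

frames_dir = raw[:-7]  # drop the last 7 characters ("1/*.png") before listing the directory
print(frames_dir)      # fullFrame-210x260px/train/01April_2010_Thursday_heute-1/
```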
Copying some of my comments from Slack here as well, for posterity:
- There is no notebook that shows how to feed data into https://github.com/neccam/slt. For this particular model, the paper and its author (Cihan Camgöz) do not give enough information to explain how the data is processed. Some people have tried to reproduce this model, but as far as I know, nobody has succeeded.
- Here are a couple of our own repos that use the datasets library: https://github.com/bricksdont/easier-gloss-translation, see e.g. https://github.com/bricksdont/easier-gloss-translation/blob/main/scripts/download/extract_uhh.py#L175. This is for gloss translation models.
- Here is an example of loading the DGS corpus for a model that uses videos (see the sketch after this list): https://gist.github.com/bricksdont/f950a8319ab662dfea08d25b261c3cff
- A model that does sign language segmentation (= not translation): https://github.com/bricksdont/sign-segmentation
- Sign language detection: https://github.com/google-research/google-research/tree/master/sign_language_detection
(none of these models use the Phoenix dataset, because, as Amit mentioned, we don't recommend it)
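And the same loading pattern applied to the DGS corpus, as a rough sketch of what the gist above does in full; the config options are again assumptions based on the library README, not a drop-in replacement for the gist:

```python
import tensorflow_datasets as tfds
import sign_language_datasets.datasets  # noqa: F401 -- registers the datasets with tfds
from sign_language_datasets.datasets.config import SignDatasetConfig

# Assumption: include_video=True downloads/loads the corpus videos for a video-based model.
config = SignDatasetConfig(name="videos", include_video=True)
dgs_corpus = tfds.load("dgs_corpus", builder_kwargs=dict(config=config))
```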
@ShesterG did we answer your questions for now / can this issue be closed?
sure.