sign-language-processing/spoken-to-signed-translation

Download error: lexicon

Closed this issue · 11 comments

Mazsy commented

When downloading the lexicon folder (which is very heavy), I get this error once it reaches 100%, and nothing is downloaded:

Generating splits...: 100%|██████████| 1/1 [09:50<00:00, 590.06s/ splits]

2023-08-15 20:07:58.238234: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
to_json_content {'shape': [None, 1, 543, 3], 'encoding_format': 'pose', 'include_path': True}
to_json_content {'shape': [None, 1, 543, 3], 'encoding_format': 'pose', 'include_path': True}
Dataset sign_suisse downloaded and prepared to /root/tensorflow_datasets/sign_suisse/2023-08-15/1.0.0. Subsequent calls will reuse this data.


  0%|          | 0/18222 [00:00<?, ?it/s]
0it [09:54, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/download_lexicon", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 114, in main
    add_data(data, args.directory)
  File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 102, in add_data
    writer.writerow([row[key] for key in LEXICON_INDEX])
  File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 102, in <listcomp>
    writer.writerow([row[key] for key in LEXICON_INDEX])
KeyError: 'start'

  0%|          | 0/18222 [00:00<?, ?it/s]
AmitMY commented

Fixed in: d192f34#diff-a7423a03eeaaeeb2fcb6188cbb93540c7b34e8f8b2877fc9a6d17139500ce24bR71-R72

start and end were added so that we can refer to multiple signs in the same pose file.
For the case of SignSuisse, they are inconsequential. For other cases coming soon, they are important.
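The crash happened because the writer indexed keys that some rows don't have. A defensive sketch of the same pattern (not the actual fix in the linked commit; the column names below are hypothetical, the real LEXICON_INDEX lives in spoken_to_signed/download_lexicon.py):

```python
import csv

# Hypothetical column order, for illustration only.
LEXICON_INDEX = ["path", "spoken_language", "signed_language", "words", "start", "end"]

def write_rows(rows, fileobj):
    writer = csv.writer(fileobj)
    writer.writerow(LEXICON_INDEX)
    for row in rows:
        # row.get(...) avoids the KeyError when "start"/"end" are absent,
        # writing an empty cell instead.
        writer.writerow([row.get(key, "") for key in LEXICON_INDEX])
```

With this, datasets that lack start/end still produce a valid index row with empty cells.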

Thank you for bringing this up. Do you mind checking now? (it should be reasonably fast)

Mazsy commented

Thank you very much, I'll leave it running overnight since the lexicon folder is heavy. I will let you know tomorrow.

AmitMY commented

Looking forward to your update.

So just so it is clear: the reason the directory is very heavy is that for the purpose of this repo, we store all pose files locally (which are 30GB+ for signsuisse).

Optimizations to the pose file size (of about 75%) are possible, specifically:

  • store as float16 instead of float32
  • store only the relevant keypoints, since we filter them later anyway
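A back-of-the-envelope sketch of both savings in plain NumPy (not the pose-format API; each frame holds 543 MediaPipe Holistic keypoints with 3 coordinates, and the kept subset below is a hypothetical choice of the 33 body and 42 hand points):

```python
import numpy as np

# 100 frames, 1 person, 543 Holistic keypoints (33 pose + 468 face + 2x21 hands), xyz
frames = np.random.rand(100, 1, 543, 3).astype(np.float32)

# 1) float16 halves the storage
half = frames.astype(np.float16)

# 2) keep only the relevant keypoints: body (0-32) and hands (501-542),
#    dropping the 468 face points
relevant = np.concatenate([np.arange(33), np.arange(501, 543)])
reduced = half[:, :, relevant, :]
```

Together the two steps shrink the array to well under a quarter of its original byte size.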

Alternatively, the entries could point to a cloud bucket path, so you don't store everything locally; whenever you need a file, it is downloaded automatically.
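A minimal sketch of that lazy-loading idea (BUCKET_URL and the cache directory are made-up names, not part of this repo):

```python
import os
import urllib.request

# Hypothetical bucket and local cache location.
BUCKET_URL = "https://example-bucket.example.com/poses"
CACHE_DIR = os.path.expanduser("~/.cache/pose-files")

def get_pose_path(filename: str) -> str:
    """Return a local path for a pose file, downloading it on first use."""
    local_path = os.path.join(CACHE_DIR, filename)
    if not os.path.exists(local_path):
        os.makedirs(CACHE_DIR, exist_ok=True)
        urllib.request.urlretrieve(f"{BUCKET_URL}/{filename}", local_path)
    return local_path
```

Subsequent calls for the same file hit the cache instead of the network.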

Mazsy commented

I understand. Three questions:
1- It keeps giving me this error:

/root/tensorflow_datasets/downloads/nlp.biu.ac.il_amit_datas_poses_holis_signsrml885jM9_EQzoasyvGrp5TS0CdIaiyG_DRWgQ_2dSE.tar to root/tensorflow_datasets/downloads/extracted/TAR.nlp.biu.ac.il_amit_datas_poses_holis_signsrml885jM9_EQzoasyvGrp5TS0CdIaiyG_DRWgQ_2dSE.tar: unexpected end of data 

2- What if you upload it to Google Drive? I think that since the folder is large, the server hosting it cannot send it.
3- Regarding pose-to-video: is it a pending task?

AmitMY commented
  1. Does the download finish, but you think the file is corrupted?
  2. For now the tar file will remain at https://nlp.biu.ac.il/~amit/datasets/poses/holistic/signsuisse.tar - you can try to download it separately, or with faster internet (cafe/university etc)
  3. yes, but a very low-priority one
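If you suspect the tar was truncated mid-download (as the "unexpected end of data" message suggests), a quick check before extracting is to walk the archive and verify each member's data can be read in full. A generic sketch, not part of this repo:

```python
import tarfile

def is_complete_tar(path: str) -> bool:
    """Return True if every member of the archive can be read in full."""
    try:
        with tarfile.open(path) as tar:
            for member in tar:
                if member.isfile():
                    # A truncated archive either yields short data
                    # or raises ReadError("unexpected end of data").
                    data = tar.extractfile(member).read()
                    if len(data) != member.size:
                        return False
        return True
    except (tarfile.ReadError, EOFError):
        return False
```

If this returns False, re-download (or resume) the tar before trying to extract it.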
Mazsy commented
  1. Thank you very much for your contribution. It downloads only the index.csv file for me, reaches 35% and stops.
  2. I will try to download from the link you gave me.
  3. As seen in the article, they tried to use pix2pix, but they don't specify how, nor the results. Still, it's a good project.
AmitMY commented

Re-3
This is the pix2pix model used:
https://github.com/sign-language-processing/everybody-sign-now
But it is not very good.
(you can try it in https://sign.mt/?sil=ch&spl=de&text=kleine%20kinder%20essen%20pizza if on the bottom right, you choose "person" and not "skeleton")

Ideally, we would first improve the model, then include it here. I see no huge reason to do all of the implementation work if the output is this bad.

When I run download_lexicon --name signsuisse --directory lexicon:

WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001F44BEC3F50>, 'Connection to nlp.biu.ac.il timed out. (connect timeout=None)')': /~amit/datasets/poses/holistic/signsuisse.tar file [00:00, ? file/s]

When I run it in Colab:
Extraction completed...: 0 file [00:00, ? file/s]WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7de5c006c040>, 'Connection to nlp.biu.ac.il timed out. (connect timeout=None)')': /~amit/datasets/poses/holistic/signsuisse.tar

AmitMY commented

At the moment, unfortunately, the download code only works in Israel (you can VPN there).
This is due to the university blocking any traffic coming from abroad because of the war.

Ideally, we would set up mirror servers or other options, but for now you can only VPN to Israel or wait until the university unblocks traffic

Hello @AmitMY. I downloaded the dataset of lexicons from the link that you provided.


Shouldn't there be an index.csv in this file as well? I have the poses but I don't know how to map them. After extracting the file I have approx. 36,000 poses that are named like this: "ss0000c2df785c4d4ad30cec403a503d4c.pose".

The script creates the index.csv.
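Once index.csv exists, mapping the opaque .pose filenames back to words is a small lookup. A hedged sketch, assuming the index has "words" and "path" columns (check the actual header that download_lexicon writes):

```python
import csv

def build_word_index(index_csv_path: str) -> dict:
    """Map each word to the list of pose file paths that sign it."""
    mapping = {}
    with open(index_csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # A word may map to several pose files (regional variants etc.)
            mapping.setdefault(row["words"], []).append(row["path"])
    return mapping
```

You would then open the .pose file(s) returned for a given word instead of guessing from the hashed filenames.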