Downloading error: lexicon
When downloading the lexicon folder, which is very heavy, it reaches 100% and then fails with the error below, and nothing is actually downloaded.
Generating splits...: 100%|██████████| 1/1 [09:50<00:00, 590.06s/ splits]
2023-08-15 20:07:58.238234: W tensorflow/core/common_runtime/gpu/gpu_device.cc:1956] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
to_json_content {'shape': [None, 1, 543, 3], 'encoding_format': 'pose', 'include_path': True}
to_json_content {'shape': [None, 1, 543, 3], 'encoding_format': 'pose', 'include_path': True}
Dataset sign_suisse downloaded and prepared to /root/tensorflow_datasets/sign_suisse/2023-08-15/1.0.0. Subsequent calls will reuse this data.
0%| | 0/18222 [00:00<?, ?it/s]
0it [09:54, ?it/s]
Traceback (most recent call last):
File "/usr/local/bin/download_lexicon", line 8, in <module>
sys.exit(main())
File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 114, in main
add_data(data, args.directory)
File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 102, in add_data
writer.writerow([row[key] for key in LEXICON_INDEX])
File "/usr/local/lib/python3.8/dist-packages/spoken_to_signed/download_lexicon.py", line 102, in <listcomp>
writer.writerow([row[key] for key in LEXICON_INDEX])
KeyError: 'start'
0%| | 0/18222 [00:00<?, ?it/s]
Fixed in: d192f34#diff-a7423a03eeaaeeb2fcb6188cbb93540c7b34e8f8b2877fc9a6d17139500ce24bR71-R72
start and end were added so that we can refer to multiple signs in the same pose file. For the case of signsuisse, they are inconsequential. For other cases coming soon, they are important.
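For reference, a minimal sketch of the shape of that change (the real diff is in the commit linked above; the column list and defaults here are assumptions):

```python
# Sketch only: ensure every row carries all LEXICON_INDEX keys before writing,
# defaulting "start"/"end" for sources where a pose file holds a single sign.
LEXICON_INDEX = ["path", "spoken_language", "signed_language",
                 "start", "end", "words", "glosses"]  # assumed columns

def complete_row(row: dict) -> dict:
    row.setdefault("start", 0)  # signsuisse: the whole file is one sign
    row.setdefault("end", 0)
    return row

# writer.writerow([complete_row(row)[key] for key in LEXICON_INDEX])
```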
Thank you for bringing this up. Do you mind checking now? (it should be reasonably fast)
Thank you very much, I'll leave it running until tomorrow. The lexicon folder is heavy; I will update you tomorrow.
Looking forward to your update.
So just so it is clear: the reason the directory is very heavy is that for the purpose of this repo, we store all pose files locally (which are 30GB+ for signsuisse).
Optimizations in pose file size (of about 75%) are possible (see the sketch after this list), specifically:
- store as float16 instead of float32
- store only the relevant keypoints, since we filter them later anyway
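For illustration, a rough numpy sketch of both optimizations on a raw pose array shaped like the one in the log above, (frames, people, 543 keypoints, 3); the keypoint subset here is a placeholder, not the actual filter:

```python
import numpy as np

# Placeholder pose data: (frames, people, 543 keypoints, 3 coordinates), float32.
poses = np.zeros((100, 1, 543, 3), dtype=np.float32)

# Hypothetical subset of keypoints; the real filter depends on what is kept later.
RELEVANT_KEYPOINTS = np.arange(75)

# Keep only the relevant keypoints and cast to float16.
small = poses[:, :, RELEVANT_KEYPOINTS, :].astype(np.float16)

print(f"{poses.nbytes} bytes -> {small.nbytes} bytes")
```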
Alternatively, it is possible to point them to a cloud bucket path, so you don't store everything locally; whenever you need a file, it is downloaded automatically, as sketched below.
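A minimal sketch of that on-demand approach (the bucket URL is a placeholder, not a real endpoint):

```python
import os
import urllib.request

# Placeholder bucket location; not a real endpoint.
BUCKET_URL = "https://storage.googleapis.com/some-bucket/signsuisse"

def get_pose_path(name: str, cache_dir: str = "lexicon_cache") -> str:
    """Return a local path for a pose file, downloading it on first use."""
    local_path = os.path.join(cache_dir, name)
    if not os.path.exists(local_path):
        os.makedirs(cache_dir, exist_ok=True)
        urllib.request.urlretrieve(f"{BUCKET_URL}/{name}", local_path)
    return local_path
```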
I understand. Three questions:
1- It keeps giving me this error:
/root/tensorflow_datasets/downloads/nlp.biu.ac.il_amit_datas_poses_holis_signsrml885jM9_EQzoasyvGrp5TS0CdIaiyG_DRWgQ_2dSE.tar to root/tensorflow_datasets/downloads/extracted/TAR.nlp.biu.ac.il_amit_datas_poses_holis_signsrml885jM9_EQzoasyvGrp5TS0CdIaiyG_DRWgQ_2dSE.tar: unexpected end of data
2- What if you upload it to Google Drive? I think that since the folder is large, the server hosting it cannot send it.
3- Regarding pose-to-video: is it a pending task?
- Does the download finish, but you think the file is corrupted?
- For now the tar file will remain at https://nlp.biu.ac.il/~amit/datasets/poses/holistic/signsuisse.tar - you can try to download it separately (see the sketch after this list), or with a faster internet connection (cafe/university etc.)
- yes, but a very low-priority one
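If it helps, a minimal Python sketch for the separate download and extraction (urllib cannot resume a partial download, so a downloader with resume support may be more practical for a 30GB+ file):

```python
import tarfile
import urllib.request

URL = "https://nlp.biu.ac.il/~amit/datasets/poses/holistic/signsuisse.tar"

# Download the full tar, then extract the .pose files into the lexicon directory.
urllib.request.urlretrieve(URL, "signsuisse.tar")
with tarfile.open("signsuisse.tar") as tar:
    tar.extractall("lexicon")
```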
- Thank you very much for your contribution. It only downloads the index.csv file for me; it reaches 35% and stops.
- I will try to download from the link you gave me.
- As seen in the article, they tried to use pix2pix, but they specify neither how nor the results. Still, it's a good project.
Re-3
This is the pix2pix model used:
https://github.com/sign-language-processing/everybody-sign-now
But it is not very good.
(you can try it in https://sign.mt/?sil=ch&spl=de&text=kleine%20kinder%20essen%20pizza if on the bottom right, you choose "person" and not "skeleton")
Ideally, we would first improve the model, then include it here. I see no huge reason to do all of the implementation work if the output is this bad.
When I run download_lexicon --name signsuisse --directory lexicon:
WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001F44BEC3F50>, 'Connection to nlp.biu.ac.il timed out. (connect timeout=None)')': /~amit/datasets/poses/holistic/signsuisse.tar file [00:00, ? file/s]
When I run it in Colab:
Extraction completed...: 0 file [00:00, ? file/s]WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x7de5c006c040>, 'Connection to nlp.biu.ac.il timed out. (connect timeout=None)')': /~amit/datasets/poses/holistic/signsuisse.tar
At the moment, unfortunately, the download code only works in Israel (you can VPN there).
This is due to the university blocking any traffic coming from abroad because of the war.
Ideally, we would set up mirror servers or other options, but for now you can only VPN to Israel or wait until the university unblocks traffic.
Hello @AmitMY. I downloaded the dataset of lexicons from the link that you provided.
Shouldn't there be an index.csv in this file as well? I have the poses but I don't know how to map them. After extracting the file, I have approx. 36,000 poses named like this: "ss0000c2df785c4d4ad30cec403a503d4c.pose".
The script creates the index.csv.
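For example, a minimal sketch for reading it once it exists (column names are assumptions; check the actual header of your index.csv):

```python
import csv

# Map each index.csv entry to its .pose file.
with open("lexicon/index.csv", encoding="utf-8") as f:
    for row in csv.DictReader(f):
        # "words" and "path" are assumed column names.
        print(row.get("words"), "->", row.get("path"))
```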