running time
tingchihc opened this issue · 7 comments
Hi,
I am now running the "Prepare the data to train the Agent" step.
Is this script supposed to take a very long time? I have already spent a whole day running this Python file:
python download_youcookii_videos.py
Thanks,
Hi, @ting-chih
Thanks for sharing this issue with us.
No, it should not take that long, since the script downloads only a subset (110 videos) of the whole YouCook2 dataset (~2000 videos), although the exact time depends on your connection speed. However, I just tried running the script myself and noticed that the videos are taking far too long to download (more than 20 minutes per video, at ~60 KiB/s). I'm not sure whether something is wrong with the YouTube API or with the server. We'll investigate and get back to you shortly.
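A quick back-of-the-envelope check, using only the figures quoted above (110 videos, at least 20 minutes per video, ~60 KiB/s), suggests why one day was not enough. This is a sketch; the function names are illustrative and the inputs are observations from a single run, so the results are lower bounds rather than guarantees.

```python
# Rough download-time estimate from the numbers reported in this thread.
# Assumes videos are downloaded one after another at a constant rate.

def total_download_hours(num_videos: int, minutes_per_video: float) -> float:
    """Sequential download time, in hours, for num_videos videos."""
    return num_videos * minutes_per_video / 60.0


def approx_video_size_mib(rate_kib_per_s: float, minutes_per_video: float) -> float:
    """Data transferred per video (MiB) at a constant rate for the given time."""
    return rate_kib_per_s * minutes_per_video * 60.0 / 1024.0


if __name__ == "__main__":
    hours = total_download_hours(110, 20)
    size = approx_video_size_mib(60, 20)
    print(f"~{hours:.1f} h for 110 videos, ~{size:.0f} MiB per video")
    # prints: ~36.7 h for 110 videos, ~70 MiB per video
```

At the observed rate, the 110-video subset needs roughly 37 hours or more, which is consistent with the download still running after a full day.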
Thank you.
Hi, @washingtonsk8
Yes, that is exactly my problem. I have been downloading since 11 a.m.
So far I only have 25 files in the training folder, and some of the YouTube links no longer work.
Hi, @ting-chih
I solved this issue temporarily by using a different downloader.
It is yt-dlp, a fork of the original youtube-dl, and it worked fine for me. To use it, I took the following steps:
- Install it: pip install yt-dlp
- Replace line 31 of the download_youcookii_videos.py script with the new system call, i.e., change os.system(' '.join(("youtube-dl -o", vid_prefix, vid_url))) to os.system(' '.join(("yt-dlp -o", vid_prefix, vid_url)))
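The same replacement can also be written with subprocess instead of os.system, which avoids shell-quoting problems and makes failed downloads easy to detect (useful given that some links no longer work). This is a minimal sketch, not the repository's code: the helper names build_download_cmd and download_video are hypothetical, and only vid_prefix, vid_url, and the "-o" call come from the steps above.

```python
import subprocess
from typing import List


def build_download_cmd(vid_prefix: str, vid_url: str, downloader: str = "yt-dlp") -> List[str]:
    """Argument list equivalent to the '<downloader> -o <vid_prefix> <vid_url>' call."""
    return [downloader, "-o", vid_prefix, vid_url]


def download_video(vid_prefix: str, vid_url: str, downloader: str = "yt-dlp") -> bool:
    """Run the downloader and return True only if it exits successfully.

    Passing a list (no shell) means paths with spaces need no extra quoting.
    """
    try:
        result = subprocess.run(build_download_cmd(vid_prefix, vid_url, downloader))
    except FileNotFoundError:
        # The downloader executable is not on PATH (e.g. yt-dlp is not installed).
        return False
    return result.returncode == 0
```

In a download loop, collecting the URLs for which download_video returns False gives a list of dead or failed links that can be retried or reported later.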
Please tell us if that works for you.
P.S.: Keep in mind that the downloaded videos may have a different extension (e.g., .webm instead of .mp4). If needed, you can convert them with a tool such as FFmpeg.
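A minimal sketch of that conversion step, assuming ffmpeg is installed and on PATH and that the videos sit in a single directory. The helper names and the choice of extensions to convert are illustrative assumptions. A plain "ffmpeg -i input.webm output.mp4" re-encodes with ffmpeg's default MP4 codecs, which can be slow for long videos.

```python
import subprocess
from pathlib import Path
from typing import List


def ffmpeg_convert_cmd(src: Path) -> List[str]:
    """ffmpeg command converting src to an .mp4 file with the same stem."""
    return ["ffmpeg", "-i", str(src), str(src.with_suffix(".mp4"))]


def convert_non_mp4(video_dir: str, extensions=(".webm", ".mkv")) -> List[Path]:
    """Convert videos in video_dir whose suffix is in extensions; return the converted sources."""
    converted = []
    for src in sorted(Path(video_dir).iterdir()):
        if src.suffix.lower() not in extensions:
            continue
        if src.with_suffix(".mp4").exists():
            # Skip files that were already converted, so ffmpeg never asks to overwrite.
            continue
        subprocess.run(ffmpeg_convert_cmd(src), check=True)  # raises if ffmpeg fails
        converted.append(src)
    return converted
```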
Hi, @washingtonsk8
Thanks.
I have a new question about line 10 of this Python file.
There is no test_list.txt in /rl_fast_forward/resources/YouCook2/splits/.
Is that intentional?
Hi, @ting-chih
Great!
Thanks for the question. We do not use a test_list.txt because, at the time of paper submission, no public test set with recipe texts was available. To get around this, we used the original validation set as our test set for reporting results. We selected the hyperparameters (learning rate, epsilon, etc.) on a small set of training videos during preliminary experiments and kept them fixed for the final experiments.
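If you want to mirror this protocol in your own scripts, one option is to fall back to the validation list whenever test_list.txt is absent. This is a hypothetical sketch, not how the repository's code is organized: only the splits directory and the missing test_list.txt come from this thread, while the file name val_list.txt, the one-entry-per-line format, and the function names are assumptions.

```python
from pathlib import Path
from typing import List

# Splits directory from this thread, assumed relative to the repository root.
SPLITS_DIR = Path("rl_fast_forward/resources/YouCook2/splits")
# Hypothetical name for the validation list; check the actual file in splits/.
VAL_LIST = "val_list.txt"


def parse_split(text: str) -> List[str]:
    """Return one entry per non-empty line of a split file (assumed format)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def evaluation_split_file(splits_dir: Path = SPLITS_DIR) -> Path:
    """Pick the split file used to report results.

    No public test set with recipe texts was available, so when
    test_list.txt is absent the validation list stands in as the test set.
    """
    test_file = splits_dir / "test_list.txt"
    return test_file if test_file.exists() else splits_dir / VAL_LIST


def load_evaluation_ids(splits_dir: Path = SPLITS_DIR) -> List[str]:
    """Read the entries of the evaluation split."""
    return parse_split(evaluation_split_file(splits_dir).read_text())
```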
I hope it is clear now :-)
Hi, @washingtonsk8
Thanks, got it.
Where should I add these two lines of code to train the model with main.py?
import nltk
nltk.download('punkt')
Hi, @ting-chih
Open a Python interpreter in your environment and run those commands first. They download the required NLTK data (the Punkt tokenizer models) to NLTK's data directory.
Then run "python main.py" to train the model.
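To avoid re-downloading on every run, the download can be made conditional on whether the resource is already installed. This is a minimal sketch with a hypothetical helper name; it relies on nltk.data.find raising LookupError for a missing resource, and the nltk_module parameter exists only so the function can be exercised without NLTK installed.

```python
def ensure_nltk_resource(resource: str = "tokenizers/punkt", package: str = "punkt", nltk_module=None) -> bool:
    """Download an NLTK data package only if it is not already installed.

    Returns True if a download was attempted, False if the resource was found.
    """
    if nltk_module is None:
        import nltk as nltk_module  # imported lazily so the helper can be tested without NLTK
    try:
        nltk_module.data.find(resource)  # raises LookupError when the resource is missing
        return False
    except LookupError:
        nltk_module.download(package)
        return True
```

Calling ensure_nltk_resource() once before "python main.py" has the same effect as the two commands above. On recent NLTK releases the tokenizers may look for punkt_tab instead; in that case ensure_nltk_resource("tokenizers/punkt_tab", "punkt_tab") may be needed as well.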
Since the "running time" issue is solved, I'll close this issue.
If you have more questions or issues to report, please open another issue so that more people can be aware of it, ok?
Thanks!