This is an unofficial script for downloading the Noah-Wukong Dataset; it is not the official downloader. If you use the dataset for any purpose, please follow the community and usage rules of the Noah-Wukong Dataset.
- Download the Noah-Wukong Dataset metadata and use it to replace the `wukong_release` folder
- Run the script:
$ python script.py
- Adjust the `MAX_CPU` variable in the script according to your network and hardware
- Check that the `OUT_FOLDER` value is consistent with your folder setup [the default is fine]
- Check that the `wget` command works
- Check that `python3` is installed
- Check that the `pandas` and `tqdm` Python packages are installed
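
The requirement checks above can also be run programmatically. The following is a minimal sketch, not part of the original script; the function name `missing_requirements` is hypothetical and only the standard library is used.

```python
import importlib.util
import shutil
import sys


def missing_requirements(commands=("wget",), packages=("pandas", "tqdm")):
    """Return the names of requirements that could not be found.

    Hypothetical helper: it mirrors the checklist above and is not
    taken from the download script itself.
    """
    missing = []
    # The script expects Python 3.
    if sys.version_info.major < 3:
        missing.append("python3")
    # Command-line tools must be on PATH.
    for command in commands:
        if shutil.which(command) is None:
            missing.append(command)
    # Python packages must be importable.
    for package in packages:
        if importlib.util.find_spec(package) is None:
            missing.append(package)
    return missing


if __name__ == "__main__":
    problems = missing_requirements()
    if problems:
        print("Missing: " + ", ".join(problems))
    else:
        print("All requirements found.")
```

Running it before `script.py` reports anything that still needs to be installed.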
Around 1% of the URLs were broken!
Issues are welcome: feel free to create a new one.
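
Because some URLs are broken, a downloader has to tolerate individual failures. The sketch below shows one way to do that while using the `MAX_CPU` and `OUT_FOLDER` settings. It is an illustration under assumptions, not the actual code of `script.py`: the function names `fetch` and `download_all`, the default values, the use of threads, and the specific `wget` options are all chosen here for the example.

```python
import subprocess
from multiprocessing.pool import ThreadPool
from pathlib import Path

MAX_CPU = 4            # assumed value; tune to network and hardware
OUT_FOLDER = "images"  # assumed value; must match your folder setup


def fetch(task):
    """Download one (url, filename) pair with wget; return (url, ok)."""
    url, filename = task
    out_dir = Path(OUT_FOLDER)
    out_dir.mkdir(parents=True, exist_ok=True)
    target = out_dir / filename
    # -q: quiet, -T: timeout in seconds, -t: number of tries, -O: output file
    result = subprocess.run(
        ["wget", "-q", "-T", "10", "-t", "2", "-O", str(target), url],
        check=False,
    )
    ok = result.returncode == 0
    if not ok:
        # wget -O can leave an empty file behind when the download fails.
        target.unlink(missing_ok=True)
    return url, ok


def download_all(tasks, worker=fetch, max_workers=MAX_CPU):
    """Run the worker over all tasks in parallel; return the failed URLs."""
    with ThreadPool(max_workers) as pool:
        results = pool.map(worker, tasks)
    return [url for url, ok in results if not ok]
```

Calling `download_all(tasks)` with a list of `(url, filename)` pairs returns the URLs that could not be fetched, so broken links are recorded instead of stopping the run.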