bilibili space video crawler
usage (Windows only for now):
general concept of the command line:
python bilibili.py (mode options) (keywords options) parameters
the concept of the mode options:
1) uod : url->output->download
2) uo : url->output (default)
3) od : output->download
command line format:
python bilibili.py [-[u][o][d]] [-k:file] parameter 1 parameter 2 .... parameter n
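To make the command line format above concrete, here is a minimal sketch of how such arguments could be split into mode, keyword file and parameters. It is not the parsing code of bilibili.py; the function name parse_args and the choice to return None when no -k: option is given are assumptions made for illustration.

```python
def parse_args(argv):
    """Split argv into (mode, keyword_file, parameters).

    Assumptions (not taken from bilibili.py): keyword_file is None when
    no -k: option is given, and "" when a bare "-k:" is given.
    """
    mode = "uo"           # -uo is the documented default mode
    keyword_file = None
    parameters = []
    for arg in argv:
        if arg.startswith("-k:"):
            keyword_file = arg[3:]
        elif arg in ("-uod", "-uo", "-od"):
            mode = arg[1:]
        else:
            # anything else is treated as a url or a csv file name
            parameters.append(arg)
    return mode, keyword_file, parameters


print(parse_args(["-od", "-k:keywords.txt", "30652169.csv", "33432429.csv"]))
```

Keeping the three modes as plain strings keeps the sketch short; a real implementation may validate the parameters against the selected mode as well.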
When you use -uo or -uod, a folder and a csv file named after the bilibili "mid" are created. The csv file lets you view the list of videos and continue downloading later if needed (e.g. when a download failed). You can combine multiple csv files generated by bilibili.py and download them together. Multiple folders will be created if the combined file contains multiple "mid" values.
bilibili.py starts 20 hidden python subprocesses in the background to download the videos. This can consume most of your network bandwidth (usually above 80% - 90%). The number 20 is hardcoded; you can change it in the source code.
When a download error occurs, a csv file is created in the same folder as bilibili.py so that you can re-run the download (see example 3).
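As an illustration of running downloads with a fixed limit of 20 subprocesses, here is a minimal sketch using a thread pool to launch the processes. This is one possible approach and may differ from what bilibili.py actually does; the names run_all and MAX_WORKERS are hypothetical, and the demo uses harmless placeholder commands instead of real downloads.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

MAX_WORKERS = 20  # same limit as the hardcoded value described above


def run_all(commands, max_workers=MAX_WORKERS):
    """Run each command as a subprocess, at most max_workers at a time.

    Returns the exit codes in the same order as commands.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


# Demo with placeholder commands (assumption: real commands would start
# the per-video download process instead).
commands = [[sys.executable, "-c", "pass"] for _ in range(5)]
print(run_all(commands))
```

A non-zero exit code in the returned list marks a failed task, which is the kind of information needed to write a retry list.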
examples:
- download all videos from https://space.bilibili.com/30652169/video
python bilibili.py -uod https://space.bilibili.com/30652169/video
and you can apply the keywords option to filter the video titles:
python bilibili.py -uod -k:keywords.txt https://space.bilibili.com/30652169/video
- get all video information output to a file from https://space.bilibili.com/30652169/video
python bilibili.py -uo https://space.bilibili.com/30652169/video
or
python bilibili.py https://space.bilibili.com/30652169/video
since -uo is the default mode.
and you can apply the keywords option to filter the video titles:
python bilibili.py -k:keywords.txt https://space.bilibili.com/30652169/video
- download videos in the specified video information file
python bilibili.py -od 30652169.csv 33432429.csv
and you can apply the keywords option to filter the video titles:
python bilibili.py -od -k:keywords.txt 30652169.csv 33432429.csv
when processing download tasks, each video download is put into a separate process by executing the following command line:
python download url, title, index
the url and title are base64 urlsafe encoded, with the trailing "="s removed
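The encoding described above (urlsafe base64 with the trailing "=" padding removed) can be sketched as follows. The helper names encode_arg and decode_arg are hypothetical and UTF-8 is assumed as the text encoding; the decoder restores the padding before decoding, because the standard library expects it.

```python
import base64


def encode_arg(text):
    """Urlsafe base64 encode text and strip the trailing '=' padding."""
    raw = base64.urlsafe_b64encode(text.encode("utf-8")).decode("ascii")
    return raw.rstrip("=")


def decode_arg(token):
    """Restore the stripped padding, then decode back to text."""
    padded = token + "=" * (-len(token) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


url = "https://space.bilibili.com/30652169/video"
token = encode_arg(url)
print(token)
print(decode_arg(token) == url)
```

Encoding the url and title this way keeps them free of spaces, quotes and other characters that are awkward to pass on a command line.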
acceptable bilibili url : https://space.bilibili.com/30652169/video
the number 30652169 is called the mid and will be used as a folder name to store all related files
both http and https are accepted
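A minimal sketch of pulling the mid out of a url of the form shown above. The pattern and the function name extract_mid are assumptions for illustration; bilibili.py may accept more url variants than this strict form.

```python
import re

# Matches http or https space urls ending in /video and captures the mid.
SPACE_URL = re.compile(r"^https?://space\.bilibili\.com/(\d+)/video$")


def extract_mid(url):
    """Return the mid as a string, or None if the url does not match."""
    match = SPACE_URL.match(url.strip())
    return match.group(1) if match else None


print(extract_mid("https://space.bilibili.com/30652169/video"))
```

The mid is kept as a string because it is used as a folder and file name, not as a number.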
notice: by default, if you do not specify a keyword file, keyword.txt will be used. if you want to ignore the keywords, use "-k:", which triggers a warning but has no other impact and disables the keywords filter.
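The rule in the notice above can be sketched as below. The mapping in resolve_keyword_file follows the notice; the matching rule in filter_titles (keep a title if it contains at least one keyword) is only an assumption, since the exact filter behaviour is not described here. Both function names are hypothetical.

```python
DEFAULT_KEYWORD_FILE = "keyword.txt"  # default named in the notice above


def resolve_keyword_file(option):
    """Map the raw -k: value to a file name, or None to disable filtering.

    option is None when no -k: option was given and "" for a bare "-k:".
    """
    if option is None:
        return DEFAULT_KEYWORD_FILE
    if option == "":
        return None  # filter disabled
    return option


def filter_titles(titles, keywords):
    """Assumed rule: keep titles that contain at least one keyword."""
    if not keywords:
        return list(titles)
    return [t for t in titles if any(k in t for k in keywords)]


print(resolve_keyword_file(None))
print(filter_titles(["python tutorial 1", "vlog"], ["python"]))
```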