A web crawler written in Python that collects information from posts on tieba.baidu.com.
Platform | Python version |
---|---|
Windows | 2.7.6 |
Linux | 2.7.3 |
Setting | Default |
---|---|
Output encoding | UTF-8 |
Export directory | export/ |
Output format | CSV |
Delimiter | Vertical bar (`\|`) |
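Because the export uses a vertical bar rather than a comma as the delimiter, it can be read programmatically by passing that delimiter to the standard `csv` module. The following is a minimal Python 3 sketch (the crawler itself targets Python 2.7). The function name `read_export` is hypothetical, and it assumes the file was written with the default settings above.

```python
import csv
import io


def read_export(path):
    """Read a pipe-delimited, UTF-8 encoded CSV file exported by the crawler.

    Hypothetical helper (Python 3); `path` is any file the crawler wrote under export/.
    """
    # newline="" lets the csv module handle line endings itself
    with io.open(path, "r", encoding="utf-8", newline="") as f:
        # Each row comes back as a list of str fields
        return list(csv.reader(f, delimiter="|"))
```

For example, `rows = read_export("export/example.csv")`, where the file name is a placeholder for whatever file the crawler produced.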
```
$ python main.py [-v] [-h] url
$ python main.py -h
$ python main.py --help
```
Error: `main.py: error: too few arguments`
Solution: The program requires the URL of a post as a command-line argument to start crawling. Run it again with the URL appended, for example `python main.py url`.
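The error comes from the required positional `url` argument. The sketch below shows, with Python 3's `argparse`, how a parser matching the usage line `main.py [-v] [-h] url` behaves. It is not the project's actual code: `build_parser` is a hypothetical name, and treating `-v` as a verbose flag is an assumption, since this README does not say what `-v` does.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage line `main.py [-v] [-h] url`."""
    # argparse adds -h/--help automatically
    parser = argparse.ArgumentParser(prog="main.py")
    # Assumption: -v toggles verbose output; the README does not document it
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose output (assumed meaning)")
    # The single required positional argument: the URL of a post
    parser.add_argument("url", help="URL of a post on tieba.baidu.com")
    return parser
```

Calling `build_parser().parse_args([])` prints the usage line and exits with status 2. Python 2.7's `argparse` words this error as `too few arguments`, while Python 3 reports `the following arguments are required: url`.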
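The output file is garbled, for example it shows many question marks.
Solution: Import the CSV file into your spreadsheet program instead of opening it directly, and set the import options to match the default settings: UTF-8 encoding and a vertical bar (`|`) delimiter.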
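Garbled characters like these usually mean the UTF-8 file was decoded with a different encoding. As an alternative to adjusting the import options each time, the sketch below (Python 3, not part of the crawler) rewrites an export as a comma-delimited file encoded as UTF-8 with a byte order mark, which many spreadsheet programs, including Microsoft Excel, use to detect UTF-8. The function name `convert_for_spreadsheet` and the `src`/`dst` paths are hypothetical.

```python
import csv
import io


def convert_for_spreadsheet(src, dst):
    """Rewrite a pipe-delimited UTF-8 export as comma-delimited UTF-8 with a BOM.

    Hypothetical helper (Python 3); src and dst are caller-chosen paths.
    """
    with io.open(src, "r", encoding="utf-8", newline="") as fin:
        rows = list(csv.reader(fin, delimiter="|"))
    # "utf-8-sig" writes a byte order mark at the start of the file
    with io.open(dst, "w", encoding="utf-8-sig", newline="") as fout:
        # csv.writer quotes any field containing a comma, so post text stays intact
        csv.writer(fout).writerows(rows)
```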
This program only works for posts from tieba.baidu.com. URLs from other websites may cause an exception.
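Since other sites may cause an exception, a caller could check the host before starting a crawl. The following is a minimal Python 3 sketch using `urllib.parse.urlparse`. The function `is_tieba_url` is hypothetical and is not something the crawler is documented to provide. It checks only the scheme and host, not whether the path is a post.

```python
from urllib.parse import urlparse


def is_tieba_url(url):
    """Return True if `url` is an http(s) URL on tieba.baidu.com (hypothetical pre-check)."""
    parsed = urlparse(url)
    # urlparse lowercases hostname; it is None when the URL has no scheme/host
    return parsed.scheme in ("http", "https") and parsed.hostname == "tieba.baidu.com"
```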