TiebaCrawler

A web crawler written in Python that collects information from posts on tieba.baidu.com.

Recommended dependencies

Platform    Python version
Windows     2.7.6
Linux       2.7.3

For Windows Users

  • Python 2.7.6 Downloads
  • To learn how to run a Python program on Windows, click here.

Default Settings (stored in config.py)

Setting             Value
Output encoding     UTF-8
Export directory    export/
Output format       CSV
Delimiter           Vertical bar (|)

Execution

$ python main.py [-v] [-h] url

Help

$ python main.py -h
$ python main.py --help
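
The usage line and help flags above are consistent with a parser built on Python's standard `argparse` module. The following is only a minimal sketch of such a parser, not the actual contents of main.py. The function name `build_parser` is hypothetical, and the meaning given to `-v` (verbose output) is an assumption, since this README does not describe that flag.

```python
import argparse


def build_parser():
    """Build a parser matching the usage line: main.py [-v] [-h] url."""
    # Hypothetical reconstruction; the real main.py may be organized differently.
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Crawl information from a post on tieba.baidu.com.",
    )
    # Assumption: -v enables verbose output. The README does not say what -v does.
    parser.add_argument("-v", dest="verbose", action="store_true",
                        help="verbose output (assumed meaning)")
    # The one required positional argument from the usage line.
    parser.add_argument("url", help="URL of the post to crawl")
    # argparse adds -h and --help automatically, so they need no definition here.
    return parser
```

Calling `build_parser().parse_args()` with no URL makes argparse print a usage message and exit with an error, which is the situation described under Troubleshooting.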

Troubleshooting

main.py: error: too few arguments

Solution: The program requires exactly one URL as input to start crawling. Pass the URL of the post as the last argument, as shown under Execution.

The output file looks garbled, for example it contains many question marks.

Solution: Import the CSV file and set the encoding and delimiter to match the default settings (UTF-8 and vertical bar).
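
To check that an export is intact before importing it elsewhere, the file can be read back with the same encoding and delimiter as the default settings. This is a small sketch written for Python 3 (independent of the Python 2.7 version recommended for the crawler itself). The function name `read_export` is hypothetical, and the path is whatever file the crawler wrote under export/.

```python
import csv


def read_export(path, encoding="utf-8", delimiter="|"):
    """Return the rows of an exported file as lists of strings."""
    # Decode with the documented output encoding (UTF-8).
    # newline="" lets the csv module handle line endings itself.
    with open(path, "r", encoding=encoding, newline="") as handle:
        # Split fields on the documented delimiter (vertical bar).
        return list(csv.reader(handle, delimiter=delimiter))
```

If the rows print correctly here but look wrong in another program, the problem is most likely that program's encoding or delimiter setting rather than the export itself.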

This program only works for posts from tieba.baidu.com. URLs from other websites may cause an exception.
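
One way to avoid that exception is to check the host of a URL before passing it to the crawler. The sketch below is not part of the program; `is_tieba_url` is a hypothetical helper, written for Python 3 (on Python 2.7 the `urlparse` function is imported from the `urlparse` module instead).

```python
from urllib.parse import urlparse


def is_tieba_url(url):
    """Return True if the URL uses http(s) and its host is tieba.baidu.com."""
    parsed = urlparse(url)
    # hostname is lower-cased by urlparse and is None when no host is present.
    return parsed.scheme in ("http", "https") and parsed.hostname == "tieba.baidu.com"
```

A URL written without a scheme is rejected by this check, because urlparse then treats the whole string as a path and reports no host.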