A crawler based on scrapy-redis. It can help you fetch novels from other sites and convert txt to epub.
All settings are in NovelBK/settings.py:
REDIS_HOST: Redis server IP
REDIS_PORT: Redis server port
REDIS_DATA_DICT: the Redis key used to filter URLs that have already been seen
DB_NAME: MongoDB database name
DB_HOST: MongoDB server IP
DB_PORT: MongoDB server port
DB_USER: MongoDB user name
DB_PWD: MongoDB password
WENKU8_MAX_AID: the maximum book id that will be crawled
DEBUG_MOD: debug mode; when enabled, the same URL may be fetched more than once
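For reference, a minimal sketch of what these settings might look like in NovelBK/settings.py. The values below are illustrative placeholders, not the project's actual defaults; the credentials are read from the environment as described in the setup steps below.

```python
# NovelBK/settings.py -- illustrative sketch, values are placeholders
import os

REDIS_HOST = "127.0.0.1"       # Redis server IP
REDIS_PORT = 6379              # Redis server port
REDIS_DATA_DICT = "seen_urls"  # Redis key used to filter already-seen URLs

DB_NAME = "novelbk"            # MongoDB database name
DB_HOST = "127.0.0.1"          # MongoDB server IP
DB_PORT = 27017                # MongoDB server port
DB_USER = os.environ.get("DB_USER")  # credentials come from the environment
DB_PWD = os.environ.get("DB_PWD")

WENKU8_MAX_AID = 3000          # max book id to crawl (placeholder)
DEBUG_MOD = False              # when True, the same URL may be fetched again
```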
If you want to run Scrapy with Python 3, it must run inside a virtualenv:
sudo apt install python3-venv
python3 -m venv ~/venv/NovelBK
source ~/venv/NovelBK/bin/activate
pip install -r requirements.txt
Set the sensitive information as environment variables. You can export them again after every reboot, or simply add the exports to your .bashrc:
export DB_USER=<your mongodb user name>
export DB_PWD=<your mongodb password>
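As a sanity check before starting the spiders, you can verify that the variables are actually visible to the process (a small hypothetical helper, not part of the project):

```python
import os

# fail fast if the MongoDB credentials were not exported
for var in ("DB_USER", "DB_PWD"):
    if not os.environ.get(var):
        raise SystemExit("missing environment variable: " + var)
print("MongoDB credentials found")
```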
Run the Redis server:
docker-compose -f redis-compose.yml up -d
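Once the container is up, a quick connectivity check with redis-py (assuming localhost and the default port, matching the placeholder settings above):

```python
import redis

# ping the Redis server started by docker-compose
r = redis.Redis(host="127.0.0.1", port=6379)
print(r.ping())  # True if the server is reachable
```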
Run the MongoDB server:
docker-compose -f mongo-compose.yml up -d
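And the same kind of check for MongoDB with pymongo, using the credentials exported earlier (host and port are assumptions):

```python
import os
from pymongo import MongoClient

# connect with the credentials from the environment
client = MongoClient(
    host="127.0.0.1",
    port=27017,
    username=os.environ["DB_USER"],
    password=os.environ["DB_PWD"],
)
print(client.server_info()["version"])  # prints the server version on success
```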
Run the wenku8 slave; it crawls the book content:
scrapy runspider NovelBK/spiders/slave_wenku8.py
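The slave follows the usual scrapy-redis pattern: a RedisSpider that blocks on a Redis list and crawls whatever URLs the master pushes into it. A minimal sketch of that pattern (the class name, redis_key, and CSS selector here are illustrative, not what slave_wenku8.py actually uses):

```python
from scrapy_redis.spiders import RedisSpider

class Wenku8SlaveSketch(RedisSpider):
    """Illustrative scrapy-redis slave: consumes start URLs from Redis."""
    name = "wenku8_slave_sketch"
    redis_key = "wenku8:start_urls"  # list the master pushes URLs into

    def parse(self, response):
        # extract the page text; the selector is a placeholder
        yield {
            "url": response.url,
            "content": "".join(response.css("#content ::text").getall()),
        }
```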
Run the master; it feeds URLs to the slave spiders:
scrapy runspider NovelBK/spiders/master.py
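Conceptually, the master just seeds the Redis list that the slaves listen on, one URL per book id up to WENKU8_MAX_AID. A standalone sketch of that idea with redis-py (the key name and URL pattern are assumptions, not taken from master.py):

```python
import redis

r = redis.Redis(host="127.0.0.1", port=6379)

MAX_AID = 3000  # placeholder for WENKU8_MAX_AID
for aid in range(1, MAX_AID + 1):
    # URL pattern is illustrative; adjust to the real site layout
    r.lpush("wenku8:start_urls", "https://www.wenku8.net/book/%d.htm" % aid)
```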