Crawl literature from NCBI, Nature, Science, Cell, CKB, OncoKB, and similar sites using Scrapy + Selenium + ChromeDriver + Chrome
docker build -t scrapy_paper:1.0.4 .
docker-compose -f docker-compose.yaml up -d
- Note: the images above are built for Linux, so when building and running them on Windows you must use WSL to provide a Linux environment:
docker build -t scrapy_paper:1.0.4 .
docker-compose -f docker-compose.yaml up -d
- Deploying to a Scrapyd server to control the spiders
Add a ScrapydWeb Docker server built from its own Dockerfile (web_Dockerfile)
docker build -t scrapy_deploy:1.0.0 .
docker pull chinaclark1203/scrapydweb:latest
docker-compose -f docker-compose.yaml up -d
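By default, the deploy service exposes port 6800 and ScrapydWeb exposes port 5000. Open port 5000 to reach the spider management page and operate the spiders from the web interface.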
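To confirm that the service on port 6800 is reachable before opening the web UI, a small Python check can help. This is a minimal sketch, assuming the deploy container runs Scrapyd on `localhost:6800` and that the project is named `mySpider`; the helper names `scrapyd_url` and `fetch_json` are hypothetical. `daemonstatus.json` and `listspiders.json` are standard Scrapyd JSON endpoints.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: Scrapyd is published on the default port of the local host.
SCRAPYD_BASE = "http://localhost:6800"


def scrapyd_url(endpoint, base=SCRAPYD_BASE, **params):
    """Build the URL of a Scrapyd JSON endpoint, e.g. daemonstatus.json."""
    url = f"{base.rstrip('/')}/{endpoint}"
    if params:
        url += "?" + urlencode(params)
    return url


def fetch_json(url, timeout=5):
    """Send a GET request and decode the JSON response body."""
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)
```

With the containers running, `fetch_json(scrapyd_url("daemonstatus.json"))` should return the daemon status, and `fetch_json(scrapyd_url("listspiders.json", project="mySpider"))` should list the spiders of that project.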
Add a spider for the OncoKB database
docker build -t scrapy_deploy:1.0.1 .
# Change the scrapy_deploy image version in docker-compose.yaml to 1.0.1
docker-compose -f docker-compose.yaml up -d
# Warning: do not add print statements to the settings file; otherwise the response of /api/listspiders/mySpider/ will also include the printed output.
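If diagnostic output from the settings file is still needed, one option is to send it to stderr through `logging` instead of `print`. This is a minimal sketch, assuming the spider list is collected from the standard output of the settings-loading process, which would explain why printed text leaks into it. The logger name and the `BOT_NAME` value are illustrative assumptions.

```python
import logging
import sys

# Hypothetical logger name; any name works.
logger = logging.getLogger("settings")

# Write to stderr explicitly so nothing reaches stdout.
handler = logging.StreamHandler(sys.stderr)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)
# Do not forward records to the root logger, whose handlers may differ.
logger.propagate = False

BOT_NAME = "mySpider"  # assumption: project name taken from the API path above

# Use this in place of print(...) inside the settings file.
logger.info("loaded settings for %s", BOT_NAME)
```

Because the message goes to stderr, a tool that only parses stdout for spider names should not pick it up.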