Crawler for international news websites, with results saved to Excel
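- Operated through a graphical interface
- Filtering by publication time range
- Multiple search keywords
- Google machine translation of titles and body text
- Multithreaded concurrent crawling for speed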
- The Wall Street Journal: https://www.wsj.com/
- Fox News: https://www.foxnews.com/
- CNN: https://edition.cnn.com/
- BBC: https://www.bbc.com/
- Olympic World: https://www.olympic.org/
- Olympic Tokyo: https://tokyo2020.org/en/
- The Hill: https://thehill.com/
- Politico: https://www.politico.com/
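
The time-range filter, multi-keyword matching and multithreaded fetching listed above can be sketched with the standard library alone. This is a minimal illustration under assumptions, not the project's actual code: the `Article` record, `matches`, `crawl` and the caller-supplied `fetch(url)` function are hypothetical names, and keywords are assumed to be matched case-insensitively against title and body, with any single keyword being enough.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Iterable, List


@dataclass
class Article:
    """Hypothetical record for one parsed article."""
    title: str
    body: str
    published: datetime


def matches(article: Article, keywords: Iterable[str],
            start: datetime, end: datetime) -> bool:
    """True if the article was published within [start, end] (inclusive)
    and its title or body contains at least one keyword, ignoring case."""
    if not start <= article.published <= end:
        return False
    text = f"{article.title}\n{article.body}".lower()
    return any(kw.lower() in text for kw in keywords)


def crawl(urls: List[str], fetch: Callable[[str], Article],
          keywords: List[str], start: datetime, end: datetime,
          max_workers: int = 8) -> List[Article]:
    """Download and parse URLs on a thread pool, then apply the filters.

    `fetch` is a hypothetical per-URL downloader/parser supplied by the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in input order; an exception raised by any
        # fetch call is re-raised here while the results are collected.
        articles = list(pool.map(fetch, urls))
    return [a for a in articles if matches(a, keywords, start, end)]


if __name__ == "__main__":
    # Stand-in data instead of real HTTP requests.
    fake = {
        "u1": Article("Olympics open in Tokyo", "Opening ceremony", datetime(2021, 7, 23)),
        "u2": Article("Senate vote", "Budget bill passes", datetime(2021, 6, 1)),
        "u3": Article("Weather", "Sunny all week", datetime(2021, 7, 24)),
    }
    hits = crawl(list(fake), fake.__getitem__, ["tokyo", "budget"],
                 datetime(2021, 7, 1), datetime(2021, 7, 31))
    print([a.title for a in hits])  # ['Olympics open in Tokyo']
```

Threads fit this workload because most of the time is spent waiting on HTTP responses. All datetimes passed in must be consistently naive or consistently timezone-aware, since Python refuses to compare the two kinds.
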

Environment: Windows 10 + Python 3.9 + requests + newspaper3k + selenium + PySide2 + beautifulsoup4 + jsonpath

Key package versions: requests 2.25.1 + urllib3 1.25.8 + selenium 3.141.0 + chromedriver.exe 90.0.4430 + beautifulsoup4 4.9.3 + openpyxl 3.0.7 + PySide2 5.15.2 + newspaper3k 0.2.8 + jsonpath 0.82
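
Since openpyxl is the pinned Excel library, saving results could look roughly like the sketch below. The column layout in `HEADERS` and the `save_rows` and `_clean` helpers are hypothetical; only `Workbook`, `Worksheet.append`, `Workbook.save` and `ILLEGAL_CHARACTERS_RE` come from openpyxl itself.

```python
from typing import Any, Iterable, Sequence

from openpyxl import Workbook
from openpyxl.cell.cell import ILLEGAL_CHARACTERS_RE

# Hypothetical column layout; the project's real sheet may differ.
HEADERS = ("Source", "Published", "Title", "Title (translated)", "URL")


def _clean(value: Any) -> Any:
    """Strip control characters that openpyxl refuses to store in a cell."""
    if isinstance(value, str):
        return ILLEGAL_CHARACTERS_RE.sub("", value)
    return value


def save_rows(path: str, rows: Iterable[Sequence[Any]],
              headers: Sequence[str] = HEADERS) -> None:
    """Write a header row, then one row per article, to a new .xlsx file."""
    wb = Workbook()
    ws = wb.active
    ws.title = "news"
    ws.append(list(headers))
    for row in rows:
        ws.append([_clean(v) for v in row])
    wb.save(path)
```

Cleaning matters for scraped text: openpyxl raises `IllegalCharacterError` when a cell value contains control characters, which occasionally appear in article bodies.
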

ChromeDriver (Chrome browser driver) download mirror: http://npm.taobao.org/mirrors/chromedriver/
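
ChromeDriver builds are tied to a Chrome major version (the pinned 90.0.4430 driver targets Chrome 90), and a mismatch usually makes Selenium fail when it starts a session. A small pre-flight check, assuming the driver prints a line such as `ChromeDriver 90.0.4430.24 (...)` for `--version`, might look like this; `driver_major_version`, `check_driver` and the caller-supplied `chrome_major` value are hypothetical.

```python
import re
import subprocess
from typing import Optional


def driver_major_version(version_output: str) -> Optional[int]:
    """Return the major version from `chromedriver --version` output, or None."""
    match = re.search(r"ChromeDriver\s+(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def check_driver(driver_path: str, chrome_major: int) -> bool:
    """Run the driver binary and compare its major version with Chrome's.

    `chrome_major` must be supplied by the caller (for example, read from
    chrome://version); detecting it automatically is out of scope here.
    """
    result = subprocess.run(
        [driver_path, "--version"],
        capture_output=True, text=True, check=True,
    )
    return driver_major_version(result.stdout) == chrome_major
```

For example, `check_driver("chromedriver.exe", 90)` should return `True` for the 90.0.4430 driver listed above, assuming the executable sits in the working directory.
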