aricle-crawling

Primary language: Java. License: Apache License 2.0 (Apache-2.0).

🍵 Scheduled crawling of NetEase News with Java 🍵
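The title mentions scheduled crawling, but the README does not show how the schedule is configured. The test log suggests a Spring Boot application, which may use Spring's own scheduling support. The following is only a minimal JDK-only sketch built on `ScheduledExecutorService`; `ScheduledCrawlSketch`, `crawlOnce` and `runSchedule` are hypothetical names, and `crawlOnce` is a placeholder for the actual crawl.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ScheduledCrawlSketch {

    // Hypothetical crawl job; a real crawler would fetch and parse NetEase pages here.
    static void crawlOnce(int runNumber) {
        System.out.println("crawl run #" + runNumber);
    }

    /** Runs crawlOnce() `times` times, `periodMillis` apart, then stops the scheduler. */
    static int runSchedule(int times, long periodMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(times);
        scheduler.scheduleAtFixedRate(() -> {
            if (done.getCount() == 0) {
                return; // target reached; ignore any extra tick before shutdown
            }
            try {
                crawlOnce(runs.incrementAndGet());
            } catch (RuntimeException e) {
                // An exception escaping a periodic task cancels all later runs,
                // so report it and let the schedule continue.
                System.err.println("crawl failed: " + e.getMessage());
            } finally {
                done.countDown();
            }
        }, 0, periodMillis, TimeUnit.MILLISECONDS);
        try {
            done.await(); // wait until the requested number of runs has happened
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            scheduler.shutdownNow();
        }
        return runs.get();
    }

    public static void main(String[] args) {
        // A real deployment would use a much longer period (minutes or hours).
        System.out.println("total runs: " + runSchedule(3, 100));
    }
}
```

The `try`/`catch` inside the task is deliberate: with `scheduleAtFixedRate`, an exception that escapes one execution suppresses every later execution.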

🥦 Crawling example: NetEase News 🥦

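Note: crawling the full content of NetEase News takes two passes. The first pass collects the article links; the second pass crawls each of those links again to get the author, publish time and image. External resource links imported by NetEase cannot be processed, so only limited information is crawled for them.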
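A minimal sketch of the two-pass idea, assuming simplified markup: the first pass extracts (title, link) pairs from a list page, and the second pass requests each link again to read the author, time and image. The HTML patterns, the `https://www.163.com/` prefix check used to separate NetEase pages from external ones, and the names `TwoPassCrawlSketch`, `ArticleInfo` and `crawl` are all hypothetical; the project's `HtmlParseUtil` works on the real NetEase markup, which is not shown here. Fetching is passed in as a function so the example runs offline (the record needs Java 16+).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoPassCrawlSketch {

    /** Result of both passes; fields the second pass could not fill stay null. */
    record ArticleInfo(String title, String url, String author, String time, String image) {}

    // Hypothetical markup patterns; real pages differ, and a real crawler
    // would use an HTML parser rather than regular expressions.
    private static final Pattern LINK = Pattern.compile("<a href=\"([^\"]+)\">([^<]+)</a>");
    private static final Pattern AUTHOR = Pattern.compile("<span class=\"author\">([^<]+)</span>");
    private static final Pattern TIME = Pattern.compile("<span class=\"time\">([^<]+)</span>");
    private static final Pattern IMAGE = Pattern.compile("<img src=\"([^\"]+)\"");

    private static String firstGroup(Pattern p, String html) {
        Matcher m = p.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    /** First pass over the list page, then a second request per internal article link. */
    static List<ArticleInfo> crawl(String listPageUrl, Function<String, String> fetch) {
        List<ArticleInfo> result = new ArrayList<>();
        Matcher links = LINK.matcher(fetch.apply(listPageUrl));
        while (links.find()) {
            String url = links.group(1);
            String title = links.group(2);
            if (!url.startsWith("https://www.163.com/")) {
                // Assumed "external resource": keep only title and URL, no second pass.
                result.add(new ArticleInfo(title, url, null, null, null));
                continue;
            }
            String detail = fetch.apply(url); // second pass: fetch the article page itself
            result.add(new ArticleInfo(title, url,
                    firstGroup(AUTHOR, detail), firstGroup(TIME, detail), firstGroup(IMAGE, detail)));
        }
        return result;
    }

    public static void main(String[] args) {
        // In-memory stand-in for HTTP so the sketch runs without network access.
        Map<String, String> pages = new LinkedHashMap<>();
        pages.put("list", "<a href=\"https://www.163.com/news/article/A.html\">Title A</a>"
                + "<a href=\"https://example.com/ext.html\">External B</a>");
        pages.put("https://www.163.com/news/article/A.html",
                "<span class=\"author\">Author A</span><span class=\"time\">2022-09-28 11:17:12</span>"
                + "<img src=\"https://example.com/a.jpg\">");
        crawl("list", pages::get).forEach(System.out::println);
    }
}
```

In this sketch the second pass costs one extra request per internal article, so a list page with N internal entries leads to N + 1 fetches.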

Example of a successful crawl

2022-09-28 20:07:33.491  INFO 5704 --- [main] c.demo.article.ArticleApplicationTests   : Started ArticleApplicationTests in 3.206 seconds (JVM running for 4.396)
2022-09-28 20:07:38.309  INFO 5704 --- [main] com.demo.article.utils.HtmlParseUtil     : 文章Article(pkId=null, articleName=乌鲁木齐航空机长基里尔尔:**是我的“第一个家”, articleAuthor=人民网-人民视频, gmtCreate=2022-09-28T11:17:12, articleUrl=https://www.163.com/news/article/HIBLUP8A000189FH.html, articleShowPic=https://static.ws.126.net/163/f2e/product/post_nodejs/static/logo.png)
2022-09-28 20:07:38.309  INFO 5704 --- [main] com.demo.article.utils.HtmlParseUtil     : 文章Article(pkId=null, articleName=精神文明建设:为中华民族伟大复兴注入不竭动力, articleAuthor=光明网, gmtCreate=2022-09-28T12:07:41, articleUrl=https://www.163.com/news/article/HIBOR6P3000189FH.html, articleShowPic=https://nimg.ws.126.net/?url=http%3A%2F%2Fcms-bucket.ws.126.net%2F2022%2F0928%2F054a535bj00riwjgg000sc000b4007ec.jpg&thumbnail=660x2147483647&quality=80&type=jpg)
2022-09-28 20:07:38.309  INFO 5704 --- [main] com.demo.article.utils.HtmlParseUtil     : 文章Article(pkId=null, articleName=外媒看**人太空漫步名场面:“创造历史”, articleAuthor=海外网 , gmtCreate=2022-09-27T22:17:13, articleUrl=https://www.163.com/dy/article/HIA9AIH90514R9L4.html, articleShowPic=https://nimg.ws.126.net/?url=http%3A%2F%2Fdingyue.ws.126.net%2F2022%2F0927%2Ff386d2e0j00riv993000qc000dc007ig.jpg&thumbnail=660x2147483647&quality=80&type=jpg)
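Each log line prints an `Article` in the `Article(field=value, ...)` form that Lombok's `@Data`/`@ToString` generates. A plain-Java sketch of such an entity, without Lombok, could look like this. The field names come from the log; the field types (`Long` for `pkId`, `LocalDateTime` for `gmtCreate`) and the `yyyy-MM-dd HH:mm:ss` format assumed for the page's publish time are guesses, not taken from the project.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class Article {

    // Assumed format of the publish-time text on an article page.
    private static final DateTimeFormatter PAGE_TIME =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    private Long pkId; // assumed type; stays null until the database assigns a key
    private final String articleName;
    private final String articleAuthor;
    private final LocalDateTime gmtCreate;
    private final String articleUrl;
    private final String articleShowPic;

    public Article(String articleName, String articleAuthor, String publishTime,
                   String articleUrl, String articleShowPic) {
        this.articleName = articleName;
        this.articleAuthor = articleAuthor;
        this.gmtCreate = LocalDateTime.parse(publishTime.trim(), PAGE_TIME);
        this.articleUrl = articleUrl;
        this.articleShowPic = articleShowPic;
    }

    public Long getPkId() { return pkId; }
    public void setPkId(Long pkId) { this.pkId = pkId; }
    public LocalDateTime getGmtCreate() { return gmtCreate; }

    // Same shape as the log lines: ClassName(field=value, ...).
    @Override
    public String toString() {
        return "Article(pkId=" + pkId + ", articleName=" + articleName
                + ", articleAuthor=" + articleAuthor + ", gmtCreate=" + gmtCreate
                + ", articleUrl=" + articleUrl + ", articleShowPic=" + articleShowPic + ")";
    }

    public static void main(String[] args) {
        Article a = new Article("Example title", "Example author", "2022-09-28 11:17:12",
                "https://www.163.com/news/article/HIBLUP8A000189FH.html",
                "https://example.com/pic.jpg");
        System.out.println(a);
    }
}
```

`LocalDateTime.toString()` is what yields the ISO form `2022-09-28T11:17:12` seen in the log; note that it omits the seconds field when it is zero.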

🥬 Project directory structure 🥬

🌰 Database table structure 🌰