Developed in cooperation with NTHU CTM (Institute of Technology Management, 科技管理研究所) to collect news from the top three news companies in Taiwan:
- 自由時報
- 中國時報
- 聯合新聞網
This is a news crawler for three mainstream Taiwanese media outlets. The crawled outlets are listed below.
Media Type | Media Name (CN) | Media Name (EN) | ID | Abbreviation |
---|---|---|---|---|
Print Media | 自由時報 | Liberty News | 0 | ltn |
Print Media | 聯合報 | UDN News | 1 | udn |
Print Media | 中國時報 | China Times | 2 | chinatimes |
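
As a small illustration, the table above can be mirrored as a lookup in Python, for example to label rows when results from several crawlers are combined. The names `MEDIA` and `abbreviation_for` are hypothetical and are not part of model.py; the values are copied from the table.

```python
# Mirrors the media table above. MEDIA and abbreviation_for are
# illustrative names only; they are not defined in model.py.
MEDIA = {
    "ltn": {"id": 0, "name_cn": "自由時報", "name_en": "Liberty News"},
    "udn": {"id": 1, "name_cn": "聯合報", "name_en": "UDN News"},
    "chinatimes": {"id": 2, "name_cn": "中國時報", "name_en": "China Times"},
}


def abbreviation_for(media_id: int) -> str:
    """Return the abbreviation that the table assigns to a numeric media ID."""
    for abbr, info in MEDIA.items():
        if info["id"] == media_id:
            return abbr
    raise KeyError(f"unknown media ID: {media_id}")
```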
Import Model
from model import chinatimes_crawler, ltn_crawler, udn_crawler
model.py provides three crawlers, one for each news company's website:
- chinatimes_crawler
- ltn_crawler
- udn_crawler
Constructor
## Build udn news crawler 聯合報
crawler1 = udn_crawler()
## Build ltn news crawler 自由時報
crawler2 = ltn_crawler()
## Build chinatimes news crawler 中時
crawler3 = chinatimes_crawler()
Crawling
Use search(keywords, pages_num, CSV) to retrieve a dataframe containing the news found within the first pages_num pages of search results. If CSV is set to True (default: False), the search result is saved automatically as a CSV file in a folder called search_result.
## Search keyword: covid
## Search the first 100 pages of results
## Save the results as CSV files in the search_result folder (e.g. udn_covid_100.csv)
crawler1.search('covid', 100, CSV=True)
crawler2.search('covid', 100, CSV=True)
crawler3.search('covid', 100, CSV=True)
## Search keyword: covid
## Search the first 100 pages of results
## Do not save the results as .csv files
crawler1.search('covid', 100, CSV=False)
crawler2.search('covid', 100, CSV=False)
crawler3.search('covid', 100, CSV=False)
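
When CSV=True is used, the saved file can be read back later without rerunning the crawl. The following is a minimal sketch using only the standard library. The helpers `expected_csv_path` and `load_results` are hypothetical names, the file-name pattern is inferred from the single example udn_covid_100.csv, and UTF-8 encoding is an assumption about how the file is written.

```python
import csv
from pathlib import Path


def expected_csv_path(abbr, keyword, pages, folder="search_result"):
    """Build the path where a saved search result is expected to be.

    Assumption: files are named <abbreviation>_<keyword>_<pages>.csv,
    inferred from the one documented example, udn_covid_100.csv.
    """
    return Path(folder) / f"{abbr}_{keyword}_{pages}.csv"


def load_results(path):
    """Read a saved result file into a list of dicts keyed by column name.

    Assumption: the file is UTF-8; "utf-8-sig" also accepts a leading BOM.
    """
    with open(path, newline="", encoding="utf-8-sig") as f:
        return list(csv.DictReader(f))
```

For example, `load_results(expected_csv_path("udn", "covid", 100))` would read the file produced by the first call above, provided the naming assumption holds.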
Please refer to Demo.ipynb for a demonstration. A sample row from a search result:
TITLE | TIME | CATEGORY | DESCRIPTION | CONTENT | KEYWORDS | FROM | LINK |
---|---|---|---|---|---|---|---|
北市房仲傳染給新北客戶Ct值19.4 急匡列40人足跡公布 | 2022-01-25 15:23:00 | udn | 國內疫情持續延燒,北市日前出現不明感染源,北市男房仲(案18375)曾帶一名客戶(案1851... | 國內疫情持續延燒,北市日前出現不明感染源,北市男房仲(案18375)曾帶一名客戶(案1851... | 新冠肺炎,COVID-19 | 聯合新聞網 | https://udn.com/news/story/120940/6059544 |
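
The TIME and KEYWORDS columns in the sample row are plain strings, so a little post-processing is often convenient. Below is a minimal sketch that parses them with the standard library. It assumes every row uses the `YYYY-MM-DD HH:MM:SS` timestamp format and comma-separated keywords, as in the sample row; `parse_row` is a hypothetical helper, not part of model.py.

```python
from datetime import datetime


def parse_row(row):
    """Return a copy of row with TIME as a datetime and KEYWORDS as a list.

    Assumes the formats seen in the sample row: a timestamp such as
    '2022-01-25 15:23:00' and keywords separated by commas.
    """
    parsed = dict(row)
    parsed["TIME"] = datetime.strptime(row["TIME"], "%Y-%m-%d %H:%M:%S")
    parsed["KEYWORDS"] = [k.strip() for k in row["KEYWORDS"].split(",") if k.strip()]
    return parsed


# Values copied from the sample row above.
sample = {
    "TIME": "2022-01-25 15:23:00",
    "KEYWORDS": "新冠肺炎,COVID-19",
    "FROM": "聯合新聞網",
    "LINK": "https://udn.com/news/story/120940/6059544",
}
parsed = parse_row(sample)
print(parsed["TIME"].year, parsed["KEYWORDS"])  # 2022 ['新冠肺炎', 'COVID-19']
```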