This project scrapes the North Korean KCNA website (http://kcna.co.jp) for analysis in a friend's master's thesis. The KCNA website requires a Japanese IP address to view its content.
# Installing
Linux and Mac:
npm install
Windows:
npm install --no-bin-links
Go to the directory you extracted the project to:
cd ~/kcna_scraper
From that directory, run the index file in src with one of the modes below:
node src/index.js [dates|content|body|all|help]
Dates will go through the calendar listing and pull all available dates where content is reported to have been published
Content will go through all dates discovered and find available articles
Body will go through all available articles and parse the contents
All will run everything in order
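
The way the modes relate to each other can be sketched as follows. This is only an illustration of the ordering described above, not the actual code in src/index.js: the names `STAGES`, `stagesFor`, and `run` are hypothetical, and the stage handlers are assumed to be async functions supplied by the caller.

```javascript
// Stage names taken from the usage line; the order matters because each
// stage depends on the output of the one before it.
const STAGES = ['dates', 'content', 'body'];

// Return the ordered list of stages that a given mode should run.
function stagesFor(mode) {
  if (mode === 'all') return STAGES.slice();
  if (STAGES.includes(mode)) return [mode];
  return []; // 'help' or an unrecognized mode runs nothing
}

// Run each selected stage in order. `handlers` is a hypothetical object
// mapping a stage name to an async function that does the scraping work.
async function run(mode, handlers) {
  const ran = [];
  for (const stage of stagesFor(mode)) {
    await handlers[stage]();
    ran.push(stage);
  }
  return ran;
}
```

Under this sketch, `stagesFor('all')` yields the three stages in the order dates, content, body, while a single mode such as `body` runs only that stage and relies on results from earlier runs.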
Data is stored in the ./cache directory.
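
The file layout inside the cache directory is not documented here. As a minimal sketch, assuming one JSON file per stage (for example ./cache/dates.json), reading and writing cached results could look like this. The helper names `cacheFile`, `saveStage`, and `loadStage` are hypothetical and are not part of the project.

```javascript
const fs = require('fs');
const path = require('path');

// Assumed layout: one JSON file per stage inside the cache directory.
function cacheFile(cacheDir, stage) {
  return path.join(cacheDir, stage + '.json');
}

// Write a stage's results to disk, creating the cache directory if needed.
function saveStage(cacheDir, stage, data) {
  fs.mkdirSync(cacheDir, { recursive: true });
  fs.writeFileSync(cacheFile(cacheDir, stage), JSON.stringify(data, null, 2));
}

// Read a stage's results back, or return null if nothing is cached yet.
function loadStage(cacheDir, stage) {
  const file = cacheFile(cacheDir, stage);
  if (!fs.existsSync(file)) return null;
  return JSON.parse(fs.readFileSync(file, 'utf8'));
}
```

With a layout like this, a later stage could call `loadStage('./cache', 'dates')` to pick up where an earlier run left off instead of scraping the calendar again.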