An app for downloading images of Chinese works of art, built on top of cheerio.
Thanks to the open data provider, the National Palace Museum (國立故宮博物院).
Key Features • How To Use • Download • Credits • Related • License
- Scrapes and downloads high-resolution images from https://www.npm.gov.tw/
To clone and run this application, you'll need Git and Node.js (which comes with npm) installed on your computer. From your command line:
# Clone this repository
$ git clone git@github.com:vincent-cm/twarts.git
# Go into the repository
$ cd twarts
# Install dependencies
$ npm install
# Run the app
$ npm start
# Run Chrome without the CORS check
## Windows N
#### Right-click on the desktop and add a new shortcut
Add the target as "[PATH_TO_CHROME]\chrome.exe" --disable-web-security --disable-gpu --user-data-dir=~/chromeTemp
#### Click OK.
## Windows 10
$ "C:\Program Files (x86)\Google\Chrome\Application\chrome.exe" --disable-web-security --disable-gpu --user-data-dir=~/chromeTemp
## Linux
$ google-chrome --disable-web-security --user-data-dir="/tmp/chrome_dev_test"
## macOS
$ open -n -a /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --args --user-data-dir="/tmp/chrome_dev_test" --disable-web-security
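For scripting the launch, the per-platform commands above could be gathered into one small Node.js helper. This is only a sketch: `chromeLaunchCommand` is a hypothetical name, the paths are copied from the commands above, and passing `--user-data-dir` on every platform is an assumption of this example.

```javascript
// Hypothetical helper; executable paths and flags mirror the commands listed above.
function chromeLaunchCommand(platform, userDataDir) {
  const flags = ['--disable-web-security', `--user-data-dir=${userDataDir}`];
  switch (platform) {
    case 'win32':
      return {
        command: 'C:\\Program Files (x86)\\Google\\Chrome\\Application\\chrome.exe',
        args: [...flags, '--disable-gpu'],
      };
    case 'darwin':
      return {
        command: 'open',
        args: [
          '-n', '-a',
          '/Applications/Google Chrome.app/Contents/MacOS/Google Chrome',
          '--args', ...flags,
        ],
      };
    default:
      // Assumes google-chrome is on the PATH, as in the Linux command.
      return { command: 'google-chrome', args: flags };
  }
}

const { command, args } = chromeLaunchCommand(process.platform, '/tmp/chrome_dev_test');
console.log(command, args.join(' '));
```

The returned `command` and `args` can be passed to `child_process.spawn(command, args)` to start the browser. Use this profile only for the scraper, since web security is disabled in it.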
Note: If you're using Linux Bash for Windows, see this guide or use node
from the command prompt.
You can now go to http://localhost:4401 and click Get Scraper.
Some pages or images may fail to load and report errors; these can be ignored.
After 999 images have been downloaded, Chrome may stop downloading more items.
In this case, collect the names of the files already downloaded, run reduce.js, and put them into the array in output.txt.
The official npm.gov.tw site may add more items over time, so enjoy.
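The recovery step after the 999-download limit can be pictured as filtering out the files that are already on disk, so that a later run only requests the rest. The sketch below is one plausible reading of that step. `remainingItems` and the sample file names are hypothetical, and the actual reduce.js may work differently.

```javascript
// Hypothetical helper: the real reduce.js is not shown here and may differ.
function remainingItems(allNames, downloadedNames) {
  // Compare names case-insensitively and ignore surrounding whitespace.
  const normalize = (name) => name.trim().toLowerCase();
  const done = new Set(downloadedNames.map(normalize));
  return allNames.filter((name) => !done.has(normalize(name)));
}

// Made-up file names for illustration.
const all = ['K2A000001.jpg', 'K2A000002.jpg', 'K2A000003.jpg'];
const downloaded = ['K2A000001.jpg', ' k2a000003.JPG '];

console.log(JSON.stringify(remainingItems(all, downloaded)));
```

Using a `Set` keeps the lookup cheap even when the list of downloaded names grows into the thousands.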
This software uses the following open-source packages:
- DecisionEase (under construction): web-based data and machine learning pipeline platform for fine art
MIT
team.warekit.io · GitHub @vincent-cm · Twitter @VctSweet