This is a simple website crawler built with Node.js. It crawls a website starting from a given URL and saves the visited URLs to a file.
- Clone this repository:

  ```sh
  git clone https://github.com/shidiqmuh0/website-crawler-httrack.git
  ```

- Navigate to the project directory:

  ```sh
  cd website-crawler-httrack
  ```

- Install dependencies:

  ```sh
  npm install
  ```

- Run the crawler:

  ```sh
  node index.js
  ```
- Follow the prompt and enter the starting URL when asked.
- The crawler will then start crawling the website, and the visited URLs will be saved to a file named `data.txt` in the project directory.
Dependencies (as listed in `package.json`):

- axios: `^1.6.7`
- cheerio: `^1.0.0-rc.12`
- colors: `^1.4.0`
- fs-extra: `^11.2.0`
- moment: `^2.30.1`
- url: `^0.11.3`
- readline: `^1.4.0`