Byte Walker is a web crawler written in TypeScript. It crawls a website from a starting URL and generates a report on the pages it visited. The project is useful for learning how web crawling works and as a starting point for more complex web scraping tasks.
- Crawls a website starting from a given base URL.
- Counts and tracks the number of visits to each page within the same domain.
- Generates a sorted report of visited pages in a human-readable format.
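The features above can be illustrated with a minimal sketch. This is not the Byte Walker source: the helper names (`normalizeURL`, `recordVisit`, `sortPages`), the normalization rule, the sort order (most visits first), and the report wording are all assumptions made for illustration.

```typescript
// Hypothetical helpers; names and behavior are assumptions, not Byte Walker's API.

/** Reduce a URL to "host/path" so that trivially different forms count as one page. */
function normalizeURL(rawURL: string): string {
  const url = new URL(rawURL);
  const path = url.pathname.replace(/\/+$/, ""); // drop trailing slashes
  return `${url.hostname}${path}`;
}

/** Count one visit to pageURL if it is on the same host as baseURL. */
function recordVisit(
  pages: Map<string, number>,
  baseURL: string,
  pageURL: string
): boolean {
  if (new URL(pageURL).hostname !== new URL(baseURL).hostname) {
    return false; // link leaves the domain, so it is not counted
  }
  const key = normalizeURL(pageURL);
  pages.set(key, (pages.get(key) ?? 0) + 1);
  return true;
}

/** Return [page, count] pairs, highest count first; ties are ordered by page name. */
function sortPages(pages: Map<string, number>): [string, number][] {
  return Array.from(pages.entries()).sort(
    (a, b) => b[1] - a[1] || a[0].localeCompare(b[0])
  );
}

// Example: four discovered links, one of them on another domain.
const pages = new Map<string, number>();
const base = "http://example.com";
const links = [
  "http://example.com/",
  "http://example.com/about",
  "http://example.com/about/",
  "http://other.example.org/",
];
for (const link of links) {
  recordVisit(pages, base, link);
}
for (const [page, count] of sortPages(pages)) {
  console.log(`Found ${count} visit(s) to ${page}`);
}
```

In this sketch, `/about` and `/about/` are counted as the same page, and the link to the other domain is skipped. A real crawler would also fetch each page and extract its links, which is left out here.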
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.
You need the following software installed:
- Node.js
- npm (Node Package Manager)
Follow these steps to set up a development environment:
- Clone the repository:
  git clone https://github.com/jdrada/byte-walker.git
- Navigate to the project directory:
  cd byte-walker
- Install the dependencies:
  npm install
To use Byte Walker, run the following command with your desired base URL:
npm start <base-url>
For example:
npm start http://example.com
This starts the crawl and writes a report to the ./reports directory.
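For readers extending the project, here is a hedged sketch of how a command-line base URL could be validated before crawling starts. The function name `parseBaseURL` and its error messages are hypothetical and are not taken from the Byte Walker source. The check relies only on the standard `URL` class.

```typescript
// Hypothetical validation step; Byte Walker's actual argument handling may differ.

/** Return the base URL in normalized form, or throw if the arguments are unusable. */
function parseBaseURL(args: string[]): string {
  if (args.length !== 1) {
    throw new Error("expected exactly one argument: <base-url>");
  }
  const url = new URL(args[0]); // throws a TypeError if the string is not a URL
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    throw new Error(`unsupported protocol: ${url.protocol}`);
  }
  return url.href;
}

// In a Node.js entry point the arguments would typically come from process.argv.slice(2).
console.log(parseBaseURL(["http://example.com"]));
```

Rejecting bad input up front keeps the crawler from starting with a malformed or non-HTTP starting point.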
To delete all generated reports, run:
npm run clean-reports
Please read CONTRIBUTING.md for details on our code of conduct and the process for submitting pull requests.
- Juan Drada - Initial work - jdrada
See also the list of contributors who participated in this project.
This project is licensed under the ISC License - see the LICENSE.md file for details.
- Hat tip to anyone whose code was used