Kankuro-Crawler

One of my unfinished projects :)


Kankuro-Crawler is a web crawler developed in Node.js that navigates websites and collects their most important features (requests, screenshots, frames, WebSockets, files...). The purpose of this project is to serve as a crawler template for different kinds of purposes. It is built using chrome-remote-interface to drive the browser.

Demo

Table of contents

  • Demo
  • Usage
  • Crawler's Most Important Features
  • Example Webs Usage
  • FAQ
  • Authors

Usage

This section is an introduction to the usage of the tool, a short "Getting Started" guide.

> git clone https://github.com/AlbertoFDR/Crawler.git
> cd Crawler/
> chmod +x install.sh
> ./install.sh
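
Since the crawler drives the browser through the Chrome DevTools Protocol, a Chrome instance with remote debugging enabled must be reachable before crawling. If the install script does not launch one for you, a typical invocation looks like this (the binary name and port are assumptions and may differ per platform):

> google-chrome --headless --remote-debugging-port=9222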

Crawler's Most Important Features

  • Screenshot: Takes a screenshot of the page.
  • Frames: Gathers the frames in the page.
  • Requests: Collects the requests made by the page (a minimal sketch of this is shown after this list).
  • Interstitials: Captures any interstitial page.
  • Dialogs: Captures JavaScript-created dialogs.
  • WebSockets: Gathers the WebSocket communication, if any.
  • Files: Downloads the files of the page (HTML, JS, CSS, Wasm...).
  • Windows: Captures any window the page opens.
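
As a reference for how these features map onto chrome-remote-interface, here is a minimal sketch that logs requests, saves a screenshot, and lists the frame tree. It is not the project's actual code, and it assumes a Chrome instance is already listening on the default debugging port (9222):

  const fs = require('fs');
  const CDP = require('chrome-remote-interface');

  CDP(async (client) => {
    const { Network, Page } = client;

    // Log every request the page makes.
    Network.requestWillBeSent(({ request }) => console.log(request.method, request.url));

    await Network.enable();
    await Page.enable();
    await Page.navigate({ url: 'https://example.com' });
    await Page.loadEventFired();

    // Save a screenshot once the page has loaded.
    const { data } = await Page.captureScreenshot({ format: 'png' });
    fs.writeFileSync('screenshot.png', Buffer.from(data, 'base64'));

    // Print the URL of every frame in the page.
    const walk = (node) => {
      console.log(node.frame.url);
      (node.childFrames || []).forEach(walk);
    };
    const { frameTree } = await Page.getFrameTree();
    walk(frameTree);

    await client.close();
  }).on('error', console.error);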

Example Webs Usage

  • WebSockets: There are two versions, one with plaintext communication and one with Base64-encoded communication. sudo is needed so the server can bind to port 80. A sketch of how the crawler can capture this traffic follows the commands.
> pip3 install websockets
> sudo ./script.sh 
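
On the crawler side, WebSocket traffic is exposed through DevTools Network events. A minimal, hypothetical sketch (again assuming Chrome on the default debugging port, and the demo server running on localhost):

  const CDP = require('chrome-remote-interface');

  CDP(async (client) => {
    const { Network, Page } = client;
    await Network.enable();
    await Page.enable();

    // DevTools events for the WebSocket lifecycle and its frames.
    Network.webSocketCreated(({ url }) => console.log('WS opened:', url));
    Network.webSocketFrameSent(({ response }) => console.log('sent:', response.payloadData));
    Network.webSocketFrameReceived(({ response }) => console.log('received:', response.payloadData));

    await Page.navigate({ url: 'http://localhost/' });

    // Capture frames for a while, then disconnect.
    setTimeout(() => client.close(), 10000);
  }).on('error', console.error);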

FAQ

Authors

  • Alberto FDR: Web