Crawler-Benchmark

A reference framework for the automated exploration of web applications. It provides common web features so that you can test crawlers in a well-defined environment.

Usage

First, clone the repository and cd into it.

Using Docker

  1. Clone the repository

  2. Install Docker

  3. Install docker-compose

  4. Build and run the Docker image with docker-compose:

    cd crawler-benchmark
    cp .env.example .env # then edit with desired credentials
    docker-compose up -d

Once the containers are up, you can visit the app at http://localhost:8080.
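
Once the app is running, a quick way to confirm that a crawler has something to explore is to fetch the home page and list the links on it. The following is a minimal sketch using only the Python standard library. It assumes the app answers at http://localhost:8080 as described above; the names `LinkCollector`, `extract_links`, and `list_home_page_links` are illustrative and not part of this project.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links into absolute URLs.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in an HTML string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


def list_home_page_links(base_url="http://localhost:8080/"):
    """Fetch the home page and return its links.

    Assumption: the benchmark is running locally on port 8080.
    """
    with urlopen(base_url, timeout=5) as response:
        page = response.read().decode("utf-8", errors="replace")
    return extract_links(page, base_url)
```

Calling `list_home_page_links()` while the containers are up should return the URLs a crawler would discover first. `extract_links` can also be used on its own with any HTML string.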

Development

Run tests locally

docker-compose run --rm website bash -c 'pytest --cov --cov-report term:skip-covered'

CSS editing

We use Grunt to compile SCSS files into CSS automatically, and we may add more tasks in the future. npm dependencies are specified in package.json.

Install Sass and the npm dependencies from the command line (you may need sudo privileges), then run Grunt:

gem install sass
npm install
npm run grunt

Todos

  • Build the frontend using webpack and load pure.scss from node_modules
  • Publish the Docker image so anyone can spin it up
  • Add Node.js Docker support
  • Add link to home page (from title)
  • Add new features!
    • Robots.txt validation
    • Visited URLs
    • Provide an API
  • Website navigation generation from model
  • Improve settings
    • Import
    • Export
    • JSON? YAML?
  • Spread the word: make the application known to crawler authors
  • Put online
    • Get crawled by general crawlers like Googlebot
    • Share results to the public
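
The "Robots.txt validation" item above could take several forms. As one possible sketch (not an existing feature of this project), the URLs a crawler visited can be checked against robots.txt rules with Python's standard `urllib.robotparser` module. The function name `disallowed_visits`, the sample rules, and the user agent are all hypothetical.

```python
from urllib.robotparser import RobotFileParser


def disallowed_visits(robots_txt, user_agent, visited_urls):
    """Return the visited URLs that the given robots.txt rules forbid."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in visited_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical rules and crawl log, for illustration only.
rules = """\
User-agent: *
Disallow: /admin/
"""
visited = [
    "http://localhost:8080/",
    "http://localhost:8080/admin/settings",
]
violations = disallowed_visits(rules, "example-bot", visited)
```

A crawler that respects the rules would produce an empty `violations` list; any entry in it points to a page the crawler should not have requested.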