php-crawler

A crawler written in PHP with Laravel that finds email addresses on the internet. Given an entry point URL, the crawler searches for emails on every URL it can reach under the entry point's domain name. The emails found so far can be downloaded as a text file at any time. Several users can run searches side by side: each search belongs to the user who created it and is not visible to other users.
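The two core steps described above (collecting emails from a page and keeping only links on the entry point's domain) can be sketched in plain PHP. This is a minimal illustration, not the project's actual implementation: the function names `extractEmails` and `sameDomainLinks` are hypothetical, the email pattern is deliberately simple, and only absolute `http(s)` links are considered.

```php
<?php
// Hypothetical helpers for illustration; these are not php-crawler's own functions.

/** Return the unique email addresses found in a chunk of text, lowercased. */
function extractEmails(string $text): array
{
    // Deliberately simple pattern; it does not cover every valid address form.
    preg_match_all('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', $text, $matches);

    return array_values(array_unique(array_map('strtolower', $matches[0])));
}

/** Return the absolute links in $html whose host matches the entry point's host. */
function sameDomainLinks(string $html, string $entryPointUrl): array
{
    $host = parse_url($entryPointUrl, PHP_URL_HOST);
    preg_match_all('/href=["\'](https?:\/\/[^"\'#\s]+)/i', $html, $matches);

    // Keep only links that stay on the entry point's host.
    $links = array_filter($matches[1], function (string $url) use ($host): bool {
        return parse_url($url, PHP_URL_HOST) === $host;
    });

    return array_values(array_unique($links));
}

$html = '<a href="https://example.com/contact">Contact</a> '
      . '<a href="https://other.test/">Elsewhere</a> '
      . 'Write to Info@Example.com or info@example.com.';

print_r(extractEmails($html));                            // one address: info@example.com
print_r(sameDomainLinks($html, 'https://example.com/'));  // one link: https://example.com/contact
```

A real crawler would also resolve relative links, remember which URLs it has already visited, and fetch pages over HTTP; those parts are left out to keep the sketch short.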

Installation

  • Create a MySQL database (default name: php_crawler)
  • Install the project with Composer:
composer create-project hedii/php-crawler php-crawler
cd php-crawler
  • Install npm dependencies (optional):
npm install
  • Open the .env file, check the database credentials, and modify them if needed:
DB_HOST=127.0.0.1
DB_DATABASE=php_crawler
DB_USERNAME=root
DB_PASSWORD=root
  • Build the app:
php artisan crawler:build
  • Point your webserver to the public directory: php-crawler/public
  • Done
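If the build step fails with a database error, it can help to check the credentials from the .env file outside the app. The following is a rough sketch under stated assumptions: `parseEnvLines` and `buildDsn` are hypothetical helpers, the parser handles only plain KEY=VALUE lines (no quoting or variable expansion), and the actual connection attempt is left commented out because it needs a running MySQL server and the pdo_mysql extension.

```php
<?php
// Hypothetical helpers for illustration; not part of php-crawler.

/** Parse simple KEY=VALUE lines (as in the .env excerpt above) into an array. */
function parseEnvLines(string $contents): array
{
    $values = [];
    foreach (preg_split('/\R/', $contents) as $line) {
        $line = trim($line);
        // Skip blanks, comments, and lines without an "=" sign.
        if ($line === '' || $line[0] === '#' || strpos($line, '=') === false) {
            continue;
        }
        [$key, $value] = explode('=', $line, 2);
        $values[trim($key)] = trim($value);
    }

    return $values;
}

/** Build a PDO MySQL DSN from the parsed DB_* values. */
function buildDsn(array $env): string
{
    return sprintf('mysql:host=%s;dbname=%s', $env['DB_HOST'], $env['DB_DATABASE']);
}

$env = parseEnvLines("DB_HOST=127.0.0.1\nDB_DATABASE=php_crawler\nDB_USERNAME=root\nDB_PASSWORD=root");
echo buildDsn($env), PHP_EOL; // mysql:host=127.0.0.1;dbname=php_crawler

// To test against a real server, uncomment the next line:
// new PDO(buildDsn($env), $env['DB_USERNAME'], $env['DB_PASSWORD']);
```

In practice you would pass `file_get_contents('.env')` to `parseEnvLines` instead of the inline string used here.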

Usage

  • Navigate to your php-crawler website
  • Register a new account
  • Create a new search
  • Create more searches
  • Download the found resources

Troubleshooting

Blank space in path

On some systems, the crawler app won't work if the path to its public directory contains a blank space. Rename the folders along that path so that none of them contains a space.
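A quick way to spot the offending folders is to split the path into segments and list those that contain whitespace. This is only a sketch; `segmentsWithWhitespace` is a hypothetical name and the example path is made up.

```php
<?php
// Hypothetical check for illustration; not part of php-crawler.

/** Return the path segments that contain whitespace. */
function segmentsWithWhitespace(string $path): array
{
    // Split on forward slashes and backslashes so Windows paths work too.
    $segments = preg_split('#[/\\\\]+#', $path, -1, PREG_SPLIT_NO_EMPTY);

    return array_values(array_filter($segments, function (string $segment): bool {
        return preg_match('/\s/', $segment) === 1;
    }));
}

print_r(segmentsWithWhitespace('/Users/me/My Sites/php-crawler/public')); // lists "My Sites"
```

Pass the absolute path of your own public directory; an empty result means no folder along the path needs renaming.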

MAMP server

If you are running the crawler on a MAMP server, edit config/database.php and add a unix_socket entry to the mysql connection:

'mysql' => [
    'driver'    => 'mysql',
    'host'      => env('DB_HOST', 'localhost'),
    'database'  => env('DB_DATABASE', 'forge'),
    'username'  => env('DB_USERNAME', 'forge'),
    'password'  => env('DB_PASSWORD', ''),
    'charset'   => 'utf8',
    'collation' => 'utf8_unicode_ci',
    'prefix'    => '',
    'strict'    => false,
    'engine'    => null,
    
    'unix_socket' => '/Applications/MAMP/tmp/mysql/mysql.sock', // add this line
],
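Before editing the config, you may want to confirm that the socket path is correct on your machine. The sketch below checks that the socket file exists and prints the PDO DSN that connects through a socket rather than a host. `buildSocketDsn` is a hypothetical helper and the database name assumes the default from the installation steps; the `unix_socket` DSN key itself is a standard part of PDO's MySQL driver.

```php
<?php
// Hypothetical diagnostic for illustration; not part of php-crawler.

/** Build a PDO MySQL DSN that connects through a unix socket instead of a host. */
function buildSocketDsn(string $socketPath, string $database): string
{
    return sprintf('mysql:unix_socket=%s;dbname=%s', $socketPath, $database);
}

$socket = '/Applications/MAMP/tmp/mysql/mysql.sock'; // path used in the config above

echo file_exists($socket)
    ? "Socket found\n"
    : "Socket not found; is MAMP's MySQL server running?\n";

echo buildSocketDsn($socket, 'php_crawler'), PHP_EOL;
```

If the socket is not found, check the MySQL settings of your MAMP installation for the actual location and use that path in config/database.php instead.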

Todo

  • Write PHP tests
  • Write JS tests
  • Crawl for things other than emails
  • ...

Screenshots