A PHP CLI mini-crawler to extract title and meta properties from a website to a CSV file.
Crawling pages is a slow task, so this script will probably fail in your browser because of max_execution_time
quota.
It is recommended to run the script with the command line interface (CLI).
php meta-crawl-bot.php italic.fr
Copy config.php and rename it to config.php
and customize your crawler!
A folder will be created at the root with the domain name and the date of the crawl.
- 404_urls.csv: list of 404 pages
- crawled_data.csv: list of meta tags per page
- external_urls.csv: list of non-parsed external links.