Originally from Dan Devine - A Content Scraping Tool
Right now this version of scrapeheap is in its infancy. Don't expect any fancy user interface. Just pop a URL in and expect results. Simple as that.
- See the dump of content as your scraper works
- Saves Docx & HTML in separate folders
- Adds some helpful text so that, if you want to scrape again, you can just go ahead
- Download/Clone the project
- Install dependencies by running:
  composer install && npm install
- Ensure you put the project in the directory where Valet has been parked
- Access the project locally via Valet at http://scrapeheap.test
This assumes you have Valet installed and properly configured for your project. If not, please refer to the Valet documentation for setup instructions.
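As a rough sketch of the setup steps above, the script below checks that the tools they rely on (composer, npm and valet) are on your PATH before you install anything. The helper names `missing_tools` and `install_scrapeheap` are made up for this example and are not part of the project. It also assumes the project folder is named `scrapeheap` and sits inside a directory you have already parked with Valet.

```shell
#!/bin/sh
# Hypothetical helper (not part of scrapeheap): print each named tool
# that cannot be found on PATH, one per line.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: the dependency install step from the list above.
# Run it from the project root, inside your parked Valet directory.
install_scrapeheap() {
  composer install && npm install
}

# Report what is missing; nothing is installed by this check itself.
missing=$(missing_tools composer npm valet)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All tools found. Run install_scrapeheap from the project root."
fi
```

If everything is found, calling `install_scrapeheap` from the project root runs the same install command listed above, after which the site should be reachable at http://scrapeheap.test.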
- We're using RoachPHP here: https://roach-php.dev/docs/introduction
- Check out Dan's original project on this: https://github.com/danieldevine/scrapeheap
- Here's a useful guide: https://codewithkyrian.com/p/roachphp-mastering-web-scraping-with-php