Goutte is a screen scraping and web crawling library for PHP.
Goutte provides a nice API to crawl websites and extract data from the HTML/XML responses.
Goutte depends on PHP 7.1+.
Add fabpot/goutte as a require dependency in your composer.json file:
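A minimal composer.json fragment might look like the sketch below. The ^4.0 constraint is an assumption, chosen because the 4.x releases are the ones built on the Symfony HttpClient component; use whichever constraint your project actually targets, then run Composer to install it.

```json
{
    "require": {
        "fabpot/goutte": "^4.0"
    }
}
```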
Create a Goutte Client instance (which extends Symfony\Component\BrowserKit\HttpBrowser):
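As a sketch, assuming Goutte was installed with Composer and the script sits next to the vendor directory (the autoloader path is an assumption about your project layout):

```php
<?php
// Assumed path: adjust to wherever Composer's autoloader lives in your project.
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;

// With no arguments, the client falls back to a default HTTP client.
$client = new Client();
```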
Make requests with the request() method:
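The sketch below issues a GET request. To stay runnable offline, it hands the client Symfony's MockHttpClient with a canned response; the mock, the HTML, and the example.com URL are all illustrative assumptions. With a plain new Client(), the same request() call would go over the network.

```php
<?php
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

// Placeholder page served by the mock instead of a real website.
$html = '<html><body><h1>Hello from the mock</h1></body></html>';
$client = new Client(new MockHttpClient(new MockResponse($html)));

// request() takes the HTTP method and an absolute URI.
$crawler = $client->request('GET', 'https://example.com/');

// Show what came back: the object's class and the page heading.
echo get_class($crawler), "\n";
echo $crawler->filter('h1')->text(), "\n";
```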
The method returns a Crawler object (Symfony\Component\DomCrawler\Crawler).
To use your own HTTP settings, you may create an HttpClient instance and pass it to Goutte. For example, to add a 60-second request timeout:
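A minimal sketch of the timeout case, again assuming a Composer autoloader next to the script. Symfony's HttpClient documents the timeout option as an idle timeout in seconds.

```php
<?php
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\HttpClient;

// 'timeout' is given in seconds; HttpClient treats it as an idle timeout.
$httpClient = HttpClient::create(['timeout' => 60]);

// Goutte uses the configured HTTP client for every request it makes.
$client = new Client($httpClient);
```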
Click on links:
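A sketch of following a link, using two canned pages from Symfony's MockHttpClient so it runs offline. The pages, the link text "About", and the example.com URL are made up for illustration.

```php
<?php
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

// Two placeholder pages, returned in order: a page with a link, then its target.
$client = new Client(new MockHttpClient([
    new MockResponse('<html><body><a href="/about">About</a></body></html>'),
    new MockResponse('<html><body><h1>About us</h1></body></html>'),
]));

$crawler = $client->request('GET', 'https://example.com/');

// selectLink() finds links by their text; link() turns the match into a Link object.
$link = $crawler->selectLink('About')->link();

// click() requests the link's target and returns a Crawler for the new page.
$crawler = $client->click($link);

echo $crawler->filter('h1')->text(), "\n";
```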
Extract data:
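A sketch of pulling text out of a page with a CSS selector. The listing page is a made-up placeholder served by Symfony's MockHttpClient, and the h2 > a selector is just one possible choice for such markup.

```php
<?php
require __DIR__.'/vendor/autoload.php';

use Goutte\Client;
use Symfony\Component\HttpClient\MockHttpClient;
use Symfony\Component\HttpClient\Response\MockResponse;

// Placeholder listing page with two post titles.
$html = '<html><body>'
    .'<h2><a href="/first">First post</a></h2>'
    .'<h2><a href="/second">Second post</a></h2>'
    .'</body></html>';
$client = new Client(new MockHttpClient(new MockResponse($html)));
$crawler = $client->request('GET', 'https://example.com/blog/');

// filter() takes a CSS selector; each() runs the callback on every match
// and returns the callback results as an array.
$titles = $crawler->filter('h2 > a')->each(function ($node) {
    return $node->text();
});

print implode("\n", $titles)."\n";
```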
Submit forms:
// Open the home page and follow the "Sign in" link
$crawler = $client->request('GET', 'https://github.com/');
$crawler = $client->click($crawler->selectLink('Sign in')->link());
// Fill in the form behind the "Sign in" button and submit it
$form = $crawler->selectButton('Sign in')->form();
$crawler = $client->submit($form, ['login' => 'fabpot', 'password' => 'xxxxxx']);
// Print any error messages shown after the submission
$crawler->filter('.flash-error')->each(function ($node) {
    print $node->text()."\n";
});
Read the documentation of the BrowserKit, DomCrawler, and HttpClient Symfony Components for more information about what you can do with Goutte.
Goutte is pronounced "goot", i.e. it rhymes with "boot" and not "out".
Goutte is a thin wrapper around the following Symfony Components: BrowserKit, CssSelector, DomCrawler, and HttpClient.
Goutte is licensed under the MIT license.