Testing how a spider scrapes a given HTML file
seb-jones opened this issue · 5 comments
Hello there,
Just a question. Is there a simple way to feature test a spider by giving it some HTML and inspecting what it returns, e.g. making assertions against what would be returned by collectSpider?
Many thanks
Seb
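Roughly, I'm imagining something like this (just a sketch; MySpider, the item count, and the 'title' field are placeholders, and I'm assuming scraped items expose a get() accessor):

use RoachPHP\Roach;

it('extracts the expected items', function () {
    // Sketch only: MySpider and the asserted values are placeholders.
    $items = Roach::collectSpider(MySpider::class);

    expect($items)->toHaveCount(1);
    expect($items[0]->get('title'))->toBe('Some Expected Title');
});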
I'm afraid there isn't a nice way to do this at the moment, but it's something I will probably add in the future.
Cool cool, thanks for the response :)
For what it's worth, I've managed to implement a fairly simple, albeit inelegant, way to do these kinds of tests in the meantime. It works by firing up a PHP dev server and pointing the spider at that URL by overriding the startUrls. Thought I'd share the code here in case it's useful to anyone:
// Imports from roach-php/core; beforeAll/it/afterAll are Pest helpers.
use RoachPHP\Roach;
use RoachPHP\Spider\Configuration\Overrides;

$serverProcess = null;

beforeAll(function () {
    global $serverProcess;

    // Serve the HTML fixtures with PHP's built-in dev server.
    $pipes = [];
    $serverProcess = proc_open('cd resources/html && php -S localhost:8123', [], $pipes);

    // Give the server a moment to start accepting connections.
    usleep(250_000);
});

it('scrapes an html page', function () {
    $scrapedItems = Roach::collectSpider(
        MySpider::class,
        new Overrides(startUrls: ['http://localhost:8123']),
    );

    // do some assertions on $scrapedItems
});

afterAll(function () {
    global $serverProcess;
    proc_terminate($serverProcess);
});
The above assumes that there is an index.html file in resources/html.
I imagine there's probably a nicer way to do it, but this seems to be working right now.
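For reference, a minimal version of the MySpider class used above might look something like this (just a sketch; the selector and item fields depend on what's actually in index.html):

use RoachPHP\Http\Response;
use RoachPHP\Spider\BasicSpider;

class MySpider extends BasicSpider
{
    // Overridden by the Overrides object in the test above.
    public array $startUrls = [];

    public function parse(Response $response): \Generator
    {
        // Sketch: pull the heading out of the fixture HTML.
        yield $this->item([
            'title' => $response->filter('h1')->text(),
        ]);
    }
}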
FYI, I've already started working on testing helpers for this. https://twitter.com/warsh33p/status/1543150150205538304
Shouldn't take too much longer.
Nice! I look forward to trying them out.