Web scraping not working (crawler.js)
Closed this issue · 3 comments
danielgara commented
Be careful: the 'Print + eBook' text no longer appears in the current Packt book page structure, so the crawler code does not work. See this example: https://www.packtpub.com/product/solutions-architect-s-handbook/9781838645649
I modified the getPrice function so that it works with the current page structure:
async function getPrice(bookURL, page) {
  try {
    await page.goto(bookURL);
    // Wait until the price list has been rendered.
    await page.waitForSelector('.price-list__item .price-list__price');
    const data = await page.evaluate(() => {
      // The eBook entry is expected to be the third item in the price list.
      if (document.querySelectorAll('.price-list__name')[2].innerText.trim() === 'eBook') {
        const price = document.querySelectorAll('.price-list__price')[2];
        return {
          book: document.querySelector('.product-info__title').innerText,
          priceeBook: price.innerText,
        };
      }
      // Falls through (returns undefined) when the third item is not 'eBook'.
    });
    return data;
  } catch {
    console.log(`Unable to get price from ${bookURL}`);
  }
}
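The function above finds the eBook price by its position in the list (index 2), so it would break if the list gained or lost an entry. As a minimal sketch, and not the code from the repository, the price could be looked up by its label instead. `pickPrice` and its `preferred` parameter are hypothetical names introduced here, and the assumption that each name/price pair shares a `.price-list__item` parent is a guess about the page markup.

```javascript
// Hypothetical helper: pick a price by its label instead of by position.
// `entries` is an array of { name, price } objects scraped from the page;
// `preferred` lists acceptable labels in priority order.
function pickPrice(entries, preferred = ['eBook', 'Print + eBook']) {
  for (const label of preferred) {
    const match = entries.find((entry) => entry.name.trim() === label);
    if (match) {
      return { label, price: match.price.trim() };
    }
  }
  return null; // none of the preferred labels is present
}

// Sketch of getPrice built on the helper. The selectors are the ones used
// in the function above; that each name/price pair sits inside one
// '.price-list__item' element is an assumption about the page markup.
async function getPrice(bookURL, page) {
  try {
    await page.goto(bookURL);
    await page.waitForSelector('.price-list__item .price-list__price');
    // Collect every name/price pair in the price list.
    const entries = await page.$$eval('.price-list__item', (items) =>
      items.map((item) => ({
        name: item.querySelector('.price-list__name')?.innerText ?? '',
        price: item.querySelector('.price-list__price')?.innerText ?? '',
      }))
    );
    const book = await page.$eval('.product-info__title', (el) => el.innerText);
    const found = pickPrice(entries);
    return found ? { book, ...found } : null;
  } catch (error) {
    console.log(`Unable to get price from ${bookURL}: ${error.message}`);
    return null;
  }
}
```

Because 'eBook' comes first in `preferred`, the eBook price wins when both labels are present, and the 'Print + eBook' price is used only as a fallback.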
kblok commented
Thanks for the report @danielgara!
I will definitely take a look at it.
kblok commented
@danielgara the code is updated. I used a slightly different approach, just in case the print prices come back.
Also, thanks A LOT for the great review on Amazon! You rock! :)
danielgara commented
@kblok thank you! Congratulations on this book, I enjoyed it a lot!