PacktPublishing/UI-Testing-with-Puppeteer

Web scraping not working (crawler.js)

Closed this issue · 3 comments

```js
if(document.querySelectorAll('.price-list__name')[1].innerText.trim() == 'Print + eBook') {
```

Be careful: the 'Print + eBook' text no longer appears in the current Packt book page structure (see, for example, https://www.packtpub.com/product/solutions-architect-s-handbook/9781838645649), so the crawler code does not work.

I modified the `getPrice` function so that it works with the current book page structure:

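```js
async function getPrice(bookURL, page) {
    try {
        // Load the product page and wait until the price list has rendered.
        await page.goto(bookURL);
        await page.waitForSelector('.price-list__item .price-list__price');
        const data = await page.evaluate(() => {
            // In the current layout the eBook row is the third entry (index 2).
            const name = document.querySelectorAll('.price-list__name')[2];
            if (name && name.innerText.trim() === 'eBook') {
                const price = document.querySelectorAll('.price-list__price')[2];
                return {
                    book: document.querySelector('.product-info__title').innerText,
                    priceeBook: price.innerText,
                };
            }
            // No eBook row at that position: return null instead of throwing.
            return null;
        });
        return data;
    }
    catch (error) {
        console.log(`Unable to get price from ${bookURL}: ${error.message}`);
    }
}
```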
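Both the original check and the modified one depend on a fixed position in the price list (`[1]` and `[2]`), so any added or removed row breaks them. A lookup keyed on the label text is less fragile. The sketch below is a plain function over the strings scraped from `.price-list__name` and `.price-list__price`; the name `priceForLabel` is hypothetical, and it assumes the labels and prices appear in the same order on the page, as the index-based code also assumes.

```javascript
// Hypothetical helper (not part of the book's crawler.js): given the row labels
// and prices scraped from the price list, in page order, return the price whose
// label matches `label`, or null when that row is absent.
function priceForLabel(names, prices, label) {
  // Trim before comparing, as the original check does with innerText.trim().
  const index = names.findIndex((name) => name.trim() === label);
  if (index === -1 || index >= prices.length) {
    return null;
  }
  return prices[index].trim();
}

// Illustrative input shaped like the current layout (placeholder prices).
const names = ['Print', 'eBook'];
const prices = ['$15.00', '$10.00'];
console.log(priceForLabel(names, prices, 'eBook'));         // $10.00
console.log(priceForLabel(names, prices, 'Print + eBook')); // null
```

Because `page.evaluate` serializes its callback and runs it in the browser, a helper defined in Node is not in scope there. One arrangement is to have `page.evaluate` return the two arrays, for example `[...document.querySelectorAll('.price-list__name')].map((el) => el.innerText)`, and call the helper on the result in Node.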
kblok commented

Thanks for the report @danielgara!
I will definitely take a look at it 😊

kblok commented

@danielgara Code updated. I used a slightly different approach, in case the print prices come back.

Also, thanks A LOT for the great review on Amazon! You rock! :)
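The updated code is not shown in this thread, so the following is only one possible way to keep working if the print prices come back, not the actual change in the repository: try a list of labels in order of preference and use the first one present on the page. The function name `firstAvailablePrice` and the label list are assumptions; the inputs are parallel arrays of the `.price-list__name` and `.price-list__price` texts, assumed to be in the same page order.

```javascript
// Hypothetical helper, not the repository's actual fix: return the price for the
// first label in `preferredLabels` that appears on the page, or null if none does.
// Assumes `names` and `prices` hold the .price-list__name and .price-list__price
// texts in the same page order.
function firstAvailablePrice(names, prices, preferredLabels) {
  const trimmedNames = names.map((name) => name.trim());
  for (const label of preferredLabels) {
    const index = trimmedNames.indexOf(label);
    if (index !== -1 && index < prices.length) {
      return { label, price: prices[index].trim() };
    }
  }
  return null;
}

// Prefer the bundle price if it is listed again, otherwise fall back to eBook.
const preferred = ['Print + eBook', 'eBook'];

// Illustrative inputs for the two layouts (placeholder prices).
const withBundle = firstAvailablePrice(
  ['Print + eBook', 'Print', 'eBook'], ['$20.00', '$15.00', '$10.00'], preferred);
const withoutBundle = firstAvailablePrice(
  ['Print', 'eBook'], ['$15.00', '$10.00'], preferred);
console.log(withBundle);    // { label: 'Print + eBook', price: '$20.00' }
console.log(withoutBundle); // { label: 'eBook', price: '$10.00' }
```

Returning the matched label alongside the price makes it visible which row was used, which helps when the page layout changes again.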

@kblok Thank you! Congratulations on this book. I enjoyed it a lot!