HCelion/scrape_write_function

Last page is no longer displayed

Closed this issue · 4 comments

The link to the last page is no longer displayed on my site.

Is this a known problem?

<nav rel="nav" class="pagination-container AjaxPager">
    <a href="/review/www.amazon.com" data-page-number="1" class="pagination-page active">1</a>
    <a href="/review/www.amazon.com?page=2" data-page-number="2" class="pagination-page ">2</a>
    <a href="/review/www.amazon.com?page=3" data-page-number="3" class="pagination-page ">3</a>
    <a href="/review/www.amazon.com?page=4" data-page-number="4" class="pagination-page ">4</a>
    <a href="/review/www.amazon.com?page=5" data-page-number="5" class="pagination-page ">5</a>
    <a href="/review/www.amazon.com?page=6" data-page-number="6" class="pagination-page ">6</a>
    <span class="pagination-ellipsis pagination-ellipsis--end">…</span>
    <a href="/review/www.amazon.com?page=2" data-page-number="next-page" class="button button--primary next-page" rel="next">Next page</a>
</nav>

Hi Mike,
Sorry, I only just saw your message. Yes, the problem is known, but I have not had time to fix it yet. Do you want to give it a try? It should not be too hard.

Hi,

The total number of reviews is now in this tag:

<span class="headline__review-count">6,492</span>

To get the number of pages, you would have to divide that count by 20 and add 1 if there is a remainder.
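
A minimal sketch of that calculation in Python, in case it helps. The function names `parse_review_count` and `page_count` are made up for illustration, the regular expression only matches the exact tag shown above, and 20 per page is taken from this comment rather than checked against the site:

```python
import re


def parse_review_count(html: str) -> int:
    """Extract the total from <span class="headline__review-count">6,492</span>.

    Assumption: the markup looks exactly like the tag quoted in this thread.
    """
    match = re.search(
        r'<span class="headline__review-count">([\d,]+)</span>', html
    )
    if match is None:
        raise ValueError("review count tag not found")
    # Drop the thousands separator before converting: "6,492" -> 6492
    return int(match.group(1).replace(",", ""))


def page_count(total_reviews: int, per_page: int = 20) -> int:
    """Divide by the page size and add 1 if there is a remainder."""
    full_pages, remainder = divmod(total_reviews, per_page)
    return full_pages + (1 if remainder else 0)


total = parse_review_count('<span class="headline__review-count">6,492</span>')
print(page_count(total))  # 6492 = 324 * 20 + 12, so this prints 325
```
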

Or is there a better way?

I think that looks like a very promising approach. Another way would be to keep following the 'next' button until something on the page tells the scraper that no reviews are left.
Either way, it would improve on the current code :-)
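
A rough sketch of the 'next' button idea in Python, using only the standard library. It assumes the last page simply has no `rel="next"` link, which nobody in this thread has confirmed. `fetch` is a hypothetical callable that takes a URL and returns the page HTML, and `find_next_href` and `iter_pages` are illustrative names:

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, Optional, Tuple
from urllib.parse import urljoin


class NextLinkParser(HTMLParser):
    """Remembers the href of the first <a rel="next"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.next_href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and attributes.get("rel") == "next" and self.next_href is None:
            self.next_href = attributes.get("href")


def find_next_href(html: str) -> Optional[str]:
    parser = NextLinkParser()
    parser.feed(html)
    return parser.next_href


def iter_pages(
    start_url: str,
    fetch: Callable[[str], str],  # hypothetical: URL in, HTML out
    max_pages: int = 1000,        # safety cap so a broken site cannot loop forever
) -> Iterator[Tuple[str, str]]:
    """Yield (url, html) for each page until no 'next' link is found."""
    url: Optional[str] = start_url
    seen = set()
    for _ in range(max_pages):
        if url is None or url in seen:
            return
        seen.add(url)
        html = fetch(url)
        yield url, html
        href = find_next_href(html)
        # Resolve relative links such as "/review/www.amazon.com?page=2"
        url = urljoin(url, href) if href else None
```

With this shape the scraper never needs to know the page count up front, at the cost of one request per page before it finds out where the end is.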

Hi,
I finally got around to more or less fixing the problem. The website has changed a lot: it now returns data in a quasi-JSON format, and R's standard JSON loader was not robust enough to parse it. So I rewrote the scraper in Python, which is more elegant and maintainable anyway. Sorry for the delay.