I suppose this is the end of packtpub-crawler?

Question

lucymhdavies opened this issue 8 years ago · 27 comments

https://www.packtpub.com/packt/offers/free-learning

Answer 1 · 2017-05-26T09:38:15.000Z

They have done it before as part of some a/b tests, hopefully they revert it back after the stats drop (I don't think people manually check the site every day).

Maybe we can contact them, since this script turns a daily chore into a pleasant experience and all their free books are already downloadable from other sources anyways.

But otherwise, we can't do much about it…

Answer 2 · 2017-05-26T12:59:51.000Z

oh no! just started implementing this script with the packtpub Alexa skill yesterday! How frustrating!

Answer 3 · 2017-06-01T10:36:23.000Z

I have added the book title and the claim URL to the error messages, so we can at least check whether the book is interesting enough to claim manually. #71

Still, this is a really stupid move; I immediately lost all interest in visiting packtpub :/
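
For readers wondering what "title and claim URL in the error messages" might look like, here is a minimal sketch. It is not the code merged in #71: the function name `format_claim_error` and the `title` key are hypothetical, and only the `url_claim` key is taken from the traceback quoted later in this thread.

```python
def format_claim_error(error, info):
    """Build an error message that still tells the user which book is on offer.

    `info` is assumed to be a dict shaped like the crawler's book info;
    missing keys fall back to placeholders instead of raising.
    """
    title = info.get('title', 'unknown title')            # hypothetical key
    url_claim = info.get('url_claim', 'unknown claim URL')
    return '[-] {0} | book: "{1}" | claim manually: {2}'.format(
        error, title, url_claim)


if __name__ == '__main__':
    info = {
        'title': 'Example Book',
        'url_claim': 'https://www.packtpub.com/packt/offers/free-learning',
    }
    print(format_claim_error(IndexError('list index out of range'), info))
```

The point of the design is that a failed claim still leaves enough information in the log or notification to decide whether a manual visit is worth it.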

Answer 4 · 2017-06-01T10:39:28.000Z

That's a useful feature at least. Shame we can't automatically claim them anymore :(

Answer 5 · 2017-06-02T11:00:38.000Z

going to close this, as #71 has now been merged

Answer 6 · 2017-06-02T11:33:53.000Z

I have created a new branch with a proposal, though I don't know if it is worth spending time on.

I have fixed the claim. Looking at the docs, the recaptcha-token field should always be available in the page, but it needs to be validated by the client and can be used only once. If you solve the captcha manually and plug the token in here, you are able to download the book.
If you run the script with an invalid captcha, it will download the most recently claimed book under the wrong title.
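
The "solve manually, plug in the token, use it once" idea can be sketched roughly as below. This is an illustration under assumptions, not the code in the feature branch: the class name `ManualCaptchaToken` is hypothetical, and while `g-recaptcha-response` is the form field name the reCAPTCHA widget uses by default, whether Packt's claim request expects the token under that name is a guess.

```python
class ManualCaptchaToken(object):
    """Holds a manually solved reCAPTCHA token and hands it out only once."""

    def __init__(self, token):
        token = (token or '').strip()
        if not token:
            raise ValueError('empty reCAPTCHA token; solve the captcha first')
        self._token = token
        self._used = False

    def claim_payload(self):
        """Return form data for one claim request, then mark the token spent."""
        if self._used:
            # A token is single-use, so fail loudly instead of sending a
            # request that the server would reject anyway.
            raise RuntimeError('reCAPTCHA token already used; solve a new captcha')
        self._used = True
        # Field name is an assumption about the claim endpoint.
        return {'g-recaptcha-response': self._token}
```

Refusing an empty or already-used token up front avoids the situation described above, where a claim with an invalid captcha silently falls through to downloading an older book.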

It would be interesting, just for fun, to try to decouple the claim from the rest, solving only the captcha via email 😊

By the way, this document (although I think it is already obsolete) describes an alternative, but I don't think it should be the way to go 😞

Answer 7 · 2017-06-20T13:55:44.000Z

Since issues #75 and #76 are duplicates of this one, I will re-open it.

The problem is related to the captcha, and the error looks like this:

[-] <type 'exceptions.IndexError'> list index out of range | spider.py@97
Traceback (most recent call last):
  File "script/spider.py", line 97, in main
    packtpub.runDaily()
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 161, in runDaily
    self.__parseDailyBookInfo(soup)
  File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 93, in __parseDailyBookInfo
    self.info['url_claim'] = self.__url_base + div_target.select('a.twelve-days-claim')[0]['href']
IndexError: list index out of range

There is a feature branch with a proposal, but it could be a black hole!
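
The traceback above comes from indexing `[0]` into an empty result of `div_target.select('a.twelve-days-claim')`. A defensive version might look like the sketch below. The names `ClaimLinkNotFound` and `extract_claim_url` are hypothetical and not part of packtpub-crawler; the function only relies on the matches being a list whose items support `.get('href')`, which holds for BeautifulSoup tags and for plain dicts.

```python
class ClaimLinkNotFound(Exception):
    """Raised when the daily-offer page has no recognisable claim link."""


def extract_claim_url(url_base, matches):
    """Return the absolute claim URL, or raise a descriptive error.

    `matches` is assumed to be the list returned by
    div_target.select('a.twelve-days-claim'); plain dicts work too.
    """
    if not matches:
        # An empty result is what produced "list index out of range".
        raise ClaimLinkNotFound(
            "no 'a.twelve-days-claim' link found; the page layout may have "
            "changed or a captcha may be blocking the claim")
    href = matches[0].get('href')
    if not href:
        raise ClaimLinkNotFound('claim link has no href attribute')
    return url_base + href
```

This does not make the claim work again; it only replaces an opaque `IndexError` with a message that says what was missing from the page.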

Answer 8 · 2017-06-20T18:14:49.000Z

@niqdev there really is a problem with the captcha; it still doesn't work. Maybe, as one more option, implement it in two steps, with the page opened for the user?

Answer 9 · 2017-06-22T14:40:30.000Z

@develsites yep, that was the idea/proposal in the feature branch: a two-step process, solving the captcha manually via email for example. Unfortunately, at the moment the script is broken and we can't do much.

Answer 10 · 2017-06-22T14:43:18.000Z

Honestly, if you have to solve the captcha manually anyway, then you may as well just go to https://www.packtpub.com/packt/offers/free-learning and claim it manually.

Packtpub-crawler is still useful for notifying what the latest book is though :)

Answer 11 · 2017-07-17T18:02:07.000Z

I had no captcha today... Is it an error or did they remove it? Claiming still worked.

Answer 12 · 2017-07-17T19:08:14.000Z

umh, something changed for sure, the reCAPTCHA moved to the bottom-right of the page.
Were you able to download the book with the script?

Answer 13 · 2017-07-17T19:19:52.000Z

Hi, I didn't use the script, but I was able to get the book manually without validating reCAPTCHA. Thanks.

Answer 14 · 2017-07-18T09:16:06.000Z

The CAPTCHA has not yet returned, but the script fails to claim the book with IndexError('list index out of range',).

Answer 15 · 2017-07-24T16:42:27.000Z

Yeah, you don't have to "do" anything for the captcha to work... maybe it detects the browser or something?
For me, using Chrome, it just works: no box, nothing. But blocking Google prints the error "no captcha" or whatever.

Is it a new kind of captcha from Google?

"invisible recaptcha" - https://developers.google.com/recaptcha/docs/invisible

Answer 16 · 2017-07-25T18:21:57.000Z

Yes, they seem to analyze things like mouse movement patterns. It's called "invisible recaptcha" and it's really interesting when you are into machine learning.

Answer 17 · 2017-08-22T22:23:20.000Z

Hello, we have managed to solve the captcha and get my script-grabber working. You can use the same solution or check mine at: https://github.com/igbt6/Packt-Publishing-Free-Learning
Regards!

Answer 18 · 2017-08-23T08:39:21.000Z

@igbt6 That's awesome, thanks a lot for sharing with us!

Answer 19 · 2017-10-06T15:07:18.000Z

@niqdev I managed to get my Packt grabber working by using Selenium in headless mode AND setting useragent to Chrome (default for headless Chrome is, if I recall correctly, WebdriverChrome).
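
A rough sketch of the setup described above, headless Chrome driven by Selenium with the user agent overridden, is shown here. It is not the commenter's actual code. The user-agent string is just an example value, and the helper names are hypothetical. `--headless` and `--user-agent=` are Chrome command-line switches; headless Chrome typically reports a `HeadlessChrome` token in its default user agent, which is what the override hides. Recent Selenium releases accept `options=`, while older 3.x releases used `chrome_options=` instead.

```python
# Example user-agent string for a desktop Chrome (an assumption; substitute
# whatever a current, regular Chrome build reports).
CHROME_USER_AGENT = (
    'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36')


def build_chrome_args(user_agent=CHROME_USER_AGENT):
    """Command-line switches for headless Chrome with a regular user agent."""
    return ['--headless', '--user-agent={0}'.format(user_agent)]


def make_driver(user_agent=CHROME_USER_AGENT):
    """Start headless Chrome via Selenium (needs selenium and chromedriver)."""
    from selenium import webdriver  # third-party; imported lazily on purpose

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Keeping the argument list in its own function makes it easy to inspect or log exactly what the browser is started with, without launching one.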

Answer 20 · 2017-10-06T18:17:16.000Z

@katka-n great! is it easy to integrate with the current project?

Answer 21 · 2017-10-06T18:24:31.000Z

@Hacktoberfest Anyone interested in integrating Anti Captcha or other solutions? Thanks

Answer 22 · 2017-10-06T18:59:40.000Z

@niqdev I am not that experienced but I will try to do so, if I succeed I will create a pull request ;)

Update: I got the basic downloading to the user's account working, but the script stops at downloading a file to the drive.

Answer 23 · 2017-10-26T05:19:12.000Z

Here is a Python solution for the reCAPTCHA: https://github.com/ecthros/uncaptcha

Answer 24 · 2017-10-27T17:26:13.000Z

Thanks @tjadanel, any interest in integrating it?

Answer 25 · 2018-01-21T09:55:27.000Z

I see that they have removed the reCAPTCHA badge from the site. Could this mean that reCAPTCHA has been removed?
I tried running the script and got "list index out of range", which means either that reCAPTCHA is still in place or that the structure of the site has changed. I will investigate, though. If you don't hear from me, either I haven't gotten anywhere or reCAPTCHA is still in place.

Answer 26 · 2018-01-21T10:44:47.000Z

@justingiffard Packt still uses reCAPTCHA; they just switched to the so-called invisible reCAPTCHA. Use my script instead: https://github.com/igbt6/Packt-Publishing-Free-Learning which will do the work for you ; )

Answer 27 · 2018-01-21T11:14:36.000Z

@igbt6 thanks, but you make use of a service which is not free (albeit cheap)