I suppose this is the end of packtpub-crawler?
lucymhdavies opened this issue · 27 comments
They have done it before as part of some a/b tests, hopefully they revert it back after the stats drop (I don't think people manually check the site every day).
Maybe we can contact them, since this script turns a daily chore into a pleasant experience and all their free books are already downloadable from other sources anyways.
But otherwise, we can't do much about it…
oh no! just started implementing this script with the packtpub Alexa skill yesterday! How frustrating!
I have added the book title and the claim URL in the error messages, this way we can at least check if the book is interesting enough to claim it manually. #71
Still, this is a really stupid move, I immediately lost all interest in visiting packtpub :/
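The idea of putting the book title and claim URL into the error message can be sketched roughly as below. This is only an illustration, not the code merged in #71: the `ClaimError` class and its field names are hypothetical.

```python
class ClaimError(Exception):
    """Hypothetical error type carrying what a user needs to claim by hand."""

    def __init__(self, reason: str, title: str, url_claim: str) -> None:
        self.reason = reason
        self.title = title
        self.url_claim = url_claim
        # Put everything in the message so it shows up in logs and notifications.
        super().__init__(
            f"{reason} | book: {title} | claim manually: {url_claim}"
        )
```

A caller that catches this error can log `str(error)` as-is, so the title and link end up wherever the failure is reported.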
That's a useful feature at least. Shame we can't automatically claim them anymore :(
going to close this, as #71 has now been merged
I have created a new branch with a proposal, though I don't know if it is worth spending time on.
I have fixed the claim. Looking at the docs, the recaptcha-token field should always be available in the page, but it needs to be validated by the client and can be used only once. If you solve the captcha manually and plug the token here, you are able to download the book.
If you run the script with an invalid captcha, it will download the most recently claimed book under the wrong title.
Would be interesting, just for fun, to try to de-couple the claim from the rest, solving only the captcha via mail
By the way, this document (although I think it is already obsolete) is an alternative, but I don't think it should be the way to go
Since duplicate issues #75 and #76 are related to this one, I will re-open it.
The problem is related to the captcha, and the error looks like this:
[-] <type 'exceptions.IndexError'> list index out of range | spider.py@97
Traceback (most recent call last):
File "script/spider.py", line 97, in main
packtpub.runDaily()
File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 161, in runDaily
self.__parseDailyBookInfo(soup)
File "/home/ubuntu/Projects/github/packtpub-crawler/script/packtpub.py", line 93, in __parseDailyBookInfo
self.info['url_claim'] = self.__url_base + div_target.select('a.twelve-days-claim')[0]['href']
IndexError: list index out of range
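The traceback above comes from indexing `[0]` into an empty result when the `a.twelve-days-claim` link is missing from the page. A minimal sketch of a guard follows. It assumes Python 3 and, to stay self-contained, uses the standard library's `html.parser` instead of the BeautifulSoup `select` call the script actually uses; `ClaimLinkFinder` and `find_claim_url` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class ClaimLinkFinder(HTMLParser):
    """Remembers the href of the first <a> whose classes include 'twelve-days-claim'."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.href is not None:
            return
        attr_map = dict(attrs)
        classes = (attr_map.get("class") or "").split()
        if "twelve-days-claim" in classes:
            self.href = attr_map.get("href")


def find_claim_url(html: str, url_base: str) -> Optional[str]:
    """Return the absolute claim URL, or None when the page has no claim link."""
    finder = ClaimLinkFinder()
    finder.feed(html)
    if finder.href is None:
        # No claim link (for example, the page layout changed): report it
        # to the caller instead of raising IndexError.
        return None
    return url_base + finder.href
```

With a check like this the caller can print a readable "claim link not found" message rather than a bare `IndexError`.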
There is a feature branch with a proposal, but it could be a black hole!
@niqdev there really is a problem with the captcha, and the script still doesn't work. Maybe, as one more option, implement it as a two-step process with the page opened manually?
@develsites yep that was the idea/proposal in the feature branch, 2 step process solving the captcha manually via email for example, but unfortunately yes at the moment the script is broken and we can't do much
Honestly, if you have to solve the captcha manually anyway, then you may as well just go to https://www.packtpub.com/packt/offers/free-learning and claim it manually.
Packtpub-crawler is still useful for notifying what the latest book is though :)
i had no captcha today... is it an error or did they remove it? Claiming still worked
umh, something changed for sure, the reCAPTCHA moved to the bottom-right of the page.
Were you able to download the book with the script?
The CAPTCHA has not yet returned, but the script fails to claim the book with IndexError('list index out of range',).
yeah, you don't have to "do" anything for the captcha to work... maybe it detects the browser or something?
For me, using chrome, it just works. no box, nothing, but blocking google prints the error "no captcha" or whatever
Is it a new kind of captcha from Google?
"Invisible reCAPTCHA" - https://developers.google.com/recaptcha/docs/invisible
Hello, I have managed to solve the captcha and get my grabber script working. You can use the same solution or check mine at: https://github.com/igbt6/Packt-Publishing-Free-Learning
Regards!
@igbt6 That's awesome, thanks a lot for sharing with us!
@niqdev I managed to get my Packt grabber working by using Selenium in headless mode AND setting the user agent to that of regular Chrome (the default for headless Chrome contains, if I recall correctly, HeadlessChrome).
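The headless-plus-user-agent approach described above could look roughly like this sketch. It is not the commenter's actual code: the user-agent string is an illustrative assumption, `build_chrome_args` and `make_driver` are hypothetical helpers, and `make_driver` needs the `selenium` package and a local Chrome install.

```python
from typing import List

# Assumption: any ordinary desktop Chrome user-agent string; this one is illustrative.
DESKTOP_CHROME_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
)


def build_chrome_args(user_agent: str = DESKTOP_CHROME_UA) -> List[str]:
    """Command-line switches for a headless Chrome that reports a regular user agent."""
    return ["--headless", f"--user-agent={user_agent}"]


def make_driver(user_agent: str = DESKTOP_CHROME_UA):
    """Start headless Chrome through Selenium with the overridden user agent."""
    # Imported lazily so build_chrome_args works without selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in build_chrome_args(user_agent):
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Calling `make_driver()` returns a driver whose requests carry the overridden user agent; whether that is enough to get past the invisible reCAPTCHA is only what the comment above reports, not something this sketch guarantees.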
@Hacktoberfest Anyone interested in integrating Anti Captcha or other solutions? Thanks
@niqdev I am not that experienced but I will try to do so, if I succeed I will create a pull request ;)
Update: I got the basic downloading to the user's account working, but the script stops at downloading a file to the drive.
here is a python solution for the recaptcha https://github.com/ecthros/uncaptcha
Thanks @tjadanel, any interest in integrating it?
I see that they have removed the reCAPTCHA badge from the site. Could this mean that reCAPTCHA has been removed?
I tried running the script and got "list index out of range", which either means that reCAPTCHA is still in place or that the structure of the site has changed. Will investigate though. If you don't hear from me, either I haven't gotten anywhere or reCAPTCHA is still in place.
@justingiffard Packt still uses reCAPTCHA; they just switched to the so-called invisible reCAPTCHA. Use my script instead: https://github.com/igbt6/Packt-Publishing-Free-Learning which will do the work for you ; )
@igbt6 thanks but you make use of a service which is not free (albeit cheap)