KartikTalwar/Duolingo

buy_streak_freeze() is broken again

andreasscherbaum opened this issue · 15 comments

I updated my script to the latest version of this repository, and the "buy_streak_freeze()" function stopped working: it now raises an exception. Previous versions returned something similar to the following:

{'Date': 'Thu, 20 Feb 2020 13:23:12 GMT', 'Content-Type': 'application/json; charset=utf-8', 'Content-Length': '36', 'Connection': 'keep-alive', 'pragma': 'no-cache', 'x-tid': '***', 'cache-control': 'no-store, no-cache, must-revalidate, proxy-revalidate', 'surrogate-control': 'no-store', 'x-runtime': '0.04843', 'x-ws': 'UK', 'server': 'duo-api', 'expires': '0', 'x-uid': '***', 'x-frame-options': 'SAMEORIGIN', 'x-content-type-options': 'nosniff', 'x-xss-protection': '1 ; mode=block', 'referrer-policy': 'no-referrer', 'x-envoy-upstream-service-time': '94', 'x-duo-request-host': 'www.duolingo.com'}

{"error": "ALREADY_HAVE_STORE_ITEM"}

The current version returns:

{'Date': 'Thu, 20 Feb 2020 13:20:37 GMT', 'Content-Type': 'application/json', 'Content-Length': '396', 'Connection': 'keep-alive', 'server': 'duo-api', 'set-cookie': '_pxhd=***; path=/; expires=Fri, 19 Feb 2021 13:20:37 GMT;', 'cache-control': 'no-cache, no-store, max-age=0, must-revalidate', 'pragma': 'no-cache', 'expires': '0', 'x-content-type-options': 'nosniff', 'x-frame-options': 'DENY', 'x-xss-protection': '1 ; mode=block', 'referrer-policy': 'no-referrer', 'x-envoy-upstream-service-time': '78', 'x-duo-request-host': 'www.duolingo.com'}

{"blockScript": "/***/captcha/captcha.js?a=c&u=***&v=&m=0", "vid": "", "jsRef": "", "hostUrl": "/***/xhr", "customLogo": null, "appId": "***", "uuid": "***", "logoVisibility": "hidden", "jsClientSrc": "/***/init.js", "firstPartyEnabled": "true", "refId": "***", "cssRef": ""}

Why is the host returning a different response depending on which version (and which features) of this client library is in use? And why does it apparently start asking for a captcha to be solved?

I guess this commit might be the reason: bc75bad

But in general you will get a captcha request whenever there are too many login requests from the same IP and User-Agent. I've also encountered it.

One solution is not to log in on every script run. Instead, log in once, save the token from the header, and reuse that token on the next run. That way you won't hit the captcha.

I shouldn't think that setting a clear user agent caused the issue, as previously it would have been sending the same "requests" default user agent every time.

You can test it, though, by setting duo.USER_AGENT to something different in your code after each request. (That was certainly a use case I had in mind when I added it as a parameter; previously one couldn't change the user agent from the requests default programmatically.)

I'll give this a test later this evening hopefully, and perhaps try and make a daily automated test for it. Currently the automated tests only test methods which do not change state on Duolingo.

I've tested it with different user agents, including one copied from Insomnia. I ran into trouble because Insomnia itself was eventually blocked by the captcha after that, while the browser could still log in fine. That means the server takes the user agent into account.

I think that logging in on each run is inefficient; it's not how apps and browsers work. Why don't we store the token locally and reuse it, and log in only when the token is expired or invalid? Is there any reason for not doing that?

Currently I just don't perform any login at all.

I have this running in a cron job which runs twice a day. If the login provides a feature to store the session somewhere, that works for me.

It could be a filename passed as an optional second parameter to _login(): if given, the session is loaded from and saved to that file.

That would be pretty useful functionality!
Are we okay to just store it in a json file, perhaps? We don't need to do anything fancy like encrypting on disk.

I'll pop that to the top of my backlog and hopefully work on it later

Well, we should think about some security, or at least warn the user that the token is stored in plain text. Although it's not much different from accessing it in the local storage of the browser, so it might be fine.

Not even json, just txt file with one line would be enough, I think.

I have to rework the opened PR, because I didn't consider that you use different API endpoints than I do, and now I'm actually a bit confused about the functionality. Then I can PR this feature.

I don't care about the file format. If you specify a filename as the second parameter, it's basically "yours" to use in this function. Thinking a bit more, other people might want a variable, or a pointer. But I'm fine with a session file. Optionally, check the permissions and bail out if the file is group- or world-readable.
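For illustration, here is a minimal sketch of the session file idea with the permission check described above. It is not what the library or the PR does; load_token, save_token and InsecureSessionFile are hypothetical names, the JSON layout is an assumption, and the permission bits only mean something on POSIX systems.

```python
import json
import os
import stat


class InsecureSessionFile(Exception):
    """Hypothetical error: the session file is readable by group or others."""


def load_token(path):
    """Return the saved token, or None if no session file exists yet."""
    if not os.path.exists(path):
        return None
    mode = os.stat(path).st_mode
    # Bail out if group or world can read the file (POSIX permission bits).
    if mode & (stat.S_IRGRP | stat.S_IROTH):
        raise InsecureSessionFile(f"{path} is group- or world-readable")
    with open(path) as fh:
        return json.load(fh).get("token")


def save_token(path, token):
    """Store the token in plain text, in a file created owner-only."""
    # Passing 0o600 at creation time avoids a window where the file is
    # readable by others. An already existing file keeps its old mode.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as fh:
        json.dump({"token": token}, fh)
```

A cron job would call load_token() first and only fall back to a real login (followed by save_token()) when it returns None or the token is rejected.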

@igorskh Well, yes. Same as wherever the credentials for the script are coming from, right?

I'm going to write some tests for it, but how do you think this looks, as an implementation of the session file idea?

#80

I'll write a couple tests, then some tests for buy_streak_freeze()

Yeah, checking buy_streak_freeze(), and it seems that I get captcha stuff unless it's something that looks like a real browser's user agent.
Setting it to requests default, "python-requests/2.22.0", doesn't fix it.
Setting it to insomnia's "insomnia/7.0.6", doesn't fix it.

I think this probably just needs documenting? That some endpoints will fail unless you fake a normal browser user agent, by setting:

duolingo.Duolingo.USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0"
duo = duolingo.Duolingo(username, password)

Then it'll work.

I'm not a fan of packaging in a browser user agent by default though, it seems messy, and prone to failure. What do others think?

> Setting it to insomnia's "insomnia/7.0.6", doesn't fix it.

Works fine for me with Insomnia against https://www.duolingo.com/2017-06-30/login?fields.

It looks like it matters less who makes the request than what is requested. Are you sure that the browser user agent won't be blocked after some suspicious activity? Actually, I never had a problem with Insomnia until I put its user agent into the Python script.

In my past experience, the login fields endpoint seems to work intermittently without a browser user agent.

Can you try making a request to buy an item, using insomnia?

Works fine for me.

POST /2017-06-30/users/14397890/shop-items HTTP/2
Host: www.duolingo.com
User-Agent: insomnia/7.0.6
Authorization: Bearer <token_here>
Accept: */*
Content-Length: 59

{
  "itemName": "streak_freeze",
  "learningLanguage": "de"
}

Response:

HTTP/2 400 

{
  "error": "ALREADY_HAVE_STORE_ITEM"
}

I've specifically cleared cookies before doing this request to have a clean test.

Hmmm, yeah, this is interesting.
I'm seeing similar to you now.

Default requests user agent: (FAIL)

>>> resp = requests.post('https://www.duolingo.com/2017-06-30/users/222184039/shop-items', json={"itemName":"streak_freeze", "learningLanguage":"de"}, headers={"Authorization": "Bearer <token>"})
>>> resp.status_code
403
>>> resp.content
b'{"blockScript": "/AsQJFZ9W/captcha/captcha.js?a...

Insomnia user agent: (PASS)

>>> resp = requests.post('https://www.duolingo.com/2017-06-30/users/222184039/shop-items', json={"itemName":"streak_freeze", "learningLanguage":"de"}, headers={"Authorization": "Bearer <token>", "User-Agent": "insomnia/7.0.6"})
>>> resp.status_code
400
>>> resp.content
b'{"error": "INSUFFICIENT_FUNDS"}'

Simple, invalid, user agent ["Firefox"]: (PASS)

>>> resp = requests.post('https://www.duolingo.com/2017-06-30/users/222184039/shop-items', json={"itemName":"streak_freeze", "learningLanguage":"de"}, headers={"Authorization": "Bearer <token>", "User-Agent": "Firefox"})
>>> resp.status_code
400
>>> resp.content
b'{"error": "INSUFFICIENT_FUNDS"}'

Duolingo API user agent: (FAIL)

>>> resp = requests.post('https://www.duolingo.com/2017-06-30/users/222184039/shop-items', json={"itemName":"streak_freeze", "learningLanguage":"de"}, headers={"Authorization": "Bearer <token>", "User-Agent": "Python Duolingo API/0.4"})
>>> resp.status_code
403
>>> resp.content
b'{"blockScript": "/AsQJFZ9W/captcha/captcha.js...

Adding the word "python" to insomnia user agent: (FAIL)

>>> resp = requests.post('https://www.duolingo.com/2017-06-30/users/222184039/shop-items', json={"itemName":"streak_freeze", "learningLanguage":"de"}, headers={"Authorization": "Bearer <token>", "User-Agent": "python insomnia/7.0.6"})
>>> resp.status_code
403
>>> resp.content
b'{"blockScript": "/AsQJFZ9W/captcha/captcha.js...

So yeah, it looks like they are sending any request with "python" in the user agent, in any letter case, to the captcha?
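To make that guess easy to recheck, the results above can be written down as a small table and compared against the rule. This is only a sketch of the observation, not how Duolingo actually decides: likely_blocked is a hypothetical helper, the requests default is written as "python-requests/2.22.0" (the version mentioned earlier in the thread), and the "Duolingo API/0.4" entry comes from the library test further down.

```python
def likely_blocked(user_agent):
    """Guess whether a user agent gets the captcha response.

    Encodes only what was observed in this thread: every user agent
    containing "python", in any letter case, received a 403.
    """
    return "python" in user_agent.lower()


# User agent -> True if the request was blocked (403 plus captcha payload).
observed = {
    "python-requests/2.22.0": True,
    "insomnia/7.0.6": False,
    "Firefox": False,
    "Python Duolingo API/0.4": True,
    "python insomnia/7.0.6": True,
    "Duolingo API/0.4": False,
}

# Collect every observation that the simple rule fails to explain.
mismatches = [ua for ua, blocked in observed.items()
              if likely_blocked(ua) != blocked]
print(mismatches)
```

If a new user agent ever lands in mismatches, the "python" theory is incomplete.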

So, then testing with the actual library. I had to modify the buy_item() method locally to print request.json(), because otherwise the insufficient funds error looks the same as the captcha error. I think I'll file a PR to handle captcha errors more clearly.

Testing with the actual library: (FAIL)

>>> import duolingo
>>> duolingo.Duolingo.USER_AGENT
'Python Duolingo API/0.4'
>>> duo = duolingo.Duolingo(username, password)
>>> duo.USER_AGENT
'Python Duolingo API/0.4'
>>> duo.buy_streak_freeze()
{'blockScript': '/AsQJFZ9W/captcha/captcha.js?a=...
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.

Swapping user agent after login: (PASS)

>>> import duolingo
>>> duolingo.Duolingo.USER_AGENT
'Python Duolingo API/0.4'
>>> duo = duolingo.Duolingo(username, password)
>>> duo.USER_AGENT = "insomnia/7.0.6"
>>> duo.buy_streak_freeze()
{"error": "INSUFFICIENT_FUNDS"}
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.

Swapping user agent before object creation: (PASS)

>>> import duolingo
>>> duolingo.Duolingo.USER_AGENT = "insomnia/7.0.6"
>>> duo = duolingo.Duolingo(username, password)
>>> duo.USER_AGENT
'insomnia/7.0.6'
>>> duo.buy_streak_freeze()
{"error": "INSUFFICIENT_FUNDS"}
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.

Swapping user agent after first captcha failure: (FAIL)

>>> import duolingo
>>> duo = duolingo.Duolingo(username, password)
>>> duo.USER_AGENT
'Python Duolingo API/0.4'
>>> duo.buy_streak_freeze()
{'blockScript': '/AsQJFZ9W/captcha/captcha.js?a=...
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.
>>> duo.USER_AGENT = "insomnia/7.0.6"
>>> duo.buy_streak_freeze()
{'blockScript': '/AsQJFZ9W/captcha/captcha.js?a=...
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.

Removing "Python" from user agent: (PASS)

>>> import duolingo
>>> duolingo.Duolingo.USER_AGENT
'Python Duolingo API/0.4'
>>> duolingo.Duolingo.USER_AGENT = "Duolingo API/0.4"
>>> duo = duolingo.Duolingo(username, password)
>>> duo.buy_streak_freeze()
{'error': 'INSUFFICIENT_FUNDS'}
Traceback (most recent call last):
...
duolingo.DuolingoException: Not possible to buy item.

So yeah, my problem earlier was that the exception thrown for captcha failure, and for insufficient funds, is identical, so I overlooked that.

And it looks like you can swap the user agent after logging in, but once you hit the captcha, you need to log in again before that'll work.

So, how to action this:

  • Already got the PR for saving sessions, that's fine.
  • I'm thinking we should catch the captcha exceptions early and throw them more clearly, for ease of debugging in future.
  • Remove the word "Python" from the default user agent.
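As a rough sketch of the second point, the captcha payload could be told apart from a normal shop error by looking for the "blockScript" key seen in the blocked responses above. The names CaptchaException, PurchaseException and check_shop_response are hypothetical and are not taken from the library or from any PR; the function only assumes the response body has already been decoded from JSON into a dict.

```python
class CaptchaException(Exception):
    """Hypothetical error: the server answered with a captcha challenge."""


class PurchaseException(Exception):
    """Hypothetical error: the shop rejected the purchase."""


def check_shop_response(body):
    """Raise a specific exception for a failed shop request.

    `body` is the decoded JSON response as a dict. Nothing is raised
    when it contains neither a captcha payload nor an error field.
    """
    # Captcha responses in this thread all carried a "blockScript" key.
    if "blockScript" in body:
        raise CaptchaException(
            "Blocked by captcha; change the user agent and log in again"
        )
    # Normal API failures look like {"error": "INSUFFICIENT_FUNDS"}.
    if "error" in body:
        raise PurchaseException(body["error"])
```

With something like this, an insufficient funds error and a captcha block would no longer surface as the same "Not possible to buy item." exception.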

Thanks for all the help with this by the way!

Added a pull request for the clearer exceptions and user agent change: #81
@igorskh would you like to look over it?

Is this working for you guys now?