mozilla/fxapom

test_user_can_sign_in fails waiting at "Authorize this sign-in" step

Closed this issue · 27 comments

test_user_can_sign_in is failing -- seemingly only on Python 3.6 -- looks like it's stuck on the "Authorize this sign-in" step:

[gw2] linux -- Python 3.6.4 /home/jenkins/slave/workspace/fxapom.stage@2/.tox/py36/bin/python3.6
tests/test_fxa_login.py:21: in test_user_can_sign_in
EC.visibility_of_element_located(self._fxa_logged_in_indicator_locator))
.tox/py36/lib/python3.6/site-packages/selenium/webdriver/support/wait.py:80: in until
raise TimeoutException(message, screen, stacktrace)
E   selenium.common.exceptions.TimeoutException: Message: 
E   Stacktrace:
E       at sun.reflect.GeneratedConstructorAccessor9.newInstance (None:-1)
E       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (None:-1)
E       at java.lang.reflect.Constructor.newInstance (None:-1)
E       at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException (W3CHttpResponseCodec.java:150)
E       at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode (W3CHttpResponseCodec.java:115)
E       at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode (W3CHttpResponseCodec.java:45)
E       at org.openqa.selenium.remote.HttpCommandExecutor.execute (HttpCommandExecutor.java:164)
E       at org.openqa.selenium.remote.service.DriverCommandExecutor.execute (DriverCommandExecutor.java:82)
E       at org.openqa.selenium.remote.RemoteWebDriver.execute (RemoteWebDriver.java:637)
E       at org.openqa.selenium.remote.RemoteWebDriver.findElement (RemoteWebDriver.java:410)
E       at org.openqa.selenium.remote.RemoteWebDriver.findElementById (RemoteWebDriver.java:453)
E       at org.openqa.selenium.By$ById.findElement (By.java:218)
E       at org.openqa.selenium.remote.RemoteWebDriver.findElement (RemoteWebDriver.java:402)
E       at sun.reflect.GeneratedMethodAccessor14.invoke (None:-1)
E       at sun.reflect.DelegatingMethodAccessorImpl.invoke (None:-1)
E       at java.lang.reflect.Method.invoke (None:-1)
E       at org.openqa.selenium.support.events.EventFiringWebDriver$2.invoke (EventFiringWebDriver.java:104)
E       at com.sun.proxy.$Proxy4.findElement (None:-1)
E       at org.openqa.selenium.support.events.EventFiringWebDriver.findElement (EventFiringWebDriver.java:188)
E       at org.openqa.selenium.remote.server.handler.FindElement.call (FindElement.java:48)
E       at org.openqa.selenium.remote.server.handler.FindElement.call (FindElement.java:33)
E       at java.util.concurrent.FutureTask.run (None:-1)
E       at org.openqa.selenium.remote.server.DefaultSession$1.run (DefaultSession.java:176)
E       at java.util.concurrent.ThreadPoolExecutor.runWorker (None:-1)
E       at java.util.concurrent.ThreadPoolExecutor$Worker.run (None:-1)
E       at java.lang.Thread.run (None:-1)
---------------------------- Captured stdout setup -----------------------------
fxapom created an account for email: zaztmtwqizzh@restmail.net at https://stable.dev.lcip.org/auth/ on 2018-03-05 19:20:39.199556 ------------------------------- pytest-selenium --------------------------------
URL: https://stable.dev.lcip.org/signin_token_code?client_id=dcdb5ae7add825d2&state=096ea96eda5ca3bd7373ca3f039d181fcc1323b8830d38a4bca1d558f0c175e2&scope=profile%20kv%3Aread%20kv%3Awrite&redirect_uri=https%3A%2F%2F123done-stable.dev.lcip.org%2Fapi%2Foauth WARNING: Failed to gather log types: Message: GET /session/XXXXXXXX-XXXX-XXXX-XXXX-XXXXf3fdee2a/log/types did not match a known command Build info: version: '3.4.0', revision: 'unknown', time: 'unknown' System info: host: 'WIN-SB3ER6JQ6ME', ip: '172.20.45.31', os.name: 'Windows 10', os.arch: 'x86', os.version: '10.0', java.version: '1.8.0_91' Driver info: org.openqa.selenium.firefox.FirefoxDriver Capabilities [{moz:profile=C:\Users\Administrator\AppData\Local\Temp\rust_mozprofile.CKn7p5jKTfmU, rotatable=false, timeouts={implicit=0.0, pageLoad=300000.0, script=30000.0}, pageLoadStrategy=normal, moz:headless=false, platform=ANY, moz:accessibilityChecks=false, acceptInsecureCerts=false, browserVersion=58.0, platformVersion=10.0, moz:processID=2844.0, browserName=firefox, javascriptEnabled=true, platformName=windows_nt, moz:webdriverClick=true}] Session ID: XXXXXXXX-XXXX-XXXX-XXXX-XXXXf3fdee2a Stacktrace: at sun.reflect.NativeConstructorAccessorImpl.newInstance0 (None:-2) at sun.reflect.NativeConstructorAccessorImpl.newInstance (None:-1) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance (None:-1) at java.lang.reflect.Constructor.newInstance (None:-1) at org.openqa.selenium.remote.http.W3CHttpResponseCodec.createException (W3CHttpResponseCodec.java:150) at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode (W3CHttpResponseCodec.java:115) at org.openqa.selenium.remote.http.W3CHttpResponseCodec.decode (W3CHttpResponseCodec.java:45) at org.openqa.selenium.remote.HttpCommandExecutor.execute (HttpCommandExecutor.java:164) at org.openqa.selenium.remote.service.DriverCommandExecutor.execute (DriverCommandExecutor.java:82) at org.openqa.selenium.remote.RemoteWebDriver.execute (RemoteWebDriver.java:637) at org.openqa.selenium.remote.RemoteWebDriver.execute (RemoteWebDriver.java:694) at org.openqa.selenium.remote.RemoteExecuteMethod.execute (RemoteExecuteMethod.java:33) at org.openqa.selenium.remote.RemoteLogs.getAvailableLogTypes (RemoteLogs.java:103) at org.openqa.selenium.remote.server.handler.GetAvailableLogTypesHandler.call (GetAvailableLogTypesHandler.java:37) at org.openqa.selenium.remote.server.handler.GetAvailableLogTypesHandler.call (GetAvailableLogTypesHandler.java:28) at java.util.concurrent.FutureTask.run (None:-1) at org.openqa.selenium.remote.server.DefaultSession$1.run (DefaultSession.java:176) at java.util.concurrent.ThreadPoolExecutor.runWorker (None:-1) at java.util.concurrent.ThreadPoolExecutor$Worker.run (None:-1) at java.lang.Thread.run (None:-1) Sauce Labs Job: http://saucelabs.com/jobs/b423cfe9be1d416bad662ed8ebdcdf61

screen shot 2018-03-05 at 12 41 51 pm

I replicated this locally in py27, not in py36 so I don't think this has anything to do with the Python version. A second run passed in py27 so perhaps this is an intermittent issue?

rfk commented

You may be triggering some rate-limiting that's causing an authorization code to be required; I thought that @restmail.net were exempt from such checks but perhaps that's not working in some cases. If you able to consistent reproduce I'll try to track down the cause on the FxA side of things.

Thanks, @rfk - that makes sense. Recently, the Travis CI builds were/are having severe problems (network lag/performance), which has led to quite a few more restarts from our side (read: me triggering them).

We're creating a verified account here:

fxapom/fxapom/fxapom.py

Lines 54 to 60 in e93fd72

self.session = self.client.create_account(self.account.email, self.password)
print('fxapom created an account for email: %s at %s on %s' % (
self.account.email, self.url, datetime.now()))
m = self.account.wait_for_email(lambda m: "x-verify-code" in m["headers"])
if not m:
raise RuntimeError("Verification email was not received")
self.session.verify_email_code(m["headers"]["x-verify-code"])

However it appears when we're attempting to sign in using this account, we're not verified. Are there any recently introduced rules related to authorising signing in, beyond the initial account creation @rfk?

@davehunt @stephendonner could be sign-in confirmation or rate-limit.

Also related pro-tip, if you are driving through web ui tweak your user agent to include FxATester/1.0 in your user agent string.

Example: var UA_OVERRIDE = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:40.0) Gecko/20100101 Firefox/60.0 FxATester/1.0';

In this case your profile will not be subject to A/B testing and thus the login page is predictable.

rfk commented

In the build log, does this indicate the URL on which the test failed out?

URL: https://stable.dev.lcip.org/signin_token_code?client_id=dcdb5ae7add825d2&state=096ea96eda5ca3bd7373ca3f039d181fcc1323b8830d38a4bca1d558f0c175e2&scope=profile%20kv%3Aread%20kv%3Awrite&redirect_uri=https%3A%2F%2F123done-stable.dev.lcip.org%2Fapi%2Foauth

If so, then you've pretty definitely been opted in to the "signin token codes" A/B test. I'm not sure what the current experiment options are on stable.dev, but it sounds like using @vladikoff's tip about might help out here.

@rfk thanks, yes:

URL: https://stable.dev.lcip.org/signin_token_code?client_id=dcdb5ae7add825d2&state=ecc6f4758d6e7b73c4375a170f35666ddf5a4c7f6868f67d09df6052ba2499a2&scope=profile%20kv%3Aread%20kv%3Awrite&redirect_uri=https%3A%2F%2F123done-stable.dev.lcip.org%2Fapi%2Foauth

I'll look into the user-agent override, and will likely need @davehunt and/or @jrbenny35's help, here

I'm not sure we can (or want to) override the user agent. Is there a pref we can set instead? Can you share more information about this experiment? If it's rolled out, we'd need to find a solution anyway.

rfk commented

Can you share more information about this experiment?

The idea of this experiment is, rather than going into their email and clicking a link which opens a new browser window, the user will go into their email to find an authorization code, and copy/paste or type that code into the existing browser window where they're performing the login. The hypothesis is that this will help users stay engaged with the login flow rather than getting lost in additional open browser windows.

We mostly intend to use it for OAuth reliers rather than for sync, but sync was the best place to do initial tests for it.

IIRC the experiment is being shut down in the upcoming FxA release, because we've already learned what we needed to know. That should hit production (and stable.dev) sometime next week.

If you wanted to work around this, you could try to detect the "Authorize this sign-in" screen and if found, dig the code out of the siginin confirmation email. But it may not be worth your time if the experiment shuts down soon anyway.

Longer-term, I think we should find a way to opt you out of these experiments. If customizing the UA doesn't work, maybe we can do this via query parameter or similar. @vladikoff thoughts?

rfk commented

IIRC the experiment is being shut down in the upcoming FxA release

It's actually already shut down in prod, but stable.dev is a little behind.

Longer-term, I think we should find a way to opt you out of these experiments.

There's navigator.webdriver in the WebDriver spec. I think this would be a reasonable way to opt out of experiments.

rfk commented

Thanks, I filed mozilla/fxa-content-server#6026 to look into that on our side.

rfk commented

It's actually already shut down in prod, but stable.dev is a little behind.

Just a quick update here, we attempted to update stable.dev to the latests release code, but it...didn't go well. It seems we'll need to wait for :jrgm to return, as he's the one that usually takes on these updates and may know how to unstick it without breaking anything. Sorry for delay here :-(

No worries on my side, @rfk - Looking forward to his eventual return!

jrgm commented

@stephendonner - I've updated stable.dev.lcip.org. Let me know how it looks to you. There's some automated tests that have failed, but I haven't yet looked at why they've failed. May be missing a new setting or something.

@jrgm thanks, but we're still seeing failures in this particular test, with the URL bar stuck on /api/login:

https://saucelabs.com/beta/tests/2b06aad0e4984b808912d5ed52f56f39/watch#37

rfk commented

Hrm, I think this is a fresh issue with 123done-stable, taking a look now....

rfk commented

PR for config error on 123done: mozilla/fxa-dev#354

rfk commented

123done seems to be back up and running in stable, has this issue been resolve from your side?

rfk commented

IIRC the experiment is being shut down in the upcoming FxA release

sigh IIUC we shut down the experiment for sync users, but it's still enabled for 123Done users, which based on #206 (comment) is what you're using here. I filed this to see about disabling it there as well:

mozilla/fxa-content-server#6102

But it'll probably be quicker to wait for train-111 to hit production (and stable.dev) next week, which brings with it the fix to opt webdriver user-agents out of all experiments by default.

rfk commented

In case it wasn't clear, that "sigh" was just directed at the number of moving parts we have involved here, not directed at you @stephendonner :-)

Thanks as always, @rfk, and no worries; we appreciate the help, and I know y'all are busy!

rfk commented

The code to detect navigator.webdriver is now in production and on stable.dev, however it looks like this test is still failing. Running it locally, it sometimes succeeds, sometimes fails on /signin_token_code, and sometimes fails on /confirm_signin, which is good indication that it's still going through the experiment selection logic.

How confident are we that the test run is actually setting navigator.webdriver? I tried to confirm it but I'm not familiar enough with the test setup here to know for sure.

@rfk - just checked with @davehunt on this; we're currently still on Firefox 59.0 (

ENV FIREFOX_VERSION=59.0
) and navigator.webdriver won't be exposed in a release build until Firefox 60, per https://bugzilla.mozilla.org/show_bug.cgi?id=1169290

I landed #234 today to update our Firefox version in Docker from 59 -> 60 (exposing navigator.webdriver) which very well should, and seems to have, helped tremendously with this suite's reliability.

Closing this out; thanks, everyone!