bcgov/MFIN-Data-Catalogue

Need "Please log in" instead of "Page not found"

Closed this issue · 19 comments

I correctly received the automated email reminding me that my metadata record needs to be reviewed.

Review needed:
Test Dan Metadata – data!
http://test.cat.data.fin.gov.bc.ca/test-dan-metadata-data

But when I click the link to my record, I see this:

image

This metadata record is not missing and the URL is correct. The only problem is that I'm not logged in yet.

The correct behaviour would be to simply take the user directly to the IDIR login page, with the redirect after that taking you to your metadata record. Or at the very least, to show an "Access forbidden" error, rather than "page not found".

@NicoledeGreef, does this make sense to you? Also, can you prioritize this? I'll create tickets for things I think of that could be improved, but they aren't necessarily always going to be something we need to do.

Is it possible in such cases to indicate to the user something along the lines of:

"The page you are trying to reach requires a user to log in.

If you have logged in and still cannot reach the page you may not have adequate permission to view the content. Please contact the Catalogue administration team to investigate if needed."

For security it is normal to redirect 403 (forbidden) to 404 (not found) so you don't give away to hackers that there is indeed a protected page at that url.

You could consider splitting up the behaviour redirect 403 to /user/login and leave 404 as page not found.

that way, nobody knows there is content at https://example.com/my-secret-stuff, they end up on /user

If you search the closed issues, this has been brought up before

Thanks for the reminder, @CraigClark - I recall we had discussed this item but didn't search up the closed tickets.

Looks like related tickets were:
#203
#472

I am not going to prioritize this until the recent user experience testing findings are compiled. I feel like we might have received feedback.

If we should decide to action, Forbidden 403 --> /user/login sounds reasonable.

/user/login is quite plain, however.
image

In the case where we redirect there I wonder if we can inject some some instructions to make the user experience a little smoother such as:

"The page you were trying to reach requires a user to log in."

The issue here is that I'm not just randomly trying to reach a page that may or may not exist -- I'm clicking a link in the email that said I need to review my record.

Perhaps rather than a simple link to the metadata-record, it should be a link with a hash-code in it (sufficiently long that hackers wouldn't bother trying to guess). When arriving at the site after having clicked this link, the site knows that it is someone (presumably me) clicking the link in the email, and either sends me to my metadata-record to edit if I'm already logged in, or otherwise to the login page first with a redirect taking me to my metadata-record to edit after that. If the hashcode is incorrect, it can still be the regular 404 error, to avoid hackers discovering where there are protected pages.

e.g. http://test.cat.data.fin.gov.bc.ca/test-dan-metadata-data?code=abcd1234fe2417

It's kinda like the one-time login code, except that it doesn't actually log you in, but just gives you the right to visit the login page rather than getting the 404.

... Oops, and I just read #472, which I see is a very similar issue. @NicoledeGreef, would you want me to copy/move this comment over there?

As per idea from @chrislaick, we'll change this to:

I have this close to working on my localhost. The links in the email are poorly formatted:

image

I thought is was broken HTML, but I think now it's that I need to change the message template to Markdown, as per https://notification.canada.ca/formatting-emails

The emails used to look like this:
image

The emails look like this now (and I have added a second item to the list, to make sure multiples format nicely):
image

Containing links that look like this:
https://dev.cat.data.fin.gov.bc.ca/user/login?destination=/June25-Test

This is working, and is in this PR: #517

@NicoledeGreef, I think one more change I can make here is to change the last part of that URL to not take you to "/June25-test", but rather to "/node/90". This is the more canonical form of the link, and will protect against the scenario where the link would cease to function if someone changed the name of the data-set. You can rename it as many times as you want, but links to its node number will always still take you to that data-set.

... I just did that, and pushed it into the PR.

I've tested this on Dev, and it's working. Pushed to Test, too, and I'll leave it to @NicoledeGreef to test it out there.

@NicoledeGreef, I'm putting this ticket back to "In Development", because I just noticed there is another email that needs to receive the same link-modifying treatment. The update-notification needs to look more like the review-notice emails:
image

It now looks more like this:
image
with the embedded link going to https://dev.cat.data.fin.gov.bc.ca/user/login?destination=/node/90

The change is in this new PR: #518

This is on Dev and Test now, @NicoledeGreef. Both types of emails (review-reminders and update-notifications) should have the new formatting and the new routing via the user-login page.

In Test, I triggered an updated metadata record email to myself. I notice that:

  • the title and the link is separate.
  • the subject line should be "An asset" as opposed to "A asset"
  • the link is not the redirect link to get around the login, e.g. https://test.cat.data.fin.gov.bc.ca/mar-1

image

We have separate ticket to address the aesthetics of the emails in more depth. #516

Over the weekend I received email from the Dev and Test environments re: records that need review or are overdue.

From Dev:

image
Confirmed that the links are the redirect links that will prompt for login, e.g.:
http://dev.cat.data.fin.gov.bc.ca/user/login?destination=/node/37

  • should it be HTTPS? it seems to redirect to HTTPS despite it being HTTP

From Test:

image

The URLs are not redirects so I think the changes are not yet in Test.

Hi @NicoledeGreef,

Good catch on the http/https The site is https, so the links really should be too. I'll see if I can see why that is getting encoded wrong.

And the latest code is NOT on test, even though I deployed it there. Lesson learned: You have to check that the deployment has completed successfully.

I'm not yet totally sure what went wrong. In Openshift, there are quota errors, so likely something about that, but I haven't found the root problem yet.

Seems that my build/deploy to Dev had not worked and somehow I had not noticed. This is why the deploy to Test was failing yesterday. I rebuilt it on Dev, and redeployed to Test. The formatting issue should now be resolved.

In the test I just did, the http/https problem was also resolved -- although I didn't do anything to resolve it. The one thing that did change is that now it's running the latest code, with my reformatting of the links that are sent. However, I just looked back at the old code - pre modifications - and I can't see why it would have been sending http links then, either. In both cases, we are using Drupal's built-in link building functions, and asking for them to be "absolute". Perhaps something in the pod config had changed temporarily. Let's keep an eye on the http/https part of the links.

But for now, I propose that this ticket is done. I'll put it back in "Ready for QA", to see if you agree, @NicoledeGreef.

Interesting - emails sent to me on Sunday the 14th of July still show http. For example:

Image

I can see the change in Test now:

Image

I wonder if it has something to do with the "protection links" that Outlook adds:

https://can01.safelinks.protection.outlook.com/?url=http%3A%2F%2Ftest.cat.data.fin.gov.bc.ca...

I think I figured it out. It's because these emails are sent out via a cron-run, and not a browser request: Browser requests coem in through our proxy-layers (varnish, redis), while cron-runs originate in Drupal itself.

This code was added by Will to pass the 'HTTPS' variable passed from the proxy layers through to Drupal, in settings.php:

    $_SERVER['HTTPS'] = $_SERVER['HTTP_X_FORWARDED_PROTO'] === 'https' ? 'on' : 'off';

It expects the 'HTTP_X_FORWARDED...' values to be set (by the proxy), and depends on it to set the corresponding 'HTTPS' value for Drupal. And it currently does not allow for the possibility that the 'FORWARDED' value might not be set. If it is not set, that logic will set HTTPS to 'off' -- which would explain what we're seeing in these emails.

I also found here that:

"because of the Load Balancer, which handles the SSL encryption/decryption the Web Server doesn't get $_SERVER["HTTPS"], but $_SERVER["HTTP_USESSL"] is set and can be used as a flash for SSL traffic

I'll need to do some testing to see exactly what server environment variables are set when cron is running, and which one to use to determine if it's HTTP or HTTPS. (I could hard-code it to https, since all of our sites have that now, but that is an ugly, fragile way to do it.)

@NicoledeGreef, should we be going ahead and doing this? If you like, we can later investigate and resolve this http issue.

The other changes in this ticket were already deployed to Prod, along with some other fixes, last week by Chris.

@danhgov we can close this one and create a new ticket for the HTTP/HTTPS issue. Thanks for the sleuthing!