lucaswerkmeister/m3api

OAuth support

Closed this issue · 11 comments

Currently, both backends theoretically support logging in and making authenticated edits, using regular cookie-based sessions. (Though for the browser backend, you’ll be subject to CORS restrictions, so the usefulness of this is fairly limited.) However, for the Node.js backend to be more useful, we should really figure out OAuth support. This will allow m3api to be used for tool backends.

We can probably limit this to OAuth 2.0, which seems to be much simpler to use – after you’ve finished the authorization, you get an access token that you just include as a constant header in all your requests: Authorization: Bearer [hex]
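
To make the "constant header" idea concrete, here is a minimal sketch. `withBearerToken` is a hypothetical helper (not part of m3api), and the token is a placeholder rather than a real access token.

```javascript
// Hypothetical helper (not part of m3api): add the OAuth 2.0 bearer token
// to a set of request headers without mutating the original object.
function withBearerToken( headers, accessToken ) {
	return {
		...headers,
		authorization: `Bearer ${accessToken}`,
	};
}

// Placeholder token; a real one comes out of the authorization flow.
const headers = withBearerToken( { 'user-agent': 'example-tool' }, 'PLACEHOLDER' );
console.log( headers.authorization ); // Bearer PLACEHOLDER
```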

(Though it’s not clear to me how tools are expected to use OAuth2. In my OAuth1a Python tools, I put the access token inside the session, which is user-readable, but that’s fine because it’s useless without the consumer token. In OAuth2, I assume tools shouldn’t put the access token into a user-readable session, because then the user can impersonate the tool to perform actions the tool isn’t expected to do.)

Adding an Authorization header will likely require changes to the internal interface, which means #13 is relevant to this issue.

So the good news is that OAuth 2.0 is pretty easy to do. At least for a confidential client (i.e. one that can keep its client secret secret – one with a backend, that is), you don’t need to do any cryptographical stuff yourself – you just make some requests with parameters that get copied around.

The bad news is that MediaWiki implements OAuth 2.0 via the REST API. So this will definitely require a breaking change to the internal interface (cc #13 again), because the m3api-oauth extension package will need to be able to make requests to rest.php instead of api.php. (I think it’s safe to infer that URL from the API URL, though.)

Also, it seems the bearer token is only valid for 4 hours. The rest.php/oauth2/access_token response also includes a refresh_token that can apparently be used to get a new access token (haven’t tried that yet, though), but that would mean that m3api-oauth would somehow have to hook into m3api’s retry mechanism to automatically refresh the access token and then retry the request on authentication errors.
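
For reference, a refresh request would presumably look something like the following sketch. It follows the generic OAuth 2.0 refresh grant (RFC 6749, section 6) and is untested against MediaWiki; `buildRefreshBody` is a hypothetical name, and sending the client credentials in the body is an assumption.

```javascript
// Hypothetical helper: build the form body for a token refresh request,
// to be POSTed to rest.php/oauth2/access_token.
// Parameter names are the standard OAuth 2.0 ones, not verified against MediaWiki.
function buildRefreshBody( refreshToken, clientId, clientSecret ) {
	return new URLSearchParams( {
		grant_type: 'refresh_token',
		refresh_token: refreshToken,
		client_id: clientId,
		client_secret: clientSecret,
	} );
}

// Placeholder values only.
const refreshBody = buildRefreshBody( 'OLD_REFRESH_TOKEN', 'CLIENT_ID', 'CLIENT_SECRET' );
console.log( refreshBody.toString() );
```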

And I haven’t tried out how a non-confidential client works – that sounds pretty cool (you could probably do browser-only applications that make edits?), but requires something called a PKCE code challenge. (I got that link from mw:OAuth/For Developers – so far I haven’t found it tremendously useful tbqh.) Almost certainly that’s some cryptographical stuff (i.e. beyond copy+pasting things between requests).


I think for a first release of m3api-oauth (accompanying m3api 0.8.0), we can leave out refresh tokens and non-confidential clients. Let’s solve the easier case first and then do follow-up releases for more features.

Okay, that PKCE link is so confusing because it’s only a part of the documentation. Start here for the full chapter: https://www.oauth.com/oauth2-servers/pkce/

Also, it claims that PKCE is useful for confidential clients as well, so I guess once we support PKCE, we might as well use it all the time? But we can still go ahead without it for the initial release.

Also, while it’s recommended to use SHA2-256 for the PKCE stuff, clients who don’t have access to it are allowed to use plain text as well (and no other cryptographic primitives are needed). That’s good news for us, because crypto.subtle.digest() (usage) isn’t available in all the Node versions we support. We should probably use it if available, otherwise fall back to plain text. (Perhaps with a request option to opt out of plain mode being used, for paranoid people? Not sure.)
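
A rough sketch of what the authorize URL might look like once PKCE parameters are added. `code_challenge` and `code_challenge_method` are the parameter names from RFC 7636; `buildAuthorizeUrl` and the sample values are made up for illustration. In plain mode the challenge is simply the verifier itself.

```javascript
// Hypothetical helper: build the authorization URL including PKCE parameters.
// codeChallengeMethod is 'S256' (hashed) or 'plain' (challenge === verifier).
function buildAuthorizeUrl( restUrl, clientId, codeChallenge, codeChallengeMethod ) {
	const params = new URLSearchParams( {
		response_type: 'code',
		client_id: clientId,
		code_challenge: codeChallenge,
		code_challenge_method: codeChallengeMethod,
	} );
	return `${restUrl}/oauth2/authorize?${params}`;
}

// Plain mode: the verifier is sent as the challenge, unmodified.
// (Placeholder verifier; a real one should be random, 43 to 128 characters.)
const codeVerifier = 'placeholder-verifier-placeholder-verifier-0123';
const authorizeUrl = buildAuthorizeUrl(
	'https://test.wikipedia.org/w/rest.php',
	'CLIENT_ID',
	codeVerifier,
	'plain',
);
console.log( authorizeUrl );
```

The same verifier would later have to be sent along with the access token request (as `code_verifier`, per the RFC), so it needs to be stored somewhere between the two steps.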

Here’s what an error for an expired access token looks like, by the way:

{
  "code": "mwoauth-invalid-authorization",
  "info": "The authorization headers in your request are not valid: Invalid access token",
  "docref": "See https://test.wikipedia.org/w/api.php for API usage. Subscribe to the mediawiki-api-announce mailing list at <https://lists.wikimedia.org/postorius/lists/mediawiki-api-announce.lists.wikimedia.org/> for notice of API deprecations and breaking changes."
}

(The code is especially important for retry, cf. #24.)
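
To illustrate how that error code could drive a refresh-and-retry, here is a minimal sketch. It does not use m3api's actual retry mechanism; `requestWithRefresh` and both callbacks are hypothetical, and it assumes that failed requests throw an error carrying the API error code in a `code` property.

```javascript
// Error code reported for an invalid or expired access token
// (taken from the error response above).
const INVALID_AUTH_CODE = 'mwoauth-invalid-authorization';

// Hypothetical wrapper: run makeRequest(); if it fails with the
// invalid-authorization code, refresh the token once and try again.
async function requestWithRefresh( makeRequest, refreshAccessToken ) {
	try {
		return await makeRequest();
	} catch ( error ) {
		if ( error.code !== INVALID_AUTH_CODE ) {
			throw error; // unrelated error, not ours to handle
		}
		await refreshAccessToken();
		return await makeRequest(); // a second failure propagates to the caller
	}
}

// Usage with stand-in functions instead of real network requests:
let currentToken = 'expired';
const fakeRequest = async () => {
	if ( currentToken === 'expired' ) {
		throw Object.assign( new Error( 'Invalid access token' ), { code: INVALID_AUTH_CODE } );
	}
	return { user: 'Example' };
};
const fakeRefresh = async () => {
	currentToken = 'fresh';
};
requestWithRefresh( fakeRequest, fakeRefresh )
	.then( ( result ) => console.log( result ) );
```

Retrying only once keeps a permanently invalid token from turning into an endless loop.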

Draft for an oauth.js that I’ll probably massage into a proper extension package on the train tomorrow:

import { DEFAULT_OPTIONS } from './core.js';

const secretTokenSymbol = Symbol('OAuthClient.secretToken');

class OAuthClient {

	constructor( consumerToken, secretToken ) {
		this.consumerToken = consumerToken;
		Object.defineProperty( this, secretTokenSymbol, {
			value: secretToken,
		} );
	}

}

async function getAuthorizeUrl( session, options = {} ) {
	const { 'm3api-oauth/client': client } = {
		...DEFAULT_OPTIONS,
		...session.defaultOptions,
		...options,
	};
	const restUrl = session.apiUrl.replace( /api\.php$/, 'rest.php' );
	const clientId = client.consumerToken;
	// TODO PKCE
	return `${restUrl}/oauth2/authorize?response_type=code&client_id=${encodeURIComponent( clientId )}`;
}

async function handleCallback( session, callbackUrl, options = {} ) {
	const { 'm3api-oauth/client': client } = {
		...DEFAULT_OPTIONS,
		...session.defaultOptions,
		...options,
	};
	const restUrl = session.apiUrl.replace( /api\.php$/, 'rest.php' );
	const accessTokenUrl = `${restUrl}/oauth2/access_token`;
	// the authorization code arrives as a query parameter on the redirect URL
	const code = new URL( callbackUrl ).searchParams.get( 'code' );
	const { status, body } = await session.internalPost( accessTokenUrl, {}, {
		grant_type: 'authorization_code',
		code,
		callback_uri: callbackUrl,
		client_id: client.consumerToken,
		client_secret: client[ secretTokenSymbol ],
	}, { /* TODO user agent */ } );

	if ( status !== 200 ) {
		throw new Error( `OAuth request returned non-200 HTTP status code: ${status}` );
	}

	session.defaultOptions.authorization = `Bearer ${body.access_token}`;
}

export {
	OAuthClient,
	getAuthorizeUrl,
	handleCallback,
};

And how it’s used:

import * as readline from 'node:readline/promises';
import { stdin as input, stdout as output } from 'node:process';

import Session from './node.js';
import { OAuthClient, getAuthorizeUrl, handleCallback } from './oauth.js';

const session = new Session( 'test.wikipedia.org', {
	formatversion: 2,
}, {
	userAgent: 'lucas-oauth-test',
	'm3api-oauth/client': new OAuthClient( 'REDACTED', 'REDACTED' ),
} );

const rl = readline.createInterface( { input, output } );
const callbackUrl = await rl.question( 'Go to ' + await getAuthorizeUrl( session ) + ' and tell me where you got redirected: ' );
rl.close();
await handleCallback( session, callbackUrl );

console.log( await session.request( { action: 'query', meta: 'userinfo' } ) );

> … because crypto.subtle.digest() (usage) isn’t available in all the Node versions we support. We should probably use it if available, otherwise fall back to plain text.

Note that the usage example also uses TextEncoder, which in some browsers is more recent than crypto.subtle.digest() (digest MDN, TextEncoder MDN), so we need to feature-test both.
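
A feature test for both could look roughly like this. `canHashForPkce` is a hypothetical name; the global object is passed in as a parameter only so that the check can be tried against fake platforms.

```javascript
// Hypothetical helper: report whether hashed PKCE is possible on the given
// global object, by feature-testing TextEncoder and crypto.subtle.digest()
// separately.
function canHashForPkce( globalObject = globalThis ) {
	const { crypto, TextEncoder } = globalObject;
	return typeof TextEncoder === 'function' &&
		typeof crypto === 'object' && crypto !== null &&
		typeof crypto.subtle === 'object' && crypto.subtle !== null &&
		typeof crypto.subtle.digest === 'function';
}

console.log( canHashForPkce() ); // result depends on the platform
console.log( canHashForPkce( {} ) ); // false: nothing is available
```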

> Also, while it’s recommended to use SHA2-256 for the PKCE stuff, clients who don’t have access to it are allowed to use plain text as well (and no other cryptographic primitives are needed). That’s good news for us, because crypto.subtle.digest() (usage) isn’t available in all the Node versions we support. We should probably use it if available, otherwise fall back to plain text. (Perhaps with a request option to opt out of plain mode being used, for paranoid people? Not sure.)

lmaoooo

For hashed PKCE, we need:

  • crypto.getRandomValues() (generate random code verifier)
  • crypto.subtle.digest() (hash it)
  • btoa (Base64-encode it)

and of those three… btoa is the one with the worst Node support, only being supported since Node 16 (whereas the other two are supported since Node 15).

I guess it’s fine in the grand scheme of things – Node 15 is a non-LTS version, so by now, everyone should be either on Node ≤14 or ≥16, and so btoa and crypto should have the same Node support in the wild. (And Debian Bookworm is set to support Node 18, btw.) But it does mean that the name I envisioned for the “paranoid” option, requireCrypto, is arguably not appropriate, since crypto isn’t necessarily the defining feature (though it will be in browsers, which have supported btoa since always).

Or we just polyfill / roll our own Base64. (We need a modified version of it anyways, so we could roll those modifications directly into our Base64 implementation, I suppose.)
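
For comparison, a hand-rolled version with the modifications (URL-safe alphabet, no padding) built in could be as small as this sketch; `base64UrlEncodeBytes` is a hypothetical name and nothing here depends on btoa().

```javascript
// URL-safe Base64 alphabet: "-" and "_" instead of "+" and "/".
const BASE64URL_ALPHABET =
	'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_';

// Encode an array (or Uint8Array) of bytes as unpadded URL-safe Base64.
function base64UrlEncodeBytes( bytes ) {
	let result = '';
	for ( let i = 0; i < bytes.length; i += 3 ) {
		const remaining = bytes.length - i; // 1, 2, or at least 3
		// pack up to three bytes into one 24-bit number (missing bytes count as 0)
		const chunk = ( bytes[ i ] << 16 ) |
			( ( remaining > 1 ? bytes[ i + 1 ] : 0 ) << 8 ) |
			( remaining > 2 ? bytes[ i + 2 ] : 0 );
		// emit one character per 6 bits, skipping what would be padding
		result += BASE64URL_ALPHABET[ ( chunk >> 18 ) & 63 ];
		result += BASE64URL_ALPHABET[ ( chunk >> 12 ) & 63 ];
		if ( remaining > 1 ) {
			result += BASE64URL_ALPHABET[ ( chunk >> 6 ) & 63 ];
		}
		if ( remaining > 2 ) {
			result += BASE64URL_ALPHABET[ chunk & 63 ];
		}
	}
	return result;
}

console.log( base64UrlEncodeBytes( [ 77, 97, 110 ] ) ); // TWFu ("Man")
console.log( base64UrlEncodeBytes( [ 0xfb, 0xff ] ) ); // -_8 (regular Base64: +/8=)
```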

After sleeping on it: there’s absolutely no reason to support Node 15. Let’s call the option requireCrypto (or some other verb), and just throw an error if the platform has crypto but not btoa().
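
The resulting decision logic, sketched under the assumption that feature detection has already produced two booleans. `choosePkceMethod`, `hasCrypto` and `hasBtoa` are hypothetical names; `'S256'` and `'plain'` are the method names from RFC 7636.

```javascript
// Hypothetical decision helper: pick the PKCE code challenge method.
// requireCrypto is the "paranoid" option that forbids falling back to plain mode.
function choosePkceMethod( { hasCrypto, hasBtoa }, requireCrypto = false ) {
	if ( hasCrypto && !hasBtoa ) {
		// e.g. Node 15: not worth supporting, so fail loudly
		throw new Error( 'Platform has crypto but not btoa(), which is not supported' );
	}
	if ( hasCrypto ) {
		return 'S256';
	}
	if ( requireCrypto ) {
		throw new Error( 'requireCrypto is set, but the code challenge cannot be hashed on this platform' );
	}
	return 'plain';
}

console.log( choosePkceMethod( { hasCrypto: true, hasBtoa: true } ) ); // S256
console.log( choosePkceMethod( { hasCrypto: false, hasBtoa: false } ) ); // plain
```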

With the initial release of m3api-oauth2, I think we can close this task \o/