WordPress/wordpress-playground

Network access in the browser

adamziel opened this issue · 20 comments

Latest status

Curl and TCP over fetch() are now a part of WordPress Playground 🎉 Here's what we still need to close this issue:


Description

WordPress Playground has only partial support for network calls.

Types of network calls in WordPress

wp_safe_remote_get

As of #724, Playground is capable of translating wp_safe_remote_get calls into JavaScript fetch() requests. This has limitations:

  • Only https:// URLs are supported
  • The server must provide valid CORS headers in the response
  • Developers can't control all the headers
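The limitations above can be sketched as a small pre-flight check. This is an illustration only, not Playground's actual code: `prepareFetchRequest` and `PreparedRequest` are hypothetical names. It rejects non-https URLs and drops headers that the Fetch standard forbids scripts from setting, which is one reason developers can't control every header.

```typescript
// Hypothetical helper, not part of Playground's API.
// Request header names that scripts may not set through fetch(),
// based on the Fetch standard's forbidden request-header list.
const FORBIDDEN_HEADERS = new Set([
  'accept-charset', 'accept-encoding', 'access-control-request-headers',
  'access-control-request-method', 'connection', 'content-length', 'cookie',
  'date', 'dnt', 'expect', 'host', 'keep-alive', 'origin', 'referer',
  'set-cookie', 'te', 'trailer', 'transfer-encoding', 'upgrade', 'via',
]);

interface PreparedRequest {
  url: string;
  headers: Record<string, string>; // headers that fetch() would accept
  dropped: string[];               // headers the browser would ignore
}

function prepareFetchRequest(
  url: string,
  headers: Record<string, string>
): PreparedRequest {
  // Only https:// URLs are supported.
  if (new URL(url).protocol !== 'https:') {
    throw new Error(`Only https:// URLs are supported, got: ${url}`);
  }
  const kept: Record<string, string> = {};
  const dropped: string[] = [];
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    const forbidden =
      FORBIDDEN_HEADERS.has(lower) ||
      lower.startsWith('proxy-') ||
      lower.startsWith('sec-');
    if (forbidden) {
      dropped.push(name);
    } else {
      kept[name] = value;
    }
  }
  return { url, headers: kept, dropped };
}
```

The CORS requirement cannot be checked up front this way; it only becomes visible when the browser rejects the response.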

Arbitrary network calls

Other methods of accessing the network, such as libcurl or file_get_contents, are not supported yet.

Web browsers do not allow the WebAssembly code to access the internet directly yet. A native socket API may or may not be released in the future, but there isn't one for now. #1093 would improve the situation.

In Node.js, Playground accesses the network using the following method:

  1. Set up a same-domain API endpoint that accepts network commands from the browser
  2. Capture socket function calls in the WebAssembly binary
  3. Pass them to JavaScript
  4. Pass the requested operation over the API endpoint using fetch() or a WebSocket
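As a rough illustration of the last step, the sketch below builds the WebSocket URL that a captured connect() call might be forwarded to. The wire format (target host and port as `host` and `port` query parameters) and the name `buildProxyUrl` are assumptions made for this example, not a description of the actual proxy protocol.

```typescript
// Hypothetical wire format: the target of the intercepted connect() call
// travels to the proxy as query parameters on the WebSocket URL.
function buildProxyUrl(proxyBase: string, host: string, port: number): string {
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`Invalid TCP port: ${port}`);
  }
  const url = new URL(proxyBase);
  url.searchParams.set('host', host);
  url.searchParams.set('port', String(port));
  return url.toString();
}

// A connect() to api.wordpress.org:443 captured in the WebAssembly binary
// would then open a WebSocket to this URL instead of a raw TCP socket.
const proxyUrl = buildProxyUrl('ws://localhost:8080/', 'api.wordpress.org', 443);
```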

This may not be viable on the web: someone would have to pay for the hardware to run the proxy on, and the nature of the proxy means there are security risks related to accessing the local network.

Solution

After 1.5 years of exploring and discussing, this issue finally has a path forward:

  • Merge #1093
  • Merge #1273
  • Ship curl in the browser
  • This would enable requesting all CORS-enabled HTTPS endpoints via file_get_contents, curl_exec, etc., without going through wp_safe_remote_get().

For full networking support, we'd also need the following:

  • Expose the Node networking proxy as a separate, runnable script
  • Provide an API to connect it to the in-browser version of Playground
  • Document the workflow

Nice to haves:

  • Ship a version of the network proxy written as a PHP script to enable running a full-featured Playground build in the same environments as WordPress.
  • Provide a Dockerfile to set up the network proxy and a few buttons for quickly spinning up proxy nodes in the cloud, e.g. on Cloudflare, DigitalOcean, etc.

Limitations of the approach

Limitations without the network proxy:

  • Non-CORS URLs wouldn't work
  • Non-HTTPS traffic wouldn't work
  • gethostname and other low-level methods still wouldn't work
  • SSL certificate checks, like the ones done by Composer, wouldn't work

All of the above could be resolved by plugging in a network proxy.

Other Alternatives

For posterity: I tried a custom Request_Transport that tunneled all traffic through the browser's fetch() using the vrzno extension by @seanmorris, and that worked well except for sites that didn't allow cross-origin requests – which is most sites.

Interestingly, I remember that WordPress Plugin Directory did not work in this setup. However, @dd32 pointed out that it exposes the correct access-control headers:

curl -is 'https://api.wordpress.org/plugins/info/1.2/?action=query_plugins' | grep '^access-control'
access-control-allow-origin: *

So perhaps there is a way to support at least the api.wordpress.org requests with the browser's native fetch()? Let's revisit this idea.
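To make the header check above concrete, here is a simplified sketch of how a browser decides whether a response may be read by a page. `corsAllowsOrigin` is a hypothetical name, and the sketch assumes a request without credentials (with credentials, the `*` wildcard is not accepted).

```typescript
// Simplified CORS check for requests sent without credentials.
// allowOriginHeader is the access-control-allow-origin response header,
// or null when the server did not send one.
function corsAllowsOrigin(
  allowOriginHeader: string | null,
  pageOrigin: string
): boolean {
  if (allowOriginHeader === null) {
    return false; // no header: the response is blocked
  }
  const value = allowOriginHeader.trim();
  return value === '*' || value === pageOrigin;
}
```

With the `access-control-allow-origin: *` header shown above, this check passes for any page origin, including https://playground.wordpress.net.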

Networking is supported in the Node.js build as of #119 – PHP sends data through a WebSocket to a local TCP proxy that handles the required network calls.

I can think of three ways to implement in-browser support:

  • A server-side TCP proxy – the least handy of all, has terrible security implications.
  • An in-browser TCP proxy – could be implemented as a browser extension, although Google Chrome deprecated the socket.tcp API for extensions.
  • TCP to HTTP rewriting – The WebSocket class could be replaced with one that concatenates all the sent data and then reconstructs a fetch() call from it. Then, PHP could be compiled without OpenSSL support OR made to treat all https requests as http ones so that the WebSocket shim could read the raw data. The proxy itself could work as a same-tab fetch() or as a browser-extension fetch() to work around the CORS limitations. This wouldn't support arbitrary network traffic, but would perhaps be good enough for the most popular use cases.
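The third option can be sketched roughly as follows. This is an illustration under simplifying assumptions, not the eventual implementation: `rawHttpToFetchArgs` and `FetchArgs` are hypothetical names, the bytes are assumed to be unencrypted HTTP/1.1, and the whole request is assumed to be buffered already (no chunked bodies, no pipelining).

```typescript
interface FetchArgs {
  url: string;
  method: string;
  headers: Record<string, string>;
  body: Uint8Array | null;
}

// Join everything PHP wrote to the fake socket into one buffer.
function concatChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.length;
  }
  return out;
}

// Reconstruct fetch() arguments from raw HTTP/1.1 request bytes.
function rawHttpToFetchArgs(
  chunks: Uint8Array[],
  scheme: 'http' | 'https' = 'https'
): FetchArgs {
  const bytes = concatChunks(chunks);
  // Find the blank line (CR LF CR LF) that ends the header section.
  let headerEnd = -1;
  for (let i = 0; i + 3 < bytes.length; i++) {
    if (bytes[i] === 13 && bytes[i + 1] === 10 &&
        bytes[i + 2] === 13 && bytes[i + 3] === 10) {
      headerEnd = i;
      break;
    }
  }
  if (headerEnd === -1) {
    throw new Error('Incomplete HTTP request: header terminator not found');
  }
  const lines = new TextDecoder().decode(bytes.subarray(0, headerEnd)).split('\r\n');
  const requestLine = (lines[0] ?? '').split(' ');
  const method = requestLine[0];
  const path = requestLine[1];
  if (!method || !path) {
    throw new Error('Malformed HTTP request line');
  }
  const headers: Record<string, string> = {};
  for (const line of lines.slice(1)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue;
    headers[line.slice(0, colon).trim().toLowerCase()] = line.slice(colon + 1).trim();
  }
  const host = headers['host'];
  if (!host) {
    throw new Error('Missing Host header');
  }
  const body = bytes.subarray(headerEnd + 4);
  return {
    url: `${scheme}://${host}${path}`,
    method,
    headers,
    body: body.length > 0 ? body : null,
  };
}
```

The result could then be passed on as `fetch(args.url, { method: args.method, headers: args.headers, body: args.body })`, subject to the usual CORS and forbidden-header restrictions.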

Also linking to this related discussion.

Libraries like Composer require HTTPS and they verify the peer certificate by default: https://github.com/composer/composer/blob/11879ea737978fabb8127616e703e571ff71b184/src/Composer/Util/StreamContextFactory.php#L183-L197

As a workaround, networking in the browser could:

  • Give PHP a fake wildcard CA cert
  • Implement a fake endpoint for all HTTP requests that would feed PHP the fake certificate
  • Parse the incoming request and re-issue it using fetch()
  • Parse the response, encrypt using the fake certificate, feed it back to PHP

This will only work for endpoints exposing proper CORS headers, but it's a start.

Give PHP a fake wildcard CA cert

why not use a real chain of trust?
I'm very leery of building a system whose default is to strip away all security from TLS connections and present trust for everything.

particularly if we're trying to make it easy to instantly spool up systems with a blueprint, this could so easily lead to cross-site attacks: "Hey look at the plugin I wrote: [malware link]"

for what it's worth, the default Erlang net library sets verify_peer to false and it's a disaster because nobody remembers to activate it and supply proper certs.

maybe I'm misreading this, but I'd rather us avoid that mistake if it's what I think we're talking about

why not use a real chain of trust?

We do in Node.js. Browsers can't open raw TCP sockets so we need to re-issue the request using fetch(). The only way to do it is to MITM the PHP program to parse the encrypted request data.

Hosting a websocket proxy on, e.g., the free Cloudflare tier could solve this for now.

Hosting a websocket proxy

Possible candidates:


EDIT: Oh, I see there's already something like this implemented in @php-wasm/node, based on maximegris/node-websockify.

https://github.com/WordPress/wordpress-playground/blob/trunk/packages/php-wasm/node/src/lib/networking/outbound-ws-to-tcp-proxy.ts

Oh, I see there's already something like this implemented in @php-wasm/node, based on maximegris/node-websockify.

Yup, it is used in the @php-wasm/cli, VS Code extension, and wp-now. The same proxy would just work with the web version if it was hosted somewhere. The custom parts were added to support setsockopt().

I wonder what could be achieved, if so, by using the Cloudflare TCP Sockets and running WP Playground on Cloudflare Worker / WASM / NodeJS?

Just to add more context to @fritexvz's reply, running the playground on wordpress has been discussed here:
#69

#732 solves the bulk of the problem with issuing HTTP requests from WordPress. For full network support, we'll need to run a WebSockets proxy on the server.

Not urgent: what sort of use case would require the websocket support?

@aehlke libcurl support. Curl is used e.g. by the Friends plugin by @akirk and by Composer to download and validate the HTTPS certificate.

#1051 implements an HTTPS termination function. All PHP-initiated network traffic is intercepted by a "fake WebSocket" instance which then offers a self-signed HTTPS certificate and reads the raw HTTP traffic, rewrites it as a fetch() call, and streams the response back to PHP. Note this may only work for HTTP and HTTPS requests to URLs exposing valid CORS headers. It won't work for arbitrary sockets.
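The return leg of that flow, turning a fetch() response back into raw bytes for PHP, might look like the sketch below. It is not taken from #1051: `serializeHttpResponse` is a hypothetical name, the body is assumed to be fully buffered rather than streamed, and the TLS re-encryption step is left out.

```typescript
// Turn an already-received fetch() response into raw HTTP/1.1 bytes.
function serializeHttpResponse(
  status: number,
  statusText: string,
  headers: Record<string, string>,
  body: Uint8Array
): Uint8Array {
  const headerLines = [`HTTP/1.1 ${status} ${statusText}`];
  for (const [name, value] of Object.entries(headers)) {
    const lower = name.toLowerCase();
    // fetch() has already undone compression and chunking, so the original
    // framing headers no longer describe the bytes we are about to send.
    if (lower === 'content-length' ||
        lower === 'transfer-encoding' ||
        lower === 'content-encoding') {
      continue;
    }
    headerLines.push(`${name}: ${value}`);
  }
  headerLines.push(`content-length: ${body.length}`);
  const head = new TextEncoder().encode(headerLines.join('\r\n') + '\r\n\r\n');
  const out = new Uint8Array(head.length + body.length);
  out.set(head, 0);
  out.set(body, head.length);
  return out;
}
```

Recomputing content-length from the decoded body is the main design choice here; passing the server's original value through could make PHP wait for bytes that never arrive or cut the body short.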

That PR needs a lot of cleaning up, but the concept seems to be solid. It would unblock support for libcurl and stream wrappers like file_get_contents("https://...").

It took 1.5 years, but we now have a clear path to resolving this issue 🎉

This would enable requesting all CORS-enabled HTTPS endpoints.

For full networking support, we'd also need the following:

  • Expose the Node networking proxy as a separate, runnable script
  • Provide an API to connect it to the in-browser version of Playground
  • Document the workflow

The proxy wouldn't be hosted on Playground.wordpress.net as it would be a resource drain, but we could make spinning your own proxy instance easy enough.

Nice to haves:

  • Ship a version of the network proxy written as a PHP script to enable running a full-featured Playground build in the same environments as WordPress.
  • Provide a Dockerfile to set up the network proxy and a few buttons for quickly spinning up proxy nodes in the cloud, e.g. on Cloudflare, DigitalOcean, etc.

@adamziel would love to chat about this at WCUS Contributor Day if you'll be around?

Hey @jeffpaul! Unfortunately I won't be around at WCUS :( But let me loop in @dmsnell who I know will be there. Alternatively, we could connect on .org slack or zoom.

I merged this significant milestone earlier today:

Next up:

Curl has been available in web browsers since #1935. fetch() is used as the network transport, so the typical CORS limitations apply.

To solve, say, ~80% of the problem, we'd need to open up the CORS Proxy beyond talking to git. This is coming in the short to medium term.

To solve 100% of the problem, we'd need to tunnel the raw TCP traffic coming from Playground over a persistent WebSocket connection. In this scenario, we'd need a https://playground.wordpress.net/tcp-over-ws.php endpoint that would use stream_select to ingest data from Playground, pipe it to the network, and pipe the response bytes back to Playground. Definitely possible, especially with AsyncHttp\Client, but it's also non-trivial and I'm not sure what kind of appetite y'all have for such a feature. For now I'm taking a wild guess this is a very low priority project. If this is something that would help you, please comment on this issue and describe your use-case – if enough people come in, I'm happy to make it happen.

For now, here's what we need to close this issue: