This is the Cloudflare Workers proxy component of Gargantuan Takeout Rocket (GTR), a toolkit to quickly back up Google Takeout archives to Azure Storage at extremely high speeds.
This proxy is required as:
- Microsoft's Azure Storage is unable to download directly from the download URLs used in Google Takeout, due to a URL-escaping issue: Azure "helpfully" rewrites the escaping in Google's URLs and breaks them.
- To transfer fast, we tell Azure to fetch from Google in 600MB chunks simultaneously, at nearly 89 connections at a time for 50GB files, from the extension. Unfortunately, Chromium-based browsers have a limit of 6 connections per HTTP 1.1 host. Azure only supports HTTP 1.1, so only 6 chunks can be commanded to be copied simultaneously via the browser. In contrast, Azure's azcopy, the command line copier application, can command copies of far more than 6 chunks simultaneously as it is not bound by browser limits on connections.
Cloudflare Workers can be used to address these issues:
- By base64-encoding the offending URLs when passed to Azure, decoding the exact Google URLs required in the workers, and proxying the traffic through Cloudflare Workers, Azure's mangling of Google's URLs for its "server-to-server" download capabilities is circumvented. Cloudflare also charges nothing for ingress or egress, so the bandwidth for this proxying is pretty much free.
- Cloudflare Workers are accessed over HTTP/3 or HTTP/2, which multiplex requests over a single connection and aren't bound by the 6-connection limit in the browser. This can be used to convert Azure's HTTP 1.1 endpoint to HTTP/3 or HTTP/2, and the extension in the browser can command more chunks to be downloaded simultaneously through the proxy. Speeds of up to around 8.7GB/s can be achieved with this proxy from the browser versus 180MB/s with a direct connection to Azure's endpoint.
A public instance of this service is provided but you may want to run your own private instance of this proxy for privacy reasons. If so, here is the source.
A public instance is hosted at https://gtr-proxy.677472.xyz that anybody may use with GTR. The front page of https://gtr-proxy.677472.xyz just redirects to the GitHub repository for the proxy. The 677472.xyz domain (67=g, 74=t, and 72=r in hexadecimal ASCII) was chosen because it was $0.75 every year for numeric-only .xyz domains and I wanted the bandwidth metrics for my personal site separated from this service.
Logs are not stored on this service but I reserve the right to stream the logs temporarily to observe and curb abuse if necessary.
You may be interested in running your own private instance so it does not go through my public proxy.
Use this easy-to-use button:
Out of the box, you should be able to use your workers.dev domain.
Updates to this proxy may or may not be required in the future. If so, simply delete the old repository and old worker and redeploy.
The proxy should be usable within the free tier limits of Cloudflare Workers at a personal scale.
To use the proxy to download from the URL-encoding test server:
- Encode the URL you wish to download to base64. For our example, we'll encode "https://put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/red%2Fblue.txt". Because of the bug, the "%2F" in the URL would be silently transformed into a / by Azure if it weren't base64-encoded. The URL should be this in base64: aHR0cHM6Ly9wdXQtYmxvY2stZnJvbS11cmwtZXNjLWlzc3VlLWRlbW8tc2VydmVyLTN2bmdxdnZwb3EtdWMuYS5ydW4uYXBwL3JlZCUyRmJsdWUudHh0
- Append that to the proxy URL at https://gtr-proxy.677472.xyz/p/.
- Do a GET of https://gtr-proxy.677472.xyz/p/aHR0cHM6Ly9wdXQtYmxvY2stZnJvbS11cmwtZXNjLWlzc3VlLWRlbW8tc2VydmVyLTN2bmdxdnZwb3EtdWMuYS5ydW4uYXBwL3JlZCUyRmJsdWUudHh0 through a web browser or an application.
- You should see "This path exists!" from your download.
You can append a /<a file name here of your choice> to the end of the URL after the base64 URL to name the file a specific way for download clients that aren't aware of Content-Disposition's filename headers, such as azcopy.
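The steps above can be sketched in Python. This is a minimal sketch rather than the extension's actual code: the helper name `proxify_url` is hypothetical, and it assumes the proxy accepts standard base64 as produced by `base64.b64encode`. The example string contains neither `+` nor `/`, so it does not show whether a URL-safe alphabet is expected for other inputs.

```python
import base64

# Public instance from this README; swap in your own worker's URL if you run one.
PROXY_BASE = "https://gtr-proxy.677472.xyz"


def proxify_url(source_url: str, filename: str = "") -> str:
    """Build a /p/ proxy URL for source_url (hypothetical helper name)."""
    # Encode the exact URL text so sequences like %2F reach the proxy untouched.
    encoded = base64.b64encode(source_url.encode("utf-8")).decode("ascii")
    proxied = f"{PROXY_BASE}/p/{encoded}"
    # Optional trailing name for clients that ignore Content-Disposition.
    if filename:
        proxied += f"/{filename}"
    return proxied


test_url = (
    "https://put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app"
    "/red%2Fblue.txt"
)
print(proxify_url(test_url))
print(proxify_url(test_url, "blue.txt"))
```

The first printed URL should be the same one used in the GET step above; the second adds a file name of your choice after the base64 segment.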
- Get your original SAS URL from Azure and append a blob name to it in the path. For our example, we'll use this: https://urlcopytest.blob.core.windows.net/some-container/data.dat?sp=r&st=2022-04-02T18:23:20Z&se=2022-04-03T06:24:20Z&spr=https&sv=2020-08-04&sr=c&sig=KNz4a1xHnmfi7afzrnkBFtls52YIZ0xtzn1Y7udqXBw%3D
- The account name is urlcopytest. Construct a new proxified URL as such: https://gtr-proxy.677472.xyz/p-azb/urlcopytest/some-container/data.dat?sp=r&st=2022-04-02T18:23:20Z&se=2022-04-03T06:24:20Z&spr=https&sv=2020-08-04&sr=c&sig=KNz4a1xHnmfi7afzrnkBFtls52YIZ0xtzn1Y7udqXBw%3D
- Perform any PUT operations you wish through that URL, as they will go through the proxy. In the Network tab, you can observe that the proxy endpoint is served over HTTP/3 after the initial connection.
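The SAS URL rewrite above is mechanical, so it can be sketched in a few lines of Python. The helper name `proxify_azure_sas_url` is hypothetical, and the sketch assumes the account name is always the first label of a `<account>.blob.core.windows.net` host, as in the example.

```python
from urllib.parse import urlsplit

# Public instance from this README; swap in your own worker's URL if you run one.
PROXY_BASE = "https://gtr-proxy.677472.xyz"


def proxify_azure_sas_url(sas_url: str) -> str:
    """Rewrite an Azure blob SAS URL to go through /p-azb/ (hypothetical helper)."""
    parts = urlsplit(sas_url)
    # Assumption: the host looks like <account>.blob.core.windows.net.
    account = parts.netloc.split(".")[0]
    # parts.path already starts with "/"; parts.query is kept verbatim so the
    # SAS signature (including its %3D escape) is not re-encoded.
    return f"{PROXY_BASE}/p-azb/{account}{parts.path}?{parts.query}"


sas_url = (
    "https://urlcopytest.blob.core.windows.net/some-container/data.dat"
    "?sp=r&st=2022-04-02T18:23:20Z&se=2022-04-03T06:24:20Z&spr=https"
    "&sv=2020-08-04&sr=c&sig=KNz4a1xHnmfi7afzrnkBFtls52YIZ0xtzn1Y7udqXBw%3D"
)
print(proxify_azure_sas_url(sas_url))
```

Keeping the query string untouched matters here: decoding and re-encoding it could change the escaping of the signature.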
A real Google Takeout URL would look like this:
The proxified URL would be:
This example Takeout URL has long expired, so you would see Locked Domain Expired: Not valid after 2021-11-13T13:44:21.231-08:00 when visiting Google's URL above. But now you can see it through the GTR proxy in full fidelity too!
For anti-abuse reasons, the service is limited to test servers for the aforementioned pathing issue and to Google Takeout download URLs, as unrestricted open proxies on the internet may be abused.
- One of the following must be true:
  - The URL is a test URL from *-3vngqvvpoq-uc.a.run.app, which can respond with paths that can cause issues for Azure direct downloads. The source for this can be found at: https://github.com/nelsonjchen/put-block-from-url-esc-issue-demo-server/blob/master/main.go
  - The URL is from one of a few select Linux ISO test mirrors. They are useful for testing large-ish file downloads with some heft.
    - mirrors.advancedhosters.com - They seem to have resources to spare. Known to work. Can max out 500Mbps connections at least.
    - *releases.ubuntu.com* - Known to really work. But they aren't as fast and are only in the UK. Only included here as a historical interest for an early version of this proxy.
  - The URL must be a valid Google Takeout download URL. Regions may have different data policies. Please create an issue if your region is unsupported.
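As a rough illustration of the test-server part of this allow-list, here is a Python sketch. It is not the worker's actual code: the function name is hypothetical, the asterisks in the list above are assumed to be wildcards, and the Google Takeout check is left out because its URL pattern is not given here.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit

# Host patterns copied from the list above; treating "*" as a wildcard
# is an assumption about how the list is meant to be read.
TEST_HOST_PATTERNS = [
    "*-3vngqvvpoq-uc.a.run.app",
    "mirrors.advancedhosters.com",
    "*releases.ubuntu.com*",
]


def is_allowed_test_host(url: str) -> bool:
    """Return True if the URL's host matches a test-server pattern (hypothetical helper)."""
    host = (urlsplit(url).hostname or "").lower()
    return any(fnmatchcase(host, pattern) for pattern in TEST_HOST_PATTERNS)


print(is_allowed_test_host(
    "https://put-block-from-url-esc-issue-demo-server-3vngqvvpoq-uc.a.run.app/red%2Fblue.txt"
))
print(is_allowed_test_host("https://example.com/some-file.iso"))
```

Note that a trailing wildcard, read literally, would also match hosts that merely begin with the listed name, so a real deployment would likely want a stricter check than this sketch.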
This tool is implemented to run on Cloudflare Workers because:
- Cloudflare does not charge for incoming or outgoing data. No egress or ingress charges.
- Cloudflare does not charge for memory used once the request has finished processing, the response headers are sent, and the worker is just shoveling bytes between two sockets.
- Cloudflare has the peering, compute, and scalability to handle the massive transfer from Google Takeout to Azure Storage. Many of its peering points are peered with Azure and Google with high capacity links.
- Cloudflare Workers are serverless.
- Cloudflare Worker endpoints are HTTP/3 compatible and can comfortably connect to HTTP 1.1 endpoints.
- Cloudflare Workers are globally deployed. If you transfer from Google in the EU to Azure in the EU, the worker proxy is also in the EU and your data stays in the EU for the whole time. Same for Australia, US, and so on.
I am not aware of any other provider with the same characteristics as Cloudflare.