QIN2DIM/hcaptcha-challenger

URL 'https://api.hcaptcha.com/getcaptcha/' returning base64 instead of JSON

allerallegro opened this issue · 10 comments

The response from this URL looks like base64 instead of JSON, which raises an error in the control.py file, line 144.

    async def handler(self, response: Response):
        if response.url.startswith("https://api.hcaptcha.com/getcaptcha/"):
            try:
                data = await response.json()
Exception has occurred: UnicodeDecodeError
'utf-8' codec can't decode byte 0xf7 in position 3: invalid start byte
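
One way to keep the handler from crashing on such a body is to read the raw bytes and attempt the JSON parse separately. This is only a minimal sketch: `parse_getcaptcha_body` is a hypothetical helper (not part of hcaptcha-challenger), and it assumes the bytes were obtained with Playwright's `await response.body()`.

```python
import json
from typing import Any, Optional


def parse_getcaptcha_body(body: bytes) -> Optional[Any]:
    """Return the decoded JSON payload, or None if the body is not UTF-8 JSON."""
    try:
        return json.loads(body.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary (octet-stream) payloads end up here instead of raising.
        return None
```

Inside the handler this could be called as `data = parse_getcaptcha_body(await response.body())`, skipping the response when the result is `None`. It does not decode the binary payload; it only avoids the exception.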

It doesn't seem to be base64, but another encryption method. Or can you provide me with a demo to decrypt the data?

https://accounts.hcaptcha.com/demo?sitekey=c86d730b-300a-444c-a8c5-5312e7a93628
On this page, clicking the checkbox triggers a request (https://api.hcaptcha.com/getcaptcha/c86d730b-300a-444c-a8c5-5312e7a93628) whose response is returned as an octet-stream.
data.txt
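
To tell the two kinds of response apart before parsing, the Content-Type header can be inspected. A rough sketch, assuming the JSON variant is served as `application/json` (not confirmed in this thread); `is_json_response` is a hypothetical helper that takes a plain header mapping such as Playwright's `response.headers`.

```python
from typing import Mapping


def is_json_response(headers: Mapping[str, str]) -> bool:
    """True when the Content-Type header declares a JSON body."""
    content_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lowercase.
        if name.lower() == "content-type":
            content_type = value
            break
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type == "application/json"
```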

I'm having the same problem. It also happens when you refresh the challenge, but it returns decoded data when you click the checkbox again.

@william9x interesting. I've been working on LLM Agent application stuff for the last couple months.

I haven't allocated too much effort to open source projects, I'll take a look in a couple days.

I plan to introduce YOLOv9 and an LLM to handle multimodal challenges, with the aim of beating the game.

Simple

const response = new ArrayBuffer(); // placeholder for the raw response body
const responseText = new TextDecoder().decode(response);
const data = JSON.parse(responseText);

Hello, it seems that the code you provided cannot solve the problem, as it still produces garbled output. Could you please provide a complete demo?

I will, when I have time (probably this week or the next).
Update: this seems to be affecting the collector too, https://github.com/QIN2DIM/hcaptcha-challenger/actions/runs/8318149616/job/22759735996

I managed to work around the octet-stream issue by removing application/octet-stream from the Accept header of requests to https://api.hcaptcha.com/getcaptcha/.

I did this with Playwright:

page = await context.new_page()

async def handle_route(route, request):
    """Strip application/octet-stream from the Accept header before forwarding."""
    headers = request.headers.copy()
    accept_header = headers.get('accept', '')
    if 'application/octet-stream' in accept_header:
        # Keep every media type except an exact application/octet-stream entry
        new_accept_header = ','.join(
            part for part in accept_header.split(',') if part.strip() != 'application/octet-stream'
        )
        headers['accept'] = new_accept_header
        await route.continue_(headers=headers)
    else:
        await route.continue_()

await page.route('https://api.hcaptcha.com/getcaptcha/*', handle_route)

#await stealth_sync(page)
agent = prelude(page)
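
The filter above only drops an entry that is exactly `application/octet-stream`, so an entry carrying a parameter (for example `application/octet-stream;q=0.9`) would be kept. Whether hCaptcha's client ever sends such a form is not established here; the following is just a sketch of a parameter-tolerant variant, with `strip_media_type` being a hypothetical helper name.

```python
def strip_media_type(accept: str, unwanted: str = "application/octet-stream") -> str:
    """Remove `unwanted` from an Accept header value, ignoring ;q= style parameters."""
    kept = []
    for part in accept.split(","):
        # Compare only the media type, not parameters such as ";q=0.9".
        media_type = part.split(";", 1)[0].strip().lower()
        if media_type and media_type != unwanted:
            kept.append(part.strip())
    return ", ".join(kept)
```

In the route handler this could replace the list comprehension, e.g. `headers['accept'] = strip_media_type(accept_header)`.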

The agent has still failed on every task attempted so far, but that is a separate issue.