flathunters/flathunter

Issue with captcha/bot detection

4symmetry19 opened this issue · 7 comments

Hi,

First of all, thanks for this great project!
I'm running this on a Mac in the local shell using Python 3.11.
I configured everything for IS24, including 2captcha and the Telegram bot.

When I run flathunter.py, I get results the first time, but when it tries again 10 minutes later it is apparently detected as a bot.
Note: I turned off "headless" because that wasn't working at all; with it off, I at least get the first batch of results.

This is the output I get on the second run (verbose mode):
[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:22:59|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:22:59|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/de/[confidential but seems normal]
[2023/01/21 13:23:09|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:09|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:09|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
[2023/01/21 13:23:09|crawl_immobilienscout.py|DEBUG ]: Got search URL https://www.immobilienscout24.de/Suche/radius/wohnung-mieten?[confidential but seems normal]
[2023/01/21 13:23:20|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?
[2023/01/21 13:23:20|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window
[2023/01/21 13:23:20|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked
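The log repeats the same sequence for every search: the captcha iframe times out, the IS24 window variable is missing, and the crawler reports that it has been blocked. When comparing runs (headless vs. not, local vs. cloud), a small parser can count how often that happens. This is a minimal sketch that assumes only the "[date time|module|LEVEL ]: message" layout visible in this output; parse_log and count_blocks are hypothetical names, not part of flathunter.

```python
import re
from typing import NamedTuple

# One record looks like:
# [2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection ...
# The lookahead ends a message at the next "[YYYY/MM/DD " or at end of input,
# so records can be on separate lines or run together on one line.
LOG_RE = re.compile(
    r"\[(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\|"
    r"(?P<module>[^|]+?)\s*\|"
    r"(?P<level>[A-Z]+)\s*\]:\s*"
    r"(?P<msg>.*?)(?=\s*\[\d{4}/\d{2}/\d{2} |\s*\Z)"
)


class LogRecord(NamedTuple):
    timestamp: str
    module: str
    level: str
    message: str


def parse_log(text: str) -> list[LogRecord]:
    """Split raw log output in the format above into structured records."""
    return [
        LogRecord(m["ts"], m["module"], m["level"], m["msg"])
        for m in LOG_RE.finditer(text)
    ]


def count_blocks(records: list[LogRecord]) -> int:
    """Count ERROR records that report the IS24 bot-detection block."""
    return sum(
        1 for r in records
        if r.level == "ERROR" and "bot detection" in r.message
    )


if __name__ == "__main__":
    sample = (
        "[2023/01/21 13:22:59|abstract_crawler.py |INFO ]: Timeout waiting for iframe element - no captcha verification necessary?\n"
        "[2023/01/21 13:22:59|crawl_immobilienscout.py|WARNING ]: Unable to find IS24 variable in window\n"
        "[2023/01/21 13:22:59|crawl_immobilienscout.py|ERROR ]: IS24 bot detection has identified our script as a bot - we've been blocked\n"
    )
    records = parse_log(sample)
    print(len(records), "records,", count_blocks(records), "block(s)")
```

Because the message boundary is found by lookahead, the same function also handles the original single-line paste where all records ran together.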

Another thing that stands out to me: according to 2captcha.com, I've only used 1 captcha so far. For a long time the usage count was even at 0, despite my getting that first batch of results. The API key is correct, though.

Any help would be appreciated!

Cheers,
asymmetry

Hey there,

You'll probably need to provide a few more arguments to the Chrome driver. From your output, it looks like you're hitting the bot detection. Try:

captcha:
  2captcha:
    api_key: 0...00
  driver_arguments:
    - "--no-sandbox"
    - "--headless"
    - "--disable-gpu"
    - "--remote-debugging-port=9222"
    - "--disable-dev-shm-usage"
    - "--window-size=1024,768"
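Each entry under driver_arguments is presumably handed to Chrome as a command-line switch, so a malformed entry is silently ineffective. The sketch below is a hypothetical helper, not flathunter's own code: it assumes the YAML has already been loaded into a dict, adds a "--" prefix to any bare entry (for example window-size=1024,768), removes duplicates, and can optionally drop --headless, since some reporters in this thread only got results with headless mode turned off.

```python
from typing import Iterable

# Mirrors the YAML above as a Python dict (assumption: the config file has
# already been parsed, e.g. with a YAML library).
config = {
    "captcha": {
        "driver_arguments": [
            "--no-sandbox",
            "--headless",
            "--disable-gpu",
            "--remote-debugging-port=9222",
            "--disable-dev-shm-usage",
            "window-size=1024,768",
        ]
    }
}


def normalize_chrome_args(args: Iterable[str], drop_headless: bool = False) -> list[str]:
    """Return Chrome switches with a '--' prefix, de-duplicated in order."""
    seen: set[str] = set()
    result: list[str] = []
    for raw in args:
        arg = raw.strip()
        if not arg:
            continue
        if not arg.startswith("-"):
            # Without a leading "--", Chrome does not parse the value as a switch.
            arg = "--" + arg
        if drop_headless and arg.startswith("--headless"):
            # Covers both "--headless" and variants like "--headless=new".
            continue
        if arg not in seen:
            seen.add(arg)
            result.append(arg)
    return result


if __name__ == "__main__":
    print(normalize_chrome_args(config["captcha"]["driver_arguments"]))
    # With selenium installed, the cleaned list could be applied like this:
    #   from selenium.webdriver import ChromeOptions
    #   options = ChromeOptions()
    #   for arg in normalize_chrome_args(config["captcha"]["driver_arguments"]):
    #       options.add_argument(arg)
```

Toggling drop_headless makes it easy to compare a headless and a non-headless run with otherwise identical switches.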

I have the same issue with IS24; I tried adding the driver arguments, but with no luck.
I run flathunter with Docker.

Update: I installed it on a Mac instead of Linux, and with headless mode the result is the same. Without headless it works.

I just deployed it to a cloud machine without Docker, and everything works (with --headless), so I think it may be something about IP ranges, or some other property, that triggers the bot detection.

We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?

I've updated to the latest build, and crawling IS24 still doesn't work for me on my Google Cloud deployment.

We've upgraded the undetected-chrome support in the latest software. Are you still seeing this issue with the newest version?

It started working soon after I posted, so I guess you fixed it! Thanks so much :)

Great to hear - thanks for the report!