sherlock-project/sherlock

Yandex Music has a captcha

Closed this issue · 15 comments

Checklist

  • I'm reporting a website that is returning false positive results
  • I've checked for similar site support requests including closed ones
  • I've checked for pull requests attempting to fix this false positive
  • I'm only reporting one site (create a separate issue for each site)

Description

Here's a random username that can't possibly exist: ecfhlmiuewfimcuhem.

Here's the username from data.json: ya.playlist

When I visit either, I get a captcha (note: JS is disabled in my browser):
[screenshot of the captcha page]

Unless Sherlock uses Selenium/Pyppeteer, which I highly doubt (it's not in requirements.txt), this captcha probably isn't avoidable. It may even show up with JS enabled; I didn't check.

I'm not opening a PR removing YandexMusic because this might only happen for me, or it might be possible to bypass the captcha.

@cd-CreepArghhh Can you share the raw HTML for that page? I'll likely be able to add it to #2068.

It won't bypass the captcha (that would need circumvention to be added), but it would avoid false-positive (F+) hits when the captcha is presented.
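For context, Sherlock's `message`-type site entries treat a username as taken unless the response body contains one of the configured error strings. The sketch below is a simplified model of that idea, not Sherlock's actual code, and the marker strings are hypothetical placeholders rather than the real text of Yandex's pages. It shows why adding a captcha marker turns a captcha page into "not found" instead of a false positive.

```python
from typing import Iterable


def is_claimed(body: str, error_msgs: Iterable[str]) -> bool:
    """Simplified message-based check: a username counts as claimed
    only if none of the configured error strings appear in the body."""
    return not any(msg in body for msg in error_msgs)


# Hypothetical marker strings; the real ones would be copied from the
# saved 404 and captcha pages.
NOT_FOUND = "hypothetical-404-text"
CAPTCHA = "hypothetical-captcha-text"

captcha_page = "<html>... hypothetical-captcha-text ...</html>"

# With only the 404 marker configured, the captcha page looks like a profile.
print(is_claimed(captcha_page, [NOT_FOUND]))           # True (false positive)
# Adding the captcha marker makes the same page count as "not found".
print(is_claimed(captcha_page, [NOT_FOUND, CAPTCHA]))  # False
```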

Huh, interestingly there's no captcha now (so it's not a JS issue): I get a 404 page for the random username and a profile for ya.playlist. Maybe I'll run Sherlock a couple of times, then try again.

If you do end up hitting it again, drop a ping.

Testing Yandex is a PITA on my end, having to use VPNs and such, and even when I do, it apparently trusts me implicitly and refuses to rate-limit or captcha me.

(if the captcha page returns a status code other than 200, we can also use that as a simpler resolution)
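A status-code check is the simpler alternative. As a simplified model (not Sherlock's exact logic), only a 2xx response counts as a profile, so this approach only helps if the captcha is served with a non-2xx status:

```python
def is_claimed_by_status(status: int) -> bool:
    """Simplified status-code check: only a 2xx response counts as a profile."""
    return 200 <= status < 300


print(is_claimed_by_status(404))  # False: the "user not found" page
print(is_claimed_by_status(200))  # True: a real profile, or a captcha served with 200
```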

Okay, found out that spamming them with requests gets you a captcha fast. Running Sherlock 4 times resulted in one captcha, and my browser got 2 in 6 requests.

You're going to have to run the HTML through a prettifier (I don't know of any), since it's all on one line.
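Python's standard library has no HTML prettifier, but `html.parser.HTMLParser` is enough for a rough one. This sketch puts each tag and text run on its own line and indents by nesting depth. It assumes reasonably well-formed markup (unclosed tags such as a bare `<p>` will skew the indentation) and is only meant to make a one-line page readable, not to reproduce it byte for byte.

```python
import html
from html.parser import HTMLParser
from typing import List

# Elements that never take a closing tag, so they must not increase the indent.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}
# Elements whose content is raw text (JS/CSS) and must not be re-escaped.
RAW_TEXT_ELEMENTS = {"script", "style"}


class Prettifier(HTMLParser):
    """Re-indent one-line HTML: one tag or text run per output line."""

    def __init__(self, indent: str = "  ") -> None:
        super().__init__()  # convert_charrefs=True: ordinary text arrives unescaped
        self.indent = indent
        self.depth = 0
        self.in_raw_text = False
        self.lines: List[str] = []

    def _emit(self, text: str) -> None:
        self.lines.append(self.indent * self.depth + text)

    def handle_starttag(self, tag, attrs):
        self._emit(self.get_starttag_text())  # original tag text, attributes intact
        if tag not in VOID_ELEMENTS:
            self.depth += 1
        self.in_raw_text = tag in RAW_TEXT_ELEMENTS

    def handle_startendtag(self, tag, attrs):
        # Explicitly self-closed tags such as <br/> never nest.
        self._emit(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.depth = max(self.depth - 1, 0)
        self.in_raw_text = False
        self._emit(f"</{tag}>")

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Re-escape ordinary text so "&lt;" does not turn into a bare "<".
            self._emit(text if self.in_raw_text else html.escape(text, quote=False))

    def handle_comment(self, data):
        self._emit(f"<!--{data}-->")

    def handle_decl(self, decl):
        self._emit(f"<!{decl}>")  # e.g. <!DOCTYPE html>


def prettify(source: str) -> str:
    parser = Prettifier()
    parser.feed(source)
    parser.close()  # flush any trailing buffered text
    return "\n".join(parser.lines)
```

For example, `print(prettify(open("captcha.html", encoding="utf-8").read()))` would print the saved page one element per line (the file name here is just an example).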

Note: GitHub won't let me upload .html files, so rename the .txt to .html, thanks.

Oops, Captcha!.txt
Oops, Captcha!_files.zip

I'll spam a few requests with Python now to check the status code.

Edit: the captcha page (some long URL with a hash or Base64 string in it) returns 200. I'll see what I get when redirected from the profile page (probably 200, so don't wait for me to finish).

Finished. Out of 100 requests, the first was a 404 (i.e. no captcha) and the rest were all 200s (i.e. captcha). I didn't see any 302s either, but requests follows redirects by default, so a redirect to the captcha would only show up in response.history, not in the final status code. Either way, the status code isn't going to be of any use.
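The experiment can be reproduced with a small script. This sketch uses the standard library's `urllib` instead of `requests`; the browser-like User-Agent and the injectable `fetch` argument (a seam for testing) are assumptions for illustration. Like `requests.get`, `urllib.request.urlopen` follows redirects by default, so a redirect to the captcha shows up as a changed final URL rather than as a 302 status, which is why the script records both.

```python
import urllib.error
import urllib.request
from collections import Counter
from typing import Callable, Tuple

# A fetcher takes a URL and returns (final status code, final URL).
Fetcher = Callable[[str], Tuple[int, str]]

# Hypothetical browser-like User-Agent; urllib's default one may be treated differently.
HEADERS = {"User-Agent": "Mozilla/5.0"}


def urllib_fetch(url: str) -> Tuple[int, str]:
    """GET a URL; urlopen follows redirects, so report where we ended up."""
    request = urllib.request.Request(url, headers=HEADERS)
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status, response.geturl()
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx, but the error still carries code and URL.
        return err.code, err.geturl()


def tally(fetch: Fetcher, url: str, n: int) -> Counter:
    """Request `url` n times and count (status, was_redirected) outcomes."""
    outcomes: Counter = Counter()
    for _ in range(n):
        status, final_url = fetch(url)
        outcomes[(status, final_url != url)] += 1
    return outcomes


# Example (performs real requests and may itself trigger the captcha):
# print(tally(urllib_fetch, "https://music.yandex/users/ecfhlmiuewfimcuhem/playlists", 100))
```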

Gonna push a hopeful fix. If you want to be added as a co-author, drop your GitHub no-reply email (or other GitHub email) and a name here, or link to somewhere that has it.

Otherwise I'll push as a single committer.

Just push as a single committer.

Done. It doesn't seem to have broken anything on my end. Can you pull and validate all 3 cases as well (captcha, valid, not valid)?

Just realized I forgot a case: "not valid in country". Will add that now. It shouldn't make a difference for the captcha tests.

Edit: that's actually covered by the 404 message I added, so we're good.
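The cases discussed here (valid, not valid, captcha, not valid in country) lend themselves to a table-driven check. This sketch uses hypothetical marker strings and sample bodies, and assumes, per the comment above, that the geoblock page carries the same 404 message. It is a way to organize the validation, not Sherlock's actual test suite.

```python
from typing import Callable, List, Tuple

# (case name, sample response body, expected "claimed" result)
Case = Tuple[str, str, bool]

# Hypothetical marker strings; real ones would come from the saved pages.
NOT_FOUND = "hypothetical-404-text"
CAPTCHA = "hypothetical-captcha-text"

CASES: List[Case] = [
    ("valid", "<html>playlists of ya.playlist</html>", True),
    ("not valid", f"<html>{NOT_FOUND}</html>", False),
    ("captcha", f"<html>{CAPTCHA}</html>", False),
    # Assumption: the geoblock page shows the same 404 message.
    ("not valid in country", f"<html>{NOT_FOUND}</html>", False),
]


def check(body: str) -> bool:
    """Message-based check: claimed unless an error marker is present."""
    return not any(marker in body for marker in (NOT_FOUND, CAPTCHA))


def failures(checker: Callable[[str], bool], cases: List[Case]) -> List[str]:
    """Return the names of cases whose result differs from the expectation."""
    return [name for name, body, want in cases if checker(body) != want]


print(failures(check, CASES))  # []
```

A checker that always answers "claimed" would fail every case except "valid", which is exactly the false-positive pattern reported in this issue.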

I don't think it worked, since there's still a false-positive. By the way, I'm pretty sure I'm still in the blacklist or whatever Yandex Music has going on, so it will be a while before I can test the other two cases.

```console
$ git clone https://github.com/ppfeister/sherlock.git  # hope I cloned the right repo...
$ cd sherlock
$ python sherlock ecfhlmiuewfimcuhem --site YandexMusic
[*] Checking username ecfhlmiuewfimcuhem on:
[+] YandexMusic: https://music.yandex/users/ecfhlmiuewfimcuhem/playlists

[*] Search completed with 1 results
```

Hm... let me re-evaluate and get back to you.

@cd-CreepArghhh Just got back

Noticed that you didn't run with the --local flag. Without it, Sherlock pulls data.json from the repo by default instead of using the locally patched copy. Can you test one more time with that flag? (This won't be necessary if the patch gets merged upstream.)
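The default described here amounts to "use the remote manifest unless told to use the local file". This sketch models that choice; the function names, the placeholder `DEFAULT_REMOTE_URL`, and the fallback to the local file when the download fails are assumptions for illustration, not a description of Sherlock's actual loader.

```python
import json
import urllib.request
from pathlib import Path
from typing import Callable

# Hypothetical placeholder; not Sherlock's real manifest URL.
DEFAULT_REMOTE_URL = "https://example.invalid/data.json"


def fetch_text(url: str) -> str:
    """Download a URL and decode it as UTF-8 text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8")


def load_manifest(
    local_path: Path,
    use_local: bool,
    fetch: Callable[[str], str] = fetch_text,
    remote_url: str = DEFAULT_REMOTE_URL,
) -> dict:
    """Return site data: the local file if use_local is set, otherwise the
    remote copy, falling back to the local file if that fails."""
    if not use_local:
        try:
            return json.loads(fetch(remote_url))
        except (OSError, ValueError):
            # Assumed behavior: network errors (URLError is an OSError) or bad
            # JSON (JSONDecodeError is a ValueError) fall back to the local copy.
            pass
    return json.loads(local_path.read_text(encoding="utf-8"))
```

Under this model, a locally patched data.json is only ever read when `use_local` is set or the download fails, which matches the symptom above. With the patch checked out, the equivalent command would be `python sherlock ecfhlmiuewfimcuhem --site YandexMusic --local`.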

When using that flag on my end, it seems to give the expected result for each of the four cases (not valid, valid, captcha, geoblock).

(that flag trips me up quite a bit...)

Edit: you do not need to re-pull unless it's been deleted

Yay, it works! ecfhlmiuewfimcuhem doesn't show up, ya.playlist does, and I didn't get any false positives even after spamming the command 30+ times. I didn't realise that it grabbed a data.json from GitHub instead of the local one by default (probably so you don't need to git pull as often).

Also, I'm not sure what the geoblock case is, so I can't really test it. (I assume I could keep routing through Tor nodes until I hit it, but I don't have time for that right now.)

I get geoblocked here in the USA, so it was an easy test for me to run, lol

I'll go ahead and link your issue to that PR so it gets closed if and when it (hopefully) gets merged.