ostrolucky/Bulk-Bing-Image-downloader

The value of user-agent affects search results

sanghoon opened this issue · 1 comment

I found something interesting.

Sometimes the downloader returns fewer images than the same search shows in my web browser.
While comparing the two requests,
I found that changing the user-agent yields more search results.

Here is an example.

Run with the default user-agent (Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:94.0) Gecko/20100101 Firefox/94.0)

  • Finished with fewer than 20 search results
$ python bbid/bbid.py smiley -o output_smiley --filters +filterui:face-face
{'search_string': ['smiley'], 'search_file': False, 'output': 'output_smiley', 'adult_filter_off': False, 'filters': '+filterui:face-face', 'limit': None, 'threads': 20}
 OK : Man_Smiling_Emoji_Icon_ios10_grande.png
...
SKIP: Image is a duplicate of Nose-Piercing-60-650x650.jpg, not saving Nose-Piercing-60-650x650.jpg

$ ls output_smiley | wc -l
9

Run with another user-agent (Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134)

  • 318 search results
  • The results appear similar to those shown in a web browser.
$ python bbid/bbid.py smiley -o output_smiley2 --filters +filterui:face-face
{'search_string': ['smiley'], 'search_file': False, 'output': 'output_smiley2', 'adult_filter_off': False, 'filters': '+filterui:face-face', 'limit': None, 'threads': 20}
 OK : smiley-emoticon-cartoon-with-v-sign-.jpg
...
 OK : 3595196bf47d96d3411a4faedc94a9cf.jpg

$ ls output_smiley2 | wc -l
215

I'm not sure whether this behavior depends solely on the user-agent value.
(There may be other factors I'm not aware of.)

For now, I have changed the user-agent to the latter value, and it seems to work fine.
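The change amounts to sending a different `User-Agent` header with the same query. Below is a minimal sketch using Python's standard `urllib`. It assumes the `https://www.bing.com/images/search` endpoint and a `qft` query parameter for filters (both assumptions; this is not bbid.py's actual request code), and `build_search_request` is a hypothetical helper:

```python
import urllib.parse
import urllib.request

# The two user-agent strings compared in this issue.
FEDORA_FIREFOX_UA = (
    "Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:94.0) "
    "Gecko/20100101 Firefox/94.0"
)
EDGE_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134"
)


def build_search_request(query, filters="", user_agent=EDGE_UA):
    """Build (but do not send) a Bing image search request.

    Hypothetical helper: the endpoint and the 'qft' filter parameter are
    assumptions for illustration, not necessarily what bbid.py uses.
    """
    params = {"q": query}
    if filters:
        params["qft"] = filters
    url = "https://www.bing.com/images/search?" + urllib.parse.urlencode(params)
    # Only this header differs between the two runs compared above.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


req = build_search_request("smiley", "+filterui:face-face")
print(req.full_url)
# urllib stores header names capitalized, hence "User-agent".
print(req.get_header("User-agent"))
```

Nothing is sent by the sketch itself; passing `req` to `urllib.request.urlopen` would perform the request.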

My conclusion, then, would be that Bing has picked up on BBID usage and is trying to filter it out. We should change the default user-agent from Linux on Fedora to something more generic so that it's less distinguishable. Feel free to change it in your PR.
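One hypothetical way to act on this is to expose the user-agent as a command-line option with a more generic default, so it can be updated without editing the code. The `--user-agent` flag below is an assumption, not an existing bbid.py option, and the default simply reuses the Edge string from the report:

```python
import argparse

# Hypothetical default: reuses the Edge user-agent string from this issue.
DEFAULT_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/64.0.3282.140 Safari/537.36 Edge/17.17134"
)


def build_parser():
    parser = argparse.ArgumentParser(description="Bing image downloader (sketch)")
    parser.add_argument("search_string", nargs="*")
    # Hypothetical flag; bbid.py does not necessarily provide it.
    parser.add_argument(
        "--user-agent",
        default=DEFAULT_USER_AGENT,
        help="User-Agent header sent with every request",
    )
    return parser


args = build_parser().parse_args(["smiley"])
print(args.user_agent)
```

A fixed default can itself become recognizable over time, which is one argument for making it overridable rather than only swapping the hard-coded string.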