ostrolucky/Bulk-Bing-Image-downloader

Target file name

autodatabases opened this issue · 8 comments

Thank you for your code! It's very useful!! Can you add an optional argument to the "download" function, newFileName, that defaults to the keyword if it is null?

Here is the modified function:


def download(pool_sema: threading.Semaphore, url: str, output_dir: str, newname: str):
    # Relies on module-level state from the surrounding script:
    # tried_urls, image_md5s, urlopenheader and in_progress.
    global in_progress

    if url in tried_urls:
        return
    pool_sema.acquire()
    in_progress += 1
    path = urllib.parse.urlsplit(url).path
    filename = posixpath.basename(path).split('?')[0]
    name, ext = os.path.splitext(filename)
    # Use the caller-supplied name instead of the one taken from the URL.
    name = newname
    filename = name + ext

    try:
        request = urllib.request.Request(url, None, urlopenheader)
        image = urllib.request.urlopen(request).read()
        if not imghdr.what(None, image):
            print('Invalid image, not saving ' + filename)
            return

        # Skip images whose content was already saved during this run.
        md5_key = hashlib.md5(image).hexdigest()
        if md5_key in image_md5s:
            print('Image is a duplicate of ' + image_md5s[md5_key] + ', not saving ' + filename)
            return

        # If the file name is taken by different content, append a counter.
        i = 0
        while os.path.exists(os.path.join(output_dir, filename)):
            if hashlib.md5(open(os.path.join(output_dir, filename), 'rb').read()).hexdigest() == md5_key:
                print('Already downloaded ' + filename + ', not saving')
                return
            i += 1
            filename = "%s-%d%s" % (name, i, ext)

        image_md5s[md5_key] = filename
        imagefile = open(os.path.join(output_dir, filename), 'wb')
        imagefile.write(image)
        imagefile.close()
        print("OK: " + newname)
        tried_urls.append(url)
    except Exception as e:
        print("FAIL: " + filename)
    finally:
        pool_sema.release()
        in_progress -= 1

BUT the new file name has a space before the extension, so it looks like "image .jpg"

Can you correct that code? :) I'm using Python for the first time ;) trying to forget PHP ;)

I'm sorry, I don't understand. Can you fix your formatting? Even better, create pull request.

So you want the function to accept a new newname argument. And what value would be passed into the function for this argument?

I think it can be the keyword, but restricted to [0-9a-z] :)

Your script is VERY VERY USEFUL :) Thank you for it!

Glad you find it useful :)

> I think it can be the keyword, but restricted to [0-9a-z] :)

That's unfortunately not concrete enough. If you want to generate the name from something other than the URL (as it is now), you need to figure out how to fully replace it.
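One possible way to make this concrete, as a minimal sketch: derive the name from the search keyword by lowercasing it and dropping every character outside [0-9a-z]. The helper name `keyword_to_name` and the fallback value `image` are hypothetical and not part of the script.

```python
import re


def keyword_to_name(keyword: str, fallback: str = "image") -> str:
    """Reduce a keyword to the characters 0-9 and a-z (hypothetical helper)."""
    # Lowercase first so that uppercase letters are kept instead of dropped.
    name = re.sub(r"[^0-9a-z]", "", keyword.lower())
    # An empty result would produce a file name like ".jpg", so fall back.
    return name or fallback


print(keyword_to_name("Red Panda 2 "))  # redpanda2
```

Because whitespace is outside [0-9a-z], a name built this way cannot carry a leading or trailing space into the file name.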

The fact is that saving the downloaded file under a new name works perfectly, BUT for some reason a space is inserted after the file name, before the extension. Perhaps you know why this is happening?

That doesn't happen when I test it. Does it happen for you for every keyword?

SORRY, MAN!!! I'M AN IDIOT :)) I checked the keyword and found a space at the end, and trim() resolved my problem :))

Good to hear you solved it :)
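For reference, Python's counterpart to PHP's trim() is str.strip(). The following is a minimal sketch of how a trailing space in the keyword ends up in the file name and how stripping removes it. `build_filename` is a hypothetical helper that mirrors the name-building lines of the function posted above, and the example URL is made up.

```python
import os
import posixpath
import urllib.parse


def build_filename(url: str, newname: str) -> str:
    """Combine a caller-supplied name with the extension taken from the URL."""
    path = urllib.parse.urlsplit(url).path
    _, ext = os.path.splitext(posixpath.basename(path))
    # strip() removes leading and trailing whitespace, like PHP's trim().
    return newname.strip() + ext


url = "https://example.com/photos/cat.jpg?size=large"
print(repr("image " + ".jpg"))              # 'image .jpg'
print(repr(build_filename(url, "image ")))  # 'image.jpg'
```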