regosen/gallery_get

fix x3vid filenames (remove colons) because Windows is fragile


GG gets images from x3vid.com "full image" pages, but the filenames I see on Windows are goofy, I think because they contain colons. (I don't know whether this affects all filenames on x3vid. FWIW, the URL is: https://x3vid.com/gallery_pics/3424569/Public_nudity_35?page=1)

The HTML snippet is below. GG saves the image under the name the website uses, as-is, but Windows displays the file as 'HQEQHE~F.JPG'. OTOH, Chrome silently removes the colon, and then the filename is sensible. Any clues how I could hack GG to silently remove the colons too?

<a href="/i42727683/Public_nudity_35?page=1&amp;source=gallery">
  <figure>
  <img id="42727683" data-p="1" class="img-box thumb" alt="Public nudity 35 (1/10)" src="/images/14242/https:__ep5.xhcdn.com_000_146_605_573_1000.jpg" />
  </figure>
</a>

This one-line change (below, find the line marked "added THIS LINE") seems to be working. My first attempt to modify the write_to_file() method caused copy_image() to crash. I'm still not sure why.

    def copy_image(self, info):
        info.attempts += 1

        file_name = info.destination_filename()
        file_name = re.sub(r"[:]", "", file_name) # added THIS LINE
        try:
            file_info = urlopen_safe(info.path)
        except Exception:  # a bare except would also swallow KeyboardInterrupt
            return False

        try:
            modtimestr = file_info.headers['last-modified']
            modtime = time.strptime(modtimestr, '%a, %d %b %Y %H:%M:%S %Z')
        except Exception:  # header missing or not in the expected format
            modtime = None

        if self.can_skip(file_name, file_info):
            print("Skipping existing file: " + info.path)
            return True

        if info.attempts == 1:
            print("%s -> %s" % (info.path, file_name))

        if not info.write_to_file(file_info, file_name):
            return False

        if modtime is not None:
            lastmod = calendar.timegm(modtime)
            os.utime(file_name, (lastmod, lastmod))
        return os.path.getsize(file_name) > 4096

If x3vid.com doesn't use filenames worth preserving, I would recommend this instead:

  1. go to the gallery_plugins directory, make a copy of plugins_generic.py and call it plugins_x3vid.py (keep it in that same folder)
  2. change the last line in plugins_x3vid.py to same_filename = False

Now when you re-run, it should say "Using x3vid plugin", and you should get filenames like 001.jpg, 002.jpg, etc.

If you're happy with the result, feel free to open a pull request with your addition of the plugin!

This is a good suggestion, but I have a ticket open about how limited that functionality is: on some sites it recycles the numbers across pages. (I posted a patch that fixes this, but it's probably kludgy; for example, with two threads the numbering starts from 3 (0003, 0004, ...) and I don't know why.)
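For the numbering problem, one possible approach is a single counter shared by all pages and worker threads, guarded by a lock so that no number is handed out twice. This is only a sketch: it is not the posted patch and not GG's code, and `SequentialNamer` and its parameters are hypothetical names.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SequentialNamer:
    """Hands out unique, zero-padded file names across threads (hypothetical helper)."""

    def __init__(self, start=1, width=4):
        self._next = start
        self._width = width
        self._lock = threading.Lock()

    def next_name(self, extension=".jpg"):
        # The lock makes "read the number, then increment it" atomic,
        # so two threads can never receive the same number.
        with self._lock:
            number = self._next
            self._next += 1
        return "%0*d%s" % (self._width, number, extension)


# One namer shared by every page and every worker thread.
namer = SequentialNamer()
with ThreadPoolExecutor(max_workers=2) as pool:
    names = sorted(pool.map(lambda _: namer.next_name(), range(6)))
print(names)
```

Because the counter lives in one object rather than being reset per page, numbers keep increasing across pages, whichever thread asks next.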

I was looking for a way to just remove the characters that Windows considers illegal in filenames. I wish it were easier to add options to the plugins, so that I could set remove_colons = True in a plug-in and then add code to GG to handle that site-specific feature.
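A rough sketch of that idea, not GG's actual code: a helper that drops the characters Windows rejects in filenames, switched on by a per-plugin flag. The names `sanitize_filename`, `destination_name`, `remove_illegal_chars`, and the `x3vid` stand-in object are all assumptions for illustration; only the character list follows Windows' documented naming rules.

```python
import re
from types import SimpleNamespace

# Characters Windows rejects in file names: < > : " / \ | ? *
# plus the control codes 0-31.
_WINDOWS_ILLEGAL = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def sanitize_filename(name, replacement=""):
    """Return `name` with Windows-illegal characters replaced.

    Pass the base name only, not a full path: this also strips path
    separators and the colon after a drive letter.
    """
    cleaned = _WINDOWS_ILLEGAL.sub(replacement, name)
    # Windows also ignores trailing dots and spaces, so drop them.
    cleaned = cleaned.rstrip(". ")
    # Never return an empty name.
    return cleaned or "_"


def destination_name(plugin, name):
    """Sanitize only when the (hypothetical) plugin flag asks for it."""
    if getattr(plugin, "remove_illegal_chars", False):
        return sanitize_filename(name)
    return name


# Hypothetical stand-in for a module in gallery_plugins.
x3vid = SimpleNamespace(remove_illegal_chars=True)
print(destination_name(x3vid, "https:__ep5.xhcdn.com_000_146_605_573_1000.jpg"))
# prints: https__ep5.xhcdn.com_000_146_605_573_1000.jpg
```

Reading the flag with `getattr(..., False)` means existing plugins that never define it keep their current behavior.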

A significant problem is that I can code in other languages, but I'm almost completely ignorant of Python 3...