regosen/gallery_get

imagefap crash

Traceback (most recent call last):
  File "gallery_get.py", line 351, in run
    self.run_internal()
  File "gallery_get.py", line 340, in run_internal
    info.path = safe_url(info.redirect, info.path)
  File "gallery_get.py", line 77, in safe_url
    if not link.lower().startswith("http"):
AttributeError: 'int' object has no attribute 'lower'
Using params: ['gallery_get.py']

This is safe_url():

def safe_url(parent, link):
    print("link:  %s" % str(link)) # debug
    print("parent: %s" % str(parent)) # debug
    if not link.lower().startswith("http"):
        uri=urlparse(parent)
        root = '{uri.scheme}://{uri.netloc}/'.format(uri=uri)
        if link.startswith("//"):
            link = "%s:%s" % (uri.scheme, link)
        elif link.startswith("/") or root.strip('/').lower() == parent.strip('/').lower():
            link = root + link
        else:
            link = os.path.dirname(parent) + "/" + link
    return link.replace("&amp;","&")

How do I modify this to test whether 'link' is a string? The value passed is 60, which I assume is an integer rather than a string.
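One way to answer the question above is an explicit isinstance check before any string method is called. This is only a minimal sketch, not the project's actual fix: checked_link is a hypothetical helper name, and the code assumes Python 3, where str is the only text type.

```python
def checked_link(link):
    """Return link unchanged if it is a string, otherwise None.

    Hypothetical helper for illustration; not part of gallery_get.py.
    """
    if not isinstance(link, str):
        # An int such as 60 has no .lower() or .startswith(), so reject it here.
        return None
    return link


for candidate in ("https://example.com/a.jpg", 60):
    link = checked_link(candidate)
    if link is None:
        print("Not a string, ignoring: %r" % (candidate,))
    else:
        print("Usable link: %s" % link)
```

A caller such as safe_url() could run a check like this first and return early, instead of failing inside link.lower().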

Also, it dies repeatably on one image (image 130; see below). Is there a place I can catch fatal errors and just move on to the next image?

Skipping existing file: https://cdn.imagefap.com/images/full/54/286/286565791.jpg?end=1587922289&secure=08d27343eb208234232de
link:  https://cdn.imagefap.com/images/full/54/165/1653609634.jpg?end=1587922292&secure=009b663d146a5cf4c218e
parent: http://www.imagefap.com/photo/1653609634/?pgid=&gid=5242434&page=5&idx=128
Skipping existing file: https://cdn.imagefap.com/images/full/54/165/1653609634.jpg?end=1587922292&secure=009b663d146a5cf4c218e
link:  https://cdn.imagefap.com/images/full/54/186/1862544636.jpg?end=1587922293&secure=002d1e42301a17af16f7b
parent: http://www.imagefap.com/photo/1862544636/?pgid=&gid=5242434&page=5&idx=129
Skipping existing file: https://cdn.imagefap.com/images/full/54/186/1862544636.jpg?end=1587922293&secure=002d1e42301a17af16f7b
link:  60
parent: http://www.imagefap.com/photo/30267529/?pgid=&gid=5242434&page=5&idx=130
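For the second question, the general pattern is to put the try/except around the work done for a single image, so that one bad value does not abort the whole run. The thread does not show where gallery_get.py iterates over images, so process_all and the sample data below are hypothetical stand-ins rather than the project's code.

```python
def process_all(items, handler):
    """Run handler on each item; report and skip the ones that raise."""
    results, skipped = [], []
    for index, item in enumerate(items):
        try:
            results.append(handler(item))
        except (AttributeError, TypeError) as err:
            # Catch only the errors expected from bad input, then move on.
            print("Skipping item %d (%r): %s" % (index, item, err))
            skipped.append(item)
    return results, skipped


# The second entry mimics the stray integer seen in the log above.
links = ["HTTP://EXAMPLE.COM/1.jpg", 60, "HTTP://EXAMPLE.COM/3.jpg"]
done, skipped = process_all(links, lambda link: link.lower())
```

Catching specific exception types, rather than using a bare except, keeps genuine bugs and Ctrl-C visible.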

Hi, can you try my changes (see above commit) and let me know if that works?

I DL'd the latest GG and ran it using python3 on CentOS, and I seem to be missing a library:

Traceback (most recent call last):
  File "gallery_get.py", line 451, in run_wrapped
    root = GalleryGet(myurl, dest or DEST_ROOT, titleAsFolder, allowGenericPlugin).run()
  File "gallery_get.py", line 442, in run
    return self.queue_jobs(page, root, subtitle)
  File "gallery_get.py", line 382, in queue_jobs
    link = safe_url(self.url, link)
  File "gallery_get.py", line 72, in safe_url
    if not (isinstance(link, unicode) or isinstance(link, str)):
NameError: name 'unicode' is not defined
Using params: ['https://www.imagefap.com/pictures/5242434/Amateur-set-337-347', '/home/user8/sets', False]

And if I use Python 2, it doesn't work (I routinely use 3, so I don't know if this is new behavior):

$ python gallery_get.py
Input URL: https://www.imagefap.com/pictures/5242434/Amateur-set-337-347
Destination (/home/user8/sets):
Using imagefap plugin...
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=0
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=1
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=2
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=3
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=4
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=5
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=6
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=7
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=8
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=9
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=10
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=11
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=12
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=13
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=14
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=15
Crawling http://www.imagefap.com/pictures/5242434/Amateur-set-337-347?page=16
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/155/1552461067.jpg?end=1588106699&secure=059d6fa8e743da599e958
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/137/1370300173.jpg?end=1588106699&secure=00bc34f55d8a5e1df8a04
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/193/193540673.jpg?end=1588106701&secure=0d593868eacfa38ad76d7
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/698/698406796.jpg?end=1588106701&secure=0a3fc97bba854e59b70ef
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/186/1868846219.jpg?end=1588106703&secure=06e483e5b0cb7b6d55b1b
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/104/1048211316.jpg?end=1588106703&secure=0f2721304c360f6985488
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/143/1431985785.jpg?end=1588106704&secure=08dd9b391c5e5c3286f42
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/172/1721515185.jpg?end=1588106705&secure=0aaab830b637608ab2d2d
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/153/1536504824.jpg?end=1588106706&secure=0b9f658556a8e247ca36a
ERROR: Failed to copy https://cdn.imagefap.com/images/full/54/126/1260285408.jpg?end=1588106706&secure=01cc1601291842d4241e7

Actually, it looks like the Python 3 issue is that str is already Unicode there and the separate unicode type no longer exists, so I changed line 72 to: if not (isinstance(link, str) or isinstance(link, str)): (I guess I could have removed the redundancy, but I don't actually know what you were getting at). Now, using Python 3, I get an error much like the original one, except that this time the missing method is replace rather than lower:

Traceback (most recent call last):
  File "gallery_get.py", line 336, in run
    self.run_internal()
  File "gallery_get.py", line 322, in run_internal
    self.process_redirect_page(info, response)
  File "gallery_get.py", line 298, in process_redirect_page
    (info.path,info.subtitle) = safe_unpack(jpegs[0],info.subtitle)
  File "gallery_get.py", line 67, in safe_unpack
    return (obj[0],safe_str(obj[1]))
  File "gallery_get.py", line 58, in safe_str
    name = name.replace(":",";") # to preserve emoticons
AttributeError: 'int' object has no attribute 'replace'
Using params: ['gallery_get.py']
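The NameError on unicode and the follow-up AttributeError both come down to deciding what counts as a string on each Python version. The sketch below shows one common way to do that on both Python 2 and 3; string_types, is_text, and to_text are hypothetical names, and converting non-strings with str() is an assumption about the desired behaviour, not something the thread confirms.

```python
try:
    # Python 2: text may be either str or unicode.
    string_types = (str, unicode)
except NameError:
    # Python 3: the unicode name is gone and str is already Unicode.
    string_types = (str,)


def is_text(value):
    """True if value is a string type on the running Python version."""
    return isinstance(value, string_types)


def to_text(value):
    """Return strings unchanged; convert anything else (e.g. an int) with str()."""
    return value if is_text(value) else str(value)


# An int no longer breaks a later call such as .replace(":", ";").
print(to_text(60).replace(":", ";"))
print(to_text("a:b").replace(":", ";"))
```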

Ok, I don't know if this is super robust, but I can get GG to skip a file by wrapping safe_unpack() in a try ... except block. I gather that I should be catching specific errors, in which case you'd want to change except: to except AttributeError:

This seems to also close issues like #63 (i.e., with this fix, I was able to DL that URL).

This is all silent. I'm not told that the file is being skipped... I'd like to be told, but I don't see how I'd add that. Maybe in the except block?

def safe_unpack(obj, default):
    if is_str(obj):
        return (obj,safe_str(default))
    elif obj:
        try:
            return (obj[0],safe_str(obj[1]))
        except:
            return ("","")
    else:
        return ("","")
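To make the skip visible, the except block is a reasonable place to print a message, and naming the exception types there avoids hiding unrelated failures. The variant below is a sketch only: safe_unpack_verbose is a hypothetical name, and because the real is_str() and safe_str() are not shown in full in this thread, it substitutes isinstance(obj, str) and the one replace(":", ";") call visible in the traceback.

```python
def safe_unpack_verbose(obj, default):
    """Like safe_unpack above, but reports what it skips (illustrative only)."""
    if isinstance(obj, str):  # stand-in for is_str()
        return (obj, str(default))
    if obj:
        try:
            # Stand-in for safe_str(): the call that fails when obj[1] is an int.
            return (obj[0], obj[1].replace(":", ";"))
        except (AttributeError, IndexError, TypeError) as err:
            print("Skipping %r: %s" % (obj, err))
            return ("", "")
    return ("", "")


print(safe_unpack_verbose(("a.jpg", "title: one"), "fallback"))
print(safe_unpack_verbose(("b.jpg", 60), "fallback"))
```

The second call prints a "Skipping ..." line before returning the empty pair, which is the notification the silent version lacks.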

Thanks for digging into this! After looking into it some more, I've taken your suggested try...except AttributeError approach, but strictly within safe_url(). And yes, this should resolve issue #63 as well.

Would you mind trying the latest changes?

It's been a week and I'm fairly confident this is working now, so I'm closing the issue. Feel free to reopen if needed!