Roznoshchik/Lurnby

Finding Epub Images

Roznoshchik opened this issue · 0 comments

Epubs are not very consistent in how they organize their internal file structure.

I haven't figured out a great way of finding the image folder.

import os

# `soup` (the parsed chapter) and `path` (the extracted epub folder) come from the surrounding code.
for img in soup.find_all('img'):
    img["loading"] = "lazy"
    src = img['src']

    # Candidate locations, tried in order. Each one was added by hand
    # after hitting an epub that used that layout.
    candidates = [
        src.replace("../", path + "/"),
        f"{path}/{src}",
        f"{path}/EPUB/media/{src}",
        f"{path}/EPUB/images/{src}",
        src.replace("../", path + "/OEBPS/"),
    ]

    # Use the first candidate that exists; otherwise fall back to the last guess.
    filename = next((c for c in candidates if os.path.exists(c)), candidates[-1])


Whenever I encounter an epub whose images don't load, I have to open the epub, look at its folder structure, and then manually add another branch for the new path.
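
One possible way to avoid the manual branches, as a rough sketch only: a relative `src` is relative to the document that contains the `<img>`, so it can be resolved against that document's own folder instead of guessing. The helper name `resolve_img_src` and the `doc_path` parameter are hypothetical; this assumes the epub is already extracted to disk and that the path of the XHTML file being parsed is known at this point, which the snippet above does not show.

```python
import os
from urllib.parse import unquote, urlsplit


def resolve_img_src(doc_path, src):
    """Resolve an <img> src against the XHTML file it was found in.

    doc_path: path on disk of the XHTML file being parsed (assumed to be known).
    src:      the raw value of img['src'], e.g. "../images/cover.jpg".
    """
    # Drop any query or fragment and decode percent-escapes such as %20.
    relative = unquote(urlsplit(src).path)
    # Join with the document's folder, then collapse any "../" segments.
    return os.path.normpath(os.path.join(os.path.dirname(doc_path), relative))
```

Inside the loop this would look something like `filename = resolve_img_src(chapter_path, img['src'])`, where `chapter_path` is again a placeholder for whatever variable holds the current file's location.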

I'm sure there's a better way to search the epub and locate the image folder itself, one that would work for file layouts I haven't encountered yet.
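
A sketch of one direction that might work: every epub has a `META-INF/container.xml` that points at the package (OPF) file, and the OPF manifest lists each resource with an `href` and a `media-type`. Reading the manifest should give the image locations without guessing folder names. The function name `list_epub_images` is made up for illustration, and this assumes the original `.epub` archive is available and that only the first rootfile matters.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile
from urllib.parse import unquote

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def list_epub_images(epub_file):
    """Return archive-relative paths of every image listed in the OPF manifest."""
    with zipfile.ZipFile(epub_file) as zf:
        # container.xml names the package document via rootfile/@full-path.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf_dir = posixpath.dirname(opf_path)

        package = ET.fromstring(zf.read(opf_path))
        images = []
        for item in package.findall("opf:manifest/opf:item", OPF_NS):
            if item.get("media-type", "").startswith("image/"):
                # Manifest hrefs are relative to the folder holding the OPF file.
                href = unquote(item.get("href"))
                images.append(posixpath.normpath(posixpath.join(opf_dir, href)))
        return images
```

The folders of the returned paths (for example `posixpath.dirname(p)` for each `p`) would then be the image folder or folders for that particular book, which could replace the hard-coded `EPUB/media`, `EPUB/images`, and `OEBPS` guesses.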