Nandaka/PixivUtil2

PixivUtil Settings Help and General Questions

NamelessDummy opened this issue · 9 comments

My PixivUtil sometimes re-downloads copies of images I already have (same file size, same pixel count). I want it to keep checking every image in case one was updated or changed, but I don't want it to re-download images that are 100% identical to the ones I already have.

It occasionally catches duplicates and reports "Identical Size", but not always.

When PixivUtil re-downloads an image, it renames the original file by adding a string of numbers to its name.

Since PixivUtil has already downloaded copies of images I have, does anyone have advice on how to clean up these extra, renamed files (mass deletion, renaming, etc.)?

Also, sometimes while an image is downloading, there are yellow dots and a yellow square where a solid green line should be. I've looked around but can't find out what this means.

Sorry for all the questions in a single post. I’ve used PixivUtil heavily for a few years now, but I’ve hardly interacted with the community, so these questions have built up over time.

I’ll attach my config file below, in case it's helpful.

config.txt

I think the usual reason is that the image was updated or its metadata changed. The image size stays the same or similar, but the changed metadata alters the file hash, so the program can't detect it as the "same" file.

A regex for files that have been renamed might look like `\.[0-9]{10}\.(jpg|png|ugoira)$`, i.e. a dot, a 10-digit number, then the .jpg, .png, or .ugoira extension.
You can write Python to search for those files and delete them, or find software with a regex search option.
This is untested, so check that the files you are about to delete are actually the ones showing up.
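A minimal sketch of the "write Python" route, assuming backups are named `<original name>.<10-digit Unix time>.<ext>` as the regex above implies (check a few of your own files first). It lists matching files under a folder and deletes them only when `--delete` is passed. The script and the `find_backups` name are illustrative, not part of PixivUtil2.

```python
import re
import sys
from pathlib import Path

# Assumed backup naming: <original name>.<10-digit Unix time>.<ext>,
# e.g. "12345678_p0.1650000000.jpg". Verify against your own files first.
BACKUP_PATTERN = re.compile(r"\.[0-9]{10}\.(jpg|png|ugoira)$", re.IGNORECASE)


def find_backups(root):
    """Return every file under root whose name matches BACKUP_PATTERN."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and BACKUP_PATTERN.search(p.name)
    )


if __name__ == "__main__":
    args = [a for a in sys.argv[1:] if a != "--delete"]
    root = args[0] if args else "."
    delete = "--delete" in sys.argv[1:]
    for path in find_backups(root):
        print(path)
        if delete:  # only remove files when explicitly asked to
            path.unlink()
```

Saved as, say, `find_backups.py`, run it once as `python find_backups.py D:\pixiv` to review the list, then again with `--delete`. Note that this removes every matching backup, including genuinely older versions you might want to keep.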

Ah, that explains why there are occasional redownloads of the same image. I'll look for software with a regex search option since I'm not experienced in writing Python, or even code, for that matter.

If the image itself, not just the metadata, was changed, would there be a better way to check and download them?

As for the yellow dots and squares, do you know what's causing this? This sometimes happens on fresh downloads as well. Is it something to be concerned about?

Screenshot_1

WizTree (https://diskanalyzer.com/) does have regex support:

Regular Expression Search ([regex](https://en.wikipedia.org/wiki/Regular_expression))
WizTree 4.13 and later supports regular expression search.


Type in a forward slash (/) followed directly by the regular expression you wish to use, e.g.:
/[0-9]{4}-[0-9]{2}-[0-9]{2}\.csv$

If the regular expression contains spaces, enclose it in double quotes, like this:
/"[0-9]{4} [0-9]{2} [0-9]{2}\.csv$"

If the image itself, not just the metadata, was changed, would there be a better way to check and download them?

If the image was updated, the file changes and is downloaded automatically. The problem is telling whether only the metadata changed or the image itself changed.

If you want to keep the old file only when the image itself changed, you would need to compare the image data rather than the whole file. That's complicated, unless there's already something out there that does exactly that.
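One way to "compare the image data rather than the whole file", sketched for PNG only with the standard library: read the chunks, keep the header (IHDR), palette (PLTE), and decompressed image data (IDAT), and ignore metadata chunks such as tEXt, iTXt, tIME, and eXIf. It is conservative: a metadata-only edit compares as equal, but identical pixels re-encoded with different PNG filters would still compare as different. JPEGs would need a real decoder instead (for example, comparing Pillow's `Image.open(path).tobytes()`), which is not shown. The function names are hypothetical.

```python
import struct
import zlib
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_image_data(path):
    """Return (IHDR, PLTE, decompressed IDAT) of a PNG, ignoring metadata chunks."""
    data = Path(path).read_bytes()
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError(f"{path} is not a PNG file")
    pos = len(PNG_SIGNATURE)
    ihdr, plte, idat = b"", b"", bytearray()
    while pos + 8 <= len(data):
        # Each chunk: 4-byte length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"IHDR":
            ihdr = body
        elif ctype == b"PLTE":
            plte = body
        elif ctype == b"IDAT":
            idat += body  # image data may be split across several IDAT chunks
        elif ctype == b"IEND":
            break
        # Any other chunk (tEXt, iTXt, zTXt, tIME, eXIf, ...) is skipped.
    return ihdr, plte, zlib.decompress(bytes(idat))


def same_png_image(a, b):
    """True if both PNGs share header, palette and (still filtered) pixel stream."""
    return png_image_data(a) == png_image_data(b)
```

A `True` result means the two files differ at most in metadata; a `False` result means the image may have changed and is worth a look.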

As for the yellow progress bar, I just haven't concerned myself with it.

Thanks for the link; I'll look into it!

I see, so the only feasible way to do this is to check and clean up the files manually afterwards.

Since the yellow progress bar doesn't concern you, I won't let it concern me. Still, I'm curious why it does that.

Sorry to reopen this issue, but is there another way to check for and download updated images without re-downloading them when only the metadata or file hash has changed?

Let me try to explain how I think this works. The program checks whether the image is in its database; if not, it saves metadata about the image and downloads it. If the artist updated the image in any way, the change gets picked up: the image is downloaded again and the old file is renamed. All the program knows is that the image has been updated.

There isn't a way to tell how the file changed without downloading it and processing it yourself afterwards, since the program just downloads the updated image.
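The "processing afterwards" step can be partly automated. A hedged sketch, again assuming the `<name>.<10-digit Unix time>.<ext>` backup naming from the regex earlier in the thread: pair each backup with the current file of the same name and compare SHA-256 hashes. Byte-identical backups are true duplicates; everything else is reported as an older version, which may still differ only in metadata and need a pixel-level or visual check. `classify_backups` is a hypothetical helper, not a PixivUtil2 function.

```python
import hashlib
import re
from pathlib import Path

# Assumed naming: backup "<stem>.<10-digit Unix time><ext>" next to current "<stem><ext>".
BACKUP_PATTERN = re.compile(
    r"^(?P<stem>.+)\.[0-9]{10}(?P<ext>\.(?:jpg|png|ugoira))$", re.IGNORECASE
)


def sha256_of(path):
    """Hash a file in 1 MiB blocks so large files are not read into memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(1 << 20), b""):
            h.update(block)
    return h.hexdigest()


def classify_backups(root):
    """Split backups into exact duplicates of the current file and other versions."""
    duplicates, versions = [], []
    for backup in sorted(Path(root).rglob("*")):
        m = BACKUP_PATTERN.match(backup.name)
        if not backup.is_file() or not m:
            continue
        current = backup.with_name(m["stem"] + m["ext"])
        if current.is_file() and sha256_of(current) == sha256_of(backup):
            duplicates.append(backup)  # byte-for-byte identical to the current file
        else:
            versions.append(backup)  # differs, or has no current counterpart
    return duplicates, versions
```

For example, call `duplicates, versions = classify_backups("D:/pixiv")`, print both lists, and only delete from `duplicates` once you are happy with what it found.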

Alright, thank you for the additional explanation. If I have any more questions, I'll open a new issue.

Perhaps a future feature to prevent this from happening would be a good addition, if possible.

I really appreciate your help through all this.

If you don't want the program to create file.[unix time] backups, set backupOldFile=False in config.ini.
However, the old file will then be deleted if the post is updated.

I think I'll eventually try the manual option to clean up duplicates, since I enjoy having every version of an image that was posted. I might also turn off CheckFileSize in the future to save the headache, but I'm still unsure.