Resource hash from submission downloaded elsewhere [BUG?]
It's not really a bug, but some files seem to download to an unknown location.
I just wanted to ask where these end up.
Description
When I run the download for Reddit I sometimes get this message:
[2024-05-06 16:18:35,535 - bdfr.downloader - INFO] - Resource hash 3212b939476797933c6480962e92413f from submission 1be2qd3 downloaded elsewhere
This is just one example of the messages I get; the hash and the submission ID are different each time.
Command
python3 -m bdfr download F:\Bulk-Download\Reddit\images --opts opts_image.yaml --search-existing --no-dupes
.yaml file
skip: [mp4, avi, mov, gif]
time: all
upvoted: true
authenticate: true
user: [me]
Environment
- OS: Windows 11
- Python version: Python 3.12.3
Logs
This is long as I downloaded about 100 images, sorry in advance 😅
(I've deleted a section in the middle so it's not too long)
[2024-05-06 15:58:43,036 - bdfr.connector - DEBUG] - Disabling the following modules:
[2024-05-06 15:58:43,036 - bdfr.connector - Level 9] - Created download filter
[2024-05-06 15:58:43,036 - bdfr.connector - Level 9] - Created time filter
[2024-05-06 15:58:43,037 - bdfr.connector - Level 9] - Created sort filter
[2024-05-06 15:58:43,065 - bdfr.connector - Level 9] - Create file name formatter
[2024-05-06 15:58:43,066 - bdfr.connector - DEBUG] - Using authenticated Reddit instance
[2024-05-06 15:58:43,516 - bdfr.oauth2 - Level 9] - Loaded OAuth2 token for authoriser
[2024-05-06 15:58:43,964 - bdfr.oauth2 - Level 9] - Written OAuth2 token from authoriser to C:\Users\Admin\AppData\Local\BDFR\bdfr\default_config.cfg
[2024-05-06 15:58:44,485 - bdfr.connector - Level 9] - Resolved user to DonOwU
[2024-05-06 15:58:44,485 - bdfr.connector - Level 9] - Created site authenticator
[2024-05-06 15:58:44,485 - bdfr.connector - Level 9] - Retrieved subreddits
[2024-05-06 15:58:44,486 - bdfr.connector - Level 9] - Retrieved multireddits
[2024-05-06 15:58:44,688 - bdfr.connector - DEBUG] - Retrieving upvoted posts of user DonOwU
[2024-05-06 15:58:44,689 - bdfr.connector - Level 9] - Retrieved user data
[2024-05-06 15:58:44,689 - bdfr.connector - Level 9] - Retrieved submissions for given links
[2024-05-06 15:58:44,816 - bdfr.downloader - INFO] - Calculating hashes for 2498 files
[2024-05-06 16:00:42,353 - bdfr.downloader - DEBUG] - Attempting to download submission 1cksdle
[2024-05-06 16:00:42,354 - bdfr.downloader - DEBUG] - Using Gallery with url https://www.reddit.com/gallery/1cksdle
[2024-05-06 16:00:48,196 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 1
[2024-05-06 16:00:48,200 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 2
[2024-05-06 16:00:48,204 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 3
[2024-05-06 16:00:48,207 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 4
[2024-05-06 16:00:48,210 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 5
[2024-05-06 16:00:48,213 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 6
[2024-05-06 16:00:48,216 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 7
[2024-05-06 16:00:48,219 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 8
[2024-05-06 16:00:48,223 - bdfr.file_name_formatter - Level 9] - Formatting filename with index 9
[2024-05-06 16:00:49,096 - bdfr.downloader - DEBUG] - Written file to F:\Bulk-Download\Reddit\images\yiff\charai1126_I love this artist (waspsalad) [fm]_1cksdle_1.png
[2024-05-06 16:00:49,101 - bdfr.downloader - DEBUG] - Hash added to master list: c7a14645582934865c4eaadc8ce221dc
[2024-05-06 16:00:50,518 - bdfr.downloader - DEBUG] - Written file to F:\Bulk-Download\Reddit\images\yiff\charai1126_I love this artist (waspsalad) [fm]_1cksdle_2.png
[2024-05-06 16:00:50,522 - bdfr.downloader - DEBUG] - Hash added to master list: 83c98b1ab5ee81f5ce44e3732740ab28
[2024-05-06 16:00:51,749 - bdfr.downloader - DEBUG] - Written file to F:\Bulk-Download\Reddit\images\yiff\charai1126_I love this artist (waspsalad) [fm]_1cksdle_3.png
[2024-05-06 16:00:51,753 - bdfr.downloader - DEBUG] - Hash added to master list: 6c488da5e32f5e3214813a2189cdde31
[2024-05-06 16:00:52,156 - bdfr.downloader - DEBUG] - Written file to F:\Bulk-Download\Reddit\images\yiff\charai1126_I love this artist (waspsalad) [fm]_1cksdle_4.jpg
(This section was deleted due to length)
[2024-05-06 16:19:02,743 - bdfr.downloader - DEBUG] - Submission 1bdo83f filtered due to URL https://i.redd.it/cr3qfwu2x2oc1.gif
[2024-05-06 16:19:02,743 - bdfr.downloader - DEBUG] - Attempting to download submission 1bdrkfe
[2024-05-06 16:19:02,744 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/b0opg9hys3oc1.jpeg
[2024-05-06 16:19:03,637 - bdfr.downloader - INFO] - Resource hash 9e702ad4cd9241cfffca0938612f63be from submission 1bdrkfe downloaded elsewhere
[2024-05-06 16:19:03,638 - bdfr.downloader - DEBUG] - Attempting to download submission 1bd5t9z
[2024-05-06 16:19:03,638 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/qv555jpndync1.jpeg
[2024-05-06 16:19:04,182 - bdfr.downloader - INFO] - Resource hash 7da50bc88c738a422fb451277cdd05c5 from submission 1bd5t9z downloaded elsewhere
[2024-05-06 16:19:04,183 - bdfr.downloader - DEBUG] - Attempting to download submission 1bd4mgc
[2024-05-06 16:19:04,183 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/4ndjxdnj4ync1.jpeg
[2024-05-06 16:19:04,616 - bdfr.downloader - INFO] - Resource hash 0f0a6d4729cde5648a29dbebf2844471 from submission 1bd4mgc downloaded elsewhere
[2024-05-06 16:19:04,616 - bdfr.downloader - DEBUG] - Attempting to download submission 1bdzgsb
[2024-05-06 16:19:04,617 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/6jc8qkyfe5oc1.jpeg
[2024-05-06 16:19:05,141 - bdfr.downloader - INFO] - Resource hash cc1a4c305888be00d7414c8da2cf5add from submission 1bdzgsb downloaded elsewhere
[2024-05-06 16:19:05,141 - bdfr.download_filter - Level 9] - Url "https://i.redd.it/flk54dkwi0oc1.gif" matched with "re.compile('.*(mp4|avi|mov|gif)$')"
[2024-05-06 16:19:05,142 - bdfr.downloader - DEBUG] - Submission 1bdg6w8 filtered due to URL https://i.redd.it/flk54dkwi0oc1.gif
[2024-05-06 16:19:05,142 - bdfr.downloader - DEBUG] - Attempting to download submission 1bdonh8
[2024-05-06 16:19:05,142 - bdfr.downloader - DEBUG] - Using Direct with url https://i.redd.it/btdl9b2j03oc1.png
[2024-05-06 16:19:05,732 - bdfr.downloader - INFO] - Resource hash 3d57a89c70593091edfda274b2bd33e0 from submission 1bdonh8 downloaded elsewhere
[2024-05-06 16:19:05,733 - root - INFO] - Program complete
Are you sure this isn't the de-dupe in action, i.e. it's saying that the file has already been downloaded to that folder and is therefore skipped? I can't test right now, but you should be able to verify by loading the URL of a skipped file and cross-checking it against what came down. If there are too many files/folders to check visually (and assuming the existing copy has a different filename), you could download the file manually and use a duplicate-file utility like dupeGuru.
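If you'd rather script the cross-check, here is a minimal sketch. It assumes the 32-character hashes in the log are MD5 digests of the file contents, which I haven't confirmed against the BDFR source. The function names `md5_of_file` and `find_files_by_md5` are made up for this example and are not part of BDFR.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_files_by_md5(root: Path, wanted_hash: str) -> list[Path]:
    """Return every file under root whose MD5 digest equals wanted_hash."""
    wanted = wanted_hash.lower()
    return [
        path
        for path in root.rglob("*")
        if path.is_file() and md5_of_file(path) == wanted
    ]


# Example call, using the folder and hash from the original post:
# find_files_by_md5(Path(r"F:\Bulk-Download\Reddit\images"),
#                   "3212b939476797933c6480962e92413f")
```

If the assumption holds, any path it returns is the copy that was already on disk when the message was logged. An empty list would suggest the hash is computed some other way.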
You're using the --search-existing option (don't, it's not good), so it's searching all of your already-downloaded files and then not writing a second file if it's an exact match. They might be files from different posts or subreddits, but are the same file nonetheless. If you don't want this behaviour, don't use --search-existing and --no-dupes.
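To illustrate the behaviour described above, here is a rough sketch of hash-based de-duplication in general. It is not BDFR's actual code: the function names are hypothetical, the use of MD5 is an assumption based on the length of the hashes in the log, and the printed message only imitates the one from the log.

```python
import hashlib
from pathlib import Path


def hash_existing_files(folder: Path) -> set[str]:
    """Hash every file already in the folder, as a stand-in for a
    'search existing files' step run before any downloads start."""
    return {
        hashlib.md5(path.read_bytes()).hexdigest()
        for path in folder.rglob("*")
        if path.is_file()
    }


def save_unless_duplicate(content: bytes, destination: Path, seen: set[str]) -> bool:
    """Write content to destination unless identical bytes were already seen.

    Returns True if the file was written, False if it was skipped.
    """
    digest = hashlib.md5(content).hexdigest()
    if digest in seen:
        # Same bytes already exist somewhere, possibly under another name.
        print(f"Resource hash {digest} downloaded elsewhere")
        return False
    destination.write_bytes(content)
    seen.add(digest)
    return True
```

The point is that the check compares file contents, not filenames or submission IDs, so a skipped file is not stored anywhere new; an identical copy is already somewhere in the download folder.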