PeskyPotato/archive-chan

small bug (DB): board ID is not a static value

baraa272 opened this issue · 9 comments

VirtualBox_windows 10_09_01_2021_16_52_53
Every time you run archive.py with the --use_db flag, the board ID changes to something else, which causes many problems.
I'd suggest adding a board NAME column to the Threads table as a quick workaround.
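
For illustration, here is a minimal sketch of one way to keep a board ID stable across runs. It is not the repo's actual schema: the `boards` table, its columns, and the `get_board_id` helper are all hypothetical names. The idea is to make the board name unique and only insert a row when that name is not there yet.

```python
import sqlite3


def get_board_id(conn, name):
    """Return a stable id for a board name, creating the row only once.

    The `boards` table and its columns are assumptions for this sketch,
    not the schema used by archive-chan.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS boards ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL UNIQUE)"
    )
    # INSERT OR IGNORE does nothing when the name already exists,
    # so repeated runs keep the id that was assigned the first time.
    conn.execute("INSERT OR IGNORE INTO boards (name) VALUES (?)", (name,))
    conn.commit()
    row = conn.execute(
        "SELECT id FROM boards WHERE name = ?", (name,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
first = get_board_id(conn, "g")
other = get_board_id(conn, "wsg")
again = get_board_id(conn, "g")  # same id as `first`
```

With a lookup like this, a thread row can keep pointing at the same board ID no matter how many times the archiver runs.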

VirtualBox_windows 10_09_01_2021_17_15_22
This is after archiving another thread.
Also:
VirtualBox_windows 10_09_01_2021_17_16_09
The archiver still downloads the whole thread again even if all of its files are already there.
I think it's related to the DB problem.

That stuff with the database is all with @LameLemon.
In my fork, I implemented "resuming unfinished downloads" and "not redownloading everything every time" without using the database at all.

Actually, I tried your fork: I downloaded a fresh copy of your archive-chan today and made sure I installed all the requirements, but well...
It didn't work at all. When I try to download a thread with the -v -p flags, it doesn't do anything; after a second or two it exits with (finished downloading), but when I check the archive folder there are no thread folders at all, so I assume it exited without grabbing anything.

The database needs a lot of work, it was purely experimental. I actually use it for the site I run which serves the threads. On this repo there's currently no logic to check if a thread is already downloaded since the --use_db flag is optional, I'll have to implement that at some point.

The database needs a lot of work, it was purely experimental. I actually use it for the site I run which serves the threads.

Actually, it works well for finding the main thread of a reply and associating it with the corresponding thread folder in the same board folder.
Here's a screenshot:
VirtualBox_windows 10_09_01_2021_21_26_15
As you can see, I can filter the main threads by the RESTO 0 value and know which main thread has the original folder in the archive.
But the "not redownloading everything every time" feature is a must: every time I need to re-archive something for new replies/images, it downloads everything again. I think this is really important, since the problem not only costs time but also wastes bandwidth.
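
The RESTO 0 filter described above could look roughly like the sketch below. The `posts` table, its columns, and both helper functions are hypothetical and only stand in for whatever the real database uses. The one thing taken from the 4chan API is the meaning of `resto`: it is 0 for the opening post and the thread number for a reply.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table for this sketch; the real schema may differ.
conn.execute(
    "CREATE TABLE posts ("
    " board TEXT NOT NULL,"
    " no INTEGER NOT NULL,"
    " resto INTEGER NOT NULL,"
    " PRIMARY KEY (board, no))"
)
conn.executemany(
    "INSERT INTO posts (board, no, resto) VALUES (?, ?, ?)",
    [("g", 100, 0), ("g", 101, 100), ("g", 102, 100), ("g", 200, 0)],
)


def main_threads(conn, board):
    """List the opening posts (resto = 0) of a board."""
    rows = conn.execute(
        "SELECT no FROM posts WHERE board = ? AND resto = 0 ORDER BY no",
        (board,),
    )
    return [row[0] for row in rows]


def thread_of(conn, board, post_no):
    """Return the thread number a post belongs to, or None if unknown."""
    row = conn.execute(
        "SELECT no, resto FROM posts WHERE board = ? AND no = ?",
        (board, post_no),
    ).fetchone()
    if row is None:
        return None
    no, resto = row
    # An opening post is its own thread; a reply points at it via resto.
    return no if resto == 0 else resto
```

The thread number returned by `thread_of` is what you would then match against the folder name in the board's archive folder.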

@baraa272
hahah I'm sorry. It was indeed broken. I just fixed it.
Since we don't have a test suite yet, sometimes new features introduce bugs.

Maybe we should create some sort of group chat so we can communicate better? What do you guys think?
This archiver is by far the one with the most potential to be big.

Thanks for the effort <3
Anyway, the good thing about this archiver is that it just werks. Sure, it needs some tweaking, but in the long run it runs just fine, and it has a small DB so you can know where the original threads are, which is very important since BASC-Archiver doesn't have this feature.

Also, it would be nice if you implemented skipping already-downloaded files here (I mean the original repo lol)

Anyway, thanks very much for all the effort from you two <3

hahah I'm sorry. It was indeed broken. I just fixed it.

Well, yeah, about that... after installing the requirements:

Capture

I created a gitter.im community for us: https://gitter.im/archive-chan/community

it would be nice if you implemented skipping already-downloaded files here (I mean the original repo lol)

If you take a look at #2, you'll see I actually wrote a short modification to the download function which does exactly that, but there are bugs in LameLemon's implementation of the media downloads which lead to data loss.
I fixed those in mine, but as I've said on our gitter chat, merging our work is not at all a straightforward task.
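
To make the "skip already downloaded files" idea concrete, here is a generic sketch. It is not the modification from #2 and not code from either fork: `download_if_missing` and the `fetch` callable are made-up names, and `fetch` stands in for whatever actually retrieves the bytes. Writing to a `.part` file first is just one assumed way to avoid treating a half-written file as finished.

```python
import os
import tempfile


def download_if_missing(path, fetch):
    """Write fetch() to path unless a non-empty file is already there.

    Returns True when a download happened and False when it was skipped.
    `fetch` is a hypothetical callable that returns the file's bytes.
    """
    if os.path.isfile(path) and os.path.getsize(path) > 0:
        return False
    # Write under a temporary name first, so an interrupted download
    # never leaves a truncated file that a later run would skip.
    tmp = path + ".part"
    with open(tmp, "wb") as handle:
        handle.write(fetch())
    os.replace(tmp, path)
    return True


calls = []


def fake_fetch():
    """Stand-in for a real HTTP request; records how often it is called."""
    calls.append(1)
    return b"image bytes"


workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "example.jpg")
first = download_if_missing(target, fake_fetch)   # downloads the file
second = download_if_missing(target, fake_fetch)  # skipped, file exists
```

A check like this only looks at the file on disk, so it works whether or not --use_db is set.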