PeskyPotato/archive-chan

{BUG} g ..... or any board name is invalid

baraa272 opened this issue · 9 comments

[Screenshot]
As you can see, that's the issue. Also, sometimes it tells me "reply" is undefined, and the script won't save the HTML because of it.
So please, can you look at this?

You were using the wrong format, @baraa272. You shouldn't pass the whole boards.4chan.org/g/ URL when trying to download all of a board's threads. Pass just the string between the slashes, e.g., b or pol.

The last command you issued is the only one that's correct.
python archiver.py g -p -v will save every active thread.
That "Invalid request: g" is just an accidental message. If you wait a bit it'll start downloading the threads.

$ python archiver.py g -p -v --use_db
Invalid request: g
Downloading thread: 79587404
Downloading thread: 79583810
Downloading thread: 76759434
Downloading thread: 79586994
Downloading image: https://i.4cdn.org/g/1610118225680.jpg g.jpg
Downloading post: 79587404 posted on 01/08/21(Fri)10:03:45
Downloading image: https://i.4cdn.org/g/1610095625707.jpg 21-11-2020.1.jpg
Downloading reply: 79587552 replied on 01/08/21(Fri)10:14:56
Downloading reply: 79587570 replied on 01/08/21(Fri)10:16:10
Downloading reply: 79587592 replied on 01/08/21(Fri)10:17:35

Ok, I think I found the source of the confusion:
[Screenshot]

We need to remove that else statement on line 134.
It belongs to the previous if, so whenever that if evaluates to false the else runs, and that is not the desired behavior.

But it has got nothing to do with the "bug" you mentioned.
It's just a spurious print statement introduced by the latest pull request.
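
To illustrate the kind of control flow described here, a minimal sketch follows. It is not the actual code from archiver.py: the function names, the regular expressions, and every message except "Invalid request" are assumptions made up for this example.

```python
import re

# Hypothetical patterns, assumed for illustration only.
THREAD_URL = re.compile(r"^https?://boards\.(?:4chan|4channel)\.org/(\w+)/thread/(\d+)")
BOARD_NAME = re.compile(r"^\w+$")


def handle_buggy(request):
    """Collect the messages a request would trigger with the stray else."""
    messages = []
    if THREAD_URL.match(request):
        messages.append(f"thread: {request}")
    else:
        # Runs for every non-thread input, including valid board names.
        messages.append(f"Invalid request: {request}")
    if BOARD_NAME.match(request):
        messages.append(f"board: {request}")
    return messages


def handle_fixed(request):
    """Same checks as one if/elif/else chain, so the error only fires
    when nothing matched."""
    if THREAD_URL.match(request):
        return [f"thread: {request}"]
    elif BOARD_NAME.match(request):
        return [f"board: {request}"]
    else:
        return [f"Invalid request: {request}"]


print(handle_buggy("g"))  # ['Invalid request: g', 'board: g']
print(handle_fixed("g"))  # ['board: g']
```

Simply deleting the else also silences the message. The chained version is one alternative that still reports inputs matching neither pattern.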

@LameLemon
Since you're going to edit this part of the code, maybe you could find some inspiration in a commit of mine that dealt with this: cardoso-neto@f318a58

My idea was to make the board URL extraction happen inside the 4chan API class, since it is specific to 4chan.
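
As a rough sketch of that idea (not the code from the commit linked above), a 4chan-specific class could own both the endpoint and the translation from a board's thread listing into thread URLs. The class and method names are hypothetical. The sketch assumes 4chan's public read-only endpoint a.4cdn.org/{board}/threads.json, which returns a list of pages, each holding a "threads" list whose entries carry a "no" field.

```python
import json
from urllib.request import urlopen


class FourChanAPI:
    """Hypothetical wrapper; all names here are made up for this example."""

    threads_endpoint = "https://a.4cdn.org/{board}/threads.json"
    thread_url = "https://boards.4chan.org/{board}/thread/{no}"

    @classmethod
    def fetch_pages(cls, board):
        """Download the board's thread listing (performs a network request)."""
        with urlopen(cls.threads_endpoint.format(board=board)) as response:
            return json.load(response)

    @classmethod
    def thread_urls(cls, board, pages):
        """Turn an already-downloaded listing into thread URLs."""
        return [
            cls.thread_url.format(board=board, no=thread["no"])
            for page in pages
            for thread in page["threads"]
        ]


# Offline example with a hand-written listing in the assumed shape.
sample = [{"page": 1, "threads": [{"no": 79587404}, {"no": 79583810}]}]
print(FourChanAPI.thread_urls("g", sample))
```

Keeping the download and the URL building in separate methods means the second one can be exercised without touching the network.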

Can't wait for the new commit to be pushed as soon as possible.
Also, thanks for the effort, man, and thanks to the author for this hassle-free tool.
I have a question, though:
how to open chan.db file and view the DB???

[Screenshot]
Well, I downloaded your fork and tried it after running pip install superjson, but I still can't download boards; I get the error message shown in the picture.
Also, thanks again for all the effort <3

@baraa272

how to open chan.db file and view the DB???

The database is a SQLite 3 file. You can open it with a tool such as sqlitebrowser, which gives you a GUI to interact with the database; there's also a CLI tool.
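
If you prefer to stay in Python, the standard-library sqlite3 module can read the file too. The sketch below only lists the tables and their row counts, since the schema of chan.db is not described in this thread. The helper name is made up, and the usage at the bottom assumes chan.db sits in the current directory.

```python
import os
import sqlite3


def summarize(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    connection = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in connection.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself; quote them as identifiers.
        return {
            table: connection.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            for table in tables
        }
    finally:
        connection.close()


# sqlite3.connect() would create a missing file, so check for it first.
if os.path.exists("chan.db"):
    print(summarize("chan.db"))
```

Once you know the table names, an ordinary SELECT on the same connection will show the rows.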

@cardoso-neto I'll work on moving over the board extraction today, thank you very much!

This issue has been addressed in efd3f22.

The code base is a little cleaner now.

: )

Can't wait to use it. Also, thanks for all the effort.

@baraa272
My fork is not really intended for the general public to use yet.
However, the master branch is working, and you just have to pass it the flag --new_logic.
I've been refactoring the code non-stop, as it's still alpha software, so things will change quickly and without warning.

Sneak peek:

$ python archiver.py logs/list.txt --verbose --preserve_media --path test --new_logic

Load from 'test/w/2131136/thread.json' ...
    Complete! Elapse 0.001323 sec.

Load from 'test/w/2180395/thread.json' ...
    Complete! Elapse 0.001179 sec.

Load from 'test/w/2180136/thread.json' ...
    Complete! Elapse 0.004443 sec.
All available media has been downloaded.
All available media has been downloaded.
All available media has been downloaded.
Time elapsed: 6.3776s

logs/list.txt

https://boards.4channel.org/w/thread/2131136/miss-kobayashis-dragon-maid
https://boards.4channel.org/w/thread/2180395/ector-thread-requests-sharing-no-anime-girl-as-op
https://boards.4channel.org/w/thread/2180136/new-desktop-thread
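
Judging from the log above, each thread's JSON ends up under <path>/<board>/<thread id>/thread.json. A minimal sketch of how the lines of such a list file could be mapped to those locations follows. The function name and the regular expression are assumptions for illustration, not the tool's actual parsing code.

```python
import re
from pathlib import PurePosixPath

# Hypothetical pattern covering both the 4chan.org and 4channel.org hosts.
THREAD_URL = re.compile(
    r"^https?://boards\.(?:4chan|4channel)\.org/(?P<board>\w+)/thread/(?P<thread_id>\d+)"
)


def thread_json_path(url, base="test"):
    """Map a thread URL to base/board/thread_id/thread.json.

    Returns None when the input does not look like a thread URL.
    """
    match = THREAD_URL.match(url.strip())
    if match is None:
        return None
    return str(PurePosixPath(base, match["board"], match["thread_id"], "thread.json"))


lines = [
    "https://boards.4channel.org/w/thread/2131136/miss-kobayashis-dragon-maid",
    "https://boards.4channel.org/w/thread/2180136/new-desktop-thread",
]
for line in lines:
    print(thread_json_path(line))
```

The trailing slug after the thread number is ignored, so the same thread maps to the same directory whether or not the URL includes it.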