Build thread-reply graph
Closed this issue · 2 comments
qpfiffer commented
Each scraped webm needs to have its reply graph taken. Threads should also probably be monitored until they 404 so that later replies are found. Monitored threads should then be checked every time the downloader runs, and only removed once they 404.
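A minimal sketch of the "monitor until 404" loop, assuming a hypothetical `fetch_thread(thread_id)` callable that returns an HTTP status code plus the thread's posts. The `monitored` dict and `refresh_monitored` name are illustrative, not the project's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical fetcher signature: thread_id -> (HTTP status, list of post dicts).
Fetcher = Callable[[int], Tuple[int, List[dict]]]


def refresh_monitored(monitored: Dict[int, List[dict]], fetch_thread: Fetcher) -> List[int]:
    """Re-check every monitored thread once; drop the ones that 404.

    Meant to be called on every downloader run. Returns the IDs of the
    threads that were removed on this run.
    """
    removed = []
    for thread_id in list(monitored):  # copy the keys so we can delete while iterating
        status, posts = fetch_thread(thread_id)
        if status == 404:
            # Thread is gone: stop monitoring it.
            del monitored[thread_id]
            removed.append(thread_id)
        elif status == 200:
            # Thread is still alive: keep the latest posts so new replies get parsed.
            monitored[thread_id] = posts
        # Any other status (timeouts, 5xx) leaves the thread monitored for the next run.
    return removed
```

Only a 404 removes a thread, so a transient error never drops a thread early.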
qpfiffer commented
Create new models:
- `thread` model: simply a list of foreign keys to `post` objects.
- `post` model:
  - Back-reference to the thread
  - Any post text
- The `webm` model needs to be modified to have a foreign key to a thread.
- The `webm_alias` model needs to be modified to have a foreign key to a thread.
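One way to express these models, sketched with the standard-library `sqlite3` module. The table and column names are assumptions, not the project's real schema. In a relational store the thread's "list of foreign keys to posts" is most naturally the `thread_id` back-reference on `post`, so the sketch derives a thread's posts with a query instead of storing a list.

```python
import sqlite3

# Illustrative schema only: table and column names are assumptions.
SCHEMA = """
CREATE TABLE thread (
    id INTEGER PRIMARY KEY                              -- thread number
);
CREATE TABLE post (
    id INTEGER PRIMARY KEY,                             -- post number
    thread_id INTEGER NOT NULL REFERENCES thread(id),   -- back-reference to the thread
    body TEXT                                           -- any post text
);
CREATE TABLE webm (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES thread(id)             -- new FK to the thread
);
CREATE TABLE webm_alias (
    id INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES thread(id)             -- new FK to the thread
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
    conn.executescript(SCHEMA)
    return conn


def posts_in_thread(conn: sqlite3.Connection, thread_id: int) -> list:
    """The thread's list of posts, derived from the post.thread_id back-reference."""
    rows = conn.execute(
        "SELECT id FROM post WHERE thread_id = ? ORDER BY id", (thread_id,)
    )
    return [r[0] for r in rows]
```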
Replies can be extracted by looking for strings like the following in post bodies:
<a href=\"#p119901356\" class=\"quotelink\">>>119901356</a>
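A minimal extraction sketch using Python's `re` module. It assumes the post body has already been JSON-decoded (so the quotes are plain `"` rather than `\"`) and that the anchor attributes appear in exactly the order shown above; `extract_replies` is a hypothetical helper name.

```python
import re
from typing import List

# Matches quotelink anchors like the one above, e.g.
# <a href="#p119901356" class="quotelink">>>119901356</a>
# Assumes a JSON-decoded body and this exact attribute order.
QUOTELINK_RE = re.compile(r'<a href="#p(\d+)" class="quotelink">')


def extract_replies(body: str) -> List[int]:
    """Return the post numbers quoted in a post body, in order of appearance."""
    return [int(n) for n in QUOTELINK_RE.findall(body)]
```

Matching on the `#p…` href rather than the `>>` link text avoids depending on how the text is escaped.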
Here's how this will probably work:
- During the thread-parsing step, add any post that references a hit to a `thread_hits` array of some sort.
- Any post that references anything in the `thread_hits` list is also added to the `thread_hits` list.
- Stick all that shit in the database.
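The two `thread_hits` rules can be sketched as a single pass over the thread. This assumes posts arrive in chronological order and can only quote earlier posts, so one pass catches whole reply chains. The `"no"` (post number) and `"com"` (HTML body) field names are assumptions about the scraped post dicts.

```python
import re
from typing import Iterable, List, Set

QUOTELINK_RE = re.compile(r'<a href="#p(\d+)" class="quotelink">')


def collect_thread_hits(posts: Iterable[dict], hits: Set[int]) -> List[int]:
    """Collect posts that reply to a hit, directly or through a chain of replies.

    posts: post dicts in chronological order, each with "no" and an optional
           "com" body (field names are assumptions).
    hits:  post numbers of the posts carrying scraped webms.
    """
    thread_hits: List[int] = []
    relevant = set(hits)  # the hits plus everything already added to thread_hits
    for post in posts:
        quoted = {int(n) for n in QUOTELINK_RE.findall(post.get("com", ""))}
        if quoted & relevant:
            # Rule 1 (quotes a hit) and rule 2 (quotes a thread_hits post) in one check.
            thread_hits.append(post["no"])
            relevant.add(post["no"])
    return thread_hits
```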
qpfiffer commented
A good chunk of this is implemented, but I still need to do the following:
- Check for new posts on things we've skipped
- Actually build the reply graph and display it on relevant pages
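For the remaining "build the reply graph" item, a sketch of one possible representation: a dict mapping each post number to the posts that quote it, plus a hypothetical `render_tree` helper that turns one webm post's replies into indented lines for a page. As above, the `"no"`/`"com"` fields and the chronological-quoting assumption (which rules out cycles) are assumptions, not the project's code.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

QUOTELINK_RE = re.compile(r'<a href="#p(\d+)" class="quotelink">')


def build_reply_graph(posts: Iterable[dict]) -> Dict[int, List[int]]:
    """Map each post number to the post numbers that quote it (edges point to replies)."""
    graph: Dict[int, List[int]] = defaultdict(list)
    for post in posts:
        # A set, so a post quoting the same parent twice yields one edge.
        for parent in sorted({int(n) for n in QUOTELINK_RE.findall(post.get("com", ""))}):
            graph[parent].append(post["no"])
    return dict(graph)


def render_tree(graph: Dict[int, List[int]], root: int, depth: int = 0) -> List[str]:
    """Render the replies under `root` as indented >>post lines, depth-first."""
    lines = ["  " * depth + f">>{root}"]
    for child in graph.get(root, []):
        lines.extend(render_tree(graph, child, depth + 1))
    return lines
```

A post that quotes several parents appears under each of them, which matches how quotelinks behave in a thread.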