qpfiffer/mzbh

Build thread-reply graph

Closed this issue · 2 comments

Each scraped webm needs to have its reply graph taken. Threads should also probably be monitored until they 404 to find replies. Monitored threads should then be checked every time the downloader runs, and only removed when they 404.
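A rough sketch of that monitoring rule, in Python for illustration only (not the project's actual code). The names `check_monitored_threads` and `fetch_status` are hypothetical; `fetch_status` stands in for whatever the downloader uses to request a thread and report its HTTP status.

```python
from typing import Callable, Iterable, List


def check_monitored_threads(
    monitored: Iterable[int],
    fetch_status: Callable[[int], int],
) -> List[int]:
    """Return the thread ids that should stay monitored after this run.

    `fetch_status` is a hypothetical callback that returns the HTTP status
    code for a thread id.
    """
    still_alive = []
    for thread_id in monitored:
        status = fetch_status(thread_id)
        if status == 404:
            # The thread is gone, so stop monitoring it.
            continue
        # Any other status (200, a transient 5xx, ...) keeps it monitored.
        still_alive.append(thread_id)
    return still_alive
```

Only a 404 drops a thread here; a transient error keeps it on the list so it gets retried on the next downloader run.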

Create new models:

  • thread model:
    • Simply a list of foreign keys to post objects.
  • post model:
    • Back-references to the thread
    • Any post-text
  • webm model needs to be modified to have a foreign key to a thread.
  • webm_alias model needs to be modified to have a foreign key to a thread.
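One possible shape for these models, sketched in Python for illustration. Every field name here (`post_id`, `thread_id`, `post_ids`, `body`, `filename`) is an assumption, not the project's real schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Post:
    post_id: int
    thread_id: int  # back-reference to the owning thread
    body: str = ""  # any post text


@dataclass
class Thread:
    thread_id: int
    # Simply a list of foreign keys to post objects.
    post_ids: List[int] = field(default_factory=list)


@dataclass
class Webm:
    filename: str
    thread_id: Optional[int] = None  # foreign key to a thread


@dataclass
class WebmAlias:
    filename: str
    thread_id: Optional[int] = None  # foreign key to a thread


def add_post(thread: Thread, post: Post) -> None:
    """Attach a post to its thread, keeping both references consistent."""
    if post.thread_id != thread.thread_id:
        raise ValueError("post belongs to a different thread")
    thread.post_ids.append(post.post_id)
```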

Replies can be extracted by looking for strings like the following in post bodies:

<a href=\"#p119901356\" class=\"quotelink\">&gt;&gt;119901356</a>
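A minimal sketch of that extraction using a regular expression, in Python for illustration. The names `QUOTELINK_RE` and `extract_reply_targets` are hypothetical. The pattern accepts the quotes either backslash-escaped (as in the sample above) or plain, on the assumption that bodies might be seen in both forms.

```python
import re
from typing import List

# Matches in-thread quote links such as
#   <a href="#p119901356" class="quotelink">
# with or without a backslash in front of each double quote.
QUOTELINK_RE = re.compile(r'<a href=\\?"#p(\d+)\\?" class=\\?"quotelink\\?">')


def extract_reply_targets(body: str) -> List[int]:
    """Return the post ids quoted in a post body, in order of appearance."""
    return [int(post_id) for post_id in QUOTELINK_RE.findall(body)]
```

Links whose `href` does not start with `#p` are ignored by this pattern, so only replies within the same thread are picked up.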

Here's how this will probably work:

  1. During the thread parsing step, for any post that references a hit, add it to a thread_hits array of some sort.
  2. Any post that references anything in the thread_hits list is added to the thread_hits list.
  3. Stick all that shit in the database.
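Steps 1 and 2 amount to following reply links outward from the hits until nothing new turns up. A sketch in Python, for illustration only: `collect_thread_hits` and the `replies_to` mapping (post id to the list of post ids it quotes) are hypothetical names, and the database step is left out.

```python
from typing import Dict, Iterable, List, Set


def collect_thread_hits(
    replies_to: Dict[int, List[int]],
    hit_ids: Iterable[int],
) -> Set[int]:
    """Return every post that replies, directly or indirectly, to a hit."""
    hits = set(hit_ids)
    thread_hits: Set[int] = set()
    changed = True
    # Repeat until a full pass adds nothing, so chains of replies are followed.
    while changed:
        changed = False
        for post_id, targets in replies_to.items():
            if post_id in hits or post_id in thread_hits:
                continue
            # Step 1: the post references a hit.
            # Step 2: the post references something already in thread_hits.
            if any(t in hits or t in thread_hits for t in targets):
                thread_hits.add(post_id)
                changed = True
    return thread_hits
```

For example, with post 100 as the only hit, 101 replying to 100, 102 replying to 101, and 103 replying to an unrelated post, the result contains 101 and 102 but not 103.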

A good chunk of this is implemented; still need to do the following:

  • Check for new posts on things we've skipped
  • Actually build the reply graph and display it on relevant pages