sandsmark/scp-wiki

are there any plans to update the archive?


i'm planning on training a GPT-2 instance on the entire SCP wiki & this is the only archive i've been able to find. are there any plans to update it?

yes, I updated https://github.com/sandsmark/wdotcrawl a bit and started running it again.

edit: it's probably going to take a long time (a couple of weeks). There are thousands of pages, each page has a ton of revisions, and I throttle heavily. Wikidot didn't seem to mind a 200 ms delay between requests at first, but then I started getting 500 errors, so now I wait at least 1 s between requests and back off exponentially on errors.
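For anyone curious what that kind of throttling looks like, here is a rough sketch in Python. It is not the actual wdotcrawl code: `fetch_with_backoff`, its parameters, and the `fetch` callable (assumed to return a status code and a body) are all made up for illustration, and only the 1 s minimum delay comes from the comment above. Doubling the delay and capping the retries are assumptions too.

```python
import time
from typing import Callable, Tuple


def fetch_with_backoff(
    fetch: Callable[[str], Tuple[int, str]],  # hypothetical: returns (status, body)
    url: str,
    min_delay: float = 1.0,   # minimum wait before every request, in seconds
    max_retries: int = 5,     # assumption: give up after this many retries
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Fetch url, waiting before each request and doubling the wait on 5xx."""
    delay = min_delay
    for _ in range(max_retries + 1):
        sleep(delay)  # throttle every request, including the first one
        status, body = fetch(url)
        if status < 500:
            # Anything that is not a server error is handed back as-is.
            return body
        delay *= 2  # server error: back off exponentially before retrying
    raise RuntimeError(f"giving up on {url} after {max_retries} retries")
```

With the defaults, a page that keeps returning 500 is retried after 2 s, 4 s, 8 s, and so on. Passing in `sleep` makes it easy to swap the real wait for a stub when trying this out.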