Question about source of data
Closed this issue · 2 comments
LostRuins commented
Hello, I've been sailing across the interwebs looking for alternatives ever since the Reddit APIcalypse and pushshift sold out and bought the farm.
I've noticed that your API does seem to return new results from after the APIcalypse, so I've got some questions:
- Are you performing all reddit data ingestion and storage yourself? Does the ingest ever update or is it a one-time thing?
- Does reddit clamp down or block your data ingestion?
- Are you related to pullpush.io, or are you using any of their data?
- Are you using any archived data from pushshift before they self destructed?
Thanks and keep up the good work!
ArthurHeitmann commented
- Yes, I do all the archiving myself. Each post and comment is retrieved once shortly after it was created and a second time 36 hours later, to update scores and comment counts, detect deletions and edits, and more.
- Activity on reddit is low enough and the API rate limits generous enough, so that anyone can archive all new reddit content in realtime.
- Pullpush and I are independent, but they help a bit with distributing my torrents.
- All data up to 2023-03 is from pushshift.
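For readers wondering what the two-pass retrieval described above might look like, here is a minimal in-memory sketch. It is not the actual archiver code: the class `TwoPassArchive`, its methods, and the `body` and `score` field names are all hypothetical, and the real fetching from Reddit is left out entirely. Only the 36-hour delay comes from the comment above.

```python
import heapq
from dataclasses import dataclass, field
from datetime import datetime, timedelta

RECHECK_DELAY = timedelta(hours=36)  # delay stated in the comment above


@dataclass(order=True)
class Recheck:
    due: datetime                        # only this field is used for ordering
    item_id: str = field(compare=False)


class TwoPassArchive:
    """Hypothetical in-memory archive illustrating the two-pass idea."""

    def __init__(self):
        self.items = {}    # item_id -> latest stored snapshot
        self.pending = []  # min-heap of Recheck entries, earliest due first

    def ingest_new(self, item, now):
        """First pass: store the item and schedule its second fetch."""
        self.items[item["id"]] = dict(item, deleted=False, edited=False)
        heapq.heappush(self.pending, Recheck(now + RECHECK_DELAY, item["id"]))

    def due_ids(self, now):
        """Pop and return the ids whose recheck time has arrived."""
        due = []
        while self.pending and self.pending[0].due <= now:
            due.append(heapq.heappop(self.pending).item_id)
        return due

    def apply_recheck(self, item_id, fresh):
        """Second pass: merge the refetched version (None means it is gone)."""
        stored = self.items[item_id]
        if fresh is None:
            stored["deleted"] = True
            return stored
        stored["edited"] = fresh["body"] != stored["body"]
        stored.update(score=fresh["score"], body=fresh["body"])
        return stored
```

In this sketch a caller would run `ingest_new` for each newly seen post or comment, periodically call `due_ids` with the current time, refetch those ids by whatever means it has, and pass the result to `apply_recheck`.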
LostRuins commented
Amazing, thanks. Keep it up. I'm just glad we have alternatives.