ArthurHeitmann/arctic_shift

Best way to download data for a select number of subreddits?

Closed this issue · 7 comments

I'm looking for the most efficient method to download data from specific subreddits within a given date range, including all posts and comments.

I've tried https://arctic-shift.photon-reddit.com/download-tool, manually entering the subreddit and date range each time. However, this manual process is cumbersome and error-prone. Is there a more automated and reliable approach? I'd prefer not to download the full archives because of disk space and download time.

It depends on how big the subreddit is. The download tool is too slow for the many millions of comments on the top subreddits, though it should work fine for smaller ones.

If you don't need data from 2023, you can use these dumps separated by subreddit. For more recent data, you will have to wait a couple more weeks for u/Watchful1 to finish those.

Thanks for the quick reply! It's just a couple of medium-sized subreddits, so speed is not the issue. I'm more curious whether there is an automated way to download them without having to click and type on the webpage many times.

I will check out the dumps!

You can also use the API yourself, if you want more control and confidence. That is what the website is using.

I would prefer to use the API, but I'm unsure how to format the calls so that they match how the website works. My concern is that, because of the limit of 100 results per request, I would miss some of the posts or comments for any given day. Happy to chat more via reddit or discord, and already very appreciative of the help.

My contact details are in the readme. The basic idea is that you move through the data chronologically. Each request returns posts/comments sorted by date, in ascending or descending order, depending on whether you want to start with the oldest and move towards the newest, or the other way around. You take the date of the last post/comment in the response and use it as the date parameter for the next request. Then you repeat that until you reach the end of your date range.
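
A rough Python sketch of that loop, in case it helps. The endpoint path, the parameter names (subreddit, after, before, limit, sort), the "data" key in the response and the "id" / "created_utc" fields are assumptions here, so check them against the API documentation before relying on this. The helper names fetch_page and iterate_all are made up for the example.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Dict, Iterator, List

# Assumed base URL and endpoint layout; verify against the API docs.
API_BASE = "https://arctic-shift.photon-reddit.com/api"


def fetch_page(kind: str, subreddit: str, after: int, before: int,
               limit: int = 100) -> List[Dict]:
    """Fetch one page of 'posts' or 'comments', oldest first.

    All query parameter names and the 'data' response key are assumptions.
    """
    query = urllib.parse.urlencode({
        "subreddit": subreddit,
        "after": after,    # epoch seconds
        "before": before,  # epoch seconds
        "limit": limit,
        "sort": "asc",
    })
    with urllib.request.urlopen(f"{API_BASE}/{kind}/search?{query}") as resp:
        return json.load(resp)["data"]


def iterate_all(fetch: Callable[[str, str, int, int, int], List[Dict]],
                kind: str, subreddit: str, start: int, end: int,
                limit: int = 100) -> Iterator[Dict]:
    """Walk from start to end by reusing the last item's date as the cursor."""
    cursor = start
    seen_ids = set()
    while True:
        page = fetch(kind, subreddit, cursor, end, limit)
        # Items sharing the cursor's timestamp can come back again, so
        # skip anything already yielded. This assumes 'after' is inclusive;
        # if it is exclusive, items with the same timestamp at a page
        # boundary could be missed.
        new_items = [item for item in page if item["id"] not in seen_ids]
        if not new_items:
            break  # nothing new: reached the end of the range
        for item in new_items:
            seen_ids.add(item["id"])
            yield item
        # The date of the last item becomes the next request's 'after'.
        cursor = int(new_items[-1]["created_utc"])
```

Something like `for post in iterate_all(fetch_page, "posts", "some_subreddit", start_ts, end_ts): ...` would then stream every post in the range, and the same call with "comments" would do the comments. The fetch function is passed in so you can add retries or a short sleep between requests without touching the loop. The loop would stop early if more than `limit` items shared a single timestamp, which should be rare for medium-sized subreddits.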

Makes sense, will give that a go! I tried to use the discord URL but it kept bouncing me back to the landing page.

If you still need help, I've added my discord username to the readme as well.