DarshanDeshpande/Scrapera

Reddit posts scraper

pratik-choudhari opened this issue · 3 comments

I would like to contribute a program that scrapes the Reddit posts returned when a specific topic is searched.
The following information will be recorded for each post:

  • number of upvotes
  • number of comments
  • title
  • author
  • link
  • subreddit name
  • isSponsored flag
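
As a rough illustration of the record described above, here is a minimal Python sketch. It is not the contributed scraper. The `PostRecord` class and `parse_post` helper are hypothetical names, and the input keys (`score`, `num_comments`, `permalink`, `promoted`, and so on) are assumptions about what a Reddit JSON post object contains; the endpoints the actual program uses may name them differently.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class PostRecord:
    """One scraped post, mirroring the fields listed in this issue."""
    upvotes: int
    num_comments: int
    title: str
    author: str
    link: str
    subreddit: str
    is_sponsored: bool


def parse_post(raw: Dict[str, Any]) -> PostRecord:
    """Build a PostRecord from one raw post dict.

    The key names read from `raw` are assumptions, not a confirmed schema.
    Missing keys fall back to neutral defaults instead of raising.
    """
    permalink = raw.get("permalink", "")
    return PostRecord(
        upvotes=int(raw.get("score", 0)),
        num_comments=int(raw.get("num_comments", 0)),
        title=str(raw.get("title", "")),
        author=str(raw.get("author", "")),
        # Assumed: permalinks are site-relative paths, so prefix the host.
        link="https://www.reddit.com" + permalink if permalink else "",
        subreddit=str(raw.get("subreddit", "")),
        is_sponsored=bool(raw.get("promoted", False)),
    )
```

Keeping the parsing in one small function means that only `parse_post` needs to change if the endpoint's field names turn out to differ.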

Program will:

  • make use of Reddit endpoints
  • support explicit proxies
  • allow a cap on the maximum number of posts to scrape
  • allow a sleep interval between requests to be specified
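
The options above (proxies, post cap, sleep interval) could fit together roughly as in the sketch below. It is an illustration under stated assumptions rather than the submitted implementation: `SEARCH_URL`, `fetch_page`, and `scrape_posts` are hypothetical names, and the sketch assumes a public JSON search endpoint that returns a listing with `data.children` and a `data.after` pagination cursor. The endpoints the actual scraper calls may differ.

```python
import json
import time
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Assumed endpoint; the real scraper may call a different one.
SEARCH_URL = "https://www.reddit.com/search.json"


def fetch_page(query: str, after: Optional[str] = None,
               proxies: Optional[Dict[str, str]] = None) -> Dict[str, Any]:
    """Fetch one page of search results, optionally through explicit proxies."""
    params = {"q": query, "limit": 25}
    if after:
        params["after"] = after
    url = SEARCH_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "scraper-sketch/0.1"})
    # ProxyHandler(None) falls back to the environment's proxy settings.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler(proxies))
    with opener.open(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_posts(query: str, max_posts: int = 50, sleep_interval: float = 1.0,
                 proxies: Optional[Dict[str, str]] = None,
                 fetch: Callable[..., Dict[str, Any]] = fetch_page
                 ) -> List[Dict[str, Any]]:
    """Collect up to max_posts raw post dicts, sleeping between requests."""
    posts: List[Dict[str, Any]] = []
    after: Optional[str] = None
    while len(posts) < max_posts:
        page = fetch(query, after=after, proxies=proxies)
        data = page.get("data", {})
        children = data.get("children", [])
        if not children:
            break  # no more results
        for child in children:
            posts.append(child.get("data", {}))
            if len(posts) >= max_posts:
                break  # cap reached
        after = data.get("after")
        if not after or len(posts) >= max_posts:
            break  # last page, or cap reached: skip the final sleep
        time.sleep(sleep_interval)
    return posts
```

A call might look like `scrape_posts("python", max_posts=100, sleep_interval=2.0, proxies={"https": "http://127.0.0.1:8080"})`, where the proxy address is only a placeholder. Passing `fetch` as a parameter keeps the pagination loop testable without network access.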

Could you add comment scraping as well? That would be extremely helpful. The Reddit API documentation does support it. If that doesn't work, you could try sending a normal GET request to get the form-token and then manually sending POST requests with the token to get more data. If it works, just create a pull request and I will be glad to merge it 👍

@DarshanDeshpande The scraper I have built doesn't require logging in to Reddit, because it intercepts the API calls made while the page loads for a user who isn't logged in. The official Reddit API, on the other hand, requires user authentication. Should I include both in the PR?

Scraper merged. Closing this issue.