This project can be used to crawl and scrape data from social media websites. (As of now it only supports Twitter; other modules will be added soon.)
This project does not use the Twitter API. To run searches through the Twitter API instead, you can use Tweepy.
This script scrapes Tweets using requests to retrieve the content, BeautifulSoup4 to parse the retrieved content, and Pandas to save all data in CSV format.
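The parse-and-save half of that pipeline can be sketched as follows. This is a minimal illustration, not the project's actual code: the HTML snippet, the `tweet`/`text` class names, the `data-user` attribute, and the helper names `parse_tweets` and `tweets_to_csv` are all assumptions made up for the example, and real Twitter markup looks different.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page fetched with requests;
# the real Twitter HTML and its class names differ.
SAMPLE_HTML = """
<div class="tweet" data-user="alice"><p class="text">Hello #Twitter</p></div>
<div class="tweet" data-user="bob"><p class="text">Scraping politely</p></div>
"""


def parse_tweets(html):
    """Extract one dict per tweet-like element from an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for node in soup.find_all("div", class_="tweet"):
        text_node = node.find("p", class_="text")
        rows.append({
            "user": node.get("data-user", ""),
            "text": text_node.get_text(strip=True) if text_node else "",
        })
    return rows


def tweets_to_csv(rows):
    """Build a DataFrame from the parsed rows and return it as CSV text."""
    frame = pd.DataFrame(rows, columns=["user", "text"])
    # With no path argument, to_csv returns the CSV as a string.
    return frame.to_csv(index=False)


rows = parse_tweets(SAMPLE_HTML)
csv_text = tweets_to_csv(rows)
```

In a live run the HTML string would come from something like `requests.get(url).text`, and passing a file path to `to_csv` would write the file to disk instead of returning a string.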
Well, if you're here you already know why :P One of the major disadvantages of the Twitter Search API is that it only gives you access to Tweets written in the past 7 days. This is a major bottleneck for anyone looking for older data. Using this code you can scrape Tweets older than 7 days.
$ git clone https://github.com/narcheady/SocialScraper.git
$ cd SocialScraper/
$ pip install -r requirements.txt
You can search for tweets containing specific words or hashtags within a date range.
searchWord = "#Twitter"
searchFrom = "2020-01-01"
searchUntil = "2020-09-19"
Using these inputs as parameters, the script scrapes the matching tweets into a DataFrame, which is then saved as a CSV file.
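One plausible way to turn those three inputs into a search request is sketched below. It assumes the public `https://twitter.com/search` page together with Twitter's `since:` and `until:` search operators; the function name `build_search_url` is hypothetical, and the project's script may build its request differently.

```python
from datetime import date
from urllib.parse import urlencode


def build_search_url(search_word, search_from, search_until):
    """Combine a term and an ISO date range into a Twitter search URL."""
    # fromisoformat raises ValueError for anything that is not YYYY-MM-DD.
    start = date.fromisoformat(search_from)
    end = date.fromisoformat(search_until)
    if start > end:
        raise ValueError("searchFrom must not be later than searchUntil")
    query = f"{search_word} since:{start.isoformat()} until:{end.isoformat()}"
    # urlencode percent-encodes "#" and ":" and turns spaces into "+".
    return "https://twitter.com/search?" + urlencode({"q": query})


url = build_search_url("#Twitter", "2020-01-01", "2020-09-19")
```

Validating the dates before building the URL means a typo such as "2020-13-01" fails immediately rather than producing an empty CSV.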