Webscraping Instagram with Python
Purpose
This repo contains a couple of jupyter notebooks to be used for scraping posts from instagram on the basis of a hashtag.
The scripts are configured to be executible fully online, using mybinder or a similar jupyter hub server environment. To use mybinder, use the badge below:
Authors
- Vojtěch Kaše , SDAM project, vojtech.kase@gmail.com
License
CC-BY-SA 4.0, see attached License.md
How to use this repository
Sources and prerequisites
Data for the scripts are scraped directly from Instagram via its native API. The script is preconfigured in a way that the scraped data are automatically saved to sciencedata.dk. Thus, to have the scripts fully functional, you must have a sciencedata.dk account and be able to properly configure the sddk python package.
Software
* Python 3 with packages specified in requirements.txt
Registered account
- sciencedata.dk
- google (optional)
- github (optional)
Installation
Click on the badge above and wait - mybinder will install all you need.
Instructions
- setup your sciencedata.dk account and choose a folder where you want to save your data
- launch the repository on mybinder using the badge
- for a full integration with google, create a Google API key & Google Service Account Credentials files and upload them to your sciencedata or elsewhere (from where you can load them to your python environment).
- use mybinder terminal with preinstalled git to get your scripts back to github