/instagram_webscraping


Webscraping Instagram with Python


Purpose

This repo contains a couple of Jupyter notebooks for scraping Instagram posts by hashtag.

The scripts are configured to run fully online, using mybinder or a similar JupyterHub server environment. To use mybinder, click the badge below:

Binder


Authors

License

CC-BY-SA 4.0, see attached License.md


How to use this repository

Sources and prerequisites

Data for the scripts are scraped directly from Instagram via its native API. The scripts are preconfigured so that the scraped data are automatically saved to sciencedata.dk. To make the scripts fully functional, you therefore need a sciencedata.dk account and a properly configured sddk Python package.

Software

* Python 3 with packages specified in requirements.txt
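
If you run the notebooks outside mybinder, it can help to check which of the listed packages are missing from your environment. The following is a minimal sketch, not part of the repository: the helper names `parse_requirement_names` and `missing_packages` are hypothetical, and the parser only handles plain `name` or `name==version` style lines (URL and editable requirements are not supported).

```python
import re
from importlib import metadata


def parse_requirement_names(text):
    """Return the package names listed in requirements.txt-style text.

    Hypothetical helper: handles simple lines only, not URL requirements.
    """
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is the leading run of characters before any version
        # specifier, extras bracket, or environment marker.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(names):
    """Return the names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `missing_packages(parse_requirement_names(open("requirements.txt").read()))` returns a list of the packages that still need to be installed.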

Registered accounts

  1. sciencedata.dk
  2. Google (optional)
  3. GitHub (optional)

Installation


Click on the badge above and wait; mybinder will install everything you need.

Instructions

  1. Set up your sciencedata.dk account and choose a folder where you want to save your data.
  2. Launch the repository on mybinder using the badge.
  3. For full integration with Google, create Google API key and Google Service Account credentials files and upload them to your sciencedata.dk folder or to another location from which you can load them into your Python environment.
  4. Use the mybinder terminal, which has git preinstalled, to push your scripts back to GitHub.
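
Step 3 requires loading the service account credentials file into the Python environment. A minimal sketch of that step, assuming the file is a standard Google service account JSON key that has already been copied to a local path, could look like this. The function name `load_service_account_info` is hypothetical and not taken from the notebooks.

```python
import json

# Fields found in a standard Google service account key file.
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def load_service_account_info(path):
    """Read a service account JSON key file and return it as a dict.

    Hypothetical helper: raises ValueError if the file does not look
    like a service account key.
    """
    with open(path, encoding="utf-8") as handle:
        info = json.load(handle)

    missing = [field for field in REQUIRED_FIELDS if field not in info]
    if missing:
        raise ValueError("credentials file lacks fields: " + ", ".join(missing))
    if info["type"] != "service_account":
        raise ValueError("not a service account key (type=%r)" % info["type"])
    return info
```

If the google-auth package is available, the returned dictionary can be passed to `google.oauth2.service_account.Credentials.from_service_account_info`. Whether the notebooks use that package or another Google client is not stated here, so treat this as one possible approach.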