Scraping Instagram

This project automates updating a store's recent media information from its Instagram posts, for use on a web page.

Overview
Dependencies
Installation
Requirements
Configuration
Running
Comments
Features

Dependencies

Node.js and NPM (supported versions: 10.x.x)
MySQL
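
The version requirement above could be checked at startup with something like the following. This is a minimal sketch, not part of the project: the helper name isSupportedNode is hypothetical, and it assumes "10.x.x" means any release whose major version is 10.

```javascript
// Hypothetical helper (not from the project): returns true when a Node.js
// version string such as "v10.24.1" or "10.24.1" has major version 10.
function isSupportedNode(version) {
  const match = /^v?(\d+)\./.exec(String(version));
  return match !== null && Number(match[1]) === 10;
}

// process.version is the version string of the running interpreter.
if (!isSupportedNode(process.version)) {
  console.warn(
    `Node.js ${process.version} detected; this project lists 10.x.x as supported.`
  );
}
```

The check only warns instead of exiting, so it does not block anyone who chooses to try a newer runtime.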

Installation

If you install this project on AWS EC2 (for example), you need to follow these steps:

Git

  sudo apt-get install git-all

Chromium

From the project root, go to ./node_modules/puppeteer

$ cd ./node_modules/puppeteer

Install all of its dependencies

$ npm run install

If necessary, install the Debian packages required to run the browser (Chromium)

$ sudo apt-get install gconf-service libasound2 libatk1.0-0 libc6 libcairo2 libcups2 libdbus-1-3 libgbm-dev libexpat1 libfontconfig1 libgcc1 libgconf-2-4 libgdk-pixbuf2.0-0 libglib2.0-0 libgtk-3-0 libnspr4 libpango-1.0-0 libpangocairo-1.0-0 libstdc++6 libx11-6 libx11-xcb1 libxcb1 libxcomposite1 libxcursor1 libxdamage1 libxext6 libxfixes3 libxi6 libxrandr2 libxrender1 libxss1 libxtst6 ca-certificates fonts-liberation libappindicator1 libnss3 lsb-release xdg-utils wget
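
Once Chromium and its system libraries are installed, Puppeteer is usually launched headless on a server. The sketch below only builds the options object. The function name buildLaunchOptions and its runningAsRoot parameter are illustrative assumptions, and this README does not say whether the project passes these flags. --no-sandbox and --disable-setuid-sandbox are Chromium command-line switches that are often needed when the browser runs as root; they reduce isolation, so they are best limited to trusted pages.

```javascript
// Hypothetical helper (not from the project): builds an options object
// suitable for puppeteer.launch() on a headless Linux server.
function buildLaunchOptions(runningAsRoot) {
  const args = [];
  if (runningAsRoot) {
    // Chromium refuses to start its sandbox as root, so disable it here.
    args.push('--no-sandbox', '--disable-setuid-sandbox');
  }
  return { headless: true, args };
}

// Example use (requires the puppeteer package to be installed):
//   const puppeteer = require('puppeteer');
//   const browser = await puppeteer.launch(buildLaunchOptions(true));

// process.getuid() exists on POSIX platforms only; uid 0 is root.
const isRoot = typeof process.getuid === 'function' && process.getuid() === 0;
console.log(buildLaunchOptions(isRoot));
```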

Install PM2

Option one: curl method

apt update && apt install sudo curl && curl -sL https://raw.githubusercontent.com/Unitech/pm2/master/packager/setup.deb.sh | sudo -E bash -

Option two: yarn or npm

npm install pm2 -g

Install PM2 autocompletion

pm2 completion install

Update PM2

npm install pm2 -g && pm2 update

Requirements

The Instagram profile needs to be public.
You need to create a database and include its connection details in the .env configuration.

Configuration

Put the credentials of the Instagram account used to log in and scrape post content in .env, following .env.example.
If you deploy on Heroku, add the buildpack https://github.com/jontewks/puppeteer-heroku-buildpack
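
A missing .env value tends to surface late, as a failed login or database connection. A small check at startup could report it earlier. This is a sketch under stated assumptions: the variable names below are hypothetical placeholders (the real names are the ones in .env.example), and missingEnvVars is an illustrative helper, not a function from the project.

```javascript
// Hypothetical variable names; replace them with those from .env.example.
const REQUIRED_VARS = [
  'INSTAGRAM_USERNAME',
  'INSTAGRAM_PASSWORD',
  'DB_HOST',
  'DB_USER',
  'DB_PASSWORD',
  'DB_NAME',
];

// Returns the names from `required` that are missing or blank in `env`.
function missingEnvVars(env, required) {
  return required.filter(
    (name) => env[name] === undefined || String(env[name]).trim() === ''
  );
}

const missing = missingEnvVars(process.env, REQUIRED_VARS);
if (missing.length > 0) {
  console.error(`Missing configuration values: ${missing.join(', ')}`);
}
```

The helper takes the environment as a parameter, so it can be exercised with a plain object instead of the real process.env.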

Running

Install all dependencies with npm i
Run:

$ npm run dev

Comments

If you want to run this project on Heroku, watch out for this problem: on a free dyno you get wrong results. A free Heroku dyno hibernates and stops your clock process, so the routine that scrapes the Instagram page does not run.
If the build does not run on AWS EC2, execute these commands, which create and enable a 1 GB swap file:

$ sudo /bin/dd if=/dev/zero of=/var/swap.1 bs=1M count=1024

$ sudo /sbin/mkswap /var/swap.1

$ sudo /sbin/swapon /var/swap.1

mugarate12/instagram_scraping