Instamancer

Scrape Instagram's API with Puppeteer.


Instamancer is a new type of scraping tool that leverages Puppeteer's ability to intercept requests made by a webpage to an API.

Read more about how Instamancer works here.
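
The interception idea can be sketched without any Instagram specifics. The snippet below is a minimal illustration and not Instamancer's actual implementation: `PageLike` and `ResponseLike` are hypothetical types modelled on the parts of Puppeteer's page and response objects used here (`page.on("response", ...)`, `response.url()`, `response.json()`), and the URL fragment passed in is whatever identifies the API calls of interest.

```typescript
// Hypothetical minimal shapes, modelled on Puppeteer's page and response
// objects. They are declared locally so the sketch has no dependencies.
interface ResponseLike {
    url(): string;
    json(): Promise<unknown>;
}

interface PageLike {
    on(event: "response", handler: (response: ResponseLike) => void): unknown;
}

// Listen to every response the page receives and pass the parsed JSON body
// of those whose URL contains `urlFragment` to `onData`.
function interceptApiResponses(
    page: PageLike,
    urlFragment: string,
    onData: (data: unknown) => void,
): void {
    page.on("response", (response) => {
        if (response.url().indexOf(urlFragment) === -1) {
            return; // Not an API call of interest (images, scripts, ...)
        }
        response
            .json()
            .then(onData)
            .catch(() => {
                // Ignore bodies that cannot be parsed as JSON.
            });
    });
}
```

With Puppeteer itself, the `page` argument would be the object returned by `browser.newPage()`, and the listener would normally be attached before navigating so that early responses are not missed.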

Features

Scrape hashtags, locations and users
Output JSON, CSV
Download images, albums, and videos
Batch scraping
API response validation

Data

Metadata that Instamancer is able to gather from posts:

Text
Timestamps
Tagged users
Accessibility captions
Like counts
Comment counts
Images (Thumbnails, Dimensions, URLs)
Videos (URL, View count, Duration)
Comments (Timestamp, Text, Like count, User)
User (Username, Full name, Profile picture, Profile privacy)
Location (Name, Street, Zip code, City, Region, Country)

Install

Linux

See Puppeteer troubleshooting

Enable user namespace cloning:

sudo sysctl -w kernel.unprivileged_userns_clone=1

Or run without a sandbox:

# WARNING: unsafe
export NO_SANDBOX=true

Without downloading Chromium

To install Instamancer without downloading Chromium, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installation:

export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true

From this repository

Requires TypeScript

git clone https://github.com/ScriptSmith/instamancer.git
cd instamancer
npm install
npm install -g

From NPM

npm install -g instamancer

If you install globally as root, use the following command so that the Puppeteer dependency is installed correctly:

sudo npm install -g instamancer --unsafe-perm=true

From NPX

npx instamancer

Usage

Command Line

$ instamancer
Usage: instamancer <command> [options]

Commands:
  instamancer hashtag [id]       Scrape a hashtag
  instamancer location [id]      Scrape a location
  instamancer user [id]          Scrape a user
  instamancer post [ids]         Scrape a comma-separated list of posts
  instamancer batch [batchfile]  Read newline-separated arguments from a file

Options:
  --help                  Show help                                    [boolean]
  --version               Show version number                          [boolean]
  --count, -c             Number of posts to download. 0 to download all
                                                                    [default: 0]
  --visible               Show browser on the screen            [default: false]
  --download, -d          Save images and videos from posts
                                                      [boolean] [default: false]
  --graft, -g             Enable grafting              [boolean] [default: true]
  --full                  Get the full list of posts and their details from the
                          API and web page            [boolean] [default: false]
  --video                 Download videos. Only works in full mode
                                                      [boolean] [default: false]
  --silent                Disable progress output     [boolean] [default: false]
  --strict                Throw an error if types from Instagram API have been
                          changed                     [boolean] [default: false]  
  --sync                  Synchronously download files between API requests
                                                      [boolean] [default: false]
  --threads, -k           The number of parallel download / upload threads
                                                           [number] [default: 4]
  --waitDownload, -w      When true, media will only download once scraping is
                          finished                    [boolean] [default: false]
  --filename, --file, -f  Name of the output file              [default: "[id]"]
  --filetype, --type, -t  Type of output file
                              [choices: "csv", "json", "both"] [default: "json"]
  --downdir               Directory / Container to save media
                                          [default: "downloads/[endpoint]/[id]"]
  --mediaPath, --mp       Store the paths of downloaded media in the
                          '_mediaPath' key            [boolean] [default: false]
  --logging               Level of logger
                   [choices: "error", "none", "info", "debug"] [default: "none"]
  --logfile               Name of the log file      [default: "instamancer.log"]
  --browser               Location of the browser. Defaults to the copy
                          downloaded at installation
  --swift                 Upload media to openstack's swift object storage
                          rather than saving to disk  [boolean] [default: false]

Examples:
  instamancer hashtag instagood -d          Download all the available posts,
                                            and their thumbnails from #instagood
  instamancer location 644269022 --count    Download 200 posts tagged as being
  200                                       at the Arc Du Triomphe
  instamancer user arianagrande             Download Ariana Grande's posts to a
  --filetype=csv --logging=info --visible   CSV file with a non-headless
                                            browser, and log all events

Source code available at https://github.com/ScriptSmith/instamancer
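
The defaults for `--filename` and `--downdir` contain bracketed placeholders such as `[id]` and `[endpoint]`. As an illustration of what those placeholders mean, a substitution could look like the sketch below. This is an assumed reading of the help text rather than Instamancer's own code: `expandTemplate` is a hypothetical helper, and treating `[endpoint]` as the command name is an assumption.

```typescript
// Hypothetical helper: replace each [name] placeholder with its value,
// leaving unknown placeholders untouched.
function expandTemplate(template: string, values: { [name: string]: string }): string {
    return template.replace(/\[(\w+)\]/g, (match: string, name: string) =>
        Object.prototype.hasOwnProperty.call(values, name) ? values[name] : match,
    );
}

// For `instamancer hashtag instagood -d`, assuming [endpoint] is the command name:
const exampleValues = { endpoint: "hashtag", id: "instagood" };
console.log(expandTemplate("downloads/[endpoint]/[id]", exampleValues)); // downloads/hashtag/instagood
console.log(expandTemplate("[id]", exampleValues)); // instagood
```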

Module

ES2018 TypeScript example:

import * as Instamancer from "instamancer";

const options: Instamancer.IOptions = {
    total: 10
};

const hashtag = Instamancer.hashtag("beach", options);
(async () => {
    for await (const post of hashtag) {
        console.log(post);
    }
})();
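
Since the example above consumes the result with `for await`, the posts can also be gathered into an array. The helper below is a small sketch: `collect` is a hypothetical utility that is not part of Instamancer, and `fakePosts` is a stand-in generator so the snippet runs without a browser.

```typescript
// Hypothetical helper (not part of Instamancer): drain an async iterable
// into an array, stopping early once `limit` items have been collected.
async function collect<T>(source: AsyncIterable<T>, limit = Infinity): Promise<T[]> {
    const items: T[] = [];
    if (limit <= 0) {
        return items;
    }
    for await (const item of source) {
        items.push(item);
        if (items.length >= limit) {
            break; // Leaving the loop lets the generator run its cleanup
        }
    }
    return items;
}

// Stand-in for a scraper generator, so the sketch runs without a browser.
async function* fakePosts(): AsyncGenerator<{ id: number }> {
    for (let id = 1; id <= 5; id++) {
        yield { id };
    }
}

collect(fakePosts(), 3).then((posts) => {
    console.log(posts.length); // 3
});
```

Under the same assumption, passing `Instamancer.hashtag("beach", options)` in place of `fakePosts()` should behave the same way.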

Generator functions

Instamancer.hashtag(id, options);
Instamancer.location(id, options);
Instamancer.user(id, options);
Instamancer.post(ids, options);

Options

const options: Instamancer.IOptions = {
    // Total posts to download. 0 for unlimited
    total: number,
    
    // Run Chrome in headless mode
    headless: boolean,
    
    // Logging events
    logger: winston.Logger,
    
    // Run without output to stdout
    silent: boolean,
    
    // Time to sleep between interactions with the page
    sleepTime: number,

    // Throw an error if type validation fails
    strict?: boolean,
    
    // Time to sleep when rate-limited
    hibernationTime: number,
    
    // Enable the grafting process
    enableGrafting: boolean,
    
    // Extract the full amount of information from the API
    fullAPI: boolean,
    
    // Use a proxy in Chrome to connect to Instagram
    proxyURL: string,
    
    // Location of the chromium / chrome binary executable
    executablePath: string,

    // Custom io-ts validator
    validator?: Type<unknown>,
}
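
Several module options appear to mirror command-line flags. The sketch below spells out one plausible correspondence. It is inferred only from the option descriptions in this README, not from Instamancer's source, and `CliFlags`, `ModuleOptions`, and `toModuleOptions` are hypothetical names declared locally so that the snippet does not depend on the package.

```typescript
// Subset of the CLI flags listed under "Command Line".
interface CliFlags {
    count: number;
    visible: boolean;
    graft: boolean;
    full: boolean;
    silent: boolean;
    strict: boolean;
    browser?: string;
}

// Subset of the module options listed above, declared locally.
interface ModuleOptions {
    total: number;
    headless: boolean;
    enableGrafting: boolean;
    fullAPI: boolean;
    silent: boolean;
    strict: boolean;
    executablePath?: string;
}

// Assumed correspondence, inferred from the option descriptions only.
function toModuleOptions(flags: CliFlags): ModuleOptions {
    const options: ModuleOptions = {
        total: flags.count,
        headless: !flags.visible, // --visible shows the browser on screen
        enableGrafting: flags.graft,
        fullAPI: flags.full,
        silent: flags.silent,
        strict: flags.strict,
    };
    if (flags.browser !== undefined) {
        options.executablePath = flags.browser;
    }
    return options;
}

// The CLI defaults from the help text, expressed as module options.
const cliDefaults: CliFlags = {
    count: 0,
    visible: false,
    graft: true,
    full: false,
    silent: false,
    strict: false,
};
console.log(toModuleOptions(cliDefaults));
```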

Comparison

A comparison of Instagram scraping tools. Please suggest more tools and criteria through a pull request.

To see a speed comparison, visit this page

Tool	Hashtags	Users	Locations	Posts	Login not required	Private feeds	Batch mode	Command-line	Library/Module	Download media	Download metadata	Scraping method	Daily builds	Main language	Speed	Build status	Test coverage	Code quality
Instamancer	✔️	✔️	✔️	✔️	✔️	❌	✔️	✔️	✔️	✔️	✔️	Web API request interception	✔️	TypeScript
Instaphyte	✔️	❌	✔️	❌	✔️	❌	❌	✔️	✔️	✔️	✔️	Web API simulation	✔️	Python
Instaloader	✔️	✔️	✔️	✔️	✔️	✔️	❌	✔️	✔️	✔️	✔️	Web API simulation	❌	Python			❓	❓
Instalooter	✔️	✔️	✔️	✔️	❌	✔️	✔️	✔️	✔️	✔️	✔️	Web API simulation	❌	Python
Instagram crawler	✔️	✔️	❌	✔️	✔️	❌	❌	✔️	✔️	❌	✔️	Web DOM reading	❌	Python	❓		❓	❓
Instagram Scraper	✔️	✔️	✔️	❌	❌	✔️	❌	✔️	✔️	✔️	✔️	Web API simulation	❌	Python			❓	❓
Instagram Private API	✔️	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	✔️	✔️	App and Web API simulation	❌	Python	❓		❓	❓
Instagram PHP Scraper	✔️	✔️	✔️	✔️	✔️	✔️	❌	❌	✔️	✔️	✔️	Web API simulation	❌	PHP	❓	❓	❓	❓
