Instamancer
Scrape Instagram's API with Puppeteer.
Instamancer is a new type of scraping tool that leverages Puppeteer's ability to intercept requests made by a webpage to an API.
Read more about how Instamancer works here.
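The interception idea can be sketched as follows. This is a minimal sketch, not Instamancer's code: it assumes a Puppeteer `Page` emits a `response` event for every network response, and it uses a hypothetical URL pattern (`/graphql/query`) for Instagram's API endpoint. The handler only needs the `url()` and `json()` methods that Puppeteer's `HTTPResponse` provides, so it is typed structurally here and can be exercised without a browser.

```typescript
// Structural subset of Puppeteer's HTTPResponse: only the two methods used here.
interface ResponseLike {
  url(): string;
  json(): Promise<unknown>;
}

// Hypothetical pattern: assumes the page fetches posts from an endpoint whose
// path contains "/graphql/query". Instagram's real endpoint may differ.
const API_PATTERN = /\/graphql\/query/;

// Returns a handler that keeps the JSON bodies of matching API responses.
// The page makes the requests itself; the scraper only listens.
function createInterceptor(sink: unknown[]): (response: ResponseLike) => Promise<void> {
  return async (response: ResponseLike): Promise<void> => {
    if (!API_PATTERN.test(response.url())) {
      return; // Ignore images, scripts and other non-API traffic
    }
    try {
      sink.push(await response.json());
    } catch {
      // The body was not valid JSON or was unavailable; skip it
    }
  };
}
```

With a real browser, the handler would be attached with `page.on("response", createInterceptor(collected))` before navigating to a hashtag, location or profile page.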
Features
- Scrape hashtags, locations and users
- Output JSON, CSV
- Download images, albums, and videos
- Batch scraping
- API response validation
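Batch scraping reads newline-separated arguments from a file (the `batch` command under Usage). The sketch below shows one way such a file could be split into per-run argument lists. It is an assumption for illustration, not Instamancer's parser: it assumes one command per line, skips blank lines, splits on whitespace, and does not support quoted arguments. `parseBatchFile` is a hypothetical name.

```typescript
// Hypothetical helper: turns the text of a batch file into argument lists,
// one list per non-empty line. Quoted arguments are not supported.
function parseBatchFile(contents: string): string[][] {
  return contents
    .split(/\r?\n/) // Accept both Unix and Windows line endings
    .map((line) => line.trim())
    .filter((line) => line.length > 0) // Skip blank lines
    .map((line) => line.split(/\s+/));
}

const runs = parseBatchFile("hashtag instagood -c 10\n\nuser arianagrande --filetype=csv\n");
console.log(runs);
// → [["hashtag", "instagood", "-c", "10"], ["user", "arianagrande", "--filetype=csv"]]
```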
Data
Metadata that Instamancer is able to gather from posts:
- Text
- Timestamps
- Tagged users
- Accessibility captions
- Like counts
- Comment counts
- Images (Thumbnails, Dimensions, URLs)
- Videos (URL, View count, Duration)
- Comments (Timestamp, Text, Like count, User)
- User (Username, Full name, Profile picture, Profile privacy)
- Location (Name, Street, Zip code, City, Region, Country)
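A minimal sketch of writing some of this metadata as CSV. The flattened `PostRecord` shape and its field names are hypothetical and are not Instamancer's output format; the escaping follows the usual RFC 4180 convention of quoting fields that contain commas, quotes or line breaks and doubling embedded quotes.

```typescript
// Hypothetical, simplified post shape using a few of the fields listed above.
interface PostRecord {
  text: string;
  timestamp: number; // Assumed to be Unix time in seconds
  likeCount: number;
  commentCount: number;
  username: string;
}

const COLUMNS: (keyof PostRecord)[] = ["username", "timestamp", "likeCount", "commentCount", "text"];

// Quote a field when it contains a comma, quote or line break, doubling inner quotes.
function escapeCsv(value: string | number): string {
  const s = String(value);
  return /[",\r\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Build a header row followed by one row per post.
function toCsv(posts: PostRecord[]): string {
  const header = COLUMNS.join(",");
  const rows = posts.map((post) => COLUMNS.map((col) => escapeCsv(post[col])).join(","));
  return [header, ...rows].join("\n");
}
```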
Install
Linux
Enable user namespace cloning (requires root):
sysctl -w kernel.unprivileged_userns_clone=1
Or run without a sandbox:
# WARNING: unsafe
export NO_SANDBOX=true
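Without downloading Chromium
If you wish to install Instamancer without downloading Chromium, set the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable before installation:
export PUPPETEER_SKIP_CHROMIUM_DOWNLOAD=true
You will then need to point Instamancer at an existing Chrome or Chromium executable with the --browser option (or executablePath when using the module).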
From this repository
Requires TypeScript
git clone https://github.com/ScriptSmith/instamancer.git
cd instamancer
npm install
npm install -g
From NPM
npm install -g instamancer
If you're installing globally as root, use the following command so that the Puppeteer dependency installs correctly:
sudo npm install -g instamancer --unsafe-perm=true
From NPX
npx instamancer
Usage
Command Line
$ instamancer
Usage: instamancer <command> [options]
Commands:
instamancer hashtag [id] Scrape a hashtag
instamancer location [id] Scrape a location
instamancer user [id] Scrape a user
instamancer post [ids] Scrape a comma-separated list of posts
instamancer batch [batchfile] Read newline-separated arguments from a file
Options:
--help Show help [boolean]
--version Show version number [boolean]
--count, -c Number of posts to download. 0 to download all
[default: 0]
--visible Show browser on the screen [default: false]
--download, -d Save images and videos from posts
[boolean] [default: false]
--graft, -g Enable grafting [boolean] [default: true]
--full Get the full list of posts and their details from the
API and web page [boolean] [default: false]
--video Download videos. Only works in full mode
[boolean] [default: false]
--silent Disable progress output [boolean] [default: false]
--strict Throw an error if types from Instagram API have been
changed [boolean] [default: false]
--sync Synchronously download files between API requests
[boolean] [default: false]
--threads, -k The number of parallel download / upload threads
[number] [default: 4]
--waitDownload, -w When true, media will only download once scraping is
finished [boolean] [default: false]
--filename, --file, -f Name of the output file [default: "[id]"]
--filetype, --type, -t Type of output file
[choices: "csv", "json", "both"] [default: "json"]
--downdir Directory / Container to save media
[default: "downloads/[endpoint]/[id]"]
--mediaPath, --mp Store the paths of downloaded media in the
'_mediaPath' key [boolean] [default: false]
--logging Level of logger
[choices: "error", "none", "info", "debug"] [default: "none"]
--logfile Name of the log file [default: "instamancer.log"]
--browser Location of the browser. Defaults to the copy
downloaded at installation
--swift Upload media to openstack's swift object storage
rather than saving to disk [boolean] [default: false]
Examples:
instamancer hashtag instagood -d   Download all the available posts and their thumbnails from #instagood
instamancer location 644269022 --count 200   Download 200 posts tagged as being at the Arc de Triomphe
instamancer user arianagrande --filetype=csv --logging=info --visible   Download Ariana Grande's posts to a CSV file with a non-headless browser, and log all events
Source code available at https://github.com/ScriptSmith/instamancer
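The --threads option sets how many downloads or uploads run in parallel. A minimal sketch of that idea as a concurrency-limited task runner follows; it is an illustration, not Instamancer's implementation, and `runWithLimit` is a hypothetical name. In Node.js these "threads" are concurrent asynchronous tasks on one event loop.

```typescript
// Run async tasks with at most `limit` in flight at once.
// Results keep the order of the input tasks; the first rejection rejects the call.
async function runWithLimit<T>(tasks: (() => Promise<T>)[], limit: number): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;

  // Each worker repeatedly claims the next unstarted task until none remain.
  // Claiming is race-free because JavaScript runs this code on a single thread.
  async function worker(): Promise<void> {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }

  const workerCount = Math.max(1, Math.min(limit, tasks.length));
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, `runWithLimit(urls.map((url) => () => download(url)), 4)` would fetch at most four files at a time, where `download` is a hypothetical function returning a promise.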
Module
ES2018 TypeScript example:
import * as Instamancer from "instamancer";
const options: Instamancer.IOptions = {
total: 10
};
const hashtag = Instamancer.hashtag("beach", options);
(async () => {
for await (const post of hashtag) {
console.log(post);
}
})();
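The total option above already caps how many posts are returned. If you instead stop consuming a generator early, leaving a `for await` loop with `break`, `return` or an exception calls the generator's `return()` method, so any `finally` block inside it runs. The sketch below shows this language behaviour with a hypothetical mock generator standing in for `Instamancer.hashtag`; whether Instamancer does its own cleanup in such a block is not covered here.

```typescript
// Hypothetical stand-in for an Instamancer generator: yields fake posts
// forever and records whether its cleanup code ran.
const state = { cleanedUp: false };

async function* mockPosts(): AsyncGenerator<{ id: number }> {
  try {
    for (let id = 0; ; id++) {
      yield { id };
    }
  } finally {
    state.cleanedUp = true; // Runs when the consumer stops iterating early
  }
}

// Collect at most `limit` items from any async iterable, then stop.
async function take<T>(source: AsyncIterable<T>, limit: number): Promise<T[]> {
  const items: T[] = [];
  if (limit <= 0) return items;
  for await (const item of source) {
    items.push(item);
    if (items.length >= limit) break; // Calls the generator's return()
  }
  return items;
}
```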
Generator functions
Instamancer.hashtag(id, options);
Instamancer.location(id, options);
Instamancer.user(id, options);
Instamancer.post(ids, options);
Options
Each option is listed with its type in place of a value:
const options: Instamancer.IOptions = {
    // Total posts to download. 0 for unlimited
    total: number,
    // Run Chrome in headless mode
    headless: boolean,
    // Logger for events
    logger: winston.Logger,
    // Run without output to stdout
    silent: boolean,
    // Time to sleep between interactions with the page
    sleepTime: number,
    // Throw an error if type validation fails
    strict?: boolean,
    // Time to sleep when rate-limited
    hibernationTime: number,
    // Enable the grafting process
    enableGrafting: boolean,
    // Extract the full amount of information from the API
    fullAPI: boolean,
    // Use a proxy in Chrome to connect to Instagram
    proxyURL: string,
    // Location of the Chromium / Chrome binary executable
    executablePath: string,
    // Custom io-ts validator
    validator?: Type<unknown>,
}
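The strict and validator options concern checking API responses against expected types. The sketch below replaces io-ts with a hand-written type guard so it has no dependencies. The `MinimalPost` shape, the `checkPost` helper and the warn-versus-throw policy are assumptions that illustrate what strict describes; they are not Instamancer's implementation or Instagram's full schema.

```typescript
// Hypothetical minimal shape of one post; the real API response is much larger.
interface MinimalPost {
  id: string;
  shortcode: string;
  taken_at_timestamp: number;
}

// Hand-written type guard standing in for an io-ts validator.
function isMinimalPost(value: unknown): value is MinimalPost {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.id === "string" &&
    typeof v.shortcode === "string" &&
    typeof v.taken_at_timestamp === "number";
}

// In strict mode a mismatch is an error; otherwise it is reported through
// `warn` and the value is passed through unchanged so scraping can continue.
function checkPost(value: unknown, strict: boolean, warn: (msg: string) => void): unknown {
  if (!isMinimalPost(value)) {
    const message = "Post does not match the expected type";
    if (strict) throw new Error(message);
    warn(message);
  }
  return value;
}
```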
Comparison
A comparison of Instagram scraping tools. Please suggest more tools and criteria through a pull request.
To see a speed comparison, visit this page
Tool | Hashtags | Users | Locations | Posts | Login not required | Private feeds | Batch mode | Command-line | Library/Module | Download media | Download metadata | Scraping method | Daily builds | Main language | Speed | License | Last commit | Open Issues | Closed Issues | Build status | Test coverage | Code quality |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Instamancer | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Web API request interception | ✔️ | TypeScript | ||||||||
Instaphyte | ✔️ | ❌ | ✔️ | ❌ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | Web API simulation | ✔️ | Python | ||||||||
Instaloader | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | Web API simulation | ❌ | Python | ❓ | ❓ | ||||||
Instalooter | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | Web API simulation | ❌ | Python | ||||||||
Instagram crawler | ✔️ | ✔️ | ❌ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ❌ | ✔️ | Web DOM reading | ❌ | Python | ❓ | ❓ | ❓ | |||||
Instagram Scraper | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | Web API simulation | ❌ | Python | ❓ | ❓ | ||||||
Instagram Private API | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | App and Web API simulation | ❌ | Python | ❓ | ❓ | ❓ | |||||
Instagram PHP Scraper | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ | ❌ | ❌ | ✔️ | ✔️ | ✔️ | Web API simulation | ❌ | PHP | ❓ | ❓ | ❓ | ❓ |