KeepGrabbing


This is a collection of scripts for managing scrapers.

Summary:
jsongen.rb: Generates JSON files from a schema you specify. It can be used
for anything, but it is especially handy for building machine-readable lists
of search terms.

linkedin.rb: Runs the LinkedIn scraper on a set of search terms read from a
JSON file.

crypto/: Encrypt and decrypt all files in a directory with GPG.

config/: Scripts for setting up and syncing a scraping machine.


Detailed Instructions:
jsongen.rb-
Currently this only supports single-level (flat) JSONs; a rough sketch of the
flow appears after the steps below.

To Run:
1. ruby jsongen.rb
2. Follow the prompts to enter the schema and then the items
3. Stop adding items by entering an item with all fields left blank
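
As a rough illustration only (this is not the actual jsongen.rb, and the
output filename here is made up), a flat, prompt-driven generator along these
lines would match the behavior described above:

    # Hypothetical sketch of a single-level JSON generator.
    require 'json'

    print "Field names (comma-separated): "
    fields = gets.strip.split(",").map(&:strip)

    items = []
    loop do
      item = {}
      fields.each do |field|
        print "#{field}: "
        item[field] = gets.strip
      end
      break if item.values.all?(&:empty?)   # an all-blank item ends input
      items << item
    end

    File.write("output.json", JSON.pretty_generate(items))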


linkedin.rb-
To run this, you need a JSON file in which every item has the following
fields (an example appears after the list):
Search Term: The phrase you want to search for.
Degrees: How many degrees out to follow via "people also viewed".
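
For example (with hypothetical values), a search-terms file might look like:

    [
      { "Search Term": "data engineer", "Degrees": "2" },
      { "Search Term": "civic hacker",  "Degrees": "1" }
    ]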

To Run:
1. ruby linkedin.rb
2. When prompted, type in the name of the file with the search terms
3. When prompted, type in the name of the directory where you want to save
results
4. Wait. A new .json and a new .csv file will be generated for each search
term (a rough sketch of this flow follows).
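
As a rough sketch of that flow (not the real linkedin.rb; the scraper here is
a stub and the output file naming is an assumption):

    # Hypothetical outline: read search terms, scrape, write .json and .csv per term.
    require 'json'
    require 'csv'

    # Stub standing in for the real LinkedIn scraper.
    def run_linkedin_scraper(term, degrees)
      [{ "name" => "example result", "term" => term, "degrees" => degrees }]
    end

    print "Search terms file: "
    terms_file = gets.strip
    print "Output directory: "
    out_dir = gets.strip

    JSON.parse(File.read(terms_file)).each do |item|
      term    = item["Search Term"]
      results = run_linkedin_scraper(term, item["Degrees"].to_i)
      base    = File.join(out_dir, term.gsub(/\s+/, "-"))
      File.write("#{base}.json", JSON.pretty_generate(results))
      CSV.open("#{base}.csv", "w") do |csv|
        csv << results.first.keys
        results.each { |row| csv << row.values }
      end
    end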


crypto/-
Encrypt files with encrypt.rb and decrypt them with decrypt.rb.

Encrypt:
1. ruby encrypt.rb
2. When prompted, type the email address of a recipient (their key must
already be imported into GPG). You can add as many recipients as you want.
3. To stop adding recipients, hit enter leaving the recipient blank.
4. When prompted, enter the path to the directory where you want to save
results.
5. Wait as the files are encrypted (see the sketch below).
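
In outline (not the actual encrypt.rb; it assumes the prompted directory
holds the files to encrypt and writes a .gpg file next to each one):

    # Hypothetical sketch: encrypt every file in a directory for several recipients.
    recipients = []
    loop do
      print "Recipient email (blank to finish): "
      email = gets.strip
      break if email.empty?
      recipients << email
    end

    print "Directory: "
    dir = gets.strip

    recipient_args = recipients.flat_map { |r| ["--recipient", r] }
    Dir.glob(File.join(dir, "*")).select { |f| File.file?(f) }.each do |file|
      system("gpg", "--encrypt", *recipient_args, "--output", "#{file}.gpg", file)
    end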

Decrypt:
1. ruby decrypt.rb
2. When prompted, enter the path to the directory containing the files you
want to decrypt.
3. Enter the password for your GPG key.
4. Wait as the files are decrypted (see the sketch below).
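
In outline (not the actual decrypt.rb; gpg itself prompts for your key's
passphrase when it needs it):

    # Hypothetical sketch: decrypt every .gpg file in a directory.
    print "Directory: "
    dir = gets.strip

    Dir.glob(File.join(dir, "*.gpg")).each do |file|
      system("gpg", "--decrypt", "--output", file.sub(/\.gpg\z/, ""), file)
    end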


config/-
Setup and syncing scripts for a scraping machine.

Setup:
./install.sh

Sync:
./sync.sh