dataset-scripts

Scripts for NYC Space/Time Directory datasets

Publish NYC Space/Time Directory data to S3

dataset-to-s3.sh uploads a single NYC Space/Time Directory dataset to S3. The script also creates a GeoJSON file from the NDJSON objects file, and zips the dataset.
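The NDJSON-to-GeoJSON step can be illustrated with jq. This is a minimal sketch, not the script's actual implementation: it assumes each NDJSON object stores its geometry in a geometry field, and the function name ndjson_to_geojson is made up for this example.

```shell
# Sketch only: wrap newline-delimited JSON objects in a GeoJSON FeatureCollection.
# Assumption: each object keeps its geometry in a "geometry" field; every other
# field is copied into the feature's "properties".
ndjson_to_geojson() {
  # -s slurps all input lines into one array, -c prints compact output
  jq -c -s '{
    type: "FeatureCollection",
    features: map({
      type: "Feature",
      properties: del(.geometry),
      geometry: .geometry
    })
  }' "$1"
}
```

Calling ndjson_to_geojson on an objects file and redirecting the output to a .geojson file produces a single FeatureCollection. Objects without a geometry become features with a null geometry, which GeoJSON allows.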

all-etl-data-to-s3.sh copies all output (final datasets and intermediate steps) of NYC Space/Time Directory's ETL tool to S3.

Prerequisites

  1. Install spacetime-config and set the etl.outputDir configuration option. See spacetime-etl for more information.
  2. Install spacetime-cli
  3. Install jq
  4. Install aws-cli
  5. Add AWS credentials to ~/.aws/credentials, using the spacetime profile:

     [spacetime]
     aws_access_key_id = AWS_ACCESS_KEY_ID
     aws_secret_access_key = AWS_SECRET_ACCESS_KEY
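Before running the scripts, it can help to confirm that the spacetime profile is actually present in the credentials file. The helper below is a sketch for that check; has_aws_profile is a hypothetical name and is not part of these scripts.

```shell
# Sketch only: succeed if the credentials file contains a "[profile]" section header.
# $1: profile name, $2: credentials file (defaults to ~/.aws/credentials)
has_aws_profile() {
  # -x matches the whole line, -F treats the brackets literally, -q stays silent
  grep -qxF "[$1]" "${2:-$HOME/.aws/credentials}" 2>/dev/null
}

if has_aws_profile spacetime; then
  echo "spacetime profile found"
else
  echo "spacetime profile missing from ~/.aws/credentials"
fi
```

The check only looks for the section header; it does not verify that the keys below it are valid.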

Usage

To publish a single dataset to S3, run:

./dataset-to-s3.sh DATASET.STEP

For example:

./dataset-to-s3.sh mapwarper.transform
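The DATASET.STEP argument combines a dataset ID and an ETL step name, separated by a dot. How the script itself parses the argument is not documented here; the sketch below shows one way to split such a value with POSIX parameter expansion. split_dataset_step is a hypothetical helper name.

```shell
# Sketch only: split "DATASET.STEP" at the first dot and print "DATASET STEP".
# Returns non-zero when the argument lacks a dot or has an empty part.
split_dataset_step() {
  case "$1" in
    ?*.?*) ;;            # at least one character on each side of a dot
    *) return 1 ;;
  esac
  # ${1%%.*} drops everything from the first dot; ${1#*.} drops everything up to it
  printf '%s %s\n' "${1%%.*}" "${1#*.}"
}

split_dataset_step mapwarper.transform   # prints "mapwarper transform"
```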

To sync all output of NYC Space/Time Directory's ETL tool to S3, run:

./all-etl-data-to-s3.sh
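Syncing a local directory to S3 with the spacetime profile can be sketched with aws s3 sync. This is an illustration, not the script's actual contents: the directory and bucket arguments are placeholders, and sync_etl_output and AWS_BIN are names invented for this example. The --dryrun flag makes aws-cli list what it would upload without transferring anything.

```shell
# Sketch only: preview a sync of the ETL output directory to an S3 bucket.
# $1: local ETL output directory (the value of etl.outputDir), $2: bucket name (placeholder)
# AWS_BIN is a hypothetical override so the command can be swapped out in tests.
sync_etl_output() {
  "${AWS_BIN:-aws}" s3 sync "$1" "s3://$2/" --profile spacetime --dryrun
}
```

For example, sync_etl_output /path/to/etl-output example-bucket previews the upload; removing --dryrun from the function performs it.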

See also