cf-scraper

A simple app that scrapes information about Cloud Foundry orgs.

How to deploy

Prepare the service instances

cf-scraper reads its Cloud Foundry API credentials from a service instance of type secrets-store. It reads the list of input orgs from an S3 service instance and writes the scrape output to the same S3 service instance.

Create a user in your Cloud Foundry instance and add it to the cloud_controller.global_auditor group.

uaac user add "$AUDITOR_USER_NAME" --emails "$AUDITOR_EMAIL"
uaac member add cloud_controller.global_auditor "$AUDITOR_USER_NAME"

Create a secrets-store service instance named cf-api-credentials.

cf cs secrets-store json cf-api-credentials -c '{"username": "'"$AUDITOR_USER_NAME"'", "password": "'"$AUDITOR_PASSWORD"'"}'

Create an S3 service instance named orgs-store.

cf cs dynstrg-2 usage orgs-store

Load the input orgs

The scraper reads the list of orgs from the file input/input-orgs.json in the orgs-store instance. You can change the list at any time; the scraper picks up the latest version at the start of its next run.

Create a file named input-orgs.json that contains a JSON array of org names.

[
  "org-1",
  "org-2",
  "org-3",
  ...,
  "org-n"
]
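
If the org names are already at hand on the command line, the file can be generated rather than typed. The following is a minimal sketch, not part of cf-scraper: make_input_orgs is a hypothetical helper, and it assumes the org names contain no double quotes or backslashes, since it does no JSON escaping.

```shell
# Hypothetical helper (not part of cf-scraper): print a JSON array
# with one org name per argument.
# Assumption: names contain no double quotes or backslashes,
# because no JSON escaping is performed.
make_input_orgs() {
  printf '[\n'
  sep=''
  for org in "$@"; do
    printf '%s  "%s"' "$sep" "$org"
    # After the first element, separate entries with a comma and newline.
    sep=',
'
  done
  printf '\n]\n'
}
```

For example, make_input_orgs org-1 org-2 org-3 > input-orgs.json writes a three-element array to the file.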

Upload the file to orgs-store/input, for example using mc (the MinIO client).
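
A minimal sketch of the upload with mc follows. It assumes an mc alias has already been configured with the S3 credentials of the orgs-store instance. The alias name cf-s3, the bucket name my-bucket, and the function upload_input_orgs are placeholders for illustration; this README does not state the bucket name, so substitute the one your instance uses.

```shell
# Placeholders (assumptions, not defined by cf-scraper):
#   MC_ALIAS  an mc alias already set up for the orgs-store S3 endpoint
#   BUCKET    the bucket the scraper reads from
MC_ALIAS="${MC_ALIAS:-cf-s3}"
BUCKET="${BUCKET:-my-bucket}"

# Copy a local file (default: input-orgs.json) to input/input-orgs.json.
upload_input_orgs() {
  mc cp "${1:-input-orgs.json}" "$MC_ALIAS/$BUCKET/input/input-orgs.json"
}

# Usage: upload_input_orgs ./input-orgs.json
```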

Adapt the schedule

The scraper runs as a scheduled task. The schedule is defined as a cron expression in the environment variable SYNC_SCHEDULE.

Open manifest.yml and set SYNC_SCHEDULE to the desired cron expression (e.g. */15 * * * * to run every 15 minutes).
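
Before editing the manifest, a quick sanity check on the expression can catch a dropped field. The sketch below only counts fields. It assumes the scheduler expects the classic five-field cron format (minute, hour, day of month, month, day of week), as in the example above; has_five_cron_fields is a hypothetical helper, and it does not validate the values inside each field.

```shell
# Hypothetical helper: succeed if the expression has exactly five
# whitespace-separated fields. Assumption: five-field cron format.
has_five_cron_fields() {
  # awk counts the fields, so the "*" characters are never glob-expanded.
  count=$(printf '%s\n' "$1" | awk '{ print NF }')
  [ "$count" -eq 5 ]
}

SYNC_SCHEDULE='*/15 * * * *'
if has_five_cron_fields "$SYNC_SCHEDULE"; then
  echo "SYNC_SCHEDULE has five fields"
else
  echo "SYNC_SCHEDULE does not have five fields" >&2
fi
```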

Push the app

Everything else is self-configuring; just push the app.

cf push

Collect the scrape result

The scraper uploads the result of each scrape run to orgs-store/output/scrape-result.json. Before starting the upload, it saves a backup copy of the previous result as scrape-result-backup.json.

Download orgs-store/output/scrape-result.json, for example using mc.
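
The download can be sketched with mc in the same way. As before, the alias cf-s3, the bucket my-bucket, and the function download_scrape_result are placeholders and not names defined by cf-scraper; the alias is assumed to be configured already.

```shell
# Placeholders (assumptions, not defined by cf-scraper):
#   MC_ALIAS  an mc alias already set up for the orgs-store S3 endpoint
#   BUCKET    the bucket the scraper writes to
MC_ALIAS="${MC_ALIAS:-cf-s3}"
BUCKET="${BUCKET:-my-bucket}"

# Copy output/scrape-result.json to a local file
# (default: scrape-result.json in the current directory).
download_scrape_result() {
  mc cp "$MC_ALIAS/$BUCKET/output/scrape-result.json" "${1:-scrape-result.json}"
}

# Usage: download_scrape_result ./latest-result.json
```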