
CSV datasets for the cloud providers Microsoft Azure and Google Cloud. Contains regions, datacenter sustainability data, machines, used CPUs, and embodied emissions.

Primary language: HTML. License: GNU Affero General Public License v3.0 (AGPL-3.0).

Cloud Provider Dataset

Dataset for the Cloud Providers Microsoft Azure and Google Cloud Platform (GCP).


Installation

Tested with OpenJDK 17.0.2 and Maven 3.6.3.

mvn clean install -DskipTests

Dataset Update

Extracted from https://learn.microsoft.com/en-us/azure/virtual-machines/linux/compute-benchmark-scores with the corresponding web scraper:

cd azure-web-scraper
java -jar ./target/exectuable.jar
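The README does not say whether the scraper downloads this page itself or works from a saved copy. If you want to fetch the benchmark page yourself, the JDK's built-in java.net.http client is enough. The following is a minimal sketch under that assumption; the class PageFetcher and its methods are hypothetical and not part of the azure-web-scraper module.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not taken from the azure-web-scraper sources:
// downloads a documentation page with the JDK's built-in HTTP client.
final class PageFetcher {
    static final String BENCHMARK_URL =
        "https://learn.microsoft.com/en-us/azure/virtual-machines/linux/compute-benchmark-scores";

    // Builds a plain GET request for the given page URL.
    static HttpRequest buildRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    // Sends the request (needs network access) and returns the HTML body.
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();
        HttpResponse<String> response =
            client.send(buildRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }
}
```

Calling PageFetcher.fetch(PageFetcher.BENCHMARK_URL) returns the raw HTML only; the benchmark table still has to be located and parsed afterwards.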

Copy the <table> element from the website and save it in the scraper's resources with minimal reformatting.

Website element screenshot: azure-regions-export

Then run the corresponding web scraper:

cd azure-web-scraper
java -jar ./target/exectuable.jar
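Turning a copied <table> element into rows can be sketched with the standard library alone. This is a simplified, regex-based sketch that assumes a flat, well-formed table without nested tables, rowspan, or colspan; HtmlTableParser is a hypothetical name, and the actual scraper may use a proper HTML parser instead.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: extracts cell texts from a simple, pasted HTML table.
// Assumes no nested tables and no rowspan/colspan.
final class HtmlTableParser {
    private static final Pattern ROW = Pattern.compile(
        "<tr(?:\\s[^>]*)?>(.*?)</tr>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern CELL = Pattern.compile(
        "<t[hd](?:\\s[^>]*)?>(.*?)</t[hd]>", Pattern.DOTALL | Pattern.CASE_INSENSITIVE);
    private static final Pattern TAG = Pattern.compile("<[^>]+>");

    // Returns one list of cell texts per <tr>, header rows included.
    static List<List<String>> parse(String html) {
        List<List<String>> rows = new ArrayList<>();
        Matcher rowMatcher = ROW.matcher(html);
        while (rowMatcher.find()) {
            List<String> cells = new ArrayList<>();
            Matcher cellMatcher = CELL.matcher(rowMatcher.group(1));
            while (cellMatcher.find()) {
                cells.add(cleanText(cellMatcher.group(1)));
            }
            if (!cells.isEmpty()) {
                rows.add(cells);
            }
        }
        return rows;
    }

    // Strips inner tags (links, spans), decodes a few common entities,
    // and collapses whitespace. &amp; is decoded last to avoid double decoding.
    private static String cleanText(String cellHtml) {
        String text = TAG.matcher(cellHtml).replaceAll(" ");
        text = text.replace("&nbsp;", " ")
                   .replace("&lt;", "<")
                   .replace("&gt;", ">")
                   .replace("&amp;", "&");
        return text.replaceAll("\\s+", " ").trim();
    }
}
```

Regular expressions are brittle for general HTML; for anything beyond a pasted, well-formed table, an HTML parser library is the safer choice.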

Copy the <div> element containing the table from the website and save it in the scraper's resources with minimal reformatting.

Website element screenshot: azure-factsheet-export

Then run the corresponding web scraper:

cd azure-web-scraper
java -jar ./target/exectuable.jar
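Since the copied <table> and <div> snippets are stored in the scraper's resources, a scraper can load them from the classpath. A minimal sketch, assuming the standard Maven layout in which files under src/main/resources end up on the classpath; ResourceReader and the file name in the usage note are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper: reads a pasted HTML snippet from the classpath.
final class ResourceReader {
    // Returns the resource content as UTF-8 text, or empty if it is missing.
    static Optional<String> read(String name) {
        ClassLoader loader = ResourceReader.class.getClassLoader();
        // try-with-resources tolerates a null resource (close is skipped).
        try (InputStream in = loader.getResourceAsStream(name)) {
            if (in == null) {
                return Optional.empty(); // not on the classpath
            }
            return Optional.of(new String(in.readAllBytes(), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, ResourceReader.read("azure-factsheet.html") would return the pasted snippet if a file with that (hypothetical) name exists in the resources, and an empty Optional otherwise.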

Copied from the Google Sheets spreadsheet (https://docs.google.com/spreadsheets) linked from https://www.cloudcarbonfootprint.org/docs/embodied-emissions/.
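The spreadsheet lists embodied emissions per machine type, presumably as a total for the whole server. As we read the linked Cloud Carbon Footprint page, such a total is scaled by the share of the server's expected lifespan that a workload reserved and by the share of the server's resources (for example vCPUs) it used. The sketch below applies that linear apportioning; the class and parameter names are our own, and the lifespan is an input, so check the CCF page for the value it assumes.

```java
// Hypothetical sketch of linear apportioning of embodied emissions:
// total x (time reserved / expected lifespan) x (vCPUs used / vCPUs of host).
final class EmbodiedEmissions {
    // Ignores leap years for simplicity.
    static final double HOURS_PER_YEAR = 24.0 * 365.0;

    /**
     * @param totalKgCo2e    total embodied emissions of the server (kgCO2e)
     * @param hoursReserved  how long the VM was reserved (hours)
     * @param lifespanYears  expected lifespan of the server (years)
     * @param vcpusReserved  vCPUs used by the VM
     * @param vcpusTotal     vCPUs of the whole host
     */
    static double apportion(double totalKgCo2e, double hoursReserved,
                            double lifespanYears, int vcpusReserved, int vcpusTotal) {
        if (lifespanYears <= 0 || vcpusTotal <= 0) {
            throw new IllegalArgumentException("lifespan and vCPU total must be positive");
        }
        double timeShare = hoursReserved / (lifespanYears * HOURS_PER_YEAR);
        double resourceShare = (double) vcpusReserved / vcpusTotal;
        return totalKgCo2e * timeShare * resourceShare;
    }
}
```

With made-up inputs, a 2-vCPU VM on an 8-vCPU host reserved for one year of a four-year lifespan receives a quarter of a quarter, that is one sixteenth, of the server total.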

Extracted from https://cloud.google.com/compute/docs/cpu-platforms with the corresponding web scraper:

cd gcp-web-scraper
java -jar ./target/exectuable.jar

Queried from BigQuery and saved as CSV:

SELECT year, cloud_region, location, zone_id, cfe_region, google_cfe
FROM `bigquery-public-data.google_cfe.datacenter_cfe`
WHERE year = 2021
LIMIT 1000
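The exported CSV has the six columns selected in the query. A minimal sketch of reading it back in Java 17 (the tested JDK), assuming the header line is kept and no quoted field spans several lines; the DatacenterCfe record is hypothetical, and its field names simply mirror the SQL columns.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical record for one row of the BigQuery export above.
// googleCfe is null when the exported cell is empty.
record DatacenterCfe(int year, String cloudRegion, String location,
                     String zoneId, String cfeRegion, Double googleCfe) {

    // Parses CSV lines (header first) into records.
    static List<DatacenterCfe> parseCsv(List<String> lines) {
        List<DatacenterCfe> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) { // index 0 is the header line
            String line = lines.get(i);
            if (line.isBlank()) {
                continue;
            }
            List<String> f = splitCsvLine(line);
            if (f.size() != 6) {
                throw new IllegalArgumentException("Expected 6 columns in line " + (i + 1) + ": " + line);
            }
            Double cfe = f.get(5).isEmpty() ? null : Double.valueOf(f.get(5));
            rows.add(new DatacenterCfe(Integer.parseInt(f.get(0)), f.get(1), f.get(2),
                                       f.get(3), f.get(4), cfe));
        }
        return rows;
    }

    // Splits one CSV line, honoring double-quoted fields and "" escapes.
    static List<String> splitCsvLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    current.append('"'); // escaped quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }
}
```

Mapping empty google_cfe cells (if any occur) to null rather than 0 keeps missing values from being mistaken for measurements.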

Created by hand by matching each table entry to a cloud region. Many table entries could not be matched to a region:

  • Lenoir, North Carolina
  • Montgomery County, Tennessee
  • Jackson County, Alabama
  • Papillion, Nebraska
  • Mayes County, Oklahoma
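The manual matching can be made repeatable with a hand-maintained lookup table from datacenter location to cloud region that reports every location without an entry. This is a sketch of that idea, not the method used to build the dataset; LocationMatcher and the example mapping in the usage note are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: applies a hand-maintained location-to-region table
// and collects the locations that have no entry.
final class LocationMatcher {
    record Result(Map<String, String> matched, List<String> unmatched) {}

    static Result match(List<String> locations, Map<String, String> locationToRegion) {
        Map<String, String> matched = new LinkedHashMap<>(); // keeps input order
        List<String> unmatched = new ArrayList<>();
        for (String location : locations) {
            String region = locationToRegion.get(location);
            if (region == null) {
                unmatched.add(location);
            } else {
                matched.put(location, region);
            }
        }
        return new Result(matched, unmatched);
    }
}
```

With a mapping such as Map.of("Example City", "example-region1"), the location "Lenoir, North Carolina" ends up in unmatched(), which mirrors the list above.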

Copy the <table> element from the website and save it in the scraper's resources with minimal reformatting.

Website element screenshot: gcp-machines-export

Then run the corresponding web scraper:

cd gcp-web-scraper
java -jar ./target/exectuable.jar
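Scraped cells can contain commas or double quotes (location names such as "Lenoir, North Carolina" are one example), so values written to CSV need quoting. A small sketch of RFC 4180-style escaping; CsvWriter is a hypothetical helper, and the scrapers may well use a CSV library instead.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: RFC 4180-style quoting for writing scraped rows as CSV.
final class CsvWriter {
    // Quotes a field if it contains a comma, quote, or line break,
    // doubling any embedded quotes.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
            || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    static String toCsvLine(List<String> fields) {
        return fields.stream().map(CsvWriter::escape).collect(Collectors.joining(","));
    }

    // Joins rows with CRLF line endings, as RFC 4180 specifies.
    static String toCsv(List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        for (List<String> row : rows) {
            sb.append(toCsvLine(row)).append("\r\n");
        }
        return sb.toString();
    }
}
```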

Extracted from https://cloud.google.com/compute/docs/regions-zones with the corresponding web scraper:

cd gcp-web-scraper
java -jar ./target/exectuable.jar
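GCP zone names follow the pattern <region>-<zone letter>, for example us-central1-a in region us-central1, so the region of a zone can be derived from its name. A minimal sketch; GcpZones is a hypothetical helper, not part of the gcp-web-scraper module.

```java
// Hypothetical helper: derives a GCP region name from a zone name
// by dropping the trailing "-<letter>" suffix.
final class GcpZones {
    static String regionOf(String zone) {
        int dash = zone.lastIndexOf('-');
        // Reject names without a dash or without a single-letter zone suffix.
        if (dash <= 0 || !zone.substring(dash + 1).matches("[a-z]")) {
            throw new IllegalArgumentException("Not a zone name: " + zone);
        }
        return zone.substring(0, dash);
    }
}
```

Passing a region name such as us-central1 throws, because its last segment is not a single zone letter.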