Comparison of compute price per hour

Making apples-to-apples comparisons between cloud providers is difficult, because each one offers instances with a different mix of vCPUs, RAM, SSD space and HDD space.

To provide a clearer comparison of compute prices, I've used multiple linear regression to "normalise" the pricing of on-demand, general purpose compute instances across different cloud providers.

In essence: if every cloud provider offered compute instances of the same sizes, how expensive would each one be?
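The regression can be written out for a single provider. This is a sketch that assumes the four resources named above (vCPUs, RAM, SSD space, HDD space) are the regressors and that the model has an intercept; the symbols are illustrative, not taken from the notebook.

```latex
% Hourly on-demand price y_i of instance i offered by one provider
y_i = \beta_0 + \beta_1\,\mathrm{vCPU}_i + \beta_2\,\mathrm{RAM}_i
      + \beta_3\,\mathrm{SSD}_i + \beta_4\,\mathrm{HDD}_i + \varepsilon_i

% Ordinary least-squares estimate of the coefficients
\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \sum_i
  \left(y_i - \beta_0 - \beta_1\,\mathrm{vCPU}_i - \beta_2\,\mathrm{RAM}_i
        - \beta_3\,\mathrm{SSD}_i - \beta_4\,\mathrm{HDD}_i\right)^2

% "Normalised" price of a reference instance size (c, r, s, h)
\hat{y}(c, r, s, h) = \hat\beta_0 + \hat\beta_1 c + \hat\beta_2 r + \hat\beta_3 s + \hat\beta_4 h
```

Fitting each provider separately and evaluating \hat{y} at the same reference sizes is what makes the resulting prices comparable.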

The dataset

I'll be taking the price tables of:

Google Cloud - Predefined machine types
AWS - On demand instances
Azure - Linux virtual machines

and converting them into the instance sizes offered by Catalyst Cloud. You can find the datasets and their sources here. Term and volume discounts are not taken into account.
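A minimal sketch of the conversion step, assuming each provider's table has been reduced to columns named `vcpus`, `ram_gb`, `ssd_gb`, `hdd_gb` and `price_per_hour` (hypothetical names). The price table below is made up for illustration, and the two target sizes merely stand in for Catalyst Cloud's real instance sizes.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Resource columns used as regressors (assumed names: vCPUs, RAM, SSD, HDD)
FEATURES = ["vcpus", "ram_gb", "ssd_gb", "hdd_gb"]


def normalise_prices(provider_table: pd.DataFrame, target_sizes: pd.DataFrame) -> pd.Series:
    """Fit hourly price against instance resources for one provider, then
    predict what that provider would charge for each target instance size."""
    model = LinearRegression()
    model.fit(provider_table[FEATURES], provider_table["price_per_hour"])
    return pd.Series(
        model.predict(target_sizes[FEATURES]),
        index=target_sizes.index,
        name="normalised_price_per_hour",
    )


# Made-up on-demand price table for one provider (NOT real AWS/Google/Azure prices)
provider_table = pd.DataFrame(
    {
        "vcpus": [1, 2, 2, 4, 4, 8],
        "ram_gb": [2, 4, 8, 8, 16, 32],
        "ssd_gb": [0, 20, 0, 40, 80, 0],
        "hdd_gb": [0, 0, 100, 0, 500, 1000],
        "price_per_hour": [0.040, 0.074, 0.094, 0.138, 0.206, 0.370],
    }
)

# Hypothetical target sizes standing in for Catalyst Cloud's instance sizes
target_sizes = pd.DataFrame(
    {"vcpus": [1, 4], "ram_gb": [2, 16], "ssd_gb": [10, 10], "hdd_gb": [0, 0]},
    index=["1 vCPU / 2 GB", "4 vCPU / 16 GB"],
)

normalised = normalise_prices(provider_table, target_sizes)
print(normalised.round(4))
```

Calling `normalise_prices` once per provider with the same `target_sizes` yields one like-for-like price per size per provider. Predictions for sizes far outside the range of a provider's own table are extrapolations and deserve less trust.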

The Python

I've used scikit-learn's multiple linear regression to perform the normalisation, and pandas to manage the data.
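As an illustration of how the two libraries fit together, the sketch below loads a price table into a pandas `DataFrame`, fits scikit-learn's `LinearRegression`, and reads the coefficients as implied hourly prices per unit of each resource. The column names and every price in the table are made up for this example and do not come from the notebook.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Assumed column names for the four resources
FEATURES = ["vcpus", "ram_gb", "ssd_gb", "hdd_gb"]

# Made-up price table for one provider, held in pandas (NOT real prices)
prices = pd.DataFrame(
    {
        "vcpus": [1, 2, 2, 4, 4, 8],
        "ram_gb": [2, 4, 8, 8, 16, 32],
        "ssd_gb": [0, 20, 0, 40, 80, 0],
        "hdd_gb": [0, 0, 100, 0, 500, 1000],
        "price_per_hour": [0.040, 0.074, 0.094, 0.138, 0.206, 0.370],
    }
)

# Multiple linear regression: price_per_hour ~ vcpus + ram_gb + ssd_gb + hdd_gb
model = LinearRegression().fit(prices[FEATURES], prices["price_per_hour"])

# Each coefficient is the implied hourly price of one extra unit of that resource
unit_prices = pd.Series(model.coef_, index=FEATURES, name="price_per_unit_hour")
print(f"base price per hour: {model.intercept_:.4f}")
print(unit_prices.round(5))
```

The toy table follows an exact linear rule, so the fit is perfect here. Real price tables won't be, so it is worth checking `model.score(...)` (R²) or the residuals before relying on the normalised prices.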

You can see my working in an IPython notebook here:

douglasbagnall/catalystcloud-price-comparison
