Making apples-to-apples comparisons between cloud providers is difficult, because each one offers instances with different combinations of vCPUs, RAM, SSD space and HDD space.
To provide a clearer comparison of compute service prices, I've used multiple linear regression to "normalise" the pricing of on-demand, general-purpose compute instances across different cloud providers.
In essence: if every cloud provider offered the same size compute instances, how expensive would they be?
I'll be taking the price tables of:
- Google Cloud - Predefined machine types
- AWS - On-demand instances
- Azure - Linux virtual machines
and converting them into the instance sizes offered by Catalyst Cloud. You can find the datasets and their sources here. I won't be taking term or volume discounts into account.
I've used the scikit-learn library's multiple linear regression to achieve the desired normalisation, and Pandas for managing the data.
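As a rough illustration of the idea (not the notebook's actual code), the sketch below fits a multiple linear regression to one provider's price table and then predicts what that provider would charge for a different set of instance sizes. The column names, instance sizes and prices are all made-up placeholders, and only vCPUs and RAM are used as features to keep it short.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical price table for a single provider.
# These figures are invented for illustration, not real prices.
provider = pd.DataFrame({
    "vcpu":           [1, 2, 2, 4, 4, 8],
    "ram_gb":         [2, 4, 8, 8, 16, 32],
    "price_per_hour": [0.04, 0.07, 0.09, 0.13, 0.17, 0.33],
})

features = ["vcpu", "ram_gb"]

# Fit price as a linear function of the instance's resources.
model = LinearRegression()
model.fit(provider[features], provider["price_per_hour"])

# Hypothetical target instance sizes to normalise against
# (stand-ins for another provider's flavours, not a real catalogue).
target_sizes = pd.DataFrame({
    "vcpu":   [1, 2, 4],
    "ram_gb": [1, 4, 16],
})

# Estimate what this provider would charge for those exact sizes.
target_sizes["predicted_price_per_hour"] = model.predict(target_sizes[features])

print(target_sizes)
```

Repeating the fit once per provider and predicting against the same target sizes gives one comparable price column per provider. The fitted `model.coef_` values can also be read as an approximate per-vCPU and per-GB price, assuming the provider's pricing is close to linear in those resources.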
You can see my working in an IPython notebook here.