Billing-Data-Loader
Serverless Function for loading Yandex.Cloud billing data to ClickHouse
Prerequisites
Ask Yandex.Cloud support to export your billing data to S3 bucket of your choice
https://cloud.yandex.ru/docs/billing-folder-report/get-folder-report
Also you need to have Service Account and access keys to access Object Storage (previously mentioned bucket). It could be created using web console or with yc (don't forget to write down your access key and secret key):
$ yc iam service-account create --name function-sa
$ yc iam access-key create --service-account-name function-sa
$ yc resource-manager folder add-access-binding <Folder-Name> \
--subject serviceAccount:<ServiceAccount-ID> --role serverless.functions.invoker
Add read ACL for your service account
$ aws --endpoint-url=https://storage.yandexcloud.net s3api put-bucket-acl --bucket BUCKETNAME --grant-read id=SERVICEACC_ID
Same operation can be done manually via WEB UI
Create ClickHouse database
Before we start deploying, we need to create ClickHouse database and obtain it's connection parameters. Easiest way is to use web-console.
$ yc managed-clickhouse cluster create --name billingserver --environment=production \
--network-name VPC_NAME --clickhouse-resource-preset s2.small \
--host type=clickhouse,zone-id=ru-central1-c,assign-public-ip=true,subnet-id=SUBNET_C_ID \
--host type=clickhouse,zone-id=ru-central1-a,assign-public-ip=true,subnet-id=SUBNET_A_ID \
--host type=zookeeper,zone-id=ru-central1-c,subnet-id=SUBNET_C_ID \
--host type=zookeeper,zone-id=ru-central1-b,subnet-id=SUBNET_B_ID \
--host type=zookeeper,zone-id=ru-central1-a,subnet-id=SUBNET_A_ID \
--clickhouse-disk-size 100 --clickhouse-disk-type network-ssd \
--user name=user1,password=pass@word12 --database name=db1 --serverless-access --datalens-access
$ yc clickhouse user grant-permission user1 --cluster-name billingserver --database db1
Don't forget:
- use strong passwords
Deploy function
To deploy your function:
$ yc serverless function create --name billfunc
$ sh deploy.sh
Used environment variables (inside deploy.sh):
CH_HOST
– IP Address;CH_DB
– Database Name;CH_USER
– Database Username;CH_PASSWORD
– Database PasswordCH_TABLE
– Table Name (function will create it if table not exists);STORAGE_BUCKET
– Storage Bucket;STORAGE_FOLDER
– Folder name inside bucket (usually it is yc-billing-export) ;AWS_ACCESS_KEY_ID
– AWS Access KeyAWS_SECRET_ACCESS_KEY
– Aws Access Secret Key
Initial load and full reload
$ yc serverless function invoke billfunc -d '{"queryStringParameters": {"method": "reload"}}'
Behaviour
During first run with method = "reload" parameter function creates table CH_TABLE
in ClickHouse and loads all data from CSV files.
Create S3 Trigger
After deploying your function you should create trigger for object storage where you store your billing data: https://cloud.yandex.com/docs/functions/quickstart/os-trigger-quickstart
$ yc serverless trigger create object-storage billfuncs3trigger \
--description 'Billing-Data-Loader s3 trigger' \
--invoke-function-name billfunc \
--invoke-function-service-account-name function-sa \
--bucket-id string \
--prefix yc-billing-export \
--suffix csv \
--events 'create-object','update-object'
Several Billing Accounts
If you have several clouds inside several billing accounts - you can combine billing data between them inside same ClickHouse cluster. For each billing account use separate function instance and separate target table. You can combine them with such view:
create view db1.bill_all on cluster '{cluster}'
as
SELECT billing_account_id, billing_account_name, cloud_id, cloud_name, folder_id, folder_name, service_id, service_name, sku_id, sku_name, `date`, currency, pricing_quantity, pricing_unit, cost, credit, monetary_grant_credit, volume_incentive_credit, cud_credit, misc_credit, locale, updated_at, exported_at
from db1.billing ba
union all
SELECT billing_account_id, billing_account_name, cloud_id, cloud_name, folder_id, folder_name, service_id, service_name, sku_id, sku_name, `date`, currency, pricing_quantity, pricing_unit, cost, credit, monetary_grant_credit, volume_incentive_credit, cud_credit, misc_credit, locale, updated_at, exported_at
from db1.billing1 ba1