/logs_api_integration

Script for integration with Logs API

Primary Language: Python

Russian version

Integration with Logs API

This script helps you integrate the Yandex.Metrica Logs API with ClickHouse.

If you have any questions, feel free to leave a comment, create an issue on GitHub, or email me at miptgirl@yandex-team.ru.

Requirements

The script uses Python 2.7 and requires the requests library, which you can install with the pip package manager:

pip install requests

You also need a running ClickHouse instance to load the data into. Installation instructions can be found on the official ClickHouse site.

Setting up

First, fill in the config:

{
	"token" : "<your_token>", // token to access Yandex.Metrica API
	"counter_id": "<your_counter_id>",
	"visits_fields": [ // list of params for visits
	    "ym:s:counterID",
	    "ym:s:dateTime",
	    "ym:s:date",
	    "ym:s:firstPartyCookie"
	],
	"hits_fields": [ // list of params for hits
	    "ym:pv:counterID",
	    "ym:pv:dateTime",
	    "ym:pv:date",
	    "ym:pv:firstPartyCookie"
	],
	"log_level": "INFO", 
	"retries": 1, 
	"retries_delay": 60, // delay between retries
	"clickhouse": {
		"host": "http://localhost:8123", 
		"user": "", 
		"password": "",
		"visits_table": "visits_all", // table name for visits
		"hits_table": "hits_all", // table name for hits
		"database": "default" // database name
	}
}
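The `//` annotations in the sample above are explanatory only: standard JSON has no comment syntax, and Python's `json` module rejects them. The following is a minimal sketch, assuming the config is read with `json` (the script's actual loader may differ), that strips such annotations before parsing and checks for the top-level keys shown in the sample. The names `strip_comments` and `load_config` are hypothetical, and treating those keys as required is an assumption.

```python
import json
import re

# Matches either a complete JSON string literal or a // comment running to end of line.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|//[^\n]*')


def strip_comments(text):
    """Remove // comments while leaving string literals (e.g. URLs) untouched."""
    def keep_strings(match):
        token = match.group(0)
        return token if token.startswith('"') else ''
    return _TOKEN.sub(keep_strings, text)


def load_config(text):
    """Parse annotated JSON text and check the top-level keys from the sample."""
    config = json.loads(strip_comments(text))
    # Assumption: these keys are treated as mandatory for illustration only.
    required = ['token', 'counter_id', 'visits_fields', 'hits_fields', 'clickhouse']
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError('missing config keys: ' + ', '.join(missing))
    return config
```

Matching string literals first is what keeps the `//` inside `http://localhost:8123` from being mistaken for a comment.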

On first execution, the script creates all tables in the database according to the config. If you later change the parameters (for example, the field lists), you need to either drop the tables and load the data again, or add the new columns manually using ALTER TABLE.
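For the manual route, here is a hedged sketch of building a ClickHouse `ALTER TABLE ... ADD COLUMN` statement for a newly added field. The column naming rule (dropping the `ym:s:` or `ym:pv:` prefix) and the `String` column type are assumptions for illustration, not the script's confirmed behaviour. Check the names and types the script actually created (for example with `DESCRIBE TABLE`) before running anything. The field `ym:s:visitID` is used purely as an example.

```python
def column_name(field):
    """Hypothetical mapping: 'ym:s:visitID' -> 'visitID'."""
    return field.split(':')[-1]


def add_column_statement(database, table, field, column_type='String'):
    """Build an ALTER TABLE statement adding one column for a Logs API field."""
    return 'ALTER TABLE {0}.{1} ADD COLUMN {2} {3}'.format(
        database, table, column_name(field), column_type)


# Table and database names are taken from the sample config.
print(add_column_statement('default', 'visits_all', 'ym:s:visitID'))
```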

Running a program

When running the program, specify a source (hits or visits) with the -source option.

The script has several modes, selected with the -mode option:

  • history - loads all the data from day one to the day before yesterday
  • regular - loads data only for the day before yesterday (recommended for regular downloads)
  • regular_early - loads yesterday's data (which may be incomplete: some visits can lack page views)
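The date arithmetic behind the three modes can be sketched as follows. This is an illustration of the descriptions above, not the script's own code: `date_range_for_mode` is a hypothetical helper, and `first_day` is a stand-in parameter for "day one", since how the script determines that date is not stated here.

```python
import datetime


def date_range_for_mode(mode, today, first_day=None):
    """Return (start_date, end_date), both inclusive, for a given mode."""
    yesterday = today - datetime.timedelta(days=1)
    day_before_yesterday = today - datetime.timedelta(days=2)
    if mode == 'history':
        if first_day is None:
            raise ValueError('history mode needs first_day')
        return first_day, day_before_yesterday
    if mode == 'regular':
        return day_before_yesterday, day_before_yesterday
    if mode == 'regular_early':
        return yesterday, yesterday
    raise ValueError('unknown mode: ' + mode)


today = datetime.date(2016, 10, 20)
print(date_range_for_mode('regular', today))
print(date_range_for_mode('regular_early', today))
```

With `today` set to 2016-10-20, `regular` covers 2016-10-18 only and `regular_early` covers 2016-10-19 only.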

Example:

python metrica_logs_api.py -mode history -source visits

You can also load data for a specific time period:

python metrica_logs_api.py -source hits -start_date 2016-10-10 -end_date 2016-10-18
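To make the option combinations concrete, here is a hypothetical re-creation of the command-line interface with `argparse`. Only the option names and values come from the examples above. The validation rules (either a mode or a complete, ordered date range must be given) are assumptions, and the real script may parse its arguments differently.

```python
import argparse
import datetime


def parse_date(value):
    """Parse a YYYY-MM-DD string into a datetime.date."""
    return datetime.datetime.strptime(value, '%Y-%m-%d').date()


def build_parser():
    parser = argparse.ArgumentParser(description='Hypothetical Logs API loader CLI')
    parser.add_argument('-source', choices=['hits', 'visits'], required=True)
    parser.add_argument('-mode', choices=['history', 'regular', 'regular_early'])
    parser.add_argument('-start_date', type=parse_date)
    parser.add_argument('-end_date', type=parse_date)
    return parser


def validate(args):
    """Assumed rule: require either a mode or a complete, ordered date range."""
    has_range = args.start_date is not None and args.end_date is not None
    if args.mode is None and not has_range:
        raise ValueError('specify -mode or both -start_date and -end_date')
    if has_range and args.start_date > args.end_date:
        raise ValueError('-start_date must not be after -end_date')
    return args


args = validate(build_parser().parse_args(
    ['-source', 'hits', '-start_date', '2016-10-10', '-end_date', '2016-10-18']))
print('{0} {1} {2}'.format(args.source, args.start_date, args.end_date))
```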