Stats
A small personal library for sending quick queries to Google BigQuery (GBQ) and getting back aggregates ready for pandas plots.
How to install:
pip:
pip3 install git+https://github.com/PatrickChodowski/stats.git
poetry:
poetry add "git+https://github.com/PatrickChodowski/stats.git#main"
Usage:
Setup
from stats import GBQData
g = GBQData(gbq_path='project.dataset.table',
            sa_path='credentials/sa.json')
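The README does not describe what GBQData does with these two arguments. The following is a minimal sketch, assuming the class splits gbq_path into project, dataset and table and authenticates with the service-account key through the google-cloud-bigquery package. The names TableRef, parse_gbq_path, make_client and run_query are hypothetical and are not part of this library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical holder for the parts of a 'project.dataset.table' string."""
    project: str
    dataset: str
    table: str


def parse_gbq_path(gbq_path: str) -> TableRef:
    """Split 'project.dataset.table' into three non-empty parts, rejecting other shapes."""
    parts = gbq_path.split(".")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'project.dataset.table', got {gbq_path!r}")
    return TableRef(*parts)


def make_client(sa_path: str, project: str):
    """Create a BigQuery client from a service-account JSON key file.

    The import is deferred so this module loads even without
    google-cloud-bigquery installed.
    """
    from google.cloud import bigquery  # requires the google-cloud-bigquery package

    return bigquery.Client.from_service_account_json(sa_path, project=project)


def run_query(client, sql: str):
    """Run a query and return the result as a pandas DataFrame."""
    return client.query(sql).to_dataframe()
```

Validating the path up front means a malformed gbq_path fails immediately, before any network call is made.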
Aggregated bar chart
g.set(dimensions=['team_abbreviation', 'player_name'],
      metrics=['pts'],
      aggregations=['sum'],
      sort=('sum_pts', 'desc'),
      filters=[('team_abbreviation', 'eq', 'GUA')],
      limit=30)
g.get()
g.plot("barh")
Histogram
g.set(dimensions=None,
      metrics=['pts'],
      aggregations=['none'],
      filters=[('team_abbreviation', 'eq', 'GUA')])
g.get()
g.plot('hist')
Modules:
- gbq_data.py - Main interface for working with the GBQ connection and sending queries
- query_builder.py - Takes the input, validates it and builds the query
- plots.py - Takes the data for the validated query defined with the .set() method and optionally produces a plot
The goal of this setup is to separate query building and validation from plotting, so that the data is already known to be correct before the report is visualized, instead of discovering problems only when the plot is created.
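The validate-then-build step can be illustrated with a minimal, hypothetical build_query function; it is not the actual query_builder.py. It assumes aggregated columns are aliased as <aggregation>_<metric> (inferred from the sum_pts sort key in the bar chart example), supports only a few aggregations, and uses a made-up table of filter operators in which only eq comes from this README. Every check raises ValueError before any SQL would reach BigQuery.

```python
# Hypothetical sketch of a query builder; not the library's query_builder.py.

# A few aggregations rendered as BigQuery SQL templates ('{col}' is the metric).
AGG_SQL = {
    "avg": "AVG({col})",
    "sum": "SUM({col})",
    "min": "MIN({col})",
    "max": "MAX({col})",
    "count": "COUNT({col})",
}

# Hypothetical filter operators; only 'eq' appears in the README examples.
FILTER_OPS = {"eq": "=", "neq": "!=", "gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def sql_literal(value):
    """Render a Python value as a BigQuery literal; strings are quoted and escaped."""
    if isinstance(value, bool):  # check bool before int, since bool is an int subclass
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    text = str(value).replace("\\", "\\\\").replace("'", "\\'")
    return f"'{text}'"


def build_query(table, dimensions, metrics, aggregations,
                filters=(), sort=None, limit=None):
    """Validate the inputs and return a SQL string, raising ValueError on bad input."""
    dimensions = list(dimensions or [])
    if "none" in aggregations and len(aggregations) > 1:
        raise ValueError("'none' cannot be combined with other aggregations")

    select, columns = list(dimensions), list(dimensions)
    for metric in metrics:
        for agg in aggregations:
            if agg == "none":
                select.append(metric)  # raw values, no aggregation
                columns.append(metric)
            elif agg in AGG_SQL:
                alias = f"{agg}_{metric}"  # e.g. sum_pts
                select.append(f"{AGG_SQL[agg].format(col=metric)} AS {alias}")
                columns.append(alias)
            else:
                raise ValueError(f"unsupported aggregation {agg!r}")

    sql = f"SELECT {', '.join(select)} FROM `{table}`"

    conditions = []
    for column, op, value in filters:
        if op not in FILTER_OPS:
            raise ValueError(f"unsupported filter operator {op!r}")
        conditions.append(f"{column} {FILTER_OPS[op]} {sql_literal(value)}")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)

    # Group only when dimensions are combined with real aggregations.
    if dimensions and "none" not in aggregations:
        sql += " GROUP BY " + ", ".join(dimensions)

    if sort is not None:
        column, direction = sort
        if column not in columns:
            raise ValueError(f"sort column {column!r} is not in the output {columns}")
        if direction.lower() not in ("asc", "desc"):
            raise ValueError(f"sort direction must be 'asc' or 'desc', got {direction!r}")
        sql += f" ORDER BY {column} {direction.upper()}"

    if limit is not None:
        sql += f" LIMIT {int(limit)}"
    return sql
```

A production version would more likely pass filter values as BigQuery query parameters instead of inlining escaped literals.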
Aggregation options:
- avg
- sum
- min
- max
- count
- count_distinct
- count_nulls
- any
- string_agg
- array_agg
- string_agg_distinct
- array_agg_distinct
- median
- q1
- q3
- percentiles
- stdev
- var
- none (no aggregation)
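The list above names the options but not the SQL behind them. A minimal sketch, assuming BigQuery Standard SQL and the <aggregation>_<metric> alias convention suggested by the sum_pts sort key, of how each option could be rendered; the AGGREGATIONS table and aggregate_expr are hypothetical, and the library may compile these differently. Note that median, q1, q3 and percentiles are written with APPROX_QUANTILES, which returns approximate values, and percentiles yields an array of 101 boundaries (minimum, 1%, ..., maximum).

```python
# Hypothetical mapping from the aggregation options to BigQuery Standard SQL.
# '{col}' is replaced by the metric column; 'none' is handled separately.
AGGREGATIONS = {
    "avg": "AVG({col})",
    "sum": "SUM({col})",
    "min": "MIN({col})",
    "max": "MAX({col})",
    "count": "COUNT({col})",
    "count_distinct": "COUNT(DISTINCT {col})",
    "count_nulls": "COUNTIF({col} IS NULL)",
    "any": "ANY_VALUE({col})",
    "string_agg": "STRING_AGG(CAST({col} AS STRING), ',')",
    # IGNORE NULLS: BigQuery arrays cannot contain NULL elements.
    "array_agg": "ARRAY_AGG({col} IGNORE NULLS)",
    "string_agg_distinct": "STRING_AGG(DISTINCT CAST({col} AS STRING), ',')",
    "array_agg_distinct": "ARRAY_AGG(DISTINCT {col} IGNORE NULLS)",
    "median": "APPROX_QUANTILES({col}, 2)[OFFSET(1)]",
    "q1": "APPROX_QUANTILES({col}, 4)[OFFSET(1)]",
    "q3": "APPROX_QUANTILES({col}, 4)[OFFSET(3)]",
    "percentiles": "APPROX_QUANTILES({col}, 100)",
    "stdev": "STDDEV({col})",
    "var": "VARIANCE({col})",
}


def aggregate_expr(aggregation: str, column: str) -> str:
    """Return '<SQL> AS <aggregation>_<column>', or the bare column for 'none'."""
    if aggregation == "none":
        return column
    try:
        template = AGGREGATIONS[aggregation]
    except KeyError:
        raise ValueError(
            f"unknown aggregation {aggregation!r}; "
            f"expected 'none' or one of {sorted(AGGREGATIONS)}"
        ) from None
    return f"{template.format(col=column)} AS {aggregation}_{column}"
```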
Plot options:
- bar chart
- horizontal bar chart
- box plot
- histogram
- scatter plot
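The calls g.plot("barh") and g.plot('hist') match the kind names used by pandas' DataFrame.plot. Assuming the other options map to the pandas kinds "bar", "box" and "scatter" (an assumption, not confirmed by this README), a hypothetical plot helper could validate the kind before delegating to pandas. SUPPORTED_KINDS and plot are illustrative names only.

```python
# Hypothetical set of plot kinds, mirroring pandas DataFrame.plot kind names.
SUPPORTED_KINDS = {"bar", "barh", "box", "hist", "scatter"}


def plot(df, kind, x=None, y=None, **kwargs):
    """Validate the plot kind, then delegate to df.plot (pandas DataFrame.plot)."""
    if kind not in SUPPORTED_KINDS:
        raise ValueError(f"unsupported plot kind {kind!r}; expected one of {sorted(SUPPORTED_KINDS)}")
    # pandas requires both x and y columns for scatter plots.
    if kind == "scatter" and (x is None or y is None):
        raise ValueError("scatter plots need both x and y columns")
    if x is not None:
        kwargs["x"] = x
    if y is not None:
        kwargs["y"] = y
    return df.plot(kind=kind, **kwargs)
```

Because df is passed in, the helper itself does not import pandas; with a real DataFrame, plot(df, "barh", x="player_name", y="sum_pts") would draw a horizontal bar chart of the aggregated points.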