In a perfect world, all time series data is time-zone-aware and stored in UTC. Sadly, we do not live in a perfect world. Time series data often lacks a time zone identifier, or worse, does not actually adhere to the time zone it claims to be in.
tz-canary inspects the Daylight Saving Time (DST) switches in a time series to infer a set of plausible time zones the data could be in.
It allows you to infer the full set of plausible time zones for the data, or to validate whether a given time zone is plausible for the data.
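To make the idea concrete, here is a minimal sketch of what a DST switch looks like in naive local timestamps. It is not tz-canary's actual implementation: the function name `find_dst_like_switches` is hypothetical, and the sketch assumes a sorted, regularly sampled (here hourly) series. In local wall-clock time, the spring switch shows up as a skipped hour and the autumn switch as a repeated hour.

```python
from datetime import datetime, timedelta


def find_dst_like_switches(timestamps, step=timedelta(hours=1)):
    """Flag places where consecutive local timestamps are one hour off the expected step.

    Hypothetical helper for illustration; assumes sorted, regularly sampled data.
    """
    one_hour = timedelta(hours=1)
    switches = []
    for previous, current in zip(timestamps, timestamps[1:]):
        delta = current - previous
        if delta == step + one_hour:
            # An hour of wall-clock time was skipped (clocks moved forward).
            switches.append(("spring_forward", current))
        elif delta == step - one_hour:
            # An hour of wall-clock time was repeated (clocks moved back).
            switches.append(("fall_back", current))
    return switches


# Hourly local timestamps around the 2023 switches in the Netherlands.
spring = [datetime(2023, 3, 26, h) for h in (0, 1, 3, 4)]  # 02:00 does not exist
autumn = [datetime(2023, 10, 29, h) for h in (1, 2, 2, 3)]  # 02:00 occurs twice

spring_switches = find_dst_like_switches(spring)
autumn_switches = find_dst_like_switches(autumn)
print(spring_switches)
print(autumn_switches)
```

The dates of such switches differ between regions (the EU and the US, for example, switch on different weekends), and UTC has no switches at all, which is what makes them usable as a fingerprint.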
tz-canary is available on PyPI, so you can install it just like any other Python package:
pip install tz-canary
The simplest way to use tz-canary is to validate a given time zone for a time series:
import pandas as pd
from tz_canary import validate_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
validate_time_zone(df.index, "Europe/Amsterdam") # will pass
validate_time_zone(df.index, "America/New_York") # will raise ImplausibleTimeZoneError
validate_time_zone(df.index, "UTC") # will raise ImplausibleTimeZoneError
You can also get a list of all plausible time zones for a time series:
from pprint import pprint
import pandas as pd
from tz_canary import infer_time_zone
df = pd.read_csv("docs/data/example_data.csv", index_col="datetime", parse_dates=True)
plausible_time_zones = infer_time_zone(df.index)
pprint(plausible_time_zones)
# Output:
# {zoneinfo.ZoneInfo(key='Africa/Ceuta'),
# zoneinfo.ZoneInfo(key='Arctic/Longyearbyen'),
# zoneinfo.ZoneInfo(key='Europe/Amsterdam'),
# ...
# zoneinfo.ZoneInfo(key='Europe/Zurich')}
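The returned set can be large, since many zones share the same DST rules. As a small convenience sketch (not part of tz-canary), the zone keys can be grouped by region prefix for easier reading. The helper name `group_by_region` is hypothetical, and the hard-coded keys below stand in for `[tz.key for tz in plausible_time_zones]`.

```python
from collections import defaultdict


def group_by_region(zone_keys):
    """Group IANA zone keys such as 'Europe/Amsterdam' by their region prefix."""
    groups = defaultdict(list)
    for key in sorted(zone_keys):
        # Everything before the first "/" is treated as the region.
        region, _, _ = key.partition("/")
        groups[region].append(key)
    return dict(groups)


# Stand-in for: zone_keys = [tz.key for tz in plausible_time_zones]
zone_keys = ["Europe/Zurich", "Africa/Ceuta", "Europe/Amsterdam", "Arctic/Longyearbyen"]
grouped = group_by_region(zone_keys)
print(grouped)
```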
When processing many time series, it can be useful to cache the transitions data used by tz-canary to infer time zones.
You can do this by creating a TransitionsData object and passing it to infer_time_zone (and this also works for validate_time_zone):
import pandas as pd
from tz_canary import TransitionsData, infer_time_zone
# We create a TransitionsData object once to avoid having to recompute the transitions for
# every call to infer_time_zone.
transitions_data = TransitionsData(2010, 2023)
for i in range(10):
    df = pd.read_csv(
        "docs/data/example_data.csv",  # In reality, these would be different files
        index_col="datetime",
        parse_dates=True,
    )
    plausible_time_zones = infer_time_zone(df.index, transitions_data=transitions_data)
    print(i, plausible_time_zones)
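If several series are known to come from the same source, the per-series results can be combined to narrow down the candidates. The sketch below rests on an assumption that tz-canary does not check for you, namely that all series share one time zone. The helper name `intersect_plausible` is hypothetical, and hard-coded sets of zone keys are used in place of real infer_time_zone results.

```python
def intersect_plausible(sets_of_zones):
    """Return the zones that are plausible for every series (hypothetical helper)."""
    sets_of_zones = [set(zones) for zones in sets_of_zones]
    if not sets_of_zones:
        return set()
    return set.intersection(*sets_of_zones)


# Stand-ins for the results of infer_time_zone on three different series.
results = [
    {"Europe/Amsterdam", "Europe/Zurich", "Africa/Ceuta"},
    {"Europe/Amsterdam", "Europe/Zurich"},
    {"Europe/Amsterdam", "Europe/Zurich", "Arctic/Longyearbyen"},
]
common = intersect_plausible(results)
print(sorted(common))
```

An empty intersection would suggest that the shared-time-zone assumption does not hold for the series in question.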
- Make sure you have git, git LFS, and Poetry installed.
- Clone this repository:
git clone https://github.com/leonoverweel/tz-canary
cd tz-canary
- Install the development requirements:
poetry install --with dev
- Install the pre-commit hooks (used for linting):
pre-commit install
- Run the tests:
poetry run pytest
- Bump the version number in pyproject.toml and commit the change.
- Make a new release on GitHub.
- Build the package:
poetry build
- Publish the package to PyPI:
poetry publish
Please don't hesitate to open issues and PRs!
GitHub repository: https://github.com/leonoverweel/tz-canary.