This repo contains an analysis of residential moves across Japan using telephone directory name and address data. It includes the data processing pipeline, which takes the raw data, geocodes it, and then assigns each address to a census chocho (町丁). It will also contain the analysis used to determine whether an individual moved or remained in the same property/chocho from one year to the next.
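As a rough illustration of the moved/remained comparison described above, here is a minimal sketch. It is not the repo's analysis code: the function name `classify_moves`, the record layout, the status labels, and the use of a ready-made `person_id` are all assumptions made for the example. In practice, linking individuals by name across years is a harder problem than this sketch suggests.

```python
from collections import defaultdict


def classify_moves(records):
    """Classify each person's status between consecutive observed years.

    `records` is an iterable of (person_id, year, address, chocho_code)
    tuples. All four fields are hypothetical names for this sketch.
    Returns a dict keyed by (person_id, year_from, year_to).
    """
    # Group observations by person, then by year.
    by_person = defaultdict(dict)
    for person_id, year, address, chocho in records:
        by_person[person_id][year] = (address, chocho)

    result = {}
    for person_id, by_year in by_person.items():
        years = sorted(by_year)
        # Compare each observed year with the next observed year.
        for prev, curr in zip(years, years[1:]):
            prev_addr, prev_chocho = by_year[prev]
            curr_addr, curr_chocho = by_year[curr]
            if prev_addr == curr_addr:
                status = "same_property"
            elif prev_chocho == curr_chocho:
                status = "moved_within_chocho"
            else:
                status = "moved_chocho"
            result[(person_id, prev, curr)] = status
    return result
```

A person who appears in only one year produces no entry, since there is nothing to compare against.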
- Geolonia Japanese Address Normalizer/Geocoder
- Environment variables:
  - Raw data filepath (rawdata_path)
  - Clean data filepath (cleandata_path)
  - Geolonia Address API configuration (geolonia_api)

These should be set up in a local .env file, which may look something like this:
rawdata_path="<PATH>"
cleandata_path="<PATH>"
geolonia_api="<API>"
The variables can then be exported into the current shell by running export $(cat .env | xargs)
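Once exported, a pipeline script could read these variables along the following lines. This is only a sketch: the variable names come from the list above, but the helper `load_config` and its error handling are assumptions and may not match how the repo's scripts actually read their configuration.

```python
import os

# Variable names as listed in this README.
REQUIRED_VARS = ("rawdata_path", "cleandata_path", "geolonia_api")


def load_config(environ=None):
    """Return the required settings as a dict, failing early if any is unset.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test. (Hypothetical helper, not part of the repo.)
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARS}
```

Failing early with the names of the missing variables tends to be easier to debug than a `KeyError` raised partway through a long run.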
To execute the pipeline, run snakemake -j<No. cores>, replacing <No. cores> with the number of cores to use.
- Clean input address data by removing duplicates and unnecessary columns (clean_addresses.py)
- Run Geolonia Japanese Address Normalizer/Geocoder
- Identify addresses where geocoding failed
- Re-run Geolonia Japanese Address Normalizer/Geocoder on failed addresses
- Combine final outputs and identify census chocho for each address
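The control flow of the steps above (clean, geocode, collect failures, retry, combine) can be sketched as follows. This is an illustration under stated assumptions, not the repo's implementation: the column names `name` and `address` are hypothetical, and `geocode` and `retry_geocode` stand in for whatever calls the Geolonia normalizer/geocoder, which is not reproduced here. The final chocho assignment is omitted because it depends on census boundary data.

```python
def clean_addresses(rows, keep=("name", "address")):
    """Drop unneeded columns and exact duplicate rows, preserving order.

    `rows` is a list of dicts; `keep` lists the (hypothetical) columns
    to retain.
    """
    seen = set()
    cleaned = []
    for row in rows:
        slim = tuple(row[col] for col in keep)
        if slim not in seen:
            seen.add(slim)
            cleaned.append(dict(zip(keep, slim)))
    return cleaned


def geocode_all(addresses, geocode):
    """Geocode each address; return (results, failed).

    `geocode` is any callable returning a (lat, lon) tuple, or None
    when the address could not be geocoded.
    """
    results, failed = {}, []
    for address in addresses:
        coords = geocode(address)
        if coords is None:
            failed.append(address)
        else:
            results[address] = coords
    return results, failed


def run_with_retry(rows, geocode, retry_geocode):
    """Clean, geocode, re-run failures once, and combine the outputs."""
    cleaned = clean_addresses(rows)
    # Geocode each distinct address only once.
    addresses = list(dict.fromkeys(row["address"] for row in cleaned))
    results, failed = geocode_all(addresses, geocode)
    retried, still_failed = geocode_all(failed, retry_geocode)
    results.update(retried)
    return results, still_failed
```

Passing the geocoder in as a callable keeps the retry logic independent of the geocoding service, so it can be exercised with stub functions before any real requests are made.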