/reverse-engineering-YJMob100K-grid

Revealing urban area from mobile positioning data

Primary LanguageJupyter NotebookBSD 3-Clause "New" or "Revised" LicenseBSD-3-Clause

Revealing urban area from mobile positioning data

Usage

The pyproject.toml document the required dependencies. It's suggested to use the Poetry packaging tool. In this case, just issue the poetry install command to set up a virtual environment with all the necessary dependencies.

After the development environment has set up, run the notebooks in the following order to reproduce the results.

  1. plot_heatmaps.ipynb
    • this will reproduce the heatmaps [Figure 6] from the data description paper,
    • do the inverse-transformed plots, and
    • some related plots for the paper
  2. generate_grid.ipnyb
    • this will locate the observation area within Japan and generate the grid

These two notebooks contain the main work. The detect_homes.ipynb, validate_home_detection.ipynb, calculate_grid_complexity.ipynb, and the src/plot_grid.ipynb are optional steps to reproduce the figure in the technical validation section of the paper.

Upscaling grid

  • Upscale heatmap as a template
    • merges the neighboring cells while summing the activity in the four cells resulting lower resolution template heatmaps
  • Locate upscaled observation area
    • plots the land area of the selected six prefectures proportionally to the upscaled heatmap (grid) and applies template matching

Other cities

  1. Helsinki
    • the Helsinki notebook processes the data for different grid sizes in one run
  2. London
    • rx and ry parameters are for the grid size, use either 500, 1000, 2000, or 4000
  3. Toronto
    • it is the same notebook as for London, because the dataset is the same
    • enable the Toronto parameter block
  4. Dallas--Fort Worth
    • RES parameter is for the H3 resolution, vales between 6 and 10 were applied

User identifiability

A user is considered k-identifiable if the most frequently visited k location are distinguishable 1. The top four location have been determined for every user, then the grid cell were upscaled to 1, 2, 4, 8, and 16 km.

The following table compares the top-four-location identifiable users by upscaled grids. The relevant notebook is here.

distinguishable cells 1 km x 1 km 2 km x 2 km 4 km x 4 km 8 km x 8 km 16 km x 16 km
4 35469 12882 5090 1810 470
3 48228 42323 28457 16752 7438
2 15582 38548 50987 52608 44939
1 721 6247 15466 28830 47153

Results

The results are included to be available without executing the code. Most notably, the reproduced grid (in EPSG:2449 projection).

Choropleth maps using the reproduced grid

The spatial distribution of the activity (first) and the number of unique users (second) per cell using the reproduced grid.

spatial distribution of activity

spatial distribution of unique users

Data sources

  1. Mobility data: YJMob100K
  2. OpenStreetMap data
    • Copyrighted by OpenStreetMap contributors. It is available under the Open Database License (ODbL).
    • Administrative data is from OpenStreetMap
      • downloaded from OSM-Boundaries
        • prefectures (admin level 4), then filtered manually
        • municipalities (admin level 7), then filtered manually
        • wards (admin level 8), then filtered to Nagoya
    • Coastline is downloaded from https://osmdata.openstreetmap.de/data/land-polygons.html
      • the islands of Japan was extracted using the prefecture boundaries
  3. Census data

License

  • The code is licensed under BSD-3-Clause
  • The documentation and figures are CC BY 4.0
  • The shape files are from OpenStreetMap and licensed under the Open Data Commons Open Database License (ODbL)
  • The census data was downloaded from the Portal Site of Official Statistics of Japan website (https://www.e-stat.go.jp/)

Footnotes

  1. Hui Zang and Jean Bolot. 2011. Anonymization of location data does not work: a large-scale measurement study. In Proceedings of the 17th annual international conference on Mobile computing and networking (MobiCom '11). Association for Computing Machinery, New York, NY, USA, 145–156. https://doi.org/10.1145/2030613.2030630