Python module for mobility data anonymization, part of the MobiDataLab European project.
Developed by the CRISES research group at URV.
To install the package, run the commands below in a terminal opened at the root of this repository. They create a Conda environment, install the dependencies and set up the module. The setup step is only required to use the module as a library in Python code, not for CLI usage (see the Usage section).
```bash
# Create and activate environment
conda create -n mdl_env pip python=3.9 rtree
conda activate mdl_env
# Install dependencies
conda install -c conda-forge scikit-mobility -y # If this fails, use "pip install scikit-mobility"
conda install -c conda-forge pyarrow -y
conda install -c conda-forge py-xgboost -y
conda install -c conda-forge haversine -y
conda install tqdm typer more-itertools -y
# [Optional] API
conda install -c conda-forge fastapi -y
conda install -c conda-forge uvicorn -y
conda install -c conda-forge python-multipart -y
# [Optional] Build and set up the package for Python import (not required for CLI usage)
conda install conda-build -y
conda develop mdl_anonymizer
```
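After activating `mdl_env`, the following sketch can help confirm that the dependencies are importable. It only uses the standard library. The list of import names is our own mapping from the conda package names above (e.g. `scikit-mobility` is imported as `skmob` and `py-xgboost` as `xgboost`) and is an assumption, not something the installation script checks.

```python
import importlib.util

# Import names assumed for the conda packages installed above.
REQUIRED = ["skmob", "pyarrow", "xgboost", "haversine", "tqdm", "typer", "more_itertools"]
# Optional API dependencies (python-multipart is left out: its import name varies by version).
OPTIONAL_API = ["fastapi", "uvicorn"]


def missing_modules(names):
    """Return the module names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Missing required modules:", missing_modules(REQUIRED) or "none")
    print("Missing optional API modules:", missing_modules(OPTIONAL_API) or "none")
```

`find_spec` only locates a module without importing it, so the check is quick and does not trigger heavy imports.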
Tested to work with the following software versions:
- Python: 3.9 and 3.10
- Conda: 4.11.0 and 4.12.0
- Operating System: Ubuntu 20.04 and Windows 10
This module can be used as an independent command line interface (CLI) tool or as a Python library. The following subsections illustrate both uses.
The package provides a command line interface (CLI) that lets users anonymize a mobility dataset, perform privacy-preserving analyses, and compute utility and privacy measures over both the original and the anonymized datasets.
```bash
python -m mdl_anonymizer
```
Detailed documentation is available here.
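When the CLI has to be driven from a Python script (for example, to run a batch of experiments), it can be invoked through `subprocess`. This sketch relies only on the `python -m mdl_anonymizer` entry point shown above. The `--help` flag is an assumption (Typer, one of the listed dependencies, adds it by default); the real subcommands and options are described in the CLI documentation.

```python
import subprocess
import sys


def run_cli(*args: str) -> subprocess.CompletedProcess:
    """Run `python -m mdl_anonymizer <args>` with the current interpreter and capture its output."""
    command = [sys.executable, "-m", "mdl_anonymizer", *args]
    # check=False: inspect result.returncode instead of raising on failure.
    return subprocess.run(command, capture_output=True, text=True, check=False)


if __name__ == "__main__":
    # Assumption: the CLI accepts --help, as Typer-based CLIs do by default.
    result = run_cli("--help")
    print(result.stdout if result.returncode == 0 else result.stderr)
```

Using `sys.executable` ensures the command runs with the interpreter of the active environment (e.g. `mdl_env`) rather than whichever `python` appears first on the `PATH`.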
The anonymization module can also be deployed on a server to provide all its functionality remotely through an API (this requires the optional API dependencies listed in the installation commands). To start the server application, run:
```bash
# --reload restarts the server when the source code changes (intended for development)
uvicorn mdl_anonymizer.server.main_api:app --reload --host 0.0.0.0 --port 8000
```
Detailed documentation of the API is available here.
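The server runs a FastAPI application, so, assuming the app does not disable FastAPI's default documentation routes, its OpenAPI schema is served at `/openapi.json` and interactive docs at `/docs`. The sketch below lists the available endpoints from that schema using only the standard library. The base URL matches the command above; the endpoint names themselves are not assumed.

```python
import json
import urllib.request

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


def fetch_openapi(base_url: str = "http://localhost:8000") -> dict:
    """Download the OpenAPI schema that FastAPI serves by default at /openapi.json."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=10) as response:
        return json.load(response)


def list_endpoints(schema: dict) -> list:
    """Return sorted (HTTP method, path) pairs declared in an OpenAPI schema."""
    return sorted(
        (method.upper(), path)
        for path, operations in schema.get("paths", {}).items()
        for method in operations
        if method in HTTP_METHODS  # skip non-operation keys such as "parameters"
    )
```

With the server running, `list_endpoints(fetch_openapi())` returns the routes exposed by the deployed module, which can then be explored in the browser at `http://localhost:8000/docs`.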
Once the package has been set up (see the optional step in the installation commands), using it only requires an import:
```python
import mdl_anonymizer
```
You can find examples of how to use the library in the examples folder.
The anonymization module has been designed with modularity in mind: pseudonymization and anonymization methods can be built from different components dedicated to preprocessing, clustering, distance computation, aggregation, etc. We have made it easy to add new methods and components in order to encourage contributions from other researchers.
To do so, developers should follow these steps:
- Create a Python class that implements (inherits from) the interface corresponding to the type of the new method:
  - `AnonymizationMethodInterface`, for new anonymization methods
  - `TrajectoryAggregationInterface`, for new aggregation methods
  - `AnalysisMethodInterface`, for new analysis methods
  - `ClusteringInterface`, for new clustering methods
  - `MeasuresMethodInterface`, for new measure methods
  - `DistanceInterface`, for new trajectory distance methods
- The constructor of the new class must take the original dataset as its first argument, followed by the parameters required by the new method.
- Implement the inherited `run()` method with the code that executes the logic of the new method (e.g., for a new anonymization method, the routine that anonymizes the original dataset).
- Add a reference to the new class in the main configuration file (`config.json`), located at the root of the project library. The reference must be placed under the corresponding method type (`anonymization`, `clustering`, `aggregation`, `trajectory_distances`, `analysis` or `measures`) and must contain the name of the method and the path of the new class.
- Add a unit test to the test folder. We recommend using a mock dataset from the data folder.
- Don't forget to add a description of your method in the docs section.
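The first steps can be sketched for a toy anonymization method that generalizes locations by rounding coordinates. This is a minimal illustration under stated assumptions, not one of the module's methods. `AnonymizationMethodInterface` is defined here as a hypothetical stand-in so the example runs on its own. In real code, import the project's interface instead, which may declare further methods. The dataset is modeled as a list of records using scikit-mobility's default column names (`uid`, `lat`, `lng`, `datetime`), which is also an assumption.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

Record = Dict[str, Any]


class AnonymizationMethodInterface(ABC):
    """Hypothetical stand-in for the project's interface; import the real one instead."""

    @abstractmethod
    def run(self) -> None:
        """Execute the logic of the method."""


class CoordinateRounding(AnonymizationMethodInterface):
    """Toy anonymization method: generalizes locations by rounding their coordinates."""

    def __init__(self, original_dataset: List[Record], decimals: int = 3):
        # Convention from the steps above: the original dataset first, then the parameters.
        self.original_dataset = original_dataset
        self.decimals = decimals
        self.anonymized_dataset: List[Record] = []

    def run(self) -> None:
        # Build new records so the original dataset is left untouched.
        self.anonymized_dataset = [
            {
                **record,
                "lat": round(record["lat"], self.decimals),
                "lng": round(record["lng"], self.decimals),
            }
            for record in self.original_dataset
        ]
```

A unit test for such a class would build a small mock dataset, call `run()` and check the output. For example, with the made-up record `{"uid": 1, "lat": 41.11881, "lng": 1.24452, "datetime": "2022-01-01 10:00:00"}` and `decimals=2`, the anonymized record has `lat == 41.12` and `lng == 1.24`.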
Once the new method has been implemented and referenced in `config.json` as described above, it can be used in the same way as the existing methods.
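How the module resolves the references in `config.json` is internal to the project. As an illustration only, the sketch below shows one common way an entry holding a method name and a dotted class path can be turned into an instance with `importlib`, following the dataset-first constructor convention. The config layout and its `name`/`class` keys are hypothetical; check the existing entries in `config.json` for the real format.

```python
import importlib
from typing import Any, Dict, List

# Hypothetical config layout: method type -> list of {"name", "class"} entries.
Config = Dict[str, List[Dict[str, str]]]


def find_class_path(config: Config, method_type: str, name: str) -> str:
    """Return the dotted class path registered under a method type and name."""
    for entry in config.get(method_type, []):
        if entry["name"] == name:
            return entry["class"]
    raise KeyError(f"No {method_type!r} method named {name!r}")


def load_class(class_path: str) -> type:
    """Resolve 'package.module.ClassName' to the class object."""
    module_name, _, class_name = class_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.ClassName', got {class_path!r}")
    return getattr(importlib.import_module(module_name), class_name)


def build_method(config: Config, method_type: str, name: str, dataset: Any, **params: Any) -> Any:
    """Instantiate a registered method: the dataset first, then its parameters."""
    method_class = load_class(find_class_path(config, method_type, name))
    return method_class(dataset, **params)
```

For instance, an `anonymization` entry pointing to a class in a hypothetical module `my_methods` would be instantiated with `build_method(config, "anonymization", "my_method", dataset, k=3)`, where `k` stands for whatever parameters that method defines. The instance's `run()` method then executes it.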