This repository contains the code for the paper "Efficient Trajectory Similarity Computation with Contrastive Learning".
- Ubuntu OS
- Python >= 3.6 (Anaconda3 is recommended)
- PyTorch >= 1.5.0 (Tested on 1.5.0)
- transformers >= 2.9.0 (Tested on 2.9.0)
- dataclasses >= 0.7 (Tested on 0.7)
We follow t2vec (https://github.com/boathit/t2vec) for pre-processing.
Specifically, first download the t2vec code and change the working directory to t2vec. Then, for the Porto dataset, run the following commands.
$ curl http://archive.ics.uci.edu/ml/machine-learning-databases/00339/train.csv.zip -o data/porto.csv.zip
$ unzip data/porto.csv.zip
$ mv train.csv data/porto.csv
$ cd preprocessing
$ julia porto2h5.jl
$ julia preprocess.jl
After pre-processing, the training data (i.e., train.src and train.trg) are available in the data directory of t2vec.
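Before moving on, it can help to confirm that both training files were actually produced. The following is a minimal sketch, not part of this repo: it assumes that train.src and train.trg are text files with one trajectory per line and that the two files are line-aligned. The helper names (count_lines, check_training_pair) are hypothetical.

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


def check_training_pair(data_dir, src_name="train.src", trg_name="train.trg"):
    """Check that both files exist and hold the same number of lines.

    Assumption: one trajectory per line, with the two files line-aligned.
    Returns the shared line count, or raises an error describing the problem.
    """
    src_path = os.path.join(data_dir, src_name)
    trg_path = os.path.join(data_dir, trg_name)
    for path in (src_path, trg_path):
        if not os.path.isfile(path):
            raise FileNotFoundError("missing training file: " + path)
    n_src, n_trg = count_lines(src_path), count_lines(trg_path)
    if n_src != n_trg:
        raise ValueError("line count mismatch: %d vs %d" % (n_src, n_trg))
    return n_src


if __name__ == "__main__":
    # Demonstration on a throwaway directory with two tiny dummy files.
    with tempfile.TemporaryDirectory() as demo_dir:
        for name in ("train.src", "train.trg"):
            with open(os.path.join(demo_dir, name), "w") as f:
                f.write("1 2 3\n4 5 6\n")
        print(check_training_pair(demo_dir))
```

In practice, you would call check_training_pair with the path to the data directory of your t2vec checkout instead of the temporary demo directory.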
To pre-train the grid representations, run "grids_pretraining.py" in this repo.
P.S. Before running it, set the variables dataset_dir and vec_dir in "grids_pretraining.py", where dataset_dir is the directory of the training data and vec_dir is the directory in which the trained representations are saved.
To train CL-TSim, set the variables datadir and cell_embedding in "config.py", where datadir is the directory of the training data and cell_embedding is the path of the pre-trained representations. Then train CL-TSim as follows.
python main.py
After training is done (about 1 hour on a GeForce GTX 1080 Ti), the trained model can be found in the "log" directory.
To reproduce the results reported in our paper, follow the steps below.
As in the pre-processing part, we follow t2vec to generate the data for the self-similarity and cross-similarity experiments.
Specifically, first change into the experiment directory (of this repo), and then run the Julia scripts as follows.
julia exp1_dataproducer.jl
julia exp1_baseline_dataproducer.jl
julia exp2_dataproducer.jl
julia exp2_baseline_dataproducer.jl
julia exp3_dataproducer.jl
julia exp3_baseline_dataproducer.jl
Here, "exp1" refers to the experiment on the effect of database size, and "exp2" and "exp3" refer to the robustness experiments (the other experiments are also based on these data).
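The six commands above can also be driven from one small script. This is only a convenience sketch, not something shipped with this repo: the function names (build_commands, run_all) are hypothetical, and the default working directory "experiment" is an assumption about where the Julia scripts live.

```python
import shutil
import subprocess

EXPERIMENTS = ("exp1", "exp2", "exp3")


def build_commands(experiments=EXPERIMENTS):
    """Return the julia commands in the order listed above."""
    commands = []
    for exp in experiments:
        commands.append(["julia", exp + "_dataproducer.jl"])
        commands.append(["julia", exp + "_baseline_dataproducer.jl"])
    return commands


def run_all(workdir="experiment"):
    """Run every data producer in workdir, stopping at the first failure.

    Assumption: workdir is the directory that contains the .jl scripts.
    """
    if shutil.which("julia") is None:
        raise RuntimeError("julia executable not found on PATH")
    for cmd in build_commands():
        # check=True raises CalledProcessError if a script exits non-zero.
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    # Only print the commands here; call run_all() to actually execute them.
    for cmd in build_commands():
        print(" ".join(cmd))
```

Stopping at the first failure is a deliberate choice, so that a broken data producer does not go unnoticed before the evaluation step.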
To evaluate the trained model, first generate the representations of the trajectories in the test set produced in the previous part, and then run the corresponding evaluation script.
For example, to reproduce the results of "The effect of database size", do as follows.
python exp1_output_traj_vectors.py # In this script, set the directories of the trained model and the test set, and the directory for saving the trajectory representations
python exp1_evaluate.py # In this script, set the directory that contains the trajectory representations
The first step generates the representations of the trajectories in the test set, and the second step uses the generated representations to run the self-similarity experiment.
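To illustrate what a self-similarity evaluation over trajectory representations can look like, here is a minimal mean-rank sketch. It is not taken from "exp1_evaluate.py". It assumes the protocol used in t2vec, where the i-th query vector has its true counterpart at index i of the database, vectors are compared by Euclidean distance, and a lower mean rank is better. The function names are hypothetical, and pure Python is used only for clarity; real code would more likely use NumPy or PyTorch.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def mean_rank(queries, database):
    """Average rank of each query's true counterpart in the database.

    Assumption: queries[i] matches database[i]. Rank 1 means the true
    counterpart is the nearest database vector to the query.
    """
    ranks = []
    for i, query in enumerate(queries):
        d_true = euclidean(query, database[i])
        # Rank = 1 + number of database vectors strictly closer than the match.
        closer = sum(1 for vec in database if euclidean(query, vec) < d_true)
        ranks.append(closer + 1)
    return sum(ranks) / len(ranks)


if __name__ == "__main__":
    # Toy vectors standing in for learned trajectory representations.
    queries = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
    database = [[0.1, 0.0], [0.9, 1.0], [0.0, 0.2]]
    print(mean_rank(queries, database))
```

In the toy data, the third query is deliberately far from its counterpart, so its rank is worse than 1 and the mean rank rises above the ideal value of 1.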