This repository contains the code used in our ICDE-18 paper Deep Representation Learning for Trajectory Similarity Computation.
- Ubuntu OS
- Julia 1.0+
- Python >= 3.5 (Anaconda3 is recommended)
- PyTorch 1.0+
Please refer to the source code to install all required packages in Julia and Python.
You can install all the required Julia packages by running,
$ julia pkg-install.jl
The preprocessing step will generate all data required in the training stage.
- For the Porto dataset, you can do as follows.
$ curl http://archive.ics.uci.edu/ml/machine-learning-databases/00339/train.csv.zip -o data/porto.csv.zip
$ unzip data/porto.csv.zip
$ mv train.csv data/porto.csv
$ cd preprocessing
$ julia porto2h5.jl
$ julia preprocess.jl
- If you want to work on another city, you are supposed to provide the expected hdf5 input t2vec/data/cityname.h5 as well as set proper hyperparameters in t2vec/hyper-parameters.json. The expected hdf5 input requires the following format,
attrs(f)["num"] = number of trajectories
f["/trips/i"] = matrix (2xn)
f["/timestamps/i"] = vector (n,)
where attrs(f)["num"] stores the number of trajectories in total; f["/trips/i"] is the gps matrix for the i-th trajectory, the first row is the longitude sequence and the second row is the latitude sequence; f["/timestamps/i"] is the corresponding timestamp sequence. Please refer to porto2h5 to see how to generate it.
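The layout described above can be sketched in Python as a plain nested dictionary, which is a convenient way to check shapes before producing the real file. This is only an illustration: the helper name build_layout, the 1-based trajectory index, and the sample coordinates are assumptions, and it does not write hdf5. porto2h5 remains the reference for how the actual input is generated.

```python
def build_layout(trajectories):
    """Arrange trajectories in the layout described above.

    `trajectories` is a list of (longitudes, latitudes, timestamps) tuples.
    Returns a nested dict mirroring the hdf5 groups; no file is written.
    """
    layout = {"attrs": {"num": len(trajectories)}, "trips": {}, "timestamps": {}}
    # Assumption: trajectories are indexed from 1.
    for i, (lons, lats, stamps) in enumerate(trajectories, start=1):
        if not (len(lons) == len(lats) == len(stamps)):
            raise ValueError("trajectory %d: lon/lat/timestamp lengths differ" % i)
        # 2 x n matrix: first row longitudes, second row latitudes.
        layout["trips"][str(i)] = [list(lons), list(lats)]
        # Vector of length n.
        layout["timestamps"][str(i)] = list(stamps)
    return layout


# Made-up sample: two short trajectories.
sample = [
    ([-8.618, -8.619, -8.620], [41.141, 41.142, 41.143], [0, 15, 30]),
    ([-8.639, -8.640], [41.159, 41.160], [100, 115]),
]
layout = build_layout(sample)
print(layout["attrs"]["num"])
print(len(layout["trips"]["1"]), len(layout["trips"]["1"][0]))
```

The length check catches the most likely mistake, a timestamp vector that does not match the number of gps points in its trajectory.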
The generated files for training are saved in t2vec/data/.
$ python t2vec.py -data data -vocab_size 18866 -criterion_name "KLDIV" -knearestvocabs "data/porto-vocab-dist-cell100.h5"
where 18866 is the vocabulary size output by the last stage (preprocessing).
The training produces two models, checkpoint.pt and best_model.pt: checkpoint.pt contains the latest trained model and best_model.pt saves the model which has the best performance on the validation data. You can find our saved best_model.pt here.
In our original experiment, the model was trained on a Tesla K40 GPU for about 14 hours, so you can terminate the training after 14 hours if you use a GPU that is as good as or better than a K40. The above two models will be saved automatically.
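The relationship between the two files can be sketched as follows. This is a hypothetical illustration, not the logic in t2vec.py: the class name CheckpointTracker is made up, and it assumes that "best performance" means the lowest validation loss seen so far.

```python
class CheckpointTracker:
    """Decide which checkpoint files to refresh after each validation run."""

    def __init__(self):
        self.best_loss = float("inf")

    def files_to_write(self, val_loss):
        # The latest model is always saved.
        files = ["checkpoint.pt"]
        # Assumption: the best model is the one with the lowest validation loss.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            files.append("best_model.pt")
        return files


tracker = CheckpointTracker()
# Made-up validation losses from three consecutive validation runs.
history = [tracker.files_to_write(loss) for loss in [2.0, 1.5, 1.7]]
print(history)
```

Because checkpoint.pt is rewritten on every run while best_model.pt changes only on improvement, stopping the training early leaves both files usable.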
cd experiment
julia createTest.jl
head -5 ../data/trj.t # the first 5 trajectories
head -5 ../data/trj.label # trajectory ids
It will produce two files, data/trj.t and data/trj.label. Each row of trj.t (trj.label) is a token representation of the original trajectory (trajectory ID).
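Since the two files are aligned row by row, they can be read together. The sketch below assumes that each row of trj.t holds whitespace-separated integer tokens and that each row of trj.label holds a single ID; both are assumptions about the file format, and read_pairs is a hypothetical helper. The demo writes made-up data to a temporary directory instead of touching the real files.

```python
import os
import tempfile


def read_pairs(token_path, label_path):
    """Pair each token row with the label found on the same row."""
    with open(token_path) as tf, open(label_path) as lf:
        token_rows = [line.split() for line in tf if line.strip()]
        labels = [line.strip() for line in lf if line.strip()]
    if len(token_rows) != len(labels):
        raise ValueError("trj.t and trj.label have different row counts")
    return [(label, [int(tok) for tok in row])
            for row, label in zip(token_rows, labels)]


# Made-up data written to a temporary directory for demonstration.
with tempfile.TemporaryDirectory() as tmp:
    t_path = os.path.join(tmp, "trj.t")
    l_path = os.path.join(tmp, "trj.label")
    with open(t_path, "w") as f:
        f.write("12 45 45 78\n3 9 27\n")
    with open(l_path, "w") as f:
        f.write("1\n2\n")
    pairs = read_pairs(t_path, l_path)

print(pairs)
```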
$ python t2vec.py -data experiment -vocab_size 18866 -checkpoint "best_model.pt" -mode 2
It will encode the trajectories in file experiment/trj.t into vectors which will be saved into file experiment/trj.h5.
In our experiment we train a three-layer model and the outputs of the last layer are used as the trajectory representations; see the code in experiment/experiment.jl:
using HDF5  # provides h5open

vecs = h5open(joinpath("", "trj.h5"), "r") do f
    read(f["layer3"])  # outputs of the third (last) layer
end
vecs[i] # the vector representation of i-th trajectory
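Once the vectors are available, similar trajectories can be found by comparing them. The Python sketch below assumes Euclidean distance between representations as the similarity measure and uses made-up two-dimensional vectors; the function names are hypothetical, and indices are 0-based as usual in Python.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def most_similar(query, vecs, k=1):
    """Return the indices of the k vectors closest to query, nearest first."""
    order = sorted(range(len(vecs)), key=lambda i: euclidean(query, vecs[i]))
    return order[:k]


# Made-up representations; real ones come from trj.h5.
vecs = [[0.0, 0.0], [1.0, 1.0], [0.1, -0.1], [5.0, 5.0]]
nearest = most_similar([0.0, 0.1], vecs, k=2)
print(nearest)
```

A linear scan like this is enough for small test sets; larger collections would normally use an index structure instead.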
@inproceedings{DBLP:conf/icde/LiZCJW18,
author = {Xiucheng Li and
Kaiqi Zhao and
Gao Cong and
Christian S. Jensen and
Wei Wei},
title = {Deep Representation Learning for Trajectory Similarity Computation},
booktitle = {34th {IEEE} International Conference on Data Engineering, {ICDE} 2018,
Paris, France, April 16-19, 2018},
pages = {617--628},
year = {2018},
crossref = {DBLP:conf/icde/2018},
url = {https://doi.org/10.1109/ICDE.2018.00062},
doi = {10.1109/ICDE.2018.00062},
timestamp = {Tue, 20 Nov 2018 10:20:00 +0100},
biburl = {https://dblp.org/rec/bib/conf/icde/LiZCJW18},
bibsource = {dblp computer science bibliography, https://dblp.org}
}