Fast large-scale trajectory clustering

Technical Report

https://t4research.github.io/k-paths-tr.pdf

Introduction

This repo holds the source code and scripts for reproduce the key experiments of k-paths trajectory clustering.

Usage

If you run in Eclipse, just go to "au.edu.rmit.trajectory.expriments.kpathEfficiency", and click the "run configuration", creat a new java application, and fill the following parameters:

.\data_porto\reassign\porto_mm_edge.dat 10 1000000 .\data_porto\reassign\new_edge_street.txt .\data_porto\reassign\new_graph.txt Porto

There are six parameters:

arg[0] is the trajectory data file
arg[1] is the number of clusters (k)
arg[2] is the number of trajectories in the datafile which will be clustered (|D|)
arg[3] is the edge info file which contains the street name
arg[4] is the road network graph file
arg[5] is the city name.

Then, all the result will be recorded into the log file under the "logs" folder.

If you want to run from commands (recommended):

mvn clean package

A file "torch-clus-0.0.1-SNAPSHOT.jar" will be generated under folder "target".

#run the tdrive clustering for efficiency comparision.

 java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.expriments.kpathEfficiency ./data_tdrive/beijing_mm_edge.txt.reassign 10 250997 ./data_tdrive/new_id_edge_raw_beijing.txt ./data_tdrive/beijing_graph_new.txt tdrive

#run the porto clustering for efficiency comparision.

 java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.expriments.kpathEfficiency ./data_porto/porto_mm_edge.dat 10 1565595 ./data_porto/new_edge_street.txt ./data_porto/new_graph.txt porto

#run the porto clustering, and produce clustering results for visualization.

 #java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.clustering.Running ./data_porto/porto_mm_edge.dat 10 100000 ./data_porto/new_edge_street.txt ./data_porto/new_graph.txt porto

#run the tdrive clustering, and produce clustering results for visualization.

 #java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.clustering.Running ./data_tdrive/beijing_mm_edge.txt.reassign 10 10000 ./data_tdrive/new_id_edge_raw_beijing.txt ./data_tdrive/beijing_graph_new.txt tdrive

#compare with other distance measure in Tdrive dataset

 java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.expriments.EBD ./data_tdrive/beijing_mm_edge.txt.reassign 10 1000 ./data_tdrive/new_id_edge_raw_beijing.txt ./data_tdrive/beijing_graph_new.txt tdrive

#compare with other distance measure on Porto dataset

 java -Xmx16192M -cp ./torch-clus-0.0.1-SNAPSHOT.jar au.edu.rmit.trajectory.expriments.EBD ./data_porto/porto_mm_edge.dat 10 100000 ./data_porto/new_edge_street.txt ./data_porto/new_graph.txt porto

Datasets

We use the map-matched dataset, and trajectory data composed of integer ids. Since they have a size above the standard of Github, we store it in Google Drive, and you can find the dataset from: https://sites.google.com/site/shengwangcs/torch

Download the trajectory dataset from the above link, and put the dataset into "data_porto" or "data_tdrive". (The road network graph datasets are already there.)

Visualization

We use MapV (https://github.com/huiyan-fe/mapv) to visulized the cluster result using different color.

If you are familar with javascript, you can use WebStorm (https://www.jetbrains.com/webstorm/) to open the webpage and see how the data is demonstrated.

An online visualization using dynamic flow can also be found in http://203.101.224.103:8080/TTorchServer/.

Citation

If you use our code for research work, please cite our paper as below:

@article{wang2019fast,
  title={Fast large-scale trajectory clustering},
  author={Wang, Sheng and Bao, Zhifeng and Culpepper, J Shane and Sellis, Timos and Qin, Xiaolin},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={1},
  pages={29--42},
  year={2019},
  publisher={VLDB Endowment}
}

If you use our mapped trajectory dataset for research work, please cite our paper as below:

@inproceedings{wang2018torch,
  author          = {{Wang}, Sheng and {Bao}, Zhifeng and {Culpepper}, J. Shane and {Xie}, Zizhe and {Liu}, Qizhi and {Qin}, Xiaolin},
  title           = "{Torch: {A} Search Engine for Trajectory Data}",
  booktitle       = {Proceedings of the 41th International ACM SIGIR Conference on Research & Development in Information Retrieval},
  organization    = {ACM},
  pages     = {535--544},
  year            = 2018,
}

wsiy/k-paths-clustering