RL4QDTS

This is the implementation of our paper "Collectively Simplifying Trajectories in a Database: A Query Accuracy Driven Approach" (ICDE 2024).

Requirements

  • Ubuntu Linux (tested on 16.04)
  • Python >= 3.5 (Anaconda3 is recommended; tested on 3.6)
  • TensorFlow & Keras (tested with 1.8.0 and 2.2.0, respectively)

Please refer to the source code for any required packages that are not yet installed in your environment. For example, some Julia packages are needed to call t2vec for the KNN query.

Dataset & Preprocessing

Download and unzip the Geolife dataset and put its folder into ./TrajData. Note that the input data generated by preprocess.py is also stored in this folder.

python preprocess.py
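For orientation, each Geolife trajectory is a .plt file with six header lines followed by comma-separated records (latitude, longitude, a zero field, altitude, fractional days, date, time). The sketch below shows one way such a file could be read. It is not taken from preprocess.py, the function name parse_plt is hypothetical, and the sample text is only illustrative of the layout.

```python
from datetime import datetime

def parse_plt(text):
    """Parse the text of one Geolife .plt file into (lat, lon, datetime) tuples.

    Hypothetical helper for illustration; preprocess.py may work differently.
    """
    points = []
    # The first six lines of a .plt file are a fixed header and carry no points.
    for line in text.strip().splitlines()[6:]:
        fields = line.strip().split(",")
        if len(fields) < 7:
            continue  # skip malformed rows
        lat, lon = float(fields[0]), float(fields[1])
        # Fields 5 and 6 hold the date and the time as strings.
        stamp = datetime.strptime(fields[5] + " " + fields[6], "%Y-%m-%d %H:%M:%S")
        points.append((lat, lon, stamp))
    return points

# Illustrative sample in the .plt layout (six header lines, then records).
sample = """Geolife trajectory
WGS 84
Altitude is in Feet
Reserved 3
0,2,255,My Track,0,0,2,8421376
0
39.984702,116.318417,0,492,39744.1201851852,2008-10-23,02:53:04
39.984683,116.31845,0,492,39744.1202546296,2008-10-23,02:53:10
"""
trajectory = parse_plt(sample)
```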

Running Procedures

Hyperparameters

The code exposes several hyperparameters that can be tuned for better training performance, e.g.,

level_start, level_end = 9, 12
K = 2 # the state space
query_count = 50 # the interval at which the reward is obtained
batch_size = 32 # and many other hyperparameters in the neural networks

Training

Run rl_main_by_data.py or rl_main_by_gau.py. The generated models are stored in the folder ./save automatically, and you can pick the one with the best performance on the validation data (e.g., according to the performance of queries that follow the data distribution) as your model. Trained models are already provided in that folder.

python rl_main_by_data.py
python rl_main_by_gau.py

We provide an interface load(checkpoint) that loads an intermediate model so that training can continue from that checkpoint. Once your model is trained (called trained_model), you can use a fast interface called fast_online_act(state). It implements the NN forward pass with NumPy matrix computation instead of Keras, which is more efficient.
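To illustrate the idea behind a NumPy-only forward pass, here is a minimal sketch. It assumes a small fully connected network with ReLU hidden layers and a softmax output, and a state vector of length K. The actual architecture, as well as the names forward_numpy and greedy_action, are assumptions for illustration and not the repository's fast_online_act.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(x):
    z = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return z / z.sum()

def forward_numpy(state, weights):
    """Dense forward pass; weights is a list of (W, b) pairs, one per layer.

    For a Keras model built from Dense layers, model.get_weights() returns
    the arrays in the order [W1, b1, W2, b2, ...], which can be paired up.
    """
    h = np.asarray(state, dtype=np.float64)
    for W, b in weights[:-1]:
        h = relu(h @ W + b)      # hidden layers (assumed ReLU)
    W, b = weights[-1]
    return softmax(h @ W + b)    # action probabilities (assumed softmax)

def greedy_action(state, weights):
    """Pick the action with the highest probability."""
    return int(np.argmax(forward_numpy(state, weights)))

# Random weights stand in for a trained model (assumption: K inputs, K actions).
rng = np.random.default_rng(0)
K, hidden = 2, 8
weights = [
    (rng.normal(size=(K, hidden)), np.zeros(hidden)),
    (rng.normal(size=(hidden, K)), np.zeros(K)),
]
probs = forward_numpy([0.3, 0.7], weights)
action = greedy_action([0.3, 0.7], weights)
```

Skipping the framework's per-call overhead is what tends to make this kind of plain matrix computation faster when the policy is queried one state at a time.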

Trajectory Database Simplification

You can run simplification.py directly once you have obtained the trained model.

python simplification.py

Note that you can run the code multiple times (remember to set a different random seed each time) to obtain multiple simplified databases if you want to report error bars.
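A minimal sketch of how results from several seeded runs could be aggregated into an error bar. The function run_once is a hypothetical stand-in for one simplification plus query evaluation run, and the scores it returns are placeholders, not real results.

```python
import random
import statistics

def run_once(seed):
    """Hypothetical stand-in for one seeded simplification + evaluation run."""
    rng = random.Random(seed)  # a separate generator per seed keeps runs reproducible
    return 0.80 + rng.uniform(-0.02, 0.02)  # placeholder score, not a real result

seeds = [0, 1, 2, 3, 4]
scores = [run_once(s) for s in seeds]
mean = statistics.mean(scores)
std = statistics.stdev(scores)  # sample standard deviation across runs
print("score = %.3f +/- %.3f over %d runs" % (mean, std, len(seeds)))
```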

Query Processing

Based on the simplified database, you can run different query processing operators. For the KNN query (t2vec), the trained t2vec model (called best_model.pt) on Geolife is provided here.

python range_query.py
python knn_query_edr.py
python knn_query_t2vec.py
python join_query.py
python clustering.py
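As a rough illustration of how query accuracy on a simplified database can be checked, the sketch below runs the same range query on an original and a simplified toy database and compares the two result sets with an F1 score. The purely spatial query box, the F1 metric, and all names and data here are assumptions for illustration; range_query.py may define the query and the metric differently.

```python
def range_query(database, box):
    """Return ids of trajectories with at least one point inside the box."""
    min_x, min_y, max_x, max_y = box
    hits = set()
    for traj_id, points in database.items():
        if any(min_x <= x <= max_x and min_y <= y <= max_y for x, y in points):
            hits.add(traj_id)
    return hits

def f1_score(truth, result):
    """F1 between the result on the original data and on the simplified data."""
    if not truth and not result:
        return 1.0
    overlap = len(truth & result)
    if overlap == 0:
        return 0.0
    precision = overlap / len(result)
    recall = overlap / len(truth)
    return 2 * precision * recall / (precision + recall)

# Toy data: the simplified database keeps only the endpoints of each trajectory.
original = {
    "t1": [(0, 0), (1, 1), (2, 2)],
    "t2": [(5, 5), (6, 6), (7, 7)],
    "t3": [(0, 4), (1.5, 1.5), (4, 0)],
}
simplified = {
    "t1": [(0, 0), (2, 2)],
    "t2": [(5, 5), (7, 7)],
    "t3": [(0, 4), (4, 0)],
}
box = (1, 1, 2, 2)
truth = range_query(original, box)
result = range_query(simplified, box)
score = f1_score(truth, result)
```

In this toy case the simplified database misses t3, because the only point of t3 inside the box was dropped, so recall falls while precision stays at 1.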