Dataset Availability
Dear Yuanshao,
I have trouble reproducing your amazing results on my dataset, so I would like to test the model with the dataset that you used in the paper. However, the dataset no longer seems to be available under the link provided in the paper (https://outreach.didichuxing.com/). Do you know if there is still any way to access these datasets?
I might have missed the link: I don't speak Chinese, so I had to translate the website, and I could not find a download link for either the Chengdu or the Xi'an dataset.
Your help is greatly appreciated!
Best regards,
Erik
For some reason, this dataset is restricted for use, and it may be hard to get a license. I can confirm that DiffTraj can work well on datasets like Porto and Geolife. Of course, the final performance depends on the quality of the data used for training.
Thank you!
I can confirm that it is possible to train the model on both Geolife and Foursquare NYC. The pre-processed datasets from my repo (https://github.com/erik-buchholz/SoK-TrajGen) can be used for this. Then, one only has to replace the collate_fn in the DataLoader: my project pads the trajectories there, whereas DiffTraj interpolates them.
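For anyone trying the same swap, here is a minimal sketch of what an interpolating collate_fn could look like. It is not the code from either repository: the names resample_trajectory and interpolate_collate, the default target length of 200, and the assumption that every sample is an (n_points, n_dims) array of coordinates are all illustrative choices. It uses plain linear interpolation over the point index, which may differ from the exact resampling DiffTraj applies.

```python
import numpy as np

def resample_trajectory(traj, target_len):
    """Linearly resample one trajectory of shape (n_points, n_dims) to target_len points."""
    traj = np.asarray(traj, dtype=np.float64)
    if len(traj) == 1:
        # A single point cannot be interpolated, so it is simply repeated.
        return np.repeat(traj, target_len, axis=0)
    # Map source and target points onto a common [0, 1] axis.
    src = np.linspace(0.0, 1.0, num=len(traj))
    dst = np.linspace(0.0, 1.0, num=target_len)
    # Interpolate each coordinate dimension (e.g. lat, lon) independently.
    columns = [np.interp(dst, src, traj[:, d]) for d in range(traj.shape[1])]
    return np.stack(columns, axis=1)

def interpolate_collate(batch, target_len=200):
    """Hypothetical collate_fn: turn variable-length trajectories into one
    fixed-size array of shape (batch, target_len, n_dims) without padding."""
    return np.stack([resample_trajectory(t, target_len) for t in batch])

# Two trajectories of different lengths end up with the same shape.
batch = [
    np.array([[0.0, 0.0], [1.0, 2.0]]),
    np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]),
]
out = interpolate_collate(batch, target_len=5)
print(out.shape)
```

With PyTorch, a function like this would be passed as collate_fn to the DataLoader (for example via functools.partial to set target_len), with the result converted using torch.as_tensor before it is fed to the model.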
Thank you for confirming my work. You have provided a very valuable and detailed repository. :)
"For some reason, this dataset is restricted for use, and it might be difficult to obtain a license. I can assure you that DiffTraj functions effectively on datasets such as Porto and Geolife. Naturally, the ultimate performance is contingent upon the quality of the data utilized for training."
However, Geolife does not have a marker for the start/end of a trip, so how can you calculate the features required for the head?