CreativeInquiry/terrapattern

how to collect the dataset

Closed this issue · 3 comments

Hi all, I have two questions: (1) How should the downloaded image tiles be labeled in order to train a CNN? There are so many image tiles that it is impossible to label them manually into different categories. (2) In the script download_area.rb, the lngOffset should change according to the latitude, so why is it kept fixed in the script? Hoping for your reply, thanks! @golanlevin @workergnome @irealva

To answer (1), we used some of the other scripts in the repository to identify categories of data using OSM (OpenStreetMap).

To answer (2), a small amount of overlap does not affect the quality of our results, and we felt it was easier to keep the offset static than to worry too much about perfection. The features at a given lat/lng are there regardless of whether the image tile overlaps with another tile. We do adjust this in the UI, but only per-city, not per-lat/lng.
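
For readers wondering how large the effect in (2) is, here is a minimal sketch of a latitude-dependent longitude step. It assumes tiles are meant to cover a fixed ground distance and uses a spherical-Earth approximation. The names `lng_offset_for`, `lat_offset_for`, and `METERS_PER_DEGREE` are hypothetical and are not taken from download_area.rb.

```ruby
# Hypothetical helpers for illustration; not code from download_area.rb.

# Approximate length of one degree of latitude in meters
# (also one degree of longitude at the equator), spherical-Earth assumption.
METERS_PER_DEGREE = 111_320.0

# Degrees of longitude that span `meters` of ground distance at latitude `lat_deg`.
# One degree of longitude shrinks by cos(latitude), so the step in degrees grows.
def lng_offset_for(lat_deg, meters)
  lat_rad = lat_deg * Math::PI / 180.0
  meters / (METERS_PER_DEGREE * Math.cos(lat_rad))
end

# Degrees of latitude that span `meters`; roughly independent of position.
def lat_offset_for(meters)
  meters / METERS_PER_DEGREE
end

# Compare the longitude step for a 100 m tile at a few latitudes.
[0.0, 40.44, 60.0].each do |lat|
  puts format('lat %5.2f: lng offset %.6f deg', lat, lng_offset_for(lat, 100.0))
end
```

Within a single city the latitude changes very little, so evaluating such a function once at the city's center latitude would be consistent with the per-city adjustment described above.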

@workergnome thanks for the reply. I want to collect a large dataset for my research paper using your scripts, but I'm a bit confused. I did not find the script that identifies categories of data using OSM in the terrapattern repository. Is it not included? Could you give some advice or point me to some documents that I can refer to? Thanks!

If you look at the Download Ways or Download Nodes folders in the repo, you'll find those scripts. As for identifying useful categories: we identified categories by hand. There's not an automated way to decide which categories are most useful to your project, unfortunately.
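
As a rough illustration of labeling tiles from OSM data instead of by hand, the sketch below assigns to each tile the categories of the OSM features that fall inside it. It assumes the features have already been exported as latitude/longitude points with a hand-picked category name. The `Feature` and `Tile` structs, the `labels_for` function, and the sample coordinates are all hypothetical, and the repository's own scripts may organize this differently.

```ruby
# Hypothetical data shapes for illustration; not the repository's actual format.
Feature = Struct.new(:lat, :lng, :category)

# A tile described by its center and its extent in degrees.
Tile = Struct.new(:center_lat, :center_lng, :lat_span, :lng_span) do
  # True when the feature's point lies within the tile's bounding box.
  def cover?(feature)
    (feature.lat - center_lat).abs <= lat_span / 2.0 &&
      (feature.lng - center_lng).abs <= lng_span / 2.0
  end
end

# Returns the sorted, unique categories of all features inside the tile.
# An empty array means no known feature falls in the tile (unlabeled).
def labels_for(tile, features)
  features.select { |f| tile.cover?(f) }.map(&:category).uniq.sort
end

# Made-up sample points; real ones would come from an OSM export.
sample_features = [
  Feature.new(40.4410, -79.9910, 'golf_course'),
  Feature.new(40.4420, -79.9890, 'cemetery'),
  Feature.new(40.5000, -79.9900, 'golf_course')
]
sample_tile = Tile.new(40.44, -79.99, 0.01, 0.01)

puts labels_for(sample_tile, sample_features).inspect
```

Choosing which category names to keep remains a manual decision, as noted above; the sketch only automates attaching the chosen categories to tiles.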