Question of data format explanation
LL0912 opened this issue · 1 comments
Hello, I am trying to use the dataset, which I downloaded from Zenodo. However, I could not find any explanation of the data format, such as the meaning of the file names in "features" and of the dictionary keys in "labels.geojson". I can only guess their meaning from the code. Where can I find an official explanation of the dataset, including the file names? Can you help me?
Hi there!
Apologies for the delayed reply. I'll add this to the main README but in the meantime:
labels.geojson
>>> import geopandas
>>> labels = geopandas.read_file("labels.geojson")
>>> labels.columns
Index(['harvest_date', 'planting_date', 'label', 'classification_label',
'index', 'is_crop', 'lat', 'lon', 'dataset', 'collection_date',
'export_end_date', 'is_test', 'geometry'],
dtype='object')
There are two types of columns: RequiredColumns, which must be filled for all rows, and NullableColumns, which can have null values (see here).
Required columns
index - the index of the row
is_crop - a boolean indicating whether the point being described contains cropland (at the date described by export_end_date)
lat - the latitude of the point
lon - the longitude of the point
dataset - the dataset which the point comes from
collection_date - the date at which the point was collected
export_end_date - we collect a year of data for each point; this value defines the last month for which data is exported (and therefore the entire timeseries, since we collect data for the year up to that point)
geometry - the geometry of the point. This may be a polygon (in which case lat/lon will be the central point of that field) or a point
is_test - a boolean indicating whether or not the point is part of the test data
Nullable columns
harvest_date - the harvest date of the crop described at the lat/lon
planting_date - the planting date of the crop described at the lat/lon
label - the label. This will be the higher level agricultural land cover label describing the land use at the lat/lon for the given export_end_date
classification_label - the higher level classification of label, defined by the FAO's indicative crop classification (e.g. if a row has label="maize", then it would have classification_label="cereals")
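To make the required/nullable split concrete, here is a minimal sketch that checks a single row against it. The column names come from the lists above; the helper names (`is_null`, `missing_required`) and the values in `example_row` are made up for illustration and are not part of the repository.

```python
# Column groups as described above.
REQUIRED_COLUMNS = (
    "index", "is_crop", "lat", "lon", "dataset",
    "collection_date", "export_end_date", "geometry", "is_test",
)
NULLABLE_COLUMNS = (
    "harvest_date", "planting_date", "label", "classification_label",
)


def is_null(value):
    """Treat None and NaN-like values as null.

    NaN (and pandas' NaT) compare unequal to themselves, so
    `value != value` catches them without importing pandas.
    """
    return value is None or value != value


def missing_required(row):
    """Return the required columns that are absent or null in `row`.

    `row` can be any mapping from column name to value, e.g. a dict.
    """
    return [c for c in REQUIRED_COLUMNS if c not in row or is_null(row[c])]


# Hypothetical row: every value here is invented for illustration only.
example_row = {
    "index": 0,
    "is_crop": True,
    "lat": 0.5,
    "lon": 36.0,
    "dataset": "example_dataset",
    "collection_date": "2020-01-01",
    "export_end_date": "2021-02-01",
    "geometry": "POINT (36.0 0.5)",
    "is_test": False,
    # Nullable columns may legitimately be empty.
    "harvest_date": None,
    "planting_date": None,
    "label": None,
    "classification_label": None,
}
```

With a loaded GeoDataFrame, something like `missing_required(labels.iloc[0].to_dict())` should work the same way, assuming the columns listed above are present.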
features
All features have the following naming convention: {index}_{dataset}.h5, where these two values are defined above. So each feature is associated with a row in the labels.geojson.
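As a rough sketch of how this convention links a label row to its feature file, the helpers below build the filename from `index` and `dataset` and split it back apart. The function names and the example dataset name are hypothetical, and the parser assumes `index` itself contains no underscore (the dataset name may).

```python
from pathlib import Path


def feature_filename(index, dataset):
    """Build the feature filename for a label row: {index}_{dataset}.h5"""
    return f"{index}_{dataset}.h5"


def parse_feature_filename(filename):
    """Split a feature filename back into (index, dataset).

    Splits on the first underscore only, so dataset names that
    contain underscores are kept intact. Both parts are returned
    as strings.
    """
    stem = Path(filename).stem  # drops the ".h5" suffix
    index, dataset = stem.split("_", 1)
    return index, dataset


def feature_path(features_dir, index, dataset):
    """Path of the feature file inside a local features directory."""
    return Path(features_dir) / feature_filename(index, dataset)
```

For a row with `index=12` and a dataset called `example_dataset` (an invented name), `feature_filename(12, "example_dataset")` gives `12_example_dataset.h5`.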
We are currently in the process of changing this convention so that names are instead in the f"min_lat={min_lat}_min_lon={min_lon}_max_lat={max_lat}_max_lon={max_lon}_dates={start_date}_{end_date}_all" format.
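A small sketch of the new naming scheme, using the f-string exactly as quoted above. The function name is hypothetical, and because the date format, coordinate precision, and file extension are not specified here, the values are inserted as passed in and no extension is appended.

```python
def new_feature_name(min_lat, min_lon, max_lat, max_lon, start_date, end_date):
    """Build a name in the bounding-box and date-range format quoted above.

    Assumption: values are formatted with Python's default string
    conversion; the real code may round coordinates or format dates
    differently.
    """
    return (
        f"min_lat={min_lat}_min_lon={min_lon}"
        f"_max_lat={max_lat}_max_lon={max_lon}"
        f"_dates={start_date}_{end_date}_all"
    )
```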
Let me know if I can provide any further clarifications!