K. Fang, M. Pan, and CP. Shen, The value of SMAP for long-term soil moisture estimation with the help of deep learning, Transactions on Geoscience and Remote Sensing, doi: 10.1109/TGRS.2018.2872131 (2018)
K. Fang, CP. Shen, D. Kifer and X. Yang, Prolongation of SMAP to Spatio-temporally Seamless Coverage of Continental US Using a Deep Learning Neural Network, Geophysical Research Letters, doi: 10.1002/2017GL075619, preprint accessible at: arXiv:1707.06611 (2017)
Two examples with sample data are wrapped up including
A demo for temporal test is here
├── CONUS
│ ├── 2000
│ │ ├── [Variable-Name].csv
│ │ ├── ...
│ │ ├── timeStr.csv
│ │ └── time.csv
│ ├── ...
│ ├── 2017
│ │ └── ...
│ ├── const
│ │ ├── [Constant-Variable-Name].csv
│ │ └── ...
│ └── crd.csv
├── CONUSv4f1
│ └── ...
├── Statistics
│ ├── [Variable-Name]_stat.csv
│ ├── ...
│ ├── const_[Constant-Variable-Name]_stat.csv
│ └── ...
├── Subset
│ ├── CONUS.csv
│ └── CONUSv4f1.csv
└── Variable
├── varConstLst.csv
└── varLst.csv
Data folder contains all data including both training and testing, time-dependent variables and constant variables. In example data structure, there are two dataset folders - CONUS and CONUSv4f1. Those data are saved in:
- year/[Variable-Name].csv:
A csv file of size [#grid, #time], where each column is one grid and each row is one time step. This file saved data of a time-dependent variable of current year. For example, CONUS/2010/SMAP_AM.csv is SMAP data of 2002 on the CONUS.
Most time-dependent varibles comes from NLDAS, which included two forcing product (FORA, FORB) and three simulations product land surface models (NOAH, MOS, VIC). Variables are named as [variable]_[product]_[layer], and reference of variable can be found in NLDAS document. For example, SOILM_NOAH_0-10 refers to soil moisture product simulated by NOAH model at 0-10 cm.
Other than NLDAS, SMAP data are also saved in same format but always used as target. In level 3 database, there are two SMAP csv files which are only available after 2015: SMAP_AM.csv and SMAP_PM.csv.
-9999 refers to NaN.
- year/time.csv & timeStr.csv
Dates csv file of current year folder, of size [#date]. time.csv recorded Matlab datenum and timeStr.csv recorded date in format of yyyy-mm-dd.
Notice that each year start from and end before April 1st. For example data in folder 2010 is actually data from 2010-04-01 to 2011-03-31. The reason is that SMAP launched at April 1st.
- const/[Constant Variable Name].csv
csv file for constant variables of size [#grid].
- crd.csv
Coordinate of all grids. First Column is latitude and second column is longitude. Each row refers a grid.
Stored statistics of variables in order to do data normalization during training. Named as:
- Time dependent variables-> [variable name].csv
- Constant variables-> const_[variable name].csv
Each file wrote four statistics of variable:
- 90 percentile
- 10 percentile
- mean
- std
During training we normalize data by (data - mean) / std
Subset refers to a subset of grids from the complete dataset (CONUS or Global). For example, a subset only contains grids in Pennsylvania. All subsets (including the CONUS or Global dataset) will have a [subset name].csv file in the Subset folder. [subset name].csv is wrote as:
- line 1 -> root dataset
- line 2 - end -> indexs of subset grids in rootset (start from 1)
If the index is -1 means all grid, from example CONUS dataset.
Stored csv files contains a list of variables. Used as input to training code. Time-dependent variables and constant variables should be stored seperately. For example:
- varLst.csv -> a list of time-dependent variables used as training predictors.
- varLst.csv -> a list of constant variables used as training predictors.