reorganize config file inputs
dbuscombe-usgs opened this issue · 4 comments
In the long term, I think having 46+ possible config items, some mandatory and some optional, is a clunky way to organize this. We could either:
- split it all up into separate configs, like 'train', 'dataset', and 'predict', or perhaps 'base' and 'optional'
- come up with an alternative to the config file for all the user inputs. what does this look like? a database file? a GUI? a series of text prompts?
What do you say @2320sharon and @venuswku ?
Originally posted by @dbuscombe-usgs in #122 (comment)
I like the idea of keeping the config file as a JSON file, since this kind of file is easy to understand, easy to edit, and easy to host and transmit across the internet. I think we should keep config files, but have different kinds of config files depending on the task they are being used for.
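As a rough sketch of why JSON works well here: a task-specific config can be serialized, read back, and layered over shared defaults in a few lines. The file keys below mirror the params listed later in this thread, but the defaults and settings are illustrative, not the real schema.

```python
import json

# Illustrative defaults and user settings -- not the actual schema.
defaults = {"BATCH_SIZE": 8, "DOPLOT": False}
user_settings = {"MODEL": "segformer", "NCLASSES": 2}

# Serialize (as if writing a config file), then read back and layer
# the user's settings over the defaults.
text = json.dumps(user_settings, indent=2)
config = {**defaults, **json.loads(text)}
```

This is also why separate "training" and "trained model" configs stay cheap to support: each is just a different set of keys layered over the same kind of defaults.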
Here are a couple of ideas:
- Separate training and trained model configs
- The training config file would contain only the settings relevant to training. Some of the more niche parameters could be set to default values and still be included in the file
- The trained model config file would contain the settings relevant to running predictions with the model. It should be generated at the end of training a model.
- Create a GUI for creating the config file
- As simple as making a few ipywidgets with a save button to create the config (similar to the CoastSeg settings)
- Could be run very easily in a Jupyter notebook
- Alternatively, we could build it in Streamlit so it can be hosted for free and is easy to edit (and even though I wouldn’t prefer it, we could build it in Panel)
I’m not sure what you mean by ‘dataset’?
If it helps plan this, right now we have the following mandatory training params (the program should be modified so it exits if any of these are not available):
"TARGET_SIZE"
"MODEL"
"NCLASSES"
"BATCH_SIZE"
"N_DATA_BANDS"
"DO_TRAIN"
"PATIENCE"
"MAX_EPOCHS"
"VALIDATION_SPLIT"
"RAMPUP_EPOCHS"
"SUSTAIN_EPOCHS"
"EXP_DECAY"
"START_LR"
"MIN_LR"
"MAX_LR"
Mandatory training params for ResUnet only (i.e. not needed by segformer models):
"FILTERS"
"KERNEL"
"STRIDE"
"LOSS"
Optional training params:
"DROPOUT"
"DROPOUT_CHANGE_PER_LAYER"
"DROPOUT_TYPE"
"USE_DROPOUT_ON_UPSAMPLING"
"INITIAL_EPOCH"
"CLEAR_MEMORY"
"LOAD_DATA_WITH_CPU"
Mandatory data params (or should these all be optional, defaulting to certain values?):
"ROOT_STRING"
"AUG_ROT"
"AUG_ZOOM"
"AUG_WIDTHSHIFT"
"AUG_HEIGHTSHIFT"
"AUG_HFLIP"
"AUG_VFLIP"
"AUG_LOOPS"
"AUG_COPIES"
Optional data params:
"FILTER_VALUE"
"DOPLOT"
"USEMASK"
"REMAP_CLASSES"
Optional inference parameters:
"TESTTIMEAUG"
"TESTTIMEAUG"
"WRITE_MODELMETADATA"
"OTSU_THRESHOLD"
Optional general params:
"SET_GPU"
"SET_PCI_BUS_ID"
One idea I had was to write a simple utility to generate a config file. The user would provide at least
"TARGET_SIZE"
"MODEL"
"NCLASSES"
"BATCH_SIZE"
"N_DATA_BANDS"
"VALIDATION_SPLIT"
and the program could fill in the remaining params with default values. The user could then edit that file.
The program could be called generate_config.py or whatever, and running it would be the first thing someone does: the user provides the 6 values above, and the program generates sets of parameters based on the value of 'MODEL'.
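A minimal sketch of what that generator could look like. All default values here are placeholders, not the project's real defaults, and the function name is illustrative; the only structural point is that a model-specific set of defaults (e.g. the ResUnet-only params) gets layered in based on 'MODEL'.

```python
import json

# Placeholder defaults -- NOT the real values used by the project.
COMMON_DEFAULTS = {
    "DO_TRAIN": True, "PATIENCE": 10, "MAX_EPOCHS": 100,
    "RAMPUP_EPOCHS": 20, "SUSTAIN_EPOCHS": 0, "EXP_DECAY": 0.9,
    "START_LR": 1e-7, "MIN_LR": 1e-7, "MAX_LR": 1e-4,
}
# Params needed only by ResUnet (per the list above); values are placeholders.
RESUNET_DEFAULTS = {"FILTERS": 6, "KERNEL": 7, "STRIDE": 2, "LOSS": "dice"}

def generate_config(user_params: dict) -> dict:
    """Fill a config from the user's 6 values plus MODEL-dependent defaults."""
    config = {**COMMON_DEFAULTS, **user_params}
    if config["MODEL"] == "resunet":
        config = {**RESUNET_DEFAULTS, **config}
    return config

cfg = generate_config({
    "TARGET_SIZE": [768, 768], "MODEL": "resunet", "NCLASSES": 2,
    "BATCH_SIZE": 8, "N_DATA_BANDS": 3, "VALIDATION_SPLIT": 0.2,
})
print(json.dumps(cfg, indent=2))  # the file the user would then edit
```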
I think I will go ahead and make this config generator tool, like I described above, in the next version