[Request]: Sweeps in mmdetection
David-Biggs opened this issue · 9 comments
Thanks for this amazing tool!! I have been blown away ever since I came across it.
I am using mmdetection to train my models. How do I go about setting my sweep within mmdetection?
Many thanks,
David
Hey @David-Biggs, thanks for the request. I will try to come up with something and let you know when I have an example.
Not sure if you are aware of this PR that was recently merged into the dev branch of MMDetection, which adds some dedicated W&B support. For sweeps it might require some workaround, but a proper integration would be better. I will try to scope it out.
Hey @David-Biggs, after thinking about this for some time, here's a rough solution for you if you wanna take a stab:
- W&B requires a sweep config and a train function. You can see both in the intro-to-sweeps colab.
- A sweep config is nothing but a dict describing the hyperparameter space you wanna search/tune. Below is an example sweep config:
```python
import wandb

sweep_config = {
    "name": "my-sweep",
    "method": "random",
    "parameters": {
        "epochs": {
            "values": [10, 20, 50]
        },
        "learning_rate": {
            "min": 0.0001,
            "max": 0.1
        }
    }
}
```
- You will then generate a sweep id by doing this:
```python
sweep_id = wandb.sweep(sweep_config)
```
- The train function looks something like this:
```python
def train():
    with wandb.init() as run:
        config = wandb.config
        model = make_model(config)  # pseudocode: build your model from the sweep's config
        for epoch in range(config["epochs"]):
            loss = model.fit()  # your model training code here
            wandb.log({"loss": loss, "epoch": epoch})
```
- The train function will have access to wandb.config. Its values come from sweep_config (for a ranged hyperparameter, the value is selected by the optimization method: grid, random, etc.).
- MMDetection also has a train_detector function that you can call from the train function. The interesting bit would be managing the MMDetection config.
- You can load the MMDetection config inside the train function and update the required fields using wandb.config. Something like this:
```python
from mmcv import Config
from mmdet.apis import train_detector

def train():
    with wandb.init() as run:
        config = wandb.config
        model = make_model(config)  # pseudocode: build your detector here

        # Load the MMDetection config and override it with the sweep's values
        config_file = 'mmdetection/configs/mask_rcnn/mask_rcnn_r50_caffe_fpn_mstrain-poly_1x_coco.py'
        cfg = Config.fromfile(config_file)
        cfg.optimizer.lr = config.learning_rate

        train_detector(model, datasets, cfg, distributed=False, validate=True, meta=meta)
```
It should work. I will also try to write a colab and share with you.
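To close the loop on the steps above: once you have the sweep config and the train function, `wandb.sweep` registers the sweep and `wandb.agent` runs it. A minimal sketch, assuming `wandb` is installed and you are logged in; the project name is just a placeholder, and the train body is stubbed:

```python
sweep_config = {
    "name": "my-sweep",
    "method": "random",
    "parameters": {
        "epochs": {"values": [10, 20, 50]},
        "learning_rate": {"min": 0.0001, "max": 0.1},
    },
}

def train():
    # One sweep run: wandb.init() pulls the hyperparameters chosen
    # for this trial into wandb.config.
    import wandb
    with wandb.init() as run:
        config = wandb.config
        # ... build and train your model here using config["learning_rate"],
        # config["epochs"], etc., and log metrics with wandb.log(...) ...

def launch():
    # Register the sweep with the W&B server, then start an agent that
    # calls `train` once per trial; `count` caps the number of trials.
    import wandb
    sweep_id = wandb.sweep(sweep_config, project="mmdet-sweeps")  # placeholder project name
    wandb.agent(sweep_id, function=train, count=10)
```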
Hi @ayulockin,
So I've been working on it for a while and I made some alterations and have some interesting observations.
Alterations:
- In train() you make use of model = make_model(config). I removed this and used model = build_detector(cfg.model) (from MMDetection). I wasn't sure which one to use. I thought I would use the information in the 'current' sweep config to update the cfg, by doing cfg.optimizer.lr = config.learning_rate ... etc., then pass cfg.model into build_detector().
- I removed meta=meta. The default for meta is None, and I wasn't sure where your meta variable was defined.
This works... sort of. The model trains and the sweep loops over the different values, but:
Observations:
- My training losses are all NaN.
- During training, I get the message The testing results of the whole dataset is empty. There are no validation results (neither mAP nor losses).
I removed all sweep-related code and ran the MMDetection train_detector(model, datasets, cfg, distributed=False, validate=True) command, and it worked perfectly fine: losses were real values and I got validation results. I did some digging but could not resolve either of the two issues.
Thanks for trying it out @David-Biggs.
Sorry, I should have clarified that make_model was pseudocode, not an actual API. Glad it worked (sort of).
Were you able to resolve the NaN loss issue?
Hi @ayulockin,
So I found that the reason for the NaN loss issue was the learning rate: all the values I chose were slightly too large. After reducing them I was able to run a successful sweep.
Thanks again for your help.
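As a side note for anyone hitting the same NaN issue: W&B sweep configs let you bound and log-scale the learning-rate search space, so random search doesn't spend trials on values known to blow up. A sketch of the earlier config with a tighter, log-scaled range (the bounds here are illustrative, not tuned):

```python
sweep_config = {
    "name": "my-sweep",
    "method": "random",
    "parameters": {
        "epochs": {"values": [10, 20, 50]},
        "learning_rate": {
            # Sample between 1e-5 and 1e-3 on a log scale instead of
            # uniformly up to 0.1, which produced NaN losses above.
            "distribution": "log_uniform_values",
            "min": 1e-5,
            "max": 1e-3,
        },
    },
}
```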
Glad it worked for you.
Closing the issue since it's resolved.
@David-Biggs once I have my colab ready, I will share it here.