Lightning-Universe/lightning-bolts

FasterRCNN breaks with custom backbone

raghavmecheri opened this issue · 2 comments

🐛 Bug

Using a custom backbone with pl_bolts.models.detection.FasterRCNN fails: the kwargs forwarded to create_fasterrcnn_backbone include min_size, which resnet_fpn_backbone does not accept, so model construction raises a TypeError.

To Reproduce

Steps to reproduce the behavior:

  1. Create a new FasterRCNN instance with a backbone that isn't resnet50 (I noticed it with resnet18)
  2. Attempt to train the model. The program crashes with the traceback below:
Your Lightning App is starting. This won't take long.
INFO: Your app has started. View it in your browser: http://127.0.0.1:7501/view
Global seed set to 42
Process Process-2:
Traceback (most recent call last):
  File "/MYCONDA/lib/python3.9/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/MYCONDA/lib/python3.9/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/utilities/proxies.py", line 308, in __call__
    raise e
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/utilities/proxies.py", line 291, in __call__
    self.run_once()
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/utilities/proxies.py", line 400, in run_once
    self.work.on_exception(e)
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/core/work.py", line 443, in on_exception
    raise exception
INFO: Your Lightning App is being stopped. This won't take long.

  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/utilities/proxies.py", line 391, in run_once
    ret = work_run(*args, **kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/vision_utils/PyTorchLightningScript.py", line 87, in run
    super().run(*args, **kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/components/python/tracer.py", line 95, in run
    res = self._run_tracer(init_globals)
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/components/python/tracer.py", line 102, in _run_tracer
    return tracer.trace(self.script_path, *self.script_args, init_globals=init_globals)
  File "/MYPATH/venv/lib/python3.9/site-packages/lightning_app/utilities/tracer.py", line 169, in trace
    res = runpy.run_path(script, run_name="__main__", init_globals=init_globals or globals())
  File "/MYCONDA/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/MYCONDA/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/MYCONDA/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/MYPATH/fasterrcnn_pascal_q2_2022/./tmptrainer.py", line 6, in <module>
    cli = LightningCLI(
  File "/MYPATH/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 563, in __init__
    self.instantiate_classes()
  File "/MYPATH/venv/lib/python3.9/site-packages/pytorch_lightning/utilities/cli.py", line 701, in instantiate_classes
    self.config_init = self.parser.instantiate_classes(self.config)
  File "/MYPATH/venv/lib/python3.9/site-packages/jsonargparse/deprecated.py", line 127, in patched_instantiate_classes
    cfg = self._unpatched_instantiate_classes(cfg, **kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/jsonargparse/core.py", line 1107, in instantiate_classes
    component.instantiate_class(component, cfg)
  File "/MYPATH/venv/lib/python3.9/site-packages/jsonargparse/signatures.py", line 499, in group_instantiate_class
    parent[key] = group.group_class(**value)
  File "/MYPATH/venv/lib/python3.9/site-packages/pl_bolts/models/detection/faster_rcnn/faster_rcnn_module.py", line 99, in __init__
    backbone_model = create_fasterrcnn_backbone(
  File "/MYPATH/venv/lib/python3.9/site-packages/pl_bolts/models/detection/faster_rcnn/backbones.py", line 33, in create_fasterrcnn_backbone
    backbone = resnet_fpn_backbone(backbone, pretrained=True, trainable_layers=trainable_backbone_layers, **kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/torchvision/models/_utils.py", line 142, in wrapper
    return fn(*args, **kwargs)
  File "/MYPATH/venv/lib/python3.9/site-packages/torchvision/models/_utils.py", line 228, in inner_wrapper
    return builder(*args, **kwargs)
TypeError: resnet_fpn_backbone() got an unexpected keyword argument 'min_size'
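
The last frames of the traceback suggest the failure mode: keyword arguments collected at the FasterRCNN level are forwarded unchanged to a builder that does not declare them. Below is a minimal sketch of that pattern in plain Python, with no torchvision dependency. `fake_resnet_fpn_backbone`, `fake_create_backbone` and `try_build` are hypothetical stand-ins, not the real pl_bolts or torchvision functions, and the `min_size=800` value is made up for illustration.

```python
def fake_resnet_fpn_backbone(backbone_name, pretrained=True, trainable_layers=3):
    """Hypothetical stand-in for a builder with a fixed signature (no **kwargs)."""
    return {
        "name": backbone_name,
        "pretrained": pretrained,
        "trainable_layers": trainable_layers,
    }


def fake_create_backbone(backbone_name, trainable_layers=3, **kwargs):
    """Hypothetical stand-in that forwards every extra keyword argument unchanged."""
    return fake_resnet_fpn_backbone(
        backbone_name, pretrained=True, trainable_layers=trainable_layers, **kwargs
    )


def try_build(**kwargs):
    """Return the error message raised by the forwarding call, or None on success."""
    try:
        fake_create_backbone("resnet18", **kwargs)
    except TypeError as err:
        return str(err)
    return None


# No extra options: the builder is called with arguments it understands.
print(try_build())
# An option meant for the detector (assumed here: min_size) leaks into the builder.
print(try_build(min_size=800))
```

The second call fails in the same way as the traceback above, which is consistent with min_size being intended for the detector rather than for the backbone builder.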

Code sample

trainer.py:

from pl_bolts.datamodules import VOCDetectionDataModule
from pl_bolts.models.detection import FasterRCNN
from pytorch_lightning.utilities.cli import LightningCLI

if __name__ == "__main__":
    cli = LightningCLI(
        FasterRCNN, VOCDetectionDataModule, seed_everything_default=42, save_config_overwrite=True, run=False
    )
    cli.trainer.fit(cli.model, datamodule=cli.datamodule)

app.py:

import lightning_app as L
import vision_utils as V
import os.path as ops

TRAINER_CONFIGS = ["--trainer.max_epochs=5", "--trainer.gpus=1", "--trainer.callbacks=ModelCheckpoint", "--trainer.callbacks.monitor=val_iou"]
MODEL_CONFIGS = ["--model.backbone=resnet18", "--model.learning_rate=0.01"]
DATA_CONFIGS = ["--data.num_workers=1", "--data.batch_size=4"]

ARGS = TRAINER_CONFIGS + MODEL_CONFIGS + DATA_CONFIGS

class TrainingFlow(L.LightningFlow):
    def __init__(self, name, slack_config) -> None:
        super().__init__()
        self.train_work = V.PyTorchLightningScript(
            script_path=ops.join(ops.dirname(__file__), "./trainer.py"),
            script_args=ARGS,
            cloud_compute=L.CloudCompute("gpu")
        )

    def run(self):
        self.train_work.run()

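# SLACK_CONFIG was not defined in the original report; None is a placeholder assumption.
SLACK_CONFIG = None
flow = TrainingFlow("fasterrcnn_pascal_q2_2022", SLACK_CONFIG)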
app = L.LightningApp(flow)

To run: python app.py

Expected behavior

The model should be constructed and train without errors.

Environment

  • Python 3.9.7
  • macOS
  • pip used for installation
  • No CUDA used

Additional context

Commenting out the kwargs argument in that call is a quick workaround, but it isn't a proper fix. I can take a closer look when I have time, if needed.
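
One less drastic alternative would be to forward only the keyword arguments that the target callable declares. The following is a sketch of that idea using `inspect.signature`. It is not the fix adopted by the maintainers. `filter_kwargs` and `build_backbone` are hypothetical names, `build_backbone` merely stands in for a builder such as resnet_fpn_backbone, and the option values are illustrative.

```python
import inspect


def filter_kwargs(func, kwargs):
    """Keep only the entries of kwargs that func can accept as keyword arguments."""
    params = inspect.signature(func).parameters
    # If func itself takes **kwargs, everything can be forwarded as-is.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    accepted = {
        name
        for name, p in params.items()
        if p.kind
        in (inspect.Parameter.POSITIONAL_OR_KEYWORD, inspect.Parameter.KEYWORD_ONLY)
    }
    return {k: v for k, v in kwargs.items() if k in accepted}


def build_backbone(backbone_name, pretrained=True, trainable_layers=3):
    """Hypothetical stand-in for a backbone builder with a fixed signature."""
    return (backbone_name, pretrained, trainable_layers)


# Detector-level options (illustrative values) mixed with one builder-level option.
detector_kwargs = {"min_size": 800, "max_size": 1333, "trainable_layers": 2}
safe_kwargs = filter_kwargs(build_backbone, detector_kwargs)
backbone = build_backbone("resnet18", **safe_kwargs)
print(safe_kwargs, backbone)
```

For decorated functions, the signature that `inspect.signature` reports depends on whether the wrapper preserves it, so this approach would need to be checked against the installed torchvision version before relying on it.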

otaj commented

Hi, @raghavmecheri! We are currently going through a major revision. Please give us some time to finalize it, or sign up for part of the revision (we more than welcome every helping hand ⚡) and fix any old bugs you've discovered along the way. The revision issue is #839 🔩

stale commented

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.