Can't use a logger
Closed this issue · 4 comments
🐛 Bug
The script crashes when I try to log the run.
To Reproduce
Run:
python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer.gpus=1 trainer.accelerator=dp log=true trainer.logger=tensorboard
And then I get:
> python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer.gpus=1 trainer.accelerator=dp log=true trainer.logger=tensorboard
num_workers: 16
trainer:
  _target_: pytorch_lightning.Trainer
  logger: tensorboard
  checkpoint_callback: true
  callbacks: null
  default_root_dir: null
  gradient_clip_val: 0.0
  process_position: 0
  num_nodes: 1
  num_processes: 1
  gpus: 1
  auto_select_gpus: false
  tpu_cores: null
  log_gpu_memory: null
  progress_bar_refresh_rate: 1
  overfit_batches: 0.0
  track_grad_norm: -1
  check_val_every_n_epoch: 1
  fast_dev_run: false
  accumulate_grad_batches: 1
  max_epochs: 1
  min_epochs: 1
  max_steps: null
  min_steps: null
  limit_train_batches: 1.0
  limit_val_batches: 1.0
  limit_test_batches: 1.0
  val_check_interval: 1.0
  flush_logs_every_n_steps: 100
  log_every_n_steps: 50
  accelerator: dp
  sync_batchnorm: false
  precision: 32
  weights_summary: top
  weights_save_path: null
  num_sanity_val_steps: 2
  truncated_bptt_steps: null
  resume_from_checkpoint: null
  profiler: null
  benchmark: false
  deterministic: false
  reload_dataloaders_every_epoch: false
  auto_lr_find: false
  replace_sampler_ddp: true
  terminate_on_nan: false
  auto_scale_batch_size: false
  prepare_data_per_node: true
  plugins: null
  amp_backend: native
  amp_level: O2
  move_metrics_to_cpu: false
experiment_name: ${now:%Y-%m-%d}_${now:%H-%M-%S}
log: true
ignore_warnings: true
Error executing job with overrides: ['task=nlp/text_classification', 'dataset=nlp/text_classification/emotion', 'trainer.gpus=1', 'trainer.accelerator=dp', 'log=true', 'trainer.logger=tensorboard']
Top level config has to be OmegaConf DictConfig, plain dict, or a Structured Config class or instance
Set the environment variable HYDRA_FULL_ERROR=1 for a complete stack trace.
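For what it's worth, the error seems consistent with `trainer.logger=tensorboard` replacing the logger *node* with the plain string `"tensorboard"`, which then cannot be instantiated as a config object. A rough sketch with plain dicts (mimicking, not reproducing, Hydra's actual behavior; the `tensorboard_node` contents here are illustrative):

```python
from copy import deepcopy

# Base config as composed from the defaults list, mimicked with plain dicts
# (the real objects are OmegaConf DictConfigs).
base = {
    "trainer": {
        "_target_": "pytorch_lightning.Trainer",
        "logger": True,  # Lightning's default logger flag
    }
}

def value_override(cfg, dotted_key, value):
    """Mimic `trainer.logger=tensorboard`: assign a leaf value at a dotted path."""
    cfg = deepcopy(cfg)
    node = cfg
    *parents, leaf = dotted_key.split(".")
    for p in parents:
        node = node[p]
    node[leaf] = value  # replaces the node with a plain string
    return cfg

def group_override(cfg, group_node):
    """Mimic `+trainer/logger=tensorboard`: merge in a whole config node
    selected from a config group file."""
    cfg = deepcopy(cfg)
    cfg["trainer"]["logger"] = group_node  # a structured node, not a string
    return cfg

broken = value_override(base, "trainer.logger", "tensorboard")
print(type(broken["trainer"]["logger"]))  # a bare str: nothing to instantiate

tensorboard_node = {
    "_target_": "pytorch_lightning.loggers.TensorBoardLogger",
    "save_dir": "logs/",
}
fixed = group_override(base, tensorboard_node)
print(fixed["trainer"]["logger"]["_target_"])
```

The point is only the shape difference: a dotted-key override writes a scalar, while a config-group override supplies a structured node with a `_target_` that can be instantiated.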
Expected behavior
Be able to run the script and log the training process!
Environment
- PyTorch Version (e.g., 1.0): 11.2
- OS (e.g., Linux): Linux
- How you installed PyTorch (conda, pip, source): pip
- Build command you used (if compiling from source): -
- Python version: 3.8.10
- CUDA/cuDNN version: 11.2
- GPU models and configuration: A100
- Any other relevant information:
Additional context
Please help :)
Just wanted to add that I have the exact same issue, and it is a critical one. This looks like a great framework, but not being able to see how training is progressing in a readable log (e.g. tensorboard, wandb, etc.) makes it unusable. Even if the documentation just walked users through this (assuming it isn't a bug), that would be helpful.
Apologies for the late response here!
Logging was added under conf/trainer/logger and can be enabled like so:
python train.py task=nlp/text_classification dataset=nlp/text_classification/emotion trainer.gpus=1 trainer.accelerator=dp log=true +trainer/logger=tensorboard
The key change is appending +trainer/logger=tensorboard to the command. You can change the save directory by also appending trainer.logger.save_dir=my_directory/. See the conf directory for more loggers: https://github.com/PyTorchLightning/lightning-transformers/tree/master/conf/trainer/logger
I'll make a PR to update the documentation and add this information under a new tab, then close this issue.
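For context, a config-group file like conf/trainer/logger/tensorboard.yaml typically just holds the logger's instantiation target (the exact contents below are an illustrative sketch, not copied from the repo):

```yaml
# conf/trainer/logger/tensorboard.yaml (illustrative)
_target_: pytorch_lightning.loggers.TensorBoardLogger
save_dir: logs/
```

Because it is a config group, selecting it with +trainer/logger=tensorboard merges this whole node into the trainer config, rather than setting trainer.logger to the bare string "tensorboard".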
Documentation has been added here: https://lightning-transformers.readthedocs.io/en/latest/trainer/logging.html
Great thanks!