Issues
[Bug] Distributed training code example raises an error
#1540 opened by apachemycat - 1
ValueError: torch.utils.checkpoint: please pass in use_reentrant=True or use_reentrant=False explicitly. T
#1539 opened by apachemycat - 0
[Bug] Resuming training after an interruption raises RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
#1538 opened by Helen-Ren-yi - 0
[Bug] With multiple GPUs, the accuracy from eval after training is not guaranteed to match the accuracy from an offline test
#1536 opened by whlook - 2
[Bug] No module named 'mmengine.models'
#1525 opened by hitbuyi - 1
DeepSpeed2 cannot automatically exclude frozen parameters [Bug]
#1518 opened by Baboom-l - 1
[Bug] Unable to save results using pklfile_prefix tag
#1533 opened by abadithela - 0
[Feature] Support dataset streaming?
#1535 opened by Ablustrund - 8
[Feature] Log metrics in test mode
#1482 opened by mmeendez8 - 5
[Feature] remove AMP wrap in train_step
#1457 opened by whlook - 1
[Bug] Error Encountered with mmengine Dependency Involving JSON and Time Modules
#1523 opened by Duguce - 1
For models that are not large, using parallel strategies greatly reduces efficiency!
#1522 opened by Shen001 - 0
[Docs] Add OMG-Seg to ecosystem projects
#1521 opened by evdcush - 0
Suggested combination of Runner and AmpOptimWrapper does not result in mixed precision training [Docs]
#1515 opened by JMQuehl - 0
[Bug]
#1514 opened by fsbarros98 - 0
Is there a record anywhere of the version support matrix between mmengine and PyTorch?
#1509 opened by kelvinwang139 - 0
AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
#1507 opened by AttilaLengyel-TomTom - 9
[Bug] config to import yapf causes 'EOFError: Ran out of input' when distributed training
#1480 opened by DeclK - 2
[Bug] MMDistributedDataParallel distributed training can not help save memory. The total memory usage is twice that of a single card.
#1504 opened by humian321 - 1
[Docs] Add "colossalai" in requirements
#1487 opened by Yanjia0 - 1
Scattering the data to the GPU when using BaseDataElement
#1501 opened by ajaynitk - 3
[Feature] Nested initialization implementation of pure Python style configuration files
#1467 opened by YinAoXiong - 2
[Bug] load_from pretrained checkpoint fails using FlexibleRunner and DeepSpeed
#1499 opened by pdmct - 1
[Feature] Early Stopping, Validation Loss
#1491 opened by 1dmesh - 0
[Feature] A Checkpoint hook for saving model checkpoints as Weights & Biases Artifacts
#1492 opened by soumik12345 - 2
[Feature] nn.LazyLinear
#1484 opened by holdjun - 2
[Docs] How do backend_args work?
#1470 opened by Data-drone - 1
activation_checkpointing prevents the weights from being updated
#1425 opened by Qidian213 - 1
How do I use ProfilerHook?
#1442 opened by ChaoyiXie - 1
[Bug] Config to_dict() does not convert type recursively.
#1464 opened by wangg12 - 1
KeyError: 'CocoDataset
#1463 opened by lfreee - 3
[Bug] TypeError: `logger` should be either a logging.Logger object, str, "silent", "current" or None, but got <class 'list'>
#1452 opened by wang-tf - 4
[Bug] MMDistributedDataParallel has no effect
#1455 opened by doodoo0006 - 2
[Feature] Saved .pth files keep getting larger
#1446 opened by ChaoyiXie - 0
[Feature] Log metrics to visualizer on test run
#1450 opened by InakiRaba91 - 0
[Feature] Serialize data list to torch.Tensor
#1443 opened by wangg12 - 0
[Bug] error in readme
#1439 opened by del-zhenwu - 1
'hybrid_parallel' plugin in 'ColossalAIStrategy' is not supported in mmengine 0.10.0
#1432 opened by taohan10200 - 2
[Bug] NameError: name 'OptimWrapper' is not defined when using mmdeploy on Jetson
#1433 opened by lijoe123 - 2
[Bug] `scale_lr()` cannot be called after `ParamScheduler` in DDPStrategy using `FlexibleRunner`.
#1427 opened by SCZwangxiao - 0
[Bug] Deepcopy of BaseDataElement seems not working
#1423 opened by RunsenXu - 0
[Feature] Add xFormers to MMEngine
#1420 opened by AkideLiu - 1
[Bug] MMDeepSpeedEngineWrapper bf16 bug
#1417 opened by felixfuu