Issues
- 11
GPT2 pretraining model: with the same configuration, libai's GPU memory usage is significantly higher than megatron-lm's
#349 opened by Sakura-gh - 0
LLaMA-7B SFT died with <Signals.SIGABRT: 6>
#539 opened by PussyCat0700 - 0
Running gpt2_pretrain.py on a single machine with multiple GPUs hits the following problem
#534 opened by treestreamymw - 1
Running the GLM example fails with: module 'oneflow._C' has no attribute 'fused_multi_head_attention_inference_v2'
#512 opened by bmfire1 - 0
Suggestion: pin a specific version of requests in requirements
#522 opened by digger-yu - 0
Multi-GPU training of MAE under Project raises an error
#508 opened by stonewjf - 0
- 1
GLM 10B CN: time cost of inference acceleration
#479 opened by vicwer - 0
Testing the parallel framework: tensor parallel results do not match the data given on the official website
#477 opened by lisuq - 1
GPT2 pretraining: libai's throughput does not match the earlier data
#475 opened by lisuq - 1
[Multi-node multi-GPU][MT5] failed to connect to all addresses
#470 opened by MikeDean2367 - 4
Differences between MT5 and T5
#468 opened by MikeDean2367 - 2
GLM libai inference raises an error
#464 opened by tanklandry - 5
- 3
The WeChat group is full
#433 opened by MissiontoMars - 2
Pure tensor parallel training: 4 GPUs and 8 GPUs use different collective communication operators
#455 opened by Panlichen - 1
Setting different data_parallel_size values leads to different global_batch_size values
#345 opened by Yipeng1994 - 0
CI test is broken
#435 opened by leaves-zwx - 2
[MT5] exec_graph.cpp physical shape check failed.
#405 opened by strint - 3
- 2
Questions about the benchmark experiment results
#421 opened by frankxyy - 47
[Bug][MT5] Throughput is unexpected
#406 opened by strint - 3
After multi-node training fails, processes on non-master nodes are not fully killed
#416 opened by frankxyy - 13
- 27
[Bug][MT5] graph compile time
#407 opened by strint - 12
MT5 with 8-GPU pure model parallelism fails when running in graph mode
#409 opened by Ldpe2G - 3
Could loading a pytorch model for training be supported?
#414 opened by frankxyy - 0
Are python requirements missing?
#413 opened by frankxyy - 25
[Bug][MT5] build model takes a very long time in certain environments
#404 opened by strint - 2
Swin Transformer V2 reproduction based on LiBai (consolidating the CCF PRs)
#359 opened by shaoshitong - 1
Launching libai on Kubernetes across multiple nodes and GPUs via PyTorchJob
#368 opened by strint - 0
While reproducing the DETR code, I ran into a problem with parallel computation. How can I solve it?
#376 opened by oooo111 - 7
Reproducing SegFormer based on libai [projects]
#342 opened by zhanggj821 - 0
[WIP] Align Swin and Swin v2 training accuracy with the official implementation
#357 opened by Ldpe2G - 25
DETR result alignment experiment log
#288 opened by HiHippie - 3
Running the tools/train.sh script fails: Check failed: num_device > 0 (0 vs. 0) No IB device found
#340 opened by Sakura-gh - 1
[TODO] Sync the DeiT III training strategy
#266 opened by rentainhe - 7
Module refactoring discussion
#335 opened by xiezipeng-ML - 6
swin data parallelism: linear speedup on 8 GPUs is lower than pytorch's
#312 opened by Ldpe2G - 8
Swin data parallelism: GPU memory surges when going from a single process to 8 processes
#311 opened by Ldpe2G - 0
LiBai Benchmark README.md
#317 opened by chengtbf - 5
Is there code for converting hugging face models yet?
#310 opened by drxmy - 0
Abnormal keys when loading a model on multiple GPUs
#293 opened by strint - 28
Swin: GPU 0 memory increases significantly when training after loading a checkpoint
#292 opened by Ldpe2G - 1
[Research] Support ffcv backend
#280 opened by rentainhe - 5
Request: libai mae support for data parallelism, pipeline parallelism, and model parallelism in graph mode
#259 opened by KellyZhang2020 - 7
Survey and design for inference and generation
#265 opened by L1aoXingyu - 1
Discussion on how to fix data to_global in get_batch under pipeline parallelism
#257 opened by Ldpe2G - 1
[BUG] mae finetune fails to load the official mae vit model
#258 opened by KellyZhang2020 - 0
Documentation Guide
#246 opened by rentainhe