Issues
PP allocation issue
#108 opened by jordane95 - 1
`FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'`
#167 opened by NouamaneTazi - 5
README typo, specifies a .sh as a config
#166 opened by staghado - 0
We don't save a checkpoint after training ends
#163 opened by NouamaneTazi - 0
MoE in src and load-balancing losses
#159 opened by haeggee - 1
Make Pipeline Parallelism Optional
#153 opened by XinDongol - 1
Mamba dependencies
#112 opened by staghado - 5
Train more than 1 epoch?
#158 opened by Lauler - 3
'LlamaModel' object has no attribute 'get_named_params_without_weight_decay' in the beginner example
#146 opened by XinDongol - 0
Add data loading time in log
#154 opened by XinDongol - 0
Multinode minimal example
#115 opened by staghado - 1
`nanotron/the-pile-for-doremi` is empty
#127 opened by Tonyhao96 - 0
[Bug] Fix gradient clipping test
#92 opened by xrsrke - 1
Question concerning context parallelism.
#126 opened by veritas9872 - 1
FEAT: Support 1.58-bit LLMs training
#114 opened by younesbelkada - 0
[Feature] nanotron <-> conversion for Llama
#124 opened by yardenas - 4
Example code does not work.
#121 opened by codingchild2424 - 2
AssertionError related to tied parameters during `train_tiny_llama.sh` execution
#101 opened by xffxff - 1
Continued Pretraining on Llama 7b.
#79 opened by wiseyy - 3
[Question] Modification for Performing Fine-Tuning
#57 opened by allanj - 0
[Feature] All GPUs within the same TP group load training data from shared memory
#91 opened by xrsrke - 2
[Unit Test] Add unit tests for DistributedTrainer
#90 opened by xrsrke - 0
[Feature] Asynchronous Serialization
#87 opened by xrsrke - 0
[Unit Test] Add unit test for DoReMi's trainer
#89 opened by xrsrke - 0
[Feature] Parallel transformer block
#84 opened by xrsrke - 0
[Feature] Kernel Fusion of Layer Norm and GeLU
#86 opened by xrsrke - 0
[Feature] LAMB optimizer
#85 opened by xrsrke - 0
[Bug] Not saving `lm_head` in checkpoint
#82 opened by xrsrke - 1
Continued Pretraining on Llama 7b.
#78 opened by wiseyy - 0
Merging optimizer states from different pipeline parallel sizes to resume training
#38 opened by xrsrke - 0
[Question] Async Tensor Parallel
#48 opened by woshiyyya - 2
Integration with the HuggingFace Ecosystem
#47 opened by woshiyyya - 1
[Bug] `TypeError: Config.__init__() [...]` from `examples/config_tiny_llama.py`
#35 opened by saforem2 - 1
How does it compare with Megatron-DeepSpeed?
#36 opened by allanj - 1
Question: Roadmap / Feature Scope
#30 opened by Algomancer - 0
[Refactor] Add support for resuming training using optimizer states with a different topology
#19 opened by NouamaneTazi - 0