Issues
PP allocation issue
#108 opened by jordane95 - 1
`FileNotFoundError: [Errno 2] No such file or directory: '/usr/local/cuda/bin/nvcc'`
#167 opened by NouamaneTazi - 5
README typo, specifies a .sh as a config
#166 opened by staghado - 0
We don't save a checkpoint after training ends
#163 opened by NouamaneTazi - 0
MoE in src and load-balancing losses
#159 opened by haeggee - 1
Make Pipeline Parallelism Optional
#153 opened by XinDongol - 1
Mamba dependencies
#112 opened by staghado - 5
Train more than 1 epoch?
#158 opened by Lauler - 3
'LlamaModel' object has no attribute 'get_named_params_without_weight_decay' in the beginner example
#146 opened by XinDongol - 0
Add data loading time in log
#154 opened by XinDongol - 0
Multinode minimal example
#115 opened by staghado - 1
`nanotron/the-pile-for-doremi` is empty
#127 opened by Tonyhao96 - 0
[Bug] Fix gradient clipping test
#92 opened by xrsrke - 1
Question concerning context parallelism.
#126 opened by veritas9872 - 1
FEAT: Support 1.58-bit LLMs training
#114 opened by younesbelkada - 0
[Feature] nanotron <-> conversion for Llama
#124 opened by yardenas - 4
Example code does not work.
#121 opened by codingchild2424 - 2
AssertionError related to tied parameters during `train_tiny_llama.sh` execution
#101 opened by xffxff - 1
Continued Pretraining on Llama 7b.
#79 opened by wiseyy - 3
[Question] Modification for Performing Fine-Tuning
#57 opened by allanj - 0
[Feature] All GPUs within the same TP group load training data from shared memory
#91 opened by xrsrke - 2
[Unit Test] Add unit tests for DistributedTrainer
#90 opened by xrsrke - 0
[Feature] Asynchronous Serialization
#87 opened by xrsrke - 0
[Unit Test] Add unit test for DoReMi's trainer
#89 opened by xrsrke - 0
[Feature] Parallel transformer block
#84 opened by xrsrke - 0
[Feature] Kernel Fusion of Layer Norm and GeLU
#86 opened by xrsrke - 0
[Feature] LAMB optimizer
#85 opened by xrsrke - 0
[Bug] Not saving `lm_head` in checkpoint
#82 opened by xrsrke - 1
Continued Pretraining on Llama 7b.
#78 opened by wiseyy - 0
Merging optimizer states from different pipeline parallel sizes to resume training
#38 opened by xrsrke - 0
[Question] Async Tensor Parallel
#48 opened by woshiyyya - 2
Integration with the HuggingFace Ecosystem
#47 opened by woshiyyya - 1
[Bug] `TypeError: Config.__init__() [...]` from `examples/config_tiny_llama.py`
#35 opened by saforem2 - 1
How does it compare with Megatron-DeepSpeed?
#36 opened by allanj - 1
Question: Roadmap / Feature Scope
#30 opened by Algomancer - 0
[Refactor] Add support for resuming training using optimizer states with a different topology
#19 opened by NouamaneTazi - 0