AI-Hypercomputer/maxtext

A simple, performant and scalable Jax LLM!

Python · Apache-2.0

Issues

https://us-python.pkg.dev/gce-ai-infra/maxtext-build-support-packages/simple/ not public
#758 opened 21 days ago by emergenz
6
Flash attention - head_dim 64
#1047 opened a month ago by peregilk
0
Why logit checker has such a high tolerance?
#1021 opened a month ago by hugoabonizio
0
converting Gemma maxtext compatible checkpoint to Hugging Face format
#829 opened 4 months ago by salrowili
4
PGLE doesn't work for Tensor Parallelism
#1005 opened 2 months ago by wang2yn84
3
Cannot do inference in float32
#595 opened 3 months ago by borisdayma
5
Long Context
#801 opened 3 months ago by peregilk
4
converted mlperf gpt3 ckpt starts with a worse loss
#887 opened 3 months ago by gramesh-amd
26
nucleus top_p sampling seems wrong? (edit: nvm, read and tested the code wrong)
#950 opened 3 months ago by honglu2875
1
Training more than one epoch
#914 opened 3 months ago by peregilk
4
Mask is being ignored when cudnn_flash_attention is used
#878 opened 3 months ago by finbarrtimbers
2
Support nsys profiler upload in all cases
#911 opened 3 months ago by gobbleturk
0
Standalone checkpoint write seems to have memory leak
#831 opened 3 months ago by bernardhan33
2
Test
#919 opened 3 months ago by shralex
0
Move maxtext docker images being built to artifact registry
#904 opened 3 months ago by parambole
0
Unable to recover after checkpoint saving
#868 opened 4 months ago by peregilk
2
Support beam search
#594 opened 8 months ago by borisdayma
0
Support for RecurrentGemma
#605 opened 8 months ago by cyrilzakka
0
DEFAULT_MASK_VALUE causes gradient explosion and nan loss on deep models
#614 opened 8 months ago by logicchains
2
Support LoRA training
#609 opened 8 months ago by hxssgaa
2
llama_or_mistral_ckpt.py file requiring checkpoints in local file system
#674 opened 7 months ago by shivajid
0
Llama3
#683 opened 3 months ago by peregilk
2
Support target masking (aka loss masking or label masking) for SFT datasets
#736 opened 6 months ago by jmschndev
0
How to implement 1F1B pipeline parallelism in Jax?
#752 opened 6 months ago by MoFHeka
1
Inconsistent environment variable names
#775 opened 5 months ago by gabeweisz
0
Multihost training collapses from time to time when loading the next batch
#786 opened 5 months ago by YUE-FAN
3
Make MaxText as Python Modules
#819 opened 5 months ago by JoeZijunZhou
0
Converting LLama3.1 405B checkpoint - Requesting multipass checkpoint conversion
#864 opened 3 months ago by shivajid
3
Cannot see multiple GPUs when using Slurm (with proposed fix)
#865 opened 4 months ago by gabeweisz
0
Inconsistent code formatting
#735 opened 3 months ago by jmschndev
0
Error loading mlperf gpt3 checkpoint after pax to maxtext conversion
#879 opened 3 months ago by gramesh-amd
14
mlperf gpt3 ckpt permission issues
#847 opened 3 months ago by gramesh-amd
11
Cannot load the paxml gpt3 tokenizer
#875 opened 3 months ago by gramesh-amd
7
How to load tfrecords from local file system for Mlperf training?
#844 opened 4 months ago by gramesh-amd
3
Question: Gradient Accumulation
#607 opened 4 months ago by thiagolaitz
6
FlashAttention Support - TPUv3
#791 opened 4 months ago by maciek-pioro
1
aqtp release 0.8.0 breaking dependencies
#849 opened 4 months ago by bernardhan33
1
Gemma 2 support
#733 opened 5 months ago by borisdayma
3
`hf_access_token` only effective for loading gated datasets, not gated tokenizers
#734 opened 5 months ago by jmschndev
0
Outdated links in `First_run.md`
#776 opened 5 months ago by emergenz
1
Eval on C4?
#711 opened 6 months ago by tjingrant
1
Update Inference Microbenchmark scripts
#660 opened 7 months ago by jon-chuang
0
How to convert a model to parameter only checkpoints (unscanned) on a CPU VM
#634 opened 8 months ago by hosseinsarshar
2
Reproducing pure computation TFLOPs
#624 opened 8 months ago by prrathi
4
Assignment
#622 opened 8 months ago by Cyberwoodd
1
Clarification: how does Llama-2-7b fit on a v4-8 when using Adam?
#606 opened 8 months ago by rodrigo-f-nogueira
3
Consolidate inference related logic under jetstream-maxtext
#612 opened 8 months ago by ahg-g
1
Support Qwen1.5
#585 opened 8 months ago by Muhtasham
1
Gemma instructions were deleted in commit
#579 opened 9 months ago by emergenz
2
Issues running test_llama2_7b.sh on TPU VM v3-8
#572 opened 9 months ago by korney3
1