databricks/megablocks

Python · Apache-2.0

Issues

AMP + BF16 failing
#95 opened 8 months ago by jramapuram
4
Question on offsets in figures 5
#61 opened 9 months ago by DaehanKim
2
Wrong outputs for hidden dim 14336
#46 opened 10 months ago by pierrestock
4
Routing
#118 opened 3 months ago by alexliap
1
Illegal memory access on non-0 cuda devices from `histogram`
#117 opened 3 months ago by phillip-kravtsov
0
Cloning input `x` in `megablocks.layers.glu.SparseGLU` leads to different SDD outputs
#115 opened 3 months ago by cmsflash
2
Can we change self.blocking in dmoe.py from 128 to 64?
#114 opened 4 months ago by seanM29
2
_LOAD_BALANCING_LOSS returns empty list sometimes
#113 opened 4 months ago by exnx
1
Bad throughput with GLU
#110 opened 4 months ago by Muennighoff
1
1-expert worse than dense model
#107 opened 5 months ago by Muennighoff
0
Sum missing axis arg in kernels.py
#102 opened 6 months ago by jambo6
4
support amd/rocm
#97 opened 6 months ago by ehartford
3
OSError: Stale file handle with dMoE
#106 opened 5 months ago by Muennighoff
3
[integrating megablocks with open_lm] Question about megablocks + FSDP
#57 opened 9 months ago by kernelmachine
9
Add a fine-tune script for JetMoE
#105 opened 5 months ago by shamanez
2
ScatterMoE feature
#104 opened 6 months ago by ehartford
5
RuntimeError: Triton Error [CUDA]: invalid argument
#88 opened 8 months ago by noob-ctrl
15
Implement Mixture of Depth and Experts (MoDE)
#103 opened 6 months ago by casper-hansen
2
Import dmoe model into other training script?
#101 opened 6 months ago by andrewnc
3
Computation distribution with expert parallelism
#100 opened 6 months ago by opherlieber
1
SFT Script and Hyperparameters used for DBRX-Instruct
#99 opened 6 months ago by alpayariyak
5
Does this framework support SFT?
#90 opened 8 months ago by banksy23
2
Has anyone encountered this CUDA error?
#62 opened 9 months ago by bozheng-hit
15
Unsharding scripts for megablocks models
#94 opened 8 months ago by mayank31398
0
the wrong loss func was chosen at evaluation
#93 opened 8 months ago by peterjc123
2
Seeking a good multi-node training config
#92 opened 8 months ago by rpand002
3
selective router precision
#91 opened 8 months ago by 152334H
1
different load_balancing_loss with different pipeline_parallel_size
#85 opened 9 months ago by bozheng-hit
8
Error from pip about missing torch module
#78 opened 8 months ago by michaelwhitford
4
Docker issues with PyPI installation
#67 opened 9 months ago by sedrick-keh-tri
3
ParallelDroplessMLP initialises self.mlp twice
#83 opened 9 months ago by 152334H
6
Gradient scale size for expert gradient
#86 opened 9 months ago by fanshiqing
4
save loading_balancing_loss properly
#82 opened 9 months ago by gouchangjiang
2
How to integrate to transformers-based mixtral
#84 opened 9 months ago by nxphi47
1
Why the second matrix of the mlp layer has the same shape of the first one?
#81 opened 9 months ago by gouchangjiang
1
[BUG] Optimizer Weights Not Reloaded When Training with bf16 Pretrained Weights
#80 opened 9 months ago by RookieHong
1
Comparison against top-2 routing?
#49 opened 10 months ago by sunnyszy
4
Script for Full Fine-Tuning of Mixtral
#68 opened 9 months ago by alpayariyak
1
Efficiency of torch mlp
#77 opened 9 months ago by imoneoi
2
How do you use routing balancing loss under pipeline parallelism
#64 opened 9 months ago by szhengac
5
How to add support for swiglu in Megablocks?
#35 opened 10 months ago by fanshiqing
14
About the Multi-node Script
#59 opened 9 months ago by XingyuXie
4
Inference code
#48 opened 9 months ago by AlpinDale
5
How to pip install the latest megablocks?
#32 opened 10 months ago by fanshiqing
2
Installation fails due to missing mosaicml-turbo
#51 opened 9 months ago by AlpinDale
2
Latest GitHub release version higher than main branch setup.py
#50 opened 9 months ago by nateraw
4
Why not support tensor model parallel?
#40 opened 10 months ago by Richie-yan
7
multi-node problem
#18 opened a year ago by sudahui
5
Does megablocks support the true expert parallelism?
#21 opened a year ago by feifeibear
2
Current installation instructions don't quite work
#3 opened 2 years ago by deepakn94
1