SamKG opened this issue 9 months ago · 1 comment
Hello,
Are there any examples that use Flax (or just pure JAX) for mixture-of-experts models? I'd be happy to contribute one myself if there aren't any; I'm just checking whether anyone has already done the heavy lifting.
Found one here: #4035
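
For anyone landing here who wants a starting point in pure JAX, below is a minimal sketch of a top-k gated mixture-of-experts layer. It is not the code from #4035, and every name in it (`init_moe_params`, `moe_layer`, the parameter dictionary layout) is hypothetical. To stay short it runs every expert on every token and zeroes out the unselected ones, so it shows the routing math but none of the compute savings, capacity limits, or load-balancing losses a real implementation would need.

```python
import jax
import jax.numpy as jnp


def init_moe_params(key, d_model, d_hidden, num_experts):
    """Create router and per-expert MLP weights (hypothetical layout)."""
    k_router, k_in, k_out = jax.random.split(key, 3)
    scale = 0.02
    return {
        "router": scale * jax.random.normal(k_router, (d_model, num_experts)),
        "w_in": scale * jax.random.normal(k_in, (num_experts, d_model, d_hidden)),
        "w_out": scale * jax.random.normal(k_out, (num_experts, d_hidden, d_model)),
    }


def moe_layer(params, x, top_k=2):
    """Apply a top-k gated mixture of experts to tokens x of shape (n, d_model)."""
    num_experts = params["router"].shape[-1]

    # Router scores for each token and expert: (n, num_experts).
    logits = x @ params["router"]

    # Keep the top_k experts per token and normalise their scores.
    top_vals, top_idx = jax.lax.top_k(logits, top_k)   # both (n, top_k)
    gates = jax.nn.softmax(top_vals, axis=-1)          # (n, top_k)

    # Scatter gates into dense (n, num_experts) weights; unselected experts get 0.
    one_hot = jax.nn.one_hot(top_idx, num_experts)     # (n, top_k, num_experts)
    weights = jnp.sum(one_hot * gates[..., None], axis=1)

    # Dense evaluation (assumption for brevity): every expert sees every token.
    hidden = jax.nn.relu(jnp.einsum("nd,edh->enh", x, params["w_in"]))
    expert_out = jnp.einsum("enh,ehd->end", hidden, params["w_out"])

    # Weighted sum of expert outputs per token: (n, d_model).
    return jnp.einsum("ne,end->nd", weights, expert_out)


# Usage: 5 tokens of width 8, routed over 4 experts with top-2 gating.
params = init_moe_params(jax.random.PRNGKey(0), d_model=8, d_hidden=16, num_experts=4)
x = jax.random.normal(jax.random.PRNGKey(1), (5, 8))
y = moe_layer(params, x, top_k=2)
print(y.shape)  # (5, 8)
```

Porting this to Flax would mostly mean moving the three arrays into module parameters; the routing logic itself would stay the same.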