This is a poor man's replica of gh:BlinkDL/RWKV-LM using JAX+Haiku.
Additional thanks to gh:tensorpro/jax-rwkv, though it uses Flax, with which I've had a really bad experience of constant skill issues.