Next WeNet Roadmap
robin1001 opened this issue · 4 comments
We will mainly focus on the following two problems in Next WeNet.
- NN-based contextual biasing and LM solution. On the one hand, a pure end-to-end model, including contextual biasing and LM, is our final goal. On the other hand, our current contextual biasing and LM solutions have many problems, such as poor rare-word performance in contextual biasing and a complicated LM solution that requires FST and token-passing beam search. We are also looking for new paradigms, such as joint text/audio learning and prompt learning.
- Open source big model, pretrained model, and multimodal model exploration. We can see the increasing capability, influence, and interest in these models, and we believe they may lead to a final solution for general AI. It is hard for us to build such models directly due to the lack of research and computation resources. However, we can explore the usage of these models in speech recognition applications, as "open source big models + task/private data" may be the new paradigm for the next AI.
We are open to other proposals. WeNet is a community-driven project and we love your feedback and proposals on where we should be heading. Feel free to volunteer yourself if you are interested in trying out some items (they do not have to be on the list).
From Google's recent USM paper, we can see the following three points:
1 Injecting text
2 Simpler pre-training
3 Text to speech intermediate representation
I think these three are the ultimate weapons for speech recognition, whether at the signal level or the text level.
And the community is a good way to cooperate on building the big model or paving the way for the new pipeline.
For 2 (simpler pre-training): BEST-RQ may be a good start: https://github.com/wenet-e2e/wenet/tree/Mddct-bestrq/wenet/ssl/bestrq
This issue has been automatically closed due to inactivity.