- https://github.com/salute-developers/golos
- https://github.com/snakers4/open_stt
- https://github.com/GeorgeFedoseev/DeepSpeech
- https://github.com/sovaai/sova-dataset
- https://www.openslr.org/96/ - Russian LibriSpeech
- https://commonvoice.mozilla.org/ru/datasets - Mozilla Common Voice (MCV)
- https://www.caito.de/2019/01/03/the-m-ailabs-speech-dataset/ - M-AILABS dataset (from LibriVox)
- https://ruslan-corpus.github.io/
- https://github.com/sovaai/sova-tts
- https://huggingface.co/bene-ges/tts_ru_hifigan_ruslan
- https://github.com/alphacep/vosk-tts
- https://github.com/RHVoice
- https://github.com/snakers4/silero-models#text-to-speech
- https://github.com/reynoldsnlp/udar
- https://github.com/einhornus/russian_accentuation
- https://github.com/wilpert/RusPhonetizer
- https://huggingface.co/bene-ges/ru_g2p_ipa_bert_large
- https://github.com/Desklop/StressRNN
- https://github.com/nsu-ai/russian_g2p
- https://github.com/nsu-ai-team/russian_g2p_neuro
- https://github.com/suralmasha/RuTranscript
- https://github.com/MashaPo/russtress
- https://huggingface.co/IlyaGusev/ru-word-stress-transformer
- https://github.com/reynoldsnlp/udar/blob/main/src/udar/resources/src/Tixonov.txt - Tikhonov's morphemic and orthographic dictionary
- http://aot.ru - source of Zaliznyak's dictionary in machine-readable format
- https://github.com/gramdict/gramdict - modern version of Zaliznyak's dictionary
- http://odict.ru/ - another derivative of Zaliznyak's dictionary
- http://opencorpora.org/ - annotated morphological dictionary
- https://ru.wiktionary.org - Wiktionary
- https://kaikki.org/dictionary/Russian/ - Wiktionary dump in a convenient format
- https://github.com/e2yo/eyo-kernel
- https://github.com/kalashnikovisme/karamzin
- https://github.com/link2xt/yoficator
- https://github.com/Text-extend-tools/python-yoficator
- https://github.com/emacsmirror/yoficator
- https://github.com/unabashed/yoficator
- https://github.com/sovaai/sova-tts-tps
- https://github.com/snakers4/silero-models#text-enhancement
- https://github.com/snakers4/russian_stt_text_normalization
Model comparison is here.
- Vosk Small https://alphacephei.com/vosk/models/vosk-model-small-ru-0.22.zip
- Vosk Big 0.22 https://alphacephei.com/vosk/models/vosk-model-ru-0.22.zip
- Vosk Big 0.42 https://alphacephei.com/vosk/models/vosk-model-ru-0.42.zip
- Nvidia RNNT Large https://huggingface.co/nvidia/stt_ru_conformer_transducer_large
- Whisper medium https://github.com/openai/whisper
- Whisper Adapted Medium https://huggingface.co/mitchelldehaven/whisper-medium-ru
- Whisper Adapted Large https://huggingface.co/mitchelldehaven/whisper-large-v2-ru
- Wav2VecLM https://huggingface.co/jonatasgrosman/wav2vec2-xls-r-1b-russian
- Wav2VecLM Bond005 https://huggingface.co/bond005/wav2vec2-large-ru-golos (version 03.2023)
- Salute Citrinet https://github.com/salute-developers/golos
- FunASR Russian https://modelscope.cn/models/damo/speech_UniASR_asr_2pass-ru-16k-common-vocab1664-tensorflow1-offline/summary
Not tested (somewhat lower quality)
- https://github.com/kotikkonstantin/ru-autopunctuation
- https://huggingface.co/kontur-ai/sbert_punc_case_ru
- https://github.com/vlomme/Bert-Russian-punctuation
- https://github.com/Lesha17/Punctuation
- https://github.com/gleb-skobinsky/ru_punct
- https://github.com/sviperm/neuro-comma
- https://github.com/snakers4/silero-models
- https://github.com/marlon-br/neuro-comma
- https://github.com/averkij/multipunct