"If you can’t measure it, you can’t improve it." -- Peter Drucker
In the current landscape of Automatic Speech Recognition (ASR), the term "State-Of-The-Art" (SOTA) is rather vague:
- For industry, there is no objective, quantitative benchmark of how commercial APIs perform in real-life scenarios, at least not in the public domain.
- For academia, comparing ASR models is becoming harder because research toolkits and ecosystems are fragmented.
- It is unclear how academic SOTA and industrial SOTA relate to each other.
As the figure above shows, the SpeechIO leaderboard serves as an ASR benchmarking platform by providing three components:
- TestSet Zoo: a collection of test sets covering a wide range of speech recognition scenarios
- Model Zoo: a collection of models, including commercial APIs and open-source pretrained models
- An automated benchmarking pipeline, which:
  - defines the simplest possible specification for the recognition interface, the input test-set format, and the output recognition-result format;
  - once model submitters conform to this specification, takes care of the rest fully automatically (e.g. data preparation -> recognition invocation -> text post-processing -> WER/CER/SER evaluation).
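The last pipeline stage compares each recognition result with its reference transcript. The sketch below shows how WER, CER and SER are conventionally computed: Levenshtein (edit-distance) alignment over words for WER and over characters for CER, plus the fraction of sentences with at least one error for SER. It rests on two assumptions. First, it follows these standard definitions, not necessarily the leaderboard's exact scoring code. Second, `normalize` is a hypothetical stand-in for the pipeline's real text post-processing.

```python
"""Minimal sketch of WER/CER/SER scoring via Levenshtein alignment.

Assumption: standard metric definitions; not the SpeechIO pipeline's own code.
"""
import string
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1]


def normalize(text: str) -> str:
    """Hypothetical post-processing: uppercase and strip ASCII punctuation."""
    return text.upper().translate(str.maketrans("", "", string.punctuation))


def score(pairs: List[Tuple[str, str]], unit: str = "word") -> Tuple[float, float]:
    """Return (error rate, sentence error rate) over (reference, hypothesis) pairs.

    unit="word" gives WER over whitespace tokens; unit="char" gives CER over
    non-space characters (the usual choice for Chinese).
    """
    errors = total = wrong_sentences = 0
    for ref, hyp in pairs:
        ref, hyp = normalize(ref), normalize(hyp)
        if unit == "word":
            r, h = ref.split(), hyp.split()
        else:
            r, h = list(ref.replace(" ", "")), list(hyp.replace(" ", ""))
        e = edit_distance(r, h)
        errors += e
        total += len(r)
        wrong_sentences += int(e > 0)
    # max(..., 1) guards against empty input
    return errors / max(total, 1), wrong_sentences / max(len(pairs), 1)
```

For example, `score([("the cat sat", "the cat sat"), ("hello world", "hello word")])` yields a WER of 1/5 and an SER of 1/2. Passing `unit="char"` scores Chinese transcripts by character instead.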
With the SpeechIO leaderboard, anyone can benchmark, reproduce, and compare others' systems on a local machine, as long as those systems and test sets are released in the model zoo and test-set zoo.
Test Sets From Public Academic Datasets
已公开 Released | 编号 TEST_SET_ID | 说明 DESCRIPTION | 语言 LANGUAGE |
---|---|---|---|
✓ | LIBRISPEECH_TEST_CLEAN | "test_clean" set of LibriSpeech | en |
✓ | LIBRISPEECH_TEST_OTHER | "test_other" set of LibriSpeech | en |
✓ | GIGASPEECH_V1.0.0_DEV | dev set of GigaSpeech | en |
✓ | GIGASPEECH_V1.0.0_TEST | test set of GigaSpeech | en |
✓ | AISHELL1_TEST | test set of AISHELL-1 | zh |
✓ | AISHELL2_IOS_TEST | test set of AISHELL-2 (iOS channel) | zh |
✓ | AISHELL2_ANDROID_TEST | test set of AISHELL-2 (Android channel) | zh |
✓ | AISHELL2_MIC_TEST | test set of AISHELL-2 (Microphone channel) | zh |
SpeechIO Test Sets (ZH)
SpeechIO test sets are carefully curated by the SpeechIO authors. They are crawled from publicly available sources (YouTube, TV programs, podcasts, etc.), cover a variety of well-known acoustic scenarios (AM) and content domains (LM & vocabulary), and are labeled by professional annotators.
已公开 Released | 编号 TEST_SET_ID | 名称 Name | 场景 Scenario | 内容领域 Topic Domain | 时长 Duration (hours) | 难度(1-5) Difficulty |
---|---|---|---|---|---|---|
✓ | SPEECHIO_ASR_ZH00000 | 接入调试集 For leaderboard submitter debugging | 视频会议、论坛演讲 video conference & forum speech | 经济、货币、金融 economy, currency, finance | 1.0 | ★★☆ |
✓ | SPEECHIO_ASR_ZH00001 | 新闻联播 | 新闻播报 TV News | 时政 news & politics | 9 | ★ |
✓ | SPEECHIO_ASR_ZH00002 | 鲁豫有约 | 访谈电视节目 TV interview | 名人工作/生活 celebrity & film & music & daily | 3 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00003 | 天下足球 | 专题电视节目 TV program | 足球 Sports & Football & World Cup | 2.7 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00004 | 罗振宇跨年演讲 | 会场演讲 Stadium Public Speech | 社会、人文、商业 Society & Culture & Business Trend | 2.7 | ★★ |
✗ | SPEECHIO_ASR_ZH00005 | 李永乐老师在线讲堂 | 在线教育 Online Education | 科普 Popular Science | 4.4 | ★★★ |
✗ | SPEECHIO_ASR_ZH00006 | 张大仙 & *白 王者荣耀直播 | 直播 Live Broadcasting | 游戏 Game | 1.6 | ★★★☆ |
✗ | SPEECHIO_ASR_ZH00007 | 李佳琪 & 薇娅 直播带货 | 直播 Live Broadcasting | 电商、美妆 Makeup & Online shopping/advertising | 0.9 | ★★★★☆ |
✗ | SPEECHIO_ASR_ZH00008 | 老罗语录 | 线下培训 Offline lecture | 段子、做人 Life & Purpose & Ethics | 1.3 | ★★★★☆ |
✗ | SPEECHIO_ASR_ZH00009 | 故事FM | 播客 Podcast | 人生故事、见闻 Ordinary Life Storytelling | 4.5 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00010 | 创业内幕 | 播客 Podcast | 创业、产品、投资 Startup & Entrepreneur & Product & Investment | 4.2 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00011 | 罗翔 刑法法考培训讲座 | 在线教育 Online Education | 法律 法考 Law & Lawyer Qualification Exams | 3.4 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00012 | 张雪峰 考研线上小讲堂 | 在线教育 Online Education | 考研 高校报考 University & Graduate School Entrance Exams | 3.4 | ★★★☆ |
✗ | SPEECHIO_ASR_ZH00013 | 谷阿莫&牛叔说电影 | 短视频 VLog | 电影剪辑 Movie Cuts | 1.8 | ★★★ |
✗ | SPEECHIO_ASR_ZH00014 | 贫穷料理 & 琼斯爱生活 | 短视频 VLog | 美食、烹饪 Food & Cooking & Gourmet | 1 | ★★★☆ |
✗ | SPEECHIO_ASR_ZH00015 | 单田芳 白眉大侠 | 评书 Traditional Storytelling | 江湖、武侠 Kung Fu Fiction | 2.2 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00016 | 德云社相声演出 | 剧场相声 Theater Crosstalk Show | 包袱段子 Funny Stories | 1 | ★★★ |
✗ | SPEECHIO_ASR_ZH00017 | 吐槽大会 | 脱口秀电视节目 Stand-up Comedy TV Show | 明星糗事 Celebrity Jokes | 1.8 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00018 | 小猪佩奇 & 熊出没 | 少儿动画 Children's Cartoon | 童话故事、日常 Fairy Tale | 0.9 | ★☆ |
✗ | SPEECHIO_ASR_ZH00019 | CCTV5 NBA 比赛转播 | 体育赛事解说 Live Sports Commentary | 篮球、NBA NBA Game | 0.7 | ★★★ |
✗ | SPEECHIO_ASR_ZH00020 | 篮球人物 | 纪录片 Documentary | 篮球明星、成长 NBA Superstars' Life & History | 2.2 | ★★ |
✗ | SPEECHIO_ASR_ZH00021 | 汽车之家 车辆评测 | 短视频 VLog | 汽车测评 Car Reviews, Road Driving Tests | 1.7 | ★★★☆ |
✗ | SPEECHIO_ASR_ZH00022 | 小艾大叔 豪宅带看 | 短视频 VLog | 房地产、豪宅 Real Estate, Mansion Tours | 1.7 | ★★★ |
✗ | SPEECHIO_ASR_ZH00023 | 无聊开箱 & Zealer评测 | 短视频 VLog | 产品开箱评测 Unboxing & Product Reviews | 2 | ★★★ |
✗ | SPEECHIO_ASR_ZH00024 | 付老师种植技术 | 短视频 VLog | 农业、种植 Agriculture, Planting | 2.7 | ★★★☆ |
✗ | SPEECHIO_ASR_ZH00025 | 石国鹏讲古希腊哲学 | 线下培训 Offline lecture | 历史,古希腊哲学 History, Ancient Greek Philosophy | 1.3 | ★★☆ |
✗ | SPEECHIO_ASR_ZH00026 | 张震鬼故事 | 广播节目 Radio Program | 鬼故事 Horror Stories | 2.4 | ★★★ |
✗ | SPEECHIO_ASR_ZH00027 | 华语辩论世界杯 | 辩论赛 Debate Contest | 兴趣、技能、成长 Hobby, Skill, Growth | 1.4 | ★★★ |
✗ | SPEECHIO_ASR_ZH00028 | 时政现场同传 | 同声传译 Simultaneous Interpretation | 时政、社会公共治理 News & Events on Public Governance | 2.1 | ★★★☆ |
To pull a released test set from the cloud into your local dataset zoo (leaderboard/datasets/*), run:
ops/pull dataset <TEST_SET_ID>
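To fetch every released test set at once, the documented command can be scripted. This is a minimal sketch under a few assumptions. It must run from the leaderboard repository root, where `ops/pull` lives. The ID list is copied from the ✓ (released) rows of the test-set tables. `pull_test_sets` and its `dry_run` switch are hypothetical additions for illustration; they are not part of the repo.

```python
"""Pull every released test set with the documented `ops/pull dataset` command."""
import subprocess
from typing import List

# IDs marked ✓ (released) in the test-set tables of this README.
RELEASED_TEST_SETS = [
    "LIBRISPEECH_TEST_CLEAN",
    "LIBRISPEECH_TEST_OTHER",
    "GIGASPEECH_V1.0.0_DEV",
    "GIGASPEECH_V1.0.0_TEST",
    "AISHELL1_TEST",
    "AISHELL2_IOS_TEST",
    "AISHELL2_ANDROID_TEST",
    "AISHELL2_MIC_TEST",
    "SPEECHIO_ASR_ZH00000",
    "SPEECHIO_ASR_ZH00001",
    "SPEECHIO_ASR_ZH00002",
]


def pull_test_sets(ids: List[str], dry_run: bool = False) -> List[List[str]]:
    """Hypothetical helper: run `ops/pull dataset <ID>` for each test set.

    Assumes the working directory is the leaderboard repo root.
    With dry_run=True the commands are only printed, not executed.
    """
    commands = [["ops/pull", "dataset", test_set_id] for test_set_id in ids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failed download
            subprocess.run(cmd, check=True)
    return commands
```

Calling `pull_test_sets(RELEASED_TEST_SETS, dry_run=True)` first lets you review the commands before any download starts.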
Cloud API Models
API models are usually small (essentially client programs), so we normally keep them in this GitHub repo.
Local Engines (Open-Source Pretrained ASR Models)
Local models/engines are normally too large for GitHub, so we store them in the cloud.
已公开 Released | 编号 MODEL_ID | 类型 Type | 模型作者/所有人 Model Author/Owner | 简介 Description |
---|---|---|---|---|
✓ | speechio_kaldi_multicn | pretrained model | Xingyu NA(那兴宇) | Kaldi multi_cn recipe |
✓ | wenet_multi_cn | pretrained model | Binbin Zhang(张彬彬)@wenet-e2e | WeNet multi_cn recipe |
✓ | vosk_model_cn | batteries-included local engine | alphacephei | Chinese engine of Vosk |
To pull a released model from the cloud into your local model zoo (leaderboard/models/*), run:
ops/pull model <MODEL_ID>
To submit your model to the leaderboard and have it benchmarked on all test sets (including unreleased ones), follow this Specification.
You can also pull publicly released models and test sets, and trigger the benchmarking pipeline on your local machine via:
ops/leaderboard_runner requests/request.yaml
The content of request.yaml is described in the Specification above.
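The two documented steps of a local run can be chained in a small driver: pull a released model, then invoke the benchmarking pipeline. This is a minimal sketch under several assumptions. It runs from the repo root. The request file sits at requests/request.yaml, as in the command above. That file already follows the Specification, whose format is not reproduced here. `run_local_benchmark` is a hypothetical helper name.

```python
"""Chain the documented commands: pull a model, then run the benchmarking pipeline."""
import subprocess
from pathlib import Path
from typing import List


def run_local_benchmark(model_id: str, request: str = "requests/request.yaml",
                        dry_run: bool = False) -> List[List[str]]:
    """Hypothetical helper: `ops/pull model <ID>`, then `ops/leaderboard_runner <request>`.

    Assumes the request file already exists and follows the submission Specification.
    With dry_run=True the commands are only printed, not executed.
    """
    if not dry_run and not Path(request).is_file():
        raise FileNotFoundError(f"request file not found: {request}")
    commands = [
        ["ops/pull", "model", model_id],
        ["ops/leaderboard_runner", request],
    ]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts the run if pulling the model fails
            subprocess.run(cmd, check=True)
    return commands
```

For instance, `run_local_benchmark("wenet_multi_cn")` pulls one of the released models listed above before starting the runner.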
Email: leaderboard@speechio.ai