Pinned Repositories
ASR
GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
impala
hbase-clouder-0.94.6
jaxrl
JAX (Flax) implementation of algorithms for Deep Reinforcement Learning with continuous action spaces.
LLaSM
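The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors it may introduce.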
PunctuationModel
A Chinese punctuation model that adds punctuation marks to text.
Reinforcement-learning-with-tensorflow
Simple Reinforcement learning tutorials
self-llm
"Open-Source LLM Usage Guide": a tutorial for quickly deploying open-source large models on AutoDL, better suited to ** beginners.
tensorflow-with-kenlm
Tensorflow with KenLM integrated for beam search scoring
VTuberTalk
shiyuzh2007's Repositories
shiyuzh2007/GPT-SoVITS
Just 1 minute of voice data is enough to train a good TTS model (few-shot voice cloning).
shiyuzh2007/self-llm
"Open-Source LLM Usage Guide": a tutorial for quickly deploying open-source large models on AutoDL, better suited to ** beginners.
shiyuzh2007/3D-Speaker
A repository for single- and multi-modal speaker verification, speaker recognition and speaker diarization.
shiyuzh2007/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
shiyuzh2007/audioldm_eval
This toolbox aims to unify audio generation model evaluation for easier comparison.
shiyuzh2007/AutoGPT
AutoGPT is the vision of accessible AI for everyone, to use and to build on. Our mission is to provide the tools, so that you can focus on what matters.
shiyuzh2007/AutoGPTQ
An easy-to-use LLM quantization package with user-friendly APIs, based on the GPTQ algorithm.
shiyuzh2007/awesome-chatgpt
A curated list of awesome ChatGPT related projects.
shiyuzh2007/Awesome-Multimodal-Large-Language-Models
Latest Papers and Datasets on Multimodal Large Language Models, and Their Evaluation.
shiyuzh2007/Bert-VITS2
VITS2 backbone with BERT
shiyuzh2007/e2m
E2M converts various file types (doc, docx, epub, html, htm, url, pdf, ppt, pptx, mp3, m4a) into Markdown. It’s easy to install, with dedicated parsers and converters, supporting custom configs. E2M offers an all-in-one, flexible, and open-source solution.
shiyuzh2007/facechain
FaceChain is a deep-learning toolchain for generating your Digital-Twin.
shiyuzh2007/FunASR
A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models.
shiyuzh2007/i-Code
shiyuzh2007/langchain
⚡ Building applications with LLMs through composability ⚡
shiyuzh2007/langflow
⛓️ Langflow is a UI for LangChain, designed with react-flow to provide an effortless way to experiment and prototype flows.
shiyuzh2007/llama
Inference code for LLaMA models
shiyuzh2007/magvit
Official JAX implementation of MAGVIT: Masked Generative Video Transformer
shiyuzh2007/Make-An-Audio
shiyuzh2007/OOTDiffusion
Official implementation of OOTDiffusion
shiyuzh2007/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
shiyuzh2007/PALM-E
Implementation of "PaLM-E: An Embodied Multimodal Language Model"
shiyuzh2007/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
shiyuzh2007/Qwen-Audio
The official repo of Qwen-Audio (通义千问-Audio) chat & pretrained large audio language model proposed by Alibaba Cloud.
shiyuzh2007/safe-rlhf
Safe-RLHF: Constrained Value Alignment via Safe Reinforcement Learning from Human Feedback
shiyuzh2007/seamless_communication
Foundational Models for State-of-the-Art Speech and Text Translation
shiyuzh2007/stable-diffusion-webui
Stable Diffusion web UI
shiyuzh2007/unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
shiyuzh2007/Video-LLaVA
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
shiyuzh2007/wukong-robot
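🤖 wukong-robot is a simple, flexible, and elegant Chinese voice-dialogue robot and smart speaker project. It supports ChatGPT multi-turn conversation and may also be the first open-source smart speaker project to support brain-computer interaction.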