paulpaul91's Stars
deepseek-ai/DeepSeek-V3
binary-husky/gpt_academic
为GPT/GLM等LLM大语言模型提供实用化交互接口,特别优化论文阅读/润色/写作体验,模块化设计,支持自定义快捷按钮&函数插件,支持Python和C++等项目剖析&自译解功能,PDF/LaTex论文翻译&总结功能,支持并行问询多种LLM模型,支持chatglm3等本地模型。接入通义千问, deepseekcoder, 讯飞星火, 文心一言, llama2, rwkv, claude2, moss等。
2noise/ChatTTS
A generative speech model for daily dialogue.
opendatalab/MinerU
A high-quality tool for convert PDF to Markdown and JSON.一站式开源高质量数据提取工具,将PDF转换成Markdown和JSON格式。
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
huggingface/trl
Train transformer language models with reinforcement learning.
ShiArthur03/ShiArthur03
facebookresearch/nougat
Implementation of Nougat Neural Optical Understanding for Academic Documents
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
microsoft/DeepSpeedExamples
Example models using DeepSpeed
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
arcee-ai/mergekit
Tools for merging pretrained large language models.
huggingface/parler-tts
Inference and training library for high-quality TTS models.
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
clovaai/deep-text-recognition-benchmark
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
libukai/Awesome-ChatTTS
官方推荐的 ChatTTS 资源汇总项目,整理了全网相关资源和常见问题 || Officially recommended ChatTTS resource collection project
acids-ircam/RAVE
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
ssine/pptx2md
a pptx to markdown converter
HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs
The codes about "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts"
ZhangXInFD/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
langgptai/Awesome-Multimodal-Prompts
Prompts of GPT-4V & DALL-E3 to full utilize the multi-modal ability. GPT4V Prompts, DALL-E3 Prompts.
mynameischaos/Lion
Lion: Kindling Vision Intelligence within Large Language Models
chaoyi-wu/GPT-4V_Medical_Evaluation