paulpaul91's Stars

deepseek-ai/DeepSeek-V3
Language: Python · 94.7k · 742 · 491 · 15.3k
binary-husky/gpt_academic
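Provides a practical interactive interface for GPT, GLM, and other large language models (LLMs), specially optimized for reading, polishing, and writing papers. Modular design with support for custom shortcut buttons and function plugins; analysis and self-explanation of Python, C++, and other codebases; translation and summarization of PDF/LaTeX papers; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen (Qwen), deepseekcoder, iFlytek Spark, ERNIE Bot (Wenxin Yiyan), llama2, rwkv, claude2, moss, and more.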
Language: Python · 68k · 280 · 1.7k · 8.3k
2noise/ChatTTS
A generative speech model for daily dialogue.
Language: Python · 35.5k · 193 · 624 · 3.8k
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDFs to Markdown and JSON.
Language: Python · 29.3k · 146 · 1.2k · 2.3k
VikParuchuri/marker
Convert PDF to markdown + JSON quickly with high accuracy
Language: Python · 23.5k · 95 · 417 · 1.5k
huggingface/trl
Train transformer language models with reinforcement learning.
Language: Python · 12.9k · 86 · 1.7k · 1.7k
ShiArthur03/ShiArthur03
Language: MATLAB · 10.3k · 32 · 1.4k · 1.9k
facebookresearch/nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
Language: Python · 9.4k · 64 · 219 · 604
open-mmlab/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Language: Python · 8.9k · 81 · 256 · 694
huggingface/accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
Language: Python · 8.6k · 96 · 1.8k · 1.1k
microsoft/DeepSpeedExamples
Example models using DeepSpeed
Language: Python · 6.2k · 74 · 546 · 1.1k
QwenLM/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
Language: Python · 5.7k · 49 · 463 · 434
arcee-ai/mergekit
Tools for merging pretrained large language models.
Language: Python · 5.5k · 58 · 361 · 521
huggingface/parler-tts
Inference and training library for high-quality TTS models.
Language: Python · 5.2k · 56 · 144 · 544
FunAudioLLM/SenseVoice
Multilingual Voice Understanding Model
Language: Python · 5.2k · 51 · 186 · 466
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
Language: Python · 4.3k · 32 · 510 · 270
clovaai/deep-text-recognition-benchmark
Text recognition (optical character recognition) with deep learning methods, ICCV 2019
Language: Jupyter Notebook · 3.8k · 85 · 398 · 1.1k
facebookresearch/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language:Python3.6k 58 71322
baaivision/EVA
EVA Series: Visual Representation Fantasies from BAAI
Language: Python · 2.5k · 31 · 165 · 183
stanford-crfm/helm
Holistic Evaluation of Language Models (HELM), a framework to increase the transparency of language models (https://arxiv.org/abs/2211.09110). This framework is also used to evaluate text-to-image models in HEIM (https://arxiv.org/abs/2311.04287) and vision-language models in VHELM (https://arxiv.org/abs/2410.07112).
Language: Python · 2.1k · 38 · 1.1k · 283
microsoft/Megatron-DeepSpeed
Ongoing research training transformer language models at scale, including: BERT & GPT-2
Language: Python · 2k · 25 · 186 · 347
libukai/Awesome-ChatTTS
Officially recommended ChatTTS resource collection project, gathering related resources from across the web along with frequently asked questions.
1.6k · 15 · 0 · 94
acids-ircam/RAVE
Official implementation of the RAVE model: a Realtime Audio Variational autoEncoder
Language: Python · 1.5k · 46 · 176 · 193
pjlab-sys4nlp/llama-moe
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
Language:Python939 8 2454
ssine/pptx2md
A pptx-to-Markdown converter.
Language:Python939 16 56119
HITsz-TMG/UMOE-Scaling-Unified-Multimodal-LLMs
Code for "Uni-MoE: Scaling Unified Multimodal Models with Mixture of Experts".
Language:Python706 11 1342
ZhangXInFD/SpeechTokenizer
Code for SpeechTokenizer, as presented in "SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models". Samples are presented on
Language:Python539 16 2350
langgptai/Awesome-Multimodal-Prompts
Prompts for GPT-4V and DALL-E3 that fully utilize their multimodal abilities. GPT4V Prompts, DALL-E3 Prompts.
245 · 2 · 0 · 16
mynameischaos/Lion
Lion: Kindling Vision Intelligence within Large Language Models
52 · 2 · 5 · 1
chaoyi-wu/GPT-4V_Medical_Evaluation
41 · 1 · 1 · 5