Pinned Repositories
3ddfav2_cpp
The C++ version of 3DDFA_V2
AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
catchup
Monkey
Monkey (LMM); a multimodal large model from Huazhong University of Science and Technology (HUST)
openBliSSART
Blind Source Separation for Audio Recognition Tasks
pineking's Repositories
pineking/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Alibaba DAMO Academy.
pineking/Amphion
Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
pineking/AniPortrait
AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation
pineking/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
pineking/bark
🔊 Text-Prompted Generative Audio Model
pineking/Monkey
Monkey (LMM); a multimodal large model from Huazhong University of Science and Technology (HUST)
pineking/catvision
A multimodal large model that performs close to the closed-source Qwen-VL-PLUS on many datasets and significantly surpasses the open-source Qwen-VL-7B-Chat.
pineking/CMMMU
pineking/DCTC
pineking/dreamtalk
Official implementation for the paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models
pineking/E2STR
The official code for the CVPR 2024 paper: Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
pineking/Emu
Emu: An Open Multimodal Generalist
pineking/FastChat
An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.
pineking/FiT
FiT: Flexible Vision Transformer for Diffusion Model
pineking/generative-models
Generative Models by Stability AI
pineking/genmusic_demo_list
a list of demo websites for automatic music generation research
pineking/llama.cpp
Port of Facebook's LLaMA model in C/C++
pineking/LLaMA2-Accessory
An Open-source Toolkit for LLM Development
pineking/LLM-groundedDiffusion
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models (LLM-grounded Diffusion: LMD)
pineking/MakeItTalk
pineking/Multi-Modality-Arena
Chatbot Arena meets multi-modality! Multi-Modality Arena allows you to benchmark vision-language models side-by-side while providing images as inputs. Supports MiniGPT-4, LLaMA-Adapter V2, LLaVA, BLIP-2, and many more!
pineking/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
pineking/nougat
Implementation of Nougat: Neural Optical Understanding for Academic Documents
pineking/Open-AnimateAnyone
Unofficial Implementation of Animate Anyone
pineking/open_flamingo
An open-source framework for training large multimodal models.
pineking/PhotoMaker
PhotoMaker
pineking/pineking.github.io
pineking/Prompt-Engineering-Guide
🐙 Guides, papers, lectures, notebooks, and resources for prompt engineering
pineking/text-generation-inference
Large Language Model Text Generation Inference
pineking/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities