Siri-2001's Stars
labmlai/annotated_deep_learning_paper_implementations
🧑🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), gans (cyclegan, stylegan2, ...), 🎮 reinforcement learning (ppo, dqn), capsnet, distillation, ... 🧠
Vision-CAIR/MiniGPT-4
Open-source code for MiniGPT-4 and MiniGPT-v2 (https://minigpt-4.github.io, https://minigpt-v2.github.io/)
Mintplex-Labs/anything-llm
The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.
haotian-liu/LLaVA
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
SYSTRAN/faster-whisper
Faster Whisper transcription with CTranslate2
liguodongiot/llm-action
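This project aims to share the technical principles behind large models along with hands-on experience (large-model engineering and real-world deployment of large-model applications).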
xszyou/Fay
Fay is an open-source digital human framework integrating language models and digital characters. It offers retail, assistant, and agent versions for diverse applications like virtual shopping guides, broadcasters, assistants, waiters, teachers, and voice or text-based mobile assistants.
ymcui/Chinese-LLaMA-Alpaca-2
Phase 2 of the Chinese LLaMA-2 & Alpaca-2 LLM project, plus 64K long-context models
lonePatient/awesome-pretrained-chinese-nlp-models
Awesome Pretrained Chinese NLP Models: a collection of high-quality Chinese pretrained models, large models, multimodal models, and large language models
DAMO-NLP-SG/Video-LLaMA
[EMNLP 2023 Demo] Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
lucidrains/naturalspeech2-pytorch
Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
Jamie-Stirling/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
jin-s13/COCO-WholeBody
ECCV2020 paper "Whole-Body Human Pose Estimation in the Wild"
cschenxiang/DRSformer
Learning A Sparse Transformer Network for Effective Image Deraining (CVPR 2023)
wholebody3d/wholebody3d
Official repository of Human3.6M 3D WholeBody (H3WB) dataset
syncdoth/RetNet
Huggingface compatible implementation of RetNet (Retentive Networks, https://arxiv.org/pdf/2307.08621.pdf) including parallel, recurrent, and chunkwise forward.
woodfrog/floor-sp
Floor-SP: Inverse CAD for Floorplans by Sequential Room-wise Shortest Path, ICCV 2019
Awexander/audioWhisper
Listen to any audio stream on your machine and print out the transcribed or translated audio.
junyangwang0410/AMBER
An LLM-free Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation
udon-universe/stable-diffusion-webui-extension-templates
A template for stable-diffusion-webui extensions
wanng-ide/VQA_to_multimodal_survey
Update 2020
Holipori/MIMIC-Diff-VQA
DrZiji/VecFloorSeg
Source code repo for VectorFloorSeg: Two-Stream Graph Attention Network for Vectorized Roughcast Floorplan Segmentation
imatge-upc/slt_how2sign_wicv2023
Sign Language Translation for Instructional Videos - CVPR WiCV 2023
sharif1093/py_floor_plan_segmenter
A Python package to segment cluttered 2D floor plans based on down-sampling.
heyudage/VoiceTyping
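Real-time text input by voice (just speak). Built as a secondary development on top of the PaddleSpeech project; supports offline (air-gapped) deployment and GPU inference. The client currently supports Windows only.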
ByZ0e/Glance-Focus
This repo contains source code for Glance and Focus: Memory Prompting for Multi-Event Video Question Answering (accepted at NeurIPS 2023)
bipashasen/INR-V-VideoGenerationSpace
The Official Implementation for INR-V: A Continuous Representation Space for Video-based Generative Tasks
sor8sh/Room-Segmentation
Automatic Room Segmentation
wangcunxiang/Graph-aS-Tokens