Tigerdwgth's Stars
littlecodersh/ItChat
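A complete and graceful API for WeChat. A personal-account WeChat interface, WeChat bot, and command-line WeChat client; thirty lines are enough to build a custom personal-account bot.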
BlinkDL/RWKV-LM
RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
GAIR-NLP/O1-Journey
O1 Replication Journey: A Strategic Progress Report – Part I
lfranke/vr_splatting
shibhansh/loss-of-plasticity
Demonstrations of Loss of Plasticity and Implementation of Continual Backpropagation
octo-models/octo
Octo is a transformer-based robot policy trained on a diverse mix of 800k robot trajectories.
jayLEE0301/vq_bet_official
Official code for "Behavior Generation with Latent Actions" (ICML 2024 Spotlight)
google-deepmind/open_x_embodiment
tonyzhaozh/act
Shaka-Labs/ACT
Action Chunking Transformer implementation for a low-cost robot
mlzxy/arp
Autoregressive Policy for Robot Learning
CleanDiffuserTeam/CleanDiffuser
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
YanjieZe/3D-Diffusion-Policy
[RSS 2024] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
bytedance/GR-1
Code for "Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation"
jy0205/LaVIT
LaVIT: Empower the Large Language Model to Understand and Generate Visual Content
tyshiwo1/Accelerating-T2I-AR-with-SJD
Implementation of Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Alpha-VLLM/Lumina-mGPT
Official Implementation of "Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining"
apoorvumang/prompt-lookup-decoding
hyx1999/SAM-Decoding
Official Implementation of SAM-Decoding: Speculative Decoding via Suffix Automaton
FasterDecoding/REST
REST: Retrieval-Based Speculative Decoding, NAACL 2024
cvg/NoPoSplat
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
collabora/WhisperFusion
WhisperFusion builds upon the capabilities of WhisperLive and WhisperSpeech to provide seamless conversations with an AI.
haksorus/gsplatloc
GSplatLoc: Grounding Keypoint Descriptors into 3D Gaussian Splatting for Improved Visual Localization
AudioLLMs/AudioLLM
Audio Large Language Models
fixie-ai/ultravox
A fast multimodal LLM for real-time voice
THUDM/GLM-4-Voice
GLM-4-Voice | End-to-end Chinese-English spoken dialogue model
dcharatan/pixelsplat
[CVPR 2024 Oral, Best Paper Runner-Up] Code for "pixelSplat: 3D Gaussian Splats from Image Pairs for Scalable Generalizable 3D Reconstruction" by David Charatan, Sizhe Lester Li, Andrea Tagliasacchi, and Vincent Sitzmann
Lightning-AI/litgpt
20+ high-performance LLMs with recipes to pretrain, finetune and deploy at scale.
TEN-framework/TEN-Agent
TEN Agent is a world-class multimodal AI agent that integrates with the OpenAI Realtime API and RTC, and features weather checks, web search, vision, and RAG.
ictnlp/LLaMA-Omni
LLaMA-Omni is a low-latency and high-quality end-to-end speech interaction model built upon Llama-3.1-8B-Instruct, aiming to achieve speech capabilities at the GPT-4o level.