emigmo's Stars
Yxxxb/VoCo-LLaMA
VoCo-LLaMA: This repo is the official implementation of "VoCo-LLaMA: Towards Vision Compression with Large Language Models".
CleanDiffuserTeam/CleanDiffuser
CleanDiffuser: An Easy-to-use Modularized Library for Diffusion Models in Decision Making
ethz-spylab/agentdojo
A Dynamic Environment to Evaluate Attacks and Defenses for LLM Agents.
liugangcode/InfoAlign
The code for "Learning Molecular Representation in a Cell"
alanaai/EVUD
Egocentric Video Understanding Dataset (EVUD)
ml-research/LlavaGuard
YangLing0818/buffer-of-thought-llm
Buffer of Thoughts: Thought-Augmented Reasoning with Large Language Models
RL4VLM/RL4VLM
Official Repo for Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning
togethercomputer/MoA
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
yfzhang114/SliME
✨✨Beyond LLaVA-HD: Diving into High-Resolution Large Multimodal Models
DAMO-NLP-SG/VideoLLaMA2
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
OpenGVLab/De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
NL2Code/CodeR
truefoundry/cognita
RAG (Retrieval-Augmented Generation) framework by TrueFoundry for building modular, open-source applications for production
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
SALT-NLP/demonstrated-feedback
kaistAI/Janus
[ACL 2024 NLP4ConvAI Oral] Train LLMs with diverse system messages reflecting individualized preferences to generalize to unseen system messages
beccabai/Data-centric_multimodal_LLM
Survey on Data-centric Large Language Models
2noise/ChatTTS
A generative speech model for daily dialogue.
shenao-zhang/SELM
The official implementation of Self-Exploring Language Models (SELM)
sahsaeedi/triple-preference-optimization
princeton-nlp/SimPO
SimPO: Simple Preference Optimization with a Reference-Free Reward
YueFan1014/VideoAgent
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
architsharma97/dpo-rlaif
X-LANCE/SLAM-LLM
Speech, Language, Audio, Music Processing with Large Language Model
RLHF-V/RLAIF-V
RLAIF-V: Aligning MLLMs through Open-Source AI Feedback for Super GPT-4V Trustworthiness
EDiRobotics/GR1-Training
A generalized policy for robotic manipulation
EvolvingLMMs-Lab/lmms-eval
Accelerating the development of large multimodal models (LMMs) with lmms-eval
YifeiZhou02/ArCHer
Research Code for "ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL"
KindXiaoming/pykan
Kolmogorov Arnold Networks