Frankluox's Stars
RVC-Boss/GPT-SoVITS
1 minute of voice data can also be used to train a good TTS model! (few-shot voice cloning)
2noise/ChatTTS
A generative speech model for daily dialogue.
princeton-nlp/SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using GPT-4, or your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges.
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
axolotl-ai-cloud/axolotl
Go ahead and axolotl questions
lllyasviel/Omost
Your image is almost there!
OpenBMB/MiniCPM
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
huggingface/lerobot
🤗 LeRobot: Making AI for Robotics more accessible with end-to-end learning
arcee-ai/mergekit
Tools for merging pretrained large language models.
facebookresearch/schedule_free
Schedule-Free Optimization in PyTorch
NVlabs/VILA
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
google-deepmind/penzai
A JAX research toolkit for building, editing, and visualizing neural networks.
dora-rs/dora
DORA (Dataflow-Oriented Robotic Architecture) is middleware designed to streamline and simplify the creation of AI-based robotic applications. It offers low latency, composable, and distributed dataflow capabilities. Applications are modeled as directed graphs, also referred to as pipelines.
google-deepmind/mujoco_menagerie
A collection of high-quality models for the MuJoCo physics engine, curated by Google DeepMind.
PufferAI/PufferLib
Simplifying reinforcement learning for complex game environments
prometheus-eval/prometheus-eval
Evaluate your LLM's response with Prometheus and GPT4 💯
IDEA-Research/Grounding-DINO-1.5-API
API for Grounding DINO 1.5: IDEA Research's Most Capable Open-World Object Detection Model Series
OpenTeleVision/TeleVision
[CoRL 2024] Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
robfiras/loco-mujoco
Imitation learning benchmark focusing on complex locomotion tasks using MuJoCo.
robocasa/robocasa
RoboCasa: Large-Scale Simulation of Everyday Tasks for Generalist Robots
AIGText/Glyph-ByT5
[ECCV2024] This is the official inference code of the papers "Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering" and "Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering"
OpenRobotLab/GRUtopia
GRUtopia: Dream General Robots in a City at Scale
maitrix-org/Pandora
Pandora: Towards General World Model with Natural Language Actions and Video States
bigcode-project/starcoder2-self-align
StarCoder2-Instruct: Fully Transparent and Permissive Self-Alignment for Code Generation
AILab-CVC/CV-VAE
CV-VAE: A Compatible Video VAE for Latent Generative Video Models
mihirp1998/VADER
Video Diffusion Alignment via Reward Gradients. We improve a variety of video diffusion models such as VideoCrafter, OpenSora, ModelScope and StableVideoDiffusion by finetuning them using various reward models such as HPS, PickScore, VideoMAE, VJEPA, YOLO, Aesthetics etc.
google-research/android_world
AndroidWorld is an environment and benchmark for autonomous agents
jonzamora/awesome-robot-learning-envs
A list of awesome and popular robot learning environments
mathvision-cuhk/MATH-V
MATH-Vision dataset and code to measure Multimodal Mathematical Reasoning capabilities.
DCDmllm/MorphTokens