XL2013's Stars
Deep-Agent/R1-V
Witness the aha moment of VLM with less than $3.
om-ai-lab/VLM-R1
Solve Visual Understanding with Reinforced VLMs
huggingface/open-r1
Fully open reproduction of DeepSeek-R1
stepfun-ai/Step-Video-T2V
deepseek-ai/DeepSeek-R1
khoj-ai/khoj
Your AI second brain. Self-hostable. Get answers from the web or your docs. Build custom agents, schedule automations, do deep research. Turn any online or local LLM into your personal, autonomous AI (gpt, claude, gemini, llama, qwen, mistral). Get started - free.
THU-MIG/YOLO-UniOW
YOLO-UniOW: Efficient Universal Open-World Object Detection
Karbo123/segmentator
Segmentator for clustering on meshes or pointclouds
soxoj/maigret
🕵️‍♂️ Collect a dossier on a person by username from thousands of sites
MAmmoTH-VL/MAmmoTH-VL
MAmmoTH-VL: Eliciting Multimodal Reasoning with Instruction Tuning at Scale
RyanG41/SA3DIP
ali-vilab/In-Context-LoRA
Official repository of In-Context LoRA for Diffusion Transformers
DS4SD/docling
Get your documents ready for gen AI
soimort/you-get
⏬ Dumb downloader that scrapes the web
ptrvilya/blendify
Lightweight Python framework that provides a high-level API for creating and rendering scenes with Blender.
astral-sh/uv
An extremely fast Python package and project manager, written in Rust.
yt-dlp/yt-dlp
A feature-rich command-line audio/video downloader
aminebdj/OpenYOLO3D
[ICLR 2025 (Oral 📢)] Our OpenYOLO3D model achieves state-of-the-art performance in Open Vocabulary 3D Instance Segmentation on the ScanNet200 and Replica datasets, with up to ∼16x speedup compared to the best existing method in the literature.
TIGER-AI-Lab/VLM2Vec
This repo contains the code for "VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks" [ICLR25]
DIYgod/RSSHub
🧡 Everything is RSSible
embodied-generalist/embodied-generalist
[ICML 2024] Official code repository for 3D embodied generalist agent LEO
ZCMax/LLaVA-3D
A Simple yet Effective Pathway to Empowering LLaVA to Understand and Interact with 3D World
wgh136/PicaComic
A comic app built with Flutter, supporting multiple comic sources.
caorushizi/mediago
Cross-platform video extraction tool: supports streaming download, video download, m3u8 download, and Bilibili video download, with desktop clients for Windows and Mac.
dk-liang/UniSeg3D
[NeurIPS 2024] A Unified Framework for 3D Scene Understanding
hzwer/WritingAIPaper
Writing AI Conference Papers: A Handbook for Beginners
RSSNext/Follow
🧡 Follow everything in one place
fishaudio/fish-speech
SOTA Open Source TTS
ZiyuGuo99/SAM2Point
The Most Faithful Implementation of Segment Anything (SAM) in 3D
sherlock-project/sherlock
Hunt down social media accounts by username across social networks