hululuzhu's Stars
microsoft/markitdown
Python tool for converting files and office documents to Markdown.
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool.
Genesis-Embodied-AI/Genesis
A generative world for general-purpose robotics & embodied AI learning.
barry-ran/QtScrcpy
Android real-time display control software
facebookresearch/audiocraft
Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor / tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
openai/swarm
Educational framework exploring ergonomic, lightweight multi-agent orchestration. Managed by OpenAI Solution team.
state-spaces/mamba
Mamba SSM architecture
richards199999/Thinking-Claude
Let your Claude be able to think
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
facebookresearch/demucs
Code for the paper Hybrid Spectrogram and Waveform Source Separation
ltdrdata/ComfyUI-Manager
ComfyUI-Manager is an extension designed to enhance the usability of ComfyUI. It offers management functions to install, remove, disable, and enable various custom nodes of ComfyUI. Furthermore, this extension provides a hub feature and convenience functions to access a wide range of information within ComfyUI.
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Million-Token Context
pyannote/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
livekit/agents
Build real-time multimodal AI applications 🤖🎙️📹
fudan-generative-vision/hallo2
Hallo2: Long-Duration and High-Resolution Audio-driven Portrait Image Animation
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
facebookresearch/audio2photoreal
Code and dataset for photorealistic Codec Avatars driven from audio
Farama-Foundation/HighwayEnv
A minimalist environment for decision-making in autonomous driving
chibat/chrome-extension-typescript-starter
Chrome Extension TypeScript Starter
b4rtaz/distributed-llama
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
MaximeVandegar/Papers-in-100-Lines-of-Code
Implementation of papers in 100 lines of code.
marl/crepe
CREPE: A Convolutional REpresentation for Pitch Estimation -- pre-trained model (ICASSP 2018)
yangxy/PASD
[ECCV2024] Pixel-Aware Stable Diffusion for Realistic Image Super-Resolution and Personalized Stylization
teticio/audio-diffusion
Apply diffusion models using the new Hugging Face diffusers package to synthesize music instead of images.
ai-ng/swift
Fast voice assistant powered by Groq, Cartesia, and Vercel.
FreedomIntelligence/HuatuoGPT-Vision
Medical Multimodal LLMs
room-js/chrome-extension-ts-starter
Chrome Extension starter built with TypeScript
submit-paper/Danzero_plus
indently/discord_tutorial_2024
Here's the source code from my YouTube tutorial.
AdamEXu/OpenBot
OpenBot