LandyGuo's Stars
AUTOMATIC1111/stable-diffusion-webui
Stable Diffusion web UI
binary-husky/gpt_academic
Provides a practical interactive interface for GPT/GLM and other large language models, with special optimizations for reading, polishing, and writing academic papers. Modular design with support for custom shortcut buttons and function plugins; supports code analysis and self-interpretation for Python, C++, and other projects; PDF/LaTeX paper translation and summarization; parallel queries across multiple LLMs; and local models such as ChatGLM3. Integrates Tongyi Qianwen, deepseekcoder, iFlytek Spark, ERNIE Bot, llama2, rwkv, claude2, moss, and more.
abi/screenshot-to-code
Drop in a screenshot and convert it to clean code (HTML/Tailwind/React/Vue)
langgenius/dify
Dify is an open-source LLM app development platform. Dify's intuitive interface combines AI workflow, RAG pipeline, agent capabilities, model management, observability features and more, letting you quickly go from prototype to production.
huggingface/diffusers
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
hpcaitech/Open-Sora
Open-Sora: Democratizing Efficient Video Production for All
HumanAIGC/AnimateAnyone
Animate Anyone: Consistent and Controllable Image-to-Video Synthesis for Character Animation
LargeWorldModel/LWM
IDEA-Research/GroundingDINO
[ECCV 2024] Official implementation of the paper "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"
CompVis/taming-transformers
Taming Transformers for High-Resolution Image Synthesis
UX-Decoder/Segment-Everything-Everywhere-All-At-Once
[NeurIPS 2023] Official implementation of the paper "Segment Everything Everywhere All at Once"
mlfoundations/open_flamingo
An open-source framework for training large multimodal models.
NExT-GPT/NExT-GPT
Code and models for NExT-GPT: Any-to-Any Multimodal Large Language Model
Breakthrough/PySceneDetect
:movie_camera: Python and OpenCV-based scene cut/transition detection program & library.
dvlab-research/MGM
Official repo for "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models"
X-PLUG/MobileAgent
Mobile-Agent: The Powerful Mobile Device Operation Assistant Family
google-research/big_vision
Official codebase used to develop Vision Transformer, SigLIP, MLP-Mixer, LiT and more.
tencent-ailab/V-Express
V-Express aims to generate a talking-head video under the control of a reference image, an audio clip, and a sequence of V-Kps images.
Zz-ww/SadTalker-Video-Lip-Sync
This project implements Wav2Lip-style video lip-sync based on SadTalker. Audio drives lip-shape generation on an input video file, and a configurable face-region enhancement mode sharpens the synthesized lip (face) region. The DAIN deep-learning frame-interpolation algorithm adds intermediate frames to smooth the transitions between synthesized lip movements, making the results more fluent, realistic, and natural.
baaivision/Emu
Emu Series: Generative Multimodal Models from BAAI
eric-ai-lab/MiniGPT-5
Official implementation of paper "MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens"
soCzech/TransNetV2
TransNet V2: Shot Boundary Detection Neural Network
llava-rlhf/LLaVA-RLHF
Aligning LMMs with Factually Augmented RLHF
azad-academy/denoising-diffusion-model
A simple guide to diffusion models. Helpful in understanding the concept and practicing with the method.
feizc/Visual-LLaMA
Open LLaMA Eyes to See the World
alipay/Ant-Multi-Modal-Framework
Research Code for Multimodal-Cognition Team in Ant Group
DmitryRyumin/NewEraAI-Papers
The repository provides links to collections of influential and interesting research papers from top AI conferences, each with open-source code to promote reproducibility and offer implementation insights beyond the scope of the articles. Stay up to date with the latest advances in AI research!
williechai/speedup-plugin-for-stable-diffusions
Cranial-XIX/FAMO
Official PyTorch Implementation for Fast Adaptive Multitask Optimization (FAMO)
allenai/unified-io-2.pytorch