mehamednews's Stars
allenai/open-instruct
sreenivas88/LP-IOANet
MetabrainAGI/Awaker
origin-space/originui
Junyi42/monst3r
Official Implementation of paper "MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion"
modelscope/ms-swift
Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL, Phi3.5-Vision, ...)
thu-ml/SageAttention
Quantized Attention that achieves speedups of 2.1-3.1x and 2.7-5.1x compared to FlashAttention2 and xformers, respectively, without losing end-to-end metrics across various models.
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF into Markdown and JSON formats.
3DTopia/3DTopia-XL
3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion
ToTheBeginning/PuLID
[NeurIPS 2024] Official code for PuLID: Pure and Lightning ID Customization via Contrastive Alignment
wjbmattingly/qwen2-vl-finetune-huggingface
This project is a collection of fine-tuning scripts to help researchers fine-tune Qwen 2 VL on HuggingFace datasets.
Stable-X/StableDelight
StableDelight: Revealing Hidden Textures by Removing Specular Reflections
VectorSpaceLab/OmniGen
OmniGen: Unified Image Generation. https://arxiv.org/pdf/2409.11340
bklieger-groq/g1
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
QwenLM/Qwen2-VL
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
CakeCrusher/TaxonomySynthesis
An AI-driven framework for synthesizing adaptive taxonomies, enabling automated data categorization and classification within dynamic hierarchical structures.
2U1/Qwen2-VL-Finetune
An open-source implementation for fine-tuning the Qwen2-VL series by Alibaba Cloud.
UniModal4Reasoning/DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
abdo-eldesokey/build-a-scene
Official repository for "Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation"
Veason-silverbullet/ViTLP
[NAACL 2024] Visually Guided Generative Text-Layout Pre-training for Document Intelligence
RotsteinNoam/Paint-by-Inpaint
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
OpenBMB/MiniCPM-V
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
LayTextLLM/LayTextLLM
autonomousvision/LaRa
[ECCV 2024] Efficient Large-Baseline Radiance Fields, a feed-forward 2DGS model
jinyeying/DC-ShadowNet-Hard-and-Soft-Shadow-Removal
[ICCV 2021] "DC-ShadowNet: Single-Image Hard and Soft Shadow Removal Using Unsupervised Domain-Classifier Guided Network", https://arxiv.org/abs/2207.10434
vanstinator/document-scanner
ZZZHANG-jx/DocRes
[CVPR 2024] DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks
ispamm/NAF-DPM
NAF-DPM: A Nonlinear Activation-Free Diffusion Probabilistic Model for Document Enhancement
naver/mast3r
Grounding Image Matching in 3D with MASt3R
YihanHu-2022/DiffMatte