seefun's Stars
infiniflow/ragflow
RAGFlow is an open-source RAG (Retrieval-Augmented Generation) engine based on deep document understanding.
KwaiVGI/LivePortrait
Bring portraits to life!
InstantID/InstantID
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
OpenTalker/video-retalking
[SIGGRAPH Asia 2022] VideoReTalking: Audio-based Lip Synchronization for Talking Head Video Editing In the Wild
Tencent/HunyuanDiT
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
TMElyralab/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
CosmosShadow/gptpdf
Using GPT to parse PDF
nerfies/nerfies.github.io
a312863063/generators-with-stylegan2
Here is a series of face generators based on StyleGAN2
kvcache-ai/Mooncake
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
facebookresearch/chameleon
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
PixArt-alpha/PixArt-sigma
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation
AlibabaResearch/AdvancedLiterateMachinery
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
OpenGVLab/VisionLLM
VisionLLM Series
mlfoundations/MINT-1T
MINT-1T: A one trillion token multimodal interleaved dataset.
haofanwang/inswapper
One-click Face Swapper and Restoration powered by insightface 🔥
Yuliang-Liu/MultimodalOCR
On the Hidden Mystery of OCR in Large Multimodal Models (OCRBench)
dvlab-research/Step-DPO
Implementation for "Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
OpenGVLab/OmniCorpus
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
lmas/opensimplex
This repo has been migrated to https://code.larus.se/lmas/opensimplex
Global-Chem/global-chem
A Knowledge Graph of Common Chemical Names to their Molecular Definition
nttmdlab-nlp/InstructDoc
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions (AAAI2024)
UniModal4Reasoning/DocGenome
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Models
materials-data-facility/matchem-llm
A public repository collecting links to state-of-the-art QA and evaluation sets for various ML and LLM applications
OpenGVLab/LCL
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
yuyq96/TextHawk
Exploring Efficient Fine-Grained Perception of Multimodal Large Language Models
MengLcool/DeepStack-VL
[NeurIPS-24] This is the official implementation of the paper "DeepStack: Deeply Stacking Visual Tokens is Surprisingly Simple and Effective for LMMs".
Kohulan/OCSR_Review
This repository contains the information related to the benchmark study on openly available OCSR tools
OpenGVLab/De-focus-Attention-Networks
Learning 1D Causal Visual Representation with De-focus Attention Networks
jiachengxiong/alpha-Extractor
Test data for paper “αExtractor: a web server for automatic extraction of chemical structure from literature”