wwnbbd's Stars
woshidandan/TANet-image-aesthetics-and-quality-assessment
🔥[IJCAI 2022, Official Code] for the paper "Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks". Official weights and demos provided. The first dataset, algorithm, and benchmark for aesthetics assessment in multi-theme scenarios.
woshidandan/Image-Color-Aesthetics-and-Quality-Assessment
🔥[ICCV 2023, Official Code] for the paper "Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks". Official weights and demos provided. The first dataset, algorithm, and benchmark for subjective image color aesthetics assessment.
IceClear/CLIP-IQA
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
opendatalab/MinerU
A one-stop, open-source, high-quality data extraction tool for converting PDF to Markdown and JSON.
abdullahtarek/tennis_analysis
This project analyzes tennis players in a video to measure their speed, ball shot speed, and number of shots. It detects the players and the tennis ball using YOLO and uses CNNs to extract court keypoints. This hands-on project is a good way to polish your machine learning and computer vision skills.
ultralytics/ultralytics
Ultralytics YOLO11 🚀
wdrink/OmniVid
h-zhao1997/cobra
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
ZhengYu518/VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
BlinkDL/nanoRWKV
RWKV in nanoGPT style
X-LANCE/weblm
[WSDM 2024] Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
tstanislawek/awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Million-Length Context
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
OpenGVLab/Vision-RWKV
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
breezedeus/Pix2Text
An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
AcademySoftwareFoundation/openfx
OpenFX effects API
PKU-YuanGroup/Video-LLaVA
[EMNLP 2024🔥] Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Zheng0428/COIG-Kun
netease-youdao/QAnything
Question and Answer based on Anything.
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
tyxsspa/AnyText
Official implementation code of the paper "AnyText: Multilingual Visual Text Generation And Editing"
jordan-cutler/path-to-senior-engineer-handbook
All the resources you need to get to Senior Engineer and beyond
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022