wwnbbd

wwnbbd's Stars

woshidandan/TANet-image-aesthetics-and-quality-assessment
🔥[IJCAI 2022, Official Code] for paper "Rethinking Image Aesthetics Assessment: Models, Datasets and Benchmarks". Official Weights and Demos provided. The first aesthetics assessment dataset, algorithm, and benchmark for multi-theme scenarios.
Language:Python30919
woshidandan/Image-Color-Aesthetics-and-Quality-Assessment
🔥[ICCV 2023, Official Code] for paper "Thinking Image Color Aesthetics Assessment: Models, Datasets and Benchmarks". Official Weights and Demos provided. The first dataset, algorithm, and benchmark for subjective aesthetic assessment of image color.
Language:Python1637
IceClear/CLIP-IQA
[AAAI 2023] Exploring CLIP for Assessing the Look and Feel of Images
Language:Python38020
VikParuchuri/surya
OCR, layout analysis, reading order, table recognition in 90+ languages
Language: Python, 16.6k stars, 1.1k forks
opendatalab/MinerU
A high-quality tool for converting PDF to Markdown and JSON. A one-stop, open-source, high-quality data extraction tool that converts PDF to Markdown and JSON formats.
Language: Python, 27.9k stars, 2.1k forks
abdullahtarek/tennis_analysis
This project analyzes tennis players in a video to measure their speed, ball shot speed, and number of shots. It detects the players and the tennis ball using YOLO and uses CNNs to extract court keypoints. This hands-on project is perfect for polishing your machine learning and computer vision skills.
Language:Jupyter Notebook551186
ultralytics/ultralytics
Ultralytics YOLO11 🚀
Language: Python, 37.7k stars, 7.3k forks
wdrink/OmniVid
Language:Python492
h-zhao1997/cobra
[AAAI-25] Cobra: Extending Mamba to Multi-modal Large Language Model for Efficient Inference
Language:Python2668
ZhengYu518/VL-Mamba
Implementation of "VL-Mamba: Exploring State Space Models for Multimodal Learning"
80
jacobgil/pytorch-grad-cam
Advanced AI Explainability for computer vision. Support for CNNs, Vision Transformers, Classification, Object detection, Segmentation, Image similarity and more.
Language: Python, 11.2k stars, 1.6k forks
BlinkDL/nanoRWKV
RWKV in nanoGPT style
Language:Python18711
X-LANCE/weblm
[WSDM 2024] Hierarchical Multimodal Pre-training for Visually Rich Webpage Understanding
14
tstanislawek/awesome-document-understanding
A curated list of resources for Document Understanding (DU) topic
1.4k stars, 160 forks
Layout-Parser/layout-parser
A Unified Toolkit for Deep Learning Based Document Image Analysis
Language: Python, 5.1k stars, 485 forks
LargeWorldModel/LWM
Large World Model -- Modeling Text and Video with Millions Context
Language: Python, 7.2k stars, 557 forks
All-Hands-AI/OpenHands
🙌 OpenHands: Code Less, Make More
Language: Python, 50k stars, 5.5k forks
OpenGVLab/Vision-RWKV
[ICLR 2025 Spotlight] Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Language:Python42017
breezedeus/Pix2Text
An Open-Source Python3 tool with SMALL models for recognizing layouts, tables, math formulas (LaTeX), and text in images, converting them into Markdown format. A free alternative to Mathpix, empowering seamless conversion of visual content into text-based representations. 80+ languages are supported.
Language: Jupyter Notebook, 2.2k stars, 209 forks
datadreamer-dev/DataDreamer
DataDreamer: Prompt. Generate Synthetic Data. Train & Align Models. 🤖💤
Language:Python97852
AcademySoftwareFoundation/openfx
OpenFX effects API
Language:C++443129
PKU-YuanGroup/Video-LLaVA
【EMNLP 2024🔥】Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Language: Python, 3.2k stars, 228 forks
alibaba/Pai-Megatron-Patch
The official repo of Pai-Megatron-Patch for LLM & VLM large scale training developed by Alibaba Cloud.
Language:Python921136
Zheng0428/COIG-Kun
Language:Python364
netease-youdao/QAnything
Question and Answer based on Anything.
Language: Python, 12.8k stars, 1.2k forks
OpenGVLab/InternVideo
[ECCV2024] Video Foundation Models & Data for Multimodal Understanding
Language: Python, 1.7k stars, 103 forks
mlabonne/llm-course
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Language: Jupyter Notebook, 47.9k stars, 5.1k forks
tyxsspa/AnyText
Official implementation code of the paper <AnyText: Multilingual Visual Text Generation And Editing>
Language: Python, 4.6k stars, 295 forks
jordan-cutler/path-to-senior-engineer-handbook
All the resources you need to get to Senior Engineer and beyond
14.7k stars, 1.3k forks
clovaai/donut
Official Implementation of OCR-free Document Understanding Transformer (Donut) and Synthetic Document Generator (SynthDoG), ECCV 2022
Language: Python, 6.1k stars, 488 forks