serdaryildiz/DailyResearchPaper

Most notable computer vision arXiv papers

DailyResearchPaper

2 January 2024

HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

Arxiv : https://arxiv.org/pdf/2401.00271.pdf
Github : https://github.com/HCVLab/HybridGait

A Large-Scale Re-identification Analysis in Sporting Scenarios: the Betrayal of Reaching a Critical Point

Arxiv : https://arxiv.org/pdf/2401.00080.pdf
Github : -

BREAK

12 December 2023

Photorealistic Video Generation with Diffusion Models

Arxiv : https://arxiv.org/pdf/2312.06662.pdf
Github : https://walt-video-diffusion.github.io/

X2-Softmax: Margin Adaptive Loss Function for Face Recognition

Arxiv : https://arxiv.org/pdf/2312.05281.pdf
Github : https://github.com/xujiamu123/X2-Softmax/tree/main

PIXLORE: A DATASET-DRIVEN APPROACH TO RICH IMAGE CAPTIONING

Arxiv : https://arxiv.org/pdf/2312.05349.pdf
Github : https://github.com/diegobonilla98/PixLore

LOSS FUNCTIONS IN THE ERA OF SEMANTIC SEGMENTATION: A SURVEY AND OUTLOOK

Arxiv : https://arxiv.org/pdf/2312.05391.pdf
Github : https://github.com/YilmazKadir/Segmentation_Losses

Pose Guidance by Supervision: A Framework for Clothes-Changing Person Re-Identification

Arxiv : https://arxiv.org/pdf/2312.05634.pdf
Github : https://github.com/huyquoctrinh/PGS

Open World Object Detection in the Era of Foundation Models

Arxiv : https://arxiv.org/pdf/2312.05745.pdf
Github : https://orrzohar.github.io/projects/fomo/

SSPNet: Scale and spatial priors guided generalizable and interpretable pedestrian attribute recognition

Arxiv : https://arxiv.org/pdf/2312.06049.pdf
Github : https://github.com/guotengg/SSPNet

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models

Arxiv : https://arxiv.org/pdf/2312.06059.pdf
Github : https://conform-diffusion.github.io/

NutritionVerse-Synth: An Open Access Synthetically Generated 2D Food Scene Dataset for Dietary Intake Estimation

Arxiv : https://arxiv.org/pdf/2312.06192.pdf
Github : https://saeejithnair.github.io/nvsynth/

Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It

Arxiv : https://arxiv.org/pdf/2312.06420.pdf
Github : https://github.com/LiljaAdam/geographical-splits

Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians

Arxiv : https://arxiv.org/pdf/2312.06495.pdf
Github : -

4 December 2023

Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras

Arxiv : https://arxiv.org/pdf/2312.00500.pdf
Github : -

A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

Arxiv : https://arxiv.org/pdf/2312.00308.pdf
Github : https://github.com/rsai0/PMD/tree/main/CldNetV1_0_0

1 December 2023

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

Arxiv : https://arxiv.org/pdf/2311.18405.pdf
Github : https://github.com/zengjianhao/CAT-DM

MAXTRON: MASK TRANSFORMER WITH TRAJECTORY ATTENTION FOR VIDEO PANOPTIC SEGMENTATION

Arxiv : https://arxiv.org/pdf/2311.18537.pdf
Github : https://github.com/TACJu/MaXTron

Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images

Arxiv : https://arxiv.org/pdf/2311.17960.pdf
Github : https://github.com/dair-iitd/Guided-Prompting-SAM

Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing

Arxiv : https://arxiv.org/pdf/2311.18082.pdf
Github : https://github.com/allenai/satlas-super-resolution/tree/main

Diffusion Models Without Attention

Arxiv : https://arxiv.org/pdf/2311.18257.pdf
Github : -

29 November 2023

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Arxiv : https://arxiv.org/pdf/2311.17043.pdf
Github : https://github.com/dvlab-research/LLaMA-VID

Self-training solutions for the ICCV 2023 GeoNet Challenge

Arxiv : https://arxiv.org/pdf/2311.16843.pdf
Github : https://github.com/tim-learn/GeoNet23_casia_tim

Small and Dim Target Detection in IR Imagery: A Review

Arxiv : https://arxiv.org/pdf/2311.16346.pdf
Github : -

GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

Arxiv : https://arxiv.org/pdf/2311.16497.pdf
Github : -

Word for Person: Zero-shot Composed Person Retrieval

Arxiv : https://arxiv.org/pdf/2311.16515.pdf
Github : https://github.com/Delong-liu-bupt/Word4Per

28 November 2023

Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples

Arxiv : https://arxiv.org/pdf/2311.15571.pdf
Github : https://github.com/dyhBUPT/BUPTCampus

Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Arxiv : https://arxiv.org/pdf/2311.15679.pdf
Github : -

Optimal Transport Aggregation for Visual Place Recognition

Arxiv : https://arxiv.org/pdf/2311.15937.pdf
Github : https://github.com/serizba/salad

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Arxiv : https://arxiv.org/pdf/2311.15939.pdf
Github : https://github.com/windygoo/PromptNucSeg

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

Arxiv : https://arxiv.org/pdf/2311.16094.pdf
Github : https://cuiaiyu.github.io/StreetTryOn/

21 November 2023

Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

Arxiv : https://arxiv.org/pdf/2311.11882.pdf
Github : https://github.com/RamiHaf/MTF_data_set

SniffyArt: The Dataset of Smelling Persons

Arxiv : https://arxiv.org/pdf/2311.11888.pdf
Github : -

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

Arxiv : https://arxiv.org/pdf/2311.11904.pdf
Github : https://github.com/zhuole1025/LLMs_as_Visual_Explainers

Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization

Arxiv : https://arxiv.org/pdf/2311.11302.pdf
Github : https://github.com/NJU-LHRS/official-SGSLN

CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

Arxiv : https://arxiv.org/pdf/2311.11604.pdf
Github : https://github.com/npupilab/CurriculumLoc

20 November 2023

BiHRNet: A Binary high-resolution network for Human Pose Estimation

Arxiv : https://arxiv.org/pdf/2311.10296.pdf
Github : -

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

Arxiv : https://arxiv.org/pdf/2311.10476.pdf
Github : https://frcsyn.github.io/

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

Arxiv : https://arxiv.org/pdf/2311.10572.pdf
Github : https://github.com/YUE-FAN/SSB

FOCAL: A Cost-Aware Video Dataset for Active Learning

Arxiv : https://arxiv.org/pdf/2311.10591.pdf
Github : https://github.com/olivesgatech/FOCAL_Dataset

CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification

Arxiv : https://arxiv.org/pdf/2311.10605.pdf
Github : -

EMU VIDEO: Factorizing Text-to-Video Generation by Explicit Image Conditioning

Arxiv : https://arxiv.org/pdf/2311.10709.pdf
Github : https://emu-video.metademolab.com/

17 November 2023

RED-DOT: MULTIMODAL FACT-CHECKING VIA RELEVANT EVIDENCE DETECTION

Arxiv : https://arxiv.org/pdf/2311.09939.pdf
Github : https://github.com/stevejpapad/relevant-evidence-detection

Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset

Arxiv : https://arxiv.org/pdf/2311.09256.pdf
Github : https://github.com/JacobTyo/SwinTextSpotter

Devil in the Landscapes: Inferring Epidemic Exposure Risks from Street View Imagery

Arxiv : https://arxiv.org/pdf/2311.09240.pdf
Github : -

RENI++: A Rotation-Equivariant, Scale-Invariant, Natural Illumination Prior

Arxiv : https://arxiv.org/pdf/2311.09361.pdf
Github : https://github.com/JADGardner/ns_reni

16 November 2023

MUDD: A New Re-Identification Dataset with Efficient Annotation for Off-Road Racers in Extreme Conditions

Arxiv : https://arxiv.org/pdf/2311.08488.pdf
Github : https://github.com/JacobTyo/MUDD

LOW-LIGHT PEDESTRIAN DETECTION IN VISIBLE AND INFRARED IMAGE FEEDS: ISSUES AND CHALLENGES

Arxiv : https://arxiv.org/pdf/2311.08557.pdf
Github : -

ConeQuest: A Benchmark for Cone Segmentation on Mars

Arxiv : https://arxiv.org/pdf/2311.08657.pdf
Github : https://github.com/kerner-lab/ConeQuest

Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Arxiv : https://arxiv.org/pdf/2311.09064.pdf
Github : https://systematic-visual-imagination.github.io/

Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search

Arxiv : https://arxiv.org/pdf/2311.09084.pdf
Github : -

WildlifeDatasets: An open-source toolkit for animal re-identification

Arxiv : https://arxiv.org/pdf/2311.09118.pdf
Github : https://github.com/WildlifeDatasets/wildlife-datasets

14 November 2023

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

Arxiv : https://arxiv.org/pdf/2303.17368.pdf
Github : https://story2motion.github.io/

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

Arxiv : https://arxiv.org/pdf/2311.07514.pdf
Github : -

Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification

Arxiv : https://arxiv.org/pdf/2311.07407.pdf
Github : -

PICS IN PICS: PHYSICS INFORMED CONTOUR SELECTION FOR RAPID IMAGE SEGMENTATION

Arxiv : https://arxiv.org/pdf/2311.07002.pdf
Github : -

CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS

Arxiv : https://arxiv.org/pdf/2311.06772.pdf
Github : https://chatanything.github.io/

13 November 2023

DIFFUSION MODELS FOR EARTH OBSERVATION USE-CASES: FROM CLOUD REMOVAL TO URBAN CHANGE DETECTION

Arxiv : https://arxiv.org/pdf/2311.06222.pdf
Github : https://zenodo.org/records/8144238

Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

Arxiv : https://arxiv.org/pdf/2311.06224.pdf
Github : -

Learning Human Action Recognition Representations Without Real Humans

Arxiv : https://arxiv.org/pdf/2311.06231.pdf
Github : https://github.com/howardzh01/PPMA

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Arxiv : https://arxiv.org/pdf/2311.06242.pdf
Github : -

Whole-body Detection, Recognition and Identification at Altitude and Range

Arxiv : https://arxiv.org/pdf/2311.05725.pdf
Github : -

Are “Hierarchical” Visual Representations Hierarchical?

Arxiv : https://arxiv.org/pdf/2311.05784.pdf
Github : https://github.com/ethanlshen/HierNet

8 November 2023

Bias and Diversity in Synthetic-based Face Recognition

Arxiv : https://arxiv.org/pdf/2311.03970.pdf
Github : -

Multi-view Information Integration and Propagation for Occluded Person Re-identification

Arxiv : https://arxiv.org/pdf/2311.03828.pdf
Github : https://github.com/nengdong96/MVIIP

Unsupervised Region-Growing Network for Object Segmentation in Atmospheric Turbulence

Arxiv : https://arxiv.org/pdf/2311.03572.pdf
Github : -

7 November 2023

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

Arxiv : https://arxiv.org/pdf/2311.03352.pdf
Github : https://github.com/qqlu/Entity/tree/main

PainSeeker: An Automated Method for Assessing Pain in Rats Through Facial Expressions

Arxiv : https://arxiv.org/pdf/2311.03205.pdf
Github : https://github.com/xhzongyuan/RatsPain

A survey and classification of face alignment methods based on face models

Arxiv : https://arxiv.org/pdf/2311.03082.pdf
Github : -

Fast and Interpretable Face Identification for Out-Of-Distribution Data Using Vision Transformers

Arxiv : https://arxiv.org/pdf/2311.02803.pdf
Github : -

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

Arxiv : https://arxiv.org/pdf/2311.02733.pdf
Github : -

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

Arxiv : https://arxiv.org/pdf/2311.02538.pdf
Github : -

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

Arxiv : https://arxiv.org/pdf/2311.02523.pdf
Github : https://github.com/CVI-SZU/UniTSFace

Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval

Arxiv : https://arxiv.org/pdf/2311.02122.pdf
Github : -

6 November 2023

Medical Image Segmentation with Domain Adaptation: A Survey

Arxiv : https://arxiv.org/pdf/2311.01702.pdf
Github : -

26 October 2023

ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

Arxiv : https://arxiv.org/pdf/2310.16447v1.pdf
Github : https://shirleymaxx.github.io/ChimpACT/

CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Arxiv : https://arxiv.org/pdf/2310.16667v1.pdf
Github : https://github.com/CVMI-Lab/CoDet

Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

Arxiv : https://arxiv.org/pdf/2310.02674v2.pdf
Github : -

VACATION

10 October 2023

Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection

Arxiv : https://arxiv.org/pdf/2310.05666.pdf
Github : https://github.com/YilongLv/AID

OV-PARTS: Towards Open-Vocabulary Part Segmentation

Arxiv : https://arxiv.org/pdf/2310.05107.pdf
Github : https://github.com/OpenRobotLab/OV_PARTS

AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition

Arxiv : https://arxiv.org/pdf/2310.05184.pdf
Github : https://github.com/Lu-Feng/AANet

A Benchmark Dataset for Harmful Object Detection

Arxiv : https://arxiv.org/pdf/2310.05192.pdf
Github : https://github.com/poori-nuna/HOD-Benchmark-Dataset

9 October 2023

ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

Arxiv : https://arxiv.org/pdf/2310.04099.pdf
Github : -

5 October 2023

Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

Arxiv : https://arxiv.org/pdf/2310.02674.pdf
Github : -

Human-centric Behavior Description in Videos: New Benchmark and Model

Arxiv : https://arxiv.org/pdf/2310.02894.pdf
Github : -

4 October 2023

MINIGPT-5: INTERLEAVED VISION-AND-LANGUAGE GENERATION VIA GENERATIVE VOKENS

Arxiv : https://arxiv.org/pdf/2310.02239.pdf
Github : https://github.com/eric-ai-lab/MiniGPT-5

MATHVISTA: EVALUATING MATHEMATICAL REASONING OF FOUNDATION MODELS IN VISUAL CONTEXTS

Arxiv : https://arxiv.org/pdf/2310.02255.pdf
Github : https://mathvista.github.io/

DREAM: Visual Decoding from REversing HumAn Visual SysteM

Arxiv : https://arxiv.org/pdf/2310.02265.pdf
Github : -

AI-Generated Images as Data Source: The Dawn of Synthetic Era

Arxiv : https://arxiv.org/pdf/2310.01830.pdf
Github : https://github.com/mwxely/AIGS

2 October 2023

HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Arxiv : https://arxiv.org/pdf/2309.17024.pdf
Github : https://holoassist.github.io/

Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Arxiv : https://arxiv.org/pdf/2309.17031.pdf
Github : https://github.com/Z-Zheng/Changen

Retail-786k: a Large-Scale Dataset for Visual Entity Matching

Arxiv : https://arxiv.org/pdf/2309.17164.pdf
Github : https://www.retail-786k.org/

TBD Pedestrian Data Collection: Towards Rich, Portable, and Large-Scale Natural Pedestrian Data

Arxiv : https://arxiv.org/pdf/2309.17187.pdf
Github : https://tbd.ri.cmu.edu/resources/tbd-social-navigation-datasets/

A Survey on Deep Learning Techniques for Action Anticipation

Arxiv : https://arxiv.org/pdf/2309.17257.pdf
Github : -

27 September 2023

The Surveillance AI Pipeline

Arxiv : https://arxiv.org/pdf/2309.15084.pdf
Github : -

Explaining Deep Face Algorithms through Visualization: A Survey

Arxiv : https://arxiv.org/pdf/2309.14715.pdf
Github : -

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

Arxiv : https://arxiv.org/pdf/2309.14809.pdf
Github : https://iplab.dmi.unict.it/ENIGMA-51/

26 September 2023

Multiple Different Explanations for Image Classifiers

Arxiv : https://arxiv.org/pdf/2309.14309.pdf
Github : -

Contextual Emotion Estimation from Image Captions

Arxiv : https://arxiv.org/pdf/2309.13136.pdf
Github : https://rosielab.github.io/emotion-captions/

UniHead: Unifying Multi-Perception for Detection Heads

Arxiv : https://arxiv.org/pdf/2309.13242.pdf
Github : -

Face-Att: Enhancing Image Captioning with Facial Attributes for Portrait Images

Arxiv : https://arxiv.org/pdf/2309.13601.pdf
Github : https://zenodo.org/record/8144361

UCF-Crime Annotation: A Benchmark for Surveillance Video-and-Language Understanding

Arxiv : https://arxiv.org/pdf/2309.13925.pdf
Github : https://github.com/Xuange923/UCA-dataset

Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition

Arxiv : https://arxiv.org/pdf/2309.14183.pdf
Github : https://species-dataset.github.io/

Single Image Test-Time Adaptation for Segmentation

Arxiv : https://arxiv.org/pdf/2309.14052.pdf
Github : https://klarajanouskova.github.io/sitta-seg/

25 September 2023

DETECT EVERY THING WITH FEW EXAMPLES

Arxiv : https://arxiv.org/pdf/2309.12969.pdf
Github : https://github.com/mlzxy/devit

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Arxiv : https://arxiv.org/pdf/2309.12325.pdf
Github : -

Performance Analysis of UNet and Variants for Medical Image Segmentation

Arxiv : https://arxiv.org/pdf/2309.13013.pdf
Github : -

DIOR: Dataset for Indoor-Outdoor Reidentification - Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods

Arxiv : https://arxiv.org/pdf/2309.12429.pdf
Github : -

22 September 2023

SANPO A SCENE UNDERSTANDING, ACCESSIBILITY, NAVIGATION, PATHFINDING, OBSTACLE AVOIDANCE DATASET

Arxiv : https://arxiv.org/pdf/2309.12172.pdf
Github : https://google-research-datasets.github.io/sanpo_dataset/

BASE: Probably a Better Approach to Multi-Object Tracking

Arxiv : https://arxiv.org/pdf/2309.12035.pdf
Github : https://github.com/ffi-no

21 September 2023

A SYSTEMATIC REVIEW OF FEW-SHOT LEARNING IN MEDICAL IMAGING

Arxiv : https://arxiv.org/pdf/2309.11433.pdf
Github : -

20 September 2023

Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

Arxiv : https://arxiv.org/pdf/2309.10421.pdf
Github : -

Human Gait Recognition using Deep Learning: A Comprehensive Review

Arxiv : https://arxiv.org/pdf/2309.10144.pdf
Github : -

19 September 2023

EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding

Arxiv : https://arxiv.org/pdf/2309.08816.pdf
Github : https://github.com/facebookresearch/EgoObjects

Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-free One-Stage Detectors

Arxiv : https://arxiv.org/pdf/2309.08771.pdf
Github : https://github.com/caiyancheng/BFDA

IMPROVED BREAST CANCER DIAGNOSIS THROUGH TRANSFER LEARNING ON HEMATOXYLIN AND EOSIN STAINED HISTOLOGY IMAGES

Arxiv : https://arxiv.org/pdf/2309.08745.pdf
Github : https://www.bracs.icar.cnr.it/

Personalized Food Image Classification: Benchmark Datasets and New Baseline

Arxiv : https://arxiv.org/pdf/2309.08744.pdf
Github : https://skynet.ecn.purdue.edu/~pan161/dataset_personal.html

MONITORING URBAN CHANGES IN MARIUPOL/UKRAINE IN 2022/23

Arxiv : https://arxiv.org/pdf/2309.08607.pdf
Github : https://github.com/It4innovations/urban_change_monitoring_mariupol_ua

18 September 2023

Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance

Arxiv : https://arxiv.org/pdf/2309.08382.pdf
Github : https://github.com/QuJX/DDNet

Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

Arxiv : https://arxiv.org/pdf/2309.08206.pdf
Github : https://github.com/MathLee/GeleNet

Padding Aware Neurons

Arxiv : https://arxiv.org/pdf/2309.08048.pdf
Github : -

TOWARDS LARGE-SCALE BUILDING ATTRIBUTE MAPPING USING CROWDSOURCED IMAGES: SCENE TEXT RECOGNITION ON FLICKR AND PROBLEMS TO BE SOLVED

Arxiv : https://arxiv.org/pdf/2309.08042.pdf
Github : https://github.com/ya0-sun/STR-Berlin

12 September 2023

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

Arxiv : https://arxiv.org/pdf/2309.05551.pdf
Github : https://github.com/aimagelab/open-fashion-clip

BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification

Arxiv : https://arxiv.org/pdf/2309.04675.pdf
Github : -

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color

Arxiv : https://arxiv.org/pdf/2309.05148.pdf
Github : -

11 September 2023

Long-Range Correlation Supervision for Land-Cover Classification from Remote Sensing Images

Arxiv : https://arxiv.org/pdf/2309.04225.pdf
Github : -

WiSARD: A Labeled Visual and Thermal Image Dataset for Wilderness Search and Rescue

Arxiv : https://arxiv.org/pdf/2309.04453.pdf
Github : https://sites.google.com/uw.edu/wisard/

8 September 2023

Region Generation and Assessment Network for Occluded Person Re-Identification

Arxiv : https://arxiv.org/pdf/2309.03558.pdf
Github : -

RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation

Arxiv : https://arxiv.org/pdf/2309.03240.pdf
Github : -

Better Practices for Domain Adaptation

Arxiv : https://arxiv.org/pdf/2309.03879.pdf
Github : -

Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection

Arxiv : https://arxiv.org/pdf/2309.03652.pdf
Github : https://github.com/MIC-DKFZ/anatomy_informed_DA

7 September 2023

My Art My Choice: Adversarial Protection Against Unruly AI

Arxiv : https://arxiv.org/pdf/2309.03198.pdf
Github : -

Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration

Arxiv : https://arxiv.org/pdf/2309.03110.pdf
Github : -

6 September 2023

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

Arxiv : https://arxiv.org/pdf/2309.01420.pdf
Github : https://github.com/ZhiyinShao-H/UniPT

DeViL: Decoding Vision features into Language

Arxiv : https://arxiv.org/pdf/2309.01617.pdf
Github : https://github.com/ExplainableML/DeViL

Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models

Arxiv : https://arxiv.org/pdf/2309.01674.pdf
Github : https://github.com/hassanhajj910/prompt-me-a-dataset

Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

Arxiv : https://arxiv.org/pdf/2309.01858.pdf
Github : https://cmp.felk.cvut.cz/univ_emb/

SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection

Arxiv : https://arxiv.org/pdf/2309.01907.pdf
Github : https://github.com/JTRNEO/SyntheWorld

NICE 2023 Zero-shot Image Captioning Challenge

Arxiv : https://arxiv.org/pdf/2309.01961.pdf
Github : https://nice.lgresearch.ai/

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

Arxiv : https://arxiv.org/pdf/2309.02031.pdf
Github : -

DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation

Arxiv : https://arxiv.org/pdf/2309.02230.pdf
Github : -

Haystack: A Panoptic Scene Graph Dataset to Evaluate Rare Predicate Classes

Arxiv : https://arxiv.org/pdf/2309.02286.pdf
Github : https://lorjul.github.io/haystack/

Muti-Stage Hierarchical Food Classification

Arxiv : https://arxiv.org/pdf/2309.01075.pdf
Github : -

Prototype-based Dataset Comparison

Arxiv : https://arxiv.org/pdf/2309.02401.pdf
Github : https://github.com/Nanne/ProtoSim

4 September 2023

DARC: Distribution-Aware Re-Coloring Model for Generalizable Nucleus Segmentation

Arxiv : https://arxiv.org/pdf/2309.00188.pdf
Github : https://github.com/csccsccsccsc/DARC

Diffusion Model with Clustering-based Conditioning for Food Image Generation

Arxiv : https://arxiv.org/pdf/2309.00199.pdf
Github : -

Object-Centric Multiple Object Tracking

Fine-grained Recognition with Learnable Semantic Data Augmentation

Arxiv : https://arxiv.org/pdf/2309.00399.pdf
Github : -

An Improved Encoder-Decoder Framework for Food Energy Estimation

Arxiv : https://arxiv.org/pdf/2309.00468.pdf
Github : -

TIME SERIES ANALYSIS OF URBAN LIVEABILITY

Arxiv : https://arxiv.org/pdf/2309.00594.pdf
Github : -

25 August 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Arxiv : https://arxiv.org/pdf/2308.12383.pdf
Github : https://github.com/aimagelab/PMA-Net

Ground-to-Aerial Person Search: Benchmark Dataset and Approach

Arxiv : https://arxiv.org/pdf/2308.12712.pdf
Github : https://github.com/yqc123456/HKD_for_person_search

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Arxiv : https://arxiv.org/pdf/2308.12774.pdf
Github : https://github.com/AlibabaResearch/AdvancedLiterateMachinery

Beyond Document Page Classification: Design, Datasets, and Challenges

Arxiv : https://arxiv.org/pdf/2308.12896.pdf
Github : -

24 August 2023

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

Arxiv : https://arxiv.org/pdf/2308.12067.pdf
Github : -

SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation

Arxiv : https://arxiv.org/pdf/2308.12231.pdf
Github : https://github.com/xq141839/SPPNet

Weakly Supervised Face and Whole Body Recognition in Turbulent Environments

Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Arxiv : https://arxiv.org/pdf/2308.11901.pdf
Github : https://cvlab.yonsei.ac.kr/projects/CaCL/

Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval

Arxiv : https://arxiv.org/pdf/2308.11994.pdf
Github : -

HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing

Arxiv : https://arxiv.org/pdf/2308.12061.pdf
Github : https://figshare.com/s/45a7b45556b90a9a11d2

The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures

Arxiv : https://arxiv.org/pdf/2308.12116.pdf
Github : https://christophreich1996.github.io/tyc_dataset/

23 August 2023

Classification of the lunar surface pattern by AI architectures: Does AI see a rabbit in the Moon?

Arxiv : https://arxiv.org/pdf/2308.11107.pdf
Github : -

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

Arxiv : https://arxiv.org/pdf/2308.11159.pdf
Github : https://github.com/DalongZ/SwinV2DNet

A three in one bottom-up framework for simultaneous semantic segmentation, instance segmentation and classification of multi-organ nuclei in digital cancer histology

Arxiv : https://arxiv.org/pdf/2308.11179.pdf
Github : -

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

Arxiv : https://arxiv.org/pdf/2308.11206.pdf
Github : -

Using and Abusing Equivariance

Arxiv : https://arxiv.org/pdf/2308.11316.pdf
Github : -

22 August 2023

Can Language Models Learn to Listen?

Arxiv : https://arxiv.org/pdf/2308.10897.pdf
Github : -

EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition

Arxiv : https://arxiv.org/pdf/2308.10832.pdf
Github : https://github.com/gmberton/EigenPlaces

A step towards understanding why classification helps regression

Arxiv : https://arxiv.org/pdf/2308.10603.pdf
Github : -

GaitPT: Skeletons Are All You Need For Gait Recognition

Arxiv : https://arxiv.org/pdf/2308.10623.pdf
Github : -

Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification

Arxiv : https://arxiv.org/pdf/2308.10692.pdf
Github : -

Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification

Arxiv : -
Github : https://github.com/vimar-gu/ColorPromptReID

Rethinking Person Re-identification from a Projection-on-Prototypes Perspective

Arxiv : https://arxiv.org/pdf/2308.10717.pdf
Github : -

Patch Is Not All You Need

Arxiv : https://arxiv.org/pdf/2308.10729.pdf
Github : -

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Arxiv : https://arxiv.org/pdf/2308.09911.pdf
Github : -

Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis

Arxiv : https://arxiv.org/pdf/2308.09835.pdf
Github : https://github.com/CJLee94/Points2Image

21 August 2023

LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark

Arxiv : https://arxiv.org/pdf/2308.09618.pdf
Github : https://lojzezust.github.io/lars-dataset/

Data augmentation and explainability for bias discovery and mitigation in deep learning

Arxiv : https://arxiv.org/pdf/2308.09464.pdf
Github : -

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Arxiv : https://arxiv.org/pdf/2308.08656.pdf
Github : -

Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification

Arxiv : https://arxiv.org/pdf/2308.08887.pdf
Github : https://github.com/dcp15/ISR_ICCV2023_Oral

Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification

Arxiv : https://arxiv.org/pdf/2308.09096.pdf
Github : -

Generalized Sum Pooling for Metric Learning

Arxiv : https://arxiv.org/pdf/2308.09228.pdf
Github : -

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Arxiv : https://arxiv.org/pdf/2308.09311.pdf
Github : -

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Arxiv : https://arxiv.org/pdf/2308.09372.pdf
Github : -

Language-Guided Diffusion Model for Visual Grounding

Arxiv : https://arxiv.org/pdf/2308.09599.pdf
Github : https://github.com/iQua/vgbase/tree/DiffusionVG

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

Arxiv : https://arxiv.org/pdf/2308.09624.pdf
Github : -

17 August 2023

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Arxiv : https://arxiv.org/pdf/2308.08545.pdf
Github : https://huangyangyi.github.io/tech

Diagnosing Human-object Interaction Detectors

Arxiv : https://arxiv.org/pdf/2308.08529.pdf
Github : https://github.com/neu-vi/Diag-HOI

DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching

Arxiv : https://arxiv.org/pdf/2308.08479.pdf
Github : https://github.com/Parskatt/DeDoDe

Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval

Arxiv : https://arxiv.org/pdf/2308.08431.pdf
Github : https://github.com/vaishwarya96/Hierarchy-image-retrieval

ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

Arxiv : https://arxiv.org/pdf/2308.08428.pdf
Github : https://github.com/deepglint/ALIP

Membrane Potential Batch Normalization for Spiking Neural Networks

Arxiv : https://arxiv.org/pdf/2308.08359.pdf
Github : https://github.com/yfguo91/MPBN

Visually-Aware Context Modeling for News Image Captioning

Arxiv : https://arxiv.org/pdf/2308.08325.pdf
Github : -

MultiMediate ’23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions

Arxiv : https://arxiv.org/pdf/2308.08256.pdf
Github : https://multimediate-challenge.org/Description/

View Consistent Purification for Accurate Cross-View Localization

Arxiv : https://arxiv.org/pdf/2308.08110.pdf
Github : https://shanwang-shan.github.io/PureACL-website/

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

Arxiv : https://arxiv.org/pdf/2308.08283.pdf
Github : -

16 August 2023

Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition

Arxiv : https://arxiv.org/pdf/2308.07571.pdf
Github : https://github.com/OSVAI/Ske2Grid

15 August 2023

An Outlook into the Future of Egocentric Vision

Arxiv : https://arxiv.org/pdf/2308.07123.pdf
Github : -

14 August 2023

Image-based Geolocalization by Ground-to-2.5D Map Matching

Arxiv : https://arxiv.org/pdf/2308.05993.pdf
Github : -

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

Arxiv : https://arxiv.org/pdf/2308.05864.pdf
Github : https://neurips22-cellseg.grand-challenge.org/neurips22-cellseg/

11 August 2023

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection

Arxiv : https://arxiv.org/pdf/2308.05480.pdf
Github : https://github.com/FishAndWasabi/YOLO-MS

TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization

Arxiv : https://arxiv.org/pdf/2308.05264.pdf
Github : https://github.com/vimal-isi-edu/TrainFors

Vacation

10 July 2023

Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery

Arxiv : https://arxiv.org/pdf/2307.03398.pdf
Github : -

Vision Language Transformers: A Survey

Arxiv : https://arxiv.org/pdf/2307.03254.pdf
Github : -

To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology

Arxiv : https://arxiv.org/pdf/2307.03275.pdf
Github : -

PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection

Arxiv : https://arxiv.org/pdf/2307.03211.pdf
Github : https://github.com/IoBT-VISTEC/PseudoCell

7 July 2023

Synthesizing Artistic Cinemagraphs from Text

Arxiv : https://arxiv.org/pdf/2307.03190.pdf
Github : https://text2cinemagraph.github.io/website/

CityTrack: Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching

Arxiv : https://arxiv.org/pdf/2307.02753.pdf
Github : -

Spherical Feature Pyramid Networks For Semantic Segmentation

Arxiv : https://arxiv.org/pdf/2307.02658.pdf
Github : -

SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks

Arxiv : https://arxiv.org/pdf/2307.02953.pdf
Github : -

LOSS FUNCTIONS AND METRICS IN DEEP LEARNING. A REVIEW

Arxiv : https://arxiv.org/pdf/2307.02694.pdf
Github : -

6 July 2023

Unbalanced Optimal Transport: A Unified Framework for Object Detection

Arxiv : https://arxiv.org/pdf/2307.02402.pdf
Github : -

Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need

Arxiv : https://arxiv.org/pdf/2307.02249.pdf
Github : -

MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

Arxiv : https://arxiv.org/pdf/2307.02100.pdf
Github : https://github.com/siyi-wind/MDViT

Make A Long Image Short: Adaptive Token Length for Vision Transformers

Arxiv : https://arxiv.org/pdf/2307.02092.pdf
Github : -

21 June 2023

Bullying10K: A Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

Arxiv : https://arxiv.org/pdf/2306.11546.pdf
Github : https://figshare.com/articles/dataset/Bullying10k/19160663

How can objects help action recognition?

Arxiv : https://arxiv.org/pdf/2306.11726.pdf
Github : -

Meerkat Behaviour Recognition Dataset

Arxiv : https://arxiv.org/pdf/2306.11326.pdf
Github : -

Quilt-1M: One Million Image-Text Pairs for Histopathology

Arxiv : https://arxiv.org/pdf/2306.11207.pdf
Github : https://github.com/wisdomikezogwo/quilt1m

AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator

Arxiv : https://arxiv.org/pdf/2306.11203.pdf
Github : https://github.com/sisl/VisionBasedAircraftDAA

Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement

Arxiv : https://arxiv.org/pdf/2306.10286.pdf
Github : https://github.com/zhangbaijin/enlighten-anything

19 June 2023

Scaling Open-Vocabulary Object Detection

Arxiv : https://arxiv.org/pdf/2306.09683.pdf
Github : -

Leveraging Human Salience to Improve Calorie Estimation

Arxiv : https://arxiv.org/pdf/2306.09527.pdf
Github : -

16 June 2023

UNDERSTANDING OPTIMIZATION OF DEEP LEARNING

Arxiv : https://arxiv.org/pdf/2306.09338.pdf
Github : -

When and Why Momentum Accelerates SGD: An Empirical Study

Arxiv : https://arxiv.org/pdf/2306.09000.pdf
Github : -

OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments

Arxiv : https://arxiv.org/pdf/2306.08649.pdf
Github : https://github.com/k4ntz/OC_Atari

What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

Arxiv : https://arxiv.org/pdf/2306.08713.pdf
Github : https://chiaraplizz.github.io/what-can-a-cook/

TryOnDiffusion: A Tale of Two UNets

Arxiv : https://arxiv.org/pdf/2306.08276.pdf
Github : -

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

Arxiv : https://arxiv.org/pdf/2306.09265.pdf
Github : -

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Arxiv : https://arxiv.org/pdf/2306.09344.pdf
Github : https://dreamsim-nights.github.io/

Fast Training of Diffusion Models with Masked Transformers

Arxiv : https://arxiv.org/pdf/2306.09305.pdf
Github : https://github.com/Anima-Lab/MaskDiT

14 June 2023

Image Captioners Are Scalable Vision Learners Too

Arxiv : https://arxiv.org/pdf/2306.07915.pdf
Github : -

GeneCIS: A Benchmark for General Conditional Image Similarity

Arxiv : https://arxiv.org/pdf/2306.07969.pdf
Github : https://github.com/facebookresearch/genecis

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON

Arxiv : https://arxiv.org/pdf/2306.07890.pdf
Github : https://huggingface.co/datasets/VISION-Workshop/VISION-Datasets

Neural Scene Chronology

Arxiv : https://arxiv.org/pdf/2306.07970.pdf
Github : https://zju3dv.github.io/neusc/

Semi-supervised learning made simple with self-supervised clustering

Arxiv : https://arxiv.org/pdf/2306.07483.pdf
Github : -

Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions

Arxiv : https://arxiv.org/pdf/2306.07520.pdf
Github : https://github.com/hwz-zju/Instruct-ReID

Compositionally Equivariant Representation Learning

Arxiv : https://arxiv.org/pdf/2306.07783.pdf
Github : -

Reviving Shift Equivariance in Vision Transformers

Arxiv : https://arxiv.org/pdf/2306.07470.pdf
Github : -

13 June 2023

MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

Arxiv : https://arxiv.org/pdf/2306.07257.pdf
Github : -

7 June 2023

PhenoBench — A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

Arxiv : https://arxiv.org/pdf/2306.04557.pdf
Github : https://www.phenobench.org/dataset.html

6 June 2023

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Arxiv : https://arxiv.org/pdf/2306.02691.pdf
Github : https://github.com/wuyongjianCODE/Cyclic

Cycle Consistency Driven Object Discovery

Arxiv : https://arxiv.org/pdf/2306.02204.pdf
Github : -

5 June 2023

Towards In-context Scene Understanding

Arxiv : https://arxiv.org/pdf/2306.01667.pdf
Github : -

Publicly available datasets of breast histopathology H&E whole-slide images: A systematic review

Arxiv : https://arxiv.org/pdf/2306.01546.pdf
Github : -

1 June 2023

LOWA: Localize Objects in the Wild with Attributes

Arxiv : https://arxiv.org/pdf/2305.20047.pdf
Github : -

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Arxiv : https://arxiv.org/pdf/2305.19700.pdf
Github : -

Are Large Kernels Better Teachers than Transformers for ConvNets?

Arxiv : https://arxiv.org/pdf/2305.19412.pdf
Github : https://github.com/VITA-Group/SLaK

A Unified Framework for U-Net Design and Analysis

Arxiv : https://arxiv.org/pdf/2305.19638.pdf
Github : -

31 May 2023

Multi-modal Queried Object Detection in the Wild

Arxiv : https://arxiv.org/pdf/2305.18980.pdf
Github : https://github.com/YifanXu74/MQ-Det

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

Arxiv : https://arxiv.org/pdf/2305.18721.pdf
Github : -

30 May 2023

FUSECAP: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

Arxiv : https://arxiv.org/pdf/2305.17718.pdf
Github : https://rotsteinnoam.github.io/FuseCap/

Using Caterpillar to Nibble Small-Scale Images

Arxiv : https://arxiv.org/pdf/2305.17644.pdf
Github : https://github.com/sunjin19126/Caterpillar

TaleCrafter: Interactive Story Visualization with Multiple Characters

Arxiv : https://arxiv.org/pdf/2305.18247.pdf
Github : https://github.com/VideoCrafter/TaleCrafter

Contextual Object Detection with Multimodal Large Language Models

Arxiv : https://arxiv.org/pdf/2305.18279.pdf
Github : https://github.com/yuhangzang/ContextDET

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

Arxiv : https://arxiv.org/pdf/2305.17144.pdf
Github : https://github.com/OpenGVLab/GITM

29 May 2023

Mindstorms in Natural Language-Based Societies of Mind

Arxiv : https://arxiv.org/pdf/2305.17066.pdf
Github : -

26 May 2023

Break-A-Scene: Extracting Multiple Concepts from a Single Image

Arxiv : https://arxiv.org/pdf/2305.16311.pdf
Github : https://omriavrahami.com/break-a-scene/

Making Vision Transformers Truly Shift-Equivariant

Arxiv : https://arxiv.org/pdf/2305.16316.pdf
Github : -

CENSUS-HWR: a large training dataset for offline handwriting recognition

Arxiv : https://arxiv.org/pdf/2305.16275.pdf
Github : https://censustree.org/get_the_data.html

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

Arxiv : https://arxiv.org/pdf/2305.16049.pdf
Github : http://cnceleb.org/#portfolio

24 May 2023

DetGPT: Detect What You Need via Reasoning

Arxiv : https://arxiv.org/pdf/2305.14167.pdf
Github : https://detgpt.github.io/

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

Arxiv : https://arxiv.org/pdf/2305.13786.pdf
Github : https://github.com/deepmind/perception_test

MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

Arxiv : https://arxiv.org/pdf/2305.13770.pdf
Github : https://mipi-challenge.org/MIPI2023/

MaskCL: Semantic Mask-Driven Contrastive Learning for Unsupervised Person Re-Identification with Clothes Change

Arxiv : https://arxiv.org/pdf/2305.13600.pdf
Github : -

A Laplacian Pyramid Based Generative H&E Stain Augmentation Network

Arxiv : https://arxiv.org/pdf/2305.14301.pdf
Github : https://github.com/lifangda01/GSAN-Demo

23 May 2023

Materialistic: Selecting Similar Materials in Images

Arxiv : https://arxiv.org/pdf/2305.13291.pdf
Github : -

Movie101: A New Movie Understanding Benchmark

Arxiv : https://arxiv.org/pdf/2305.12140.pdf
Github : https://github.com/yuezih/Movie101

Productive Crop Field Detection: A New Dataset and Deep Learning Benchmark Results

Arxiv : https://arxiv.org/pdf/2305.11990.pdf
Github : https://github.com/egnascimento/productivefieldsdetection

18 May 2023

ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

Arxiv : https://arxiv.org/pdf/2305.09750.pdf
Github : https://rrc.cvc.uab.es/?ch=18

Variational Classification

Arxiv : https://arxiv.org/pdf/2305.10406.pdf
Github : https://github.com/shehzaadzd/variational-classification

PromptUNet: Toward Interactive Medical Image Segmentation

Arxiv : https://arxiv.org/pdf/2305.10300.pdf
Github : https://github.com/WuJunde/PromptUNet

17 May 2023

Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks

Arxiv : https://arxiv.org/pdf/2305.09666.pdf
Github : https://github.com/MrGiovanni/AbdomenAtlas

16 May 2023

ON THE HIDDEN MYSTERY OF OCR IN LARGE MULTIMODAL MODELS

Arxiv : https://arxiv.org/pdf/2305.07895.pdf
Github : https://github.com/Yuliang-Liu/MultimodalOCR

PLIP: Language-Image Pre-training for Person Representation Learning

Arxiv : https://arxiv.org/pdf/2305.08386.pdf
Github : https://github.com/Zplusdragon/PLIP

Document Understanding Dataset and Evaluation (DUDE)

Arxiv : https://arxiv.org/pdf/2305.08455.pdf
Github : -

CLIP-VG: Self-paced Curriculum Adapting of CLIP via Exploiting Pseudo-Language Labels for Visual Grounding

Arxiv : https://arxiv.org/pdf/2305.08685.pdf
Github : https://github.com/linhuixiao/CLIP-VG

M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis

Arxiv : https://arxiv.org/pdf/2305.08719.pdf
Github : https://github.com/HCIILAB/M6Doc

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

Arxiv : https://arxiv.org/pdf/2305.07498.pdf
Github : https://github.com/jfkuang/CFAM

The ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge 2023: Intracranial Meningioma

Arxiv : https://arxiv.org/pdf/2305.07642.pdf
Github : -

12 May 2023

Hyperbolic Deep Learning in Computer Vision: A Survey

Arxiv : https://arxiv.org/pdf/2305.06611.pdf
Github : -

Segment and Track Anything

Arxiv : https://arxiv.org/pdf/2305.06558.pdf
Github : https://github.com/z-x-yang/Segment-and-Track-Anything

10 May 2023

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Arxiv : https://arxiv.org/pdf/2305.05432.pdf
Github : https://github.com/google-research-datasets/wit/blob/main/wikiweb2m.md

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

Arxiv : https://arxiv.org/pdf/2305.05140.pdf
Github : https://github.com/CyrilSterling/LPV

Eiffel Tower: A Deep-Sea Underwater Dataset for Long-Term Visual Localization

Arxiv : https://arxiv.org/pdf/2305.05301.pdf
Github : https://www.seanoe.org/data/00810/92226/

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

Arxiv : https://arxiv.org/pdf/2305.05322.pdf
Github : https://github.com/simplify23/TPS_PP

Restormer-Plus for Real World Image Deraining: One State-of-the-Art Solution to the GT-RAIN Challenge (CVPR 2023 UG2+ Track 3)

Arxiv : https://arxiv.org/pdf/2305.05454.pdf
Github : https://github.com/ZJLAB-AMMI/Restormer-Plus

Real-time instance segmentation with polygons using an Intersection-over-Union loss

Arxiv : https://arxiv.org/pdf/2305.05490.pdf
Github : https://github.com/KatiaJDL/CenterPoly-v2

GROUP ACTIVITY RECOGNITION VIA DYNAMIC COMPOSITION AND INTERACTION

Arxiv : https://arxiv.org/pdf/2305.05583.pdf
Github : -

9 May 2023

IIITD-20K: Dense captioning for Text-Image ReID

Arxiv : https://arxiv.org/pdf/2305.04497.pdf
Github : https://drive.google.com/file/d/1oG0a4WQfkEeL_NKajtMQvY4yFnZ5jDJ8/view

DocDiff: Document Enhancement via Residual Diffusion Models

Arxiv : https://arxiv.org/pdf/2305.03892.pdf
Github : https://github.com/Royalvice/DocDiff

Video Object Segmentation in Panoptic Wild Scenes

Arxiv : https://arxiv.org/pdf/2305.04470.pdf
Github : https://github.com/yoxu515/VIPOSeg-Benchmark

Revisiting Table Detection Datasets for Visually Rich Documents

FEW SHOT LEARNING FOR MEDICAL IMAGING: A COMPARATIVE ANALYSIS OF METHODOLOGIES AND FORMAL MATHEMATICAL FRAMEWORK

Arxiv : https://arxiv.org/pdf/2305.04401.pdf
Github : -

CatFLW: Cat Facial Landmarks in the Wild Dataset

Arxiv : https://arxiv.org/pdf/2305.04232.pdf
Github : -

SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

Arxiv : https://arxiv.org/pdf/2305.04609.pdf
Github : https://github.com/ayanban011/SwinDocSegmenter

ElasticHash: Semantic Image Similarity Search by Deep Hashing with Elasticsearch

Arxiv : https://arxiv.org/pdf/2305.04710.pdf
Github : -

Learning to Generate Poetic Chinese Landscape Painting with Calligraphy

Arxiv : https://arxiv.org/pdf/2305.04719.pdf
Github : -

Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields

Arxiv : https://arxiv.org/pdf/2305.04722.pdf
Github : -

AvatarReX: Real-time Expressive Full-body Avatars

Arxiv : https://arxiv.org/pdf/2305.04789.pdf
Github : -

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

Arxiv : https://arxiv.org/pdf/2305.04790.pdf
Github : https://github.com/open-mmlab/Multimodal-GPT

8 May 2023

Cola: How to adapt vision-language models to Compose Objects Localized with Attributes?

Arxiv : https://arxiv.org/pdf/2305.03689.pdf
Github : -

Semantic Segmentation using Vision Transformers: A survey

Arxiv : https://arxiv.org/pdf/2305.03273.pdf
Github : -

How Segment Anything Model (SAM) Boost Medical Image Segmentation?

Arxiv : -
Github : https://github.com/YichiZhang98/SAM4MIS

Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review

Arxiv : https://arxiv.org/pdf/2305.03546.pdf
Github : https://bci.grand-challenge.org/

AttentionViz: A Global View of Transformer Attention

Arxiv : https://arxiv.org/pdf/2305.03210.pdf
Github : http://attentionviz.com/

27 April 2023

EasyPortrait – Face Parsing and Portrait Segmentation Dataset

Arxiv : https://arxiv.org/pdf/2304.13509.pdf
Github : https://anonymous.4open.science/r/anonymous-dataset-pep8/README.md

CLUSTER ENTROPY: ACTIVE DOMAIN ADAPTATION IN PATHOLOGICAL IMAGE SEGMENTATION

Arxiv : https://arxiv.org/pdf/2304.13513.pdf
Github : -

Development of a Realistic Crowd Simulation Environment for Fine-grained Validation of People Tracking Methods

Arxiv : https://arxiv.org/pdf/2304.13403.pdf
Github : -

26 April 2023

A Strong and Reproducible Object Detector with Only Public Datasets

Arxiv : https://arxiv.org/pdf/2304.13027.pdf
Github : https://github.com/IDEA-Research/Stable-DINO

DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents

Arxiv : https://arxiv.org/pdf/2304.12484.pdf
Github : https://datalab-groupe.github.io/

TensoIR: Tensorial Inverse Rendering

Arxiv : https://arxiv.org/pdf/2304.12461.pdf
Github : https://haian-jin.github.io/TensoIR/

Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking

Arxiv : https://arxiv.org/pdf/2304.12682.pdf
Github : -

25 April 2023

A Cookbook of Self-Supervised Learning

Arxiv : https://arxiv.org/pdf/2304.12210.pdf
Github : -

Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection

Arxiv : https://arxiv.org/pdf/2304.12161.pdf
Github : -

GRIG: Few-Shot Generative Residual Image Inpainting

Arxiv : https://arxiv.org/pdf/2304.12035.pdf
Github : -

Track Anything: Segment Anything Meets Videos

Arxiv : https://arxiv.org/pdf/2304.11968.pdf
Github : https://github.com/gaomingqi/Track-Anything

Survey on Unsupervised Domain Adaptation for Semantic Segmentation for Visual Perception in Automated Driving

Arxiv : https://arxiv.org/pdf/2304.11928.pdf
Github : -

ICDAR 2023 Competition on Reading the Seal Title

Arxiv : https://arxiv.org/pdf/2304.11966.pdf
Github : -

Segment Anything in Medical Images

Arxiv : https://arxiv.org/pdf/2304.12306.pdf
Github : https://github.com/bowang-lab/MedSAM

Advances in Deep Concealed Scene Understanding

Arxiv : https://arxiv.org/pdf/2304.11234.pdf
Github : https://github.com/DengPingFan/CSU

OmniLabel: A Challenging Benchmark for Language-Based Object Detection

Arxiv : https://arxiv.org/pdf/2304.11463.pdf
Github : https://www.omnilabel.org/

AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports

Arxiv : https://arxiv.org/pdf/2304.11662.pdf
Github : https://airbirdsdata.github.io/

24 April 2023

Factored Neural Representation for Scene Understanding

Arxiv : https://arxiv.org/pdf/2304.10950.pdf
Github : https://yushiangw.github.io/factorednerf/

19 April 2023

Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation

Arxiv : https://arxiv.org/pdf/2304.08826.pdf
Github : -

Quantum Annealing for Single Image Super-Resolution

Arxiv : https://arxiv.org/pdf/2304.08924.pdf
Github : -

Deep Unrestricted Document Image Rectification

Arxiv : https://arxiv.org/pdf/2304.08796.pdf
Github : https://github.com/fh2019ustc/DocTr-Plus

PG-VTON: A Novel Image-Based Virtual Try-On Method via Progressive Inference Paradigm

Arxiv : https://arxiv.org/pdf/2304.08956.pdf
Github : https://github.com/NerdFNY/PGVTON

DO HUMANS AND MACHINES HAVE THE SAME EYES? HUMAN-MACHINE PERCEPTUAL DIFFERENCES ON IMAGE CLASSIFICATION

Arxiv : https://arxiv.org/pdf/2304.08733.pdf
Github : -

A Comparison of Image Denoising Methods

Arxiv : https://arxiv.org/pdf/2304.08990.pdf
Github : https://github.com/ZhaomingKong/Denoising-Comparison

18 April 2023

Delving into Shape-aware Zero-shot Semantic Segmentation

Arxiv : https://arxiv.org/pdf/2304.08491.pdf
Github : https://github.com/Liuxinyv/SAZS

The 7th AI City Challenge

Arxiv : https://arxiv.org/pdf/2304.07500.pdf
Github : -

Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads

Arxiv : https://arxiv.org/pdf/2304.07705.pdf
Github : -

GaitRef: Gait Recognition with Refined Sequential Skeletons

Arxiv : https://arxiv.org/pdf/2304.07916.pdf
Github : -

DETRs Beat YOLOs on Real-time Object Detection

Arxiv : https://arxiv.org/pdf/2304.08069.pdf
Github : https://github.com/PaddlePaddle/PaddleDetection

OVTrack: Open-Vocabulary Multiple Object Tracking

Arxiv : https://arxiv.org/pdf/2304.08408.pdf
Github : -

Synthetic Data from Diffusion Models Improves ImageNet Classification

Arxiv : https://arxiv.org/pdf/2304.08466.pdf
Github : -

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

Arxiv : https://arxiv.org/pdf/2304.08480.pdf
Github : https://github.com/IDEA-Research/DisCo-CLIP

Text2Performer: Text-Driven Human Video Generation

Arxiv : https://arxiv.org/pdf/2304.08483.pdf
Github : https://yumingj.github.io/projects/Text2Performer.html

17 April 2023

PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition

Arxiv : https://arxiv.org/pdf/2304.07230.pdf
Github : https://github.com/xwf199/PARFormer

The Second Monocular Depth Estimation Challenge

Arxiv : https://arxiv.org/pdf/2304.07051.pdf
Github : -

14 April 2023

Segment Everything Everywhere All at Once

Arxiv : https://arxiv.org/pdf/2304.06718.pdf
Github : https://github.com/UX-Decoder/Segment-Everything-Everywhere-All-At-Once

UniverSeg: Universal Medical Image Segmentation

Arxiv : https://arxiv.org/pdf/2304.06131.pdf
Github : https://github.com/JJGO/UniverSeg