Most notable computer vision arXiv papers

DailyResearchPaper

2 January 2024

HybridGait: A Benchmark for Spatial-Temporal Cloth-Changing Gait Recognition with Hybrid Explorations

A Large-Scale Re-identification Analysis in Sporting Scenarios: the Betrayal of Reaching a Critical Point

BREAK

12 December 2023

Photorealistic Video Generation with Diffusion Models

X2-Softmax: Margin Adaptive Loss Function for Face Recognition

PIXLORE: A DATASET-DRIVEN APPROACH TO RICH IMAGE CAPTIONING

LOSS FUNCTIONS IN THE ERA OF SEMANTIC SEGMENTATION: A SURVEY AND OUTLOOK

Pose Guidance by Supervision: A Framework for Clothes-Changing Person Re-Identification

Open World Object Detection in the Era of Foundation Models

SSPNet: Scale and spatial priors guided generalizable and interpretable pedestrian attribute recognition

MaskConver: Revisiting Pure Convolution Model for Panoptic Segmentation

CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models

NutritionVerse-Synth: An Open Access Synthetically Generated 2D Food Scene Dataset for Dietary Intake Estimation

Localization Is All You Evaluate: Data Leakage in Online Mapping Datasets and How to Fix It

Detecting Events in Crowds Through Changes in Geometrical Dimensions of Pedestrians

04 December 2023

Global Localization: Utilizing Relative Spatio-Temporal Geometric Constraints from Adjacent and Distant Cameras

A knowledge-based data-driven (KBDD) framework for all-day identification of cloud types using satellite remote sensing

01 December 2023

CAT-DM: Controllable Accelerated Virtual Try-on with Diffusion Model

MAXTRON: MASK TRANSFORMER WITH TRAJECTORY ATTENTION FOR VIDEO PANOPTIC SEGMENTATION

Guided Prompting in SAM for Weakly Supervised Cell Segmentation in Histopathological Images

Zooming Out on Zooming In: Advancing Super-Resolution for Remote Sensing

Diffusion Models Without Attention

29 November 2023

LLaMA-VID: An Image is Worth 2 Tokens in Large Language Models

Self-training solutions for the ICCV 2023 GeoNet Challenge

Small and Dim Target Detection in IR Imagery: A Review

GaitContour: Efficient Gait Recognition based on a Contour-Pose Representation

Word for Person: Zero-shot Composed Person Retrieval

28 November 2023

Video-based Visible-Infrared Person Re-Identification with Auxiliary Samples

Model-agnostic Body Part Relevance Assessment for Pedestrian Detection

Optimal Transport Aggregation for Visual Place Recognition

Unleashing the Power of Prompt-driven Nucleus Instance Segmentation

Street TryOn: Learning In-the-Wild Virtual Try-On from Unpaired Person Images

21 November 2023

Multi-Task Faces (MTF) Data Set: A Legally and Ethically Compliant Collection of Face Images for Various Classification Tasks

SniffyArt: The Dataset of Smelling Persons

LLMs as Visual Explainers: Advancing Image Classification with Evolving Visual Descriptions

Exchanging Dual Encoder-Decoder: A New Strategy for Change Detection with Semantic Guidance and Spatial Localization

CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement

20 November 2023

BiHRNet: A Binary high-resolution network for Human Pose Estimation

FRCSyn Challenge at WACV 2024: Face Recognition Challenge in the Era of Synthetic Data

SSB: Simple but Strong Baseline for Boosting Performance of Open-Set Semi-Supervised Learning

FOCAL: A Cost-Aware Video Dataset for Active Learning

CA-Jaccard: Camera-aware Jaccard Distance for Person Re-identification

EMU VIDEO: Factorizing Text-to-Video Generation by Explicit Image Conditioning

17 November 2023

RED-DOT: MULTIMODAL FACT-CHECKING VIA RELEVANT EVIDENCE DETECTION

Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset

Devil in the Landscapes: Inferring Epidemic Exposure Risks from Street View Imagery

RENI++: A Rotation-Equivariant, Scale-Invariant, Natural Illumination Prior

16 November 2023

MUDD: A New Re-Identification Dataset with Efficient Annotation for Off-Road Racers in Extreme Conditions

LOW-LIGHT PEDESTRIAN DETECTION IN VISIBLE AND INFRARED IMAGE FEEDS: ISSUES AND CHALLENGES

ConeQuest: A Benchmark for Cone Segmentation on Mars

Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models

Contrastive Transformer Learning with Proximity Data Generation for Text-Based Person Search

WildlifeDatasets: An open-source toolkit for animal re-identification

14 November 2023

Story-to-Motion: Synthesizing Infinite and Controllable Character Animation from Long Text

VGSG: Vision-Guided Semantic-Group Network for Text-based Person Search

Towards Automatic Honey Bee Flower-Patch Assays with Paint Marking Re-Identification

PICS IN PICS: PHYSICS INFORMED CONTOUR SELECTION FOR RAPID IMAGE SEGMENTATION

CHATANYTHING: FACETIME CHAT WITH LLM-ENHANCED PERSONAS

13 November 2023

DIFFUSION MODELS FOR EARTH OBSERVATION USE-CASES: FROM CLOUD REMOVAL TO URBAN CHANGE DETECTION

Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization

Learning Human Action Recognition Representations Without Real Humans

Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks

Whole-body Detection, Recognition and Identification at Altitude and Range

Are “Hierarchical” Visual Representations Hierarchical?

8 November 2023

Bias and Diversity in Synthetic-based Face Recognition

Multi-view Information Integration and Propagation for Occluded Person Re-identification

Unsupervised Region-Growing Network for Object Segmentation in Atmospheric Turbulence

7 November 2023

Rethinking Evaluation Metrics of Open-Vocabulary Segmentaion

PainSeeker: An Automated Method for Assessing Pain in Rats Through Facial Expressions

A survey and classification of face alignment methods based on face models

Fast and Interpretable Face Identification for Out-Of-Distribution Data Using Vision Transformers

AV-Lip-Sync+: Leveraging AV-HuBERT to Exploit Multimodal Inconsistency for Video Deepfake Detection

Dense Video Captioning: A Survey of Techniques, Datasets and Evaluation Protocols

UniTSFace: Unified Threshold Integrated Sample-to-Sample Loss for Face Recognition

Lost Your Style? Navigating with Semantic-Level Approach for Text-to-Outfit Retrieval

6 November 2023

Medical Image Segmentation with Domain Adaptation: A Survey

26 October 2023

ChimpACT: A Longitudinal Dataset for Understanding Chimpanzee Behaviors

CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection

Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

VACATION

10 October 2023

Anchor-Intermediate Detector: Decoupling and Coupling Bounding Boxes for Accurate Object Detection

OV-PARTS: Towards Open-Vocabulary Part Segmentation

AANet: Aggregation and Alignment Network with Semi-hard Positive Sample Mining for Hierarchical Place Recognition

A Benchmark Dataset for Harmful Object Detection

9 October 2023

ClusVPR: Efficient Visual Place Recognition with Clustering-based Weighted Transformer

5 October 2023

Land-cover change detection using paired OpenStreetMap data and optical high-resolution imagery via object-guided Transformer

Human-centric Behavior Description in Videos: New Benchmark and Model

4 October 2023

MINIGPT-5: INTERLEAVED VISION-AND-LANGUAGE GENERATION VIA GENERATIVE VOKENS

MATHVISTA: EVALUATING MATHEMATICAL REASONING OF FOUNDATION MODELS IN VISUAL CONTEXTS

DREAM: Visual Decoding from REversing HumAn Visual SysteM

AI-Generated Images as Data Source: The Dawn of Synthetic Era

2 October 2023

HoloAssist: an Egocentric Human Interaction Dataset for Interactive AI Assistants in the Real World

Scalable Multi-Temporal Remote Sensing Change Data Generation via Simulating Stochastic Change Process

Retail-786k: a Large-Scale Dataset for Visual Entity Matching

TBD Pedestrian Data Collection: Towards Rich, Portable, and Large-Scale Natural Pedestrian Data

A Survey on Deep Learning Techniques for Action Anticipation

27 September 2023

The Surveillance AI Pipeline

Explaining Deep Face Algorithms through Visualization: A Survey

ENIGMA-51: Towards a Fine-Grained Understanding of Human-Object Interactions in Industrial Scenarios

26 September 2023

Multiple Different Explanations for Image Classifiers

Contextual Emotion Estimation from Image Captions

UniHead: Unifying Multi-Perception for Detection Heads

Face-Att: Enhancing Image Captioning with Facial Attributes for Portrait Images

UCF-Crime Annotation: A Benchmark for Surveillance Video-and-Language Understanding

Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition

Single Image Test-Time Adaptation for Segmentation

25 September 2023

DETECT EVERY THING WITH FEW EXAMPLES

FUTURE-AI: International consensus guideline for trustworthy and deployable artificial intelligence in healthcare

Performance Analysis of UNet and Variants for Medical Image Segmentation

DIOR: Dataset for Indoor-Outdoor Reidentification - Long Range 3D/2D Skeleton Gait Collection Pipeline, Semi-Automated Gait Keypoint Labeling and Baseline Evaluation Methods

22 September 2023

SANPO: A SCENE UNDERSTANDING, ACCESSIBILITY, NAVIGATION, PATHFINDING, OBSTACLE AVOIDANCE DATASET

BASE: Probably a Better Approach to Multi-Object Tracking

21 September 2023

A SYSTEMATIC REVIEW OF FEW-SHOT LEARNING IN MEDICAL IMAGING

20 September 2023

Exploring Different Levels of Supervision for Detecting and Localizing Solar Panels on Remote Sensing Imagery

Human Gait Recognition using Deep Learning: A Comprehensive Review

19 September 2023

EgoObjects: A Large-Scale Egocentric Dataset for Fine-Grained Object Understanding

Rethinking Cross-Domain Pedestrian Detection: A Background-Focused Distribution Alignment Framework for Instance-free One-Stage Detectors

IMPROVED BREAST CANCER DIAGNOSIS THROUGH TRANSFER LEARNING ON HEMATOXYLIN AND EOSIN STAINED HISTOLOGY IMAGES

Personalized Food Image Classification: Benchmark Datasets and New Baseline

MONITORING URBAN CHANGES IN MARIUPOL/UKRAINE IN 2022/23

18 September 2023

Double Domain Guided Real-Time Low-Light Image Enhancement for Ultra-High-Definition Transportation Surveillance

Salient Object Detection in Optical Remote Sensing Images Driven by Transformer

Padding Aware Neurons

TOWARDS LARGE-SCALE BUILDING ATTRIBUTE MAPPING USING CROWDSOURCED IMAGES: SCENE TEXT RECOGNITION ON FLICKR AND PROBLEMS TO BE SOLVED

12 September 2023

OpenFashionCLIP: Vision-and-Language Contrastive Learning with Open-Source Fashion Data

BiLMa: Bidirectional Local-Matching for Text-based Person Re-identification

Beyond Skin Tone: A Multidimensional Measure of Apparent Skin Color

11 September 2023

Long-Range Correlation Supervision for Land-Cover Classification from Remote Sensing Images

WiSARD: A Labeled Visual and Thermal Image Dataset for Wilderness Search and Rescue

8 September 2023

Region Generation and Assessment Network for Occluded Person Re-Identification

RepSGG: Novel Representations of Entities and Relationships for Scene Graph Generation

Better Practices for Domain Adaptation

Anatomy-informed Data Augmentation for Enhanced Prostate Cancer Detection

7 September 2023

My Art My Choice: Adversarial Protection Against Unruly AI

Do We Still Need Non-Maximum Suppression? Accurate Confidence Estimates and Implicit Duplication Modeling with IoU-Aware Calibration

6 September 2023

Unified Pre-training with Pseudo Texts for Text-To-Image Person Re-identification

DeViL: Decoding Vision features into Language

Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models

Towards Universal Image Embeddings: A Large-Scale Dataset and Challenge for Generic Image Representations

SyntheWorld: A Large-Scale Synthetic Dataset for Land Cover Mapping and Building Change Detection

NICE 2023 Zero-shot Image Captioning Challenge

A survey on efficient vision transformers: algorithms, techniques, and performance benchmarking

DCP-Net: A Distributed Collaborative Perception Network for Remote Sensing Semantic Segmentation

Haystack: A Panoptic Scene Graph Dataset to Evaluate Rare Predicate Classes

Muti-Stage Hierarchical Food Classification

Prototype-based Dataset Comparison

4 September 2023

DARC: Distribution-Aware Re-Coloring Model for Generalizable Nucleus Segmentation

Diffusion Model with Clustering-based Conditioning for Food Image Generation

Object-Centric Multiple Object Tracking

Fine-grained Recognition with Learnable Semantic Data Augmentation

An Improved Encoder-Decoder Framework for Food Energy Estimation

TIME SERIES ANALYSIS OF URBAN LIVEABILITY

25 August 2023

With a Little Help from your own Past: Prototypical Memory Networks for Image Captioning

Ground-to-Aerial Person Search: Benchmark Dataset and Approach

LISTER: Neighbor Decoding for Length-Insensitive Scene Text Recognition

Beyond Document Page Classification: Design, Datasets, and Challenges

24 August 2023

InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4

SPPNet: A Single-Point Prompt Network for Nuclei Image Segmentation

Weakly Supervised Face and Whole Body Recognition in Turbulent Environments

Camera-Driven Representation Learning for Unsupervised Domain Adaptive Person Re-identification

Progressive Feature Mining and External Knowledge-Assisted Text-Pedestrian Image Retrieval

HarvestNet: A Dataset for Detecting Smallholder Farming Activity Using Harvest Piles and Remote Sensing

The TYC Dataset for Understanding Instance-Level Semantics and Motions of Cells in Microstructures

23 August 2023

Classification of the lunar surface pattern by AI architectures: Does AI see a rabbit in the Moon?

SwinV2DNet: Pyramid and Self-Supervision Compounded Feature Learning for Remote Sensing Images Change Detection

A three in one bottom-up framework for simultaneous semantic segmentation, instance segmentation and classification of multi-organ nuclei in digital cancer histology

DiffCloth: Diffusion Based Garment Synthesis and Manipulation via Structural Cross-modal Semantic Alignment

Using and Abusing Equivariance

22 August 2023

Can Language Models Learn to Listen?

EigenPlaces: Training Viewpoint Robust Models for Visual Place Recognition

A step towards understanding why classification helps regression

GaitPT: Skeletons Are All You Need For Gait Recognition

Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification

Color Prompting for Data-Free Continual Unsupervised Domain Adaptive Person Re-Identification

Rethinking Person Re-identification from a Projection-on-Prototypes Perspective

Patch Is Not All You Need

Noisy-Correspondence Learning for Text-to-Image Person Re-identification

Microscopy Image Segmentation via Point and Shape Regularized Data Synthesis

21 August 2023

LaRS: A Diverse Panoptic Maritime Obstacle Detection Dataset and Benchmark

Data augmentation and explainability for bias discovery and mitigation in deep learning

Flickr Africa: Examining Geo-Diversity in Large-Scale, Human-Centric Visual Data

Identity-Seeking Self-Supervised Representation Learning for Generalizable Person Re-identification

Identity-Aware Semi-Supervised Learning for Comic Character Re-Identification

Generalized Sum Pooling for Metric Learning

Lip Reading for Low-resource Languages by Learning and Combining General Speech Knowledge and Language-specific Knowledge

Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers

Language-Guided Diffusion Model for Visual Grounding

GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement

17 August 2023

TeCH: Text-guided Reconstruction of Lifelike Clothed Humans

Diagnosing Human-object Interaction Detectors

DeDoDe: Detect, Don’t Describe — Describe, Don’t Detect for Local Feature Matching

Integrating Visual and Semantic Similarity Using Hierarchies for Image Retrieval

ALIP: Adaptive Language-Image Pre-training with Synthetic Caption

Membrane Potential Batch Normalization for Spiking Neural Networks

Visually-Aware Context Modeling for News Image Captioning

MultiMediate ’23: Engagement Estimation and Bodily Behaviour Recognition in Social Interactions

View Consistent Purification for Accurate Cross-View Localization

CARE: A Large Scale CT Image Dataset and Clinical Applicable Benchmark Model for Rectal Cancer Segmentation

16 August 2023

Ske2Grid: Skeleton-to-Grid Representation Learning for Action Recognition

15 August 2023

An Outlook into the Future of Egocentric Vision

14 August 2023

Image-based Geolocalization by Ground-to-2.5D Map Matching

The Multi-modality Cell Segmentation Challenge: Towards Universal Solutions

11 August 2023

YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection

TrainFors: A Large Benchmark Training Dataset for Image Manipulation Detection and Localization

Vacation

10 July 2023

Beyond Geo-localization: Fine-grained Orientation of Street-view Images by Cross-view Matching with Satellite Imagery

Vision Language Transformers: A Survey

To pretrain or not to pretrain? A case study of domain-specific pretraining for semantic segmentation in histopathology

PseudoCell: Hard Negative Mining as Pseudo Labeling for Deep Learning-Based Centroblast Cell Detection

7 July 2023

Synthesizing Artistic Cinemagraphs from Text

CityTrack: Improving City-Scale Multi-Camera Multi-Target Tracking by Location-Aware Tracking and Box-Grained Matching

Spherical Feature Pyramid Networks For Semantic Segmentation

SegNetr: Rethinking the local-global interactions and skip connections in U-shaped networks

LOSS FUNCTIONS AND METRICS IN DEEP LEARNING. A REVIEW

6 July 2023

Unbalanced Optimal Transport: A Unified Framework for Object Detection

Rethinking Multiple Instance Learning for Whole Slide Image Classification: A Good Instance Classifier is All You Need

MDViT: Multi-domain Vision Transformer for Small Medical Image Segmentation Datasets

Make A Long Image Short: Adaptive Token Length for Vision Transformers

21 Jun 2023

Bullying10K: A Neuromorphic Dataset towards Privacy-Preserving Bullying Recognition

How can objects help action recognition?

Meerkat Behaviour Recognition Dataset

Quilt-1M: One Million Image-Text Pairs for Histopathology

AVOIDDS: Aircraft Vision-based Intruder Detection Dataset and Simulator

Enlighten-anything: When Segment Anything Model Meets Low-light Image Enhancement

19 Jun 2023

Scaling Open-Vocabulary Object Detection

Leveraging Human Salience to Improve Calorie Estimation

16 Jun 2023

UNDERSTANDING OPTIMIZATION OF DEEP LEARNING

When and Why Momentum Accelerates SGD: An Empirical Study

OCAtari: Object-Centric Atari 2600 Reinforcement Learning Environments

What can a cook in Italy teach a mechanic in India? Action Recognition Generalisation Over Scenarios and Locations

TryOnDiffusion: A Tale of Two UNets

LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models

DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data

Fast Training of Diffusion Models with Masked Transformers

14 Jun 2023

Image Captioners Are Scalable Vision Learners Too

GeneCIS: A Benchmark for General Conditional Image Similarity

VISION Datasets: A Benchmark for Vision-based InduStrial InspectiON

Neural Scene Chronology

Semi-supervised learning made simple with self-supervised clustering

Retrieve Anyone: A General-purpose Person Re-identification Task with Instructions

Compositionally Equivariant Representation Learning

Reviving Shift Equivariance in Vision Transformers

13 Jun 2023

MovieFactory: Automatic Movie Creation from Text using Large Generative Models for Language and Images

7 Jun 2023

PhenoBench — A Large Dataset and Benchmarks for Semantic Image Interpretation in the Agricultural Domain

6 Jun 2023

Cyclic Learning: Bridging Image-level Labels and Nuclei Instance Segmentation

Cycle Consistency Driven Object Discovery

5 Jun 2023

Towards In-context Scene Understanding

Publicly available datasets of breast histopathology H&E whole-slide images: A systematic review

1 Jun 2023

LOWA: Localize Objects in the Wild with Attributes

GaitGS: Temporal Feature Learning in Granularity and Span Dimension for Gait Recognition

Are Large Kernels Better Teachers than Transformers for ConvNets?

A Unified Framework for U-Net Design and Analysis

31 May 2023

Multi-modal Queried Object Detection in the Wild

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

30 May 2023

FUSECAP: Leveraging Large Language Models to Fuse Visual Data into Enriched Image Captions

Using Caterpillar to Nibble Small-Scale Images

TaleCrafter: Interactive Story Visualization with Multiple Characters

Contextual Object Detection with Multimodal Large Language Models

Ghost in the Minecraft: Generally Capable Agents for Open-World Environments via Large Language Models with Text-based Knowledge and Memory

29 May 2023

Mindstorms in Natural Language-Based Societies of Mind

26 May 2023

Break-A-Scene: Extracting Multiple Concepts from a Single Image

Making Vision Transformers Truly Shift-Equivariant

CENSUS-HWR: a large training dataset for offline handwriting recognition

CN-Celeb-AV: A Multi-Genre Audio-Visual Dataset for Person Recognition

24 May 2023

DetGPT: Detect What You Need via Reasoning

Perception Test: A Diagnostic Benchmark for Multimodal Video Models

MIPI 2023 Challenge on Nighttime Flare Removal: Methods and Results

MaskCL: Semantic Mask-Driven Contrastive Learning for Unsupervised Person Re-Identification with Clothes Change

A Laplacian Pyramid Based Generative H&E Stain Augmentation Network

23 May 2023

Materialistic: Selecting Similar Materials in Images

Movie101: A New Movie Understanding Benchmark

Productive Crop Field Detection: A New Dataset and Deep Learning Benchmark Results

18 May 2023

ICDAR 2023 Competition on Hierarchical Text Detection and Recognition

Variational Classification

PromptUNet: Toward Interactive Medical Image Segmentation

17 May 2023

Annotating 8,000 Abdominal CT Volumes for Multi-Organ Segmentation in Three Weeks

16 May 2023

ON THE HIDDEN MYSTERY OF OCR IN LARGE MULTIMODAL MODELS

PLIP: Language-Image Pre-training for Person Representation Learning

Document Understanding Dataset and Evaluation (DUDE)

CLIP-VG: Self-paced Curriculum Adapting of CLIP via Exploiting Pseudo-Language Labels for Visual Grounding

M6Doc: A Large-Scale Multi-Format, Multi-Type, Multi-Layout, Multi-Language, Multi-Annotation Category Dataset for Modern Document Layout Analysis

Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution

The ASNR-MICCAI Brain Tumor Segmentation (BraTS) Challenge 2023: Intracranial Meningioma

12 May 2023

Hyperbolic Deep Learning in Computer Vision: A Survey

Segment and Track Anything

10 May 2023

WikiWeb2M: A Page-Level Multimodal Wikipedia Dataset

Linguistic More: Taking a Further Step toward Efficient and Accurate Scene Text Recognition

Eiffel Tower: A Deep-Sea Underwater Dataset for Long-Term Visual Localization

TPS++: Attention-Enhanced Thin-Plate Spline for Scene Text Recognition

Restormer-Plus for Real World Image Deraining: One State-of-the-Art Solution to the GT-RAIN Challenge (CVPR 2023 UG2+ Track 3)

Real-time instance segmentation with polygons using an Intersection-over-Union loss

GROUP ACTIVITY RECOGNITION VIA DYNAMIC COMPOSITION AND INTERACTION

9 May 2023

IIITD-20K: Dense captioning for Text-Image ReID

DocDiff: Document Enhancement via Residual Diffusion Models

Video Object Segmentation in Panoptic Wild Scenes

Revisiting Table Detection Datasets for Visually Rich Documents

FEW SHOT LEARNING FOR MEDICAL IMAGING: A COMPARATIVE ANALYSIS OF METHODOLOGIES AND FORMAL MATHEMATICAL FRAMEWORK

CatFLW: Cat Facial Landmarks in the Wild Dataset

SwinDocSegmenter: An End-to-End Unified Domain Adaptive Transformer for Document Instance Segmentation

ElasticHash: Semantic Image Similarity Search by Deep Hashing with Elasticsearch

Learning to Generate Poetic Chinese Landscape Painting with Calligraphy

Understanding Gaussian Attention Bias of Vision Transformers Using Effective Receptive Fields

AvatarReX: Real-time Expressive Full-body Avatars

MultiModal-GPT: A Vision and Language Model for Dialogue with Humans

8 May 2023

Cola: How to adapt vision-language models to Compose Objects Localized with Attributes?

Semantic Segmentation using Vision Transformers: A survey

How Segment Anything Model (SAM) Boost Medical Image Segmentation?

Breast Cancer Immunohistochemical Image Generation: a Benchmark Dataset and Challenge Review

AttentionViz: A Global View of Transformer Attention

27 April 2023

EasyPortrait – Face Parsing and Portrait Segmentation Dataset

CLUSTER ENTROPY: ACTIVE DOMAIN ADAPTATION IN PATHOLOGICAL IMAGE SEGMENTATION

Development of a Realistic Crowd Simulation Environment for Fine-grained Validation of People Tracking Methods

26 April 2023

A Strong and Reproducible Object Detector with Only Public Datasets

DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents

TensoIR: Tensorial Inverse Rendering

Docmarking: Real-Time Screen-Cam Robust Document Image Watermarking

25 April 2023

A Cookbook of Self-Supervised Learning

Meta-tuning Loss Functions and Data Augmentation for Few-shot Object Detection

GRIG: Few-Shot Generative Residual Image Inpainting

Track Anything: Segment Anything Meets Videos

Survey on Unsupervised Domain Adaptation for Semantic Segmentation for Visual Perception in Automated Driving

ICDAR 2023 Competition on Reading the Seal Title

Segment Anything in Medical Images

Advances in Deep Concealed Scene Understanding

OmniLabel: A Challenging Benchmark for Language-Based Object Detection

AirBirds: A Large-scale Challenging Dataset for Bird Strike Prevention in Real-world Airports

24 April 2023

Factored Neural Representation for Scene Understanding

19 April 2023

Perceive, Excavate and Purify: A Novel Object Mining Framework for Instance Segmentation

Quantum Annealing for Single Image Super-Resolution

Deep Unrestricted Document Image Rectification

PG-VTON: A Novel Image-Based Virtual Try-On Method via Progressive Inference Paradigm

DO HUMANS AND MACHINES HAVE THE SAME EYES? HUMAN-MACHINE PERCEPTUAL DIFFERENCES ON IMAGE CLASSIFICATION

A Comparison of Image Denoising Methods

18 April 2023

Delving into Shape-aware Zero-shot Semantic Segmentation

The 7th AI City Challenge

Handling Heavy Occlusion in Dense Crowd Tracking by Focusing on the Heads

GaitRef: Gait Recognition with Refined Sequential Skeletons

DETRs Beat YOLOs on Real-time Object Detection

OVTrack: Open-Vocabulary Multiple Object Tracking

Synthetic Data from Diffusion Models Improves ImageNet Classification

DisCo-CLIP: A Distributed Contrastive Loss for Memory Efficient CLIP Training

Text2Performer: Text-Driven Human Video Generation

17 April 2023

PARFormer: Transformer-based Multi-Task Network for Pedestrian Attribute Recognition

The Second Monocular Depth Estimation Challenge

14 April 2023

Segment Everything Everywhere All at Once

UniverSeg: Universal Medical Image Segmentation