Paper-List-DAILY
An automatically updated daily list of papers

Updated on 2024.06.02

Table of Contents

Classification
Object Detection
Semantic Segmentation
Object Tracking
Action Recognition
Pose Estimation
Image Generation
LLM
Scene Understanding
Depth Estimation
Audio Processing
Multimodal
Anomaly Detection
Transfer Learning
Optical Flow
Reinforcement Learning
Graph Neural Networks

Classification

Publish Date	Title	Authors	PDF	Code
2024-05-30	DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark	Haoxing Chen et.al.	2405.19707	link
2024-05-30	A Novel Approach for Automated Design Information Mining from Issue Logs	Jiuang Zhao et.al.	2405.19623	null
2024-05-29	I Bet You Did Not Mean That: Testing Semantic Importance via Betting	Jacopo Teneggi et.al.	2405.19146	null
2024-05-29	Verifiably Robust Conformal Prediction	Linus Jeary et.al.	2405.18942	null
2024-05-29	Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks	Futa Waseda et.al.	2405.18770	null
2024-05-29	GIST: Greedy Independent Set Thresholding for Diverse Data Summarization	Matthew Fahrbach et.al.	2405.18754	null
2024-05-29	LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification	Renyi Qu et.al.	2405.18672	null
2024-05-28	It's Not a Modality Gap: Characterizing and Addressing the Contrastive Gap	Abrar Fahim et.al.	2405.18570	null
2024-05-28	Why are Visually-Grounded Language Models Bad at Image Classification?	Yuhui Zhang et.al.	2405.18415	link
2024-05-28	MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution	Wenzhuo Liu et.al.	2405.18240	null
2024-05-28	Confidence-aware multi-modality learning for eye disease screening	Ke Zou et.al.	2405.18167	link
2024-05-28	4-bit Shampoo for Memory-Efficient Network Training	Sike Wang et.al.	2405.18144	null
2024-05-28	DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture	Shentong Mo et.al.	2405.17995	null
2024-05-27	WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average	Louis Fournier et.al.	2405.17517	null
2024-05-27	Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators	Yunian Pan et.al.	2405.17370	null
2024-05-27	On the Noise Robustness of In-Context Learning for Text Generation	Hongfu Gao et.al.	2405.17264	null
2024-05-27	Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification	Shujun Yang et.al.	2405.17110	link
2024-05-26	Demystify Mamba in Vision: A Linear Attention Perspective	Dongchen Han et.al.	2405.16605	null
2024-05-26	AdaFisher: Adaptive Second Order Optimization via Fisher Information	Damien Martins Gomes et.al.	2405.16397	null
2024-05-25	ModelLock: Locking Your Model With a Spell	Yifeng Gao et.al.	2405.16285	null
2024-05-25	Accelerating Transformers with Spectrum-Preserving Token Merging	Hoai-Chau Tran et.al.	2405.16148	null
2024-05-25	Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack	Mingli Zhu et.al.	2405.16134	null
2024-05-24	Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images	Yiran Luo et.al.	2405.15961	null
2024-05-24	A Neurosymbolic Framework for Bias Correction in CNNs	Parth Padalkar et.al.	2405.15886	null
2024-05-24	What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models	Abdelrahman Abdelhamed et.al.	2405.15668	null
2024-05-24	Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning	Wenhan Chang et.al.	2405.15662	null
2024-05-24	Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables	James Hinns et.al.	2405.15661	null
2024-05-24	Harnessing Increased Client Participation with Cohort-Parallel Federated Learning	Akash Dhasade et.al.	2405.15644	null
2024-05-24	Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification	Barış Büyüktaş et.al.	2405.15405	null
2024-05-24	CLIP model is an Efficient Online Lifelong Learner	Leyuan Wang et.al.	2405.15155	null
2024-05-24	OptLLM: Optimal Assignment of Queries to Large Language Models	Yueyue Liu et.al.	2405.15130	null
2024-05-23	A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models	Mario Döbler et.al.	2405.14977	link
2024-05-23	Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron	Can Cui et.al.	2405.14851	null
2024-05-23	Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property	Yuya Yoshikawa et.al.	2405.14522	null
2024-05-23	SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification	Zuoyong Li et.al.	2405.14506	null
2024-05-23	Scalable Visual State Space Model with Fractal Scanning	Lv Tang et.al.	2405.14480	null
2024-05-23	Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation	Daniel Kienzle et.al.	2405.14467	null
2024-05-23	Boosting Robustness by Clipping Gradients in Distributed Learning	Youssef Allouah et.al.	2405.14432	null
2024-05-23	Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators	Changze Lv et.al.	2405.14362	null
2024-05-23	Simple Hamiltonian dynamics is a powerful quantum processing resource	Akitada Sakurai et.al.	2405.14245	null
2024-05-23	ChronosLex: Time-aware Incremental Training for Temporal Generalization of Legal Classification Tasks	T. Y. S. S Santosh et.al.	2405.14211	null
2024-05-22	Just rotate it! Uncertainty estimation in closed-source models via multiple queries	Konstantinos Pitas et.al.	2405.13864	null
2024-05-21	Decentralized Federated Learning Over Imperfect Communication Channels	Weicai Li et.al.	2405.12894	null
2024-05-21	Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting	Omar Hamed et.al.	2405.12705	null
2024-05-21	Exploration of Masked and Causal Language Modelling for Text Generation	Nicolo Micheletti et.al.	2405.12630	null
2024-05-21	3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification	Yan He et.al.	2405.12487	null
2024-05-20	Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models	Nida Nasir et.al.	2405.12126	null
2024-05-20	Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification	Weilian Zhou et.al.	2405.12003	link
2024-05-20	A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers	Tom Roth et.al.	2405.11904	null
2024-05-21	A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus	Eduard Poesina et.al.	2405.11877	link
2024-05-20	SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model	Siavash Shams et.al.	2405.11831	link
2024-05-20	Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques	Siva Rajesh Kasa et.al.	2405.11775	null
2024-05-19	SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization	Jialong Guo et.al.	2405.11582	link
2024-05-19	Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification	Manan Shah et.al.	2405.11574	link
2024-05-19	An Invisible Backdoor Attack Based On Semantic Feature	Yangming Chen et.al.	2405.11551	null
2024-05-19	Verification technology for finger vein biometric	George Kumi Kyeremeh et.al.	2405.11540	null
2024-05-17	Reduced storage direct tensor ring decomposition for convolutional neural networks compression	Mateusz Gabor et.al.	2405.10802	link
2024-05-17	Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset	Jie Zhu et.al.	2405.10542	link
2024-05-17	Smart Expert System: Large Language Models as Text Classifiers	Zhiqiang Wang et.al.	2405.10523	link
2024-05-16	Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge	Florian Schmid et.al.	2405.10018	null
2024-05-16	ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset	Johannes Rückert et.al.	2405.10004	link
2024-05-15	Improving Label Error Detection and Elimination with Uncertainty Quantification	Johannes Jakubik et.al.	2405.09602	null
2024-05-15	Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck	Hongru Li et.al.	2405.09514	null
2024-05-15	Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy	Feng Wang et.al.	2405.09014	link
2024-05-14	The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks	Ziquan Liu et.al.	2405.08886	link
2024-05-14	Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling	Gregory Holste et.al.	2405.08780	null
2024-05-14	FolkTalent: Enhancing Classification and Tagging of Indian Folk Paintings	Nancy Hada et.al.	2405.08776	null
2024-05-14	The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks	Carmela Calabrese et.al.	2405.08695	null
2024-05-14	Achieving Fairness Through Channel Pruning for Dermatological Disease Diagnosis	Qingpeng Kong et.al.	2405.08681	link
2024-05-14	Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning	Alain Riou et.al.	2405.08679	null
2024-05-14	Dual-Branch Network for Portrait Image Quality Assessment	Wei Sun et.al.	2405.08555	null
2024-05-13	Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp	Rachel Hong et.al.	2405.08209	link
2024-05-14	MambaOut: Do We Really Need Mamba for Vision?	Weihao Yu et.al.	2405.07992	link
2024-05-13	Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics	Haoyang Zheng et.al.	2405.07839	link
2024-05-13	Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent	Michael Kohler et.al.	2405.07619	null
2024-05-13	On-device Online Learning and Semantic Management of TinyML Systems	Haoyu Ren et.al.	2405.07601	null
2024-05-13	GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation	Andrey V. Galichin et.al.	2405.07562	null
2024-05-13	Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents	Juri Grosjean et.al.	2405.07513	null
2024-05-13	MoVL: Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks	Haijiang Tian et.al.	2405.07411	null
2024-05-12	Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images	Fatema Tuj Johora Faria et.al.	2405.07338	null
2024-05-12	Differentiable Model Scaling using Differentiable Topk	Kai Liu et.al.	2405.07194	null
2024-05-11	A framework of text-dependent speaker verification for Chinese numerical string corpus	Litong Zheng et.al.	2405.07029	null
2024-05-10	Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification	Yaoqin Ye et.al.	2405.06468	null
2024-05-10	Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data	Rongyu Zhang et.al.	2405.06413	null
2024-05-10	SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora	Faisal Qarah et.al.	2405.06239	null
2024-05-09	Deep Multi-Task Learning for Malware Image Classification	Ahmed Bensaoud et.al.	2405.05906	null
2024-05-09	Enhancing Suicide Risk Detection on Social Media through Semi-Supervised Deep Label Smoothing	Matthew Squires et.al.	2405.05795	null
2024-05-09	CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks	Nick et.al.	2405.05755	null
2024-05-09	How Quality Affects Deep Neural Networks in Fine-Grained Image Classification	Joseph Smith et.al.	2405.05742	null
2024-05-09	End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base	Shuling Li et.al.	2405.05738	null
2024-05-09	Using Machine Translation to Augment Multilingual Classification	Adam King et.al.	2405.05478	null
2024-05-08	AFEN: Respiratory Disease Classification using Ensemble Learning	Rahul Nadkarni et.al.	2405.05467	null
2024-05-08	XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples	Peiqin Lin et.al.	2405.05116	link
2024-05-08	Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution	Shuo Shao et.al.	2405.04825	null
2024-05-07	Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification	Mukaffi Bin Moin et.al.	2405.04610	link
2024-05-07	Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs	Antonio Bikić et.al.	2405.04386	null
2024-05-07	Semi-Supervised Disease Classification based on Limited Medical Image Data	Yan Zhang et.al.	2405.04295	null
2024-05-07	DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects	Da Fu et.al.	2405.04093	null
2024-05-07	Feature Map Convergence Evaluation for Functional Module	Ludan Zhang et.al.	2405.04041	null
2024-05-07	VMambaCC: A Visual State Space Model for Crowd Counting	Hao-Yuan Ma et.al.	2405.03978	null
2024-05-06	On Adversarial Examples for Text Classification by Perturbing Latent Representations	Korn Sooksatra et.al.	2405.03789	null
2024-05-06	CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification	Sankalp Sinha et.al.	2405.03660	null
2024-05-06	Deep Space Separable Distillation for Lightweight Acoustic Scene Classification	ShuQi Ye et.al.	2405.03567	null
2024-05-06	Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing	Han Liu et.al.	2405.03565	null
2024-05-06	A Lightweight Neural Architecture Search Model for Medical Image Classification	Lunchen Xie et.al.	2405.03462	null
2024-05-06	Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification	Matteo Bianchi et.al.	2405.03301	null
2024-05-06	TED: Accelerate Model Training by Internal Generalization	Jinying Xiao et.al.	2405.03228	null
2024-05-06	Advancing Multimodal Medical Capabilities of Gemini	Lin Yang et.al.	2405.03162	null
2024-05-05	A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs)	Lingyao Li et.al.	2405.03066	null
2024-05-05	Parameter-Efficient Fine-Tuning with Discrete Fourier Transform	Ziqi Gao et.al.	2405.03003	null
2024-05-04	MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning	Vishal Nedungadi et.al.	2405.02771	null
2024-05-03	Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification	Siqi Yin et.al.	2405.02155	null
2024-05-03	The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification	Minh Duc Bui et.al.	2405.02010	null
2024-05-03	Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts	Felicia Riethmüller et.al.	2405.01904	null
2024-05-02	PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions	Xun Jiao et.al.	2405.01741	null
2024-05-02	Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey	Guoping Xu et.al.	2405.01725	link
2024-05-02	SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients	Tushar Verma et.al.	2405.01699	null
2024-05-02	Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey	Rokas Gipiškis et.al.	2405.01636	null
2024-05-02	Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models	Nishad Singhi et.al.	2405.01531	null
2024-05-03	Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks	Mikkel Jordahn et.al.	2405.01196	null
2024-05-02	Uncertainty-aware self-training with expectation maximization basis transformation	Zijia Wang et.al.	2405.01175	null
2024-05-02	Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification	Muhammad Ahmad et.al.	2405.01095	null
2024-05-02	Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation	Tianyi Chen et.al.	2405.01041	null
2024-05-02	Benchmarking Representations for Speech, Music, and Acoustic Events	Moreno La Quatra et.al.	2405.00934	link
2024-05-01	Digital-analog quantum convolutional neural networks for image classification	Anton Simen et.al.	2405.00548	null
2024-05-03	BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine	Mingchen Li et.al.	2405.00465	null
2024-05-01	Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol	Konstantinos Apostolidis et.al.	2405.00384	null
2024-05-01	Data Augmentation Policy Search for Long-Term Forecasting	Liran Nochumsohn et.al.	2405.00319	null
2024-04-30	Let's Focus: Focused Backdoor Attack against Federated Transfer Learning	Marco Arazzi et.al.	2404.19420	null
2024-04-30	Large Language Model Informed Patent Image Retrieval	Hao-Cheng Lo et.al.	2404.19360	null
2024-04-30	Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair	Jeonghoon Park et.al.	2404.19250	null
2024-04-29	Spectral-Spatial Mamba for Hyperspectral Image Classification	Lingbo Huang et.al.	2404.18401	null
2024-04-28	TextGram: Towards a better domain-adaptive pretraining	Sharayu Hiwarkhedkar et.al.	2404.18228	null
2024-04-28	L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi	Saloni Mittal et.al.	2404.18216	link
2024-04-28	S$^2$Mamba: A Spatial-spectral State Space Model for Hyperspectral Image Classification	Guanchun Wang et.al.	2404.18213	null
2024-04-27	Implicit Generative Prior for Bayesian Neural Networks	Yijia Liu et.al.	2404.18008	link
2024-04-27	Towards Privacy-Preserving Audio Classification Systems	Bhawana Chhaglani et.al.	2404.18002	null
2024-04-27	A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning	Michael Majurski et.al.	2404.17978	null
2024-04-27	Spatial, Temporal, and Geometric Fusion for Remote Sensing Images	Hessah Albanwan et.al.	2404.17851	null
2024-04-27	Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification	Chao Yi et.al.	2404.17753	link
2024-04-26	SPLICE -- Streamlining Digital Pathology Image Processing	Areej Alsaafin et.al.	2404.17704	null
2024-04-26	SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes	Georgia Baltsou et.al.	2404.17255	null
2024-04-25	Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer	Jianyu Zheng et.al.	2404.16627	link
2024-04-25	IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks	Zitong Huang et.al.	2404.16331	null
2024-04-25	Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis	Akshatha Mohan et.al.	2404.16268	link
2024-04-24	MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models	Grace Guo et.al.	2404.16174	null
2024-04-24	MoDE: CLIP Data Experts via Clustering	Jiawei Ma et.al.	2404.16030	link
2024-04-26	A Survey on Visual Mamba	Hanwei Zhang et.al.	2404.15956	null
2024-04-24	Vision Transformer-based Adversarial Domain Adaptation	Yahan Li et.al.	2404.15817	link
2024-04-24	Rethinking Model Prototyping through the MedMNIST+ Dataset Collection	Sebastian Doerrich et.al.	2404.15786	null
2024-04-24	Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning	Zuheng Kang et.al.	2404.15704	null
2024-04-24	Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification	Liang Qu et.al.	2404.15585	null
2024-04-23	An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models	Yangchen Pan et.al.	2404.15518	null
2024-04-23	Deep multi-prototype capsule networks	Saeid Abbassi et.al.	2404.15445	null
2024-04-23	A review of deep learning-based information fusion techniques for multimodal medical image classification	Yihao Li et.al.	2404.15022	null
2024-04-23	Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case	Muhammad Asif Auyb et.al.	2404.14977	null
2024-04-23	Traditional to Transformers: A Survey on Current Trends and Future Prospects for Hyperspectral Image Classification	Muhammad Ahmad et.al.	2404.14955	link
2024-04-23	Pyramid Hierarchical Transformer for Hyperspectral Image Classification	Muhammad Ahmad et.al.	2404.14945	link
2024-04-23	Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification	Muhammad Ahmad et.al.	2404.14944	link
2024-04-23	CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models	Teodor Chiaburu et.al.	2404.14830	link
2024-04-22	WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models	Ronald Xie et.al.	2404.14567	null
2024-04-22	CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective	Wencheng Zhu et.al.	2404.14109	null
2024-04-21	EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder	Hasanul Mahmud et.al.	2404.13770	null
2024-04-21	PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure	Feiqi Cao et.al.	2404.13645	link
2024-04-21	I2CANSAY: Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning	Songlin Dong et.al.	2404.13576	null
2024-04-21	IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models	Tao Feng et.al.	2404.13504	null
2024-04-20	Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing	Yuang Liu et.al.	2404.13434	null
2024-04-20	Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge	Khuyagbaatar Batsuren et.al.	2404.13292	link
2024-04-20	3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification	Shyam Varahagiri et.al.	2404.13252	link
2024-04-19	On-board classification of underwater images using hybrid classical-quantum CNN based method	Sreeraj Rajan Warrier et.al.	2404.13130	null
2024-04-19	Next Generation Loss Function for Image Classification	Shakhnaz Akhmedova et.al.	2404.12948	null
2024-04-19	A Hybrid Generative and Discriminative PointNet on Unordered Point Sets	Yang Ye et.al.	2404.12925	null
2024-04-19	Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment	Danqing Ma et.al.	2404.12634	null
2024-04-18	When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes	Asaf Yehudai et.al.	2404.12365	null
2024-04-18	Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training	Jin Gao et.al.	2404.12210	link
2024-04-18	Concept Induction using LLMs: a user experiment for assessment	Adrita Barua et.al.	2404.11875	null
2024-04-17	Pretraining Billion-scale Geospatial Foundational Models on Frontier	Aristeidis Tsaris et.al.	2404.11706	null
2024-04-17	AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts	Meng Jiang et.al.	2404.11449	null
2024-04-17	Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured	Hanlin Mo et.al.	2404.11309	null
2024-04-17	A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene	Wenbo Zhang et.al.	2404.11249	null
2024-04-17	A Novel ICD Coding Framework Based on Associated and Hierarchical Code Description Distillation	Bin Zhang et.al.	2404.11132	null
2024-04-17	Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification	Pierre Lepagnol et.al.	2404.11122	null
2024-04-18	Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification	Mohammad Shiri et.al.	2404.11052	null
2024-04-17	InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification	Qi Han et.al.	2404.11003	link
2024-04-16	Incubating Text Classifiers Following User Instruction with Nothing but LLM	Letian Peng et.al.	2404.10877	null
2024-04-16	Vocabulary-free Image Classification and Semantic Segmentation	Alessandro Conti et.al.	2404.10864	link
2024-04-16	Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks	Mohsen Hami et.al.	2404.10664	null
2024-04-16	Tree Bandits for Generative Bayes	Sean O'Hagan et.al.	2404.10436	null
2024-04-16	AudioProtoPNet: An interpretable deep learning model for bird sound classification	René Heinrich et.al.	2404.10420	null
2024-04-16	Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport	Eduardo Fernandes Montesuma et.al.	2404.10261	null
2024-04-15	Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection	Lisang Zhou et.al.	2404.10026	null
2024-04-15	Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models	Hyeonggeun Yun et.al.	2404.09828	null
2024-04-15	Quantization of Large Language Models with an Overdetermined Basis	Daniil Merkulov et.al.	2404.09737	null
2024-04-15	Pseudo-label Learning with Calibrated Confidence Using an Energy-based Model	Masahito Toba et.al.	2404.09585	null
2024-04-14	Breast Cancer Image Classification Method Based on Deep Transfer Learning	Weimin Wang et.al.	2404.09226	null
2024-04-14	Coreset Selection for Object Detection	Hojun Lee et.al.	2404.09161	null
2024-04-13	Exploring Explainability in Video Action Recognition	Avinab Saha et.al.	2404.09067	null
2024-04-13	Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification	Denis Huseljic et.al.	2404.08981	link
2024-04-13	PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification	Zhenwei Wang et.al.	2404.08915	null
2024-04-12	VertAttack: Taking advantage of Text Classifiers' horizontal vision	Jonathan Rusert et.al.	2404.08538	null
2024-04-12	SpectralMamba: Efficient Mamba for Hyperspectral Image Classification	Jing Yao et.al.	2404.08489	null
2024-04-12	OTTER: Improving Zero-Shot Classification via Optimal Transport	Changho Shin et.al.	2404.08461	null
2024-04-12	A Survey of Neural Network Robustness Assessment in Image Recognition	Jie Wang et.al.	2404.08285	null
2024-04-12	Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example	MingXuan Xiao et.al.	2404.08279	null
2024-04-11	HGRN2: Gated Linear RNNs with State Expansion	Zhen Qin et.al.	2404.07904	link
2024-04-11	Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification	Ricardo Pereira et.al.	2404.07739	null
2024-04-11	Contrastive-Based Deep Embeddings for Label Noise-Resilient Histopathology Image Classification	Lucas Dedieu et.al.	2404.07605	link
2024-04-11	Learning to Classify New Foods Incrementally Via Compressed Exemplars	Justin Yang et.al.	2404.07507	null
2024-04-11	Interactive Prompt Debugging with Sequence Salience	Ian Tenney et.al.	2404.07498	null
2024-04-11	Privacy preserving layer partitioning for Deep Neural Network models	Kishore Rajasekar et.al.	2404.07437	null
2024-04-11	CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models	Sheng Wang et.al.	2404.07424	null
2024-04-11	Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling	Sourajit Saha et.al.	2404.07410	null
2024-04-10	Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations	Ofir Shifman et.al.	2404.07153	null
2024-04-10	Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization	Michael Kohler et.al.	2404.07128	null
2024-04-10	Accelerating Cardiac MRI Reconstruction with CMRatt: An Attention-Driven Approach	Anam Hashmi et.al.	2404.06941	null
2024-04-10	Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark	Marina Ceccon et.al.	2404.06859	null
2024-04-10	Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution	Brandon Morgan et.al.	2404.06679	null
2024-04-09	Variational Stochastic Gradient Descent for Deep Neural Networks	Haotian Chen et.al.	2404.06549	link
2024-04-09	On adversarial training and the 1 Nearest Neighbor classifier	Amir Hagai et.al.	2404.06313	link
2024-04-09	Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models	David Kurzendörfer et.al.	2404.06309	link
2024-04-09	Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training	Ming-Kun Xie et.al.	2404.06287	null
2024-04-09	Quantum Circuit $C^*$-algebra Net	Yuka Hashimoto et.al.	2404.06218	null
2024-04-09	VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection	Li-Ming Zhan et.al.	2404.06217	link
2024-04-09	Symmetry-guided gradient descent for quantum neural networks	Kaiming Bian et.al.	2404.06108	null
2024-04-10	Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures	Ching-Kai Lin et.al.	2404.06080	null
2024-04-08	Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images	Michael Deutges et.al.	2404.05584	null
2024-04-08	On the Convergence of Continual Learning with Adaptive Methods	Seungyub Han et.al.	2404.05555	null
2024-04-08	Multi-Task Learning for Features Extraction in Financial Annual Reports	Syrielle Montariol et.al.	2404.05281	link
2024-04-08	Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy	Giang Nguyen et.al.	2404.05238	null
2024-04-08	iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection	Nan Zhou et.al.	2404.05207	null
2024-04-08	Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods	Roopkatha Dey et.al.	2404.05159	null
2024-04-07	PairAug: What Can Augmented Image-Text Pairs Do for Radiology?	Yutong Xie et.al.	2404.04960	link
2024-04-07	GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets	Dongjing Shan et.al.	2404.04924	null
2024-04-06	Focused Active Learning for Histopathological Image Classification	Arne Schmidt et.al.	2404.04663	null
2024-04-06	Trustless Audits without Revealing Data or Models	Suppakit Waiwitlikhit et.al.	2404.04500	null
2024-04-05	Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism	Trilokesh Ranjan Sarkar et.al.	2404.04245	null
2024-04-05	Noisy Label Processing for Classification: A Survey	Mengting Li et.al.	2404.04159	null
2024-04-05	Learning Correlation Structures for Vision Transformers	Manjin Kim et.al.	2404.03924	null
2024-04-05	LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification	Judy X Yang et.al.	2404.03883	null
2024-04-04	Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning	Spyridon Chavlis et.al.	2404.03708	null
2024-04-05	A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data	Iqra Bano et.al.	2404.03493	null
2024-04-04	Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks	Lei Zhang et.al.	2404.03340	null
2024-04-04	Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning	Andrei Semenov et.al.	2404.03323	link
2024-04-04	FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification	Xu Wang et.al.	2404.03225	null
2024-04-03	Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales	Lucas E. Resck et.al.	2404.03098	link
2024-04-03	Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds	Kamalika Chaudhuri et.al.	2404.02866	link
2024-04-03	FPT: Feature Prompt Tuning for Few-shot Readability Assessment	Ziyang Wang et.al.	2404.02772	link
2024-04-03	Adversarial Attacks and Dimensionality in Text Classifiers	Nandish Chattopadhyay et.al.	2404.02660	null
2024-04-04	Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging	Keqiang Fan et.al.	2404.02656	null
2024-04-03	Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations	Emilio Villa-Cueva et.al.	2404.02452	link
2024-04-03	A Novel Approach to Breast Cancer Histopathological Image Classification Using Cross-Colour Space Feature Fusion and Quantum-Classical Stack Ensemble Method	Sambit Mallick et.al.	2404.02447	null
2024-04-03	Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data	Parth Patwa et.al.	2404.02422	null
2024-04-02	Smooth Deep Saliency	Rudolf Herdt et.al.	2404.02282	null
2024-04-02	Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models	Matthew Kowal et.al.	2404.02233	null
2024-04-02	ImageNot: A contrast with ImageNet preserves model rankings	Olawale Salaudeen et.al.	2404.02112	null
2024-04-02	Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows	Grace Guo et.al.	2404.02081	null
2024-04-02	Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches	Daryna Dementieva et.al.	2404.02043	null
2024-04-02	CAM-Based Methods Can See through Walls	Magamed Taimeskhanov et.al.	2404.01964	link
2024-04-02	Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss	Jaeha Kim et.al.	2404.01692	null
2024-04-02	A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification	Quanwei Liu et.al.	2404.01673	null
2024-04-01	Can Biases in ImageNet Models Explain Generalization?	Paul Gavrikov et.al.	2404.01509	link
2024-04-01	Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification	Zuyu Xu et.al.	2404.01359	null
2024-04-01	Bridging Remote Sensors with Multisensor Geospatial Foundation Models	Boran Han et.al.	2404.01260	link
2024-04-01	Diagnosis of Skin Cancer Using VGG16 and VGG19 Based Transfer Learning Models	Amir Faghihi et.al.	2404.01160	null
2024-03-29	Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations	Jaisidh Singh et.al.	2403.20312	link
2024-03-29	MCNet: A crowd density estimation network based on integrating multiscale attention module	Qiang Guo et.al.	2403.20173	null
2024-03-29	Segmentation, Classification and Interpretation of Breast Cancer Medical Images using Human-in-the-Loop Machine Learning	David Vázquez-Lema et.al.	2403.20112	null
2024-03-29	Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion	Juhwan Choi et.al.	2403.20015	null
2024-03-29	Diverse Feature Learning by Self-distillation and Reset	Sejik Park et.al.	2403.19941	null
2024-03-29	Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification	Jianfeng Cai et.al.	2403.19902	link
2024-03-28	X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization	Anna Kukleva et.al.	2403.19811	link
2024-03-28	RSMamba: Remote Sensing Image Classification with State Space Model	Keyan Chen et.al.	2403.19654	link
2024-03-28	Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model	Zhicai Wang et.al.	2403.19600	link
2024-03-28	The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation	Ozgu Goksu et.al.	2403.19579	null
2024-03-28	Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach	Wei Dong et.al.	2403.19067	link
2024-03-27	Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data	Yuting Guo et.al.	2403.19031	null
2024-03-27	Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning	Soumyendu Sarkar et.al.	2403.18985	null
2024-03-27	The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision	Andreas Müller et.al.	2403.18587	link
2024-03-27	Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks	Tian Ye et.al.	2403.18318	null
2024-03-27	Multi-scale Unified Network for Image Classification	Wenzhuo Liu et.al.	2403.18294	null
2024-03-26	The Need for Speed: Pruning Transformers with One Recipe	Samir Khaki et.al.	2403.17921	link
2024-03-26	Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation	Carlos Gomes et.al.	2403.17886	null
2024-03-26	PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition	Chenhongyi Yang et.al.	2403.17695	link
2024-03-26	Language Models for Text Classification: Is In-Context Learning Enough?	Aleksandra Edwards et.al.	2403.17661	null
2024-03-26	Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification	Eva Pachetti et.al.	2403.17530	null
2024-03-26	HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification	He Zhu et.al.	2403.17307	link
2024-03-25	Histogram Layers for Neural Engineered Features	Joshua Peeples et.al.	2403.17176	link
2024-03-25	Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships	Rangel Daroya et.al.	2403.17173	link
2024-03-25	CipherFormer: Efficient Transformer Private Inference with Low Round Complexity	Weize Wang et.al.	2403.16860	null
2024-03-25	Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer	Dominik Müller et.al.	2403.16695	null
2024-03-25	DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks	Dominik Müller et.al.	2403.16678	link
2024-03-25	LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification	Liu Junhua et.al.	2403.16504	null
2024-03-24	On machine learning analysis of atomic force microscopy images for image classification, sample surface recognition	Igor Sokolov et.al.	2403.16230	null
2024-03-24	Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis	Shaojie Li et.al.	2403.16212	null
2024-03-24	Multi-Task Learning with Multi-Task Optimization	Lu Bai et.al.	2403.16162	null
2024-03-24	CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data	Shreya Sharma et.al.	2403.15974	link
2024-03-23	A Deep Learning Architectures for Kidney Disease Classification	Muhammad Shoaib Farooq et.al.	2403.15895	null
2024-03-23	VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding	Phong Nguyen-Thuan Do et.al.	2403.15882	null
2024-03-23	VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification	Lanfeng Zhong et.al.	2403.15836	null
2024-03-22	Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion	Sofia Casarin et.al.	2403.15194	null
2024-03-22	Image Classification with Rotation-Invariant Variational Quantum Circuits	Paul San Sebastian et.al.	2403.15031	null
2024-03-22	Extracting Human Attention through Crowdsourced Patch Labeling	Minsuk Chang et.al.	2403.15013	null
2024-03-22	Clean-image Backdoor Attacks	Dazhong Rong et.al.	2403.15010	null
2024-03-22	ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding	Novendra Setyawan et.al.	2403.15004	null
2024-03-22	MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection	Sadiya Sayara Chowdhury Puspo et.al.	2403.14989	null
2024-03-21	Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention	Ethan N. Evans et.al.	2403.14753	null
2024-03-21	Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images	Tom Burgert et.al.	2403.14547	null
2024-03-21	Multi-Level Explanations for Generative Language Models	Lucas Monteiro Paes et.al.	2403.14459	null
2024-03-21	Tensor network compressibility of convolutional models	Sukhbinder Singh et.al.	2403.14379	null
2024-03-21	LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding	Masato Fujitake et.al.	2403.14252	null
2024-03-21	Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations	Xun Lin et.al.	2403.14250	null
2024-03-21	Improving Image Classification Accuracy through Complementary Intra-Class and Inter-Class Mixup	Ye Xu et.al.	2403.14137	link
2024-03-20	Bridge the Modality and Capacity Gaps in Vision-Language Model Selection	Chao Yi et.al.	2403.13797	null
2024-03-20	Leveraging feature communication in federated learning for remote sensing image classification	Anh-Kiet Duong et.al.	2403.13575	null
2024-03-20	MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining	Di Wang et.al.	2403.13430	link
2024-03-20	Building Optimal Neural Architectures using Interpretable Knowledge	Keith G. Mills et.al.	2403.13293	link
2024-03-19	LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images	Jing Zhang et.al.	2403.13171	null
2024-03-19	Improved EATFormer: A Vision Transformer for Medical Image Classification	Yulong Shisu et.al.	2403.13167	null
2024-03-19	SIFT-DBT: Self-supervised Initialization and Fine-Tuning for Imbalanced Digital Breast Tomosynthesis Image Classification	Yuexi Du et.al.	2403.13148	link
2024-03-19	Using evolutionary computation to optimize task performance of unclocked, recurrent Boolean circuits in FPGAs	Raphael Norman-Tenazas et.al.	2403.13105	null
2024-03-19	Investigating Text Shortening Strategy in BERT: Truncation vs Summarization	Mirza Alim Mutasodirin et.al.	2403.12799	link
2024-03-18	Posterior Uncertainty Quantification in Neural Networks using Data Augmentation	Luhuan Wu et.al.	2403.12729	null
2024-03-19	SEVEN: Pruning Transformer Model by Reserving Sentinels	Jinying Xiao et.al.	2403.12688	link
2024-03-19	Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service	Mirza Alim Mutasodirin et.al.	2403.12563	null
2024-03-19	Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification	Yi Lin et.al.	2403.12537	null
2024-03-19	CrossTune: Black-Box Few-Shot Classification with Label Enhancement	Danqing Luo et.al.	2403.12468	null
2024-03-18	Generalizing deep learning models for medical image classification	Matta Sarah et.al.	2403.12167	null
2024-03-19	Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks	K. P. Santoso et.al.	2403.12009	null
2024-03-18	High-energy physics image classification: A Survey of Jet Applications	Hamza Kheddar et.al.	2403.11934	null
2024-03-18	Better (pseudo-)labels for semi-supervised instance segmentation	François Porcher et.al.	2403.11675	null
2024-03-18	Continual Forgetting for Pre-trained Vision Models	Hongbo Zhao et.al.	2403.11530	link
2024-03-18	Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting	Mingkui Tan et.al.	2403.11491	null
2024-03-17	Potential of Domain Adaptation in Machine Learning in Ecology and Hydrology to Improve Model Extrapolability	Haiyang Shi et.al.	2403.11331	null
2024-03-17	A Modified Word Saliency-Based Adversarial Attack on Text Classification Models	Hetvi Waghela et.al.	2403.11297	null
2024-03-17	Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation	Silvia Corbara et.al.	2403.11265	null
2024-03-17	Multiple Teachers-Meticulous Student: A Domain Adaptive Meta-Knowledge Distillation Model for Medical Image Classification	Shahabedin Nabavi et.al.	2403.11226	null
2024-03-16	Forward Learning of Graph Neural Networks	Namyong Park et.al.	2403.11004	null
2024-03-16	Understanding Robustness of Visual State Space Models for Image Classification	Chengbin Du et.al.	2403.10935	null
2024-03-16	Automatic location detection based on deep learning	Anjali Karangiya et.al.	2403.10912	null
2024-03-14	Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models	Akhil Kedia et.al.	2403.09635	link
2024-03-14	XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization	Yequan Bie et.al.	2403.09410	null
2024-03-14	ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization	Aleksandr Matsun et.al.	2403.09400	null
2024-03-14	A Hierarchical Fused Quantum Fuzzy Neural Network for Image Classification	Sheng-Yao Wu et.al.	2403.09318	null
2024-03-14	CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification	Yiming Ma et.al.	2403.09281	null
2024-03-14	Are Vision Language Models Texture or Shape Biased and Can We Steer Them?	Paul Gavrikov et.al.	2403.09193	null
2024-03-14	Randomized Principal Component Analysis for Hyperspectral Image Classification	Mustafa Ustuner et.al.	2403.09117	null
2024-03-14	CardioCaps: Attention-based Capsule Network for Class-Imbalanced Echocardiogram Classification	Hyunkyung Han et.al.	2403.09108	link
2024-03-14	The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models?	Qinyu Zhao et.al.	2403.09037	link
2024-03-13	PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning	Qifeng Zhou et.al.	2403.08967	null
2024-03-13	DAM: Dynamic Adapter Merging for Continual Video QA Learning	Feng Cheng et.al.	2403.08755	link
2024-03-13	Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification	Yuxing Han et.al.	2403.08580	null
2024-03-13	HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional Image Classifiers	Francesco Dibitonto et.al.	2403.08536	link
2024-03-13	Pig aggression classification using CNN, Transformers and Recurrent Networks	Junior Silva Souza et.al.	2403.08528	null
2024-03-13	Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models	Mohammad Lashkari et.al.	2403.08408	null
2024-03-13	Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification	Shuhan Li et.al.	2403.08407	null
2024-03-13	Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks	Khondoker Murad Hossain et.al.	2403.08208	null
2024-03-13	Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks	Fuzhi Wu et.al.	2403.08157	link
2024-03-12	Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection	Tharindu Kumarage et.al.	2403.08035	null
2024-03-13	Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion	Dongyang Li et.al.	2403.07721	link
2024-03-12	FPT: Fine-grained Prompt Tuning for Parameter and Memory Efficient Fine Tuning in High-resolution Medical Image Classification	Yijin Huang et.al.	2403.07576	null
2024-03-12	Backdoor Attack with Mode Mixture Latent Modification	Hongwei Zhang et.al.	2403.07463	null
2024-03-12	In-context learning enables multimodal large language models to classify cancer pathology images	Dyke Ferber et.al.	2403.07407	null
2024-03-12	Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning	Mark D. McDonnell et.al.	2403.07356	null
2024-03-12	How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance	Hongkang Li et.al.	2403.07310	null
2024-03-12	A Bayesian Approach to OOD Robustness in Image Classification	Prakhar Kaushik et.al.	2403.07277	null
2024-03-11	LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations	Mohammad Alkhalefi et.al.	2403.06813	null
2024-03-11	Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification	Shuai Li et.al.	2403.06798	null
2024-03-11	Leveraging Internal Representations of Model for Magnetic Image Classification	Adarsh N L et.al.	2403.06797	null
2024-03-11	Shortcut Learning in Medical Image Segmentation	Manxi Lin et.al.	2403.06748	null
2024-03-11	Active Generation for Image Classification	Tao Huang et.al.	2403.06517	null
2024-03-11	Evolving Knowledge Distillation with Large Language Models and Active Learning	Chengyuan Liu et.al.	2403.06414	null
2024-03-11	'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification	Manish Chandra et.al.	2403.06402	null
2024-03-10	Probing Image Compression For Class-Incremental Learning	Justin Yang et.al.	2403.06288	null
2024-03-10	Bayesian Random Semantic Data Augmentation for Medical Image Classification	Yaoyao Zhu et.al.	2403.06138	link
2024-03-10	Universal Debiased Editing for Fair Medical Image Classification	Ruinan Jin et.al.	2403.06104	null
2024-03-08	Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets	Lorenzo Brigato et.al.	2403.05532	null
2024-03-08	Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation	Yu Han et.al.	2403.05388	null
2024-03-08	The Impact of Quantization on the Robustness of Transformer-based Text Classifiers	Seyed Parsa Neshaei et.al.	2403.05365	null
2024-03-08	Multiple Instance Learning with random sampling for Whole Slide Image Classification	H. Keshvarikhojasteh et.al.	2403.05351	null
2024-03-08	Learning Expressive And Generalizable Motion Features For Face Forgery Detection	Jingyi Zhang et.al.	2403.05172	null
2024-03-08	Defending Against Unforeseen Failure Modes with Latent Adversarial Training	Stephen Casper et.al.	2403.05030	link
2024-03-07	Fooling Neural Networks for Motion Forecasting via Adversarial Attacks	Edgar Medina et.al.	2403.04954	null
2024-03-07	T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers	Mariano V. Ntrougkas et.al.	2403.04523	null
2024-03-07	Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging	Dovile Juodelyte et.al.	2403.04484	link
2024-03-07	Advancing Biomedical Text Mining with Community Challenges	Hui Zong et.al.	2403.04261	null
2024-03-07	Scalable On-Chip Optical Linear Processing Unit Using a Single Thin-Film Lithium Niobate Ring Modulator	Zhaoang Deng et.al.	2403.04216	null
2024-03-07	Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models	Evelyn Mannix et.al.	2403.04125	null
2024-03-07	Privacy-preserving Fine-tuning of Large Language Models through Flatness	Tiejin Chen et.al.	2403.04124	null
2024-03-06	MedMamba: Vision Mamba for Medical Image Classification	Yubiao Yue et.al.	2403.03849	link
2024-03-06	On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder	Tingxu Han et.al.	2403.03846	link
2024-03-06	RADIA -- Radio Advertisement Detection with Intelligent Analytics	Jorge Álvarez et.al.	2403.03538	null
2024-03-06	Inverse-Free Fast Natural Gradient Descent Method for Deep Learning	Xinwei Ou et.al.	2403.03473	null
2024-03-06	Sparse Spiking Neural Network: Exploiting Heterogeneity in Timescales for Pruning Recurrent SNN	Biswadeep Chakraborty et.al.	2403.03409	null
2024-03-05	RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules	Miaomiao Li et.al.	2403.02932	link
2024-03-05	Demonstrating Mutual Reinforcement Effect through Information Flow	Chengguang Gan et.al.	2403.02902	null
2024-03-05	Quantum Mixed-State Self-Attention Network	Fu Chen et.al.	2403.02871	null
2024-03-05	SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix	Gayathri C et.al.	2403.02833	null
2024-03-05	SGD with Partial Hessian for Deep Neural Networks Optimization	Ying Sun et.al.	2403.02681	link
2024-03-05	G-EvoNAS: Evolutionary Neural Architecture Search Based on Network Growth	Juan Zou et.al.	2403.02667	null
2024-03-05	Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad	Sayantan Choudhury et.al.	2403.02648	link
2024-03-05	Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use	Imad Eddine Toubal et.al.	2403.02626	null
2024-03-04	When do Convolutional Neural Networks Stop Learning?	Sahan Ahmad et.al.	2403.02473	link
2024-03-04	NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function	Abdullah Nazhat Abdullah et.al.	2403.02411	link
2024-03-02	Can a Confident Prior Replace a Cold Posterior?	Martin Marek et.al.	2403.01272	link
2024-03-02	Leveraging Self-Supervised Learning for Scene Recognition in Child Sexual Abuse Imagery	Pedro H. V. Valois et.al.	2403.01183	null
2024-03-02	Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation	Lian Xu et.al.	2403.01156	null
2024-03-02	ELA: Efficient Local Attention for Deep Convolutional Neural Networks	Wei Xu et.al.	2403.01123	null
2024-03-01	Margin Discrepancy-based Adversarial Training for Multi-Domain Text Classification	Yuan Wu et.al.	2403.00888	null
2024-03-01	Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment	Margherita Martorana et.al.	2403.00884	null
2024-03-01	SURE: SUrvey REcipes for building reliable and robust deep networks	Yuting Li et.al.	2403.00543	link
2024-03-01	Invariant Test-Time Adaptation for Vision-Language Model Generalization	Huan Ma et.al.	2403.00376	null
2024-02-29	TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision	Yunyi Zhang et.al.	2403.00165	null
2024-02-29	Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance	Huakun Shen et.al.	2402.19401	null
2024-02-29	Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification	Delfina Sol Martinez Pandiani et.al.	2402.19339	null
2024-02-29	Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction	Hao Li et.al.	2402.19326	null
2024-02-29	Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation	Fahimeh Hosseini Noohdani et.al.	2402.18919	null
2024-02-29	Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification	Zihan Wang et.al.	2402.18825	link
2024-02-28	Comparing Importance Sampling Based Methods for Mitigating the Effect of Class Imbalance	Indu Panigrahi et.al.	2402.18742	link
2024-02-28	Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains	Hafiz Tiomoko Ali et.al.	2402.18614	null
2024-02-28	Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling	Mahdi Karami et.al.	2402.18508	null
2024-02-28	Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization	Deng Li et.al.	2402.18447	null
2024-02-29	A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation	Francesco Barbato et.al.	2402.18402	null
2024-02-28	A Multimodal Handover Failure Detection Dataset and Baselines	Santosh Thoduka et.al.	2402.18319	null
2024-02-28	Classes Are Not Equal: An Empirical Study on Image Recognition Fairness	Jiequan Cui et.al.	2402.18133	null
2024-02-27	Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers	Yiwei Lu et.al.	2402.17710	null
2024-02-27	SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification	Mohammed Q. Alkhatib et.al.	2402.17672	link
2024-02-27	Predict the Next Word:	Evgenia Ilia et.al.	2402.17527	null
2024-02-27	Scaling Supervised Local Learning with Augmented Auxiliary Networks	Chenxiang Ma et.al.	2402.17318	link
2024-02-26	Offline Writer Identification Using Convolutional Neural Network Activation Features	Vincent Christlein et.al.	2402.17029	null

(back to top)

Object Detection

Publish Date	Title	Authors	PDF	Code
2024-05-30	RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection	Fangyi Chen et.al.	2405.19854	null
2024-05-30	Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology	Frank A. Ruis et.al.	2405.19822	null
2024-05-30	Towards Unified Multi-granularity Text Detection with Interactive Attention	Xingyu Wan et.al.	2405.19765	null
2024-05-30	Fully Test-Time Adaptation for Monocular 3D Object Detection	Hongbin Lin et.al.	2405.19682	null
2024-05-30	YotoR-You Only Transform One Representation	José Ignacio Díaz Villa et.al.	2405.19629	null
2024-05-29	Enabling Visual Recognition at Radio Frequency	Haowen Lai et.al.	2405.19516	null
2024-05-29	Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles	Saurabh Pathak et.al.	2405.19179	null
2024-05-29	RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision	Jinzhong Wang et.al.	2405.18955	null
2024-05-29	SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for Autonomous Driving	Yiming Cui et.al.	2405.18857	null
2024-05-29	PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram	Sifan Zhou et.al.	2405.18734	null
2024-05-28	A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 Pandemic	Ioanna Gogou et.al.	2405.18387	link
2024-05-28	Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?	Yifan Bai et.al.	2405.18361	null
2024-05-28	Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention	Weitai Kang et.al.	2405.18295	null
2024-05-28	DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture	Shentong Mo et.al.	2405.17995	null
2024-05-28	Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection	Teodor-George Marchitan et.al.	2405.17964	null
2024-05-28	Self-supervised Pre-training for Transferable Multi-modal Perception	Xiaohao Xu et.al.	2405.17942	null
2024-05-28	Boosting General Trimap-free Matting in the Real-World Image	Leo Shan et.al.	2405.17916	null
2024-05-28	The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention	Xingyu Ding et.al.	2405.17776	null
2024-05-27	Understanding differences in applying DETR to natural and medical images	Yanqi Xu et.al.	2405.17677	null
2024-05-27	Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection	Shuai Zeng et.al.	2405.17422	link
2024-05-27	Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association	Tingwei Liu et.al.	2405.17323	null
2024-05-27	Enhanced Automotive Radar Collaborative Sensing By Exploiting Constructive Interference	Lifan Xu et.al.	2405.17297	null
2024-05-27	SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving	Avinash Nittur Ramesh et.al.	2405.17030	null
2024-05-27	Collective Perception Datasets for Autonomous Driving: A Comprehensive Review	Sven Teufel et.al.	2405.16973	null
2024-05-27	OED: Towards One-stage End-to-End Dynamic Scene Graph Generation	Guan Wang et.al.	2405.16925	link
2024-05-27	ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection	Ziying Song et.al.	2405.16873	null
2024-05-27	A re-calibration method for object detection with multi-modal alignment bias in autonomous driving	Zhihang Song et.al.	2405.16848	null
2024-05-26	A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing	Yusaku Ando et.al.	2405.16580	null
2024-05-26	AI-Generated Text Detection and Classification Based on BERT Deep Learning Algorithm	Hao Wang et.al.	2405.16422	null
2024-05-24	UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes	Ted Lentsch et.al.	2405.15688	null
2024-05-24	Multimodal Object Detection via Probabilistic a priori Information Integration	Hafsa El Hafyani et.al.	2405.15596	null
2024-05-24	Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection	Fan Liu et.al.	2405.15465	null
2024-05-24	Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets	Hoàng-Ân Lê et.al.	2405.15394	null
2024-05-24	Towards Global Optimal Visual In-Context Learning Prompt Selection	Chengming Xu et.al.	2405.15279	null
2024-05-24	Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection	Yajing Liu et.al.	2405.15225	null
2024-05-24	ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models	Jingyuan Zhu et.al.	2405.15199	null
2024-05-24	MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method	Pan Liao et.al.	2405.15176	null
2024-05-23	Learning to Detect and Segment Mobile Objects from Unlabeled Videos	Yihong Sun et.al.	2405.14841	null
2024-05-23	Designing A Sustainable Marine Debris Clean-up Framework without Human Labels	Raymond Wang et.al.	2405.14815	null
2024-05-23	Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond	Zhechao Wang et.al.	2405.14674	null
2024-05-23	Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment	Muhammad Sohail Danish et.al.	2405.14497	null
2024-05-23	YOLOv10: Real-Time End-to-End Object Detection	Ao Wang et.al.	2405.14458	link
2024-05-23	Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations	Mohammed Baharoon et.al.	2405.14239	null
2024-05-22	Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation	Mykhailo Uss et.al.	2405.14024	null
2024-05-22	TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System	Diogo Lavado et.al.	2405.13989	null
2024-05-22	Class-Conditional self-reward mechanism for improved Text-to-Image models	Safouane El Ghazouali et.al.	2405.13473	link
2024-05-22	Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing	Jiarun Ding et.al.	2405.13403	null
2024-05-21	BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once	Theodore Zhao et.al.	2405.12971	null
2024-05-21	AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection	Zizhao Chen et.al.	2405.12944	link
2024-05-21	Predicting the Influence of Adverse Weather on Pedestrian Detection with Automotive Radar and Lidar Sensors	Daniel Weihmayr et.al.	2405.12736	null
2024-05-21	Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text	Yafu Li et.al.	2405.12689	null
2024-05-21	Automating Attendance Management in Human Resources: A Design Science Approach Using Computer Vision and Facial Recognition	Bao-Thien Nguyen-Tat et.al.	2405.12633	null
2024-05-21	FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors	Shuai Liu et.al.	2405.12601	link
2024-05-21	Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering	Hiba Maryam et.al.	2405.12533	null
2024-05-21	Active Object Detection with Knowledge Aggregation and Distillation from Large Models	Dejie Yang et.al.	2405.12509	null
2024-05-21	Mutual Information Analysis in Multimodal Learning Systems	Hadi Hadizadeh et.al.	2405.12456	null
2024-05-20	Multi-View Attentive Contextualization for Multi-View 3D Object Detection	Xianpeng Liu et.al.	2405.12200	null
2024-05-20	Bangladeshi Native Vehicle Detection in Wild	Bipin Saha et.al.	2405.12150	link
2024-05-20	Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments	Jooyong Park et.al.	2405.11855	null
2024-05-20	DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment	Jianhong Han et.al.	2405.11765	link
2024-05-20	Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation	Runou Yang et.al.	2405.11754	link
2024-05-19	FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention	Ziang Guo et.al.	2405.11682	link
2024-05-19	SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization	Jialong Guo et.al.	2405.11582	link
2024-05-19	The First Swahili Language Scene Text Detection and Recognition Dataset	Fadila Wendigoundi Douamba et.al.	2405.11437	link
2024-05-18	InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images	Wuzhou Li et.al.	2405.11293	null
2024-05-18	Visible and Clear: Finding Tiny Objects in Difference Map	Bing Cao et.al.	2405.11276	null
2024-05-17	A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model	Mingxiang Fu et.al.	2405.10890	null
2024-05-17	DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts	Anastasia Voznyuk et.al.	2405.10629	link
2024-05-17	DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection	Zhe Huang et.al.	2405.10577	null
2024-05-16	Drone-type-Set: Drone types detection benchmark for drone detection and tracking	Kholoud AlDosari et.al.	2405.10398	null
2024-05-16	Grounded 3D-LLM with Referent Tokens	Yilun Chen et.al.	2405.10370	null
2024-05-16	Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection	Tianhe Ren et.al.	2405.10300	link
2024-05-16	Towards Task-Compatible Compressible Representations	Anderson de Andrade et.al.	2405.10244	link
2024-05-16	SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network	Zhaoxu Li et.al.	2405.10148	null
2024-05-16	SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection	Mingxuan Liu et.al.	2405.10053	null
2024-05-16	FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection	Siliang Ma et.al.	2405.09942	null
2024-05-16	Infrared Adversarial Car Stickers	Xiaopei Zhu et.al.	2405.09924	null
2024-05-16	PillarNeXt: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features	Xusheng Li et.al.	2405.09828	null
2024-05-16	Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection	Feiran Li et.al.	2405.09782	link
2024-05-15	Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation	Guo Yachan et.al.	2405.09682	null
2024-05-15	Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels	Guozhang Liu et.al.	2405.09024	null
2024-05-14	CLIP with Quality Captions: A Strong Pretraining for Vision Tasks	Pavan Kumar Anasosalu Vasu et.al.	2405.08911	null
2024-05-14	Open-Vocabulary Object Detection via Neighboring Region Attention Alignment	Sunyuan Qiang et.al.	2405.08593	null
2024-05-14	Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method	Mian Zou et.al.	2405.08487	null
2024-05-14	RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images	Zong-Wei Hong et.al.	2405.08483	link
2024-05-14	Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events	Xin Wu et.al.	2405.08251	link
2024-05-13	RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors	Liam Dugan et.al.	2405.07940	null
2024-05-13	oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving	Abdul Hannan Khan et.al.	2405.07698	null
2024-05-13	MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders	Xueying Jiang et.al.	2405.07696	null
2024-05-13	Quality-aware Selective Fusion Network for V-D-T Salient Object Detection	Liuxin Bao et.al.	2405.07655	link
2024-05-13	Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying	Thomas Pöllabauer et.al.	2405.07653	null
2024-05-13	Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering	Hakan Yekta Yatbaz et.al.	2405.07600	null
2024-05-13	Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection	Dehong Kong et.al.	2405.07595	null
2024-05-13	Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis	Tianci Bi et.al.	2405.07481	null
2024-05-13	Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding	Houze Liu et.al.	2405.07479	null
2024-05-12	MAML MOT: Multiple Object Tracking based on Meta-Learning	Jiayi Chen et.al.	2405.07272	null
2024-05-10	How to Augment for Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models?	Engin Uzun et.al.	2405.06383	null
2024-05-10	Precise Apple Detection and Localization in Orchards using YOLOv5 for Robotic Harvesting Systems	Jiang Ziyue et.al.	2405.06260	null
2024-05-09	CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks	Nick et.al.	2405.05755	null
2024-05-09	Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection	Xinran Liua et.al.	2405.05614	null
2024-05-09	The object detection model uses combined extraction with KNN and RF classification	Florentina Tatrin Kurniati et.al.	2405.05551	null
2024-05-08	Reviewing Intelligent Cinematography: AI research for camera-based video production	Adrian Azzarelli et.al.	2405.05039	null
2024-05-07	A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching	Xianlei Long et.al.	2405.04589	null
2024-05-07	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	Chen Min et.al.	2405.04390	null
2024-05-07	A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields	Raiyan Rahman et.al.	2405.04305	null
2024-05-07	ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers	Jinke Li et.al.	2405.04299	null
2024-05-07	Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore	Junchao Wu et.al.	2405.04286	null
2024-05-07	Deep Event-based Object Detection in Autonomous Driving: A Survey	Bingquan Zhou et.al.	2405.03995	null
2024-05-06	BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection	Saket S. Chaturvedi et.al.	2405.03884	null
2024-05-06	RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection	Thennarasi Balakrishnan et.al.	2405.03541	link
2024-05-06	Low-light Object Detection	Pengpeng Li et.al.	2405.03519	null
2024-05-06	Salient Object Detection From Arbitrary Modalities	Nianchang Huang et.al.	2405.03352	null
2024-05-06	Modality Prompts for Arbitrary Modality Salient Object Detection	Nianchang Huang et.al.	2405.03351	null
2024-05-06	Vietnamese AI Generated Text Detection	Quang-Dan Tran et.al.	2405.03206	null
2024-05-06	PTQ4SAM: Post-Training Quantization for Segment Anything	Chengtao Lv et.al.	2405.03144	link
2024-05-05	Performance Evaluation of Real-Time Object Detection for Electric Scooters	Dong Chen et.al.	2405.03039	link
2024-05-05	SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection	Kassaw Abraham Mulat et.al.	2405.02906	null
2024-05-07	Adaptive Guidance Learning for Camouflaged Object Detection	Zhennan Chen et.al.	2405.02824	null
2024-05-05	PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection	Zhaoqi Leng et.al.	2405.02811	null
2024-05-02	Segmentation-Free Outcome Prediction in Head and Neck Cancer: Deep Learning-based Feature Extraction from Multi-Angle Maximum Intensity Projections (MA-MIPs) of PET Images	Amirhosein Toosi et.al.	2405.01756	null
2024-05-02	PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems	Walter Zimmer et.al.	2405.01750	null
2024-05-02	Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey	Guoping Xu et.al.	2405.01725	link
2024-05-02	SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients	Tushar Verma et.al.	2405.01699	null
2024-05-02	Imagine the Unseen: Occluded Pedestrian Detection via Adversarial Feature Completion	Shanshan Zhang et.al.	2405.01311	null
2024-05-02	Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation	Dr. Selva Kumar S et.al.	2405.01310	null
2024-05-02	Towards Consistent Object Detection via LiDAR-Camera Synergy	Kai Luo et.al.	2405.01258	link
2024-05-02	Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection	Ahmad Khalil et.al.	2405.01108	null
2024-05-01	Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models	Colton R. Crum et.al.	2405.00650	null
2024-05-01	Object detection under the linear subspace model with application to cryo-EM images	Amitay Eldar et.al.	2405.00364	null
2024-04-30	Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Yunhao Ge et.al.	2404.19752	null
2024-04-30	Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning	Zhipeng Yuan et.al.	2404.19748	null
2024-04-30	Masked Multi-Query Slot Attention for Unsupervised Object Discovery	Rishav Pramanik et.al.	2404.19654	link
2024-04-30	Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World	Wen Yin et.al.	2404.19417	null
2024-04-30	UniFS: Universal Few-shot Instance Perception with Point Representations	Sheng Jin et.al.	2404.19401	null
2024-04-30	Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection	Zhanwei Zhang et.al.	2404.19384	null
2024-04-30	Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank	Sungjune Park et.al.	2404.19299	null
2024-04-29	MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection	Heitor R. Medeiros et.al.	2404.18849	null
2024-04-29	Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge	Rajat K. Doshi et.al.	2404.18665	null
2024-04-29	CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception	Yunshuang Yuan et.al.	2404.18617	null
2024-04-29	Assessing Quality Metrics for Neural Reality Gap Input Mitigation in Autonomous Driving Testing	Stefano Carlo Lambertenghi et.al.	2404.18577	null
2024-04-29	Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images	Wenbin Guan et.al.	2404.18426	null
2024-04-29	Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles	Mingi Jeong et.al.	2404.18411	null
2024-04-28	FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method	Yanbing Bai et.al.	2404.18245	null
2024-04-28	RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation	Oded Bialer et.al.	2404.18150	null
2024-04-27	Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection	Farzad Nozarian et.al.	2404.17910	link
2024-04-27	A Hybrid Approach for Document Layout Analysis in Document images	Tahira Shehzadi et.al.	2404.17888	null
2024-04-26	Inhomogeneous illuminated image enhancement under extremely low visibility condition	Libang Chen et.al.	2404.17503	null
2024-04-26	Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection	Moussa Kassem Sbeyti et.al.	2404.17427	null
2024-04-26	Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision	Cong Fan et.al.	2404.17229	null
2024-04-26	MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection	Chengpei Xu et.al.	2404.17151	null
2024-04-25	Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach	Cristopher McIntyre-Garcia et.al.	2404.17020	link
2024-04-25	Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection	Mehmet Kerem Turkcan et.al.	2404.16944	link
2024-04-25	Self-Balanced R-CNN for Instance Segmentation	Leonardo Rossi et.al.	2404.16633	link
2024-04-25	Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System	Daniel Dworak et.al.	2404.16548	null
2024-04-25	Commonsense Prototype for Outdoor Unsupervised 3D Object Detection	Hai Wu et.al.	2404.16493	link
2024-04-25	IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks	Zitong Huang et.al.	2404.16331	null
2024-04-25	CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions	Haoyuan Li et.al.	2404.16302	link
2024-04-24	AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models	Zhiqiang Tang et.al.	2404.16233	null
2024-04-24	Observational parameters of Blue Large-Amplitude Pulsators	P. Pietrukowicz et.al.	2404.16089	null
2024-04-24	A Survey on Visual Mamba	Hanwei Zhang et.al.	2404.15956	null
2024-04-24	Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks	Erh-Chung Chen et.al.	2404.15881	null
2024-04-24	Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection	Michael Kösel et.al.	2404.15879	link
2024-04-23	CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection	Hongyi Cai et.al.	2404.15451	null
2024-04-23	ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning	Weifeng Chen et.al.	2404.15449	null
2024-04-23	Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions	Xingguang Zhang et.al.	2404.15252	null
2024-04-23	Efficient Transformer Encoders for Mask2Former-style models	Manyi Yao et.al.	2404.15244	null
2024-04-23	Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN	Sara Dadjouy et.al.	2404.15129	null
2024-04-23	External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection	Wen Liang et.al.	2404.15008	null
2024-04-23	ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions	Shounak Sural et.al.	2404.14780	null
2024-04-23	Unified Unsupervised Salient Object Detection via Knowledge Transfer	Yao Yuan et.al.	2404.14759	link
2024-04-22	SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection	Yuxia Wang et.al.	2404.14183	null
2024-04-22	Text in the Dark: Extremely Low-Light Text Image Enhancement	Che-Tsung Lin et.al.	2404.14135	null
2024-04-22	CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective	Wencheng Zhu et.al.	2404.14109	null
2024-04-22	Benchmarking Multi-Modal LLMs for Testing Visual Deep Learning Systems Through the Lens of Image Mutation	Liwen Wang et.al.	2404.13945	null
2024-04-22	NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation	Chi Huang et.al.	2404.13921	null
2024-04-22	TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos	Atom Scott et.al.	2404.13868	null
2024-04-22	Toward Robust LiDAR based 3D Object Detection via Density-Aware Adaptive Thresholding	Eunho Lee et.al.	2404.13852	null
2024-04-21	A Nasal Cytology Dataset for Object Detection and Deep Learning	Mauro Camporeale et.al.	2404.13745	null
2024-04-23	Clio: Real-time Task-Driven Open-Set 3D Scene Graphs	Dominic Maggio et.al.	2404.13696	null
2024-04-20	FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving	Ganesh Sistu et.al.	2404.13443	null
2024-04-19	A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics	David Rapado-Rincon et.al.	2404.12963	null
2024-04-19	Language-Driven Active Learning for Diverse Open-Set 3D Object Detection	Ross Greer et.al.	2404.12856	null
2024-04-19	ECOR: Explainable CLIP for Object Recognition	Ali Rasekh et.al.	2404.12839	null
2024-04-19	A Point-Based Approach to Efficient LiDAR Multi-Task Perception	Christopher Lang et.al.	2404.12798	null
2024-04-19	ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation	Yu-Hsuan Ho et.al.	2404.12606	null
2024-04-18	The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models	Cheng Shi et.al.	2404.11957	link
2024-04-18	Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition	Xunsong Li et.al.	2404.11903	null
2024-04-17	TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation	Thomas Monninger et.al.	2404.11803	null
2024-04-17	Multimodal 3D Object Detection on Unseen Domains	Deepti Hegde et.al.	2404.11764	null
2024-04-17	Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection	Deepti Hegde et.al.	2404.11737	null
2024-04-17	Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems	Luca Bompani et.al.	2404.11488	link
2024-04-17	EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems	Meghana Tedla et.al.	2404.11411	null
2024-04-17	Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness	Hangtao Zhang et.al.	2404.11357	null
2024-04-17	Simple In-place Data Augmentation for Surveillance Object Detection	Munkh-Erdene Otgonbold et.al.	2404.11226	null
2024-04-17	Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions	Chuheng Wei et.al.	2404.11214	null
2024-04-17	GhostNetV3: Exploring the Training Strategies for Compact Models	Zhenhua Liu et.al.	2404.11202	null
2024-04-17	How to deal with glare for improved perception of Autonomous Vehicles	Muhammad Z. Alam et.al.	2404.10992	null
2024-04-17	Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection	Nawfal Guefrachi et.al.	2404.10978	null
2024-04-16	OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery	Matthew Inkawhich et.al.	2404.10865	null
2024-04-16	Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark	Jiangning Zhang et.al.	2404.10760	null
2024-04-16	Watch Your Step: Optimal Retrieval for Continual Learning at Scale	Truman Hickok et.al.	2404.10758	null
2024-04-16	Efficient optimal dispersed Haar-like filters for face detection	Zeinab Sedaghatjoo et.al.	2404.10476	null
2024-04-16	Camera clustering for scalable stream-based active distillation	Dani Manjah et.al.	2404.10411	null
2024-04-15	Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets	Dai Quoc Tran et.al.	2404.10078	link
2024-04-15	Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stress	Aswini Kumar Patra et.al.	2404.10073	null
2024-04-15	VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection	Bonan Ding et.al.	2404.09431	null
2024-04-14	TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model	Wiktor Mucha et.al.	2404.09254	null
2024-04-14	DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection	Lewei Yao et.al.	2404.09216	null
2024-04-14	Coreset Selection for Object Detection	Hojun Lee et.al.	2404.09161	null
2024-04-14	Fusion-Mamba for Cross-modality Object Detection	Wenhao Dong et.al.	2404.09146	null
2024-04-13	The Snake's Beating Heart? A Millisecond Pulsar Binary in the Galactic Center Radio Filament G359.1-0.2	Marcus E. Lower et.al.	2404.09098	null
2024-04-13	BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection	Jian Zhang et.al.	2404.08979	null
2024-04-13	Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage	Yang Hu et.al.	2404.08936	null
2024-04-12	Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation	Yanhao Zheng et.al.	2404.08603	link
2024-04-12	FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation	Riza Velioglu et.al.	2404.08582	null
2024-04-12	Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning	Girmaw Abebe Tadesse et.al.	2404.08544	null
2024-04-12	MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion	Zhe Li et.al.	2404.08406	null
2024-04-12	Overcoming Scene Context Constraints for Object Detection in wild using Defilters	Vamshi Krishna Kancharla et.al.	2404.08293	null
2024-04-11	ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model	Lifan Jiang et.al.	2404.07773	null
2024-04-11	Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification	Ricardo Pereira et.al.	2404.07739	null
2024-04-11	Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns	Hakan Yekta Yatbaz et.al.	2404.07685	null
2024-04-11	Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes	Poulami Sinhamahapatra et.al.	2404.07664	null
2024-04-11	Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method	Tashmoy Ghosh et.al.	2404.07649	null
2024-04-11	GLID: Pre-training a Generalist Encoder-Decoder Vision Model	Jihao Liu et.al.	2404.07603	null
2024-04-11	SFSORT: Scene Features-based Simple Online Real-Time Tracker	M. M. Morsali et.al.	2404.07553	link
2024-04-11	The Sydney Radio Star Catalogue: properties of radio stars at megahertz to gigahertz frequencies	Laura N. Driessen et.al.	2404.07418	null
2024-04-11	Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing	Jaemin Kang et.al.	2404.07405	null
2024-04-11	A fine-tuning workflow for automatic first-break picking with deep learning	Amir Mardan et.al.	2404.07400	link
2024-04-10	Identification of Fine-grained Systematic Errors via Controlled Scene Generation	Valentyn Boreiko et.al.	2404.07045	null
2024-04-10	Accurate Tennis Court Line Detection on Amateur Recorded Matches	Sameer Agrawal et.al.	2404.06977	null
2024-04-10	SARA: Smart AI Reading Assistant for Reading Comprehension	Enkeleda Thaqi et.al.	2404.06906	null
2024-04-10	Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data	Aakash Kumar et.al.	2404.06715	null
2024-04-10	Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting	Hao Lu et.al.	2404.06700	link
2024-04-09	Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping	Anas Gouda et.al.	2404.06277	null
2024-04-09	Label-Efficient 3D Object Detection For Road-Side Units	Minh-Quan Dao et.al.	2404.06256	null
2024-04-09	Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector	Bach Ha et.al.	2404.06219	null
2024-04-09	YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images	Chenguang Liu et.al.	2404.06180	null
2024-04-09	Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications	Huawei Sun et.al.	2404.06165	null
2024-04-09	Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation	Zong-Wei Hong et.al.	2404.06029	null
2024-04-08	Retrieval-Augmented Open-Vocabulary Object Detection	Jooyeon Kim et.al.	2404.05687	link
2024-04-08	3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules	Maxence Bideaux et.al.	2404.05641	null
2024-04-08	PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text?	Kseniia Petukhova et.al.	2404.05483	null
2024-04-08	Detecting Every Object from Events	Haitian Zhang et.al.	2404.05285	link
2024-04-08	MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues	Xiahan Chen et.al.	2404.05280	null
2024-04-08	Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes	Yu Sheng et.al.	2404.05164	null
2024-04-08	Better Monocular 3D Detectors with LiDAR from the Past	Yurong You et.al.	2404.05139	link
2024-04-07	AirShot: Efficient Few-Shot Detection for Autonomous Exploration	Zihan Wang et.al.	2404.05069	link
2024-04-07	PlateSegFL: A Privacy-Preserving License Plate Detection Using Federated Segmentation Learning	Md. Shahriar Rahman Anuvab et.al.	2404.05049	null
2024-04-07	PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot	Shenbagaraj Kannapiran et.al.	2404.05024	null
2024-04-05	SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers	Weile Li et.al.	2404.04179	link
2024-04-05	Designing Robots to Help Women	Martin Cooney et.al.	2404.04123	null
2024-04-04	Is CLIP the main roadblock for fine-grained open-world perception?	Lorenzo Bianchi et.al.	2404.03539	link
2024-04-04	DQ-DETR: DETR with Dynamic Query for Tiny Object Detection	Yi-Xin Huang et.al.	2404.03507	null
2024-04-05	A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data	Iqra Bano et.al.	2404.03493	null
2024-04-04	MonoCD: Monocular 3D Object Detection with Complementary Depths	Longfei Yan et.al.	2404.03181	link
2024-04-03	DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection	Felix Fent et.al.	2404.03015	null
2024-04-03	ALOHa: A New Measure for Hallucination in Captioning Models	Suzanne Petryk et.al.	2404.02904	null
2024-04-03	FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery	Safouane El Ghazouali et.al.	2404.02877	link
2024-04-03	HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras	Zhongyu Xia et.al.	2404.02517	link
2024-04-04	TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression	Ho-Joong Kim et.al.	2404.02405	null
2024-04-04	EGTR: Extracting Graph from Transformer for Scene Graph Generation	Jinbae Im et.al.	2404.02072	link
2024-04-03	Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection	Jicheng Yuan et.al.	2404.01988	link
2024-04-02	Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method	Jyun-An Lin et.al.	2404.01929	null
2024-04-02	Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack	Ying Zhou et.al.	2404.01907	link
2024-04-02	Scene Adaptive Sparse Transformer for Event-based Object Detection	Yansong Peng et.al.	2404.01882	link
2024-04-02	Semi-Supervised Domain Adaptation for Wildfire Detection	JooYoung Jang et.al.	2404.01842	null
2024-04-02	Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection	Tahira Shehzadi et.al.	2404.01819	null
2024-04-02	Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs	Ioanna Souvatzoglou et.al.	2404.01757	null
2024-04-02	Disentangled Pre-training for Human-Object Interaction Detection	Zhuolong Li et.al.	2404.01725	null
2024-04-02	Task Integration Distillation for Object Detectors	Hai Su et.al.	2404.01699	null
2024-03-29	PLoc: A New Evaluation Criterion Based on Physical Location for Autonomous Driving Datasets	Ruining Yang et.al.	2403.19893	null
2024-03-29	MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection	Ali Behrouz et.al.	2403.19888	null
2024-03-28	DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs	Donghyun Kim et.al.	2403.19588	link
2024-03-28	OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation	Zhenyu Wang et.al.	2403.19580	null
2024-03-28	AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4	Alexander Shirnin et.al.	2403.19354	null
2024-03-28	Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points	Tian Ma et.al.	2403.19306	null
2024-03-28	CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection	Mikhail Kennerley et.al.	2403.19278	link
2024-03-28	Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration	Louie Søs Meyer et.al.	2403.19174	null
2024-03-28	CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation	Lingjun Zhao et.al.	2403.19104	null
2024-03-28	A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement	Junjie Wen et.al.	2403.19079	null
2024-03-27	Illicit object detection in X-ray images using Vision Transformers	Jorgen Cani et.al.	2403.19043	null
2024-03-27	Benchmarking Object Detectors with COCO: A New Path Forward	Shweta Singh et.al.	2403.18819	link
2024-03-27	PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations	Ehsan Latif et.al.	2403.18721	null
2024-03-27	CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection	Jiayi Zhu et.al.	2403.18554	null
2024-03-27	BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection	Changshun Wu et.al.	2403.18373	null
2024-03-27	Ship in Sight: Diffusion Models for Ship-Image Super Resolution	Luigi Sigillo et.al.	2403.18370	link
2024-03-27	DODA: Diffusion for Object-detection Domain Adaptation in Agriculture	Shuai Xiang et.al.	2403.18334	null
2024-03-27	Tracking-Assisted Object Detection with Event Cameras	Ting-Kang Yen et.al.	2403.18330	null
2024-03-27	SGDM: Static-Guided Dynamic Module Make Stronger Visual Models	Wenjie Xing et.al.	2403.18282	null
2024-03-27	Road Obstacle Detection based on Unknown Objectness Scores	Chihiro Noguchi et.al.	2403.18207	null
2024-03-26	State of the art applications of deep learning within tracking and detecting marine debris: A survey	Zoe Moorton et.al.	2403.18067	null
2024-03-26	The Solution for the CVPR 2023 1st foundation model challenge-Track2	Haonan Xu et.al.	2403.17702	null
2024-03-26	PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition	Chenhongyi Yang et.al.	2403.17695	link
2024-03-26	UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps	Maciej K Wozniak et.al.	2403.17633	null
2024-03-26	SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter	Songbur Wong et.al.	2403.17390	null
2024-03-26	Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection	Jiacheng Zhang et.al.	2403.17387	null
2024-03-26	AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving	Mingfu Liang et.al.	2403.17373	null
2024-03-26	Staircase Localization for Autonomous Exploration in Urban Environments	Jinrae Kim et.al.	2403.17330	null
2024-03-25	Co-Occurring of Object Detection and Identification towards unlabeled object discovery	Binay Kumar Singh et.al.	2403.17223	null
2024-03-25	Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions	Ye Li et.al.	2403.17009	link
2024-03-25	Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance	Jingyuan Zhu et.al.	2403.16954	null
2024-03-25	TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques	Ashok Urlana et.al.	2403.16592	null
2024-03-25	RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection	Zhiwei Lin et.al.	2403.16440	link
2024-03-25	ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation	Hannah Schieber et.al.	2403.16400	null
2024-03-25	Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks	Madhumitha Sakthi et.al.	2403.16338	null
2024-03-24	Cross-domain Multi-modal Few-shot Object Detection via Rich Text	Zeyu Shangguan et.al.	2403.16188	null
2024-03-24	Semantic Is Enough: Only Semantic Information For NeRF Reconstruction	Ruibo Wang et.al.	2403.16043	null
2024-03-23	Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions	Kaiwen Wang et.al.	2403.15786	null
2024-03-23	EAGLE: A Domain Generalization Framework for AI-generated Text Detection	Amrita Bhattacharjee et.al.	2403.15690	null
2024-03-25	Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection	Hongzhi Gao et.al.	2403.15317	null
2024-03-22	CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking	Nicolas Baumann et.al.	2403.15313	null
2024-03-22	IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection	Junbo Yin et.al.	2403.15241	null
2024-03-22	MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection	Taeheon Kim et.al.	2403.15209	null
2024-03-22	SFOD: Spiking Fusion Object Detector	Yimeng Fan et.al.	2403.15192	link
2024-03-22	CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition	Shaowei Fu et.al.	2403.15183	null
2024-03-22	An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning	Víctor Toscano-Durán et.al.	2403.15150	null
2024-03-22	Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection	Jiaming Li et.al.	2403.15127	link
2024-03-22	VRSO: Visual-Centric Reconstruction for Static Object Annotation	Chenyao Yu et.al.	2403.15026	null
2024-03-22	Vehicle Detection Performance in Nordic Region	Hamam Mokayed et.al.	2403.15017	null
2024-03-21	T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy	Qing Jiang et.al.	2403.14610	link
2024-03-21	UAV-Assisted Maritime Search and Rescue: A Holistic Approach	Martin Messmer et.al.	2403.14281	null
2024-03-21	Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection	Tim Salzmann et.al.	2403.14270	null
2024-03-21	3D Object Detection from Point Cloud via Voting Step Diffusion	Haoran Hou et.al.	2403.14133	null
2024-03-20	EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration	Wenjun Huang et.al.	2403.14027	null
2024-03-20	RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition	Ziyu Liu et.al.	2403.13805	link
2024-03-20	Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments	Yang Yang et.al.	2403.13803	link
2024-03-20	Fostc3net: A Lightweight YOLOv5 Based On the Network Structure Optimization	Danqing Ma et.al.	2403.13703	null
2024-03-20	Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments	Djamahl Etchegaray et.al.	2403.13556	null
2024-03-20	MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining	Di Wang et.al.	2403.13430	link
2024-03-20	Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images	Jiawei Zhou et.al.	2403.13375	null
2024-03-20	Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection	Zhixin Lai et.al.	2403.13335	null
2024-03-20	DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception	Yibo Wang et.al.	2403.13304	null
2024-03-20	Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models	Huachuan Qiu et.al.	2403.13250	null
2024-03-19	SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model	Armen Avetisyan et.al.	2403.13064	null
2024-03-19	Wildfire danger prediction optimization with transfer learning	Spiros Maggioros et.al.	2403.12871	link
2024-03-19	As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?	Anjun Hu et.al.	2403.12693	null
2024-03-19	EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks	Ziming Wang et.al.	2403.12574	null
2024-03-19	DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM	Yixuan Wu et.al.	2403.12488	null
2024-03-19	TransformMix: Learning Transformation and Mixing Strategies from Data	Tsz-Him Cheung et.al.	2403.12429	null
2024-03-19	VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation	Hao Wang et.al.	2403.12415	null
2024-03-19	Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition	Jielin Qiu et.al.	2403.12339	null
2024-03-18	EffiPerception: an Efficient Framework for Various Perception Tasks	Xinhao Xiang et.al.	2403.12317	null
2024-03-18	Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D (Prototype of an Automatic Bidirectional People Counter Based on 3D Vision Sensors)	Benjamín Ojeda-Magaña et.al.	2403.12310	null
2024-03-18	Align and Distill: Unifying and Improving Domain Adaptive Object Detection	Justin Kay et.al.	2403.12029	link
2024-03-18	TrajectoryNAS: A Neural Architecture Search for Trajectory Prediction	Ali Asghar Sharifi et.al.	2403.11695	null
2024-03-18	Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem	Mincheol Chang et.al.	2403.11573	null
2024-03-18	R2SNet: Scalable Domain Adaptation for Object Detection in Cloud-Based Robots Ecosystems via Proposal Refinement	Michele Antonazzi et.al.	2403.11567	null
2024-03-18	Continual Forgetting for Pre-trained Vision Models	Hongbo Zhao et.al.	2403.11530	link
2024-03-17	V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions	Baolu Li et.al.	2403.11371	null
2024-03-17	Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning	Jesher Joshua M et.al.	2403.11291	null
2024-03-17	ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models	Siyuan Huang et.al.	2403.11289	null
2024-03-17	CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations	Yuwei Zhang et.al.	2403.11220	link
2024-03-17	GRA: Detecting Oriented Objects through Group-wise Rotating and Attention	Jiangshan Wang et.al.	2403.11127	null
2024-03-17	Self-supervised co-salient object detection via feature correspondence at multiple scales	Souradeep Chakraborty et.al.	2403.11107	link
2024-03-14	Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization	Zhao Wang et.al.	2403.09433	null
2024-03-14	D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection	Dinh Phat Do et.al.	2403.09359	link
2024-03-14	Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring	Yufei Zhan et.al.	2403.09333	link
2024-03-14	EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection	Jiaqing Zhang et.al.	2403.09323	link
2024-03-14	Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection	Martin Aubard et.al.	2403.09313	link
2024-03-14	MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion	Arul Selvam Periyasamy et.al.	2403.09309	null
2024-03-14	CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification	Yiming Ma et.al.	2403.09281	null
2024-03-14	D-YOLO a robust framework for object detection in adverse weather conditions	Zihan Chu et.al.	2403.09233	null
2024-03-14	Improving Distant 3D Object Detection Using 2D Box Supervision	Zetong Yang et.al.	2403.09230	null
2024-03-14	PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest	Jiajun Deng et.al.	2403.09212	null
2024-03-13	VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis	Enric Corona et.al.	2403.08764	null
2024-03-13	MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning	Jialv Zou et.al.	2403.08760	link
2024-03-13	Data Augmentation in Human-Centric Vision	Wentao Jiang et.al.	2403.08650	null
2024-03-13	PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections	Matteo Taiana et.al.	2403.08586	null
2024-03-13	A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product	Ao Xiang et.al.	2403.08511	null
2024-03-13	Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks	Zongqing Qi et.al.	2403.08499	null
2024-03-13	IAMCV Multi-Scenario Vehicle Interaction Dataset	Novel Certad et.al.	2403.08455	null
2024-03-13	Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks	Khondoker Murad Hossain et.al.	2403.08208	null
2024-03-12	TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection	Hanning Chen et.al.	2403.08108	null
2024-03-12	Aedes aegypti Egg Counting with Neural Networks for Object Detection	Micheli Nayara de Oliveira Vicente et.al.	2403.08016	null
2024-03-12	Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference	Changmin Jeon et.al.	2403.07598	null
2024-03-12	PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution	Honghao Chen et.al.	2403.07589	null
2024-03-12	A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions	Quoc-Vinh Lai-Dang et.al.	2403.07542	null
2024-03-12	JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection	Hanyu Zhou et.al.	2403.07436	null
2024-03-12	Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection	Jiahui Fu et.al.	2403.07372	null
2024-03-12	GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method	Zubair Qazi et.al.	2403.07321	link
2024-03-12	MENTOR: Multilingual tExt detectioN TOward leaRning by analogy	Hsin-Ju Lin et.al.	2403.07286	null
2024-03-12	SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection	Hongcheng Zhang et.al.	2403.07284	null
2024-03-12	Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction	Alexander Timans et.al.	2403.07263	null
2024-03-11	Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies	Nieves Crasto et.al.	2403.07113	link
2024-03-11	Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head	Tiancheng Zhao et.al.	2403.06892	null
2024-03-11	LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations	Mohammad Alkhalefi et.al.	2403.06813	null
2024-03-11	Genetic Learning for Designing Sim-to-Real Data Augmentations	Bram Vanherle et.al.	2403.06786	null
2024-03-11	Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings	Georgios Tsoumplekas et.al.	2403.06631	null
2024-03-11	Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers	Alexander H. Berger et.al.	2403.06601	null
2024-03-11	SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection	Yuxuan Li et.al.	2403.06534	link
2024-03-11	3D Semantic Segmentation-Driven Representations for 3D Object Detection	Hayeon O et.al.	2403.06501	null
2024-03-11	Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection	Konyul Park et.al.	2403.06433	null
2024-03-10	Transformer based Multitask Learning for Image Captioning and Object Detection	Debolena Basak et.al.	2403.06292	null
2024-03-10	Poly Kernel Inception Network for Remote Sensing Detection	Xinhao Cai et.al.	2403.06258	link
2024-03-08	EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV	Huiming Sun et.al.	2403.05422	null
2024-03-08	SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection	Yahao Lu et.al.	2403.05416	link
2024-03-08	Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery	Xavier Bou et.al.	2403.05381	null
2024-03-08	Frequency-Adaptive Dilated Convolution for Semantic Segmentation	Linwei Chen et.al.	2403.05369	link
2024-03-08	VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model	Junsu Kim et.al.	2403.05346	null
2024-03-08	Improving the Successful Robotic Grasp Detection Using Convolutional Neural Networks	Hamed Hosseini et.al.	2403.05211	null
2024-03-08	LanePtrNet: Revisiting Lane Detection as Point Voting and Grouping on Curves	Jiayan Cao et.al.	2403.05155	null
2024-03-08	RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features	Geonho Bang et.al.	2403.05061	null
2024-03-08	ActFormer: Scalable Collaborative Perception via Active Queries	Suozhi Huang et.al.	2403.04968	null
2024-03-07	FriendNet: Detection-Friendly Dehazing Network	Yihua Fan et.al.	2403.04443	null
2024-03-07	Effectiveness Assessment of Recent Large Vision-Language Models	Yao Jiang et.al.	2403.04306	null
2024-03-07	ACC-ViT: Atrous Convolution's Comeback in Vision Transformers	Nabil Ibtehaz et.al.	2403.04200	null
2024-03-07	CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images	Guanlin Shen et.al.	2403.04198	null
2024-03-07	Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models	Evelyn Mannix et.al.	2403.04125	null
2024-03-07	CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection	Gyusam Chang et.al.	2403.03721	null
2024-03-06	Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors	Kalibinuer Tiliwalidi et.al.	2403.03674	null
2024-03-06	Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts	Zijie Zeng et.al.	2403.03506	null
2024-03-06	Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator	Wonhyeok Choi et.al.	2403.03468	null
2024-03-06	FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion	Hao Wang et.al.	2403.03463	null
2024-03-06	Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection	Jiajia Li et.al.	2403.03390	link
2024-03-05	Detecting Concrete Visual Tokens for Multimodal Machine Translation	Braeden Bowen et.al.	2403.03075	null
2024-03-05	Loss Design for Single-carrier Joint Communication and Neural Network-based Sensing	Charlotte Muth et.al.	2403.02929	null
2024-03-05	Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud?	Chenqiang Gao et.al.	2403.02818	null
2024-03-05	Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery	Akram Zaytar et.al.	2403.02736	null
2024-03-05	FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View	Jiawei Hou et.al.	2403.02710	null
2024-03-05	False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy	Jiyong Oh et.al.	2403.02639	null
2024-03-05	BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection	Yu Chen et.al.	2403.02637	null
2024-03-04	NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function	Abdullah Nazhat Abdullah et.al.	2403.02411	link
2024-03-04	COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks	Zijian Huang et.al.	2403.02329	null
2024-03-04	Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving	Yuxuan Liu et.al.	2403.02037	link
2024-03-02	TUMTraf V2X Cooperative Perception Dataset	Walter Zimmer et.al.	2403.01316	null
2024-03-02	Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection	Taeheon Kim et.al.	2403.01300	null
2024-03-02	Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations	Hakan Yekta Yatbaz et.al.	2403.01172	null
2024-03-02	ELA: Efficient Local Attention for Deep Convolutional Neural Networks	Wei Xu et.al.	2403.01123	null
2024-03-02	Face Swap via Diffusion Model	Feifei Wang et.al.	2403.01108	null
2024-03-02	Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images	Shufan Pei et.al.	2403.01083	null
2024-03-01	Learning Causal Features for Incremental Object Detection	Zhenwei He et.al.	2403.00591	null
2024-03-01	Abductive Ego-View Accident Video Understanding for Safe Driving Perception	Jianwu Fang et.al.	2403.00436	null
2024-03-04	DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion	Junjie Guo et.al.	2403.00326	null
2024-03-01	ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting	Chen Duan et.al.	2403.00303	null
2024-02-29	SeMoLi: What Moves Together Belongs Together	Jenny Seidenschwarz et.al.	2402.19463	null
2024-02-29	Genie: Smart ROS-based Caching for Connected Autonomous Robots	Zexin Li et.al.	2402.19410	null
2024-02-29	ProtoP-OD: Explainable Object Detection with Prototypical Parts	Pavlos Rath-Manakidis et.al.	2402.19142	null
2024-02-29	Theoretically Achieving Continuous Representation of Oriented Bounding Boxes	Zikai Xiao et.al.	2402.18975	link
2024-02-29	Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching	Boxuan Zhang et.al.	2402.18958	null
2024-02-29	Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering	Xiang Chen et.al.	2402.18927	null
2024-02-29	A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection	Chao Hao et.al.	2402.18922	null
2024-02-29	Privacy-Preserving Autoencoder for Collaborative Object Detection	Bardia Azizian et.al.	2402.18864	null
2024-02-29	Debiased Novel Category Discovering and Localization	Juexiao Feng et.al.	2402.18821	null
2024-02-28	Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond	Ziyun Yang et.al.	2402.18698	null
2024-02-28	UniMODE: Unified Monocular 3D Object Detection	Zhuoling Li et.al.	2402.18573	null
2024-02-28	Detection of Micromobility Vehicles in Urban Traffic Videos	Khalil Sabri et.al.	2402.18503	link
2024-02-28	Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection	Xun Huang et.al.	2402.18493	null
2024-02-28	Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization	Deng Li et.al.	2402.18447	null
2024-02-28	Unveiling novel insights into Kirchhoff migration for effective object detection using experimental Fresnel dataset	Won-Kwang Park et.al.	2402.18322	null
2024-02-28	Zero-Shot Aerial Object Detection with Visual Description Regularization	Zhengqing Zang et.al.	2402.18233	null
2024-02-28	VulMCI: Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation	Tao Peng et.al.	2402.18189	null
2024-02-27	SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection	Junsu Kim et.al.	2402.17323	null
2024-02-27	A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track	Zehui Chen et.al.	2402.17319	null
2024-02-27	Probing Multimodal Large Language Models for Global and Local Semantic Representation	Mingxu Tao et.al.	2402.17304	null

(back to top)

Semantic Segmentation

Publish Date	Title	Authors	PDF	Code
2024-05-30	SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow	Chaoyang Wang et.al.	2405.20282	link
2024-05-30	MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion	Angel Villar-Corrales et.al.	2405.19921	link
2024-05-30	Open-Set Domain Adaptation for Semantic Segmentation	Seun-An Choe et.al.	2405.19899	link
2024-05-30	DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation	Ron Keuth et.al.	2405.19746	null
2024-05-30	Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes	Yong-Qiang Mao et.al.	2405.19735	null
2024-05-30	CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation	Ankush Gajanan Arudkar et.al.	2405.19672	null
2024-05-29	Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation	Lianlei Shan et.al.	2405.19568	null
2024-05-29	Enabling Visual Recognition at Radio Frequency	Haowen Lai et.al.	2405.19516	null
2024-05-29	Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models	Tianrun Chen et.al.	2405.19326	null
2024-05-29	A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation	Niclas Vödisch et.al.	2405.19035	link
2024-05-29	Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation	Zelin Peng et.al.	2405.18840	null
2024-05-29	FocSAM: Delving Deeply into Focused Objects in Segmenting Anything	You Huang et.al.	2405.18706	null
2024-05-28	Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation	JuneHyoung Kwon et.al.	2405.18148	null
2024-05-28	Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images	Lianlei Shan et.al.	2405.18078	null
2024-05-28	RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields	Mihnea-Bogdan Jurca et.al.	2405.18033	null
2024-05-28	DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture	Shentong Mo et.al.	2405.17995	null
2024-05-28	Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation	Yangxiao Lu et.al.	2405.17859	link
2024-05-28	The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention	Xingyu Ding et.al.	2405.17776	null
2024-05-27	Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation	Steven Landgraf et.al.	2405.17097	null
2024-05-27	DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking	Hongtao Wang et.al.	2405.16980	null
2024-05-27	Collective Perception Datasets for Autonomous Driving: A Comprehensive Review	Sven Teufel et.al.	2405.16973	null
2024-05-27	Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models	Qian Wang et.al.	2405.16947	null
2024-05-27	A re-calibration method for object detection with multi-modal alignment bias in autonomous driving	Zhihang Song et.al.	2405.16848	null
2024-05-26	Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning	Neha Kalibhat et.al.	2405.16401	null
2024-05-25	Video Prediction Models as General Visual Encoders	James Maier et.al.	2405.16382	null
2024-05-25	BOLD: Boolean Logic Deep Learning	Van Minh Nguyen et.al.	2405.16339	null
2024-05-25	Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation	Huizhou Chen et.al.	2405.16099	null
2024-05-25	Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality	Hakim Ikebayashi et.al.	2405.16008	null
2024-05-24	Visualize and Paint GAN Activations	Rudolf Herdt et.al.	2405.15636	null
2024-05-24	Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets	Hoàng-Ân Lê et.al.	2405.15394	null
2024-05-24	Autonomous Quilt Spreading for Caregiving Robots	Yuchun Guo et.al.	2405.15373	null
2024-05-24	U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation	Bingyu Li et.al.	2405.15365	link
2024-05-24	Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation	Jiayi Chen et.al.	2405.15265	null
2024-05-23	Mamba-R: Vision Mamba ALSO Needs Registers	Feng Wang et.al.	2405.14858	null
2024-05-23	Efficient Robot Learning for Perception and Mapping	Niclas Vödisch et.al.	2405.14688	null
2024-05-23	Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation	Daniel Kienzle et.al.	2405.14467	null
2024-05-23	MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	Jiuming Liu et.al.	2405.14338	null
2024-05-23	Tuning-free Universally-Supervised Semantic Segmentation	Xiaobo Yang et.al.	2405.14294	null
2024-05-23	SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation	Kai Yao et.al.	2405.14278	null
2024-05-23	Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations	Mohammed Baharoon et.al.	2405.14239	null
2024-05-23	Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification	Taylor Archibald et.al.	2405.14162	null
2024-05-23	Skip-SCAR: A Modular Approach to ObjectGoal Navigation with Sparsity and Adaptive Skips	Yaotian Liu et.al.	2405.14154	null
2024-05-22	TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System	Diogo Lavado et.al.	2405.13989	null
2024-05-21	Transparency Distortion Robustness for SOTA Image Segmentation Tasks	Volker Knauthe et.al.	2405.12864	null
2024-05-20	A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation	Sushmita Sarker et.al.	2405.11903	null
2024-05-20	Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments	Jooyong Park et.al.	2405.11855	null
2024-05-20	Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model	Mounes Zaval et.al.	2405.11837	null
2024-05-20	Universal Organizer of SAM for Unsupervised Semantic Segmentation	Tingting Li et.al.	2405.11742	null
2024-05-19	Interpreting a Semantic Segmentation Model for Coastline Detection	Conor O'Sullivan et.al.	2405.11500	null
2024-05-19	Unifying 3D Vision-Language Understanding via Promptable Queries	Ziyu Zhu et.al.	2405.11442	null
2024-05-18	PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking	Yifan Yang et.al.	2405.11257	null
2024-05-17	CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation	Mushui Liu et.al.	2405.10530	link
2024-05-16	4D Panoptic Scene Graph Generation	Jingkang Yang et.al.	2405.10305	link
2024-05-16	Towards Task-Compatible Compressible Representations	Anderson de Andrade et.al.	2405.10244	link
2024-05-16	DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data	Chengxiang Fan et.al.	2405.10185	link
2024-05-16	An Integrated Framework for Multi-Granular Explanation of Video Summarization	Konstantinos Tsigos et.al.	2405.10082	null
2024-05-16	A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance	Andrea Matteazzi et.al.	2405.10046	null
2024-05-16	Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation	Jihwan Kwak et.al.	2405.09858	null
2024-05-15	Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation	Guo Yachan et.al.	2405.09682	null
2024-05-14	CLIP with Quality Captions: A Strong Pretraining for Vision Tasks	Pavan Kumar Anasosalu Vasu et.al.	2405.08911	null
2024-05-14	Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study	Qinfeng Zhu et.al.	2405.08493	null
2024-05-14	TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection	Martín Bayón-Gutiérrez et.al.	2405.08429	link
2024-05-13	IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral Data	Ziyang Zhang et.al.	2405.07916	null
2024-05-13	PLUTO: Pathology-Universal Transformer	Dinkar Juyal et.al.	2405.07905	null
2024-05-12	PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification	Mohammad Shafiul Alam et.al.	2405.07332	link
2024-05-12	Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception	Haoming Chen et.al.	2405.07201	null
2024-05-11	Global Motion Understanding in Large-Scale Video Object Segmentation	Volodymyr Fedynyak et.al.	2405.07031	null
2024-05-10	GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs	Mustafa Munir et.al.	2405.06849	link
2024-05-10	Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach	Elham Ravanbakhsh et.al.	2405.06586	null
2024-05-10	Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation	Xiaowen Ma et.al.	2405.06525	link
2024-05-10	Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data	Yonghao Xu et.al.	2405.06502	null
2024-05-10	Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data	Rongyu Zhang et.al.	2405.06413	null
2024-05-10	Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation	Zhenliang Ni et.al.	2405.06228	link
2024-05-10	Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection	Koji Takeda et.al.	2405.06185	null
2024-05-10	Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging	Zhuchen Shao et.al.	2405.06175	null
2024-05-09	Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation	Yudian Zhang et.al.	2405.05830	null
2024-05-09	CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks	Nick et.al.	2405.05755	null
2024-05-08	OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies	Lingdong Kong et.al.	2405.05259	link
2024-05-08	Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving	Lingdong Kong et.al.	2405.05258	link
2024-05-08	Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information	Qi Lai et.al.	2405.04913	null
2024-05-08	DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery	Irene Alisjahbana et.al.	2405.04800	null
2024-05-07	A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images	László Kopácsi et.al.	2405.04650	null
2024-05-07	FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes	Charles Gaydon et.al.	2405.04634	link
2024-05-07	AugmenTory: A Fast and Flexible Polygon Augmentation Library	Tanaz Ghahremani et.al.	2405.04442	null
2024-05-07	A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields	Raiyan Rahman et.al.	2405.04305	null
2024-05-07	ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation	Zhibo Zhang et.al.	2405.04121	null
2024-05-07	Structured Click Control in Transformer-based Interactive Segmentation	Long Xu et.al.	2405.04009	link
2024-05-06	PTQ4SAM: Post-Training Quantization for Segment Anything	Chengtao Lv et.al.	2405.03144	link
2024-05-04	MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning	Vishal Nedungadi et.al.	2405.02771	null
2024-05-04	Few-Shot Fruit Segmentation via Transfer Learning	Jordan A. James et.al.	2405.02556	null
2024-05-03	Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation	Gabriel Fischer Abati et.al.	2405.02177	null
2024-05-03	Towards general deep-learning-based tree instance segmentation models	Jonathan Henrich et.al.	2405.02061	null
2024-05-03	DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model	Peijin Jia et.al.	2405.02008	null
2024-05-02	Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey	Guoping Xu et.al.	2405.01725	link
2024-05-02	Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey	Rokas Gipiškis et.al.	2405.01636	null
2024-05-02	CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation	Chenying Liu et.al.	2405.01217	null
2024-05-02	Uncertainty-aware self-training with expectation maximization basis transformation	Zijia Wang et.al.	2405.01175	null
2024-05-01	GraCo: Granularity-Controllable Interactive Segmentation	Yian Zhao et.al.	2405.00587	null
2024-05-01	Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis	Huy H. Nguyen et.al.	2405.00355	null
2024-04-30	Masked Multi-Query Slot Attention for Unsupervised Object Discovery	Rishav Pramanik et.al.	2404.19654	link
2024-04-30	UniFS: Universal Few-shot Instance Perception with Point Representations	Sheng Jin et.al.	2404.19401	null
2024-04-30	DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents	Taylor Archibald et.al.	2404.19259	null
2024-04-29	Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing	Leonardo Rossi et.al.	2404.18924	null
2024-04-29	IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation	Kebin Wu et.al.	2404.18891	null
2024-04-29	From Density to Geometry: YOLOv8 Instance Segmentation for Reverse Engineering of Optimized Structures	Thomas Rochefort-Beaudoin et.al.	2404.18763	null
2024-04-29	Towards Long-term Robotics in the Wild	Stephen Hausler et.al.	2404.18477	null
2024-04-29	Clicks2Line: Using Lines for Interactive Image Segmentation	Chaewon Lee et.al.	2404.18461	null
2024-04-29	MFP: Making Full Use of Probability Maps for Interactive Image Segmentation	Chaewon Lee et.al.	2404.18448	null
2024-04-28	Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet	Rikathi Pal et.al.	2404.18291	null
2024-04-28	Garbage Segmentation and Attribute Analysis by Robotic Dogs	Nuo Xu et.al.	2404.18112	null
2024-04-27	Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments	Benoît Gérin et.al.	2404.17930	link
2024-04-27	GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation	Ziya Ata Yazıcı et.al.	2404.17854	link
2024-04-26	Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment	Kazi Shahriar Sanjid et.al.	2404.17235	null
2024-04-25	Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation	Deepak Bhatia et.al.	2404.17083	null
2024-04-25	Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals	Oliver Hahn et.al.	2404.16818	link
2024-04-25	Self-Balanced R-CNN for Instance Segmentation	Leonardo Rossi et.al.	2404.16633	link
2024-04-26	Multi-Scale Representations by Varying Window Attention for Semantic Segmentation	Haotian Yan et.al.	2404.16573	link
2024-04-25	360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes	Xu Zheng et.al.	2404.16501	null
2024-04-25	Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models	Hedda Cohen Indelman et.al.	2404.16325	null
2024-04-25	Style Adaptation for Domain-adaptive Semantic Segmentation	Ting Li et.al.	2404.16301	null
2024-04-25	A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation	Yifan Zhao et.al.	2404.16266	link
2024-04-24	Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain	Kuan-I Chung et.al.	2404.16155	null
2024-04-24	3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking	Russell Buchanan et.al.	2404.15847	null
2024-04-24	Vision Transformer-based Adversarial Domain Adaptation	Yahan Li et.al.	2404.15817	link
2024-04-23	PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts	Hao Li et.al.	2404.15028	link
2024-04-23	Unknown Object Grasping for Assistive Robotics	Elle Miller et.al.	2404.15001	null
2024-04-22	Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery	Yuyang Sheng et.al.	2404.14040	link
2024-04-22	OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks	Sophia Sirko-Galouchenko et.al.	2404.14027	null
2024-04-22	PM-VIS: High-Performance Box-Supervised Video Instance Segmentation	Zhangjing Yang et.al.	2404.13863	null
2024-04-21	Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation	Guanlong Jiao et.al.	2404.13701	null
2024-04-21	PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images	Abhishek Jha et.al.	2404.13693	null
2024-04-21	A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments	Rui Pimentel de Figueiredo et.al.	2404.13691	null
2024-04-21	LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing	Tong Wang et.al.	2404.13659	null
2024-04-21	Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering	Ben Fei et.al.	2404.13619	null
2024-04-20	FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving	Ganesh Sistu et.al.	2404.13443	null
2024-04-20	AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation	Yang Yang et.al.	2404.13408	null
2024-04-19	Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture	Zarif Ahmed et.al.	2404.12986	null
2024-04-19	FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving	Xingtai Gui et.al.	2404.12867	null
2024-04-19	Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation	Yilong Chen et.al.	2404.12861	null
2024-04-19	COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images	Dmytro Shvetsov et.al.	2404.12832	link
2024-04-19	A Point-Based Approach to Efficient LiDAR Multi-Task Perception	Christopher Lang et.al.	2404.12798	null
2024-04-19	Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework	Zhuohong Li et.al.	2404.12721	link
2024-04-19	Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers	Hisashi Shimodaira et.al.	2404.12718	null
2024-04-19	Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models	Leonardo Barcellona et.al.	2404.12717	null
2024-04-18	Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds	Oliver Lemke et.al.	2404.12440	null
2024-04-18	A Perspective on Deep Vision Performance with Standard Image and Video Codecs	Christoph Reich et.al.	2404.12330	null
2024-04-18	Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery	Yona Falinie A. Gaus et.al.	2404.12285	null
2024-04-18	Deep Gaussian mixture model for unsupervised image segmentation	Matthias Schwab et.al.	2404.12252	null
2024-04-18	Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training	Jin Gao et.al.	2404.12210	link
2024-04-18	How to Benchmark Vision Foundation Models for Semantic Segmentation?	Tommie Kerssies et.al.	2404.12172	null
2024-04-17	Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding	George Retsinas et.al.	2404.12144	link
2024-04-18	Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation	Chongjie Si et.al.	2404.11981	null
2024-04-18	The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models	Cheng Shi et.al.	2404.11957	link
2024-04-18	Group-On: Boosting One-Shot Segmentation with Supportive Query	Hanjing Zhou et.al.	2404.11871	null
2024-04-17	Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach	Mir Rayat Imtiaz Hossain et.al.	2404.11732	null
2024-04-17	A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching	Francesco Pro et.al.	2404.11302	link
2024-04-17	Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images	Nikolaos Dionelis et.al.	2404.11299	link
2024-04-17	Criteria for Uncertainty-based Corner Cases Detection in Instance Segmentation	Florian Heidecker et.al.	2404.11266	null
2024-04-16	A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery	Ellianna Abrahams et.al.	2404.10927	link
2024-04-16	Vocabulary-free Image Classification and Semantic Segmentation	Alessandro Conti et.al.	2404.10864	link
2024-04-16	Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging	Toqi Tahamid Sarker et.al.	2404.10841	link
2024-04-16	Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark	Jiangning Zhang et.al.	2404.10760	null
2024-04-16	ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation	Iaroslav Melekhov et.al.	2404.10699	null
2024-04-16	Contextrast: Contextual Contrastive Learning for Semantic Segmentation	Changki Sung et.al.	2404.10633	null
2024-04-16	Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation	Aaron Kujawa et.al.	2404.10572	null
2024-04-16	LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System	Shijing Hu et.al.	2404.10498	null
2024-04-16	Adversarial Identity Injection for Semantic Face Image Synthesis	Giuseppe Tarollo et.al.	2404.10408	null
2024-04-16	Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation	Jiapeng Su et.al.	2404.10322	null
2024-04-16	Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain	Steve Andreas Immanuel et.al.	2404.10307	link
2024-04-15	NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer	Sai Kumar Reddy Manne et.al.	2404.10130	link
2024-04-15	Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL	Fangwei Zhong et.al.	2404.09857	null
2024-04-15	In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation	Han Xue et.al.	2404.09633	null
2024-04-15	The revenge of BiSeNet: Efficient Multi-Task Image Segmentation	Gabriele Rosi et.al.	2404.09570	null
2024-04-15	kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies	Zhongrui Gui et.al.	2404.09447	null
2024-04-15	Human-in-the-Loop Segmentation of Multi-species Coral Imagery	Scarlett Raine et.al.	2404.09406	null
2024-04-14	Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation	Jieyi Tan et.al.	2404.09292	null
2024-04-12	Structured Model Pruning for Efficient Inference in Computational Pathology	Mohammed Adnan et.al.	2404.08831	null
2024-04-12	COCONut: Modernizing COCO Segmentation	Xueqing Deng et.al.	2404.08639	null
2024-04-12	Benchmarking the Cell Image Segmentation Models Robustness under the Microscope Optical Aberrations	Boyuan Peng et.al.	2404.08549	null
2024-04-12	Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning	Girmaw Abebe Tadesse et.al.	2404.08544	null
2024-04-12	LaSagnA: Language-based Segmentation Assistant for Complex Queries	Cong Wei et.al.	2404.08506	link
2024-04-12	Adapting the Segment Anything Model During Usage in Novel Situations	Robin Schön et.al.	2404.08421	null
2024-04-12	Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering	Patrik Vacek et.al.	2404.08363	null
2024-04-12	AdaContour: Adaptive Contour Descriptor with Hierarchical Representation	Tianyu Ding et.al.	2404.08292	null
2024-04-12	Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation	Zhiwei Yang et.al.	2404.08195	link
2024-04-12	Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation	Sina Hajimiri et.al.	2404.08181	link
2024-04-11	Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification	Ricardo Pereira et.al.	2404.07739	null
2024-04-11	OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities	Lasse H. Hansen et.al.	2404.07711	link
2024-04-11	ViM-UNet: Vision Mamba for Biomedical Segmentation	Anwai Archit et.al.	2404.07705	link
2024-04-11	Implicit and Explicit Language Guidance for Diffusion-based Visual Perception	Hefeng Wang et.al.	2404.07600	null
2024-04-11	Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling	Sourajit Saha et.al.	2404.07410	null
2024-04-10	AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth	Rohan Reddy Mekala et.al.	2404.07306	null
2024-04-10	RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds	Remco Royen et.al.	2404.06863	null
2024-04-10	O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation	Muer Tie et.al.	2404.06836	null
2024-04-10	Convolution-based Probability Gradient Loss for Semantic Segmentation	Guohang Shan et.al.	2404.06704	null
2024-04-09	Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation	Luca Barsellotti et.al.	2404.06542	null
2024-04-09	QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding	Yash Mehan et.al.	2404.06442	null
2024-04-09	DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning	Senthil Yogamani et.al.	2404.06352	null
2024-04-09	Automated National Urban Map Extraction	Hasan Nasrallah et.al.	2404.06202	null
2024-04-09	Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation	Mariella Dreissig et.al.	2404.06124	null
2024-04-09	Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation	Zong-Wei Hong et.al.	2404.06029	null
2024-04-08	Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery	Ionut M. Motoi et.al.	2404.05693	null
2024-04-08	AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation	Jiannan Ge et.al.	2404.05667	null
2024-04-08	Impact of LiDAR visualisations on semantic segmentation of archaeological objects	Raveerat Jaturapitpornchai et.al.	2404.05512	null
2024-04-08	Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance	Dazhong Shen et.al.	2404.05384	link
2024-04-08	GPS-free Autonomous Navigation in Cluttered Tree Rows with Deep Semantic Segmentation	Alessandro Navone et.al.	2404.05338	null
2024-04-08	Human Detection from 4D Radar Data in Low-Visibility Field Conditions	Mikael Skog et.al.	2404.05307	null
2024-04-08	iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection	Nan Zhou et.al.	2404.05207	null
2024-04-08	UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather	Haimei Zhao et.al.	2404.05145	null
2024-04-07	D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation	Xuan Sun et.al.	2404.04807	null
2024-04-06	HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene	Ziang Guo et.al.	2404.04653	link
2024-04-05	Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation	Zifu Wan et.al.	2404.04256	null
2024-04-05	Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation	Ji-Jia Wu et.al.	2404.04231	null
2024-04-05	MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector	Junbo Li et.al.	2404.04155	null
2024-04-04	Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation	Elham Amin Mansour et.al.	2404.03799	null
2024-04-04	Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball	Simon Weber et.al.	2404.03778	null
2024-04-04	OW-VISCap: Open-World Video Instance Segmentation and Captioning	Anwesa Choudhuri et.al.	2404.03657	null
2024-04-04	Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation	Izumi Fujimori et.al.	2404.03394	null
2024-04-04	iSeg: Interactive 3D Segmentation via Interactive Attention	Itai Lang et.al.	2404.03219	null
2024-04-04	CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks	Beibei Wang et.al.	2404.03191	null
2024-04-03	GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation	Meher Niger et.al.	2404.02813	null
2024-04-03	RS-Mamba for Large Remote Sensing Image Dense Prediction	Sijie Zhao et.al.	2404.02668	link
2024-04-03	A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task	Eduardo Neto et.al.	2404.02659	null
2024-04-03	SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation	Junyan Ye et.al.	2404.02638	link
2024-04-03	Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation	Bart M. van Marrewijk et.al.	2404.02580	null
2024-04-03	HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras	Zhongyu Xia et.al.	2404.02517	link
2024-04-03	Optimizing traffic signs and lights visibility for the teleoperation of autonomous vehicles through ROI compression	I. Dror et.al.	2404.02481	null
2024-04-03	RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation	Xianping Ma et.al.	2404.02457	link
2024-04-02	Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs	Faraz Lotfi et.al.	2404.02294	null
2024-04-02	Segment Any 3D Object with Language	Seungjun Lee et.al.	2404.02157	null
2024-04-02	Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation	Hui Xiao et.al.	2404.02065	null
2024-04-01	What is Point Supervision Worth in Video Instance Segmentation?	Shuaiyi Huang et.al.	2404.01990	null
2024-04-02	Synthetic Data for Robust Stroke Segmentation	Liam Chalcroft et.al.	2404.01946	link
2024-04-02	Improving Bird's Eye View Semantic Segmentation by Task Decomposition	Tianhao Zhao et.al.	2404.01925	null
2024-04-02	Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods	Zdravko Marinov et.al.	2404.01816	null
2024-04-02	Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model	Qinfeng Zhu et.al.	2404.01705	null
2024-04-02	Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss	Jaeha Kim et.al.	2404.01692	null
2024-04-02	JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments	Duy-Tho Le et.al.	2404.01686	null
2024-04-01	SUGAR: Pre-training 3D Visual Representations for Robotics	Shizhe Chen et.al.	2404.01491	null
2024-03-29	ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning	Beomyoung Kim et.al.	2403.20126	link
2024-03-29	Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation	Qi Bi et.al.	2403.20092	null
2024-03-29	Using Images as Covariates: Measuring Curb Appeal with Deep Learning	Ardyn Nordstrom et.al.	2403.19915	null
2024-03-29	MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection	Ali Behrouz et.al.	2403.19888	null
2024-03-28	Segmentation Re-thinking Uncertainty Estimation Metrics for Semantic Segmentation	Qitian Ma et.al.	2403.19826	null
2024-04-01	Efficient 3D Instance Mapping and Localization with Neural Fields	George Tang et.al.	2403.19797	null
2024-03-28	ENet-21: An Optimized light CNN Structure for Lane Detection	Seyed Rasoul Hosseini et.al.	2403.19782	null
2024-03-29	Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers	Pingcheng Dong et.al.	2403.19591	link
2024-03-28	DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs	Donghyun Kim et.al.	2403.19588	link
2024-03-28	Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting	Weihao Jiang et.al.	2403.19213	null
2024-03-27	Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D	Mukund Varma T et.al.	2403.18922	null
2024-03-27	Annolid: Annotate, Segment, and Track Anything You Need	Chen Yang et.al.	2403.18690	null
2024-03-27	I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation	Ayoub Karine et.al.	2403.18490	null
2024-03-28	ViTAR: Vision Transformer with Any Resolution	Qihang Fan et.al.	2403.18361	null
2024-03-27	Generating Diverse Agricultural Data for Vision-Based Farming Applications	Mikolaj Cieslak et.al.	2403.18351	null
2024-03-27	Road Obstacle Detection based on Unknown Objectness Scores	Chihiro Noguchi et.al.	2403.18207	null
2024-03-26	Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer	Badri N. Patro et.al.	2403.18063	link
2024-03-26	The Need for Speed: Pruning Transformers with One Recipe	Samir Khaki et.al.	2403.17921	link
2024-03-26	Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation	Carlos Gomes et.al.	2403.17886	null
2024-03-26	PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition	Chenhongyi Yang et.al.	2403.17695	link
2024-03-26	Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion	Kazi Shahriar Sanjid et.al.	2403.17432	null
2024-03-25	Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions	Ye Li et.al.	2403.17009	link
2024-03-25	DreamLIP: Language-Image Pre-training with Long Captions	Kecheng Zheng et.al.	2403.17007	null
2024-03-25	TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation	Quang-Huy Che et.al.	2403.16958	null
2024-03-25	HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation	Linglin Jing et.al.	2403.16788	null
2024-03-25	Clustering Propagation for Universal Medical Image Segmentation	Yuhang Ding et.al.	2403.16646	null
2024-03-25	SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation	Aysim Toker et.al.	2403.16605	null
2024-03-25	Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes	Tianwei Zhang et.al.	2403.16499	null
2024-03-25	GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation	Weiming Zhang et.al.	2403.16370	null
2024-03-24	AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans	Cedric Perauer et.al.	2403.16318	null
2024-03-24	Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System	Jing Li et.al.	2403.16227	null
2024-03-24	Segment Anything Model for Road Network Graph Extraction	Congrui Hetang et.al.	2403.16051	link
2024-03-24	SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images	Yifei Wang et.al.	2403.16009	null
2024-03-22	Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting	Jun Guo et.al.	2403.15624	null
2024-03-22	A2DMN: Anatomy-Aware Dilated Multiscale Network for Breast Ultrasound Semantic Segmentation	Kyle Lucke et.al.	2403.15560	null
2024-03-22	InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding	Yi Wang et.al.	2403.15377	null
2024-03-22	Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations	Pranav Kulkarni et.al.	2403.15218	null
2024-03-22	Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion	Sofia Casarin et.al.	2403.15194	null
2024-03-22	IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence	Shreyas Chandgothia et.al.	2403.15089	null
2024-03-22	Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans	Heng Guo et.al.	2403.15063	null
2024-03-22	BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation	Jiahao Lu et.al.	2403.15019	null
2024-03-22	Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation	Wenlve Zhou et.al.	2403.14995	null
2024-03-21	WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather	Blake Gella et.al.	2403.14874	null
2024-03-21	PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model	Zheng Zhang et.al.	2403.14598	link
2024-03-21	Learning to Project for Cross-Task Knowledge Distillation	Dylan Auty et.al.	2403.14494	null
2024-03-21	OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation	Bohao Peng et.al.	2403.14418	link
2024-03-21	Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models	Pablo Marcos-Manchón et.al.	2403.14291	link
2024-03-21	OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation	Kwanyoung Kim et.al.	2403.14183	null
2024-03-21	Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference	Junyoung Kim et.al.	2403.14138	null
2024-03-21	Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling	Yong He et.al.	2403.14124	null
2024-03-21	Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots	Connor Lee et.al.	2403.14056	null
2024-03-20	When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather	Giulia Rizzoli et.al.	2403.13762	null
2024-03-20	Next day fire prediction via semantic segmentation	Konstantinos Alexis et.al.	2403.13545	null
2024-03-20	MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining	Di Wang et.al.	2403.13430	link
2024-03-20	AMCO: Adaptive Multimodal Coupling of Vision and Proprioception for Quadruped Robot Navigation in Outdoor Environments	Mohamed Elnoor et.al.	2403.13235	null
2024-03-20	Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation	Linshan Wu et.al.	2403.13225	null
2024-03-19	Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation	Kasi Viswanath et.al.	2403.13188	null
2024-03-19	As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks?	Anjun Hu et.al.	2403.12693	null
2024-03-19	PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation	Haruya Ishikawa et.al.	2403.12530	null
2024-03-19	Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation	Xu Zheng et.al.	2403.12505	null
2024-03-19	CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation	Wenqi Zhu et.al.	2403.12455	link
2024-03-19	Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter	Seunghyeon Lim et.al.	2403.12449	null
2024-03-18	EffiPerception: an Efficient Framework for Various Perception Tasks	Xinhao Xiang et.al.	2403.12317	null
2024-03-18	Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery	Yuqi Zhang et.al.	2403.11812	null
2024-03-18	Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation	Wangbo Zhao et.al.	2403.11808	null
2024-03-18	LSKNet: A Foundation Lightweight Backbone for Remote Sensing	Yuxuan Li et.al.	2403.11735	null
2024-03-18	TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models	Lisa Weijler et.al.	2403.11691	null
2024-03-18	Better (pseudo-)labels for semi-supervised instance segmentation	François Porcher et.al.	2403.11675	null
2024-03-18	Synthesizing multi-log grasp poses	Arvid Fälldin et.al.	2403.11623	null
2024-03-18	OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation	Seungbeom Woo et.al.	2403.11582	null
2024-03-18	MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation	Chih-Chung Hsu et.al.	2403.11576	null
2024-03-18	Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes	Chih-Chung Hsu et.al.	2403.11572	null
2024-03-18	Circle Representation for Medical Instance Object Segmentation	Juming Xiong et.al.	2403.11507	link
2024-03-18	MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception	Thien-Minh Nguyen et.al.	2403.11496	null
2024-03-18	Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting	Mingkui Tan et.al.	2403.11491	null
2024-03-18	ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation	Minh Tran et.al.	2403.11376	null
2024-03-14	PosSAM: Panoptic Open-vocabulary Segment Anything	Vibashan VS et.al.	2403.09620	null
2024-03-14	WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity	Qiyuan Wang et.al.	2403.09551	null
2024-03-14	Annotation Free Semantic Segmentation with Vision Foundation Models	Soroush Seifi et.al.	2403.09307	null
2024-03-14	StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images	Robert Jewsbury et.al.	2403.09302	link
2024-03-14	Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation	Hyung-Il Kim et.al.	2403.09199	null
2024-03-14	When Semantic Segmentation Meets Frequency Aliasing	Linwei Chen et.al.	2403.09065	link
2024-03-13	CART: Caltech Aerial RGB-Thermal Dataset in the Wild	Connor Lee et.al.	2403.08997	link
2024-03-13	SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net	Helin Cao et.al.	2403.08885	null
2024-03-13	Segmentation of Knee Bones for Osteoarthritis Assessment: A Comparative Analysis of Supervised, Few-Shot, and Zero-Shot Learning Approaches	Yun Xin Teoh et.al.	2403.08761	null
2024-03-13	Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution	Samuel Sze et.al.	2403.08748	null
2024-03-13	Semantic Segmentation of Solar Radio Spikes at Low Frequencies	Pearse C. Murphy et.al.	2403.08546	null
2024-03-13	Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation	Zicheng Zhang et.al.	2403.08426	null
2024-03-13	LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving	Sicen Guo et.al.	2403.08215	null
2024-03-13	Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks	Fuzhi Wu et.al.	2403.08157	link
2024-03-12	Mitigating the Impact of Attribute Editing on Face Recognition	Sudipta Banerjee et.al.	2403.08092	null
2024-03-12	Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation	Feilong Tang et.al.	2403.07630	link
2024-03-12	PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution	Honghao Chen et.al.	2403.07589	null
2024-03-12	Open-World Semantic Segmentation Including Class Similarity	Matteo Sodano et.al.	2403.07532	null
2024-03-11	Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation	Theodore Barfoot et.al.	2403.06759	link
2024-03-11	Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation	Bianca-Cerasela-Zelia Blaga et.al.	2403.06621	link
2024-03-11	OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation	Baran Ozaydin et.al.	2403.06546	null
2024-03-11	3D Semantic Segmentation-Driven Representations for 3D Object Detection	Hayeon O et.al.	2403.06501	link
2024-03-11	Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy	Jiuming Liu et.al.	2403.06467	link
2024-03-11	Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation	Xiaoyang Wang et.al.	2403.06462	null
2024-03-11	Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation	Peng Zhang et.al.	2403.06401	null
2024-03-10	Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning	Woo-Jin Ahn et.al.	2403.06122	link
2024-03-09	Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation	Hairong Shi et.al.	2403.05912	null
2024-03-09	Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration	Jingyun Xue et.al.	2403.05906	null
2024-03-08	Attention-guided Feature Distillation for Semantic Segmentation	Amir M. Mansourian et.al.	2403.05451	link
2024-03-08	Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation	Yu Han et.al.	2403.05388	null
2024-03-08	Frequency-Adaptive Dilated Convolution for Semantic Segmentation	Linwei Chen et.al.	2403.05369	link
2024-03-08	Embedded Deployment of Semantic Segmentation in Medicine through Low-Resolution Inputs	Erik Ostrowski et.al.	2403.05340	null
2024-03-08	LVIC: Multi-modality segmentation by Lifting Visual Info as Cue	Zichao Dong et.al.	2403.05159	null
2024-03-07	SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising	Tao Zhou et.al.	2403.04194	link
2024-03-06	ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation	Erik Brorsson et.al.	2403.03854	link
2024-03-06	Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision	Yajie Liu et.al.	2403.03707	null
2024-03-06	Causal Prototype-inspired Contrast Adaptation for Unsupervised Domain Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery	Jingru Zhu et.al.	2403.03704	null
2024-03-06	GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding	Zi-Ting Chou et.al.	2403.03608	null
2024-03-06	Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator	Wonhyeok Choi et.al.	2403.03468	null
2024-03-05	CenterDisks: Real-time instance segmentation with disk covering	Katia Jodogne-Del Litto et.al.	2403.03296	link
2024-03-05	Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection	Mohamed Afifi et.al.	2403.03111	null
2024-03-05	ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving	Han Lu et.al.	2403.02877	null
2024-03-05	DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation	Lingyan Ran et.al.	2403.02784	null
2024-03-05	Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels	Zhuohong Li et.al.	2403.02746	null
2024-03-05	FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View	Jiawei Hou et.al.	2403.02710	null
2024-03-05	Deep Common Feature Mining for Efficient Video Semantic Segmentation	Yaoyan Zheng et.al.	2403.02689	null
2024-03-04	Self-Supervised Facial Representation Learning with Facial Region Awareness	Zheng Gao et.al.	2403.02138	null
2024-03-04	Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey	Lingyan Ran et.al.	2403.01909	null
2024-03-04	Map-aided annotation for pole base detection	Benjamin Missaoui et.al.	2403.01868	null
2024-03-04	AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation	Haonan Wang et.al.	2403.01818	link
2024-03-02	Benchmarking Segmentation Models with Mask-Preserved Attribute Editing	Zijin Yin et.al.	2403.01231	link
2024-03-02	Boosting Box-supervised Instance Segmentation with Pseudo Depth	Xinyi Yu et.al.	2403.01214	null
2024-03-02	Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation	Lian Xu et.al.	2403.01156	null
2024-03-01	Rethinking Few-shot 3D Point Cloud Semantic Segmentation	Zhaochong An et.al.	2403.00592	link
2024-03-01	Small, Versatile and Mighty: A Range-View Perception Framework	Qiang Meng et.al.	2403.00325	null
2024-03-01	YOLO-MED: Multi-Task Interaction Network for Biomedical Images	Suizhi Huang et.al.	2403.00245	null
2024-02-29	FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything	Safouane El Ghazouali et.al.	2403.00175	link
2024-02-29	Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training?	Tiezheng Zhang et.al.	2402.19423	null
2024-03-01	PEM: Prototype-based Efficient MaskFormer for Image Segmentation	Niccolò Cavagnero et.al.	2402.19422	link
2024-02-29	RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation	Jie Zhang et.al.	2402.19004	null
2024-02-28	Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond	Ziyun Yang et.al.	2402.18698	null
2024-02-29	Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation	Zhiwei Yang et.al.	2402.18467	link
2024-02-29	A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation	Francesco Barbato et.al.	2402.18402	null
2024-02-28	Enhancing Roadway Safety: LiDAR-based Tree Clearance Analysis	Miriam Louise Carnot et.al.	2402.18309	null
2024-02-28	Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks	Joanne Lin et.al.	2402.18307	null
2024-02-28	Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis	Bashir Kazimi et.al.	2402.18286	null
2024-02-28	PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation	Haoyu Xie et.al.	2402.18117	null
2024-02-28	Spannotation: Enhancing Semantic Segmentation for Autonomous Navigation with Efficient Image Annotation	Samuel O. Folorunsho et.al.	2402.18084	link
2024-02-27	Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation	Xinyu Yang et.al.	2402.17891	link
2024-02-27	Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data	David S. W. Williams et.al.	2402.17653	null
2024-02-27	Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling	David S. W. Williams et.al.	2402.17622	null

(back to top)

Object Tracking

Publish Date	Title	Authors	PDF	Code
2024-05-30	WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark	Chunhui Zhang et.al.	2405.19818	null
2024-05-30	FaceLift: Semi-supervised 3D Facial Landmark Localization	David Ferman et.al.	2405.19646	null
2024-05-29	DGD: Dynamic 3D Gaussians Distillation	Isaac Labe et.al.	2405.19321	null
2024-05-28	Track Initialization and Re-Identification for 3D Multi-View Multi-Object Tracking	Linh Van Ma et.al.	2405.18606	link
2024-05-28	Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion	Hongze Sun et.al.	2405.17903	null
2024-05-28	Towards a Generalist and Blind RGB-X Tracker	Yuedong Tan et.al.	2405.17773	null
2024-05-29	BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos	Isla Duporge et.al.	2405.17698	null
2024-05-27	Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association	Tingwei Liu et.al.	2405.17323	null
2024-05-24	ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking	Xudong Han et.al.	2405.15755	null
2024-05-24	Trackastra: Transformer-based cell tracking for live-cell microscopy	Benjamin Gallusser et.al.	2405.15700	link
2024-05-24	An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking	Pratyusha Musunuru et.al.	2405.15137	null
2024-05-23	Awesome Multi-modal Object Tracking	Chunhui Zhang et.al.	2405.14200	null
2024-05-23	Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning	Zhenyu Wei et.al.	2405.14195	null
2024-05-23	PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking	Chongwei Liu et.al.	2405.14119	null
2024-05-22	Multi Player Tracking in Ice Hockey with Homographic Projections	Harish Prakash et.al.	2405.13397	null
2024-05-20	DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2405.12139	null
2024-05-19	Track Anything Rapter(TAR)	Tharun V. Puthanveettil et.al.	2405.11655	link
2024-05-19	RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud	Mohamed Nagy et.al.	2405.11536	null
2024-05-18	City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model	Yuqiang Lin et.al.	2405.11345	null
2024-05-17	Air Signing and Privacy-Preserving Signature Verification for Digital Documents	P. Sarveswarasarma et.al.	2405.10868	null
2024-05-16	A Novel Bounding Box Regression Method for Single Object Tracking	Omar Abdelaziz et.al.	2405.10444	null
2024-05-16	Beyond Traditional Single Object Tracking: A Survey	Omar Abdelaziz et.al.	2405.10439	null
2024-05-16	Spatial Cognition: a Wave Hypothesis	Robert Worden et.al.	2405.10112	null
2024-05-14	Learning Correspondence for Deformable Objects	Priya Sundaresan et.al.	2405.08996	null
2024-05-14	ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association	Shuxiao Ding et.al.	2405.08909	link
2024-05-12	MAML MOT: Multiple Object Tracking based on Meta-Learning	Jiayi Chen et.al.	2405.07272	null
2024-05-16	Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection	Anastasios Arsenos et.al.	2405.06765	null
2024-05-16	Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation	Vasileios Karampinis et.al.	2405.06749	null
2024-05-10	Multi-Object Tracking in the Dark	Xinzhe Wang et.al.	2405.06600	link
2024-05-09	Outlier-robust Kalman Filtering through Generalised Bayes	Gerardo Duran-Martin et.al.	2405.05646	link
2024-05-08	MOTLEE: Collaborative Multi-Object Tracking Using Temporal Consistency for Neighboring Robot Frame Alignment	Mason B. Peterson et.al.	2405.05210	link
2024-05-08	TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking	Pengcheng Shao et.al.	2405.05004	link
2024-05-07	DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving	Chen Min et.al.	2405.04390	null
2024-05-07	Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map	Yuxuan Xia et.al.	2405.04290	null
2024-05-06	Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors	Samreen Anjum et.al.	2405.03643	null
2024-05-03	Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning	Dhruva Tirumala et.al.	2405.02425	null
2024-05-03	DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos	Wen-Hsuan Chu et.al.	2405.02280	link
2024-05-02	Tracking and classifying objects with DAS data along railway	Simon L. B. Fredriksen et.al.	2405.01140	null
2024-04-29	Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform	Shimian Zhang et.al.	2404.18720	null
2024-04-27	3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints	Jiayin Deng et.al.	2404.17903	link
2024-04-22	360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos	Yinzhe Xu et.al.	2404.13953	null
2024-04-22	TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos	Atom Scott et.al.	2404.13868	null
2024-04-19	A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics	David Rapado-Rincon et.al.	2404.12963	null
2024-04-18	Inverse Neural Rendering for Explainable Multi-Object Tracking	Julian Ost et.al.	2404.12359	null
2024-04-24	On Target Detection in the Presence of Clutter in Joint Communication and Sensing Cellular Networks	Julia Vinogradova et.al.	2404.12133	null
2024-04-18	MLS-Track: Multilevel Semantic Interaction in RMOT	Zeliang Ma et.al.	2404.12031	null
2024-04-18	KnotResolver: Tracking self-intersecting filaments in microscopy using directed graphs	Dhruv Khatri et.al.	2404.12029	link
2024-04-17	How to deal with glare for improved perception of Autonomous Vehicles	Muhammad Z. Alam et.al.	2404.10992	null
2024-04-12	Into the Fog: Evaluating Multiple Object Tracking Robustness	Nadezda Kirillova et.al.	2404.10534	link
2024-04-15	3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow	Felix Taubner et.al.	2404.09819	null
2024-04-12	IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic	Chirag Parikh et.al.	2404.08561	null
2024-04-11	Gaga: Group Any Gaussians via 3D-aware Memory Bank	Weijie Lyu et.al.	2404.07977	null
2024-04-11	SFSORT: Scene Features-based Simple Online Real-Time Tracker	M. M. Morsali et.al.	2404.07553	link
2024-04-11	PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds	Weisheng Xu et.al.	2404.07495	link
2024-04-11	Trashbusters: Deep Learning Approach for Litter Detection and Tracking	Kashish Jain et.al.	2404.07467	null
2024-04-09	LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks	Jianlang Chen et.al.	2404.06247	link
2024-04-08	DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker	Jiapeng Wu et.al.	2404.05518	link
2024-04-08	Self-Supervised Multi-Object Tracking with Path Consistency	Zijia Lu et.al.	2404.05136	link
2024-04-07	Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind	Chiara Plizzari et.al.	2404.05072	null
2024-04-03	Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking	Navid Mahdian et.al.	2404.03110	link
2024-04-03	Representation Alignment Contrastive Regularization for Multi-Object Tracking	Shujie Chen et.al.	2404.02562	link
2024-03-29	Bayesian Nonparametrics: An Alternative to Deep Learning	Bahman Moraffah et.al.	2404.00085	null
2024-03-29	MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark	Sanghyun Woo et.al.	2403.20225	null
2024-03-29	SceneTracker: Long-term Scene Flow Estimation Network	Bo Wang et.al.	2403.19924	null
2024-03-27	Enhancing Multiple Object Tracking Accuracy via Quantum Annealing	Yasuyuki Ihara et.al.	2403.18908	null
2024-03-27	TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes	Liangyu Xu et.al.	2403.18238	null
2024-03-27	Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking	Qiming Wang et.al.	2403.18193	null
2024-03-26	OmniVid: A Generative Framework for Universal Video Understanding	Junke Wang et.al.	2403.17935	link
2024-03-26	Exploring Dynamic Transformer for Efficient Object Tracking	Jiawen Zhu et.al.	2403.17651	null
2024-03-25	Multiple Object Tracking as ID Prediction	Ruopeng Gao et.al.	2403.16848	link
2024-03-25	From Two Stream to One Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation	Yang Luo et.al.	2403.16834	null
2024-03-29	Elysium: Exploring Object-level Perception in Videos via MLLM	Han Wang et.al.	2403.16558	link
2024-03-25	Spike-NeRF: Neural Radiance Field Based On Spike Camera	Yijia Guo et.al.	2403.16410	null
2024-03-28	SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking	Xiaojun Hou et.al.	2403.16002	link
2024-03-23	Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking	Shaoyu Sun et.al.	2403.15831	null
2024-03-23	PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search	Chensheng Peng et.al.	2403.15712	link
2024-03-22	CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking	Nicolas Baumann et.al.	2403.15313	null
2024-03-22	Reasoning-Enhanced Object-Centric Learning for Videos	Jian Li et.al.	2403.15245	null
2024-03-20	Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking	Xiaoyu Li et.al.	2403.13443	link
2024-03-19	Lifting Multi-View Detection and Tracking to the Bird's Eye View	Torben Teepe et.al.	2403.12573	link
2024-03-18	Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model	Jan Krejčí et.al.	2403.11978	null
2024-03-17	NetTrack: Tracking Highly Dynamic Objects with a Net	Guangze Zheng et.al.	2403.11186	null
2024-03-16	View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV	Deyi Ji et.al.	2403.10830	null
2024-03-16	Exploring Learning-based Motion Models in Multi-Object Tracking	Hsiang-Wei Huang et.al.	2403.10826	null
2024-03-15	NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices	Zhiyong Zhang et.al.	2403.10425	link
2024-03-14	OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning	Lingyi Hong et.al.	2403.09634	null
2024-03-13	Object Permanence Filter for Robust Tracking with Interactive Robots	Shaoting Peng et.al.	2403.08231	null
2024-03-12	Learning Data Association for Multi-Object Tracking using Only Coordinates	Mehdi Miah et.al.	2403.08018	null
2024-03-12	A Study on Centralised and Decentralised Swarm Robotics Architecture for Part Delivery System	Angelos Dimakos et.al.	2403.07635	null
2024-03-12	LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association	Guanhua Ding et.al.	2403.06423	null
2024-03-09	SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking	Hanzheng Wang et.al.	2403.05852	null
2024-03-09	Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline	Xiao Wang et.al.	2403.05839	link
2024-03-11	Beyond MOT: Semantic Multi-Object Tracking	Yunhao Li et.al.	2403.05021	null
2024-03-07	Delving into the Trajectory Long-tail Distribution for Multi-object Tracking	Sijia Chen et.al.	2403.04700	link
2024-03-07	Towards learning-based planning: The nuPlan benchmark for real-world autonomous driving	Napat Karnchanachari et.al.	2403.04133	null
2024-03-06	Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving	Riccardo Pieroni et.al.	2403.04112	null
2024-03-06	VastTrack: Vast Category Visual Object Tracking	Liang Peng et.al.	2403.03493	link
2024-03-05	DeconfuseTrack: Dealing with Confusion for Multi-Object Tracking	Cheng Huang et.al.	2403.02767	null
2024-03-04	DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction	Weiyi Lv et.al.	2403.02075	null
2024-03-04	Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning	Tung Le et.al.	2403.01781	null
2024-03-01	Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor	Junlin Song et.al.	2403.00976	null
2024-02-28	Estimation of railway vehicle response for track geometry evaluation using branch Fourier neural operator	Qingjing Wang et.al.	2402.18366	null
2024-02-28	EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving	Jiacheng Lin et.al.	2402.18302	link
2024-02-28	Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks	Zhewei Wu et.al.	2402.17976	null
2024-02-27	SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking	Sandro Papais et.al.	2402.17892	null
2024-02-27	In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking	Peng Gao et.al.	2402.17098	null
2024-02-26	Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking	Peng Gao et.al.	2402.16570	null
2024-02-26	SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking	Yu Lin et.al.	2402.16249	null
2024-02-26	Real-Time Vehicle Detection and Urban Traffic Behavior Analysis Based on UAV Traffic Videos on Mobile Devices	Yuan Zhu et.al.	2402.16246	null
2024-02-24	Multi-Object Tracking by Hierarchical Visual Representations	Jinkun Cao et.al.	2402.15895	null
2024-02-24	Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited	Lingji Chen et.al.	2402.15756	null

(back to top)

Action Recognition

Publish Date	Title	Authors	PDF	Code
2024-05-30	From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave	Michael Fuchs et.al.	2405.20025	null
2024-05-30	Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition	Masashi Hatano et.al.	2405.19917	null
2024-05-30	EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos	Ryo Fujii et.al.	2405.19644	null
2024-05-30	SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation	Junjie Zhang et.al.	2405.19586	null
2024-05-29	Matrix Manifold Neural Networks++	Xuan Son Nguyen et.al.	2405.19206	null
2024-05-29	Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation	Sabrina Cynthia Triess et.al.	2405.19173	null
2024-05-28	Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition	Muhammad Adi Nugroho et.al.	2405.18012	null
2024-05-30	Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences	Vida Adeli et.al.	2405.17817	link
2024-05-28	Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions	Rui Zhang et.al.	2405.17729	null
2024-05-28	EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions?	Boshen Xu et.al.	2405.17719	link
2024-05-27	Advancements in Tactile Hand Gesture Recognition for Enhanced Human-Machine Interaction	Chiara Fumelli et.al.	2405.17038	null
2024-05-27	A Cross-Dataset Study for Text-based 3D Human Motion Retrieval	Léore Bensabath et.al.	2405.16909	null
2024-05-26	Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception	Shuangpeng Han et.al.	2405.16493	null
2024-05-25	Application of Artificial Intelligence in Hand Gesture Recognition with Virtual Reality: Survey and Analysis of Hand Gesture Hardware Selection	Jindi Wang et.al.	2405.16264	null
2024-05-22	From CNNs to Transformers in Multimodal Human Action Recognition: A Survey	Muhammad Bilal Shaikh et.al.	2405.15813	null
2024-05-24	V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM	Abdur Rahman et.al.	2405.15341	null
2024-05-23	Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks	Xuanle Zhao et.al.	2405.14504	null
2024-05-23	SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network	Weiyu Guo et.al.	2405.14398	null
2024-05-23	MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models	Jiuming Liu et.al.	2405.14338	null
2024-05-22	Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks	Mohit Prabhushankar et.al.	2405.13758	null
2024-05-21	Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding	Rong Gao et.al.	2405.13206	null
2024-05-22	Building Temporal Kernels with Orthogonal Polynomials	Yan Ru Pei et.al.	2405.12179	link
2024-05-18	GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition	Mallika Garg et.al.	2405.11180	link
2024-05-17	Air Signing and Privacy-Preserving Signature Verification for Digital Documents	P. Sarveswarasarma et.al.	2405.10868	null
2024-05-17	MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains	Zhaohuan Zhan et.al.	2405.10620	null
2024-05-06	MEET: Mixture of Experts Extra Tree-Based sEMG Hand Gesture Identification	Naveen Gehlot et.al.	2405.09562	null
2024-05-14	Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation	Riyad Bin Rafiq et.al.	2405.08969	link
2024-05-14	The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks	Carmela Calabrese et.al.	2405.08695	null
2024-05-15	POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning	Chang Huang et.al.	2405.08036	null
2024-05-13	Coarse or Fine? Recognising Action End States without Labels	Davide Moltisanti et.al.	2405.07723	link
2024-05-11	PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition	Shenglin He et.al.	2405.06929	null
2024-05-10	CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras	James Tang et.al.	2405.06845	link
2024-05-09	A Survey on Backbones for Deep Video Action Recognition	Zixuan Tang et.al.	2405.05584	null
2024-05-06	OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs	Jiahao Nick Li et.al.	2405.03901	null
2024-05-05	JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos	Pietro Nardelli et.al.	2405.02961	null
2024-05-03	On the Utility of External Agent Intention Predictor for Human-AI Coordination	Chenxu Wang et.al.	2405.02229	null
2024-05-11	MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition	Hongyu Qu et.al.	2405.02077	null
2024-05-03	Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning	Deng Li et.al.	2405.01885	link
2024-05-02	Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy	Hoang-Quan Nguyen et.al.	2405.01337	null
2024-05-07	Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration	Praveen Kumar Chandaliya et.al.	2405.01273	null
2024-04-30	One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features	Trung Thanh Nguyen et.al.	2404.19542	link
2024-04-30	Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition	Zhendong Liu et.al.	2404.19383	null
2024-04-28	Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation	Cuiwei Liu et.al.	2404.18206	null
2024-04-26	SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes	Georgia Baltsou et.al.	2404.17255	null
2024-04-25	Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition	Yu Wang et.al.	2404.16416	null
2024-04-25	An Improved Graph Pooling Network for Skeleton-Based Action Recognition	Cong Wu et.al.	2404.16359	null
2024-04-24	Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition	Hymalai Bello et.al.	2404.16005	null
2024-04-24	3D Face Morphing Attack Generation using Non-Rigid Registration	Jag Mohan Singh et.al.	2404.15765	null
2024-04-25	HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition	Jinfu Liu et.al.	2404.15719	link
2024-04-23	Combating Missing Modalities in Egocentric Videos at Test Time	Merey Ramazanova et.al.	2404.15161	null
2024-04-23	G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition	Kaikai Deng et.al.	2404.14934	null
2024-04-23	Driver Activity Classification Using Generalizable Representations from Vision-Language Models	Ross Greer et.al.	2404.14906	null
2024-04-23	DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition	Haozhe Cheng et.al.	2404.14890	null
2024-04-22	1st Place Solution to the 1st SkatingVerse Challenge	Tao Sun et.al.	2404.14032	null
2024-04-22	CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment	Kanglei Zhou et.al.	2404.13999	link
2024-04-21	Attack on Scene Flow using Point Clouds	Haniyeh Ehsani Oskouie et.al.	2404.13621	null
2024-04-20	STAT: Towards Generalizable Temporal Action Localization	Yangcen Liu et.al.	2404.13311	null
2024-04-19	Ring-a-Pose: A Ring for Continuous Hand Pose Tracking	Tianhong Catherine Yu et.al.	2404.12980	null
2024-04-19	VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection	Raghavendra Ramachandra et.al.	2404.12680	null
2024-04-18	DeepLocalization: Using change point detection for Temporal Action Localization	Mohammed Shaiqur Rahman et.al.	2404.12258	null
2024-04-18	Aligning Actions and Walking to LLM-Generated Textual Descriptions	Radu Chivereanu et.al.	2404.12192	link
2024-04-18	Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition	Xunsong Li et.al.	2404.11903	null
2024-04-18	sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model	Xiupeng Qiao et.al.	2404.11861	null
2024-04-17	VG4D: Vision-Language Model Goes 4D Video Recognition	Zhichao Deng et.al.	2404.11605	link
2024-04-17	A Data-Driven Representation for Sign Language Production	Harry Walsh et.al.	2404.11499	link
2024-04-17	Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network	Yongkai Ma et.al.	2404.11383	null
2024-04-17	Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis	Weiyu Guo et.al.	2404.11213	null
2024-04-17	Kathakali Hand Gesture Recognition With Minimal Data	Kavitha Raju et.al.	2404.11205	null
2024-04-16	HumMUSS: Human Motion Understanding using State Space Models	Arnab Kumar Mondal et.al.	2404.10880	null
2024-04-17	Learning to Score Sign Language with Two-stage Method	Hongli Wen et.al.	2404.10383	null
2024-04-16	MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition	Naichuan Zheng et.al.	2404.10210	null
2024-04-15	Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition	Masato Tamura et.al.	2404.09964	null
2024-04-15	A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance	Eran Bamani et.al.	2404.09846	null
2024-04-15	Leveraging Temporal Contextualization for Video Action Recognition	Minji Kim et.al.	2404.09490	null
2024-04-14	In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Wiktor Mucha et.al.	2404.09308	null
2024-04-13	Exploring Explainability in Video Action Recognition	Avinab Saha et.al.	2404.09067	null
2024-04-12	MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition	Linhuang Wang et.al.	2404.08433	null
2024-04-11	Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls	Amin Hosseiny Marani et.al.	2404.08155	null
2024-04-11	Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos	Soumyabrata Chaudhuri et.al.	2404.07645	null
2024-04-15	Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition	Yang Chen et.al.	2404.07487	null
2024-04-10	O-TALC: Steps Towards Combating Oversegmentation within Online Action Segmentation	Matthew Kent Myers et.al.	2404.06894	null
2024-04-10	An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video	Xingyu Song et.al.	2404.06741	null
2024-04-07	X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model	Jan Held et.al.	2404.06332	null
2024-04-10	Algorithms for Caching and MTS with reduced number of predictions	Karim Abdel Sadek et.al.	2404.06280	null
2024-04-09	ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos	Sharana Dharshikgan Suresh Dass et.al.	2404.06243	link
2024-04-08	Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder	Halil Ismail Helvaci et.al.	2404.05849	null
2024-04-09	TIM: A Time Interval Machine for Audio-Visual Action Recognition	Jacob Chalk et.al.	2404.05559	link
2024-04-11	Test-Time Zero-Shot Temporal Action Localization	Benedetta Liberatori et.al.	2404.05426	link
2024-04-09	SDFR: Synthetic Data for Face Recognition Competition	Hatef Otroshi Shahreza et.al.	2404.04580	null
2024-04-05	PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos	Yufei Zhang et.al.	2404.04430	null
2024-04-05	Koala: Key frame-conditioned long video-LLM	Reuben Tan et.al.	2404.04346	null
2024-04-04	UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization	Tiantian Geng et.al.	2404.03179	null
2024-04-03	Optimizing the Deployment of Tiny Transformers on Low-Power MCUs	Victor J. B. Jung et.al.	2404.02945	link
2024-04-03	Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition	Ikuo Nakamura et.al.	2404.02624	null
2024-04-02	PREGO: online mistake detection in PRocedural EGOcentric videos	Alessandro Flaborea et.al.	2404.01933	link
2024-04-02	Disentangled Pre-training for Human-Object Interaction Detection	Zhuolong Li et.al.	2404.01725	link
2024-04-02	Language Model Guided Interpretable Video Action Reasoning	Ning Wang et.al.	2404.01591	null
2024-04-02	Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery	Christian Limberg et.al.	2404.01571	null
2024-04-01	LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization	Akshita Gupta et.al.	2404.01282	null
2024-03-31	LLMs are Good Action Recognizers	Haoxuan Qu et.al.	2404.00532	null
2024-03-29	Latent Embedding Clustering for Occlusion Robust Head Pose Estimation	José Celestino et.al.	2403.20251	null
2024-03-29	A Unified Framework for Human-centric Point Cloud Video Understanding	Yiteng Xu et.al.	2403.20031	null
2024-03-28	Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition	Mingxing Rao et.al.	2403.19786	link
2024-03-28	Hypergraph-based Multi-View Action Recognition using Event Cameras	Yue Gao et.al.	2403.19316	null
2024-03-27	PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization	Edward Fish et.al.	2403.18915	null
2024-03-27	iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing	Mengxi Liu et.al.	2403.18433	null
2024-03-27	An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition	Yizhang Xia et.al.	2403.18208	null
2024-03-26	OmniVid: A Generative Framework for Universal Video Understanding	Junke Wang et.al.	2403.17935	link
2024-03-25	Understanding Long Videos in One Multimodal Language Model Pass	Kanchana Ranasinghe et.al.	2403.16998	link
2024-03-25	Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects	Zicong Fan et.al.	2403.16428	null
2024-03-24	Emotion Recognition from the perspective of Activity Recognition	Savinay Nagendra et.al.	2403.16263	null
2024-03-22	InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding	Yi Wang et.al.	2403.15377	link
2024-03-22	Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications	Vít Krátký et.al.	2403.15333	null
2024-03-22	GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition	Lei Jiang et.al.	2403.15212	link
2024-03-21	Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets	Ahmet Alp Kindiroglu et.al.	2403.14534	link
2024-03-20	Hierarchical NeuroSymbolic Approach for Action Quality Assessment	Lauren Okamoto et.al.	2403.13798	null
2024-03-19	Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition	Filip Ilic et.al.	2403.12710	null
2024-03-19	ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More	Jiazhou Zhou et.al.	2403.12534	null
2024-03-19	VideoBadminton: A Video Dataset for Badminton Action Recognition	Qi Li et.al.	2403.12385	null
2024-03-19	Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception	Vijay John et.al.	2403.11616	null
2024-03-19	VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation	Weiyao Wang et.al.	2403.11461	null
2024-03-17	A Lie Group Approach to Riemannian Batch Normalization	Ziheng Chen et.al.	2403.11261	link
2024-03-17	Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes	Kun Xia et.al.	2403.11189	null
2024-03-16	CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing	Yin Li et.al.	2403.10796	null
2024-03-15	CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner	Tingbing Yan et.al.	2403.10082	null
2024-03-15	Skeleton-Based Human Action Recognition with Noisy Labels	Yi Xu et.al.	2403.09975	null
2024-03-14	On the Utility of 3D Hand Poses for Action Recognition	Md Salman Shamil et.al.	2403.09805	null
2024-03-14	3D-VLA: A 3D Vision-Language-Action Generative World Model	Haoyu Zhen et.al.	2403.09631	null
2024-03-14	SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition	Jeonghyeok Do et.al.	2403.09508	link
2024-03-14	EventRPG: Event Data Augmentation with Relevance Propagation Guidance	Mingyuan Sun et.al.	2403.09274	link
2024-03-14	Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines	Liang Wu et.al.	2403.09056	null
2024-03-13	Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models	Wensheng Liang et.al.	2403.08420	null
2024-03-13	NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation	Ran Xu et.al.	2403.08355	null
2024-03-13	ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation	Guanxing Lu et.al.	2403.08321	null
2024-03-12	NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning	Bingqian Lin et.al.	2403.07376	link
2024-03-12	BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Training	Qihang Fang et.al.	2403.07354	null
2024-03-11	Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling	Wele Gedara Chaminda Bandara et.al.	2403.06978	link
2024-03-11	Deep Learning Approaches for Human Action Recognition in Video Data	Yufei Xie et.al.	2403.06810	null
2024-03-11	Real-Time Multimodal Cognitive Assistant for Emergency Medical Services	Keshara Weerasinghe et.al.	2403.06734	null
2024-03-11	Multimodal Transformers for Real-Time Surgical Activity Prediction	Keshara Weerasinghe et.al.	2403.06705	link
2024-03-11	epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition	Batuhan Cengiz et.al.	2403.06661	null
2024-03-11	Density-Guided Label Smoothing for Temporal Localization of Driving Actions	Tunc Alkanat et.al.	2403.06616	null
2024-03-11	Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition	Erkut Akdag et.al.	2403.06577	null
2024-03-10	Coherent Temporal Synthesis for Incremental Action Segmentation	Guodong Ding et.al.	2403.06102	null
2024-03-09	Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence	Marcel Hussing et.al.	2403.05996	null
2024-03-08	Benchmarking Micro-action Recognition: Dataset, Methods, and Applications	Dan Guo et.al.	2403.05234	link
2024-03-06	Video Relationship Detection Using Mixture of Experts	Ala Shaabana et.al.	2403.03994	null
2024-03-05	Behavior Generation with Latent Actions	Seungjae Lee et.al.	2403.03181	link
2024-03-05	Learning to Use Tools via Cooperative and Interactive Agents	Zhengliang Shi et.al.	2403.03031	null
2024-03-04	Gesture recognition with Brownian reservoir computing using geometrically confined skyrmion dynamics	Grischa Beneke et.al.	2403.01877	null
2024-03-04	A Simple Baseline for Efficient Hand Mesh Reconstruction	Zhishan Zhou et.al.	2403.01813	null
2024-03-03	A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation	Yuxiang Huang et.al.	2403.01606	null
2024-03-03	Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition	Kun-Yu Lin et.al.	2403.01560	link
2024-03-02	Dynamic 3D Point Cloud Sequences as 2D Videos	Yiming Zeng et.al.	2403.01129	null
2024-02-29	On the Design of Human-Robot Collaboration Gestures	Anas Shrinah et.al.	2402.19058	null
2024-02-23	Multimodal Transformer With a Low-Computational-Cost Guarantee	Sungjin Park et.al.	2402.15096	null
2024-02-17	Implementation of a Model of the Cortex Basal Ganglia Loop	Naoya Arakawa et.al.	2402.13275	null
2024-02-20	Radar-Based Recognition of Static Hand Gestures in American Sign Language	Christian Schuessler et.al.	2402.12800	null
2024-02-20	Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition	Yuke Li et.al.	2402.12706	null
2024-02-19	Comprehensive Cognitive LLM Agent for Smartphone GUI Automation	Xinbei Ma et.al.	2402.11941	null
2024-02-15	Hand Shape and Gesture Recognition using Multiscale Template Matching, Background Subtraction and Binary Image Analysis	Ketan Suhaas Saichandran et.al.	2402.09663	null
2024-02-14	TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition	Yang Qian et.al.	2402.08875	null
2024-02-13	BdSLW60: A Word-Level Bangla Sign Language Dataset	Husne Ara Rubaiyeat et.al.	2402.08635	link
2024-02-13	Vision-Based Hand Gesture Customization from a Single Demonstration	Soroush Shahi et.al.	2402.08420	null
2024-02-12	PBADet: A One-Stage Anchor-Free Approach for Part-Body Association	Zhongpai Gao et.al.	2402.07814	null

Pose Estimation

Publish Date	Title	Authors	PDF	Code
2024-05-30	Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection	Prashanth Chandran et.al.	2405.20117	null
2024-05-30	Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach	Muhammad Saif Ullah Khan et.al.	2405.20084	null
2024-05-30	TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM	Peifeng Jiang et.al.	2405.19614	null
2024-05-29	Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives	Mingqi Yuan et.al.	2405.19531	null
2024-05-29	Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation	Sabrina Cynthia Triess et.al.	2405.19173	null
2024-05-28	World Models for General Surgical Grasping	Hongbin Lin et.al.	2405.17940	null
2024-05-27	MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds	Jiahui Lei et.al.	2405.17421	null
2024-05-27	Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding	Niloofar Azizi et.al.	2405.17397	null
2024-05-27	$\text{Di}^2\text{Pose}$: Discrete Diffusion Model for Occluded 3D Human Pose Estimation	Weiquan Wang et.al.	2405.17016	null
2024-05-27	Clustering-based Learning for UAV Tracking and Pose Estimation	Jiaping Xiao et.al.	2405.16867	null
2024-05-26	Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge	Tianchen Deng et.al.	2405.16464	link
2024-05-25	Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality	Hakim Ikebayashi et.al.	2405.16008	null
2024-05-23	CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments	Yang Zhou et.al.	2405.14731	link
2024-05-23	Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation	Daniel Kienzle et.al.	2405.14467	null
2024-05-21	Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos	Jayroop Ramesh et.al.	2405.13235	null
2024-05-21	Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations	Antoine Legrand et.al.	2405.12728	null
2024-05-21	PoseGravity: Pose Estimation from Points and Lines with Axis Prior	Akshay Chandrasekhar et.al.	2405.12646	link
2024-05-19	Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation	Zejun Gu et.al.	2405.12247	null
2024-05-20	AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements	Calvin Yeung et.al.	2405.12070	link
2024-05-19	Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries	Christiaan G. A. Viviers et.al.	2405.11677	link
2024-05-19	Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation	Zejun Gu et.al.	2405.11448	null
2024-05-18	PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking	Yifan Yang et.al.	2405.11257	null
2024-05-18	MotionGS: Compact Gaussian Splatting SLAM by Motion Filter	Xinli Guo et.al.	2405.11129	link
2024-05-17	Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation	Yongliang Lin et.al.	2405.10557	null
2024-05-16	Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder	Mohamed Ilyes Lakhal et.al.	2405.10423	null
2024-05-17	Toon3D: Seeing Cartoons from a New Perspective	Ethan Weber et.al.	2405.10320	null
2024-05-15	Task-adaptive Q-Face	Haomiao Sun et.al.	2405.09059	null
2024-05-14	RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images	Zong-Wei Hong et.al.	2405.08483	link
2024-05-14	TP3M: Transformer-based Pseudo 3D Image Matching with Reference	Liming Han et.al.	2405.08434	null
2024-05-13	Deep Learning-Based Object Pose Estimation: A Comprehensive Survey	Jian Liu et.al.	2405.07801	link
2024-05-13	JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation	Xubo Luo et.al.	2405.07429	link
2024-05-11	TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization	Zhen Tan et.al.	2405.07027	null
2024-05-11	AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation	Xingxu Li et.al.	2405.06959	null
2024-05-10	CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras	James Tang et.al.	2405.06845	link
2024-05-10	MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization	Pengcheng Zhu et.al.	2405.06241	null
2024-05-10	Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera	Haixin Shi et.al.	2405.05858	null
2024-05-09	Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion	Huanyu Tian et.al.	2405.05817	null
2024-05-09	NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM	Yiping Xie et.al.	2405.05807	null
2024-05-09	Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview	Yuhang Ming et.al.	2405.05526	null
2024-05-08	Adversary-Guided Motion Retargeting for Skeleton Anonymization	Thomas Carr et.al.	2405.05428	null
2024-05-08	FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models	Jinglin Xu et.al.	2405.05216	link
2024-05-08	ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion	Bing Zhu et.al.	2405.05164	null
2024-05-08	GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation	Ivan Bilić et.al.	2405.04890	null
2024-05-07	Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation	Jenny Wang et.al.	2405.04609	null
2024-05-07	Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform	Zhijian Qiao et.al.	2405.03969	null
2024-05-07	Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints	Xiongjun Guan et.al.	2405.03959	null
2024-05-06	Pose Priors from Language Models	Sanjay Subramanian et.al.	2405.03689	null
2024-05-06	Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors	Amit Moryossef et.al.	2405.03545	link
2024-05-05	Multi-hop graph transformer network for 3D human pose estimation	Zaedul Islam et.al.	2405.03055	null
2024-05-05	Blending Distributed NeRFs with Tri-stage Robust Pose Optimization	Baijun Ye et.al.	2405.02880	null
2024-05-03	WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD	Xuxin Cheng et.al.	2405.02241	null
2024-05-03	Probabilistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation	Xianzhou Zeng et.al.	2405.02114	link
2024-05-03	An Onboard Framework for Staircases Modeling Based on Point Clouds	Chun Qing et.al.	2405.01918	null
2024-05-06	ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness	Deegan Atha et.al.	2405.01673	null
2024-05-02	IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning	Ryan Hoque et.al.	2405.01472	null
2024-05-02	Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning	Liu Qiyuan et.al.	2405.01284	null
2024-05-02	Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors	Wenxuan Guo et.al.	2405.01112	null
2024-05-02	CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications	Jan Blumenkamp et.al.	2405.01107	null
2024-05-04	HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images	Zixun Jiao et.al.	2405.01066	null
2024-05-01	Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods	Andrew J. Kramer et.al.	2405.00600	null
2024-04-30	Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging	Rayan Armani et.al.	2404.19541	link
2024-04-30	UniFS: Universal Few-shot Instance Perception with Point Representations	Sheng Jin et.al.	2404.19401	null
2024-04-30	Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training	Xingyu Song et.al.	2404.19279	null
2024-04-30	XFeat: Accelerated Features for Lightweight Image Matching	Guilherme Potje et.al.	2404.19174	null
2024-04-29	Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction	Antoine Maiorca et.al.	2404.18628	null
2024-04-29	Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle	Jungwoo Lee et.al.	2404.18395	null
2024-04-29	Reconstructing Satellites in 3D from Amateur Telescope Images	Zhiming Chang et.al.	2404.18394	null
2024-04-27	Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs	Yiming Bao et.al.	2404.17837	null
2024-04-26	Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses	Yi Shen et.al.	2404.17685	null
2024-04-26	SLAM for Indoor Mapping of Wide Area Construction Environments	Vincent Ress et.al.	2404.17215	null
2024-04-25	WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users	William Huang et.al.	2404.17063	link
2024-04-25	Transformer-Based Local Feature Matching for Multimodal Image Registration	Remi Delaunay et.al.	2404.16802	null
2024-04-25	DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation	Leandro Di Bella et.al.	2404.16558	null
2024-04-25	Efficient Solution of Point-Line Absolute Pose	Petr Hruby et.al.	2404.16552	link
2024-04-25	COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images	Panagiotis Sapoutzoglou et.al.	2404.16471	link
2024-04-25	MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter	Kenji Koide et.al.	2404.16370	null
2024-04-24	3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement	Filipa Lino et.al.	2404.16136	null
2024-04-23	SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation	Xiangyu Xu et.al.	2404.15276	link
2024-04-25	Domain adaptive pose estimation via multi-level alignment	Yugan Chen et.al.	2404.14885	link
2024-04-23	Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking	Kexin Meng et.al.	2404.14835	null
2024-04-23	UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues	Vandad Davoodnia et.al.	2404.14634	null
2024-04-22	DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation	Yonghao Dang et.al.	2404.14025	null
2024-04-23	CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory	Yunlong Ran et.al.	2404.13896	null
2024-04-21	Resampling-free Particle Filters in High-dimensions	Akhilan Boopathy et.al.	2404.13698	null
2024-04-20	EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment	Guanghao Li et.al.	2404.13346	link
2024-04-18	Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds	Oliver Lemke et.al.	2404.12440	null
2024-04-18	Gait Recognition from Highly Compressed Videos	Andrei Niculae et.al.	2404.12183	null
2024-04-17	Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding	George Retsinas et.al.	2404.12144	link
2024-04-17	Kathakali Hand Gesture Recognition With Minimal Data	Kavitha Raju et.al.	2404.11205	null
2024-04-17	GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement	Linfang Zheng et.al.	2404.11139	null
2024-04-17	CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation	Lianyu Hu et.al.	2404.11111	link
2024-04-16	HumMUSS: Human Motion Understanding using State Space Models	Arnab Kumar Mondal et.al.	2404.10880	null
2024-04-16	Invariant Kalman Filtering with Noise-Free Pseudo-Measurements	Sven Goffin et.al.	2404.10687	null
2024-04-16	The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement	Gabriele Trivigno et.al.	2404.10438	null
2024-04-16	GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling	Huantao Ren et.al.	2404.10213	null
2024-04-16	LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark	Avinash Upadhyay et.al.	2404.10212	link
2024-04-15	LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives	Jiadi Cui et.al.	2404.09748	null
2024-04-14	In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition	Wiktor Mucha et.al.	2404.09308	null
2024-04-13	DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector	Johan Edstedt et.al.	2404.08928	link
2024-04-16	3D Human Scan With A Moving Event Camera	Kai Kohyama et.al.	2404.08504	null
2024-04-11	Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method	Tashmoy Ghosh et.al.	2404.07649	null
2024-04-11	GLID: Pre-training a Generalist Encoder-Decoder Vision Model	Jihao Liu et.al.	2404.07603	null
2024-04-10	Measuring proximity to standard planes during fetal brain ultrasound scanning	Chiara Di Vece et.al.	2404.07124	null
2024-04-10	MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints	Bedirhan Uguz et.al.	2404.07094	null
2024-04-10	Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting	Xiaolei Lang et.al.	2404.06926	null
2024-04-09	Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences	Axel Barroso-Laguna et.al.	2404.06337	link
2024-04-09	Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes	Tianchen Deng et.al.	2404.06050	null
2024-04-09	Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation	Zong-Wei Hong et.al.	2404.06029	null
2024-04-08	Learning 3D-Aware GANs from Unposed Images with Template Feature Field	Xinya Chen et.al.	2404.05705	null
2024-04-08	Learning a Category-level Object Pose Estimator without Pose Annotations	Fengrui Tian et.al.	2404.05626	null
2024-04-08	DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker	Jiapeng Wu et.al.	2404.05518	link
2024-04-08	Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks	Maksym Ivashechkin et.al.	2404.05414	null
2024-04-08	STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs	Kush Hari et.al.	2404.05151	null
2024-04-05	ToolEENet: Tool Affordance 6D Pose Estimation	Yunlong Wang et.al.	2404.04193	null
2024-04-04	SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation	Sichen Chen et.al.	2404.03518	link
2024-04-04	Multi Positive Contrastive Learning with Pose-Consistent Generated Images	Sho Inayoshi et.al.	2404.03256	null
2024-04-04	HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud	Wencan Cheng et.al.	2404.03159	link
2024-04-03	Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones	Luca Crupi et.al.	2404.02567	null
2024-04-03	Semi-Supervised Unconstrained Head Pose Estimation in the Wild	Huayi Zhou et.al.	2404.02544	link
2024-04-02	3D Congealing: 3D-Aware Image Alignment in the Wild	Yunzhi Zhang et.al.	2404.02125	null
2024-04-02	SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation	Vinkle Srivastav et.al.	2404.02041	null
2024-04-01	Marrying NeRF with Feature Matching for One-step Pose Estimation	Ronghan Chen et.al.	2404.00891	null
2024-03-31	Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation	Meisam Kabiri et.al.	2404.00691	null
2024-03-31	OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos	Dongyoung Choi et.al.	2404.00676	null
2024-04-02	KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation	Jihua Peng et.al.	2404.00658	link
2024-03-29	FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model	Molin Zhang et.al.	2404.00132	null
2024-03-29	Latent Embedding Clustering for Occlusion Robust Head Pose Estimation	José Celestino et.al.	2403.20251	null
2024-03-29	A Unified Framework for Human-centric Point Cloud Video Understanding	Yiteng Xu et.al.	2403.20031	null
2024-04-01	Video-Based Human Pose Regression via Decoupled Space-Time Aggregation	Jijie He et.al.	2403.19926	link
2024-03-28	Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation	Xiao Lin et.al.	2403.19527	link
2024-03-27	Object Pose Estimation via the Aggregation of Diffusion Features	Tianfu Wang et.al.	2403.18791	link
2024-03-27	RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation	Yang Tian et.al.	2403.18259	null
2024-03-26	Mathematical Foundation and Corrections for Full Range Head Pose Estimation	Huei-Chung Hu et.al.	2403.18104	null
2024-03-26	EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation	Chenhongyi Yang et.al.	2403.18080	null
2024-03-26	A Survey on 3D Egocentric Human Pose Estimation	Md Mushfiqur Azam et.al.	2403.17893	null
2024-03-26	GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction	Hrishav Bakul Barua et.al.	2403.17837	link
2024-03-26	DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions	Sammy Christen et.al.	2403.17827	null
2024-03-26	System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners	Felix Esser et.al.	2403.17788	null
2024-03-25	Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos	Remy Sabathier et.al.	2403.17103	null
2024-03-25	Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging	Mahdieh Dashtbani Moghari et.al.	2403.16490	null
2024-03-25	Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects	Zicong Fan et.al.	2403.16428	null
2024-03-25	A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups	Yixiao Ge et.al.	2403.16411	null
2024-03-25	ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation	Hannah Schieber et.al.	2403.16400	null
2024-03-24	KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments	Abdelrahman Younes et.al.	2403.16238	null
2024-03-24	Diffusion Model is a Good Pose Estimator from 3D RF-Vision	Junqiao Fan et.al.	2403.16198	null
2024-03-23	UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation	Yuliang Guo et.al.	2403.15705	null
2024-03-22	InterFusion: Text-Driven Generation of 3D Human-Object Interaction	Sisi Dai et.al.	2403.15612	null
2024-03-22	Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times	Sepehr Sabeti et.al.	2403.15571	null
2024-03-22	Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications	Vít Krátký et.al.	2403.15333	null
2024-03-22	WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization	Jialu Wang et.al.	2403.15272	null
2024-03-22	DITTO: Demonstration Imitation by Trajectory Transformation	Nick Heppert et.al.	2403.15203	null
2024-03-22	Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning	Bumsoo Kim et.al.	2403.15048	null
2024-03-22	Trajectory Regularization Enhances Self-Supervised Geometric Representation	Jiayun Wang et.al.	2403.14973	null
2024-03-21	VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding	Ahmad Mahmood et.al.	2403.14743	null
2024-03-21	Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation	Ruyi Lian et.al.	2403.14559	null
2024-03-21	Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset	Andrea Avogaro et.al.	2403.14447	null
2024-03-21	Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests	Haedam Oh et.al.	2403.14326	null
2024-03-21	Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation	Francesco Di Felice et.al.	2403.14279	null
2024-03-20	DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses	Chen Zhao et.al.	2403.13683	link
2024-03-20	Meta-Point Learning and Refining for Category-Agnostic Pose Estimation	Junjie Chen et.al.	2403.13647	link
2024-03-20	Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery	Mayura Manawadu et.al.	2403.13434	null
2024-03-20	DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation	Yamin Mao et.al.	2403.13405	null
2024-03-20	ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics	Qiaojun Yu et.al.	2403.13365	null
2024-03-20	MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination	Weiying Wang et.al.	2403.13348	null
2024-03-19	FaceXFormer: A Unified Transformer for Facial Analysis	Kartik Narayan et.al.	2403.12960	null
2024-03-19	WHAC: World-grounded Humans and Cameras	Wanqi Yin et.al.	2403.12959	null
2024-03-19	Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation	Jingtao Sun et.al.	2403.12728	link
2024-03-19	IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model	Matteo Bortolon et.al.	2403.12682	null
2024-03-19	In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing	Mingrui Yu et.al.	2403.12676	null
2024-03-19	Self-learning Canonical Space for Multi-view 3D Human Pose Estimation	Xiaoben Li et.al.	2403.12440	null
2024-03-19	Human Mesh Recovery from Arbitrary Multi-view Images	Xiaoben Li et.al.	2403.12434	null
2024-03-19	XPose: eXplainable Human Pose Estimation	Luyu Qiu et.al.	2403.12370	null
2024-03-18	HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data	Mengqi Zhang et.al.	2403.12011	null
2024-03-18	Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction	Wolfgang Fuhl et.al.	2403.11665	null
2024-03-18	An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation	Zewen Xu et.al.	2403.11639	null
2024-03-18	LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models	Yang Yang et.al.	2403.11627	link
2024-03-18	GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects	Sungphill Moon et.al.	2403.11510	null
2024-03-17	A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation	Qucheng Peng et.al.	2403.11310	null
2024-03-17	Compact 3D Gaussian Splatting For Dense Visual SLAM	Tianchen Deng et.al.	2403.11247	null
2024-03-16	Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty	Lakshadeep Naik et.al.	2403.10874	null
2024-03-16	DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation	Christopher Kolios et.al.	2403.10773	null
2024-03-15	GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation	Dingding Cai et.al.	2403.10683	null
2024-03-15	CLOSURE: Fast Quantification of Pose Uncertainty Sets	Yihuai Gao et.al.	2403.09990	null
2024-03-14	Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR	Sebastián Barbas Laina et.al.	2403.09596	null
2024-03-14	Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting	Pawel Knap et.al.	2403.09437	null
2024-03-14	LM2D: Lyrics- and Music-Driven Dance Synthesis	Wenjie Yin et.al.	2403.09407	null
2024-03-14	SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios	Ding-Tao Huang et.al.	2403.09317	link
2024-03-14	MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion	Arul Selvam Periyasamy et.al.	2403.09309	null
2024-03-13	Data Augmentation in Human-Centric Vision	Wentao Jiang et.al.	2403.08650	null
2024-03-13	PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections	Matteo Taiana et.al.	2403.08586	null
2024-03-13	NeRF-Supervised Feature Point Detection and Description	Ali Youssef et.al.	2403.08156	null
2024-03-12	Q-SLAM: Quadric Representations for Monocular SLAM	Chensheng Peng et.al.	2403.08125	null
2024-03-12	MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation	Yuelong Li et.al.	2403.08019	null
2024-03-12	Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation	Kira Wursthorn et.al.	2403.07741	null
2024-03-12	Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving	JunDa Cheng et.al.	2403.07535	null
2024-03-12	Category-Agnostic Pose Estimation for Point Clouds	Bowen Liu et.al.	2403.07437	null
2024-03-12	Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery	Yike Zhang et.al.	2403.07219	null
2024-03-11	Real-Time Simulated Avatar from Head-Mounted Sensors	Zhengyi Luo et.al.	2403.06862	null
2024-03-11	Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition	Erkut Akdag et.al.	2403.06577	null
2024-03-10	Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation	Paweł A. Pierzchlewicz et.al.	2403.06164	link
2024-03-10	Diffusion Models Trained with Large Data Are Transferable Visual Models	Guangkai Xu et.al.	2403.06090	null
2024-03-08	Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm	Ziyu Zhang et.al.	2403.05666	null
2024-03-11	Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation	Tarek Bouazza et.al.	2403.05450	null
2024-03-07	Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps	Ivana Collado-Gonzalez et.al.	2403.04936	null
2024-03-07	That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation	Georgi Pramatarov et.al.	2403.04755	null
2024-03-07	Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser	Qingyuan Cai et.al.	2403.04444	null
2024-03-09	Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation	Ruicong Liu et.al.	2403.04381	null
2024-03-05	FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation	Chris Rockwell et.al.	2403.03221	null
2024-03-05	NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors	Yannan He et.al.	2403.03122	null
2024-03-05	Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection	Mohamed Afifi et.al.	2403.03111	null
2024-03-05	Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps	Timothy Chen et.al.	2403.02751	null
2024-03-04	PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station	Cunyi Yin et.al.	2403.01913	link
2024-03-04	A Simple Baseline for Efficient Hand Mesh Reconstruction	Zhishan Zhou et.al.	2403.01813	null
2024-03-03	MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images	Junwen Huang et.al.	2403.01517	null
2024-03-02	Single-image camera calibration with model-free distortion correction	Katia Genovese et.al.	2403.01263	null
2024-03-02	Grid-based Fast and Structural Visual Odometry	Zhang Zhihe et.al.	2403.01110	null
2024-03-01	Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations	Syed Shabbir Ahmed et.al.	2403.00988	null
2024-03-04	TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity	Sangwoon Kim et.al.	2403.00049	null
2024-03-01	Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach	Sarina Thomas et.al.	2402.19062	null
2024-02-29	Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey	Yang Liu et.al.	2402.18844	link
2024-02-28	Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting	Taeho Kang et.al.	2402.18330	link
2024-02-28	Location-guided Head Pose Estimation for Fisheye Image	Bing Li et.al.	2402.18320	null
2024-02-28	NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images	Jingrui Yu et.al.	2402.18196	null
2024-02-28	Six-Point Method for Multi-Camera Systems with Reduced Solution Space	Banglei Guan et.al.	2402.18066	null
2024-02-27	Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association	Zhaoying Wang et.al.	2402.17504	null
2024-02-26	HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields	Haozhe Qi et.al.	2402.17062	link
2024-02-26	DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation	Shang Wu et.al.	2402.16640	null
2024-02-26	GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video	Xinqi Liu et.al.	2402.16607	null
2024-02-26	DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer	Yizhe Wu et.al.	2402.16308	null
2024-02-25	XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras	Arnav Mishra et.al.	2402.16175	null

(back to top)

Image Generation

Publish Date	Title	Authors	PDF	Code
2024-05-30	SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow	Chaoyang Wang et.al.	2405.20282	link
2024-05-30	ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections	Massimo Bini et.al.	2405.20271	link
2024-05-30	Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback	Sanghyeon Na et.al.	2405.20216	null
2024-05-30	RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection	Zhiyuan He et.al.	2405.20112	null
2024-05-30	RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection	Fangyi Chen et.al.	2405.19854	null
2024-05-30	Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network	Sizhe Zheng et.al.	2405.19775	null
2024-05-30	MAE-GAN: A Novel Strategy for Simultaneous Super-resolution Reconstruction and Denoising of Post-stack Seismic Profile	Wenshuo Yu et.al.	2405.19767	null
2024-05-30	Mitigating annotation shift in cancer classification using single image generative models	Marta Buetas Arcas et.al.	2405.19754	link
2024-05-30	Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian	Wei Sun et.al.	2405.19657	null
2024-05-29	Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models	Venkat Venkatasubramanian et.al.	2405.19561	null
2024-05-29	ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning	Ruchika Chavhan et.al.	2405.19237	link
2024-05-29	Going beyond compositional generalization, DDPMs can produce zero-shot interpolation	Justin Deschenaux et.al.	2405.19201	link
2024-05-29	The ethical situation of DALL-E 2	Eduard Hogea et.al.	2405.19176	null
2024-05-29	Patch-enhanced Mask Encoder Prompt Image Generation	Shusong Xu et.al.	2405.19085	null
2024-05-29	EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture	Jiaqi Xu et.al.	2405.18991	link
2024-05-29	Topological Perspectives on Optimal Multimodal Embedding Spaces	Abdul Aziz A. B et.al.	2405.18867	null
2024-05-29	Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching	Yasi Zhang et.al.	2405.18816	null
2024-05-29	SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation	Zhenbei Wu et.al.	2405.18801	null
2024-05-29	Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation	Jiyoon Myung et.al.	2405.18762	null
2024-05-29	SketchDeco: Decorating B&W Sketches with Colour	Chaitat Utintu et.al.	2405.18716	null
2024-05-28	Phased Consistency Model	Fu-Yun Wang et.al.	2405.18407	null
2024-05-28	Multi-modal Generation via Cross-Modal In-Context Learning	Amandeep Kumar et.al.	2405.18304	link
2024-05-28	Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers?	Zebin You et.al.	2405.18029	null
2024-05-28	Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection	Zhengji Li et.al.	2405.17905	null
2024-05-27	RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance	Jiaojiao Fan et.al.	2405.17661	null
2024-05-27	Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba	Jiahao Huang et.al.	2405.17659	null
2024-05-27	EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes	Ruichen Wang et.al.	2405.17366	null
2024-05-27	Prompt Optimization with Human Feedback	Xiaoqiang Lin et.al.	2405.17346	link
2024-05-27	From Text to Blueprint: Leveraging Text-to-Image Tools for Floor Plan Creation	Xiaoyu Li et.al.	2405.17236	null
2024-05-27	MCGAN: Enhancing GAN Training with Regression-Based Generator Loss	Baoren Xiao et.al.	2405.17191	null
2024-05-27	Training-free Editioning of Text-to-Image Models	Jinqi Wang et.al.	2405.17069	null
2024-05-27	The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models	Saravanan Kandasamy et.al.	2405.17068	null
2024-05-27	Glauber Generative Model: Discrete Diffusion Models via Binary Classification	Harshit Varma et.al.	2405.17035	null
2024-05-27	A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis	Minh H. Vu et.al.	2405.16971	null
2024-05-27	Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation	Liang Shi et.al.	2405.16895	null
2024-05-27	Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks	Yunqi Zhang et.al.	2405.16860	link
2024-05-24	Learning to Discretize Denoising Diffusion ODEs	Vinh Tong et.al.	2405.15506	null
2024-05-24	A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence	Ali Kashefi et.al.	2405.15406	null
2024-05-24	Stochastic SR for Gaussian microtextures	Emile Pierret et.al.	2405.15399	null
2024-05-24	Challenges and Opportunities in 3D Content Generation	Ke Zhao et.al.	2405.15335	null
2024-05-24	Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model	Mingyang Yi et.al.	2405.15330	null
2024-05-24	SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance	Guibao Shen et.al.	2405.15321	null
2024-05-24	Decaf: Data Distribution Decompose Attack against Federated Learning	Zhiyang Dai et.al.	2405.15316	null
2024-05-24	Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient	Yongliang Wu et.al.	2405.15304	null
2024-05-24	StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models	Chengming Xu et.al.	2405.15287	null
2024-05-24	Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models	Yimeng Zhang et.al.	2405.15234	link
2024-05-23	Improved Distribution Matching Distillation for Fast Image Synthesis	Tianwei Yin et.al.	2405.14867	null
2024-05-23	Semantica: An Adaptable Image-Conditioned Diffusion Model	Manoj Kumar et.al.	2405.14857	null
2024-05-23	TerDiT: Ternary Diffusion Models with Transformers	Xudong Lu et.al.	2405.14854	link
2024-05-23	Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models	Katherine Xu et.al.	2405.14828	null
2024-05-24	Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation	Hongxu Jiang et.al.	2405.14802	null
2024-05-23	Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy	Shengfang Zhai et.al.	2405.14800	null
2024-05-23	RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices	Qiaoyi Chen et.al.	2405.14794	null
2024-05-23	OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance	Shuheng Ge et.al.	2405.14709	null
2024-05-23	Learning Multi-dimensional Human Preference for Text-to-Image Generation	Sixian Zhang et.al.	2405.14705	null
2024-05-23	RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance	Zhicheng Sun et.al.	2405.14677	link
2024-05-21	Personalized Residuals for Concept-Driven Text-to-Image Generation	Cusuh Ham et.al.	2405.12978	null
2024-05-21	An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation	Zhiyu Tan et.al.	2405.12914	null
2024-05-21	Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image	Zerui Zhang et.al.	2405.12872	null
2024-05-21	A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability	Li-Yang Tseng et.al.	2405.12847	null
2024-05-21	Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations	Antoine Legrand et.al.	2405.12728	null
2024-05-21	CustomText: Customized Textual Image Generation using Diffusion Models	Shubham Paliwal et.al.	2405.12531	null
2024-05-20	Diffusion for World Modeling: Visual Details Matter in Atari	Eloi Alonso et.al.	2405.12399	link
2024-05-20	Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI	Di Xu et.al.	2405.12357	null
2024-05-20	EGAN: Evolutional GAN for Ransomware Evasion	Daniel Commey et.al.	2405.12266	null
2024-05-20	Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices	Nathaniel Cohen et.al.	2405.12211	null
2024-05-20	Diffusion Models for Generating Ballistic Spacecraft Trajectories	Tyler Presser et.al.	2405.11738	null
2024-05-19	URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images	Zoey Chen et.al.	2405.11656	null
2024-05-19	Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation	Sangyeop Yeo et.al.	2405.11614	null
2024-05-19	A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure	Wei Sun et.al.	2405.11440	null
2024-05-18	UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers	Duo Peng et.al.	2405.11336	null
2024-05-18	On the Trajectory Regularity of ODE-based Diffusion Sampling	Defang Chen et.al.	2405.11326	null
2024-05-18	Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning	Udi Aharon et.al.	2405.11258	null
2024-05-18	TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation	Chengcheng Feng et.al.	2405.11236	null
2024-05-17	Improving face generation quality and prompt following with synthetic captions	Michail Tarasiou et.al.	2405.10864	null
2024-05-17	Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image	Jianshun Zeng et.al.	2405.10504	null
2024-05-17	Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers	Rya Sanovar et.al.	2405.10480	null
2024-05-16	Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model	Zheng Gu et.al.	2405.10316	null
2024-05-16	UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models	Sahel Sharifymoghaddam et.al.	2405.10311	null
2024-05-16	VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing	Binghui Chen et.al.	2405.09985	null
2024-05-16	KPNDepth: Depth Estimation of Lane Images under Complex Rainy Environment	Zhengxu Shi et.al.	2405.09964	null
2024-05-16	Chameleon: Mixed-Modal Early-Fusion Foundation Models	Chameleon Team et.al.	2405.09818	null
2024-05-16	MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis	Joseph Cho et.al.	2405.09806	null
2024-05-16	An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification	Ibrahim Al-Hurani et.al.	2405.09756	null
2024-05-15	Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer	Weifei Jin et.al.	2405.09470	null
2024-05-16	Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images	Memoona Aziz et.al.	2405.09426	null
2024-05-15	DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations	Nima Fathi et.al.	2405.09288	link
2024-05-15	SOEDiff: Efficient Distillation for Small Object Editing	Qihe Pan et.al.	2405.09114	null
2024-05-15	Deep Learning in Earthquake Engineering: A Comprehensive Review	Yazhou Xie et.al.	2405.09021	null
2024-05-14	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Zhimin Li et.al.	2405.08748	link
2024-05-15	Similarity Metrics for MR Image-To-Image Translation	Melanie Dohmen et.al.	2405.08431	null
2024-05-14	Compositional Text-to-Image Generation with Dense Blob Representations	Weili Nie et.al.	2405.08246	null
2024-05-13	RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations	Chengde Lin et.al.	2405.08114	link
2024-05-13	CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models	Nick Stracke et.al.	2405.07913	null
2024-05-13	SAR Image Synthesis with Diffusion Models	Denisa Qosja et.al.	2405.07776	null
2024-05-12	Semantic Loss Functions for Neuro-Symbolic Structured Prediction	Kareem Ahmed et.al.	2405.07387	null
2024-05-12	Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning	Jiarui Wang et.al.	2405.07346	link
2024-05-12	PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification	Mohammad Shafiul Alam et.al.	2405.07332	link
2024-05-12	Stable Signature is Unstable: Removing Image Watermark from Diffusion Models	Yuepeng Hu et.al.	2405.07145	null
2024-05-12	MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping	Mingyue Yuan et.al.	2405.07131	null
2024-05-11	Unsupervised Density Neural Representation for CT Metal Artifact Reduction	Qing Wu et.al.	2405.07047	null
2024-05-11	Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior	Ce Wang et.al.	2405.07044	link
2024-05-11	Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation	Shengyuan Liu et.al.	2405.06948	null
2024-05-10	Controllable Image Generation With Composed Parallel Token Prediction	Jamie Stirling et.al.	2405.06535	null
2024-05-10	SketchDream: Sketch-based Text-to-3D Generation and Editing	Feng-Lin Liu et.al.	2405.06461	null
2024-05-09	Photonic quantum generative adversarial networks for classical data	Tigran Sedrakyan et.al.	2405.06023	null
2024-05-09	Frame Interpolation with Consecutive Brownian Bridge Diffusion	Zonglin Lyu et.al.	2405.05953	null
2024-05-09	Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models	Zhe Ma et.al.	2405.05846	null
2024-05-10	MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation	Yuxiang Wei et.al.	2405.05806	link
2024-05-09	Exploring Text-Guided Single Image Editing for Remote Sensing Images	Fangzhou Han et.al.	2405.05769	null
2024-05-09	End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base	Shuling Li et.al.	2405.05738	null
2024-05-09	VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis	Zhihan Ju et.al.	2405.05667	null
2024-05-09	A Survey on Personalized Content Synthesis with Diffusion Models	Xulu Zhang et.al.	2405.05538	null
2024-05-09	Characteristic Learning for Provable One Step Generation	Zhao Ding et.al.	2405.05512	link
2024-05-08	Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer's Disease Biomarkers	Reihaneh Hassanzadeh et.al.	2405.05462	null
2024-05-08	DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation	Joshua N. Williams et.al.	2405.05382	null
2024-05-08	Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo	Nayantara Mudur et.al.	2405.05255	link
2024-05-08	StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer	Zijia Wang et.al.	2405.05027	null
2024-05-08	Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI	Keqiang Fan et.al.	2405.04974	null
2024-05-08	Improving Long Text Understanding with Knowledge Distilled from Summarization Model	Yan Liu et.al.	2405.04955	null
2024-05-08	HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis	Zhihan Ju et.al.	2405.04902	null
2024-05-08	FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation	Xuehai He et.al.	2405.04834	null
2024-05-07	TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model	Yongming Zhang et.al.	2405.04675	null
2024-05-07	ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography	Syed Jamal Safdar Gardezi et.al.	2405.04629	null
2024-05-07	SingIt! Singer Voice Transformation	Amit Eliav et.al.	2405.04627	null
2024-05-07	Towards Geographic Inclusion in the Evaluation of Text-to-Image Models	Melissa Hall et.al.	2405.04457	null
2024-05-07	Data augmentation experiments with style-based quantum generative adversarial networks on trapped-ion and superconducting-qubit technologies	Julien Baglio et.al.	2405.04401	null
2024-05-07	Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation	Jihyun Kim et.al.	2405.04356	null
2024-05-07	Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer	Zhuoyi Yang et.al.	2405.04312	link
2024-05-07	Improving Offline Reinforcement Learning with Inaccurate Simulators	Yiwen Hou et.al.	2405.04307	null
2024-05-07	Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map	Yuxuan Xia et.al.	2405.04290	null
2024-05-07	Bidirectional Adversarial Autoencoders for the design of Plasmonic Metasurfaces	Yuansan Liu et.al.	2405.04056	link
2024-05-07	Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model	Joo Young Choi et.al.	2405.03958	null
2024-05-06	Generated Contents Enrichment	Mahdi Naseri et.al.	2405.03650	null
2024-05-06	CCDM: Continuous Conditional Diffusion Models for Image Generation	Xin Ding et.al.	2405.03546	link
2024-05-06	GLIP: Electromagnetic Field Exposure Map Completion by Deep Generative Networks	Mohammed Mallik et.al.	2405.03384	null
2024-05-05	AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection	Aditya Singh et.al.	2405.03075	null
2024-05-05	Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling	Jinmin Li et.al.	2405.02941	null
2024-05-05	Data-Efficient Molecular Generation with Hierarchical Textual Inversion	Seojin Kim et.al.	2405.02845	null
2024-05-05	SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion	Ziyun Qian et.al.	2405.02844	null
2024-05-05	ImageInWords: Unlocking Hyper-Detailed Image Descriptions	Roopal Garg et.al.	2405.02793	link
2024-05-04	U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers	Yuchuan Tian et.al.	2405.02730	null
2024-05-03	Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI	Minhui Yu et.al.	2405.02504	null
2024-05-03	Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification	Siqi Yin et.al.	2405.02155	null
2024-05-03	Reconstructing the mid-infrared spectra of galaxies using ultraviolet to submillimeter photometry and Deep Generative Networks	Agapi Rissaki et.al.	2405.02153	null
2024-05-03	Three-Dimensional Amyloid-Beta PET Synthesis from Structural MRI with Conditional Generative Adversarial Networks	Fernando Vega et.al.	2405.02109	null
2024-05-03	AI-generated art perceptions with GenFrame -- an image-generating picture frame	Peter Kun et.al.	2405.01901	null
2024-05-03	Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition	Yichun Tai et.al.	2405.01872	null
2024-05-03	Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics	Rucha Deshpande et.al.	2405.01822	null
2024-05-02	Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning	Rafael Elberg et.al.	2405.01705	link
2024-05-02	Investigation on optimal microstructure of dual-phase steel with high strength and ductility by machine learning	Misato Suzuki et.al.	2405.01689	null
2024-05-02	Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance	Kelvin C. K. Chan et.al.	2405.01356	null
2024-05-02	Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration	Praveen Kumar Chandaliya et.al.	2405.01273	null
2024-05-02	DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines	Ye Tian et.al.	2405.01248	null
2024-05-02	On Mechanistic Knowledge Localization in Text-to-Image Generative Models	Samyadeep Basu et.al.	2405.01008	null
2024-05-01	SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models	Burak Can Biner et.al.	2405.00878	null
2024-05-01	Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers	Palawat Busaranuvong et.al.	2405.00858	null
2024-05-01	RGB $\leftrightarrow$ X: Image decomposition and synthesis using material- and lighting-aware diffusion models	Zheng Zeng et.al.	2405.00666	null
2024-05-01	UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement	Ruiquan Ge et.al.	2405.00542	link
2024-05-01	Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank	Doğan Tunca Arık et.al.	2405.00407	null
2024-05-01	Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays	Fenghao Zhu et.al.	2405.00391	null
2024-05-01	Streamlining Image Editing with Layered Diffusion Brushes	Peyman Gholami et.al.	2405.00313	null
2024-04-30	IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images	Shadab Ahamed et.al.	2405.00239	link
2024-04-30	DOCCI: Descriptions of Connected and Contrasting Images	Yasumasa Onoe et.al.	2404.19753	null
2024-04-30	Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Yunhao Ge et.al.	2404.19752	null
2024-04-30	SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration	Yuto Nakashima et.al.	2404.19693	null
2024-04-30	Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model	Denys Godwin et.al.	2404.19609	null
2024-04-30	TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models	Teng Zhou et.al.	2404.19475	null
2024-04-30	InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation	Chanran Kim et.al.	2404.19427	null
2024-05-01	Mapping New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation	Zhenglin Li et.al.	2404.19265	null
2024-05-01	FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills	Yongqiang Zhao et.al.	2404.19217	null
2024-04-30	NeRF-Insert: 3D Local Editing with Multimodal Control Signals	Benet Oriol Sabat et.al.	2404.19204	null
2024-04-29	DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing	Minghao Chen et.al.	2404.18929	null
2024-04-29	TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation	Junhao Cheng et.al.	2404.18919	null
2024-04-29	Hide and Seek: How Does Watermarking Impact Face Recognition?	Yuguang Yao et.al.	2404.18890	null
2024-04-29	Learning Mixtures of Gaussians Using Diffusion Models	Khashayar Gatmiry et.al.	2404.18869	null
2024-04-29	Socially Adaptive Path Planning Based on Generative Adversarial Network	Yao Wang et.al.	2404.18687	null
2024-04-29	FlexiFilm: Long Video Generation with Flexible Conditions	Yichen Ouyang et.al.	2404.18620	link
2024-04-29	Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting	Tianyidan Xie et.al.	2404.18598	null
2024-04-29	SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods	Manos Schinas et.al.	2404.18552	link
2024-04-29	Towards Image Synthesis with Photon Counting Stellar Intensity Interferometry	Alessia Spolon et.al.	2404.18507	null
2024-04-29	Autonomous Quality and Hallucination Assessment for Virtual Tissue Staining and Digital Pathology	Luzhe Huang et.al.	2404.18458	null
2024-04-26	Federated Transfer Component Analysis Towards Effective VNF Profiling	Xunzheng Zhang et.al.	2404.17553	null
2024-04-26	Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement	Zishu Yao et.al.	2404.17400	null
2024-04-26	Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection	Jiawei Song et.al.	2404.17254	null
2024-04-26	ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion	Ziyue Zhang et.al.	2404.17230	link
2024-04-26	DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs	Xindi Zheng et.al.	2404.17164	null
2024-04-26	An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder	Yicheng Gu et.al.	2404.17161	null
2024-04-26	Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis	Shivangi Yadav et.al.	2404.17105	null
2024-04-25	Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks	Yaqi Hu et.al.	2404.17069	null
2024-04-25	DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks	Matthew Squires et.al.	2404.16913	null
2024-04-25	REBEL: Reinforcement Learning via Regressing Relative Rewards	Zhaolin Gao et.al.	2404.16767	null
2024-04-25	Denoising: from classical methods to deep CNNs	Jean-Eric Campagne et.al.	2404.16617	link
2024-04-25	MuseumMaker: Continual Style Customization without Catastrophic Forgetting	Chenxi Liu et.al.	2404.16612	null
2024-04-25	Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models	Parul Gupta et.al.	2404.16556	null
2024-04-25	OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images	Ye Mao et.al.	2404.16538	null
2024-04-25	Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series	Aimi Okabayashi et.al.	2404.16409	link
2024-04-24	Guardians of the Quantum GAN	Archisman Ghosh et.al.	2404.16156	null
2024-04-24	Quantitative Characterization of Retinal Features in Translated OCTA	Rashadul Hasan Badhon et.al.	2404.16133	null
2024-04-24	Spinning solar jets explained through the interplay between plasma sheets and vortex columns	Sahel Dey et.al.	2404.16096	null
2024-04-24	PuLID: Pure and Lightning ID Customization via Contrastive Alignment	Zinan Guo et.al.	2404.16022	null
2024-04-24	Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks	Hangcheng Cao et.al.	2404.15587	null
2024-04-23	Multi-scale Intervention Planning based on Generative Design	Ioannis Kavouras et.al.	2404.15492	null
2024-04-23	ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning	Weifeng Chen et.al.	2404.15449	null
2024-04-23	GLoD: Composing Global Contexts and Local Details in Image Generation	Moyuru Yamada et.al.	2404.15447	null
2024-04-23	From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation	Zehuan Huang et.al.	2404.15267	null
2024-04-23	Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment	Tianwei Zhou et.al.	2404.15163	null
2024-04-23	Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Xun Wu et.al.	2404.15100	null
2024-04-23	CoARF: Controllable 3D Artistic Style Transfer for Radiance Fields	Deheng Zhang et.al.	2404.14967	null
2024-04-23	Music Style Transfer With Diffusion Model	Hong Huang et.al.	2404.14771	null
2024-04-23	SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models	Bo Lin et.al.	2404.14755	null
2024-04-23	Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning	Yuchao Liao et.al.	2404.14754	null
2024-04-23	FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction	Hang Hua et.al.	2404.14715	null
2024-04-22	The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking	Yuying Li et.al.	2404.14581	null
2024-04-22	GeoDiffuser: Geometry-Based Image Editing with Diffusion Models	Rahul Sajnani et.al.	2404.14403	null
2024-04-22	SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation	Yuying Ge et.al.	2404.14396	link
2024-04-22	MultiBooth: Towards Generating All Your Concepts in an Image from Text	Chenyang Zhu et.al.	2404.14239	link
2024-04-22	RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance	Chengrui Wang et.al.	2404.13984	null
2024-04-23	Accelerating Image Generation with Sub-path Linear Approximation Model	Chen Xu et.al.	2404.13903	null
2024-04-22	Towards Better Text-to-Image Generation Alignment via Attention Modulation	Yihang Wu et.al.	2404.13899	null
2024-04-22	Regional Style and Color Transfer	Zhicheng Ding et.al.	2404.13880	null
2024-04-22	Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning	Huan Bao et.al.	2404.13860	null
2024-04-22	A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation	Qikai Yang et.al.	2404.13812	null
2024-04-21	Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation	Jensen Hwa et.al.	2404.13798	null
2024-04-19	RadRotator: 3D Rotation of Radiographs with Diffusion Models	Pouria Rouzrokh et.al.	2404.13000	null
2024-04-19	Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images	Santosh et.al.	2404.12908	link
2024-04-19	Explainable Deepfake Video Detection using Convolutional Neural Network and CapsuleNet	Gazi Hasin Ishrak et.al.	2404.12841	null
2024-04-19	Generative Modelling with High-Order Langevin Dynamics	Ziqiang Shi et.al.	2404.12814	null
2024-04-19	PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy	Zepeng Jiang et.al.	2404.12730	null
2024-04-19	MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement	Aravinda Reddy PN et.al.	2404.12679	null
2024-04-19	How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples	Dren Fazlija et.al.	2404.12653	null
2024-04-19	F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation	Man M. Ho et.al.	2404.12650	null
2024-04-18	Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models	Israel A. Laurensi et.al.	2404.12260	null
2024-04-18	First 2D electron density measurements using Coherence Imaging Spectroscopy in the MAST-U Super-X divertor	N. Lonigro et.al.	2404.12021	null
2024-04-18	©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model	Chao Zhou et.al.	2404.11962	null
2024-04-18	Sketch-guided Image Inpainting with Partial Discrete Diffusion Process	Nakul Sharma et.al.	2404.11949	link
2024-04-18	LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights	Thibault Castells et.al.	2404.11936	null
2024-04-18	EdgeFusion: On-Device Text-to-Image Generation	Thibault Castells et.al.	2404.11925	null
2024-04-18	Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans	Lixing Tan et.al.	2404.11889	null
2024-04-18	Generating synthetic electroretinogram waveforms using Artificial Intelligence to improve classification of retinal conditions in under-represented populations	Mikhail Kulyabin et.al.	2404.11842	null
2024-04-18	TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation	Tianyi Liang et.al.	2404.11824	null
2024-04-18	Tailoring Generative Adversarial Networks for Smooth Airfoil Design	Joyjit Chattoraj et.al.	2404.11816	null
2024-04-17	On the Scalability of GNNs for Molecular Graphs	Maciej Sypetkowski et.al.	2404.11568	null
2024-04-17	MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation	Kuan-Chieh et.al.	2404.11565	null
2024-04-17	SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening	Yu Zhong et.al.	2404.11537	null
2024-04-17	Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt	Zhanjie Zhang et.al.	2404.11474	link
2024-04-17	What-if Analysis Framework for Digital Twins in 6G Wireless Network Management	Elif Ak et.al.	2404.11394	null
2024-04-17	Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks	Eri Hosonuma et.al.	2404.11280	null
2024-04-17	Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case	João Gabriel Vinholi et.al.	2404.11243	null
2024-04-17	KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections	Chuheng Wei et.al.	2404.11181	link
2024-04-17	TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing	Sherry X. Chen et.al.	2404.11120	link
2024-04-17	Object Remover Performance Evaluation Methods using Class-wise Object Removal Images	Changsuk Oh et.al.	2404.11104	null
2024-04-16	RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting	Ashkan Mirzaei et.al.	2404.10765	null
2024-04-16	LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Yuchi Wang et.al.	2404.10763	link
2024-04-16	AV-GAN: Attention-Based Varifocal Generative Adversarial Network for Uneven Medical Image Translation	Zexin Li et.al.	2404.10714	null
2024-04-16	Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks	Florian Barthel et.al.	2404.10625	null
2024-04-16	Adversarial Identity Injection for Semantic Face Image Synthesis	Giuseppe Tarollo et.al.	2404.10408	null
2024-04-16	Generating Counterfactual Trajectories with Latent Diffusion Models for Concept Discovery	Payal Varshney et.al.	2404.10356	null
2024-04-16	CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout	Jiafu Wei et.al.	2404.10352	null
2024-04-16	OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model	Runyi Li et.al.	2404.10312	null
2024-04-16	Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain	Steve Andreas Immanuel et.al.	2404.10307	link
2024-04-16	OneActor: Consistent Character Generation via Cluster-Conditioned Guidance	Jiahao Wang et.al.	2404.10267	null
2024-04-15	Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models	Ziwei Luo et.al.	2404.09732	link
2024-04-15	VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication	Xun Yuan et.al.	2404.09722	null
2024-04-15	In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation	Han Xue et.al.	2404.09633	null
2024-04-15	Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement	Chi Wang et.al.	2404.09540	null
2024-04-15	Magic Clothing: Controllable Garment-Driven Image Synthesis	Weifeng Chen et.al.	2404.09512	link
2024-04-15	Improved Object-Based Style Transfer with Single Deep Network	Harshmohan Kulkarni et.al.	2404.09461	null
2024-04-15	Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models	Peifei Zhu et.al.	2404.09401	null
2024-04-14	Counteracting Concept Drift by Learning with Future Malware Predictions	Branislav Bosansky et.al.	2404.09352	null
2024-04-14	DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling	Xuening Yuan et.al.	2404.09227	null
2024-04-13	InverseVis: Revealing the Hidden with Curved Sphere Tracing	Kai Lawonn et.al.	2404.09092	null
2024-04-12	An improved tabular data generator with VAE-GMM integration	Patricia A. Apellániz et.al.	2404.08434	null
2024-04-12	Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts	Yang Li et.al.	2404.08341	link
2024-04-11	Latent Guard: a Safety Framework for Text-to-image Generation	Runtao Liu et.al.	2404.08031	link
2024-04-11	Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models	Mazda Moayeri et.al.	2404.08030	null
2024-04-11	OpenBias: Open-set Bias Detection in Text-to-Image Generative Models	Moreno D'Incà et.al.	2404.07990	null
2024-04-11	Taming Stable Diffusion for Text to 360° Panorama Image Generation	Cheng Zhang et.al.	2404.07949	link
2024-04-11	Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification	Tuong Vy Nguyen et.al.	2404.07754	null
2024-04-11	Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models	Tuomas Kynkäänniemi et.al.	2404.07724	null
2024-04-11	Model-based Cleaning of the QUILT-1M Pathology Dataset for Text-Conditional Image Synthesis	Marc Aubreville et.al.	2404.07676	null
2024-04-11	Implicit and Explicit Language Guidance for Diffusion-based Visual Perception	Hefeng Wang et.al.	2404.07600	null
2024-04-11	GAN-based iterative motion estimation in HASTE MRI	Mathias S. Feinler et.al.	2404.07576	null
2024-04-11	ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation	Stanislav Frolov et.al.	2404.07564	null
2024-04-11	CAT: Contrastive Adapter Training for Personalized Image Generation	Jae Wan Park et.al.	2404.07554	link
2024-04-11	Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks	Xinxing Zhao et.al.	2404.07464	null
2024-04-10	RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion	Jaidev Shriram et.al.	2404.07199	null
2024-04-10	A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks	Neel Mishra et.al.	2404.07172	link
2024-04-10	Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model	Yijia Chen et.al.	2404.07072	link
2024-04-10	Fine color guidance in diffusion models and its application to image compression at extremely low bitrates	Tom Bordin et.al.	2404.06865	null
2024-04-10	UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion	Junsheng Zhou et.al.	2404.06851	null
2024-04-10	Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer	Yanqi Ge et.al.	2404.06835	null
2024-04-10	MedRG: Medical Report Grounding with Multi-modal Large Language Model	Ke Zou et.al.	2404.06798	null
2024-04-10	CryinGAN: Design and evaluation of point-cloud-based generative adversarial networks using disordered materials $-$ application to Li$_3$ScCl$_6$-LiCoO$_2$ battery interfaces	Adrian Xiao Bin Yong et.al.	2404.06734	null
2024-04-10	Deep Generative Data Assimilation in Multimodal Setting	Yongquan Qu et.al.	2404.06665	link
2024-04-09	GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis	Srikumar Sastry et.al.	2404.06637	link
2024-04-09	High Noise Scheduling is a Must	Mahmut S. Gokmen et.al.	2404.06353	null
2024-04-09	Fortifying Fully Convolutional Generative Adversarial Networks for Image Super-Resolution Using Divergence Measures	Arkaprabha Basu et.al.	2404.06294	null
2024-04-09	Hyperparameter-Free Medical Image Synthesis for Sharing Data and Improving Site-Specific Segmentation	Alexander Chebykin et.al.	2404.06240	link
2024-04-09	DiffHarmony: Latent Diffusion Model Meets Image Harmonization	Pengfei Zhou et.al.	2404.06139	null
2024-04-09	Greedy-DiM: Greedy Algorithms for Unreasonably Effective Face Morphs	Zander W. Blasingame et.al.	2404.06025	null
2024-04-09	Boosting Digital Safeguards: Blending Cryptography and Steganography	Anamitra Maiti et.al.	2404.05985	null
2024-04-09	Tackling Structural Hallucination in Image Translation with Local Diffusion	Seunghoi Kim et.al.	2404.05980	null
2024-04-09	StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion	Ming Tao et.al.	2404.05979	link
2024-04-09	Quantum Generative Adversarial Networks in a Silicon Photonic Chip with Maximum Expressibility	Haoran Ma et.al.	2404.05921	null
2024-04-08	SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing	Jing Gu et.al.	2404.05717	null
2024-04-08	Learning 3D-Aware GANs from Unposed Images with Template Feature Field	Xinya Chen et.al.	2404.05705	null
2024-04-08	SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation	Heyuan Li et.al.	2404.05680	null
2024-04-08	MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation	Kunpeng Song et.al.	2404.05674	null
2024-04-08	Automatic Controllable Colorization via Imagination	Xiaoyan Cong et.al.	2404.05661	null
2024-04-08	UniFL: Improve Stable Diffusion via Unified Feedback Learning	Jiacheng Zhang et.al.	2404.05595	null
2024-04-08	Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI	Hugo Caselles-Dupré et.al.	2404.05468	null
2024-04-08	CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery	Sai Bhargav Rongali et.al.	2404.05366	null
2024-04-08	Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt	Zhiqi Huang et.al.	2404.05331	null
2024-04-08	MC$^2$: Multi-concept Guidance for Customized Multi-concept Generation	Jiaxiu Jiang et.al.	2404.05268	null
2024-04-04	No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance	Vishaal Udandarao et.al.	2404.04125	link
2024-04-05	3D Facial Expressions through Analysis-by-Neural-Synthesis	George Retsinas et.al.	2404.04104	null
2024-04-05	Dynamic Prompt Optimizing for Text-to-Image Generation	Wenyi Mo et.al.	2404.04095	link
2024-04-05	Physics-Inspired Synthesized Underwater Image Dataset	Reina Kaneko et.al.	2404.03998	null
2024-04-05	Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models	Gihyun Kwon et.al.	2404.03913	null
2024-04-04	RaFE: Generative Radiance Fields Restoration	Zhongkai Wu et.al.	2404.03654	null
2024-04-04	CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching	Dongzhi Jiang et.al.	2404.03653	link
2024-04-04	Reference-Based 3D-Aware Image Editing with Triplane	Bahri Batuhan Bilecen et.al.	2404.03632	null
2024-04-04	Robust Concept Erasure Using Task Vectors	Minh Pham et.al.	2404.03631	null
2024-04-04	Terrain Point Cloud Inpainting via Signal Decomposition	Yizhou Xie et.al.	2404.03572	null
2024-04-04	Integrating Generative AI into Financial Market Prediction for Improved Decision Making	Chang Che et.al.	2404.03523	null
2024-04-04	Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations	Fatima Ezzeddine et.al.	2404.03348	null
2024-04-04	Multi Positive Contrastive Learning with Pose-Consistent Generated Images	Sho Inayoshi et.al.	2404.03256	null
2024-04-04	Would Deep Generative Models Amplify Bias in Future Models?	Tianwei Chen et.al.	2404.03242	null
2024-04-04	Diverse and Tailored Image Generation for Zero-shot Multi-label Classification	Kaixin Zhang et.al.	2404.03144	null
2024-04-03	Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction	Keyu Tian et.al.	2404.02905	link
2024-04-03	MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment	Duygu Ceylan et.al.	2404.02899	null
2024-04-03	On the Scalability of Diffusion-based Text-to-Image Generation	Hao Li et.al.	2404.02883	null
2024-04-03	MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation	Petru-Daniel Tudosiu et.al.	2404.02790	null
2024-04-03	InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation	Haofan Wang et.al.	2404.02733	link
2024-04-03	Model-agnostic Origin Attribution of Generated Images with Few-shot Examples	Fengyuan Liu et.al.	2404.02697	null
2024-04-03	Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition	Behrooz Razeghi et.al.	2404.02696	null
2024-04-03	Severity Controlled Text-to-Image Generative Model Bias Manipulation	Jordan Vice et.al.	2404.02530	null
2024-04-03	Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks	Elena R. Henderson et.al.	2404.02440	null
2024-04-02	Diffusion$^2$: Dynamic 3D Content Generation via Score Composition of Orthogonal Diffusion Models	Zeyu Yang et.al.	2404.02148	link
2024-04-02	3D Congealing: 3D-Aware Image Alignment in the Wild	Yunzhi Zhang et.al.	2404.02125	null
2024-04-02	Red-Teaming Segment Anything Model	Krzysztof Jankowski et.al.	2404.02067	link
2024-04-02	MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages	Daryna Dementieva et.al.	2404.02037	null
2024-04-02	Enhancing Portfolio Optimization with Transformer-GAN Integration: A Novel Approach in the Black-Litterman Framework	Enmin Zhu et.al.	2404.02029	null
2024-04-02	Bi-LORA: A Vision-Language Approach for Synthetic Image Detection	Mamadou Keita et.al.	2404.01959	null
2024-04-02	Real, fake and synthetic faces -- does the coin have three sides?	Shahzeb Naeem et.al.	2404.01878	null
2024-04-02	Disentangled Pre-training for Human-Object Interaction Detection	Zhuolong Li et.al.	2404.01725	null
2024-04-01	PlayFutures: Imagining Civic Futures with AI and Puppets	Supratim Pait et.al.	2404.01527	null
2024-04-01	Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data	Matthias Gerstgrasser et.al.	2404.01413	null
2024-03-29	Benchmarking Counterfactual Image Generation	Thomas Melistas et.al.	2403.20287	link
2024-03-29	FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models	Barbara Toniella Corradini et.al.	2403.20105	null
2024-03-29	SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image	Yunhao Li et.al.	2403.20018	link
2024-03-29	FairRAG: Fair Human Generation via Fair Retrieval Augmentation	Robik Shrestha et.al.	2403.19964	null
2024-04-01	Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting	Haipeng Liu et.al.	2403.19898	link
2024-03-28	Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks	Pooria Ashrafian et.al.	2403.19880	link
2024-03-28	Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization	Yuhang Li et.al.	2403.19866	null
2024-03-28	CLoRA: A Contrastive Approach to Compose Multiple LoRA Models	Tuna Han Salih Meral et.al.	2403.19776	null
2024-03-28	Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond	Katherine Xu et.al.	2403.19653	link
2024-03-28	GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models	Yusuf Dalva et.al.	2403.19645	null
2024-03-28	Lane-Change in Dense Traffic with Model Predictive Control and Neural Networks	Sangjae Bae et.al.	2403.19633	link
2024-03-28	Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models	Ole Hall et.al.	2403.19620	null
2024-03-28	Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model	Zhicai Wang et.al.	2403.19600	link
2024-03-28	Frame by Familiar Frame: Understanding Replication in Video Diffusion Models	Aimon Rahman et.al.	2403.19593	null
2024-03-28	Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance	Yulin Pan et.al.	2403.19534	null
2024-03-28	Imperceptible Protection against Style Imitation from Diffusion Models	Namhyuk Ahn et.al.	2403.19254	null
2024-03-28	QNCD: Quantization Noise Correction for Diffusion Models	Huanpeng Chu et.al.	2403.19140	link
2024-03-28	Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs	John R. McNulty et.al.	2403.19107	null
2024-03-27	Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching	Jannis Chemseddine et.al.	2403.18705	null
2024-03-27	Attention Calibration for Disentangled Text-to-Image Personalization	Yanbing Zhang et.al.	2403.18551	link
2024-03-27	DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis	Zhongxi Chen et.al.	2403.18471	link
2024-03-27	DiffStyler: Diffusion-based Localized Image Style Transfer	Shaoxu Li et.al.	2403.18461	null
2024-03-27	U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models	Ilias Mitsouras et.al.	2403.18425	null
2024-03-27	ECNet: Effective Controllable Text-to-Image Diffusion Models	Sicheng Li et.al.	2403.18417	null
2024-03-27	Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks	Srinitish Srinivasan et.al.	2403.18397	link
2024-03-27	Ship in Sight: Diffusion Models for Ship-Image Super Resolution	Luigi Sigillo et.al.	2403.18370	link
2024-03-27	DSF-GAN: DownStream Feedback Generative Adversarial Network	Oriel Perets et.al.	2403.18267	link
2024-03-27	Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting	Haiwei Chen et.al.	2403.18186	null
2024-03-26	Boosting Diffusion Models with Moving Average Sampling in Frequency Domain	Yurui Qian et.al.	2403.17870	null
2024-03-26	CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation	Yongrui Yu et.al.	2403.17770	null
2024-03-26	FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids	Emad Efatinasab et.al.	2403.17494	null
2024-03-26	LaRE$^2$: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection	Yunpeng Luo et.al.	2403.17465	null
2024-03-26	An inexact proximal MM method for a class of nonconvex composite image reconstruction models	Bujin Li et.al.	2403.17450	null
2024-03-25	DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment	Stella Bounareli et.al.	2403.17217	null
2024-03-25	FlashFace: Human Image Personalization with High-fidelity Identity Preservation	Shilong Zhang et.al.	2403.17008	null
2024-03-25	SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer	Rui Zhu et.al.	2403.17004	null
2024-03-25	Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation	Omer Dahary et.al.	2403.16990	null
2024-03-25	Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance	Jingyuan Zhu et.al.	2403.16954	null
2024-03-25	Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise	Dilum Fernando et.al.	2403.16790	null
2024-03-25	Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases	Sophie Starck et.al.	2403.16776	null
2024-03-25	Multi-Scale Texture Loss for CT denoising with GANs	Francesco Di Feola et.al.	2403.16640	link
2024-03-25	SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions	Yuda Song et.al.	2403.16627	null
2024-03-25	Enhancing Cross-Dataset EEG Emotion Recognition: A Novel Approach with Emotional EEG Style Transfer Network	Yijin Zhou et.al.	2403.16540	null
2024-03-25	An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models	Zizhao Hu et.al.	2403.16530	null
2024-03-25	Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator	Takuhiro Kaneko et.al.	2403.16464	null
2024-03-25	Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation	Sanyam Lakhanpal et.al.	2403.16422	null
2024-03-25	Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation	Yingshan Chang et.al.	2403.16394	null
2024-03-25	Illuminating Systematic Trends in Nuclear Data with Generative Machine Learning Models	Jordan M. R. Fox et.al.	2403.16389	null
2024-03-25	FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models	Lin Zhao et.al.	2403.16379	null
2024-03-24	Fill in the ____ (a Diffusion-based Image Inpainting Pipeline)	Eyoel Gebre et.al.	2403.16016	null
2024-03-22	DragAPart: Learning a Part-Level Motion Prior for Articulated Objects	Ruining Li et.al.	2403.15382	null
2024-03-22	Long-CLIP: Unlocking the Long-Text Capability of CLIP	Beichen Zhang et.al.	2403.15378	null
2024-03-22	A Wasserstein perspective of Vanilla GANs	Lea Kunkel et.al.	2403.15312	null
2024-03-22	Controlled Training Data Generation with Diffusion Models	Teresa Yeo et.al.	2403.15309	null
2024-03-22	Robust Utility Optimization via a GAN Approach	Florian Krach et.al.	2403.15243	null
2024-03-22	A Multimodal Approach for Cross-Domain Image Retrieval	Lucas Iijima et.al.	2403.15152	null
2024-03-22	MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration	Zhichao Wei et.al.	2403.15059	null
2024-03-22	Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning	Bumsoo Kim et.al.	2403.15048	null
2024-03-22	Generative Active Learning for Image Synthesis Personalization	Xulu Zhang et.al.	2403.14987	null
2024-03-22	CLIP-VQDiffusion: Language Free Training of Text To Image generation using CLIP and vector quantized diffusion model	Seungdae Han et.al.	2403.14944	null
2024-03-21	Implicit Style-Content Separation using B-LoRA	Yarden Frenkel et.al.	2403.14572	null
2024-03-21	DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing	Yueru Jia et.al.	2403.14487	null
2024-03-21	AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks	Max Ku et.al.	2403.14468	null
2024-03-21	Analysing Diffusion Segmentation for Medical Images	Mathias Öttl et.al.	2403.14440	null
2024-03-21	Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation	Mathias Öttl et.al.	2403.14429	null
2024-03-21	HySim: An Efficient Hybrid Similarity Measure for Patch Matching in Image Inpainting	Saad Noufel et.al.	2403.14292	null
2024-03-21	Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models	Pablo Marcos-Manchón et.al.	2403.14291	link
2024-03-21	Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations	Xun Lin et.al.	2403.14250	null
2024-03-21	StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN	Jongwoo Choi et.al.	2403.14186	null
2024-03-21	QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping	Zhuang Xiong et.al.	2403.14070	null
2024-03-20	Learning from Models and Data for Visual Grounding	Ruozhen He et.al.	2403.13804	null
2024-03-20	Step-Calibrated Diffusion for Biomedical Optical Image Restoration	Yiwei Lyu et.al.	2403.13680	null
2024-03-20	ReGround: Improving Textual and Spatial Grounding at No Cost	Yuseung Lee et.al.	2403.13589	null
2024-03-20	Diversity-aware Channel Pruning for StyleGAN Compression	Jiwoo Chung et.al.	2403.13548	link
2024-03-20	IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models	Siying Cui et.al.	2403.13535	null
2024-03-20	Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection	Davide Alessandro Coccomini et.al.	2403.13479	null
2024-03-20	S2DM: Sector-Shaped Diffusion Models for Video Generation	Haoran Lang et.al.	2403.13408	null
2024-03-20	IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis	Feng Liu et.al.	2403.13378	null
2024-03-20	AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation	Jingkun An et.al.	2403.13352	null
2024-03-20	TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation	Santosh Sanjeev et.al.	2403.13343	null
2024-03-19	FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis	Linjiang Huang et.al.	2403.12963	link
2024-03-19	Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties	Efrain Torres-Lomas et.al.	2403.12935	null
2024-03-19	You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs	Yihong Luo et.al.	2403.12931	link
2024-03-19	Ultra-High-Resolution Image Synthesis with Pyramid Diffusion Model	Jiajie Yang et.al.	2403.12915	link
2024-03-19	Generative Enhancement for 3D Medical Images	Lingting Zhu et.al.	2403.12852	link
2024-03-19	How Spammers and Scammers Leverage AI-Generated Images on Facebook for Audience Growth	Renee DiResta et.al.	2403.12838	null
2024-03-19	Total Disentanglement of Font Images into Style and Character Class Features	Daichi Haraguchi et.al.	2403.12784	null
2024-03-19	Towards Controllable Face Generation with Semantic Latent Diffusion Models	Alex Ergasti et.al.	2403.12743	link
2024-03-19	Tuning-Free Image Customization with Image and Text Guidance	Pengzhi Li et.al.	2403.12658	null
2024-03-19	NSGAN: A Non-Dominant Sorting Optimisation-Based Generative Adversarial Design Framework for Alloy Discovery	Zhipeng Li et.al.	2403.12495	null
2024-03-18	Urban Scene Diffusion through Semantic Occupancy Map	Junge Zhang et.al.	2403.11697	null
2024-03-18	Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection	Julia Wolleb et.al.	2403.11667	null
2024-03-18	LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model	Yuxin Cao et.al.	2403.11656	null
2024-03-18	QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation	Zhizhen Zhou et.al.	2403.11626	null
2024-03-18	CRS-Diff: Controllable Generative Remote Sensing Foundation Model	Datao Tang et.al.	2403.11614	null
2024-03-18	VmambaIR: Visual State Space Model for Image Restoration	Yuan Shi et.al.	2403.11423	link
2024-03-17	StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining	Tushar Kataria et.al.	2403.11340	null
2024-03-17	Fast Personalized Text-to-Image Syntheses With Attention Injection	Yuxuan Zhang et.al.	2403.11284	null
2024-03-17	Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation	Silvia Corbara et.al.	2403.11265	null
2024-03-17	Understanding Diffusion Models by Feynman's Path Integral	Yuji Hirono et.al.	2403.11262	null
2024-03-14	SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior	Huan-ang Gao et.al.	2403.09638	null
2024-03-14	Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering	Zeyu Liu et.al.	2403.09622	null
2024-03-14	PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation	Yuhan Guo et.al.	2403.09615	null
2024-03-14	Counterfactual contrastive learning: robust representations via causal image synthesis	Melanie Roschewitz et.al.	2403.09605	link
2024-03-14	Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing	Wonjun Kang et.al.	2403.09468	link
2024-03-14	Mitigating attribute amplification in counterfactual image generation	Tian Xia et.al.	2403.09422	null
2024-03-14	Machine Learning Processes as Sources of Ambiguity: Insights from AI Art	Christian Sivertsen et.al.	2403.09374	null
2024-03-14	Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction	Hanyu Chen et.al.	2403.09355	null
2024-03-14	StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images	Robert Jewsbury et.al.	2403.09302	link
2024-03-14	Noise Dimension of GAN: An Image Compression Perspective	Ziran Zhu et.al.	2403.09196	null
2024-03-13	Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data	Asad Aali et.al.	2403.08728	link
2024-03-13	HAIFIT: Human-Centered AI for Fashion Image Translation	Jianan Jiang et.al.	2403.08651	link
2024-03-13	Gaussian Splatting in Style	Abhishek Saroha et.al.	2403.08498	null
2024-03-13	An Analysis of Human Alignment of Latent Diffusion Models	Lorenz Linhardt et.al.	2403.08469	null
2024-03-13	Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report	Evi M. C. Huijben et.al.	2403.08447	null
2024-03-13	Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification	Shuhan Li et.al.	2403.08407	null
2024-03-13	StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields	Hongbin Xu et.al.	2403.08310	null
2024-03-13	Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation	Tianyi Chu et.al.	2403.08294	null
2024-03-13	VIGFace: Virtual Identity Generation Model for Face Image Synthesis	Minsoo Kim et.al.	2403.08277	null
2024-03-13	CoroNetGAN: Controlled Pruning of GANs via Hypernetworks	Aman Kumar et.al.	2403.08261	null
2024-03-12	Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation	Shihao Zhao et.al.	2403.07860	link
2024-03-12	Quantifying and Mitigating Privacy Risks for Tabular Generative Models	Chaoyi Zhu et.al.	2403.07842	null
2024-03-12	StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting	Kunhao Liu et.al.	2403.07807	null
2024-03-12	BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectives	Ivo M. Baltruschat et.al.	2403.07800	null
2024-03-12	Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model	Yuxuan Zhang et.al.	2403.07764	null
2024-03-12	Synth$^2$: Boosting Visual-Language Models with Synthetic Captions and Image Embeddings	Sahand Sharifzadeh et.al.	2403.07750	null
2024-03-12	Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion	Dongyang Li et.al.	2403.07721	link
2024-03-12	SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces	Yuta Oshima et.al.	2403.07711	link
2024-03-12	Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation	Di Mi et.al.	2403.07673	null
2024-03-12	Gender-ambiguous voice generation through feminine speaking style transfer in male voices	Maria Koutsogiannaki et.al.	2403.07661	null
2024-03-11	BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion	Xuan Ju et.al.	2403.06976	null
2024-03-11	Surface-aware Mesh Texture Synthesis with Pre-trained 2D CNNs	Áron Samuel Kovács et.al.	2403.06855	null
2024-03-11	Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting	Wenting Chen et.al.	2403.06835	null
2024-03-11	Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection	Chuangchuang Tan et.al.	2403.06803	link
2024-03-11	FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation	Pengchong Qiao et.al.	2403.06775	link
2024-03-11	Distribution-Aware Data Expansion with Diffusion Models	Haowei Zhu et.al.	2403.06741	link
2024-03-11	Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback	Adarsh N L et.al.	2403.06735	null
2024-03-11	Galaxy Morphologies Revealed with Subaru HSC and Super-Resolution Techniques II: Environmental Dependence of Galaxy Mergers at z~2-5	Takatoshi Shibuya et.al.	2403.06729	null
2024-03-11	FFAD: A Novel Metric for Assessing Generated Time Series Data Utilizing Fourier Transform and Auto-encoder	Yang Chen et.al.	2403.06576	null
2024-03-11	Active Generation for Image Classification	Tao Huang et.al.	2403.06517	null
2024-03-08	Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola	Yijiang Li et.al.	2403.05523	null
2024-03-08	A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN	Cristiana Tiago et.al.	2403.05384	null
2024-03-08	Federated Learning Method for Preserving Privacy in Face Recognition System	Enoch Solomon et.al.	2403.05344	null
2024-03-08	Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation	Juan I. Pisula et.al.	2403.05325	null
2024-03-08	GAN-based Massive MIMO Channel Model Trained on Measured Data	Florian Euchner et.al.	2403.05321	null
2024-03-08	An Efficient Quasi-Random Sampling for Copulas	Sumin Wang et.al.	2403.05281	null
2024-03-08	Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation	Junyan Wang et.al.	2403.05239	null
2024-03-08	Synthetic Privileged Information Enhances Medical Image Representation Learning	Lucas Farndale et.al.	2403.05220	null
2024-03-08	Denoising Autoregressive Representation Learning	Yazhe Li et.al.	2403.05196	null
2024-03-08	Robust Semantic Communications for Speech-to-Text Translation	Zhenzi Weng et.al.	2403.05187	null
2024-03-07	Photonic probabilistic machine learning using quantum vacuum noise	Seou Choi et.al.	2403.04731	null
2024-03-07	PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation	Junsong Chen et.al.	2403.04692	null
2024-03-07	A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images	Cristiana Tiago et.al.	2403.04612	null
2024-03-07	Discriminative Probing and Tuning for Text-to-Image Generation	Leigang Qu et.al.	2403.04321	null
2024-03-06	PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement	Zhijie Wang et.al.	2403.04014	link
2024-03-06	Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer	Naifu Xue et.al.	2403.03736	null
2024-03-06	Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving	He Li et.al.	2403.03541	null
2024-03-06	NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging	Takahiro Shirakawa et.al.	2403.03485	null
2024-03-06	FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion	Hao Wang et.al.	2403.03463	null
2024-03-07	DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network	Xiangquan Gui et.al.	2403.03456	null
2024-03-06	Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing	Bingyan Liu et.al.	2403.03431	null
2024-03-05	Scaling Rectified Flow Transformers for High-Resolution Image Synthesis	Patrick Esser et.al.	2403.03206	null
2024-03-05	Behavior Generation with Latent Actions	Seungjae Lee et.al.	2403.03181	link
2024-03-05	Doubly Abductive Counterfactual Inference for Text-based Image Editing	Xue Song et.al.	2403.02981	null
2024-03-05	Bias in Generative AI	Mi Zhou et.al.	2403.02726	null
2024-03-05	Time Weaver: A Conditional Time Series Generation Model	Sai Shankar Narasimhan et.al.	2403.02682	null
2024-03-04	Transformer for Times Series: an Application to the S&P500	Pierre Brugiere et.al.	2403.02523	null
2024-03-04	NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function	Abdullah Nazhat Abdullah et.al.	2403.02411	link
2024-03-04	ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models	Jiaxiang Cheng et.al.	2403.02084	null
2024-03-05	Matrix Completion with Convex Optimization and Column Subset Selection	Antonina Krajewska et.al.	2403.01919	link
2024-03-04	PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis	Zhengyao Lv et.al.	2403.01852	link
2024-03-02	Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models	Neta Shaul et.al.	2403.01329	null
2024-03-02	TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion	Salaheldin Mohamed et.al.	2403.01212	null
2024-03-02	A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model	Xinying Lu et.al.	2403.01147	null
2024-03-02	Distilling Text Style Transfer With Self-Explanation From LLMs	Chiyu Zhang et.al.	2403.01106	null
2024-03-01	BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs)	Sean Wellington et.al.	2403.01008	null
2024-03-01	Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks	Kawana Stalin et.al.	2403.00890	null
2024-03-01	Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks	Yuhao Liu et.al.	2403.00644	null
2024-03-01	Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset	Ander Salaberria et.al.	2403.00587	link
2024-03-01	Rethinking cluster-conditioned diffusion models	Nikolas Adaloglou et.al.	2403.00570	null
2024-03-01	VisionLLaMA: A Unified LLaMA Interface for Vision Tasks	Xiangxiang Chu et.al.	2403.00522	link
2024-02-29	SeD: Semantic-Aware Discriminator for Image Super-Resolution	Bingchen Li et.al.	2402.19387	null
2024-02-29	A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation	Hanxi Li et.al.	2402.19330	null
2024-02-29	Memory-Augmented Generative Adversarial Transformers	Stephan Raaijmakers et.al.	2402.19218	null
2024-02-29	Generative models struggle with kirigami metamaterials	Gerrit Felsch et.al.	2402.19196	null
2024-02-29	Disentangling representations of retinal images with generative models	Sarah Müller et.al.	2402.19186	null
2024-02-29	Trajectory Consistency Distillation	Jianbin Zheng et.al.	2402.19159	link
2024-02-29	Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection	Christos Koutlis et.al.	2402.19091	null
2024-02-29	WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis	Paul Friedrich et.al.	2402.19043	link
2024-02-29	Lotka-Volterra Model with Mutations and Generative Adversarial Networks	S. V. Kozyrev et.al.	2402.19035	null
2024-02-29	Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding	Guangyi Liu et.al.	2402.19009	null
2024-02-28	MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation	Jiahao Huang et.al.	2402.18451	null
2024-02-28	FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes	Ziying Pan et.al.	2402.18331	null
2024-02-28	Balancing Act: Distribution-Guided Debiasing in Diffusion Models	Rishubh Parihar et.al.	2402.18206	null
2024-02-28	Misalignment-Robust Frequency Distribution Loss for Image Transformation	Zhangkai Ni et.al.	2402.18192	null
2024-02-28	VulMCI: Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation	Tao Peng et.al.	2402.18189	null
2024-02-28	Block and Detail: Scaffolding Sketch-to-Image Generation	Vishnu Sarukkai et.al.	2402.18116	null
2024-02-28	Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis	Yanzuo Lu et.al.	2402.18078	link
2024-02-28	SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model	Bin Cao et.al.	2402.18068	null
2024-02-28	Breaking the Black-Box: Confidence-Guided Model Inversion Attack for Distribution Shift	Xinhao Liu et.al.	2402.18027	null
2024-02-27	CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing	Chufeng Xiao et.al.	2402.17624	null

(back to top)

LLM

Publish Date	Title	Authors	PDF	Code
2024-05-30	MotionLLM: Understanding Human Behaviors from Human Motions and Videos	Ling-Hao Chen et.al.	2405.20340	null
2024-05-30	Visual Perception by Large Language Model's Weights	Feipeng Ma et.al.	2405.20339	null
2024-05-30	Xwin-LM: Strong and Scalable Alignment Practice for LLMs	Bolin Ni et.al.	2405.20335	link
2024-05-30	ParSEL: Parameterized Shape Editing with Language	Aditya Ganeshan et.al.	2405.20319	null
2024-05-30	CausalQuest: Collecting Natural Causal Questions for AI Agents	Roberto Ceraolo et.al.	2405.20318	link
2024-05-30	ANAH: Analytical Annotation of Hallucinations in Large Language Models	Ziwei Ji et.al.	2405.20315	link
2024-05-30	Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation	Guillaume Huguet et.al.	2405.20313	null
2024-05-30	Large Language Models Can Self-Improve At Web Agent Tasks	Ajay Patel et.al.	2405.20309	null
2024-05-30	Group Robust Preference Optimization in Reward-free RLHF	Shyam Sundhar Ramesh et.al.	2405.20304	null
2024-05-30	Who Writes the Review, Human or AI?	Panagiotis C. Theocharopoulos et.al.	2405.20285	null
2024-05-29	X-VILA: Cross-Modality Alignment for Large Language Model	Hanrong Ye et.al.	2405.19335	null
2024-05-29	LLMs Meet Multimodal Generation and Editing: A Survey	Yingqing He et.al.	2405.19334	link
2024-05-29	Multi-Modal Generative Embedding Model	Feipeng Ma et.al.	2405.19333	null
2024-05-29	Self-Exploring Language Models: Active Preference Elicitation for Online Alignment	Shenao Zhang et.al.	2405.19332	link
2024-05-29	Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation	Atrisha Sarkar et.al.	2405.19328	null
2024-05-29	MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series	Ge Zhang et.al.	2405.19327	null
2024-05-29	Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models	Tianrun Chen et.al.	2405.19326	null
2024-05-29	Nearest Neighbor Speculative Decoding for LLM Generation and Attribution	Minghan Li et.al.	2405.19325	null
2024-05-29	Are Large Language Models Chameleons?	Mingmeng Geng et.al.	2405.19323	null
2024-05-29	Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF	Shicong Cen et.al.	2405.19320	null
2024-05-28	Don't Forget to Connect! Improving RAG with Graph-based Reranking	Jialin Dong et.al.	2405.18414	null
2024-05-28	Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass	Ethan Shen et.al.	2405.18400	link
2024-05-28	Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning	Yixiao Zhang et.al.	2405.18386	link
2024-05-28	OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning	Pengxiang Li et.al.	2405.18380	link
2024-05-28	LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models	Anthony Sarah et.al.	2405.18377	null
2024-05-28	Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning	Dongjie Chen et.al.	2405.18376	link
2024-05-28	Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning	Phakphum Artkaew et.al.	2405.18375	null
2024-05-28	PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework	Eshaan Agarwal et.al.	2405.18369	null
2024-05-28	Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving?	Yifan Bai et.al.	2405.18361	null
2024-05-28	Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs	Somnath Kumar et.al.	2405.18359	null
2024-05-27	Matryoshka Multimodal Models	Mu Cai et.al.	2405.17430	null
2024-05-27	NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models	Chankyu Lee et.al.	2405.17428	null
2024-05-27	Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model	Kuan-Chih Huang et.al.	2405.17427	link
2024-05-27	LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence	Zhuoling Li et.al.	2405.17424	null
2024-05-27	Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation	Jiaming Liu et.al.	2405.17418	null
2024-05-27	THREAD: Thinking Deeper with Recursive Spawning	Philip Schroeder et.al.	2405.17402	null
2024-05-27	MindMerger: Efficient Boosting LLM Reasoning in non-English Languages	Zixian Huang et.al.	2405.17386	null
2024-05-27	ReMoDetect: Reward Models Recognize Aligned LLM's Generations	Hyunseok Lee et.al.	2405.17382	null
2024-05-27	RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects	Ahmed Allam et.al.	2405.17378	null
2024-05-27	Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models	ShengYun Peng et.al.	2405.17374	null
2024-05-24	Scaling Laws for Discriminative Classification in Large Language Models	Dean Wyatte et.al.	2405.15765	null
2024-05-24	Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias	Andres Algaba et.al.	2405.15739	null
2024-05-24	More Insight from Being More Focused: Analysis of Clustered Market Apps	Maleknaz Nayebi et.al.	2405.15737	null
2024-05-24	LM4LV: A Frozen Large Language Model for Low-level Vision Tasks	Boyang Zheng et.al.	2405.15734	null
2024-05-24	Optimizing Large Language Models for OpenAPI Code Completion	Bohdan Petryshyn et.al.	2405.15729	null
2024-05-24	Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models	Yue Zhang et.al.	2405.15684	null
2024-05-24	What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models	Abdelrahman Abdelhamed et.al.	2405.15668	null
2024-05-24	Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning	Wenhan Chang et.al.	2405.15662	null
2024-05-24	$\mathbf{L^2\cdot M = C^2}$ Large Language Models as Covert Channels... a Systematic Analysis	Simen Gaure et.al.	2405.15652	null
2024-05-24	LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots	Ruoyu Wang et.al.	2405.15646	null
2024-05-23	A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns	Asaf Yehudai et.al.	2405.14863	null
2024-05-23	Bitune: Bidirectional Instruction-Tuning	Dawid J. Kopiczko et.al.	2405.14862	null
2024-05-23	PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression	Vladimir Malinovskii et.al.	2405.14852	null
2024-05-23	HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models	Bernal Jiménez Gutiérrez et.al.	2405.14831	null
2024-05-23	Can LLMs Solve longer Math Word Problems Better?	Xin Xu et.al.	2405.14804	null
2024-05-23	Lessons from the Trenches on Reproducible Evaluation of Language Models	Stella Biderman et.al.	2405.14782	null
2024-05-23	WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models	Peng Wang et.al.	2405.14768	link
2024-05-23	FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models	Hongyang Yang et.al.	2405.14767	link
2024-05-23	Evaluating Large Language Models for Public Health Classification and Extraction Tasks	Joshua Harris et.al.	2405.14766	null
2024-05-23	Large language models can be zero-shot anomaly detectors for time series?	Sarah Alnegheimish et.al.	2405.14755	null
2024-05-21	Reducing Transformer Key-Value Cache Size with Cross-Layer Attention	William Brandon et.al.	2405.12981	null
2024-05-21	Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale	Shriram Chennakesavalu et.al.	2405.12961	null
2024-05-21	Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models	Zhangyue Yin et.al.	2405.12939	null
2024-05-21	Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs	Bilgehan Sel et.al.	2405.12933	null
2024-05-21	Code-mixed Sentiment and Hate-speech Prediction	Anjali Yadav et.al.	2405.12929	null
2024-05-21	Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples	Tim Menzies et.al.	2405.12920	null
2024-05-21	G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation	Xingyuan Pan et.al.	2405.12915	null
2024-05-21	An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation	Zhiyu Tan et.al.	2405.12914	null
2024-05-21	Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment	Holli Sargeant et.al.	2405.12910	link
2024-05-21	Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents	San Kim et.al.	2405.12900	null
2024-05-20	Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning	Guanglin Zhou et.al.	2405.12217	link
2024-05-20	MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark	Hongwei Liu et.al.	2405.12209	link
2024-05-20	Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey	Thiago S. Vaillant et.al.	2405.12195	null
2024-05-20	CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models	Haoxiang Shi et.al.	2405.12174	null
2024-05-20	Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging	Xiaobo Liang et.al.	2405.12163	link
2024-05-20	Eliciting Problem Specifications via Large Language Models	Robert E. Wray et.al.	2405.12147	null
2024-05-20	DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM	Xuchen Li et.al.	2405.12139	null
2024-05-20	MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning	Ting Jiang et.al.	2405.12130	link
2024-05-20	Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation	Zhankui He et.al.	2405.12119	null
2024-05-20	Imp: Highly Capable Large Multimodal Models for Mobile Devices	Zhenwei Shao et.al.	2405.12107	link
2024-05-17	A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers	Kaiyu Huang et.al.	2405.10936	link
2024-05-17	The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks	Lucius Bushnaq et.al.	2405.10928	null
2024-05-17	COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain	Dimitrios P. Panagoulias et.al.	2405.10893	null
2024-05-17	Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review	Hongyi Yang et.al.	2405.10883	null
2024-05-17	The Future of Large Language Model Pre-training is Federated	Lorenzo Sani et.al.	2405.10853	null
2024-05-17	Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities	Hao Zhou et.al.	2405.10825	null
2024-05-17	Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System	Jiawei Feng et.al.	2405.10818	null
2024-05-17	ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios	Markus Bayer et.al.	2405.10808	null
2024-05-17	Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings	Albert Sawczyn et.al.	2405.10745	null
2024-05-17	Efficient Multimodal Large Language Models: A Survey	Yizhang Jin et.al.	2405.10739	link
2024-05-16	UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models	Sahel Sharifymoghaddam et.al.	2405.10311	null
2024-05-16	4D Panoptic Scene Graph Generation	Jingkang Yang et.al.	2405.10305	link
2024-05-16	HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models	Rhea Sanjay Sukthanker et.al.	2405.10299	link
2024-05-16	Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction	Jianhao Chen et.al.	2405.10288	null
2024-05-16	FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models	Adrian Bulat et.al.	2405.10286	null
2024-05-16	Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers	Tuo Zhang et.al.	2405.10276	null
2024-05-16	Keep It Private: Unsupervised Privatization of Online Text	Calvin Bao et.al.	2405.10260	link
2024-05-16	When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models	Xianzheng Ma et.al.	2405.10255	null
2024-05-16	A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks	Xuanfan Ni et.al.	2405.10251	null
2024-05-16	IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers	Hao Yan et.al.	2405.10250	null
2024-05-15	Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming	Bushi Xiao et.al.	2405.09508	null
2024-05-15	ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata	Jonne Sälevä et.al.	2405.09496	null
2024-05-15	Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts	Donya Rooein et.al.	2405.09482	null
2024-05-15	Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models	Majid Zarharan et.al.	2405.09454	link
2024-05-15	Facilitating Opinion Diversity through Hybrid NLP Approaches	Michiel van der Meer et.al.	2405.09439	null
2024-05-15	MicroPython Testbed for Federated Learning Algorithms	Miroslav Popovic et.al.	2405.09423	null
2024-05-15	Matching domain experts by training from scratch on domain knowledge	Xiaoliang Luo et.al.	2405.09395	null
2024-05-15	PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models	Devansh Jain et.al.	2405.09373	null
2024-05-15	Large Language Model Bias Mitigation from the Perspective of Knowledge Editing	Ruizhe Chen et.al.	2405.09341	null
2024-05-15	Prompting-based Synthetic Data Generation for Few-Shot Question Answering	Maximilian Schmidt et.al.	2405.09335	null
2024-05-14	Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs	Edison Jair Bejarano Sepulveda et.al.	2405.08792	null
2024-05-14	Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring	Tiantian Zhang et.al.	2405.08786	null
2024-05-14	Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs	Akhila Yerukola et.al.	2405.08760	link
2024-05-14	Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach	Syed Mhamudul Hasan et.al.	2405.08755	null
2024-05-14	Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding	Zhimin Li et.al.	2405.08748	link
2024-05-14	ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation	Dimitris Gkoumas et.al.	2405.08619	null
2024-05-14	A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine	Hanguang Xiao et.al.	2405.08603	null
2024-05-14	EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark	Xiaohui Zhang et.al.	2405.08596	null
2024-05-14	Falcon 7b for Software Mention Detection in Scholarly Documents	AmeerAli Khan et.al.	2405.08514	null
2024-05-14	Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure	Odysseas S. Chlapanis et.al.	2405.08502	null
2024-05-13	Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots	Chengyue Wu et.al.	2405.07990	null
2024-05-13	A Generalist Learner for Multifaceted Medical Image Interpretation	Hong-Yu Zhou et.al.	2405.07988	null
2024-05-13	PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation	Suad Alshammari et.al.	2405.07963	null
2024-05-13	AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments	Samuel Schmidgall et.al.	2405.07960	null
2024-05-13	EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning	Yinzhu Quan et.al.	2405.07938	null
2024-05-13	PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition	Ziyang Zhang et.al.	2405.07932	link
2024-05-13	Can Better Text Semantics in Prompt Tuning Improve VLM Generalization?	Hari Chandana Kuchibhotla et.al.	2405.07921	null
2024-05-13	A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking	Ferdinand Schlatt et.al.	2405.07920	null
2024-05-13	Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers	Alena Tsanda et.al.	2405.07886	null
2024-05-13	Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques	Michela Lorandi et.al.	2405.07875	null
2024-05-10	Linearizing Large Language Models	Jean Mercat et.al.	2405.06640	link
2024-05-10	Value Augmented Sampling for Language Model Alignment and Personalization	Seungwook Han et.al.	2405.06639	link
2024-05-10	Federated Document Visual Question Answering: A Pilot Study	Khanh Nguyen et.al.	2405.06636	null
2024-05-10	Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models	Chakshu Moar et.al.	2405.06626	null
2024-05-10	What Can Natural Language Processing Do for Peer Review?	Ilia Kuznetsov et.al.	2405.06563	null
2024-05-10	Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval	Mengjia Niu et.al.	2405.06545	null
2024-05-10	Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts	Wenyu Huang et.al.	2405.06524	null
2024-05-10	UniDM: A Unified Framework for Data Manipulation with Large Language Models	Yichen Qian et.al.	2405.06510	null
2024-05-10	Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks	Haifa Alrdahi et.al.	2405.06499	null
2024-05-10	Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling	Lyumanshan Ye et.al.	2405.06495	null
2024-05-09	Natural Language Processing RELIES on Linguistics	Juri Opitz et.al.	2405.05966	null
2024-05-09	OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning	Dan Qiao et.al.	2405.05957	link
2024-05-09	Probing Multimodal LLMs as World Models for Driving	Shiva Sreeram et.al.	2405.05956	link
2024-05-09	Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning	Junzhi Chen et.al.	2405.05955	null
2024-05-09	CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts	Jiachen Li et.al.	2405.05949	link
2024-05-09	Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness	Siyuan Li et.al.	2405.05930	null
2024-05-09	Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?	Zorik Gekhman et.al.	2405.05904	null
2024-05-09	Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes	Ziang Guo et.al.	2405.05885	null
2024-05-09	FlockGPT: Guiding UAV Flocking with Linguistic Orchestration	Artem Lykov et.al.	2405.05872	null
2024-05-09	Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning	Artem Lykov et.al.	2405.05824	link
2024-05-08	You Only Cache Once: Decoder-Decoder Architectures for Language Models	Yutao Sun et.al.	2405.05254	null
2024-05-08	Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge	Charles Koutcheme et.al.	2405.05253	link
2024-05-09	LLMs with Personalities in Multi-issue Negotiation Games	Sean Noh et.al.	2405.05248	null
2024-05-08	SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants	Masoud Moghani et.al.	2405.05226	null
2024-05-08	Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers	Jiuxiang Gu et.al.	2405.05219	null
2024-05-08	MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning	Inderjeet Nair et.al.	2405.05189	null
2024-05-08	Air Gap: Protecting Privacy-Conscious Conversational Agents	Eugene Bagdasaryan et.al.	2405.05175	null
2024-05-08	XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples	Peiqin Lin et.al.	2405.05116	null
2024-05-08	QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs	Weijia Zhang et.al.	2405.05109	null
2024-05-08	Concerns on Bias in Large Language Models when Creating Synthetic Personae	Helena A. Haxvig et.al.	2405.05080	null
2024-05-07	ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning	Jing Lin et.al.	2405.04533	null
2024-05-07	QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving	Yujun Lin et.al.	2405.04532	link
2024-05-07	NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts	Shudan Zhang et.al.	2405.04520	null
2024-05-07	xLSTM: Extended Long Short-Term Memory	Maximilian Beck et.al.	2405.04517	null
2024-05-07	A Transformer with Stack Attention	Jiaoda Li et.al.	2405.04515	link
2024-05-08	Unveiling Disparities in Web Task Handling Between Human and Web Agent	Kihoon Son et.al.	2405.04497	null
2024-05-07	Toward In-Context Teaching: Adapting Examples to Students' Misconceptions	Alexis Ross et.al.	2405.04495	null
2024-05-07	The Silicon Ceiling: Auditing GPT's Race and Gender Biases in Hiring	Lena Armstrong et.al.	2405.04412	null
2024-05-07	Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks	Georgios Pantazopoulos et.al.	2405.04403	link
2024-05-07	Large Language Models Cannot Explain Themselves	Advait Sarkar et.al.	2405.04382	null
2024-05-06	Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs	Muhammad Uzair Khattak et.al.	2405.03690	null
2024-05-06	Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames	Keith Burghardt et.al.	2405.03688	null
2024-05-06	Language-Image Models with 3D Understanding	Jang Hyun Cho et.al.	2405.03685	null
2024-05-06	AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design	Kamal Choudhary et.al.	2405.03680	null
2024-05-06	A New Robust Partial $p$-Wasserstein-Based Metric for Comparing Distributions	Sharath Raghvendra et.al.	2405.03664	null
2024-05-06	When LLMs Meet Cybersecurity: A Systematic Literature Review	Jie Zhang et.al.	2405.03644	null
2024-05-06	A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama	Vlad-Andrei Cursaru et.al.	2405.03616	null
2024-05-06	Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment	Abhinav Agarwalla et.al.	2405.03594	null
2024-05-06	AlphaMath Almost Zero: process Supervision without process	Guoxin Chen et.al.	2405.03553	null
2024-05-06	MAmmoTH2: Scaling Instructions from the Web	Xiang Yue et.al.	2405.03548	null
2024-05-03	Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows	Jasmine Y. Shih et.al.	2405.02260	null
2024-05-03	What matters when building vision-language models?	Hugo Laurençon et.al.	2405.02246	null
2024-05-03	REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs	Deepa Tilwani et.al.	2405.02228	null
2024-05-03	Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks	Lujing Zhang et.al.	2405.02225	null
2024-05-03	FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems	Yashar Deldjoo et.al.	2405.02219	null
2024-05-03	Automatic Programming: Large Language Models and Beyond	Michael R. Lyu et.al.	2405.02213	null
2024-05-03	Assessing and Verifying Task Utility in LLM-Powered Applications	Negar Arabzadeh et.al.	2405.02178	null
2024-05-03	The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates	Giuseppe Russo Latona et.al.	2405.02150	null
2024-05-03	MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain	Chao Jiang et.al.	2405.02144	null
2024-05-03	Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection	Guillem Ramírez et.al.	2405.02134	null
2024-05-02	Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks	Murtaza Dalal et.al.	2405.01534	null
2024-05-02	OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning	Shihao Wang et.al.	2405.01533	null
2024-05-02	FLAME: Factuality-Aware Alignment for Large Language Models	Sheng-Chieh Lin et.al.	2405.01525	null
2024-05-02	Transformer-Aided Semantic Communications	Matin Mortaheb et.al.	2405.01521	null
2024-05-02	Analyzing the Role of Semantic Representations in the Era of Large Language Models	Zhijing Jin et.al.	2405.01502	link
2024-05-02	Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models	Raymond Fok et.al.	2405.01501	null
2024-05-02	Controllable Text Generation in the Instruction-Tuning Era	Dhananjay Ashok et.al.	2405.01490	null
2024-05-02	NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment	Gerald Shen et.al.	2405.01481	link
2024-05-02	V-FLUTE: Visual Figurative Language Understanding with Textual Explanations	Arkadiy Saakyan et.al.	2405.01474	null
2024-05-02	Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning	Théo Moutakanni et.al.	2405.01469	null
2024-05-01	Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3	Junsang Yoon et.al.	2405.00664	null
2024-05-01	HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models	Ningke Li et.al.	2405.00648	null
2024-05-01	When Quantization Affects Confidence of Large Language Models?	Irina Proskurina et.al.	2405.00632	null
2024-05-01	"I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust	Sunnie S. Y. Kim et.al.	2405.00623	null
2024-05-01	Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling	Yida Mu et.al.	2405.00611	null
2024-05-01	Investigating Automatic Scoring and Feedback using Large Language Models	Gloria Ashiya Katuka et.al.	2405.00602	null
2024-05-01	Are Models Biased on Text without Gender-related Language?	Catarina G Belém et.al.	2405.00588	link
2024-05-01	The Real, the Better: Aligning Large Language Models with Online Human Behaviors	Guanying Jiang et.al.	2405.00578	null
2024-05-01	EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model	Deng Li et.al.	2405.00574	null
2024-05-01	Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval	Young Kyun Jang et.al.	2405.00571	null
2024-04-30	DOCCI: Descriptions of Connected and Contrasting Images	Yasumasa Onoe et.al.	2404.19753	null
2024-04-30	Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation	Yunhao Ge et.al.	2404.19752	null
2024-04-30	PrivComp-KG: Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification	Leon Garza et.al.	2404.19744	null
2024-04-30	Better & Faster Large Language Models via Multi-token Prediction	Fabian Gloeckle et.al.	2404.19737	null
2024-04-30	A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications	Steph Buongiorno et.al.	2404.19729	null
2024-04-30	PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games	Steph Buongiorno et.al.	2404.19721	null
2024-04-30	Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns	Constantinos Patsakis et.al.	2404.19715	null
2024-04-30	Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models	Scott Sumpter et.al.	2404.19713	null
2024-04-30	When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively	Tiziano Labruna et.al.	2404.19705	null
2024-04-30	Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners	Chun Feng et.al.	2404.19696	null
2024-04-29	Hallucination of Multimodal Large Language Models: A Survey	Zechen Bai et.al.	2404.18930	link
2024-04-29	DPO Meets PPO: Reinforced Token Optimization for RLHF	Han Zhong et.al.	2404.18922	null
2024-04-29	TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation	Junhao Cheng et.al.	2404.18919	null
2024-04-29	Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting	Fangcheng Liu et.al.	2404.18911	null
2024-04-29	Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking	Hong Jin Kang et.al.	2404.18881	link
2024-04-29	More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness	Aaron J. Li et.al.	2404.18870	link
2024-04-29	Truth-value judgment in language models: belief directions are context sensitive	Stefan F. Schouten et.al.	2404.18865	null
2024-04-29	Performance-Aligned LLMs for Generating Fast Code	Daniel Nichols et.al.	2404.18864	null
2024-04-29	VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning	Aidan Z. H. Yang et.al.	2404.18852	null
2024-04-29	It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments	Petter Mæhlum et.al.	2404.18832	null
2024-04-26	Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo	Stephen Zhao et.al.	2404.17546	null
2024-04-26	Large Language Model Agent as a Mechanical Designer	Yayati Jadhav et.al.	2404.17525	null
2024-04-26	On the Use of Large Language Models to Generate Capability Ontologies	Luis Miguel Vieira da Silva et.al.	2404.17524	null
2024-04-26	Enhancing Legal Compliance and Regulation Analysis with Large Language Models	Shabnam Hassani et.al.	2404.17522	null
2024-04-26	A Comprehensive Evaluation on Event Reasoning of Large Language Models	Zhengwei Tao et.al.	2404.17513	link
2024-04-26	Learning text-to-video retrieval from image captioning	Lucas Ventura et.al.	2404.17498	null
2024-04-26	CEval: A Benchmark for Evaluating Counterfactual Text Generation	Van Bach Nguyen et.al.	2404.17475	null
2024-04-26	Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System	Robin Schmucker et.al.	2404.17460	null
2024-04-26	"ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses	Bruno Pereira Cipriano et.al.	2404.17443	null
2024-04-26	InspectorRAGet: An Introspection Platform for RAG Evaluation	Kshitij Fadnis et.al.	2404.17347	null
2024-04-25	Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials	Ye Fang et.al.	2404.16829	null
2024-04-25	How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites	Zhe Chen et.al.	2404.16821	link
2024-04-25	IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages	Harman Singh et.al.	2404.16816	null
2024-04-25	Make Your LLM Fully Utilize the Context	Shengnan An et.al.	2404.16811	link
2024-04-25	Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning	Tianhui Zhang et.al.	2404.16807	null
2024-04-25	Weak-to-Strong Extrapolation Expedites Alignment	Chujie Zheng et.al.	2404.16792	link
2024-04-25	SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension	Bohao Li et.al.	2404.16790	link
2024-04-25	Continual Learning of Large Language Models: A Comprehensive Survey	Haizhou Shi et.al.	2404.16789	link
2024-04-25	Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model	Runzhe Zhan et.al.	2404.16766	null
2024-04-25	RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis	Xiaoman Zhang et.al.	2404.16754	null
2024-04-24	Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data	Aliaksei Vertsel et.al.	2404.15604	null
2024-04-24	ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction	Henry Peng Zou et.al.	2404.15592	link
2024-04-24	Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?	Hossein Salami et.al.	2404.15578	null
2024-04-23	PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models	Shashi Kant Gupta et.al.	2404.15549	null
2024-04-23	Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models	Mihir Parmar et.al.	2404.15522	link
2024-04-23	Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval	Young Kyun Jang et.al.	2404.15516	null
2024-04-23	ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models	Weizhi Tang et.al.	2404.15515	null
2024-04-23	GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots	Simranjit Singh et.al.	2404.15500	null
2024-04-23	IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents	Jean-Philippe Corbeil et.al.	2404.15488	link
2024-04-23	Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance	Het Patel et.al.	2404.15485	null
2024-04-23	Aligning LLM Agents by Learning Latent Preference from User Edits	Ge Gao et.al.	2404.15269	null
2024-04-23	XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts	Yifeng Ding et.al.	2404.15247	link
2024-04-23	Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models	Aidan Z. H. Yang et.al.	2404.15236	null
2024-04-23	Re-Thinking Inverse Graphics With Large Language Models	Peter Kulits et.al.	2404.15228	null
2024-04-23	Setting up the Data Printer with Improved English to Ukrainian Machine Translation	Yurii Paniv et.al.	2404.15196	null
2024-04-23	Regressive Side Effects of Training Language Models to Mimic Student Misconceptions	Shashank Sonkar et.al.	2404.15156	null
2024-04-23	Bias patterns in the application of LLMs for clinical decision support: A comprehensive study	Raphael Poulain et.al.	2404.15149	null
2024-04-23	Rethinking LLM Memorization through the Lens of Adversarial Compression	Avi Schwarzschild et.al.	2404.15146	null
2024-04-23	MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning	Sunan He et.al.	2404.15127	null
2024-04-23	Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation	Xun Wu et.al.	2404.15100	null
2024-04-22	AutoAD III: The Prequel -- Back to the Pixels	Tengda Han et.al.	2404.14412	null
2024-04-22	SpaceByte: Towards Deleting Tokenization from Large Language Modeling	Kevin Slagle et.al.	2404.14408	link
2024-04-22	RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios?	Adrian de Wynter et.al.	2404.14397	null
2024-04-22	A Survey on Self-Evolution of Large Language Models	Zhengwei Tao et.al.	2404.14387	null
2024-04-22	Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph	Xiaochen Kev Gao et.al.	2404.14372	link
2024-04-22	Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data	Fahim Tajwar et.al.	2404.14367	link
2024-04-22	Better Synthetic Data by Retrieving and Transforming Existing Datasets	Saumya Gandhi et.al.	2404.14361	link
2024-04-22	Rethinking Legal Compliance Automation: Opportunities with Large Language Models	Shabnam Hassani et.al.	2404.14356	null
2024-04-22	Automated Long Answer Grading with RiceChem Dataset	Shashank Sonkar et.al.	2404.14316	null
2024-04-22	Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report)	Xiang Yin et.al.	2404.14304	null
2024-04-19	MoVA: Adapting Mixture of Vision Experts to Multimodal Context	Zhuofan Zong et.al.	2404.13046	link
2024-04-19	Unified Scene Representation and Reconstruction for 3D Large Language Models	Tao Chu et.al.	2404.13044	null
2024-04-19	Data Alignment for Zero-Shot Concept Generation in Dermatology AI	Soham Gadgil et.al.	2404.13043	null
2024-04-19	LaPA: Latent Prompt Assist Model For Medical Visual Question Answering	Tiancheng Gu et.al.	2404.13039	link
2024-04-19	Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs	Biyang Guo et.al.	2404.13033	link
2024-04-19	When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering	Stephen Choi et.al.	2404.13028	null
2024-04-19	Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models	Chuofan Ma et.al.	2404.13013	null
2024-04-19	Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs	Clemencia Siro et.al.	2404.12994	link
2024-04-19	RedactBuster: Entity Type Recognition from Redacted Documents	Mirco Beltrame et.al.	2404.12991	null
2024-04-19	FineRec: Exploring Fine-grained Sequential Recommendation	Xiaokun Zhang et.al.	2404.12975	null
2024-04-18	BLINK: Multimodal Large Language Models Can See but Not Perceive	Xingyu Fu et.al.	2404.12390	null
2024-04-18	MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale	Xiaotang Gai et.al.	2404.12372	null
2024-04-18	When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes	Asaf Yehudai et.al.	2404.12365	null
2024-04-18	Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation	Jingmin Sun et.al.	2404.12355	link
2024-04-18	V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning	Hang Hua et.al.	2404.12353	null
2024-04-18	Large Language Models in Targeted Sentiment Analysis	Nicolay Rusnachenko et.al.	2404.12342	link
2024-04-18	Normative Requirements Operationalization with Large Language Models	Nick Feng et.al.	2404.12335	null
2024-04-18	Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems	Jiangbo Yu et.al.	2404.12317	null
2024-04-18	Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair	Yusuke Sakai et.al.	2404.12299	null
2024-04-18	Augmenting emotion features in irony detection with Large language modeling	Yucheng Lin et.al.	2404.12291	null
2024-04-17	A Deep Dive into Large Language Models for Automated Bug Localization and Repair	Soneya Binta Hossain et.al.	2404.11595	null
2024-04-17	Related Work and Citation Text Generation: A Survey	Xiangci Li et.al.	2404.11588	null
2024-04-17	LLMTune: Accelerate Database Knob Tuning with Large Language Models	Xinmei Huang et.al.	2404.11581	null
2024-04-17	MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation	Kuan-Chieh et.al.	2404.11565	null
2024-04-17	Quantifying Multilingual Performance of Large Language Models Across Languages	Zihao Li et.al.	2404.11553	null
2024-04-17	Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis	Soyoung Yang et.al.	2404.11539	null
2024-04-17	Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization	Costas Mavromatis et.al.	2404.11531	null
2024-04-17	Embedding Privacy in Computational Social Science and Artificial Intelligence Research	Keenan Jones et.al.	2404.11515	null
2024-04-17	Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models	Yushuo Chen et.al.	2404.11502	link
2024-04-17	Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models	Yue Zhou et.al.	2404.11500	link
2024-04-16	Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback	Qiwei Di et.al.	2404.10776	null
2024-04-16	LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation?	Yuchi Wang et.al.	2404.10763	link
2024-04-16	Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification	Yu-Yang Li et.al.	2404.10757	null
2024-04-16	Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study	Shusheng Xu et.al.	2404.10719	null
2024-04-16	An empirical study on code review activity prediction in practice	Doriane Olewicki et.al.	2404.10703	null
2024-04-16	Automating REST API Postman Test Cases Using LLM	S Deepika Sri et.al.	2404.10678	null
2024-04-16	ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images	Quan Van Nguyen et.al.	2404.10652	link
2024-04-16	Self-playing Adversarial Language Game Enhances LLM Reasoning	Pengyu Cheng et.al.	2404.10642	link
2024-04-16	HLAT: High-quality Large Language Model Pre-trained on AWS Trainium	Haozheng Fan et.al.	2404.10630	null
2024-04-16	Private Attribute Inference from Images with Vision-Language Models	Batuhan Tömekçe et.al.	2404.10618	null
2024-04-15	Personalized Collaborative Fine-Tuning for On-Device Large Language Models	Nicolas Wagner et.al.	2404.09753	null
2024-04-15	Quantization of Large Language Models with an Overdetermined Basis	Daniil Merkulov et.al.	2404.09737	null
2024-04-15	Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model	Hyunsoo Cho et.al.	2404.09717	null
2024-04-15	Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction	David Sobrín-Hidalgo et.al.	2404.09705	null
2024-04-15	Generative AI for Game Theory-based Mobile Networking	Long He et.al.	2404.09699	null
2024-04-15	Are Large Language Models Reliable Argument Quality Annotators?	Nailia Mirzakhmedova et.al.	2404.09696	null
2024-04-15	LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models	Guangyan Li et.al.	2404.09695	null
2024-04-15	Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation	Juhwan Choi et.al.	2404.09682	null
2024-04-15	Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection	Jiaqi Zhu et.al.	2404.09654	null
2024-04-15	Bridging Vision and Language Spaces with Assignment Prediction	Jungin Park et.al.	2404.09632	link
2024-04-12	Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts	Övgü Özdemir et.al.	2404.08589	link
2024-04-12	Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation	Hanlin Tian et.al.	2404.08570	null
2024-04-12	RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs	Shreyas Chaudhari et.al.	2404.08555	null
2024-04-12	Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward	Xuan Xie et.al.	2404.08517	null
2024-04-12	Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction	Haoran Qiu et.al.	2404.08509	link
2024-04-12	LaSagnA: Language-based Segmentation Assistant for Complex Queries	Cong Wei et.al.	2404.08506	link
2024-04-12	Strategic Interactions between Large Language Models-based Agents in Beauty Contests	Siting Lu et.al.	2404.08492	null
2024-04-12	Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian	Stefano De Paoli et.al.	2404.08488	null
2024-04-12	Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task	Hassan Ali et.al.	2404.08424	null
2024-04-12	AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees	William Fleshman et.al.	2404.08417	null
2024-04-11	OpenBias: Open-set Bias Detection in Text-to-Image Generative Models	Moreno D'Incà et.al.	2404.07990	null
2024-04-11	View Selection for 3D Captioning via Diffusion Ranking	Tiange Luo et.al.	2404.07984	null
2024-04-11	Manipulating Large Language Models to Increase Product Visibility	Aounon Kumar et.al.	2404.07981	link
2024-04-11	LLoCO: Learning Long Contexts Offline	Sijun Tan et.al.	2404.07979	link
2024-04-11	Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models	Haotian Zhang et.al.	2404.07973	null
2024-04-11	Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation	Jinkyung Park et.al.	2404.07926	null
2024-04-11	LaVy: Vietnamese Multimodal Large Language Model	Chi Tran et.al.	2404.07922	null
2024-04-11	AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs	Zeyi Liao et.al.	2404.07921	link
2024-04-11	DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation	Anna C. Doris et.al.	2404.07917	link
2024-04-11	High-Dimension Human Value Representation in Large Language Models	Samuel Cahyawijaya et.al.	2404.07900	null
2024-04-10	UMBRAE: Unified Multimodal Decoding of Brain Signals	Weihao Xia et.al.	2404.07202	null
2024-04-10	Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention	Tsendsuren Munkhdalai et.al.	2404.07143	null
2024-04-11	Semantically-correlated memories in a dense associative model	Thomas F Burns et.al.	2404.07123	null
2024-04-10	Continuous Language Model Interpolation for Dynamic and Controllable Text Generation	Sara Kangaslahti et.al.	2404.07117	null
2024-04-11	From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications	Yongqiang Ma et.al.	2404.07108	null
2024-04-10	Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs	Bowen Jin et.al.	2404.07103	null
2024-04-10	Dynamic Generation of Personalities with Large Language Models	Jianzhi Liu et.al.	2404.07084	null
2024-04-10	VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning	Alexandros Xenos et.al.	2404.07078	link
2024-04-10	Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers?	Mingyu Jin et.al.	2404.07066	link
2024-04-10	Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study	Alessandro Stolfo et.al.	2404.07060	null
2024-04-09	Pitfalls of Conversational LLMs on News Debiasing	Ipek Baris Schlicht et.al.	2404.06488	null
2024-04-09	Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks	Chonghua Wang et.al.	2404.06480	link
2024-04-09	Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models	Zihan Fang et.al.	2404.06448	null
2024-04-09	Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems	Kunal Garg et.al.	2404.06413	null
2024-04-09	AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents	Luca Gioacchini et.al.	2404.06411	link
2024-04-09	Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak	Hongyu Cai et.al.	2404.06407	link
2024-04-09	Apprentices to Research Assistants: Advancing Research with Large Language Models	M. Namvarpour et.al.	2404.06404	null
2024-04-09	MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies	Shengding Hu et.al.	2404.06395	link
2024-04-09	MuPT: A Generative Symbolic Music Pretrained Transformer	Xingwei Qu et.al.	2404.06393	null
2024-04-09	Latent Distance Guided Alignment Training for Large Language Models	Haotian Luo et.al.	2404.06390	null
2024-04-08	MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding	Bo He et.al.	2404.05726	null
2024-04-08	Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs	Keen You et.al.	2404.05719	null
2024-04-08	Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding	Ahmad Idrissi-Yaghir et.al.	2404.05694	null
2024-04-08	Evaluating Mathematical Reasoning Beyond Accuracy	Shijie Xia et.al.	2404.05692	link
2024-04-08	Retrieval-Augmented Open-Vocabulary Object Detection	Jooyeon Kim et.al.	2404.05687	link
2024-04-08	MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation	Kunpeng Song et.al.	2404.05674	null
2024-04-08	CoReS: Orchestrating the Dance of Reasoning and Segmentation	Xiaoyi Bao et.al.	2404.05673	null
2024-04-08	Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data	Haitham Hammami et.al.	2404.05632	link
2024-04-08	LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking	Faren Yan et.al.	2404.05624	null
2024-04-08	MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering	Iñigo Alonso et.al.	2404.05590	null
2024-04-05	Physical Property Understanding from Language-Embedded Feature Fields	Albert J. Zhai et.al.	2404.04242	null
2024-04-05	Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents	Harsh Kohli et.al.	2404.04237	null
2024-04-05	Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation	Tianqi Zhong et.al.	2404.04232	link
2024-04-05	Social Skill Training with Large Language Models	Diyi Yang et.al.	2404.04204	null
2024-04-05	Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model	Xinrun Du et.al.	2404.04167	null
2024-04-05	Large language models as oracles for instantiating ontologies with domain-specific knowledge	Giovanni Ciatto et.al.	2404.04108	link
2024-04-05	Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo	Barkavi Sundararajan et.al.	2404.04103	link
2024-04-05	Robust Preference Optimization with Provable Noise Tolerance for LLMs	Xize Liang et.al.	2404.04102	null
2024-04-05	Assessing the quality of information extraction	Filip Seitl et.al.	2404.04068	null
2024-04-05	CLUE: A Clinical Language Understanding Evaluation for LLMs	Amin Dada et.al.	2404.04067	null
2024-04-04	CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching	Dongzhi Jiang et.al.	2404.03653	link
2024-04-04	AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent	Hanyu Lai et.al.	2404.03648	link
2024-04-04	Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra	Darioush Kevian et.al.	2404.03647	null
2024-04-04	Training LLMs over Neurally Compressed Text	Brian Lester et.al.	2404.03626	null
2024-04-04	Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph	Marco Bronzini et.al.	2404.03623	null
2024-04-04	Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models	Wenshan Wu et.al.	2404.03622	null
2024-04-04	DeViDe: Faceted medical knowledge for improved medical vision-language pre-training	Haozhe Luo et.al.	2404.03618	null
2024-04-04	Sailor: Open Language Models for South-East Asia	Longxu Dou et.al.	2404.03608	link
2024-04-04	Evaluating LLMs at Detecting Errors in LLM Responses	Ryo Kamoi et.al.	2404.03602	link
2024-04-04	Intent Detection and Entity Extraction from BioMedical Literature	Ankan Mullick et.al.	2404.03598	link
2024-04-03	ALOHa: A New Measure for Hallucination in Captioning Models	Suzanne Petryk et.al.	2404.02904	null
2024-04-03	MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment	Duygu Ceylan et.al.	2404.02899	null
2024-04-03	ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline	Yifan Xu et.al.	2404.02893	null
2024-04-03	Integrating Explanations in Learning LTL Specifications from Demonstrations	Ashutosh Gupta et.al.	2404.02872	null
2024-04-03	Toward Inference-optimal Mixture-of-Expert Large Language Models	Longfei Yun et.al.	2404.02852	null
2024-04-03	I-Design: Personalized LLM Interior Designer	Ata Çelen et.al.	2404.02838	null
2024-04-03	Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models	Wanyun Cui et.al.	2404.02837	null
2024-04-03	Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison	Maxime Bouthors et.al.	2404.02835	null
2024-04-03	Empowering Biomedical Discovery with AI Agents	Shanghua Gao et.al.	2404.02831	null
2024-04-03	BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models	Qijun Luo et.al.	2404.02827	link
2024-04-02	Topic-based Watermarks for LLM-Generated Text	Alexander Nemecek et.al.	2404.02138	null
2024-04-02	Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models	Wanyong Feng et.al.	2404.02124	null
2024-04-02	GINopic: Topic Modeling with Graph Isomorphism Network	Suman Adhya et.al.	2404.02115	link
2024-04-02	CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems	Sara Rosenthal et.al.	2404.02103	link
2024-04-02	Advancing LLM Reasoning Generalists with Preference Trees	Lifan Yuan et.al.	2404.02078	link
2024-04-02	Digital Forgetting in Large Language Models: A Survey of Unlearning Methods	Alberto Blanco-Justicia et.al.	2404.02062	null
2024-04-02	Long-context LLMs Struggle with Long In-context Learning	Tianle Li et.al.	2404.02060	link
2024-04-02	Deconstructing In-Context Learning: Understanding Prompts via Corruption	Namrata Shivagunde et.al.	2404.02054	link
2024-04-02	BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights	Enmin Zhu et.al.	2404.02053
