Table of Contents
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark | Haoxing Chen et.al. | 2405.19707 | link |
2024-05-30 | A Novel Approach for Automated Design Information Mining from Issue Logs | Jiuang Zhao et.al. | 2405.19623 | null |
2024-05-29 | I Bet You Did Not Mean That: Testing Semantic Importance via Betting | Jacopo Teneggi et.al. | 2405.19146 | null |
2024-05-29 | Verifiably Robust Conformal Prediction | Linus Jeary et.al. | 2405.18942 | null |
2024-05-29 | Leveraging Many-To-Many Relationships for Defending Against Visual-Language Adversarial Attacks | Futa Waseda et.al. | 2405.18770 | null |
2024-05-29 | GIST: Greedy Independent Set Thresholding for Diverse Data Summarization | Matthew Fahrbach et.al. | 2405.18754 | null |
2024-05-29 | LLM-based Hierarchical Concept Decomposition for Interpretable Fine-Grained Image Classification | Renyi Qu et.al. | 2405.18672 | null |
2024-05-28 | Its Not a Modality Gap: Characterizing and Addressing the Contrastive Gap | Abrar Fahim et.al. | 2405.18570 | null |
2024-05-28 | Why are Visually-Grounded Language Models Bad at Image Classification? | Yuhui Zhang et.al. | 2405.18415 | link |
2024-05-28 | MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution | Wenzhuo Liu et.al. | 2405.18240 | null |
2024-05-28 | Confidence-aware multi-modality learning for eye disease screening | Ke Zou et.al. | 2405.18167 | link |
2024-05-28 | 4-bit Shampoo for Memory-Efficient Network Training | Sike Wang et.al. | 2405.18144 | null |
2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | null |
2024-05-27 | WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average | Louis Fournier et.al. | 2405.17517 | null |
2024-05-27 | Model-Agnostic Zeroth-Order Policy Optimization for Meta-Learning of Ergodic Linear Quadratic Regulators | Yunian Pan et.al. | 2405.17370 | null |
2024-05-27 | On the Noise Robustness of In-Context Learning for Text Generation | Hongfu Gao et.al. | 2405.17264 | null |
2024-05-27 | Superpixelwise Low-rank Approximation based Partial Label Learning for Hyperspectral Image Classification | Shujun Yang et.al. | 2405.17110 | link |
2024-05-26 | Demystify Mamba in Vision: A Linear Attention Perspective | Dongchen Han et.al. | 2405.16605 | null |
2024-05-26 | AdaFisher: Adaptive Second Order Optimization via Fisher Information | Damien Martins Gomes et.al. | 2405.16397 | null |
2024-05-25 | ModelLock: Locking Your Model With a Spell | Yifeng Gao et.al. | 2405.16285 | null |
2024-05-25 | Accelerating Transformers with Spectrum-Preserving Token Merging | Hoai-Chau Tran et.al. | 2405.16148 | null |
2024-05-25 | Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack | Mingli Zhu et.al. | 2405.16134 | null |
2024-05-24 | Grounding Stylistic Domain Generalization with Quantitative Domain Shift Measures and Synthetic Scene Images | Yiran Luo et.al. | 2405.15961 | null |
2024-05-24 | A Neurosymbolic Framework for Bias Correction in CNNs | Parth Padalkar et.al. | 2405.15886 | null |
2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | null |
2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
2024-05-24 | Exposing Image Classifier Shortcuts with Counterfactual Frequency (CoF) Tables | James Hinns et.al. | 2405.15661 | null |
2024-05-24 | Harnessing Increased Client Participation with Cohort-Parallel Federated Learning | Akash Dhasade et.al. | 2405.15644 | null |
2024-05-24 | Transformer-based Federated Learning for Multi-Label Remote Sensing Image Classification | Barış Büyüktaş et.al. | 2405.15405 | null |
2024-05-24 | CLIP model is an Efficient Online Lifelong Learner | Leyuan Wang et.al. | 2405.15155 | null |
2024-05-24 | OptLLM: Optimal Assignment of Queries to Large Language Models | Yueyue Liu et.al. | 2405.15130 | null |
2024-05-23 | A Lost Opportunity for Vision-Language Models: A Comparative Study of Online Test-time Adaptation for Vision-Language Models | Mario Döbler et.al. | 2405.14977 | link |
2024-05-23 | Domain Wall Magnetic Tunnel Junction Reliable Integrate and Fire Neuron | Can Cui1 et.al. | 2405.14851 | null |
2024-05-23 | Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property | Yuya Yoshikawa et.al. | 2405.14522 | null |
2024-05-23 | SIAVC: Semi-Supervised Framework for Industrial Accident Video Classification | Zuoyong Li et.al. | 2405.14506 | null |
2024-05-23 | Scalable Visual State Space Model with Fractal Scanning | Lv Tang et.al. | 2405.14480 | null |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
2024-05-23 | Boosting Robustness by Clipping Gradients in Distributed Learning | Youssef Allouah et.al. | 2405.14432 | null |
2024-05-23 | Advancing Spiking Neural Networks for Sequential Modeling with Central Pattern Generators | Changze Lv et.al. | 2405.14362 | null |
2024-05-23 | Simple Hamiltonian dynamics is a powerful quantum processing resource | Akitada Sakurai et.al. | 2405.14245 | null |
2024-05-23 | ChronosLex: Time-aware Incremental Training for Temporal Generalization of Legal Classification Tasks | T. Y. S. S Santosh et.al. | 2405.14211 | null |
2024-05-22 | Just rotate it! Uncertainty estimation in closed-source models via multiple queries | Konstantinos Pitas et.al. | 2405.13864 | null |
2024-05-21 | Decentralized Federated Learning Over Imperfect Communication Channels | Weicai Li et.al. | 2405.12894 | null |
2024-05-21 | Multimodal Adaptive Inference for Document Image Classification with Anytime Early Exiting | Omar Hamed et.al. | 2405.12705 | null |
2024-05-21 | Exploration of Masked and Causal Language Modelling for Text Generation | Nicolo Micheletti et.al. | 2405.12630 | null |
2024-05-21 | 3DSS-Mamba: 3D-Spectral-Spatial Mamba for Hyperspectral Image Classification | Yan He et.al. | 2405.12487 | null |
2024-05-20 | Alzheimer's Magnetic Resonance Imaging Classification Using Deep and Meta-Learning Models | Nida Nasir et.al. | 2405.12126 | null |
2024-05-20 | Mamba-in-Mamba: Centralized Mamba-Cross-Scan in Tokenized Mamba Model for Hyperspectral Image Classification | Weilian Zhou et.al. | 2405.12003 | link |
2024-05-20 | A Constraint-Enforcing Reward for Adversarial Attacks on Text Classifiers | Tom Roth et.al. | 2405.11904 | null |
2024-05-21 | A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus | Eduard Poesina et.al. | 2405.11877 | link |
2024-05-20 | SSAMBA: Self-Supervised Audio Representation Learning with Mamba State Space Model | Siavash Shams et.al. | 2405.11831 | link |
2024-05-20 | Exploring Ordinality in Text Classification: A Comparative Study of Explicit and Implicit Techniques | Siva Rajesh Kasa et.al. | 2405.11775 | null |
2024-05-19 | SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization | Jialong Guo et.al. | 2405.11582 | link |
2024-05-19 | Reproducibility Study of CDUL: CLIP-Driven Unsupervised Learning for Multi-Label Image Classification | Manan Shah et.al. | 2405.11574 | link |
2024-05-19 | An Invisible Backdoor Attack Based On Semantic Feature | Yangming Chen et.al. | 2405.11551 | null |
2024-05-19 | Verification technology for finger vein biometric | George Kumi Kyeremeh et.al. | 2405.11540 | null |
2024-05-17 | Reduced storage direct tensor ring decomposition for convolutional neural networks compression | Mateusz Gabor et.al. | 2405.10802 | link |
2024-05-17 | Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation Dataset | Jie Zhu et.al. | 2405.10542 | link |
2024-05-17 | Smart Expert System: Large Language Models as Text Classifiers | Zhiqiang Wang et.al. | 2405.10523 | link |
2024-05-16 | Data-Efficient Low-Complexity Acoustic Scene Classification in the DCASE 2024 Challenge | Florian Schmid et.al. | 2405.10018 | null |
2024-05-16 | ROCOv2: Radiology Objects in COntext Version 2, an Updated Multimodal Image Dataset | Johannes Rückert et.al. | 2405.10004 | link |
2024-05-15 | Improving Label Error Detection and Elimination with Uncertainty Quantification | Johannes Jakubik et.al. | 2405.09602 | null |
2024-05-15 | Tackling Distribution Shifts in Task-Oriented Communication with Information Bottleneck | Hongru Li et.al. | 2405.09514 | null |
2024-05-15 | Feature-based Federated Transfer Learning: Communication Efficiency, Robustness and Privacy | Feng Wang et.al. | 2405.09014 | link |
2024-05-14 | The Pitfalls and Promise of Conformal Inference Under Adversarial Attacks | Ziquan Liu et.al. | 2405.08886 | link |
2024-05-14 | Harnessing the power of longitudinal medical imaging for eye disease prognosis using Transformer-based sequence modeling | Gregory Holste et.al. | 2405.08780 | null |
2024-05-14 | FolkTalent: Enhancing Classification and Tagging of Indian Folk Paintings | Nancy Hada et.al. | 2405.08776 | null |
2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695 | null |
2024-05-14 | Achieving Fairness Through Channel Pruning for Dermatological Disease Diagnosis | Qingpeng Kong et.al. | 2405.08681 | link |
2024-05-14 | Investigating Design Choices in Joint-Embedding Predictive Architectures for General Audio Representation Learning | Alain Riou et.al. | 2405.08679 | null |
2024-05-14 | Dual-Branch Network for Portrait Image Quality Assessment | Wei Sun et.al. | 2405.08555 | null |
2024-05-13 | Who's in and who's out? A case study of multimodal CLIP-filtering in DataComp | Rachel Hong et.al. | 2405.08209 | link |
2024-05-14 | MambaOut: Do We Really Need Mamba for Vision? | Weihao Yu et.al. | 2405.07992 | link |
2024-05-13 | Constrained Exploration via Reflected Replica Exchange Stochastic Gradient Langevin Dynamics | Haoyang Zheng et.al. | 2405.07839 | link |
2024-05-13 | Analysis of the rate of convergence of an over-parametrized convolutional neural network image classifier learned by gradient descent | Michael Kohler et.al. | 2405.07619 | null |
2024-05-13 | On-device Online Learning and Semantic Management of TinyML Systems | Haoyu Ren et.al. | 2405.07601 | null |
2024-05-13 | GLiRA: Black-Box Membership Inference Attack via Knowledge Distillation | Andrey V. Galichin et.al. | 2405.07562 | null |
2024-05-13 | Fine-tuning the SwissBERT Encoder Model for Embedding Sentences and Documents | Juri Grosjean et.al. | 2405.07513 | null |
2024-05-13 | MoVL:Exploring Fusion Strategies for the Domain-Adaptive Application of Pretrained Models in Medical Imaging Tasks | Haijiang Tian et.al. | 2405.07411 | null |
2024-05-12 | Explainable Convolutional Neural Networks for Retinal Fundus Classification and Cutting-Edge Segmentation Models for Retinal Blood Vessels from Fundus Images | Fatema Tuj Johora Faria et.al. | 2405.07338 | null |
2024-05-12 | Differentiable Model Scaling using Differentiable Topk | Kai Liu et.al. | 2405.07194 | null |
2024-05-11 | A framework of text-dependent speaker verification for chinese numerical string corpus | Litong Zheng et.al. | 2405.07029 | null |
2024-05-10 | Pseudo-Prompt Generating in Pre-trained Vision-Language Models for Multi-Label Medical Image Classification | Yaoqin Ye et.al. | 2405.06468 | null |
2024-05-10 | Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data | Rongyu Zhang et.al. | 2405.06413 | null |
2024-05-10 | SaudiBERT: A Large Language Model Pretrained on Saudi Dialect Corpora | Faisal Qarah et.al. | 2405.06239 | null |
2024-05-09 | Deep Multi-Task Learning for Malware Image Classification | Ahmed Bensaoud et.al. | 2405.05906 | null |
2024-05-09 | Enhancing Suicide Risk Detection on Social Media through Semi-Supervised Deep Label Smoothing | Matthew Squires et.al. | 2405.05795 | null |
2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
2024-05-09 | How Quality Affects Deep Neural Networks in Fine-Grained Image Classification | Joseph Smith et.al. | 2405.05742 | null |
2024-05-09 | End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base | Shuling Li et.al. | 2405.05738 | null |
2024-05-09 | Using Machine Translation to Augment Multilingual Classification | Adam King et.al. | 2405.05478 | null |
2024-05-08 | AFEN: Respiratory Disease Classification using Ensemble Learning | Rahul Nadkarni et.al. | 2405.05467 | null |
2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | link |
2024-05-08 | Explanation as a Watermark: Towards Harmless and Multi-bit Model Ownership Verification via Watermarking Feature Attribution | Shuo Shao et.al. | 2405.04825 | null |
2024-05-07 | Exploring Explainable AI Techniques for Improved Interpretability in Lung and Colon Cancer Classification | Mukaffi Bin Moin et.al. | 2405.04610 | link |
2024-05-07 | Pragmatist Intelligence: Where the Principle of Usefulness Can Take ANNs | Antonio Bikić et.al. | 2405.04386 | null |
2024-05-07 | Semi-Supervised Disease Classification based on Limited Medical Image Data | Yan Zhang et.al. | 2405.04295 | null |
2024-05-07 | DCNN: Dual Cross-current Neural Networks Realized Using An Interactive Deep Learning Discriminator for Fine-grained Objects | Da Fu et.al. | 2405.04093 | null |
2024-05-07 | Feature Map Convergence Evaluation for Functional Module | Ludan Zhang et.al. | 2405.04041 | null |
2024-05-07 | VMambaCC: A Visual State Space Model for Crowd Counting | Hao-Yuan Ma et.al. | 2405.03978 | null |
2024-05-06 | On Adversarial Examples for Text Classification by Perturbing Latent Representations | Korn Sooksatra et.al. | 2405.03789 | null |
2024-05-06 | CICA: Content-Injected Contrastive Alignment for Zero-Shot Document Image Classification | Sankalp Sinha et.al. | 2405.03660 | null |
2024-05-06 | Deep Space Separable Distillation for Lightweight Acoustic Scene Classification | ShuQi Ye et.al. | 2405.03567 | null |
2024-05-06 | Liberating Seen Classes: Boosting Few-Shot and Zero-Shot Text Classification via Anchor Generation and Classification Reframing | Han Liu et.al. | 2405.03565 | null |
2024-05-06 | A Lightweight Neural Architecture Search Model for Medical Image Classification | Lunchen Xie et.al. | 2405.03462 | null |
2024-05-06 | Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification | Matteo Bianchi et.al. | 2405.03301 | null |
2024-05-06 | TED: Accelerate Model Training by Internal Generalization | Jinying Xiao et.al. | 2405.03228 | null |
2024-05-06 | Advancing Multimodal Medical Capabilities of Gemini | Lin Yang et.al. | 2405.03162 | null |
2024-05-05 | A scoping review of using Large Language Models (LLMs) to investigate Electronic Health Records (EHRs) | Lingyao Li et.al. | 2405.03066 | null |
2024-05-05 | Parameter-Efficient Fine-Tuning with Discrete Fourier Transform | Ziqi Gao et.al. | 2405.03003 | null |
2024-05-04 | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Vishal Nedungadi et.al. | 2405.02771 | null |
2024-05-03 | Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification | Siqi Yin et.al. | 2405.02155 | null |
2024-05-03 | The Trade-off between Performance, Efficiency, and Fairness in Adapter Modules for Text Classification | Minh Duc Bui et.al. | 2405.02010 | null |
2024-05-03 | Which Identities Are Mobilized: Towards an automated detection of social group appeals in political texts | Felicia Riethmüller et.al. | 2405.01904 | null |
2024-05-02 | PVF (Parameter Vulnerability Factor): A Quantitative Metric Measuring AI Vulnerability and Resilience Against Parameter Corruptions | Xun Jiao et.al. | 2405.01741 | null |
2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
2024-05-02 | SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | Tushar Verma et.al. | 2405.01699 | null |
2024-05-02 | Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey | Rokas Gipiškis et.al. | 2405.01636 | null |
2024-05-02 | Improving Intervention Efficacy via Concept Realignment in Concept Bottleneck Models | Nishad Singhi et.al. | 2405.01531 | null |
2024-05-03 | Decoupling Feature Extraction and Classification Layers for Calibrated Neural Networks | Mikkel Jordahn et.al. | 2405.01196 | null |
2024-05-02 | Uncertainty-aware self-training with expectation maximization basis transformation | Zijia Wang et.al. | 2405.01175 | null |
2024-05-02 | Transformers Fusion across Disjoint Samples for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2405.01095 | null |
2024-05-02 | Efficient and Flexible Method for Reducing Moderate-size Deep Neural Networks with Condensation | Tianyi Chen et.al. | 2405.01041 | null |
2024-05-02 | Benchmarking Representations for Speech, Music, and Acoustic Events | Moreno La Quatra et.al. | 2405.00934 | link |
2024-05-01 | Digital-analog quantum convolutional neural networks for image classification | Anton Simen et.al. | 2405.00548 | null |
2024-05-03 | BiomedRAG: A Retrieval Augmented Large Language Model for Biomedicine | Mingchen Li et.al. | 2405.00465 | null |
2024-05-01 | Visual and audio scene classification for detecting discrepancies in video: a baseline method and experimental protocol | Konstantinos Apostolidis et.al. | 2405.00384 | null |
2024-05-01 | Data Augmentation Policy Search for Long-Term Forecasting | Liran Nochumsohn et.al. | 2405.00319 | null |
2024-04-30 | Let's Focus: Focused Backdoor Attack against Federated Transfer Learning | Marco Arazzi et.al. | 2404.19420 | null |
2024-04-30 | Large Language Model Informed Patent Image Retrieval | Hao-Cheng Lo et.al. | 2404.19360 | null |
2024-04-30 | Enhancing Intrinsic Features for Debiasing via Investigating Class-Discerning Common Attributes in Bias-Contrastive Pair | Jeonghoon Park et.al. | 2404.19250 | null |
2024-04-29 | Spectral-Spatial Mamba for Hyperspectral Image Classification | Lingbo Huang et.al. | 2404.18401 | null |
2024-04-28 | TextGram: Towards a better domain-adaptive pretraining | Sharayu Hiwarkhedkar et.al. | 2404.18228 | null |
2024-04-28 | L3Cube-MahaNews: News-based Short Text and Long Document Classification Datasets in Marathi | Saloni Mittal et.al. | 2404.18216 | link |
2024-04-28 | S |
Guanchun Wang et.al. | 2404.18213 | null |
2024-04-27 | Implicit Generative Prior for Bayesian Neural Networks | Yijia Liu et.al. | 2404.18008 | link |
2024-04-27 | Towards Privacy-Preserving Audio Classification Systems | Bhawana Chhaglani et.al. | 2404.18002 | null |
2024-04-27 | A Method of Moments Embedding Constraint and its Application to Semi-Supervised Learning | Michael Majurski et.al. | 2404.17978 | null |
2024-04-27 | Spatial, Temporal, and Geometric Fusion for Remote Sensing Images | Hessah Albanwan et.al. | 2404.17851 | null |
2024-04-27 | Leveraging Cross-Modal Neighbor Representation for Improved CLIP Classification | Chao Yi et.al. | 2404.17753 | link |
2024-04-26 | SPLICE -- Streamlining Digital Pathology Image Processing | Areej Alsaafin et.al. | 2404.17704 | null |
2024-04-26 | SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes | Georgia Baltsou et.al. | 2404.17255 | null |
2024-04-25 | Incorporating Lexical and Syntactic Knowledge for Unsupervised Cross-Lingual Transfer | Jianyu Zheng et.al. | 2404.16627 | link |
2024-04-25 | IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks | Zitong Huang et.al. | 2404.16331 | null |
2024-04-25 | Lacunarity Pooling Layers for Plant Image Classification using Texture Analysis | Akshatha Mohan et.al. | 2404.16268 | link |
2024-04-24 | MiMICRI: Towards Domain-centered Counterfactual Explanations of Cardiovascular Image Classification Models | Grace Guo et.al. | 2404.16174 | null |
2024-04-24 | MoDE: CLIP Data Experts via Clustering | Jiawei Ma et.al. | 2404.16030 | link |
2024-04-26 | A Survey on Visual Mamba | Hanwei Zhang et.al. | 2404.15956 | null |
2024-04-24 | Vision Transformer-based Adversarial Domain Adaptation | Yahan Li et.al. | 2404.15817 | link |
2024-04-24 | Rethinking Model Prototyping through the MedMNIST+ Dataset Collection | Sebastian Doerrich et.al. | 2404.15786 | null |
2024-04-24 | Efficient Multi-Model Fusion with Adversarial Complementary Representation Learning | Zuheng Kang et.al. | 2404.15704 | null |
2024-04-24 | Brain Storm Optimization Based Swarm Learning for Diabetic Retinopathy Image Classification | Liang Qu et.al. | 2404.15585 | null |
2024-04-23 | An MRP Formulation for Supervised Learning: Generalized Temporal Difference Learning Models | Yangchen Pan et.al. | 2404.15518 | null |
2024-04-23 | Deep multi-prototype capsule networks | Saeid Abbassi et.al. | 2404.15445 | null |
2024-04-23 | A review of deep learning-based information fusion techniques for multimodal medical image classification | Yihao Li et.al. | 2404.15022 | null |
2024-04-23 | Social Media and Artificial Intelligence for Sustainable Cities and Societies: A Water Quality Analysis Use-case | Muhammad Asif Auyb et.al. | 2404.14977 | null |
2024-04-23 | Traditional to Transformers: A Survey on Current Trends and Future Prospects for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14955 | link |
2024-04-23 | Pyramid Hierarchical Transformer for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14945 | link |
2024-04-23 | Importance of Disjoint Sampling in Conventional and Transformer Models for Hyperspectral Image Classification | Muhammad Ahmad et.al. | 2404.14944 | link |
2024-04-23 | CoProNN: Concept-based Prototypical Nearest Neighbors for Explaining Vision Models | Teodor Chiaburu et.al. | 2404.14830 | link |
2024-04-22 | WangLab at MEDIQA-M3G 2024: Multimodal Medical Answer Generation using Large Language Models | Ronald Xie et.al. | 2404.14567 | null |
2024-04-22 | CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective | Wencheng Zhu et.al. | 2404.14109 | null |
2024-04-21 | EncodeNet: A Framework for Boosting DNN Accuracy with Entropy-driven Generalized Converting Autoencoder | Hasanul Mahmud et.al. | 2404.13770 | null |
2024-04-21 | PEACH: Pretrained-embedding Explanation Across Contextual and Hierarchical Structure | Feiqi Cao et.al. | 2404.13645 | link |
2024-04-21 | I2CANSAY:Inter-Class Analogical Augmentation and Intra-Class Significance Analysis for Non-Exemplar Online Task-Free Continual Learning | Songlin Dong et.al. | 2404.13576 | null |
2024-04-21 | IMO: Greedy Layer-Wise Sparse Representation Learning for Out-of-Distribution Text Classification with Pre-trained Models | Tao Feng et.al. | 2404.13504 | null |
2024-04-20 | Nested-TNT: Hierarchical Vision Transformers with Multi-Scale Feature Processing | Yuang Liu et.al. | 2404.13434 | null |
2024-04-20 | Evaluating Subword Tokenization: Alien Subword Composition and OOV Generalization Challenge | Khuyagbaatar Batsuren et.al. | 2404.13292 | link |
2024-04-20 | 3D-Convolution Guided Spectral-Spatial Transformer for Hyperspectral Image Classification | Shyam Varahagiri et.al. | 2404.13252 | link |
2024-04-19 | On-board classification of underwater images using hybrid classical-quantum CNN based method | Sreeraj Rajan Warrier et.al. | 2404.13130 | null |
2024-04-19 | Next Generation Loss Function for Image Classification | Shakhnaz Akhmedova et.al. | 2404.12948 | null |
2024-04-19 | A Hybrid Generative and Discriminative PointNet on Unordered Point Sets | Yang Ye et.al. | 2404.12925 | null |
2024-04-19 | Transformer-Based Classification Outcome Prediction for Multimodal Stroke Treatment | Danqing Ma et.al. | 2404.12634 | null |
2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | null |
2024-04-18 | Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
2024-04-18 | Concept Induction using LLMs: a user experiment for assessment | Adrita Barua et.al. | 2404.11875 | null |
2024-04-17 | Pretraining Billion-scale Geospatial Foundational Models on Frontier | Aristeidis Tsaris et.al. | 2404.11706 | null |
2024-04-17 | AI-Enhanced Cognitive Behavioral Therapy: Deep Learning and Large Language Models for Extracting Cognitive Pathways from Social Media Texts | Meng Jiang et.al. | 2404.11449 | null |
2024-04-17 | Achieving Rotation Invariance in Convolution Operations: Shifting from Data-Driven to Mechanism-Assured | Hanlin Mo et.al. | 2404.11309 | null |
2024-04-17 | A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene | Wenbo Zhang et.al. | 2404.11249 | null |
2024-04-17 | A Novel ICD Coding Framework Based on Associated and Hierarchical Code Description Distillation | Bin Zhang et.al. | 2404.11132 | null |
2024-04-17 | Small Language Models are Good Too: An Empirical Study of Zero-Shot Classification | Pierre Lepagnol et.al. | 2404.11122 | null |
2024-04-18 | Supervised Contrastive Vision Transformer for Breast Histopathological Image Classification | Mohammad Shiri et.al. | 2404.11052 | null |
2024-04-17 | InfoMatch: Entropy Neural Estimation for Semi-Supervised Image Classification | Qi Han et.al. | 2404.11003 | link |
2024-04-16 | Incubating Text Classifiers Following User Instruction with Nothing but LLM | Letian Peng et.al. | 2404.10877 | null |
2024-04-16 | Vocabulary-free Image Classification and Semantic Segmentation | Alessandro Conti et.al. | 2404.10864 | link |
2024-04-16 | Assessing The Impact of CNN Auto Encoder-Based Image Denoising on Image Classification Tasks | Mohsen Hami et.al. | 2404.10664 | null |
2024-04-16 | Tree Bandits for Generative Bayes | Sean O'Hagan et.al. | 2404.10436 | null |
2024-04-16 | AudioProtoPNet: An interpretable deep learning model for bird sound classification | René Heinrich et.al. | 2404.10420 | null |
2024-04-16 | Lighter, Better, Faster Multi-Source Domain Adaptation with Gaussian Mixture Models and Optimal Transport | Eduardo Fernandes Montesuma et.al. | 2404.10261 | null |
2024-04-15 | Distributed Federated Learning-Based Deep Learning Model for Privacy MRI Brain Tumor Detection | Lisang Zhou et.al. | 2404.10026 | null |
2024-04-15 | Interaction as Explanation: A User Interaction-based Method for Explaining Image Classification Models | Hyeonggeun Yun et.al. | 2404.09828 | null |
2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
2024-04-15 | Pseudo-label Learning with Calibrated Confidence Using an Energy-based Model | Masahito Toba et.al. | 2404.09585 | null |
2024-04-14 | Breast Cancer Image Classification Method Based on Deep Transfer Learning | Weimin Wang et.al. | 2404.09226 | null |
2024-04-14 | Coreset Selection for Object Detection | Hojun Lee et.al. | 2404.09161 | null |
2024-04-13 | Exploring Explainability in Video Action Recognition | Avinab Saha et.al. | 2404.09067 | null |
2024-04-13 | Fast Fishing: Approximating BAIT for Efficient and Scalable Deep Active Image Classification | Denis Huseljic et.al. | 2404.08981 | link |
2024-04-13 | PM2: A New Prompting Multi-modal Model Paradigm for Few-shot Medical Image Classification | Zhenwei Wang et.al. | 2404.08915 | null |
2024-04-12 | VertAttack: Taking advantage of Text Classifiers' horizontal vision | Jonathan Rusert et.al. | 2404.08538 | null |
2024-04-12 | SpectralMamba: Efficient Mamba for Hyperspectral Image Classification | Jing Yao et.al. | 2404.08489 | null |
2024-04-12 | OTTER: Improving Zero-Shot Classification via Optimal Transport | Changho Shin et.al. | 2404.08461 | null |
2024-04-12 | A Survey of Neural Network Robustness Assessment in Image Recognition | Jie Wang et.al. | 2404.08285 | null |
2024-04-12 | Convolutional neural network classification of cancer cytopathology images: taking breast cancer as an example | MingXuan Xiao et.al. | 2404.08279 | null |
2024-04-11 | HGRN2: Gated Linear RNNs with State Expansion | Zhen Qin et.al. | 2404.07904 | link |
2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
2024-04-11 | Contrastive-Based Deep Embeddings for Label Noise-Resilient Histopathology Image Classification | Lucas Dedieu et.al. | 2404.07605 | link |
2024-04-11 | Learning to Classify New Foods Incrementally Via Compressed Exemplars | Justin Yang et.al. | 2404.07507 | null |
2024-04-11 | Interactive Prompt Debugging with Sequence Salience | Ian Tenney et.al. | 2404.07498 | null |
2024-04-11 | Privacy preserving layer partitioning for Deep Neural Network models | Kishore Rajasekar et.al. | 2404.07437 | null |
2024-04-11 | CopilotCAD: Empowering Radiologists with Report Completion Models and Quantitative Evidence from Medical Image Foundation Models | Sheng Wang et.al. | 2404.07424 | null |
2024-04-11 | Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling | Sourajit Saha et.al. | 2404.07410 | null |
2024-04-10 | Lost in Translation: Modern Neural Networks Still Struggle With Small Realistic Image Transformations | Ofir Shifman et.al. | 2404.07153 | null |
2024-04-10 | Learning of deep convolutional network image classifiers via stochastic gradient descent and over-parametrization | Michael Kohler et.al. | 2404.07128 | null |
2024-04-10 | Accelerating Cardiac MRI Reconstruction with CMRatt: An Attention-Driven Approach | Anam Hashmi et.al. | 2404.06941 | null |
2024-04-10 | Multi-Label Continual Learning for the Medical Domain: A Novel Benchmark | Marina Ceccon et.al. | 2404.06859 | null |
2024-04-10 | Neural Optimizer Equation, Decay Function, and Learning Rate Schedule Joint Evolution | Brandon Morgan et.al. | 2404.06679 | null |
2024-04-09 | Variational Stochastic Gradient Descent for Deep Neural Networks | Haotian Chen et.al. | 2404.06549 | link |
2024-04-09 | On adversarial training and the 1 Nearest Neighbor classifier | Amir Hagai et.al. | 2404.06313 | link |
2024-04-09 | Audio-Visual Generalized Zero-Shot Learning using Pre-Trained Large Multi-Modal Models | David Kurzendörfer et.al. | 2404.06309 | link |
2024-04-09 | Counterfactual Reasoning for Multi-Label Image Classification via Patching-Based Training | Ming-Kun Xie et.al. | 2404.06287 | null |
2024-04-09 | Quantum Circuit |
Yuka Hashimoto et.al. | 2404.06218 | null |
2024-04-09 | VI-OOD: A Unified Representation Learning Framework for Textual Out-of-distribution Detection | Li-Ming Zhan et.al. | 2404.06217 | link |
2024-04-09 | Symmetry-guided gradient descent for quantum neural networks | Kaiming Bian et.al. | 2404.06108 | null |
2024-04-10 | Using Few-Shot Learning to Classify Primary Lung Cancer and Other Malignancy with Lung Metastasis in Cytological Imaging via Endobronchial Ultrasound Procedures | Ching-Kai Lin et.al. | 2404.06080 | null |
2024-04-08 | Neural Cellular Automata for Lightweight, Robust and Explainable Classification of White Blood Cell Images | Michael Deutges et.al. | 2404.05584 | null |
2024-04-08 | On the Convergence of Continual Learning with Adaptive Methods | Seungyub Han et.al. | 2404.05555 | null |
2024-04-08 | Multi-Task Learning for Features Extraction in Financial Annual Reports | Syrielle Montariol et.al. | 2404.05281 | link |
2024-04-08 | Allowing humans to interactively guide machines where to look does not always improve a human-AI team's classification accuracy | Giang Nguyen et.al. | 2404.05238 | null |
2024-04-08 | iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection | Nan Zhou et.al. | 2404.05207 | null |
2024-04-08 | Semantic Stealth: Adversarial Text Attacks on NLP Using Several Methods | Roopkatha Dey et.al. | 2404.05159 | null |
2024-04-07 | PairAug: What Can Augmented Image-Text Pairs Do for Radiology? | Yutong Xie et.al. | 2404.04960 | link |
2024-04-07 | GvT: A Graph-based Vision Transformer with Talking-Heads Utilizing Sparsity, Trained from Scratch on Small Datasets | Dongjing Shan et.al. | 2404.04924 | null |
2024-04-06 | Focused Active Learning for Histopathological Image Classification | Arne Schmidt et.al. | 2404.04663 | null |
2024-04-06 | Trustless Audits without Revealing Data or Models | Suppakit Waiwitlikhit et.al. | 2404.04500 | null |
2024-04-05 | Evaluating Adversarial Robustness: A Comparison Of FGSM, Carlini-Wagner Attacks, And The Role of Distillation as Defense Mechanism | Trilokesh Ranjan Sarkar et.al. | 2404.04245 | null |
2024-04-05 | Noisy Label Processing for Classification: A Survey | Mengting Li et.al. | 2404.04159 | null |
2024-04-05 | Learning Correlation Structures for Vision Transformers | Manjin Kim et.al. | 2404.03924 | null |
2024-04-05 | LiDAR-Guided Cross-Attention Fusion for Hyperspectral Band Selection and Image Classification | Judy X Yang et.al. | 2404.03883 | null |
2024-04-04 | Dendrites endow artificial neural networks with accurate, robust and parameter-efficient learning | Spyridon Chavlis et.al. | 2404.03708 | null |
2024-04-05 | A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data | Iqra Bano et.al. | 2404.03493 | null |
2024-04-04 | Meta Invariance Defense Towards Generalizable Robustness to Unknown Adversarial Attacks | Lei Zhang et.al. | 2404.03340 | null |
2024-04-04 | Sparse Concept Bottleneck Models: Gumbel Tricks in Contrastive Learning | Andrei Semenov et.al. | 2404.03323 | link |
2024-04-04 | FACTUAL: A Novel Framework for Contrastive Learning Based Robust SAR Image Classification | Xu Wang et.al. | 2404.03225 | null |
2024-04-03 | Exploring the Trade-off Between Model Performance and Explanation Plausibility of Text Classifiers Using Human Rationales | Lucas E. Resck et.al. | 2404.03098 | link |
2024-04-03 | Guarantees of confidentiality via Hammersley-Chapman-Robbins bounds | Kamalika Chaudhuri et.al. | 2404.02866 | link |
2024-04-03 | FPT: Feature Prompt Tuning for Few-shot Readability Assessment | Ziyang Wang et.al. | 2404.02772 | link |
2024-04-03 | Adversarial Attacks and Dimensionality in Text Classifiers | Nandish Chattopadhyay et.al. | 2404.02660 | null |
2024-04-04 | Non-negative Subspace Feature Representation for Few-shot Learning in Medical Imaging | Keqiang Fan et.al. | 2404.02656 | null |
2024-04-03 | Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations | Emilio Villa-Cueva et.al. | 2404.02452 | link |
2024-04-03 | A Novel Approach to Breast Cancer Histopathological Image Classification Using Cross-Colour Space Feature Fusion and Quantum-Classical Stack Ensemble Method | Sambit Mallick et.al. | 2404.02447 | null |
2024-04-03 | Enhancing Low-Resource LLMs Classification with PEFT and Synthetic Data | Parth Patwa et.al. | 2404.02422 | null |
2024-04-02 | Smooth Deep Saliency | Rudolf Herdt et.al. | 2404.02282 | null |
2024-04-02 | Visual Concept Connectome (VCC): Open World Concept Discovery and their Interlayer Connections in Deep Models | Matthew Kowal et.al. | 2404.02233 | null |
2024-04-02 | ImageNot: A contrast with ImageNet preserves model rankings | Olawale Salaudeen et.al. | 2404.02112 | null |
2024-04-02 | Explainability in JupyterLab and Beyond: Interactive XAI Systems for Integrated and Collaborative Workflows | Grace Guo et.al. | 2404.02081 | null |
2024-04-02 | Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches | Daryna Dementieva et.al. | 2404.02043 | null |
2024-04-02 | CAM-Based Methods Can See through Walls | Magamed Taimeskhanov et.al. | 2404.01964 | link |
2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Jaeha Kim et.al. | 2404.01692 | null |
2024-04-02 | A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification | Quanwei Liu et.al. | 2404.01673 | null |
2024-04-01 | Can Biases in ImageNet Models Explain Generalization? | Paul Gavrikov et.al. | 2404.01509 | link |
2024-04-01 | Parallel Proportional Fusion of Spiking Quantum Neural Network for Optimizing Image Classification | Zuyu Xu et.al. | 2404.01359 | null |
2024-04-01 | Bridging Remote Sensors with Multisensor Geospatial Foundation Models | Boran Han et.al. | 2404.01260 | link |
2024-04-01 | Diagnosis of Skin Cancer Using VGG16 and VGG19 Based Transfer Learning Models | Amir Faghihi et.al. | 2404.01160 | null |
2024-03-29 | Learn "No" to Say "Yes" Better: Improving Vision-Language Models via Negations | Jaisidh Singh et.al. | 2403.20312 | link |
2024-03-29 | MCNet: A crowd denstity estimation network based on integrating multiscale attention module | Qiang Guo et.al. | 2403.20173 | null |
2024-03-29 | Segmentation, Classification and Interpretation of Breast Cancer Medical Images using Human-in-the-Loop Machine Learning | David Vázquez-Lema et.al. | 2403.20112 | null |
2024-03-29 | Adverb Is the Key: Simple Text Data Augmentation with Adverb Deletion | Juhwan Choi et.al. | 2403.20015 | null |
2024-03-29 | Diverse Feature Learning by Self-distillation and Reset | Sejik Park et.al. | 2403.19941 | null |
2024-03-29 | Heterogeneous Network Based Contrastive Learning Method for PolSAR Land Cover Classification | Jianfeng Cai et.al. | 2403.19902 | link |
2024-03-28 | X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization | Anna Kukleva et.al. | 2403.19811 | link |
2024-03-28 | RSMamba: Remote Sensing Image Classification with State Space Model | Keyan Chen et.al. | 2403.19654 | link |
2024-03-28 | Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model | Zhicai Wang et.al. | 2403.19600 | link |
2024-03-28 | The Bad Batches: Enhancing Self-Supervised Learning in Image Classification Through Representative Batch Curation | Ozgu Goksu et.al. | 2403.19579 | null |
2024-03-28 | Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach | Wei Dong et.al. | 2403.19067 | link |
2024-03-27 | Evaluating Large Language Models for Health-Related Text Classification Tasks with Public Social Media Data | Yuting Guo et.al. | 2403.19031 | null |
2024-03-27 | Robustness and Visual Explanation for Black Box Image, Video, and ECG Signal Classification with Reinforcement Learning | Soumyendu Sarkar et.al. | 2403.18985 | null |
2024-03-27 | The Impact of Uniform Inputs on Activation Sparsity and Energy-Latency Attacks in Computer Vision | Andreas Müller et.al. | 2403.18587 | link |
2024-03-27 | Uncertainty-Aware SAR ATR: Defending Against Adversarial Attacks via Bayesian Neural Networks | Tian Ye et.al. | 2403.18318 | null |
2024-03-27 | Multi-scale Unified Network for Image Classification | Wenzhuo Liu et.al. | 2403.18294 | null |
2024-03-26 | The Need for Speed: Pruning Transformers with One Recipe | Samir Khaki et.al. | 2403.17921 | link |
2024-03-26 | Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation | Carlos Gomes et.al. | 2403.17886 | null |
2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
2024-03-26 | Language Models for Text Classification: Is In-Context Learning Enough? | Aleksandra Edwards et.al. | 2403.17661 | null |
2024-03-26 | Boosting Few-Shot Learning with Disentangled Self-Supervised Learning and Meta-Learning for Medical Image Classification | Eva Pachetti et.al. | 2403.17530 | null |
2024-03-26 | HILL: Hierarchy-aware Information Lossless Contrastive Learning for Hierarchical Text Classification | He Zhu et.al. | 2403.17307 | link |
2024-03-25 | Histogram Layers for Neural Engineered Features | Joshua Peeples et.al. | 2403.17176 | link |
2024-03-25 | Task2Box: Box Embeddings for Modeling Asymmetric Task Relationships | Rangel Daroya et.al. | 2403.17173 | link |
2024-03-25 | CipherFormer: Efficient Transformer Private Inference with Low Round Complexity | Weize Wang et.al. | 2403.16860 | null |
2024-03-25 | Assessing the Performance of Deep Learning for Automated Gleason Grading in Prostate Cancer | Dominik Müller et.al. | 2403.16695 | null |
2024-03-25 | DeepGleason: a System for Automated Gleason Grading of Prostate Cancer using Deep Neural Networks | Dominik Müller et.al. | 2403.16678 | link |
2024-03-25 | LARA: Linguistic-Adaptive Retrieval-Augmented LLMs for Multi-Turn Intent Classification | Liu Junhua et.al. | 2403.16504 | null |
2024-03-24 | On machine learning analysis of atomic force microscopy images for image classification, sample surface recognition | Igor Sokolov et.al. | 2403.16230 | null |
2024-03-24 | Leveraging Deep Learning and Xception Architecture for High-Accuracy MRI Classification in Alzheimer Diagnosis | Shaojie Li et.al. | 2403.16212 | null |
2024-03-24 | Multi-Task Learning with Multi-Task Optimization | Lu Bai et.al. | 2403.16162 | null |
2024-03-24 | CBGT-Net: A Neuromimetic Architecture for Robust Classification of Streaming Data | Shreya Sharma et.al. | 2403.15974 | link |
2024-03-23 | A Deep Learning Architectures for Kidney Disease Classification | Muhammad Shoaib Farooq et.al. | 2403.15895 | null |
2024-03-23 | VLUE: A New Benchmark and Multi-task Knowledge Transfer Learning for Vietnamese Natural Language Understanding | Phong Nguyen-Thuan Do et.al. | 2403.15882 | null |
2024-03-23 | VLM-CPL: Consensus Pseudo Labels from Vision-Language Models for Human Annotation-Free Pathological Image Classification | Lanfeng Zhong et.al. | 2403.15836 | null |
2024-03-22 | Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion | Sofia Casarin et.al. | 2403.15194 | null |
2024-03-22 | Image Classification with Rotation-Invariant Variational Quantum Circuits | Paul San Sebastian et.al. | 2403.15031 | null |
2024-03-22 | Extracting Human Attention through Crowdsourced Patch Labeling | Minsuk Chang et.al. | 2403.15013 | null |
2024-03-22 | Clean-image Backdoor Attacks | Dazhong Rong et.al. | 2403.15010 | null |
2024-03-22 | ParFormer: Vision Transformer Baseline with Parallel Local Global Token Mixer and Convolution Attention Patch Embedding | Novendra Setyawan et.al. | 2403.15004 | null |
2024-03-22 | MasonTigers at SemEval-2024 Task 8: Performance Analysis of Transformer-based Models on Machine-Generated Text Detection | Sadiya Sayara Chowdhury Puspo et.al. | 2403.14989 | null |
2024-03-21 | Learning with SASQuaTCh: a Novel Variational Quantum Transformer Architecture with Kernel-Based Self-Attention | Ethan N. Evans et.al. | 2403.14753 | null |
2024-03-21 | Estimating Physical Information Consistency of Channel Data Augmentation for Remote Sensing Images | Tom Burgert et.al. | 2403.14547 | null |
2024-03-21 | Multi-Level Explanations for Generative Language Models | Lucas Monteiro Paes et.al. | 2403.14459 | null |
2024-03-21 | Tensor network compressibility of convolutional models | Sukhbinder Singh et.al. | 2403.14379 | null |
2024-03-21 | LayoutLLM: Large Language Model Instruction Tuning for Visually Rich Document Understanding | Masato Fujitake et.al. | 2403.14252 | null |
2024-03-21 | Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations | Xun Lin et.al. | 2403.14250 | null |
2024-03-21 | Improving Image Classification Accuracy through Complementary Intra-Class and Inter-Class Mixup | Ye Xu et.al. | 2403.14137 | link |
2024-03-20 | Bridge the Modality and Capacity Gaps in Vision-Language Model Selection | Chao Yi et.al. | 2403.13797 | null |
2024-03-20 | Leveraging feature communication in federated learning for remote sensing image classification | Anh-Kiet Duong et.al. | 2403.13575 | null |
2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
2024-03-20 | Building Optimal Neural Architectures using Interpretable Knowledge | Keith G. Mills et.al. | 2403.13293 | link |
2024-03-19 | LUWA Dataset: Learning Lithic Use-Wear Analysis on Microscopic Images | Jing Zhang et.al. | 2403.13171 | null |
2024-03-19 | Improved EATFormer: A Vision Transformer for Medical Image Classification | Yulong Shisu et.al. | 2403.13167 | null |
2024-03-19 | SIFT-DBT: Self-supervised Initialization and Fine-Tuning for Imbalanced Digital Breast Tomosynthesis Image Classification | Yuexi Du et.al. | 2403.13148 | link |
2024-03-19 | Using evolutionary computation to optimize task performance of unclocked, recurrent Boolean circuits in FPGAs | Raphael Norman-Tenazas et.al. | 2403.13105 | null |
2024-03-19 | Investigating Text Shortening Strategy in BERT: Truncation vs Summarization | Mirza Alim Mutasodirin et.al. | 2403.12799 | link |
2024-03-18 | Posterior Uncertainty Quantification in Neural Networks using Data Augmentation | Luhuan Wu et.al. | 2403.12729 | null |
2024-03-19 | SEVEN: Pruning Transformer Model by Reserving Sentinels | Jinying Xiao et.al. | 2403.12688 | link |
2024-03-19 | Simple Hack for Transformers against Heavy Long-Text Classification on a Time- and Memory-Limited GPU Service | Mirza Alim Mutasodirin et.al. | 2403.12563 | null |
2024-03-19 | Prompt-Guided Adaptive Model Transformation for Whole Slide Image Classification | Yi Lin et.al. | 2403.12537 | null |
2024-03-19 | CrossTune: Black-Box Few-Shot Classification with Label Enhancement | Danqing Luo et.al. | 2403.12468 | null |
2024-03-18 | Generalizing deep learning models for medical image classification | Matta Sarah et.al. | 2403.12167 | null |
2024-03-19 | Leveraging Spatial and Semantic Feature Extraction for Skin Cancer Diagnosis with Capsule Networks and Graph Neural Networks | K. P. Santoso et.al. | 2403.12009 | null |
2024-03-18 | High-energy physics image classification: A Survey of Jet Applications | Hamza Kheddar et.al. | 2403.11934 | null |
2024-03-18 | Better (pseudo-)labels for semi-supervised instance segmentation | François Porcher et.al. | 2403.11675 | null |
2024-03-18 | Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2403.11530 | link |
2024-03-18 | Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting | Mingkui Tan et.al. | 2403.11491 | null |
2024-03-17 | Potential of Domain Adaptation in Machine Learning in Ecology and Hydrology to Improve Model Extrapolability | Haiyang Shi et.al. | 2403.11331 | null |
2024-03-17 | A Modified Word Saliency-Based Adversarial Attack on Text Classification Models | Hetvi Waghela et.al. | 2403.11297 | null |
2024-03-17 | Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation | Silvia Corbara et.al. | 2403.11265 | null |
2024-03-17 | Multiple Teachers-Meticulous Student: A Domain Adaptive Meta-Knowledge Distillation Model for Medical Image Classification | Shahabedin Nabavi et.al. | 2403.11226 | null |
2024-03-16 | Forward Learning of Graph Neural Networks | Namyong Park et.al. | 2403.11004 | null |
2024-03-16 | Understanding Robustness of Visual State Space Models for Image Classification | Chengbin Du et.al. | 2403.10935 | null |
2024-03-16 | Automatic location detection based on deep learning | Anjali Karangiya et.al. | 2403.10912 | null |
2024-03-14 | Transformers Get Stable: An End-to-End Signal Propagation Theory for Language Models | Akhil Kedia et.al. | 2403.09635 | link |
2024-03-14 | XCoOp: Explainable Prompt Learning for Computer-Aided Diagnosis via Concept-guided Context Optimization | Yequan Bie et.al. | 2403.09410 | null |
2024-03-14 | ConDiSR: Contrastive Disentanglement and Style Regularization for Single Domain Generalization | Aleksandr Matsun et.al. | 2403.09400 | null |
2024-03-14 | A Hierarchical Fused Quantum Fuzzy Neural Network for Image Classification | Sheng-Yao Wu et.al. | 2403.09318 | null |
2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Yiming Ma et.al. | 2403.09281 | null |
2024-03-14 | Are Vision Language Models Texture or Shape Biased and Can We Steer Them? | Paul Gavrikov et.al. | 2403.09193 | null |
2024-03-14 | Randomized Principal Component Analysis for Hyperspectral Image Classification | Mustafa Ustuner et.al. | 2403.09117 | null |
2024-03-14 | CardioCaps: Attention-based Capsule Network for Class-Imbalanced Echocardiogram Classification | Hyunkyung Han et.al. | 2403.09108 | link |
2024-03-14 | The First to Know: How Token Distributions Reveal Hidden Knowledge in Large Vision-Language Models? | Qinyu Zhao et.al. | 2403.09037 | link |
2024-03-13 | PathM3: A Multimodal Multi-Task Multiple Instance Learning Framework for Whole Slide Image Classification and Captioning | Qifeng Zhou et.al. | 2403.08967 | null |
2024-03-13 | DAM: Dynamic Adapter Merging for Continual Video QA Learning | Feng Cheng et.al. | 2403.08755 | link |
2024-03-13 | Leveraging Compressed Frame Sizes For Ultra-Fast Video Classification | Yuxing Han et.al. | 2403.08580 | null |
2024-03-13 | HOLMES: HOLonym-MEronym based Semantic inspection for Convolutional Image Classifiers | Francesco Dibitonto et.al. | 2403.08536 | link |
2024-03-13 | Pig aggression classification using CNN, Transformers and Recurrent Networks | Junior Silva Souza et.al. | 2403.08528 | null |
2024-03-13 | Reduced Jeffries-Matusita distance: A Novel Loss Function to Improve Generalization Performance of Deep Classification Models | Mohammad Lashkari et.al. | 2403.08408 | null |
2024-03-13 | Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification | Shuhan Li et.al. | 2403.08407 | null |
2024-03-13 | Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks | Khondoker Murad Hossain et.al. | 2403.08208 | null |
2024-03-13 | Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks | Fuzhi Wu et.al. | 2403.08157 | link |
2024-03-12 | Harnessing Artificial Intelligence to Combat Online Hate: Exploring the Challenges and Opportunities of Large Language Models in Hate Speech Detection | Tharindu Kumarage et.al. | 2403.08035 | null |
2024-03-13 | Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion | Dongyang Li et.al. | 2403.07721 | link |
2024-03-12 | FPT: Fine-grained Prompt Tuning for Parameter and Memory Efficient Fine Tuning in High-resolution Medical Image Classification | Yijin Huang et.al. | 2403.07576 | null |
2024-03-12 | Backdoor Attack with Mode Mixture Latent Modification | Hongwei Zhang et.al. | 2403.07463 | null |
2024-03-12 | In-context learning enables multimodal large language models to classify cancer pathology images | Dyke Ferber et.al. | 2403.07407 | null |
2024-03-12 | Premonition: Using Generative Models to Preempt Future Data Changes in Continual Learning | Mark D. McDonnell et.al. | 2403.07356 | null |
2024-03-12 | How does promoting the minority fraction affect generalization? A theoretical study of the one-hidden-layer neural network on group imbalance | Hongkang Li et.al. | 2403.07310 | null |
2024-03-12 | A Bayesian Approach to OOD Robustness in Image Classification | Prakhar Kaushik et.al. | 2403.07277 | null |
2024-03-11 | LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations | Mohammad Alkhalefi et.al. | 2403.06813 | null |
2024-03-11 | Dynamic Perturbation-Adaptive Adversarial Training on Medical Image Classification | Shuai Li et.al. | 2403.06798 | null |
2024-03-11 | Leveraging Internal Representations of Model for Magnetic Image Classification | Adarsh N L et.al. | 2403.06797 | null |
2024-03-11 | Shortcut Learning in Medical Image Segmentation | Manxi Lin et.al. | 2403.06748 | null |
2024-03-11 | Active Generation for Image Classification | Tao Huang et.al. | 2403.06517 | null |
2024-03-11 | Evolving Knowledge Distillation with Large Language Models and Active Learning | Chengyuan Liu et.al. | 2403.06414 | null |
2024-03-11 | 'One size doesn't fit all': Learning how many Examples to use for In-Context Learning for Improved Text Classification | Manish Chandra et.al. | 2403.06402 | null |
2024-03-10 | Probing Image Compression For Class-Incremental Learning | Justin Yang et.al. | 2403.06288 | null |
2024-03-10 | Bayesian Random Semantic Data Augmentation for Medical Image Classification | Yaoyao Zhu et.al. | 2403.06138 | link |
2024-03-10 | Universal Debiased Editing for Fair Medical Image Classification | Ruinan Jin et.al. | 2403.06104 | null |
2024-03-08 | Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets | Lorenzo Brigato et.al. | 2403.05532 | null |
2024-03-08 | Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation | Yu Han et.al. | 2403.05388 | null |
2024-03-08 | The Impact of Quantization on the Robustness of Transformer-based Text Classifiers | Seyed Parsa Neshaei et.al. | 2403.05365 | null |
2024-03-08 | Multiple Instance Learning with random sampling for Whole Slide Image Classification | H. Keshvarikhojasteh et.al. | 2403.05351 | null |
2024-03-08 | Learning Expressive And Generalizable Motion Features For Face Forgery Detection | Jingyi Zhang et.al. | 2403.05172 | null |
2024-03-08 | Defending Against Unforeseen Failure Modes with Latent Adversarial Training | Stephen Casper et.al. | 2403.05030 | link |
2024-03-07 | Fooling Neural Networks for Motion Forecasting via Adversarial Attacks | Edgar Medina et.al. | 2403.04954 | null |
2024-03-07 | T-TAME: Trainable Attention Mechanism for Explaining Convolutional Networks and Vision Transformers | Mariano V. Ntrougkas et.al. | 2403.04523 | null |
2024-03-07 | Source Matters: Source Dataset Impact on Model Robustness in Medical Imaging | Dovile Juodelyte et.al. | 2403.04484 | link |
2024-03-07 | Advancing Biomedical Text Mining with Community Challenges | Hui Zong et.al. | 2403.04261 | null |
2024-03-07 | Scalable On-Chip Optical Linear Processing Unit Using a Single Thin-Film Lithium Niobate Ring Modulator | Zhaoang Deng et.al. | 2403.04216 | null |
2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | Evelyn Mannix et.al. | 2403.04125 | null |
2024-03-07 | Privacy-preserving Fine-tuning of Large Language Models through Flatness | Tiejin Chen et.al. | 2403.04124 | null |
2024-03-06 | MedMamba: Vision Mamba for Medical Image Classification | Yubiao Yue et.al. | 2403.03849 | link |
2024-03-06 | On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder | Tingxu Han et.al. | 2403.03846 | link |
2024-03-06 | RADIA -- Radio Advertisement Detection with Intelligent Analytics | Jorge Álvarez et.al. | 2403.03538 | null |
2024-03-06 | Inverse-Free Fast Natural Gradient Descent Method for Deep Learning | Xinwei Ou et.al. | 2403.03473 | null |
2024-03-06 | Sparse Spiking Neural Network: Exploiting Heterogeneity in Timescales for Pruning Recurrent SNN | Biswadeep Chakraborty et.al. | 2403.03409 | null |
2024-03-05 | RulePrompt: Weakly Supervised Text Classification with Prompting PLMs and Self-Iterative Logical Rules | Miaomiao Li et.al. | 2403.02932 | link |
2024-03-05 | Demonstrating Mutual Reinforcement Effect through Information Flow | Chengguang Gan et.al. | 2403.02902 | null |
2024-03-05 | Quantum Mixed-State Self-Attention Network | Fu Chen et.al. | 2403.02871 | null |
2024-03-05 | SOFIM: Stochastic Optimization Using Regularized Fisher Information Matrix | Gayathri C et.al. | 2403.02833 | null |
2024-03-05 | SGD with Partial Hessian for Deep Neural Networks Optimization | Ying Sun et.al. | 2403.02681 | link |
2024-03-05 | G-EvoNAS: Evolutionary Neural Architecture Search Based on Network Growth | Juan Zou et.al. | 2403.02667 | null |
2024-03-05 | Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad | Sayantan Choudhury et.al. | 2403.02648 | link |
2024-03-05 | Modeling Collaborator: Enabling Subjective Vision Classification With Minimal Human Effort via LLM Tool-Use | Imad Eddine Toubal et.al. | 2403.02626 | null |
2024-03-04 | When do Convolutional Neural Networks Stop Learning? | Sahan Ahmad et.al. | 2403.02473 | link |
2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | Abdullah Nazhat Abdullah et.al. | 2403.02411 | link |
2024-03-02 | Can a Confident Prior Replace a Cold Posterior? | Martin Marek et.al. | 2403.01272 | link |
2024-03-02 | Leveraging Self-Supervised Learning for Scene Recognition in Child Sexual Abuse Imagery | Pedro H. V. Valois et.al. | 2403.01183 | null |
2024-03-02 | Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation | Lian Xu et.al. | 2403.01156 | null |
2024-03-02 | ELA: Efficient Local Attention for Deep Convolutional Neural Networks | Wei Xu et.al. | 2403.01123 | null |
2024-03-01 | Margin Discrepancy-based Adversarial Training for Multi-Domain Text Classification | Yuan Wu et.al. | 2403.00888 | null |
2024-03-01 | Text classification of column headers with a controlled vocabulary: leveraging LLMs for metadata enrichment | Margherita Martorana et.al. | 2403.00884 | null |
2024-03-01 | SURE: SUrvey REcipes for building reliable and robust deep networks | Yuting Li et.al. | 2403.00543 | link |
2024-03-01 | Invariant Test-Time Adaptation for Vision-Language Model Generalization | Huan Ma et.al. | 2403.00376 | null |
2024-02-29 | TELEClass: Taxonomy Enrichment and LLM-Enhanced Hierarchical Text Classification with Minimal Supervision | Yunyi Zhang et.al. | 2403.00165 | null |
2024-02-29 | Assessing Visually-Continuous Corruption Robustness of Neural Networks Relative to Human Performance | Huakun Shen et.al. | 2402.19401 | null |
2024-02-29 | Stitching Gaps: Fusing Situated Perceptual Knowledge with Vision Transformers for High-Level Image Classification | Delfina Sol Martinez Pandiani et.al. | 2402.19339 | null |
2024-02-29 | Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction | Hao Li et.al. | 2402.19326 | null |
2024-02-29 | Decompose-and-Compose: A Compositional Approach to Mitigating Spurious Correlation | Fahimeh Hosseini Noohdani et.al. | 2402.18919 | null |
2024-02-29 | Utilizing Local Hierarchy with Adversarial Training for Hierarchical Text Classification | Zihan Wang et.al. | 2402.18825 | link |
2024-02-28 | Comparing Importance Sampling Based Methods for Mitigating the Effect of Class Imbalance | Indu Panigrahi et.al. | 2402.18742 | link |
2024-02-28 | Deep Neural Network Models Trained With A Fixed Random Classifier Transfer Better Across Domains | Hafiz Tiomoko Ali et.al. | 2402.18614 | null |
2024-02-28 | Orchid: Flexible and Data-Dependent Convolution for Sequence Modeling | Mahdi Karami et.al. | 2402.18508 | null |
2024-02-28 | Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization | Deng Li et.al. | 2402.18447 | null |
2024-02-29 | A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation | Francesco Barbato et.al. | 2402.18402 | null |
2024-02-28 | A Multimodal Handover Failure Detection Dataset and Baselines | Santosh Thoduka et.al. | 2402.18319 | null |
2024-02-28 | Classes Are Not Equal: An Empirical Study on Image Recognition Fairness | Jiequan Cui et.al. | 2402.18133 | null |
2024-02-27 | Understanding Neural Network Binarization with Forward and Backward Proximal Quantizers | Yiwei Lu et.al. | 2402.17710 | null |
2024-02-27 | SDF2Net: Shallow to Deep Feature Fusion Network for PolSAR Image Classification | Mohammed Q. Alkhatib et.al. | 2402.17672 | link |
2024-02-27 | Predict the Next Word: | Evgenia Ilia et.al. | 2402.17527 | null |
2024-02-27 | Scaling Supervised Local Learning with Augmented Auxiliary Networks | Chenxiang Ma et.al. | 2402.17318 | link |
2024-02-26 | Offline Writer Identification Using Convolutional Neural Network Activation Features | Vincent Christlein et.al. | 2402.17029 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | Fangyi Chen et.al. | 2405.19854 | null |
2024-05-30 | Improving Object Detector Training on Synthetic Data by Starting With a Strong Baseline Methodology | Frank A. Ruis et.al. | 2405.19822 | null |
2024-05-30 | Towards Unified Multi-granularity Text Detection with Interactive Attention | Xingyu Wan et.al. | 2405.19765 | null |
2024-05-30 | Fully Test-Time Adaptation for Monocular 3D Object Detection | Hongbin Lin et.al. | 2405.19682 | null |
2024-05-30 | YotoR-You Only Transform One Representation | José Ignacio Díaz Villa et.al. | 2405.19629 | null |
2024-05-29 | Enabling Visual Recognition at Radio Frequency | Haowen Lai et.al. | 2405.19516 | null |
2024-05-29 | Model Agnostic Defense against Adversarial Patch Attacks on Object Detection in Unmanned Aerial Vehicles | Saurabh Pathak et.al. | 2405.19179 | null |
2024-05-29 | RGB-T Object Detection via Group Shuffled Multi-receptive Attention and Multi-modal Supervision | Jinzhong Wang et.al. | 2405.18955 | null |
2024-05-29 | SSGA-Net: Stepwise Spatial Global-local Aggregation Networks for for Autonomous Driving | Yiming Cui et.al. | 2405.18857 | null |
2024-05-29 | PillarHist: A Quantization-aware Pillar Feature Encoder based on Height-aware Histogram | Sifan Zhou et.al. | 2405.18734 | null |
2024-05-28 | A Review and Implementation of Object Detection Models and Optimizations for Real-time Medical Mask Detection during the COVID-19 Pandemic | Ioanna Gogou et.al. | 2405.18387 | link |
2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
2024-05-28 | Intent3D: 3D Object Detection in RGB-D Scans Based on Human Intention | Weitai Kang et.al. | 2405.18295 | null |
2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | null |
2024-05-28 | Transformer and Hybrid Deep Learning Based Models for Machine-Generated Text Detection | Teodor-George Marchitan et.al. | 2405.17964 | null |
2024-05-28 | Self-supervised Pre-training for Transferable Multi-modal Perception | Xiaohao Xu et.al. | 2405.17942 | null |
2024-05-28 | Boosting General Trimap-free Matting in the Real-World Image | Leo Shan Wenzhang Zhou Grace Zhao et.al. | 2405.17916 | null |
2024-05-28 | The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention | Xingyu Ding et.al. | 2405.17776 | null |
2024-05-27 | Understanding differences in applying DETR to natural and medical images | Yanqi Xu et.al. | 2405.17677 | null |
2024-05-27 | Hardness-Aware Scene Synthesis for Semi-Supervised 3D Object Detection | Shuai Zeng et.al. | 2405.17422 | link |
2024-05-27 | Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association | Tingwei Liu et.al. | 2405.17323 | null |
2024-05-27 | Enhanced Automotive Radar Collaborative Sensing By Exploiting Constructive Interference | Lifan Xu et.al. | 2405.17297 | null |
2024-05-27 | SCaRL- A Synthetic Multi-Modal Dataset for Autonomous Driving | Avinash Nittur Ramesh et.al. | 2405.17030 | null |
2024-05-27 | Collective Perception Datasets for Autonomous Driving: A Comprehensive Review | Sven Teufel et.al. | 2405.16973 | null |
2024-05-27 | OED: Towards One-stage End-to-End Dynamic Scene Graph Generation | Guan Wang et.al. | 2405.16925 | link |
2024-05-27 | ContrastAlign: Toward Robust BEV Feature Alignment via Contrastive Learning for Multi-Modal 3D Object Detection | Ziying Song et.al. | 2405.16873 | null |
2024-05-27 | A re-calibration method for object detection with multi-modal alignment bias in autonomous driving | Zhihang Song et.al. | 2405.16848 | null |
2024-05-26 | A Study on Unsupervised Anomaly Detection and Defect Localization using Generative Model in Ultrasonic Non-Destructive Testing | Yusaku Ando et.al. | 2405.16580 | null |
2024-05-26 | AI-Generated Text Detection and Classification Based on BERT Deep Learning Algorithm | Hao Wang et.al. | 2405.16422 | null |
2024-05-24 | UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes | Ted Lentsch et.al. | 2405.15688 | null |
2024-05-24 | Multimodal Object Detection via Probabilistic a priori Information Integration | Hafsa El Hafyani et.al. | 2405.15596 | null |
2024-05-24 | Scale-Invariant Feature Disentanglement via Adversarial Learning for UAV-based Object Detection | Fan Liu et.al. | 2405.15465 | null |
2024-05-24 | Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets | Hoàng-Ân Lê et.al. | 2405.15394 | null |
2024-05-24 | Towards Global Optimal Visual In-Context Learning Prompt Selection | Chengming Xu et.al. | 2405.15279 | null |
2024-05-24 | Unbiased Faster R-CNN for Single-source Domain Generalized Object Detection | Yajing Liu et.al. | 2405.15225 | null |
2024-05-24 | ODGEN: Domain-specific Object Detection Data Generation with Diffusion Models | Jingyuan Zhu et.al. | 2405.15199 | null |
2024-05-24 | MonoDETRNext: Next-generation Accurate and Efficient Monocular 3D Object Detection Method | Pan Liao et.al. | 2405.15176 | null |
2024-05-23 | Learning to Detect and Segment Mobile Objects from Unlabeled Videos | Yihong Sun et.al. | 2405.14841 | null |
2024-05-23 | Designing A Sustainable Marine Debris Clean-up Framework without Human Labels | Raymond Wang et.al. | 2405.14815 | null |
2024-05-23 | Drones Help Drones: A Collaborative Framework for Multi-Drone Object Trajectory Prediction and Beyond | Zhechao Wang et.al. | 2405.14674 | null |
2024-05-23 | Improving Single Domain-Generalized Object Detection: A Focus on Diversification and Alignment | Muhammad Sohail Danish et.al. | 2405.14497 | null |
2024-05-23 | YOLOv10: Real-Time End-to-End Object Detection | Ao Wang et.al. | 2405.14458 | link |
2024-05-23 | Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | Mohammed Baharoon et.al. | 2405.14239 | null |
2024-05-22 | Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation | Mykhailo Uss et.al. | 2405.14024 | null |
2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | null |
2024-05-22 | Class-Conditional self-reward mechanism for improved Text-to-Image models | Safouane El Ghazouali et.al. | 2405.13473 | link |
2024-05-22 | Adaptive Wireless Image Semantic Transmission and Over-The-Air Testing | Jiarun Ding et.al. | 2405.13403 | null |
2024-05-21 | BiomedParse: a biomedical foundation model for image parsing of everything everywhere all at once | Theodore Zhao et.al. | 2405.12971 | null |
2024-05-21 | AMFD: Distillation via Adaptive Multimodal Fusion for Multispectral Pedestrian Detection | Zizhao Chen et.al. | 2405.12944 | link |
2024-05-21 | Predicting the Influence of Adverse Weather on Pedestrian Detection with Automotive Radar and Lidar Sensors | Daniel Weihmayr et.al. | 2405.12736 | null |
2024-05-21 | Spotting AI's Touch: Identifying LLM-Paraphrased Spans in Text | Yafu Li et.al. | 2405.12689 | null |
2024-05-21 | Automating Attendance Management in Human Resources: A Design Science Approach Using Computer Vision and Facial Recognition | Bao-Thien Nguyen-Tat et.al. | 2405.12633 | null |
2024-05-21 | FFAM: Feature Factorization Activation Map for Explanation of 3D Detectors | Shuai Liu et.al. | 2405.12601 | link |
2024-05-21 | Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering | Hiba Maryam et.al. | 2405.12533 | null |
2024-05-21 | Active Object Detection with Knowledge Aggregation and Distillation from Large Models | Dejie Yang et.al. | 2405.12509 | null |
2024-05-21 | Mutual Information Analysis in Multimodal Learning Systems | Hadi Hadizadeh et.al. | 2405.12456 | null |
2024-05-20 | Multi-View Attentive Contextualization for Multi-View 3D Object Detection | Xianpeng Liu et.al. | 2405.12200 | null |
2024-05-20 | Bangladeshi Native Vehicle Detection in Wild | Bipin Saha et.al. | 2405.12150 | link |
2024-05-20 | Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments | Jooyong Park et.al. | 2405.11855 | null |
2024-05-20 | DATR: Unsupervised Domain Adaptive Detection Transformer with Dataset-Level Adaptation and Prototypical Alignment | Jianhong Han et.al. | 2405.11765 | link |
2024-05-20 | Versatile Teacher: A Class-aware Teacher-student Framework for Cross-domain Adaptation | Runou Yang et.al. | 2405.11754 | link |
2024-05-19 | FADet: A Multi-sensor 3D Object Detection Network based on Local Featured Attention | Ziang Guo et.al. | 2405.11682 | link |
2024-05-19 | SLAB: Efficient Transformers with Simplified Linear Attention and Progressive Re-parameterized Batch Normalization | Jialong Guo et.al. | 2405.11582 | link |
2024-05-19 | The First Swahili Language Scene Text Detection and Recognition Dataset | Fadila Wendigoundi Douamba et.al. | 2405.11437 | link |
2024-05-18 | InfRS: Incremental Few-Shot Object Detection in Remote Sensing Images | Wuzhou Li et.al. | 2405.11293 | null |
2024-05-18 | Visible and Clear: Finding Tiny Objects in Difference Map | Bing Cao et.al. | 2405.11276 | null |
2024-05-17 | A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision Model | Mingxiang Fu et.al. | 2405.10890 | null |
2024-05-17 | DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated Texts | Anastasia Voznyuk et.al. | 2405.10629 | link |
2024-05-17 | DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object Detection | Zhe Huang et.al. | 2405.10577 | null |
2024-05-16 | Drone-type-Set: Drone types detection benchmark for drone detection and tracking | Kholoud AlDosari et.al. | 2405.10398 | null |
2024-05-16 | Grounded 3D-LLM with Referent Tokens | Yilun Chen et.al. | 2405.10370 | null |
2024-05-16 | Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection | Tianhe Ren et.al. | 2405.10300 | link |
2024-05-16 | Towards Task-Compatible Compressible Representations | Anderson de Andrade et.al. | 2405.10244 | link |
2024-05-16 | SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network | Zhaoxu Li et.al. | 2405.10148 | null |
2024-05-16 | SHiNe: Semantic Hierarchy Nexus for Open-vocabulary Object Detection | Mingxuan Liu et.al. | 2405.10053 | null |
2024-05-16 | FPDIoU Loss: A Loss Function for Efficient Bounding Box Regression of Rotated Object Detection | Siliang Ma et.al. | 2405.09942 | null |
2024-05-16 | Infrared Adversarial Car Stickers | Xiaopei Zhu et.al. | 2405.09924 | null |
2024-05-16 | PillarNeXt: Improving the 3D detector by introducing Voxel2Pillar feature encoding and extracting multi-scale features | Xusheng Li et.al. | 2405.09828 | null |
2024-05-16 | Size-invariance Matters: Rethinking Metrics and Losses for Imbalanced Multi-object Salient Object Detection | Feiran Li et.al. | 2405.09782 | link |
2024-05-15 | Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation | Guo Yachan et.al. | 2405.09682 | null |
2024-05-15 | Dynamic Loss Decay based Robust Oriented Object Detection on Remote Sensing Images with Noisy Labels | Guozhang Liu et.al. | 2405.09024 | null |
2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911 | null |
2024-05-14 | Open-Vocabulary Object Detection via Neighboring Region Attention Alignment | Sunyuan Qiang et.al. | 2405.08593 | null |
2024-05-14 | Semantic Contextualization of Face Forgery: A New Definition, Dataset, and Detection Method | Mian Zou et.al. | 2405.08487 | null |
2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483 | link |
2024-05-14 | Multimodal Collaboration Networks for Geospatial Vehicle Detection in Dense, Occluded, and Large-Scale Events | Xin Wu et.al. | 2405.08251 | link |
2024-05-13 | RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors | Liam Dugan et.al. | 2405.07940 | null |
2024-05-13 | oTTC: Object Time-to-Contact for Motion Estimation in Autonomous Driving | Abdul Hannan Khan et.al. | 2405.07698 | null |
2024-05-13 | MonoMAE: Enhancing Monocular 3D Detection through Depth-Aware Masked Autoencoders | Xueying Jiang et.al. | 2405.07696 | null |
2024-05-13 | Quality-aware Selective Fusion Network for V-D-T Salient Object Detection | Liuxin Bao et.al. | 2405.07655 | link |
2024-05-13 | Fast Training Data Acquisition for Object Detection and Segmentation using Black Screen Luminance Keying | Thomas Pöllabauer et.al. | 2405.07653 | null |
2024-05-13 | Integrity Monitoring of 3D Object Detection in Automated Driving Systems using Raw Activation Patterns and Spatial Filtering | Hakan Yekta Yatbaz et.al. | 2405.07600 | null |
2024-05-13 | Environmental Matching Attack Against Unmanned Aerial Vehicles Object Detection | Dehong Kong et.al. | 2405.07595 | null |
2024-05-13 | Text Grouping Adapter: Adapting Pre-trained Text Detector for Layout Analysis | Tianci Bi et.al. | 2405.07481 | null |
2024-05-13 | Enhancing 3D Object Detection by Using Neural Network with Self-adaptive Thresholding | Houze Liu et.al. | 2405.07479 | null |
2024-05-12 | MAML MOT: Multiple Object Tracking based on Meta-Learning | Jiayi Chen et.al. | 2405.07272 | null |
2024-05-10 | How to Augment for Atmospheric Turbulence Effects on Thermal Adapted Object Detection Models? | Engin Uzun et.al. | 2405.06383 | null |
2024-05-10 | Precise Apple Detection and Localization in Orchards using YOLOv5 for Robotic Harvesting Systems | Jiang Ziyue et.al. | 2405.06260 | null |
2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
2024-05-09 | Depth Awakens: A Depth-perceptual Attention Fusion Network for RGB-D Camouflaged Object Detection | Xinran Liua et.al. | 2405.05614 | null |
2024-05-09 | The object detection model uses combined extraction with KNN and RF classification | Florentina Tatrin Kurniati et.al. | 2405.05551 | null |
2024-05-08 | Reviewing Intelligent Cinematography: AI research for camera-based video production | Adrian Azzarelli et.al. | 2405.05039 | null |
2024-05-07 | A Novel Wide-Area Multiobject Detection System with High-Probability Region Searching | Xianlei Long et.al. | 2405.04589 | null |
2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | null |
2024-05-07 | A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | Raiyan Rahman et.al. | 2405.04305 | null |
2024-05-07 | ViewFormer: Exploring Spatiotemporal Modeling for Multi-View 3D Occupancy Perception via View-Guided Transformers | Jinke Li et.al. | 2405.04299 | null |
2024-05-07 | Who Wrote This? The Key to Zero-Shot LLM-Generated Text Detection Is GECScore | Junchao Wu et.al. | 2405.04286 | null |
2024-05-07 | Deep Event-based Object Detection in Autonomous Driving: A Survey | Bingquan Zhou et.al. | 2405.03995 | null |
2024-05-06 | BadFusion: 2D-Oriented Backdoor Attacks against 3D Object Detection | Saket S. Chaturvedi et.al. | 2405.03884 | null |
2024-05-06 | RepVGG-GELAN: Enhanced GELAN with VGG-STYLE ConvNets for Brain Tumour Detection | Thennarasi Balakrishnan et.al. | 2405.03541 | link |
2024-05-06 | Low-light Object Detection | Pengpeng Li et.al. | 2405.03519 | null |
2024-05-06 | Salient Object Detection From Arbitrary Modalities | Nianchang Huang et.al. | 2405.03352 | null |
2024-05-06 | Modality Prompts for Arbitrary Modality Salient Object Detection | Nianchang Huang et.al. | 2405.03351 | null |
2024-05-06 | Vietnamese AI Generated Text Detection | Quang-Dan Tran et.al. | 2405.03206 | null |
2024-05-06 | PTQ4SAM: Post-Training Quantization for Segment Anything | Chengtao Lv et.al. | 2405.03144 | link |
2024-05-05 | Performance Evaluation of Real-Time Object Detection for Electric Scooters | Dong Chen et.al. | 2405.03039 | link |
2024-05-05 | SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection | Kassaw Abraham Mulat et.al. | 2405.02906 | null |
2024-05-07 | Adaptive Guidance Learning for Camouflaged Object Detection | Zhennan Chen et.al. | 2405.02824 | null |
2024-05-05 | PVTransformer: Point-to-Voxel Transformer for Scalable 3D Object Detection | Zhaoqi Leng et.al. | 2405.02811 | null |
2024-05-02 | Segmentation-Free Outcome Prediction in Head and Neck Cancer: Deep Learning-based Feature Extraction from Multi-Angle Maximum Intensity Projections (MA-MIPs) of PET Images | Amirhosein Toosi et.al. | 2405.01756 | null |
2024-05-02 | PointCompress3D -- A Point Cloud Compression Framework for Roadside LiDARs in Intelligent Transportation Systems | Walter Zimmer et.al. | 2405.01750 | null |
2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
2024-05-02 | SOAR: Advancements in Small Body Object Detection for Aerial Imagery Using State Space Models and Programmable Gradients | Tushar Verma et.al. | 2405.01699 | null |
2024-05-02 | Imagine the Unseen: Occluded Pedestrian Detection via Adversarial Feature Completion | Shanshan Zhang et.al. | 2405.01311 | null |
2024-05-02 | Overcoming LLM Challenges using RAG-Driven Precision in Coffee Leaf Disease Remediation | Dr. Selva Kumar S et.al. | 2405.01310 | null |
2024-05-02 | Towards Consistent Object Detection via LiDAR-Camera Synergy | Kai Luo et.al. | 2405.01258 | link |
2024-05-02 | Federated Learning with Heterogeneous Data Handling for Robust Vehicular Object Detection | Ahmad Khalil et.al. | 2405.01108 | null |
2024-05-01 | Grains of Saliency: Optimizing Saliency-based Training of Biometric Attack Detection Models | Colton R. Crum et.al. | 2405.00650 | null |
2024-05-01 | Object detection under the linear subspace model with application to cryo-EM images | Amitay Eldar et.al. | 2405.00364 | null |
2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
2024-04-30 | Quantifying Nematodes through Images: Datasets, Models, and Baselines of Deep Learning | Zhipeng Yuan et.al. | 2404.19748 | null |
2024-04-30 | Masked Multi-Query Slot Attention for Unsupervised Object Discovery | Rishav Pramanik et.al. | 2404.19654 | link |
2024-04-30 | Physical Backdoor: Towards Temperature-based Backdoor Attacks in the Physical World | Wen Yin et.al. | 2404.19417 | null |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
2024-04-30 | Pseudo Label Refinery for Unsupervised Domain Adaptation on Cross-dataset 3D Object Detection | Zhanwei Zhang et.al. | 2404.19384 | null |
2024-04-30 | Robust Pedestrian Detection via Constructing Versatile Pedestrian Knowledge Bank | Sungjune Park et.al. | 2404.19299 | null |
2024-04-29 | MiPa: Mixed Patch Infrared-Visible Modality Agnostic Object Detection | Heitor R. Medeiros et.al. | 2404.18849 | null |
2024-04-29 | Leveraging PointNet and PointNet++ for Lyft Point Cloud Classification Challenge | Rajat K. Doshi et.al. | 2404.18665 | null |
2024-04-29 | CoSense3D: an Agent-based Efficient Learning Framework for Collective Perception | Yunshuang Yuan et.al. | 2404.18617 | null |
2024-04-29 | Assessing Quality Metrics for Neural Reality Gap Input Mitigation in Autonomous Driving Testing | Stefano Carlo Lambertenghi et.al. | 2404.18577 | null |
2024-04-29 | Efficient Meta-Learning Enabled Lightweight Multiscale Few-Shot Object Detection in Remote Sensing Images | Wenbin Guan et.al. | 2404.18426 | null |
2024-04-29 | Multi-modal Perception Dataset of In-water Objects for Autonomous Surface Vehicles | Mingi Jeong et.al. | 2404.18411 | null |
2024-04-28 | FAD-SAR: A Novel Fishing Activity Detection System via Synthetic Aperture Radar Images Based on Deep Learning Method | Yanbing Bai et.al. | 2404.18245 | null |
2024-04-28 | RadSimReal: Bridging the Gap Between Synthetic and Real Data in Radar Object Detection With Simulation | Oded Bialer et.al. | 2404.18150 | null |
2024-04-27 | Reliable Student: Addressing Noise in Semi-Supervised 3D Object Detection | Farzad Nozarian et.al. | 2404.17910 | link |
2024-04-27 | A Hybrid Approach for Document Layout Analysis in Document images | Tahira Shehzadi et.al. | 2404.17888 | null |
2024-04-26 | Inhomogeneous illuminated image enhancement under extremely low visibility condition | Libang Chen et.al. | 2404.17503 | null |
2024-04-26 | Cost-Sensitive Uncertainty-Based Failure Recognition for Object Detection | Moussa Kassem Sbeyti et.al. | 2404.17427 | null |
2024-04-26 | Enhancing mmWave Radar Point Cloud via Visual-inertial Supervision | Cong Fan et.al. | 2404.17229 | null |
2024-04-26 | MorphText: Deep Morphology Regularized Arbitrary-shape Scene Text Detection | Chengpei Xu et.al. | 2404.17151 | null |
2024-04-25 | Generating Minimalist Adversarial Perturbations to Test Object-Detection Models: An Adaptive Multi-Metric Evolutionary Search Approach | Cristopher McIntyre-Garcia et.al. | 2404.17020 | link |
2024-04-25 | Constellation Dataset: Benchmarking High-Altitude Object Detection for an Urban Intersection | Mehmet Kerem Turkcan et.al. | 2404.16944 | link |
2024-04-25 | Self-Balanced R-CNN for Instance Segmentation | Leonardo Rossi et.al. | 2404.16633 | link |
2024-04-25 | Cross-Domain Spatial Matching for Camera and Radar Sensor Data Fusion in Autonomous Vehicle Perception System | Daniel Dworak et.al. | 2404.16548 | null |
2024-04-25 | Commonsense Prototype for Outdoor Unsupervised 3D Object Detection | Hai Wu et.al. | 2404.16493 | link |
2024-04-25 | IMWA: Iterative Model Weight Averaging Benefits Class-Imbalanced Learning Tasks | Zitong Huang et.al. | 2404.16331 | null |
2024-04-25 | CFMW: Cross-modality Fusion Mamba for Multispectral Object Detection under Adverse Weather Conditions | Haoyuan Li et.al. | 2404.16302 | link |
2024-04-24 | AutoGluon-Multimodal (AutoMM): Supercharging Multimodal AutoML with Foundation Models | Zhiqiang Tang et.al. | 2404.16233 | null |
2024-04-24 | Observational parameters of Blue Large-Amplitude Pulsators | P. Pietrukowicz et.al. | 2404.16089 | null |
2024-04-24 | A Survey on Visual Mamba | Hanwei Zhang et.al. | 2404.15956 | null |
2024-04-24 | Steal Now and Attack Later: Evaluating Robustness of Object Detection against Black-box Adversarial Attacks | Erh-Chung Chen et.al. | 2404.15881 | null |
2024-04-24 | Revisiting Out-of-Distribution Detection in LiDAR-based 3D Object Detection | Michael Kösel et.al. | 2404.15879 | link |
2024-04-23 | CFPFormer: Feature-pyramid like Transformer Decoder for Segmentation and Detection | Hongyi Cai et.al. | 2404.15451 | null |
2024-04-23 | ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | Weifeng Chen et.al. | 2404.15449 | null |
2024-04-23 | Source-free Domain Adaptation for Video Object Detection Under Adverse Image Conditions | Xingguang Zhang et.al. | 2404.15252 | null |
2024-04-23 | Efficient Transformer Encoders for Mask2Former-style models | Manyi Yao et.al. | 2404.15244 | null |
2024-04-23 | Gallbladder Cancer Detection in Ultrasound Images based on YOLO and Faster R-CNN | Sara Dadjouy et.al. | 2404.15129 | null |
2024-04-23 | External Prompt Features Enhanced Parameter-efficient Fine-tuning for Salient Object Detection | Wen Liang et.al. | 2404.15008 | null |
2024-04-23 | ContextualFusion: Context-Based Multi-Sensor Fusion for 3D Object Detection in Adverse Operating Conditions | Shounak Sural et.al. | 2404.14780 | null |
2024-04-23 | Unified Unsupervised Salient Object Detection via Knowledge Transfer | Yao Yuan et.al. | 2404.14759 | link |
2024-04-22 | SemEval-2024 Task 8: Multidomain, Multimodel and Multilingual Machine-Generated Text Detection | Yuxia Wang et.al. | 2404.14183 | null |
2024-04-22 | Text in the Dark: Extremely Low-Light Text Image Enhancement | Che-Tsung Lin et.al. | 2404.14135 | null |
2024-04-22 | CKD: Contrastive Knowledge Distillation from A Sample-wise Perspective | Wencheng Zhu et.al. | 2404.14109 | null |
2024-04-22 | Benchmarking Multi-Modal LLMs for Testing Visual Deep Learning Systems Through the Lens of Image Mutation | Liwen Wang et.al. | 2404.13945 | null |
2024-04-22 | NeRF-DetS: Enhancing Multi-View 3D Object Detection with Sampling-adaptive Network of Continuous NeRF-based Representation | Chi Huang et.al. | 2404.13921 | null |
2024-04-22 | TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Atom Scott et.al. | 2404.13868 | null |
2024-04-22 | Toward Robust LiDAR based 3D Object Detection via Density-Aware Adaptive Thresholding | Eunho Lee et.al. | 2404.13852 | null |
2024-04-21 | A Nasal Cytology Dataset for Object Detection and Deep Learning | Mauro Camporeale et.al. | 2404.13745 | null |
2024-04-23 | Clio: Real-time Task-Driven Open-Set 3D Scene Graphs | Dominic Maggio et.al. | 2404.13696 | null |
2024-04-20 | FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving | Ganesh Sistu et.al. | 2404.13443 | null |
2024-04-19 | A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics | David Rapado-Rincon et.al. | 2404.12963 | null |
2024-04-19 | Language-Driven Active Learning for Diverse Open-Set 3D Object Detection | Ross Greer et.al. | 2404.12856 | null |
2024-04-19 | ECOR: Explainable CLIP for Object Recognition | Ali Rasekh et.al. | 2404.12839 | null |
2024-04-19 | A Point-Based Approach to Efficient LiDAR Multi-Task Perception | Christopher Lang et.al. | 2404.12798 | null |
2024-04-19 | ELEV-VISION-SAM: Integrated Vision Language and Foundation Model for Automated Estimation of Building Lowest Floor Elevation | Yu-Hsuan Ho et.al. | 2404.12606 | null |
2024-04-18 | The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models | Cheng Shi et.al. | 2404.11957 | link |
2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903 | null |
2024-04-17 | TempBEV: Improving Learned BEV Encoders with Combined Image and BEV Space Temporal Aggregation | Thomas Monninger et.al. | 2404.11803 | null |
2024-04-17 | Multimodal 3D Object Detection on Unseen Domains | Deepti Hegde et.al. | 2404.11764 | null |
2024-04-17 | Equivariant Spatio-Temporal Self-Supervision for LiDAR Object Detection | Deepti Hegde et.al. | 2404.11737 | null |
2024-04-17 | Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems | Luca Bompani et.al. | 2404.11488 | link |
2024-04-17 | EcoMLS: A Self-Adaptation Approach for Architecting Green ML-Enabled Systems | Meghana Tedla et.al. | 2404.11411 | null |
2024-04-17 | Detector Collapse: Backdooring Object Detection to Catastrophic Overload or Blindness | Hangtao Zhang et.al. | 2404.11357 | null |
2024-04-17 | Simple In-place Data Augmentation for Surveillance Object Detection | Munkh-Erdene Otgonbold et.al. | 2404.11226 | null |
2024-04-17 | Feature Corrective Transfer Learning: End-to-End Solutions to Object Detection in Non-Ideal Visual Conditions | Chuheng Wei et.al. | 2404.11214 | null |
2024-04-17 | GhostNetV3: Exploring the Training Strategies for Compact Models | Zhenhua Liu et.al. | 2404.11202 | null |
2024-04-17 | How to deal with glare for improved perception of Autonomous Vehicles | Muhammad Z. Alam et.al. | 2404.10992 | null |
2024-04-17 | Leveraging 3D LiDAR Sensors to Enable Enhanced Urban Safety and Public Health: Pedestrian Monitoring and Abnormal Activity Detection | Nawfal Guefrachi et.al. | 2404.10978 | null |
2024-04-16 | OSR-ViT: A Simple and Modular Framework for Open-Set Object Detection and Discovery | Matthew Inkawhich et.al. | 2404.10865 | null |
2024-04-16 | Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark | Jiangning Zhang et.al. | 2404.10760 | null |
2024-04-16 | Watch Your Step: Optimal Retrieval for Continual Learning at Scale | Truman Hickok et.al. | 2404.10758 | null |
2024-04-16 | Efficient optimal dispersed Haar-like filters for face detection | Zeinab Sedaghatjoo et.al. | 2404.10476 | null |
2024-04-16 | Camera clustering for scalable stream-based active distillation | Dani Manjah et.al. | 2404.10411 | null |
2024-04-15 | Low-Light Image Enhancement Framework for Improved Object Detection in Fisheye Lens Datasets | Dai Quoc Tran et.al. | 2404.10078 | link |
2024-04-15 | Explainable Light-Weight Deep Learning Pipeline for Improved Drought Stres | Aswini Kumar Patra et.al. | 2404.10073 | null |
2024-04-15 | VFMM3D: Releasing the Potential of Image by Vision Foundation Model for Monocular 3D Object Detection | Bonan Ding et.al. | 2404.09431 | null |
2024-04-14 | TEXT2TASTE: A Versatile Egocentric Vision System for Intelligent Reading Assistance Using Large Language Model | Wiktor Mucha et.al. | 2404.09254 | null |
2024-04-14 | DetCLIPv3: Towards Versatile Generative Open-vocabulary Object Detection | Lewei Yao et.al. | 2404.09216 | null |
2024-04-14 | Coreset Selection for Object Detection | Hojun Lee et.al. | 2404.09161 | null |
2024-04-14 | Fusion-Mamba for Cross-modality Object Detection | Wenhao Dong et.al. | 2404.09146 | null |
2024-04-13 | The Snake's Beating Heart? A Millisecond Pulsar Binary in the Galactic Center Radio Filament G359.1 |
Marcus E. Lower et.al. | 2404.09098 | null |
2024-04-13 | BG-YOLO: A Bidirectional-Guided Method for Underwater Object Detection | Jian Zhang et.al. | 2404.08979 | null |
2024-04-13 | Shifting Spotlight for Co-supervision: A Simple yet Efficient Single-branch Network to See Through Camouflage | Yang Hu et.al. | 2404.08936 | null |
2024-04-12 | Training-free Boost for Open-Vocabulary Object Detection with Confidence Aggregation | Yanhao Zheng et.al. | 2404.08603 | link |
2024-04-12 | FashionFail: Addressing Failure Cases in Fashion Object Detection and Segmentation | Riza Velioglu et.al. | 2404.08582 | null |
2024-04-12 | Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning | Girmaw Abebe Tadesse et.al. | 2404.08544 | null |
2024-04-12 | MambaDFuse: A Mamba-based Dual-phase Model for Multi-modality Image Fusion | Zhe Li et.al. | 2404.08406 | null |
2024-04-12 | Overcoming Scene Context Constraints for Object Detection in wild using Defilters | Vamshi Krishna Kancharla et.al. | 2404.08293 | null |
2024-04-11 | ConsistencyDet: Robust Object Detector with Denoising Paradigm of Consistency Model | Lifan Jiang et.al. | 2404.07773 | null |
2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
2024-04-11 | Run-time Monitoring of 3D Object Detection in Automated Driving Systems Using Early Layer Neural Activation Patterns | Hakan Yekta Yatbaz et.al. | 2404.07685 | null |
2024-04-11 | Finding Dino: A plug-and-play framework for unsupervised detection of out-of-distribution objects using prototypes | Poulami Sinhamahapatra et.al. | 2404.07664 | null |
2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649 | null |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
2024-04-11 | SFSORT: Scene Features-based Simple Online Real-Time Tracker | M. M. Morsali et.al. | 2404.07553 | link |
2024-04-11 | The Sydney Radio Star Catalogue: properties of radio stars at megahertz to gigahertz frequencies | Laura N. Driessen et.al. | 2404.07418 | null |
2024-04-11 | Simplifying Two-Stage Detectors for On-Device Inference in Remote Sensing | Jaemin Kang et.al. | 2404.07405 | null |
2024-04-11 | A fine-tuning workflow for automatic first-break picking with deep learning | Amir Mardan et.al. | 2404.07400 | link |
2024-04-10 | Identification of Fine-grained Systematic Errors via Controlled Scene Generation | Valentyn Boreiko et.al. | 2404.07045 | null |
2024-04-10 | Accurate Tennis Court Line Detection on Amateur Recorded Matches | Sameer Agrawal et.al. | 2404.06977 | null |
2024-04-10 | SARA: Smart AI Reading Assistant for Reading Comprehension | Enkeleda Thaqi et.al. | 2404.06906 | null |
2024-04-10 | Sparse Points to Dense Clouds: Enhancing 3D Detection with Limited LiDAR Data | Aakash Kumar et.al. | 2404.06715 | null |
2024-04-10 | Scaling Multi-Camera 3D Object Detection through Weak-to-Strong Eliciting | Hao Lu et.al. | 2404.06700 | link |
2024-04-09 | Learning Embeddings with Centroid Triplet Loss for Object Identification in Robotic Grasping | Anas Gouda et.al. | 2404.06277 | null |
2024-04-09 | Label-Efficient 3D Object Detection For Road-Side Units | Minh-Quan Dao et.al. | 2404.06256 | null |
2024-04-09 | Automatic Defect Detection in Sewer Network Using Deep Learning Based Object Detector | Bach Ha et.al. | 2404.06219 | null |
2024-04-09 | YOLC: You Only Look Clusters for Tiny Object Detection in Aerial Images | Chenguang Liu et.al. | 2404.06180 | null |
2024-04-09 | Enhanced Radar Perception via Multi-Task Learning: Towards Refined Data for Sensor Fusion Applications | Huawei Sun et.al. | 2404.06165 | null |
2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
2024-04-08 | 3D-COCO: extension of MS-COCO dataset for image detection and 3D reconstruction modules | Maxence Bideaux et.al. | 2404.05641 | null |
2024-04-08 | PetKaz at SemEval-2024 Task 8: Can Linguistics Capture the Specifics of LLM-generated Text? | Kseniia Petukhova et.al. | 2404.05483 | null |
2024-04-08 | Detecting Every Object from Events | Haitian Zhang et.al. | 2404.05285 | link |
2024-04-08 | MOSE: Boosting Vision-based Roadside 3D Object Detection with Scene Cues | Xiahan Chen et.al. | 2404.05280 | null |
2024-04-08 | Rendering-Enhanced Automatic Image-to-Point Cloud Registration for Roadside Scenes | Yu Sheng et.al. | 2404.05164 | null |
2024-04-08 | Better Monocular 3D Detectors with LiDAR from the Past | Yurong You et.al. | 2404.05139 | link |
2024-04-07 | AirShot: Efficient Few-Shot Detection for Autonomous Exploration | Zihan Wang et.al. | 2404.05069 | link |
2024-04-07 | PlateSegFL: A Privacy-Preserving License Plate Detection Using Federated Segmentation Learning | Md. Shahriar Rahman Anuvab et.al. | 2404.05049 | null |
2024-04-07 | PathFinder: Attention-Driven Dynamic Non-Line-of-Sight Tracking with a Mobile Robot | Shenbagaraj Kannapiran et.al. | 2404.05024 | null |
2024-04-05 | SCAResNet: A ResNet Variant Optimized for Tiny Object Detection in Transmission and Distribution Towers | Weile Li et.al. | 2404.04179 | link |
2024-04-05 | Designing Robots to Help Women | Martin Cooney et.al. | 2404.04123 | null |
2024-04-04 | Is CLIP the main roadblock for fine-grained open-world perception? | Lorenzo Bianchi et.al. | 2404.03539 | link |
2024-04-04 | DQ-DETR: DETR with Dynamic Query for Tiny Object Detection | Yi-Xin Huang et.al. | 2404.03507 | null |
2024-04-05 | A Methodology to Study the Impact of Spiking Neural Network Parameters considering Event-Based Automotive Data | Iqra Bano et.al. | 2404.03493 | null |
2024-04-04 | MonoCD: Monocular 3D Object Detection with Complementary Depths | Longfei Yan et.al. | 2404.03181 | link |
2024-04-03 | DPFT: Dual Perspective Fusion Transformer for Camera-Radar-based Object Detection | Felix Fent et.al. | 2404.03015 | null |
2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
2024-04-03 | FlightScope: A Deep Comprehensive Assessment of Aircraft Detection Algorithms in Satellite Imagery | Safouane El Ghazouali et.al. | 2404.02877 | link |
2024-04-03 | HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Zhongyu Xia et.al. | 2404.02517 | link |
2024-04-04 | TE-TAD: Towards Full End-to-End Temporal Action Detection via Time-Aligned Coordinate Expression | Ho-Joong Kim et.al. | 2404.02405 | null |
2024-04-04 | EGTR: Extracting Graph from Transformer for Scene Graph Generation | Jinbae Im et.al. | 2404.02072 | link |
2024-04-03 | Cooperative Students: Navigating Unsupervised Domain Adaptation in Nighttime Object Detection | Jicheng Yuan et.al. | 2404.01988 | link |
2024-04-02 | Towards Enhanced Analysis of Lung Cancer Lesions in EBUS-TBNA -- A Semi-Supervised Video Object Detection Method | Jyun-An Lin et.al. | 2404.01929 | null |
2024-04-02 | Humanizing Machine-Generated Content: Evading AI-Text Detection through Adversarial Attack | Ying Zhou et.al. | 2404.01907 | link |
2024-04-02 | Scene Adaptive Sparse Transformer for Event-based Object Detection | Yansong Peng et.al. | 2404.01882 | link |
2024-04-02 | Semi-Supervised Domain Adaptation for Wildfire Detection | JooYoung Jang et.al. | 2404.01842 | null |
2024-04-02 | Sparse Semi-DETR: Sparse Learnable Queries for Semi-Supervised Object Detection | Tahira Shehzadi et.al. | 2404.01819 | null |
2024-04-02 | Analyzing the Single Event Upset Vulnerability of Binarized Neural Networks on SRAM FPGAs | Ioanna Souvatzoglou et.al. | 2404.01757 | null |
2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | null |
2024-04-02 | Task Integration Distillation for Object Detectors | Hai Su et.al. | 2404.01699 | null |
2024-03-29 | PLoc: A New Evaluation Criterion Based on Physical Location for Autonomous Driving Datasets | Ruining Yang et.al. | 2403.19893 | null |
2024-03-29 | MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection | Ali Behrouz et.al. | 2403.19888 | null |
2024-03-28 | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | Donghyun Kim et.al. | 2403.19588 | link |
2024-03-28 | OV-Uni3DETR: Towards Unified Open-Vocabulary 3D Object Detection via Cycle-Modality Propagation | Zhenyu Wang et.al. | 2403.19580 | null |
2024-03-28 | AIpom at SemEval-2024 Task 8: Detecting AI-produced Outputs in M4 | Alexander Shirnin et.al. | 2403.19354 | null |
2024-03-28 | Sparse Generation: Making Pseudo Labels Sparse for weakly supervision with points | Tian Ma et.al. | 2403.19306 | null |
2024-03-28 | CAT: Exploiting Inter-Class Dynamics for Domain Adaptive Object Detection | Mikhail Kennerley et.al. | 2403.19278 | link |
2024-03-28 | Algorithmic Ways of Seeing: Using Object Detection to Facilitate Art Exploration | Louie Søs Meyer et.al. | 2403.19174 | null |
2024-03-28 | CRKD: Enhanced Camera-Radar Object Detection with Cross-modality Knowledge Distillation | Lingjun Zhao et.al. | 2403.19104 | null |
2024-03-28 | A Real-Time Framework for Domain-Adaptive Underwater Object Detection with Image Enhancement | Junjie Wen et.al. | 2403.19079 | null |
2024-03-27 | Illicit object detection in X-ray images using Vision Transformers | Jorgen Cani et.al. | 2403.19043 | null |
2024-03-27 | Benchmarking Object Detectors with COCO: A New Path Forward | Shweta Singh et.al. | 2403.18819 | link |
2024-03-27 | PhysicsAssistant: An LLM-Powered Interactive Learning Robot for Physics Lab Investigations | Ehsan Latif et.al. | 2403.18721 | null |
2024-03-27 | CosalPure: Learning Concept from Group Images for Robust Co-Saliency Detection | Jiayi Zhu et.al. | 2403.18554 | null |
2024-03-27 | BAM: Box Abstraction Monitors for Real-time OoD Detection in Object Detection | Changshun Wu et.al. | 2403.18373 | null |
2024-03-27 | Ship in Sight: Diffusion Models for Ship-Image Super Resolution | Luigi Sigillo et.al. | 2403.18370 | link |
2024-03-27 | DODA: Diffusion for Object-detection Domain Adaptation in Agriculture | Shuai Xiang et.al. | 2403.18334 | null |
2024-03-27 | Tracking-Assisted Object Detection with Event Cameras | Ting-Kang Yen et.al. | 2403.18330 | null |
2024-03-27 | SGDM: Static-Guided Dynamic Module Make Stronger Visual Models | Wenjie Xing et.al. | 2403.18282 | null |
2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | Chihiro Noguchi et.al. | 2403.18207 | null |
2024-03-26 | State of the art applications of deep learning within tracking and detecting marine debris: A survey | Zoe Moorton et.al. | 2403.18067 | null |
2024-03-26 | The Solution for the CVPR 2023 1st foundation model challenge-Track2 | Haonan Xu et.al. | 2403.17702 | null |
2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
2024-03-26 | UADA3D: Unsupervised Adversarial Domain Adaptation for 3D Object Detection with Sparse LiDAR and Large Domain Gaps | Maciej K Wozniak et.al. | 2403.17633 | null |
2024-03-26 | SSF3D: Strict Semi-Supervised 3D Object Detection with Switching Filter | Songbur Wong et.al. | 2403.17390 | null |
2024-03-26 | Decoupled Pseudo-labeling for Semi-Supervised Monocular 3D Object Detection | Jiacheng Zhang et.al. | 2403.17387 | null |
2024-03-26 | AIDE: An Automatic Data Engine for Object Detection in Autonomous Driving | Mingfu Liang et.al. | 2403.17373 | null |
2024-03-26 | Staircase Localization for Autonomous Exploration in Urban Environments | Jinrae Kim et.al. | 2403.17330 | null |
2024-03-25 | Co-Occurring of Object Detection and Identification towards unlabeled object discovery | Binay Kumar Singh et.al. | 2403.17223 | null |
2024-03-25 | Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions | Ye Li et.al. | 2403.17009 | link |
2024-03-25 | Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance | Jingyuan Zhu et.al. | 2403.16954 | null |
2024-03-25 | TrustAI at SemEval-2024 Task 8: A Comprehensive Analysis of Multi-domain Machine Generated Text Detection Techniques | Ashok Urlana et.al. | 2403.16592 | null |
2024-03-25 | RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection | Zhiwei Lin et.al. | 2403.16440 | link |
2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400 | null |
2024-03-25 | Impact of Video Compression Artifacts on Fisheye Camera Visual Perception Tasks | Madhumitha Sakthi et.al. | 2403.16338 | null |
2024-03-24 | Cross-domain Multi-modal Few-shot Object Detection via Rich Text | Zeyu Shangguan et.al. | 2403.16188 | null |
2024-03-24 | Semantic Is Enough: Only Semantic Information For NeRF Reconstruction | Ruibo Wang et.al. | 2403.16043 | null |
2024-03-23 | Adversarial Defense Teacher for Cross-Domain Object Detection under Poor Visibility Conditions | Kaiwen Wang et.al. | 2403.15786 | null |
2024-03-23 | EAGLE: A Domain Generalization Framework for AI-generated Text Detection | Amrita Bhattacharjee et.al. | 2403.15690 | null |
2024-03-25 | Point-DETR3D: Leveraging Imagery Data with Spatial Point Prior for Weakly Semi-supervised 3D Object Detection | Hongzhi Gao et.al. | 2403.15317 | null |
2024-03-22 | CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking | Nicolas Baumann et.al. | 2403.15313 | null |
2024-03-22 | IS-Fusion: Instance-Scene Collaborative Fusion for Multimodal 3D Object Detection | Junbo Yin et.al. | 2403.15241 | null |
2024-03-22 | MSCoTDet: Language-driven Multi-modal Fusion for Improved Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.15209 | null |
2024-03-22 | SFOD: Spiking Fusion Object Detector | Yimeng Fan et.al. | 2403.15192 | link |
2024-03-22 | CRPlace: Camera-Radar Fusion with BEV Representation for Place Recognition | Shaowei Fu et.al. | 2403.15183 | null |
2024-03-22 | An In-Depth Analysis of Data Reduction Methods for Sustainable Deep Learning | Víctor Toscano-Durán et.al. | 2403.15150 | null |
2024-03-22 | Gradient-based Sampling for Class Imbalanced Semi-supervised Object Detection | Jiaming Li et.al. | 2403.15127 | link |
2024-03-22 | VRSO: Visual-Centric Reconstruction for Static Object Annotation | Chenyao Yu et.al. | 2403.15026 | null |
2024-03-22 | Vehicle Detection Performance in Nordic Region | Hamam Mokayed et.al. | 2403.15017 | null |
2024-03-21 | T-Rex2: Towards Generic Object Detection via Text-Visual Prompt Synergy | Qing Jiang et.al. | 2403.14610 | link |
2024-03-21 | UAV-Assisted Maritime Search and Rescue: A Holistic Approach | Martin Messmer et.al. | 2403.14281 | null |
2024-03-21 | Scene-Graph ViT: End-to-End Open-Vocabulary Visual Relationship Detection | Tim Salzmann et.al. | 2403.14270 | null |
2024-03-21 | 3D Object Detection from Point Cloud via Voting Step Diffusion | Haoran Hou et.al. | 2403.14133 | null |
2024-03-20 | EcoSense: Energy-Efficient Intelligent Sensing for In-Shore Ship Detection through Edge-Cloud Collaboration | Wenjun Huang et.al. | 2403.14027 | null |
2024-03-20 | RAR: Retrieving And Ranking Augmented MLLMs for Visual Recognition | Ziyu Liu et.al. | 2403.13805 | link |
2024-03-20 | Bounding Box Stability against Feature Dropout Reflects Detector Generalization across Environments | Yang Yang et.al. | 2403.13803 | link |
2024-03-20 | Fostc3net:A Lightweight YOLOv5 Based On the Network Structure Optimization | Danqing Ma et.al. | 2403.13703 | null |
2024-03-20 | Find n' Propagate: Open-Vocabulary 3D Object Detection in Urban Environments | Djamahl Etchegaray et.al. | 2403.13556 | null |
2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
2024-03-20 | Few-shot Oriented Object Detection with Memorable Contrastive Learning in Remote Sensing Images | Jiawei Zhou et.al. | 2403.13375 | null |
2024-03-20 | Adaptive Ensembles of Fine-Tuned Transformers for LLM-Generated Text Detection | Zhixin Lai et.al. | 2403.13335 | null |
2024-03-20 | DetDiffusion: Synergizing Generative and Perceptive Models for Enhanced Data Generation and Perception | Yibo Wang et.al. | 2403.13304 | null |
2024-03-20 | Facilitating Pornographic Text Detection for Open-Domain Dialogue Systems via Knowledge Distillation of Large Language Models | Huachuan Qiu et.al. | 2403.13250 | null |
2024-03-19 | SceneScript: Reconstructing Scenes With An Autoregressive Structured Language Model | Armen Avetisyan et.al. | 2403.13064 | null |
2024-03-19 | Wildfire danger prediction optimization with transfer learning | Spiros Maggioros et.al. | 2403.12871 | link |
2024-03-19 | As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? | Anjun Hu et.al. | 2403.12693 | null |
2024-03-19 | EAS-SNN: End-to-End Adaptive Sampling and Representation for Event-based Detection with Recurrent Spiking Neural Networks | Ziming Wang et.al. | 2403.12574 | null |
2024-03-19 | DetToolChain: A New Prompting Paradigm to Unleash Detection Ability of MLLM | Yixuan Wu et.al. | 2403.12488 | null |
2024-03-19 | TransformMix: Learning Transformation and Mixing Strategies from Data | Tsz-Him Cheung et.al. | 2403.12429 | null |
2024-03-19 | VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation | Hao Wang et.al. | 2403.12415 | null |
2024-03-19 | Entity6K: A Large Open-Domain Evaluation Dataset for Real-World Entity Recognition | Jielin Qiu et.al. | 2403.12339 | null |
2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317 | null |
2024-03-18 | Prototipo de un Contador Bidireccional Automático de Personas basado en sensores de visión 3D | Benjamín Ojeda-Magaña et.al. | 2403.12310 | null |
2024-03-18 | Align and Distill: Unifying and Improving Domain Adaptive Object Detection | Justin Kay et.al. | 2403.12029 | link |
2024-03-18 | TrajectoryNAS: A Neural Architecture Search for Trajectory Prediction | Ali Asghar Sharifi et.al. | 2403.11695 | null |
2024-03-18 | Just Add $100 More: Augmenting NeRF-based Pseudo-LiDAR Point Cloud for Resolving Class-imbalance Problem | Mincheol Chang et.al. | 2403.11573 | null |
2024-03-18 | R2SNet: Scalable Domain Adaptation for Object Detection in Cloud-Based Robots Ecosystems via Proposal Refinement | Michele Antonazzi et.al. | 2403.11567 | null |
2024-03-18 | Continual Forgetting for Pre-trained Vision Models | Hongbo Zhao et.al. | 2403.11530 | link |
2024-03-17 | V2X-DGW: Domain Generalization for Multi-agent Perception under Adverse Weather Conditions | Baolu Li et.al. | 2403.11371 | null |
2024-03-17 | Advanced Knowledge Extraction of Physical Design Drawings, Translation and conversion to CAD formats using Deep Learning | Jesher Joshua M et.al. | 2403.11291 | null |
2024-03-17 | ManipVQA: Injecting Robotic Affordance and Physically Grounded Information into Multi-Modal Large Language Models | Siyuan Huang et.al. | 2403.11289 | null |
2024-03-17 | CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations | Yuwei Zhang et.al. | 2403.11220 | link |
2024-03-17 | GRA: Detecting Oriented Objects through Group-wise Rotating and Attention | Jiangshan Wang et.al. | 2403.11127 | null |
2024-03-17 | Self-supervised co-salient object detection via feature correspondence at multiple scales | Souradeep Chakraborty et.al. | 2403.11107 | link |
2024-03-14 | Open-Vocabulary Object Detection with Meta Prompt Representation and Instance Contrastive Optimization | Zhao Wang et.al. | 2403.09433 | null |
2024-03-14 | D3T: Distinctive Dual-Domain Teacher Zigzagging Across RGB-Thermal Gap for Domain-Adaptive Object Detection | Dinh Phat Do et.al. | 2403.09359 | link |
2024-03-14 | Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring | Yufei Zhan et.al. | 2403.09333 | link |
2024-03-14 | EfficientMFD: Towards More Efficient Multimodal Synchronous Fusion Detection | Jiaqing Zhang et.al. | 2403.09323 | link |
2024-03-14 | Knowledge Distillation in YOLOX-ViT for Side-Scan Sonar Object Detection | Martin Aubard et.al. | 2403.09313 | link |
2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309 | null |
2024-03-14 | CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification | Yiming Ma et.al. | 2403.09281 | null |
2024-03-14 | D-YOLO a robust framework for object detection in adverse weather conditions | Zihan Chu et.al. | 2403.09233 | null |
2024-03-14 | Improving Distant 3D Object Detection Using 2D Box Supervision | Zetong Yang et.al. | 2403.09230 | null |
2024-03-14 | PoIFusion: Multi-Modal 3D Object Detection via Fusion at Points of Interest | Jiajun Deng et.al. | 2403.09212 | null |
2024-03-13 | VLOGGER: Multimodal Diffusion for Embodied Avatar Synthesis | Enric Corona et.al. | 2403.08764 | null |
2024-03-13 | MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning | Jialv Zou et.al. | 2403.08760 | link |
2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650 | null |
2024-03-13 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586 | null |
2024-03-13 | A Multimodal Fusion Network For Student Emotion Recognition Based on Transformer and Tensor Product | Ao Xiang et.al. | 2403.08511 | null |
2024-03-13 | Improved YOLOv5 Based on Attention Mechanism and FasterNet for Foreign Object Detection on Railway and Airway tracks | Zongqing Qi et.al. | 2403.08499 | null |
2024-03-13 | IAMCV Multi-Scenario Vehicle Interaction Dataset | Novel Certad et.al. | 2403.08455 | null |
2024-03-13 | Advancing Security in AI Systems: A Novel Approach to Detecting Backdoors in Deep Neural Networks | Khondoker Murad Hossain et.al. | 2403.08208 | null |
2024-03-12 | TaskCLIP: Extend Large Vision-Language Model for Task Oriented Object Detection | Hanning Chen et.al. | 2403.08108 | null |
2024-03-12 | Aedes aegypti Egg Counting with Neural Networks for Object Detection | Micheli Nayara de Oliveira Vicente et.al. | 2403.08016 | null |
2024-03-12 | Mondrian: On-Device High-Performance Video Analytics with Compressive Packed Inference | Changmin Jeon et.al. | 2403.07598 | null |
2024-03-12 | PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution | Honghao Chen et.al. | 2403.07589 | null |
2024-03-12 | A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions | Quoc-Vinh Lai-Dang et.al. | 2403.07542 | null |
2024-03-12 | JSTR: Joint Spatio-Temporal Reasoning for Event-based Moving Object Detection | Hanyu Zhou et.al. | 2403.07436 | null |
2024-03-12 | Eliminating Cross-modal Conflicts in BEV Space for LiDAR-Camera 3D Object Detection | Jiahui Fu et.al. | 2403.07372 | null |
2024-03-12 | GPT-generated Text Detection: Benchmark Dataset and Tensor-based Detection Method | Zubair Qazi et.al. | 2403.07321 | link |
2024-03-12 | MENTOR: Multilingual tExt detectioN TOward leaRning by analogy | Hsin-Ju Lin et.al. | 2403.07286 | null |
2024-03-12 | SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection | Hongcheng Zhang et.al. | 2403.07284 | null |
2024-03-12 | Adaptive Bounding Box Uncertainties via Two-Step Conformal Prediction | Alexander Timans et.al. | 2403.07263 | null |
2024-03-11 | Class Imbalance in Object Detection: An Experimental Diagnosis and Study of Mitigation Strategies | Nieves Crasto et.al. | 2403.07113 | link |
2024-03-11 | Real-time Transformer-based Open-Vocabulary Detection with Efficient Fusion Head | Tiancheng Zhao et.al. | 2403.06892 | null |
2024-03-11 | LeOCLR: Leveraging Original Images for Contrastive Learning of Visual Representations | Mohammad Alkhalefi et.al. | 2403.06813 | null |
2024-03-11 | Genetic Learning for Designing Sim-to-Real Data Augmentations | Bram Vanherle et.al. | 2403.06786 | null |
2024-03-11 | Evaluating the Energy Efficiency of Few-Shot Learning for Object Detection in Industrial Settings | Georgios Tsoumplekas et.al. | 2403.06631 | null |
2024-03-11 | Cross-domain and Cross-dimension Learning for Image-to-Graph Transformers | Alexander H. Berger et.al. | 2403.06601 | null |
2024-03-11 | SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection | Yuxuan Li et.al. | 2403.06534 | link |
2024-03-11 | 3D Semantic Segmentation-Driven Representations for 3D Object Detection | Hayeon O et.al. | 2403.06501 | null |
2024-03-11 | Fine-Grained Pillar Feature Encoding Via Spatio-Temporal Virtual Grid for 3D Object Detection | Konyul Park et.al. | 2403.06433 | null |
2024-03-10 | Transformer based Multitask Learning for Image Captioning and Object Detection | Debolena Basak et.al. | 2403.06292 | null |
2024-03-10 | Poly Kernel Inception Network for Remote Sensing Detection | Xinhao Cai et.al. | 2403.06258 | link |
2024-03-08 | EVD4UAV: An Altitude-Sensitive Benchmark to Evade Vehicle Detection in UAV | Huiming Sun et.al. | 2403.05422 | null |
2024-03-08 | SIRST-5K: Exploring Massive Negatives Synthesis with Self-supervised Learning for Robust Infrared Small Target Detection | Yahao Lu et.al. | 2403.05416 | link |
2024-03-08 | Exploring Robust Features for Few-Shot Object Detection in Satellite Imagery | Xavier Bou et.al. | 2403.05381 | null |
2024-03-08 | Frequency-Adaptive Dilated Convolution for Semantic Segmentation | Linwei Chen et.al. | 2403.05369 | link |
2024-03-08 | VLM-PL: Advanced Pseudo Labeling approach Class Incremental Object Detection with Vision-Language Model | Junsu Kim et.al. | 2403.05346 | null |
2024-03-08 | Improving the Successful Robotic Grasp Detection Using Convolutional Neural Networks | Hamed Hosseini et.al. | 2403.05211 | null |
2024-03-08 | LanePtrNet: Revisiting Lane Detection as Point Voting and Grouping on Curves | Jiayan Cao et.al. | 2403.05155 | null |
2024-03-08 | RadarDistill: Boosting Radar-based Object Detection Performance via Knowledge Distillation from LiDAR Features | Geonho Bang et.al. | 2403.05061 | null |
2024-03-08 | ActFormer: Scalable Collaborative Perception via Active Queries | Suozhi Huang et.al. | 2403.04968 | null |
2024-03-07 | FriendNet: Detection-Friendly Dehazing Network | Yihua Fan et.al. | 2403.04443 | null |
2024-03-07 | Effectiveness Assessment of Recent Large Vision-Language Models | Yao Jiang et.al. | 2403.04306 | null |
2024-03-07 | ACC-ViT : Atrous Convolution's Comeback in Vision Transformers | Nabil Ibtehaz et.al. | 2403.04200 | null |
2024-03-07 | CN-RMA: Combined Network with Ray Marching Aggregation for 3D Indoors Object Detection from Multi-view Images | Guanlin Shen et.al. | 2403.04198 | null |
2024-03-07 | Scalable and Robust Transformer Decoders for Interpretable Image Classification with Foundation Models | Evelyn Mannix et.al. | 2403.04125 | null |
2024-03-07 | CMDA: Cross-Modal and Domain Adversarial Adaptation for LiDAR-Based 3D Object Detection | Gyusam Chang et.al. | 2403.03721 | null |
2024-03-06 | Adversarial Infrared Geometry: Using Geometry to Perform Adversarial Attack against Infrared Pedestrian Detectors | Kalibinuer Tiliwalidi et.al. | 2403.03674 | null |
2024-03-06 | Towards Detecting AI-Generated Text within Human-AI Collaborative Hybrid Texts | Zijie Zeng et.al. | 2403.03506 | null |
2024-03-06 | Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator | Wonhyeok Choi et.al. | 2403.03468 | null |
2024-03-06 | FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion | Hao Wang et.al. | 2403.03463 | null |
2024-03-06 | Performance Evaluation of Semi-supervised Learning Frameworks for Multi-Class Weed Detection | Jiajia Li et.al. | 2403.03390 | link |
2024-03-05 | Detecting Concrete Visual Tokens for Multimodal Machine Translation | Braeden Bowen et.al. | 2403.03075 | null |
2024-03-05 | Loss Design for Single-carrier Joint Communication and Neural Network-based Sensing | Charlotte Muth et.al. | 2403.02929 | null |
2024-03-05 | Are Dense Labels Always Necessary for 3D Object Detection from Point Cloud? | Chenqiang Gao et.al. | 2403.02818 | null |
2024-03-05 | Bootstrapping Rare Object Detection in High-Resolution Satellite Imagery | Akram Zaytar et.al. | 2403.02736 | null |
2024-03-05 | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View | Jiawei Hou et.al. | 2403.02710 | null |
2024-03-05 | False Positive Sampling-based Data Augmentation for Enhanced 3D Object Detection Accuracy | Jiyong Oh et.al. | 2403.02639 | null |
2024-03-05 | BSDP: Brain-inspired Streaming Dual-level Perturbations for Online Open World Object Detection | Yu Chen et.al. | 2403.02637 | null |
2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | Abdullah Nazhat Abdullah et.al. | 2403.02411 | link |
2024-03-04 | COMMIT: Certifying Robustness of Multi-Sensor Fusion Systems against Semantic Attacks | Zijian Huang et.al. | 2403.02329 | null |
2024-03-04 | Scalable Vision-Based 3D Object Detection and Monocular Depth Estimation for Autonomous Driving | Yuxuan Liu et.al. | 2403.02037 | link |
2024-03-02 | TUMTraf V2X Cooperative Perception Dataset | Walter Zimmer et.al. | 2403.01316 | null |
2024-03-02 | Causal Mode Multiplexer: A Novel Framework for Unbiased Multispectral Pedestrian Detection | Taeheon Kim et.al. | 2403.01300 | null |
2024-03-02 | Run-time Introspection of 2D Object Detection in Automated Driving Systems Using Learning Representations | Hakan Yekta Yatbaz et.al. | 2403.01172 | null |
2024-03-02 | ELA: Efficient Local Attention for Deep Convolutional Neural Networks | Wei Xu et.al. | 2403.01123 | null |
2024-03-02 | Face Swap via Diffusion Model | Feifei Wang et.al. | 2403.01108 | null |
2024-03-02 | Beyond Night Visibility: Adaptive Multi-Scale Fusion of Infrared and Visible Images | Shufan Pei et.al. | 2403.01083 | null |
2024-03-01 | Learning Causal Features for Incremental Object Detection | Zhenwei He et.al. | 2403.00591 | null |
2024-03-01 | Abductive Ego-View Accident Video Understanding for Safe Driving Perception | Jianwu Fang et.al. | 2403.00436 | null |
2024-03-04 | DAMS-DETR: Dynamic Adaptive Multispectral Detection Transformer with Competitive Query Selection and Adaptive Feature Fusion | Junjie Guo et.al. | 2403.00326 | null |
2024-03-01 | ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting | Chen Duan et.al. | 2403.00303 | null |
2024-02-29 | SeMoLi: What Moves Together Belongs Together | Jenny Seidenschwarz et.al. | 2402.19463 | null |
2024-02-29 | Genie: Smart ROS-based Caching for Connected Autonomous Robots | Zexin Li et.al. | 2402.19410 | null |
2024-02-29 | ProtoP-OD: Explainable Object Detection with Prototypical Parts | Pavlos Rath-Manakidis et.al. | 2402.19142 | null |
2024-02-29 | Theoretically Achieving Continuous Representation of Oriented Bounding Boxes | Zikai Xiao et.al. | 2402.18975 | link |
2024-02-29 | Boosting Semi-Supervised Object Detection in Remote Sensing Images With Active Teaching | Boxuan Zhang et.al. | 2402.18958 | null |
2024-02-29 | Edge Computing Enabled Real-Time Video Analysis via Adaptive Spatial-Temporal Semantic Filtering | Xiang Chen et.al. | 2402.18927 | null |
2024-02-29 | A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection | Chao Hao et.al. | 2402.18922 | null |
2024-02-29 | Privacy-Preserving Autoencoder for Collaborative Object Detection | Bardia Azizian et.al. | 2402.18864 | null |
2024-02-29 | Debiased Novel Category Discovering and Localization | Juexiao Feng et.al. | 2402.18821 | null |
2024-02-28 | Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond | Ziyun Yang et.al. | 2402.18698 | null |
2024-02-28 | UniMODE: Unified Monocular 3D Object Detection | Zhuoling Li et.al. | 2402.18573 | null |
2024-02-28 | Detection of Micromobility Vehicles in Urban Traffic Videos | Khalil Sabri et.al. | 2402.18503 | link |
2024-02-28 | Sunshine to Rainstorm: Cross-Weather Knowledge Distillation for Robust 3D Object Detection | Xun Huang et.al. | 2402.18493 | null |
2024-02-28 | Prompt-Driven Dynamic Object-Centric Learning for Single Domain Generalization | Deng Li et.al. | 2402.18447 | null |
2024-02-28 | Unveiling novel insights into Kirchhoff migration for effective object detection using experimental Fresnel dataset | Won-Kwang Park et.al. | 2402.18322 | null |
2024-02-28 | Zero-Shot Aerial Object Detection with Visual Description Regularization | Zhengqing Zang et.al. | 2402.18233 | null |
2024-02-28 | VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation | Tao Peng et.al. | 2402.18189 | null |
2024-02-27 | SDDGR: Stable Diffusion-based Deep Generative Replay for Class Incremental Object Detection | Junsu Kim et.al. | 2402.17323 | null |
2024-02-27 | A Vanilla Multi-Task Framework for Dense Visual Prediction Solution to 1st VCL Challenge -- Multi-Task Robustness Track | Zehui Chen et.al. | 2402.17319 | null |
2024-02-27 | Probing Multimodal Large Language Models for Global and Local Semantic Representation | Mingxu Tao et.al. | 2402.17304 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow | Chaoyang Wang et.al. | 2405.20282 | link |
2024-05-30 | MCDS-VSS: Moving Camera Dynamic Scene Video Semantic Segmentation by Filtering with Self-Supervised Geometry and Motion | Angel Villar-Corrales et.al. | 2405.19921 | link |
2024-05-30 | Open-Set Domain Adaptation for Semantic Segmentation | Seun-An Choe et.al. | 2405.19899 | link |
2024-05-30 | DenseSeg: Joint Learning for Semantic Segmentation and Landmark Detection Using Dense Image-to-Shape Representation | Ron Keuth et.al. | 2405.19746 | null |
2024-05-30 | Twin Deformable Point Convolutions for Point Cloud Semantic Segmentation in Remote Sensing Scenes | Yong-Qiang Mao et.al. | 2405.19735 | null |
2024-05-30 | CRIS: Collaborative Refinement Integrated with Segmentation for Polyp Segmentation | Ankush Gajanan Arudkar et.al. | 2405.19672 | null |
2024-05-29 | Organizing Background to Explore Latent Classes for Incremental Few-shot Semantic Segmentation | Lianlei Shan et.al. | 2405.19568 | null |
2024-05-29 | Enabling Visual Recognition at Radio Frequency | Haowen Lai et.al. | 2405.19516 | null |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
2024-05-29 | A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation | Niclas Vödisch et.al. | 2405.19035 | link |
2024-05-29 | Parameter-efficient Fine-tuning in Hyperspherical Space for Open-vocabulary Semantic Segmentation | Zelin Peng et.al. | 2405.18840 | null |
2024-05-29 | FocSAM: Delving Deeply into Focused Objects in Segmenting Anything | You Huang et.al. | 2405.18706 | null |
2024-05-28 | Learning to Detour: Shortcut Mitigating Augmentation for Weakly Supervised Semantic Segmentation | JuneHyoung Kwon et.al. | 2405.18148 | null |
2024-05-28 | Edge-guided and Class-balanced Active Learning for Semantic Segmentation of Aerial Images | Lianlei Shan et.al. | 2405.18078 | null |
2024-05-28 | RT-GS2: Real-Time Generalizable Semantic Segmentation for 3D Gaussian Representations of Radiance Fields | Mihnea-Bogdan Jurca et.al. | 2405.18033 | null |
2024-05-28 | DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture | Shentong Mo et.al. | 2405.17995 | null |
2024-05-28 | Adapting Pre-Trained Vision Models for Novel Instance Detection and Segmentation | Yangxiao Lu et.al. | 2405.17859 | link |
2024-05-28 | The Binary Quantized Neural Network for Dense Prediction via Specially Designed Upsampling and Attention | Xingyu Ding et.al. | 2405.17776 | null |
2024-05-27 | Evaluation of Multi-task Uncertainties in Joint Semantic Segmentation and Monocular Depth Estimation | Steven Landgraf et.al. | 2405.17097 | null |
2024-05-27 | DSU-Net: Dynamic Snake U-Net for 2-D Seismic First Break Picking | Hongtao Wang et.al. | 2405.16980 | null |
2024-05-27 | Collective Perception Datasets for Autonomous Driving: A Comprehensive Review | Sven Teufel et.al. | 2405.16973 | null |
2024-05-27 | Zero-Shot Video Semantic Segmentation based on Pre-Trained Diffusion Models | Qian Wang et.al. | 2405.16947 | null |
2024-05-27 | A re-calibration method for object detection with multi-modal alignment bias in autonomous driving | Zhihang Song et.al. | 2405.16848 | null |
2024-05-26 | Understanding the Effect of using Semantically Meaningful Tokens for Visual Representation Learning | Neha Kalibhat et.al. | 2405.16401 | null |
2024-05-25 | Video Prediction Models as General Visual Encoders | James Maier et.al. | 2405.16382 | null |
2024-05-25 | BOLD: Boolean Logic Deep Learning | Van Minh Nguyen et.al. | 2405.16339 | null |
2024-05-25 | Improving 3D Occupancy Prediction through Class-balancing Loss and Multi-scale Representation | Huizhou Chen et.al. | 2405.16099 | null |
2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008 | null |
2024-05-24 | Visualize and Paint GAN Activations | Rudolf Herdt et.al. | 2405.15636 | null |
2024-05-24 | Leveraging knowledge distillation for partial multi-task learning from multiple remote sensing datasets | Hoàng-Ân Lê et.al. | 2405.15394 | null |
2024-05-24 | Autonomous Quilt Spreading for Caregiving Robots | Yuchun Guo et.al. | 2405.15373 | null |
2024-05-24 | U3M: Unbiased Multiscale Modal Fusion Model for Multimodal Semantic Segmentation | Bingyu Li et.al. | 2405.15365 | link |
2024-05-24 | Cross-Domain Few-Shot Semantic Segmentation via Doubly Matching Transformation | Jiayi Chen et.al. | 2405.15265 | null |
2024-05-23 | Mamba-R: Vision Mamba ALSO Needs Registers | Feng Wang et.al. | 2405.14858 | null |
2024-05-23 | Efficient Robot Learning for Perception and Mapping | Niclas Vödisch et.al. | 2405.14688 | null |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
2024-05-23 | Tuning-free Universally-Supervised Semantic Segmentation | Xiaobo Yang et.al. | 2405.14294 | null |
2024-05-23 | SCMix: Stochastic Compound Mixing for Open Compound Domain Adaptation in Semantic Segmentation | Kai Yao et.al. | 2405.14278 | null |
2024-05-23 | Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations | Mohammed Baharoon et.al. | 2405.14239 | null |
2024-05-23 | Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification | Taylor Archibald et.al. | 2405.14162 | null |
2024-05-23 | Skip-SCAR: A Modular Approach to ObjectGoal Navigation with Sparsity and Adaptive Skips | Yaotian Liu et.al. | 2405.14154 | null |
2024-05-22 | TS40K: a 3D Point Cloud Dataset of Rural Terrain and Electrical Transmission System | Diogo Lavado et.al. | 2405.13989 | null |
2024-05-21 | Transparency Distortion Robustness for SOTA Image Segmentation Tasks | Volker Knauthe et.al. | 2405.12864 | null |
2024-05-20 | A comprehensive overview of deep learning techniques for 3D point cloud classification and semantic segmentation | Sushmita Sarker et.al. | 2405.11903 | null |
2024-05-20 | Salience-guided Ground Factor for Robust Localization of Delivery Robots in Complex Urban Environments | Jooyong Park et.al. | 2405.11855 | null |
2024-05-20 | Improving the Explain-Any-Concept by Introducing Nonlinearity to the Trainable Surrogate Model | Mounes Zaval et.al. | 2405.11837 | null |
2024-05-20 | Universal Organizer of SAM for Unsupervised Semantic Segmentation | Tingting Li et.al. | 2405.11742 | null |
2024-05-19 | Interpreting a Semantic Segmentation Model for Coastline Detection | Conor O'Sullivan et.al. | 2405.11500 | null |
2024-05-19 | Unifying 3D Vision-Language Understanding via Promptable Queries | Ziyu Zhu et.al. | 2405.11442 | null |
2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257 | null |
2024-05-17 | CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic Segmentation | Mushui Liu et.al. | 2405.10530 | link |
2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
2024-05-16 | Towards Task-Compatible Compressible Representations | Anderson de Andrade et.al. | 2405.10244 | link |
2024-05-16 | DiverGen: Improving Instance Segmentation by Learning Wider Data Distribution with More Diverse Generative Data | Chengxiang Fan et.al. | 2405.10185 | link |
2024-05-16 | An Integrated Framework for Multi-Granular Explanation of Video Summarization | Konstantinos Tsigos et.al. | 2405.10082 | null |
2024-05-16 | A Preprocessing and Postprocessing Voxel-based Method for LiDAR Semantic Segmentation Improvement in Long Distance | Andrea Matteazzi et.al. | 2405.10046 | null |
2024-05-16 | Towards Realistic Incremental Scenario in Class Incremental Semantic Segmentation | Jihwan Kwak et.al. | 2405.09858 | null |
2024-05-15 | Synth-to-Real Unsupervised Domain Adaptation for Instance Segmentation | Guo Yachan et.al. | 2405.09682 | null |
2024-05-14 | CLIP with Quality Captions: A Strong Pretraining for Vision Tasks | Pavan Kumar Anasosalu Vasu et.al. | 2405.08911 | null |
2024-05-14 | Rethinking Scanning Strategies with Vision Mamba in Semantic Segmentation of Remote Sensing Imagery: An Experimental Study | Qinfeng Zhu et.al. | 2405.08493 | null |
2024-05-14 | TEDNet: Twin Encoder Decoder Neural Network for 2D Camera and LiDAR Road Detection | Martín Bayón-Gutiérrez et.al. | 2405.08429 | link |
2024-05-13 | IMAFD: An Interpretable Multi-stage Approach to Flood Detection from time series Multispectral Data | Ziyang Zhang et.al. | 2405.07916 | null |
2024-05-13 | PLUTO: Pathology-Universal Transformer | Dinkar Juyal et.al. | 2405.07905 | null |
2024-05-12 | PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification | Mohammad Shafiul Alam et.al. | 2405.07332 | link |
2024-05-12 | Building a Strong Pre-Training Baseline for Universal 3D Large-Scale Perception | Haoming Chen et.al. | 2405.07201 | null |
2024-05-11 | Global Motion Understanding in Large-Scale Video Object Segmentation | Volodymyr Fedynyak et.al. | 2405.07031 | null |
2024-05-10 | GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs | Mustafa Munir et.al. | 2405.06849 | link |
2024-05-10 | Enhancing Weakly Supervised Semantic Segmentation with Multi-modal Foundation Models: An End-to-End Approach | Elham Ravanbakhsh et.al. | 2405.06586 | null |
2024-05-10 | Semantic and Spatial Adaptive Pixel-level Classifier for Semantic Segmentation | Xiaowen Ma et.al. | 2405.06525 | link |
2024-05-10 | Multi-Target Unsupervised Domain Adaptation for Semantic Segmentation without External Data | Yonghao Xu et.al. | 2405.06502 | null |
2024-05-10 | Multi-level Personalized Federated Learning on Heterogeneous and Long-Tailed Data | Rongyu Zhang et.al. | 2405.06413 | null |
2024-05-10 | Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation | Zhenliang Ni et.al. | 2405.06228 | link |
2024-05-10 | Zero-shot Degree of Ill-posedness Estimation for Active Small Object Change Detection | Koji Takeda et.al. | 2405.06185 | null |
2024-05-10 | Prior-guided Diffusion Model for Cell Segmentation in Quantitative Phase Imaging | Zhuchen Shao et.al. | 2405.06175 | null |
2024-05-09 | Mask-TS Net: Mask Temperature Scaling Uncertainty Calibration for Polyp Segmentation | Yudian Zhang et.al. | 2405.05830 | null |
2024-05-09 | CSA-Net: Channel-wise Spatially Autocorrelated Attention Networks | Nick et.al. | 2405.05755 | null |
2024-05-08 | OpenESS: Event-based Semantic Scene Understanding with Open Vocabularies | Lingdong Kong et.al. | 2405.05259 | link |
2024-05-08 | Multi-Modal Data-Efficient 3D Scene Understanding for Autonomous Driving | Lingdong Kong et.al. | 2405.05258 | link |
2024-05-08 | Weakly-supervised Semantic Segmentation via Dual-stream Contrastive Learning of Cross-image Contextual Information | Qi Lai et.al. | 2405.04913 | null |
2024-05-08 | DeepDamageNet: A two-step deep-learning model for multi-disaster building damage segmentation and classification using satellite imagery | Irene Alisjahbana et.al. | 2405.04800 | null |
2024-05-07 | A Self-Supervised Method for Body Part Segmentation and Keypoint Detection of Rat Images | László Kopácsi et.al. | 2405.04650 | null |
2024-05-07 | FRACTAL: An Ultra-Large-Scale Aerial Lidar Dataset for 3D Semantic Segmentation of Diverse Landscapes | Charles Gaydon et.al. | 2405.04634 | link |
2024-05-07 | AugmenTory: A Fast and Flexible Polygon Augmentation Library | Tanaz Ghahremani et.al. | 2405.04442 | null |
2024-05-07 | A New Dataset and Comparative Study for Aphid Cluster Detection and Segmentation in Sorghum Fields | Raiyan Rahman et.al. | 2405.04305 | null |
2024-05-07 | ELiTe: Efficient Image-to-LiDAR Knowledge Transfer for Semantic Segmentation | Zhibo Zhang et.al. | 2405.04121 | null |
2024-05-07 | Structured Click Control in Transformer-based Interactive Segmentation | Long Xu et.al. | 2405.04009 | link |
2024-05-06 | PTQ4SAM: Post-Training Quantization for Segment Anything | Chengtao Lv et.al. | 2405.03144 | link |
2024-05-04 | MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning | Vishal Nedungadi et.al. | 2405.02771 | null |
2024-05-04 | Few-Shot Fruit Segmentation via Transfer Learning | Jordan A. James et.al. | 2405.02556 | null |
2024-05-03 | Panoptic-SLAM: Visual SLAM in Dynamic Environments using Panoptic Segmentation | Gabriel Fischer Abati et.al. | 2405.02177 | null |
2024-05-03 | Towards general deep-learning-based tree instance segmentation models | Jonathan Henrich et.al. | 2405.02061 | null |
2024-05-03 | DiffMap: Enhancing Map Segmentation with Map Prior Using Diffusion Model | Peijin Jia et.al. | 2405.02008 | null |
2024-05-02 | Development of Skip Connection in Deep Neural Networks for Computer Vision and Medical Image Analysis: A Survey | Guoping Xu et.al. | 2405.01725 | link |
2024-05-02 | Explainable AI (XAI) in Image Segmentation in Medicine, Industry, and Beyond: A Survey | Rokas Gipiškis et.al. | 2405.01636 | null |
2024-05-02 | CromSS: Cross-modal pre-training with noisy labels for remote sensing image segmentation | Chenying Liu et.al. | 2405.01217 | null |
2024-05-02 | Uncertainty-aware self-training with expectation maximization basis transformation | Zijia Wang et.al. | 2405.01175 | null |
2024-05-01 | GraCo: Granularity-Controllable Interactive Segmentation | Yian Zhao et.al. | 2405.00587 | null |
2024-05-01 | Exploring Self-Supervised Vision Transformers for Deepfake Detection: A Comparative Analysis | Huy H. Nguyen et.al. | 2405.00355 | null |
2024-04-30 | Masked Multi-Query Slot Attention for Unsupervised Object Discovery | Rishav Pramanik et.al. | 2404.19654 | link |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
2024-04-30 | DELINE8K: A Synthetic Data Pipeline for the Semantic Segmentation of Historical Documents | Taylor Archibald et.al. | 2404.19259 | null |
2024-04-29 | Swin2-MoSE: A New Single Image Super-Resolution Model for Remote Sensing | Leonardo Rossi et.al. | 2404.18924 | null |
2024-04-29 | IPixMatch: Boost Semi-supervised Semantic Segmentation with Inter-Pixel Relation | Kebin Wu et.al. | 2404.18891 | null |
2024-04-29 | From Density to Geometry: YOLOv8 Instance Segmentation for Reverse Engineering of Optimized Structures | Thomas Rochefort-Beaudoin et.al. | 2404.18763 | null |
2024-04-29 | Towards Long-term Robotics in the Wild | Stephen Hausler et.al. | 2404.18477 | null |
2024-04-29 | Clicks2Line: Using Lines for Interactive Image Segmentation | Chaewon Lee et.al. | 2404.18461 | null |
2024-04-29 | MFP: Making Full Use of Probability Maps for Interactive Image Segmentation | Chaewon Lee et.al. | 2404.18448 | null |
2024-04-28 | Panoptic Segmentation and Labelling of Lumbar Spine Vertebrae using Modified Attention Unet | Rikathi Pal et.al. | 2404.18291 | null |
2024-04-28 | Garbage Segmentation and Attribute Analysis by Robotic Dogs | Nuo Xu et.al. | 2404.18112 | null |
2024-04-27 | Multi-Stream Cellular Test-Time Adaptation of Real-Time Models Evolving in Dynamic Environments | Benoît Gérin et.al. | 2404.17930 | link |
2024-04-27 | GLIMS: Attention-Guided Lightweight Multi-Scale Hybrid Network for Volumetric Semantic Segmentation | Ziya Ata Yazıcı et.al. | 2404.17854 | link |
2024-04-26 | Optimizing Universal Lesion Segmentation: State Space Model-Guided Hierarchical Networks with Feature Importance Adjustment | Kazi Shahriar Sanjid et.al. | 2404.17235 | null |
2024-04-25 | Calculation of Femur Caput Collum Diaphyseal angle for X-Rays images using Semantic Segmentation | Deepak Bhatia et.al. | 2404.17083 | null |
2024-04-25 | Boosting Unsupervised Semantic Segmentation with Principal Mask Proposals | Oliver Hahn et.al. | 2404.16818 | link |
2024-04-25 | Self-Balanced R-CNN for Instance Segmentation | Leonardo Rossi et.al. | 2404.16633 | link |
2024-04-26 | Multi-Scale Representations by Varying Window Attention for Semantic Segmentation | Haotian Yan et.al. | 2404.16573 | link |
2024-04-25 | 360SFUDA++: Towards Source-free UDA for Panoramic Segmentation by Learning Reliable Category Prototypes | Xu Zheng et.al. | 2404.16501 | null |
2024-04-25 | Semantic Segmentation Refiner for Ultrasound Applications with Zero-Shot Foundation Models | Hedda Cohen Indelman et.al. | 2404.16325 | null |
2024-04-25 | Style Adaptation for Domain-adaptive Semantic Segmentation | Ting Li et.al. | 2404.16301 | null |
2024-04-25 | A Multi-objective Optimization Benchmark Test Suite for Real-time Semantic Segmentation | Yifan Zhao et.al. | 2404.16266 | link |
2024-04-24 | Does SAM dream of EIG? Characterizing Interactive Segmenter Performance using Expected Information Gain | Kuan-I Chung et.al. | 2404.16155 | null |
2024-04-24 | 3D Freehand Ultrasound using Visual Inertial and Deep Inertial Odometry for Measuring Patellar Tracking | Russell Buchanan et.al. | 2404.15847 | null |
2024-04-24 | Vision Transformer-based Adversarial Domain Adaptation | Yahan Li et.al. | 2404.15817 | link |
2024-04-23 | PRISM: A Promptable and Robust Interactive Segmentation Model with Visual Prompts | Hao Li et.al. | 2404.15028 | link |
2024-04-23 | Unknown Object Grasping for Assistive Robotics | Elle Miller et.al. | 2404.15001 | null |
2024-04-22 | Surgical-DeSAM: Decoupling SAM for Instrument Segmentation in Robotic Surgery | Yuyang Sheng et.al. | 2404.14040 | link |
2024-04-22 | OccFeat: Self-supervised Occupancy Feature Prediction for Pretraining BEV Segmentation Networks | Sophia Sirko-Galouchenko et.al. | 2404.14027 | null |
2024-04-22 | PM-VIS: High-Performance Box-Supervised Video Instance Segmentation | Zhangjing Yang et.al. | 2404.13863 | null |
2024-04-21 | Semantic-Rearrangement-Based Multi-Level Alignment for Domain Generalized Segmentation | Guanlong Jiao et.al. | 2404.13701 | null |
2024-04-21 | PV-S3: Advancing Automatic Photovoltaic Defect Detection using Semi-Supervised Semantic Segmentation of Electroluminescence Images | Abhishek Jha et.al. | 2404.13693 | null |
2024-04-21 | A Complete System for Automated 3D Semantic-Geometric Mapping of Corrosion in Industrial Environments | Rui Pimentel de Figueiredo et.al. | 2404.13691 | null |
2024-04-21 | LMFNet: An Efficient Multimodal Fusion Approach for Semantic Segmentation in High-Resolution Remote Sensing | Tong Wang et.al. | 2404.13659 | null |
2024-04-21 | Towards Unified Representation of Multi-Modal Pre-training for 3D Understanding via Differentiable Rendering | Ben Fei et.al. | 2404.13619 | null |
2024-04-20 | FisheyeDetNet: Object Detection on Fisheye Surround View Camera Systems for Automated Driving | Ganesh Sistu et.al. | 2404.13443 | null |
2024-04-20 | AMMUNet: Multi-Scale Attention Map Merging for Remote Sensing Image Segmentation | Yang Yang et.al. | 2404.13408 | null |
2024-04-19 | Nuclei Instance Segmentation of Cryosectioned H&E Stained Histological Images using Triple U-Net Architecture | Zarif Ahmed et.al. | 2404.12986 | null |
2024-04-19 | FipTR: A Simple yet Effective Transformer Framework for Future Instance Prediction in Autonomous Driving | Xingtai Gui et.al. | 2404.12867 | null |
2024-04-19 | Foundation Model assisted Weakly Supervised LiDAR Semantic Segmentation | Yilong Chen et.al. | 2404.12861 | null |
2024-04-19 | COIN: Counterfactual inpainting for weakly supervised semantic segmentation for medical images | Dmytro Shvetsov et.al. | 2404.12832 | link |
2024-04-19 | A Point-Based Approach to Efficient LiDAR Multi-Task Perception | Christopher Lang et.al. | 2404.12798 | null |
2024-04-19 | Generalized Few-Shot Meets Remote Sensing: Discovering Novel Classes in Land Cover Mapping via Hybrid Semantic Segmentation Framework | Zhuohong Li et.al. | 2404.12721 | link |
2024-04-19 | Improving Prediction Accuracy of Semantic Segmentation Methods Using Convolutional Autoencoder Based Pre-processing Layers | Hisashi Shimodaira et.al. | 2404.12718 | null |
2024-04-19 | Show and Grasp: Few-shot Semantic Segmentation for Robot Grasping through Zero-shot Foundation Models | Leonardo Barcellona et.al. | 2404.12717 | null |
2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440 | null |
2024-04-18 | A Perspective on Deep Vision Performance with Standard Image and Video Codecs | Christoph Reich et.al. | 2404.12330 | null |
2024-04-18 | Performance Evaluation of Segment Anything Model with Variational Prompting for Application to Non-Visible Spectrum Imagery | Yona Falinie A. Gaus et.al. | 2404.12285 | null |
2024-04-18 | Deep Gaussian mixture model for unsupervised image segmentation | Matthias Schwab et.al. | 2404.12252 | null |
2024-04-18 | Observation, Analysis, and Solution: Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training | Jin Gao et.al. | 2404.12210 | link |
2024-04-18 | How to Benchmark Vision Foundation Models for Semantic Segmentation? | Tommie Kerssies et.al. | 2404.12172 | null |
2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144 | link |
2024-04-18 | Tendency-driven Mutual Exclusivity for Weakly Supervised Incremental Semantic Segmentation | Chongjie Si et.al. | 2404.11981 | null |
2024-04-18 | The devil is in the object boundary: towards annotation-free instance segmentation using Foundation Models | Cheng Shi et.al. | 2404.11957 | link |
2024-04-18 | Group-On: Boosting One-Shot Segmentation with Supportive Query | Hanjing Zhou et.al. | 2404.11871 | null |
2024-04-17 | Visual Prompting for Generalized Few-shot Segmentation: A Multi-scale Approach | Mir Rayat Imtiaz Hossain et.al. | 2404.11732 | null |
2024-04-17 | A Semantic Segmentation-guided Approach for Ground-to-Aerial Image Matching | Francesco Pro et.al. | 2404.11302 | link |
2024-04-17 | Learning from Unlabelled Data with Transformers: Domain Adaptation for Semantic Segmentation of High Resolution Aerial Images | Nikolaos Dionelis et.al. | 2404.11299 | link |
2024-04-17 | Criteria for Uncertainty-based Corner Cases Detection in Instance Segmentation | Florian Heidecker et.al. | 2404.11266 | null |
2024-04-16 | A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery | Ellianna Abrahams et.al. | 2404.10927 | link |
2024-04-16 | Vocabulary-free Image Classification and Semantic Segmentation | Alessandro Conti et.al. | 2404.10864 | link |
2024-04-16 | Gasformer: A Transformer-based Architecture for Segmenting Methane Emissions from Livestock in Optical Gas Imaging | Toqi Tahamid Sarker et.al. | 2404.10841 | link |
2024-04-16 | Learning Feature Inversion for Multi-class Anomaly Detection under General-purpose COCO-AD Benchmark | Jiangning Zhang et.al. | 2404.10760 | null |
2024-04-16 | ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation | Iaroslav Melekhov et.al. | 2404.10699 | null |
2024-04-16 | Contextrast: Contextual Contrastive Learning for Semantic Segmentation | Changki Sung et.al. | 2404.10633 | null |
2024-04-16 | Label merge-and-split: A graph-colouring approach for memory-efficient brain parcellation | Aaron Kujawa et.al. | 2404.10572 | null |
2024-04-16 | LAECIPS: Large Vision Model Assisted Adaptive Edge-Cloud Collaboration for IoT-based Perception System | Shijing Hu et.al. | 2404.10498 | null |
2024-04-16 | Adversarial Identity Injection for Semantic Face Image Synthesis | Giuseppe Tarollo et.al. | 2404.10408 | null |
2024-04-16 | Domain-Rectifying Adapter for Cross-Domain Few-Shot Segmentation | Jiapeng Su et.al. | 2404.10322 | null |
2024-04-16 | Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain | Steve Andreas Immanuel et.al. | 2404.10307 | link |
2024-04-15 | NOISe: Nuclei-Aware Osteoclast Instance Segmentation for Mouse-to-Human Domain Transfer | Sai Kumar Reddy Manne et.al. | 2404.10130 | link |
2024-04-15 | Empowering Embodied Visual Tracking with Visual Foundation Models and Offline RL | Fangwei Zhong et.al. | 2404.09857 | null |
2024-04-15 | In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation | Han Xue et.al. | 2404.09633 | null |
2024-04-15 | The revenge of BiSeNet: Efficient Multi-Task Image Segmentation | Gabriele Rosi et.al. | 2404.09570 | null |
2024-04-15 | kNN-CLIP: Retrieval Enables Training-Free Segmentation on Continually Expanding Large Vocabularies | Zhongrui Gui et.al. | 2404.09447 | null |
2024-04-15 | Human-in-the-Loop Segmentation of Multi-species Coral Imagery | Scarlett Raine et.al. | 2404.09406 | null |
2024-04-14 | Bridging Data Islands: Geographic Heterogeneity-Aware Federated Learning for Collaborative Remote Sensing Semantic Segmentation | Jieyi Tan et.al. | 2404.09292 | null |
2024-04-12 | Structured Model Pruning for Efficient Inference in Computational Pathology | Mohammed Adnan et.al. | 2404.08831 | null |
2024-04-12 | COCONut: Modernizing COCO Segmentation | Xueqing Deng et.al. | 2404.08639 | null |
2024-04-12 | Benchmarking the Cell Image Segmentation Models Robustness under the Microscope Optical Aberrations | Boyuan Peng et.al. | 2404.08549 | null |
2024-04-12 | Analyzing Decades-Long Environmental Changes in Namibia Using Archival Aerial Photography and Deep Learning | Girmaw Abebe Tadesse et.al. | 2404.08544 | null |
2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
2024-04-12 | Adapting the Segment Anything Model During Usage in Novel Situations | Robin Schön et.al. | 2404.08421 | null |
2024-04-12 | Let It Flow: Simultaneous Optimization of 3D Flow and Object Clustering | Patrik Vacek et.al. | 2404.08363 | null |
2024-04-12 | AdaContour: Adaptive Contour Descriptor with Hierarchical Representation | Tianyu Ding et.al. | 2404.08292 | null |
2024-04-12 | Tackling Ambiguity from Perspective of Uncertainty Inference and Affinity Diversification for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2404.08195 | link |
2024-04-12 | Pay Attention to Your Neighbours: Training-Free Open-Vocabulary Semantic Segmentation | Sina Hajimiri et.al. | 2404.08181 | link |
2024-04-11 | Exploiting Object-based and Segmentation-based Semantic Features for Deep Learning-based Indoor Scene Classification | Ricardo Pereira et.al. | 2404.07739 | null |
2024-04-11 | OpenTrench3D: A Photogrammetric 3D Point Cloud Dataset for Semantic Segmentation of Underground Utilities | Lasse H. Hansen et.al. | 2404.07711 | link |
2024-04-11 | ViM-UNet: Vision Mamba for Biomedical Segmentation | Anwai Archit et.al. | 2404.07705 | link |
2024-04-11 | Implicit and Explicit Language Guidance for Diffusion-based Visual Perception | Hefeng Wang et.al. | 2404.07600 | null |
2024-04-11 | Improving Shift Invariance in Convolutional Neural Networks with Translation Invariant Polyphase Sampling | Sourajit Saha et.al. | 2404.07410 | null |
2024-04-10 | AI-Guided Defect Detection Techniques to Model Single Crystal Diamond Growth | Rohan Reddy Mekala et.al. | 2404.07306 | null |
2024-04-10 | RESSCAL3D: Resolution Scalable 3D Semantic Segmentation of Point Clouds | Remco Royen et.al. | 2404.06863 | null |
2024-04-10 | O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation | Muer Tie et.al. | 2404.06836 | null |
2024-04-10 | Convolution-based Probability Gradient Loss for Semantic Segmentation | Guohang Shan et.al. | 2404.06704 | null |
2024-04-09 | Training-Free Open-Vocabulary Segmentation with Offline Diffusion-Augmented Prototype Generation | Luca Barsellotti et.al. | 2404.06542 | null |
2024-04-09 | QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding | Yash Mehan et.al. | 2404.06442 | null |
2024-04-09 | DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning | Senthil Yogamani et.al. | 2404.06352 | null |
2024-04-09 | Automated National Urban Map Extraction | Hasan Nasrallah et.al. | 2404.06202 | null |
2024-04-09 | Hierarchical Insights: Exploiting Structural Similarities for Reliable 3D Semantic Segmentation | Mariella Dreissig et.al. | 2404.06124 | null |
2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
2024-04-08 | Evaluating the Efficacy of Cut-and-Paste Data Augmentation in Semantic Segmentation for Satellite Imagery | Ionut M. Motoi et.al. | 2404.05693 | null |
2024-04-08 | AlignZeg: Mitigating Objective Misalignment for Zero-shot Semantic Segmentation | Jiannan Ge et.al. | 2404.05667 | null |
2024-04-08 | Impact of LiDAR visualisations on semantic segmentation of archaeological objects | Raveerat Jaturapitpornchai et.al. | 2404.05512 | null |
2024-04-08 | Rethinking the Spatial Inconsistency in Classifier-Free Diffusion Guidance | Dazhong Shen et.al. | 2404.05384 | link |
2024-04-08 | GPS-free Autonomous Navigation in Cluttered Tree Rows with Deep Semantic Segmentation | Alessandro Navone et.al. | 2404.05338 | null |
2024-04-08 | Human Detection from 4D Radar Data in Low-Visibility Field Conditions | Mikael Skog et.al. | 2404.05307 | null |
2024-04-08 | iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection | Nan Zhou et.al. | 2404.05207 | null |
2024-04-08 | UniMix: Towards Domain Adaptive and Generalizable LiDAR Semantic Segmentation in Adverse Weather | Haimei Zhao et.al. | 2404.05145 | null |
2024-04-07 | D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation | Xuan Sun et.al. | 2404.04807 | null |
2024-04-06 | HawkDrive: A Transformer-driven Visual Perception System for Autonomous Driving in Night Scene | Ziang Guo et.al. | 2404.04653 | link |
2024-04-05 | Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation | Zifu Wan et.al. | 2404.04256 | null |
2024-04-05 | Image-Text Co-Decomposition for Text-Supervised Semantic Segmentation | Ji-Jia Wu et.al. | 2404.04231 | null |
2024-04-05 | MarsSeg: Mars Surface Semantic Segmentation with Multi-level Extractor and Connector | Junbo Li et.al. | 2404.04155 | null |
2024-04-04 | Language-Guided Instance-Aware Domain-Adaptive Panoptic Segmentation | Elham Amin Mansour et.al. | 2404.03799 | null |
2024-04-04 | Flattening the Parent Bias: Hierarchical Semantic Segmentation in the Poincaré Ball | Simon Weber et.al. | 2404.03778 | null |
2024-04-04 | OW-VISCap: Open-World Video Instance Segmentation and Captioning | Anwesa Choudhuri et.al. | 2404.03657 | null |
2024-04-04 | Background Noise Reduction of Attention Map for Weakly Supervised Semantic Segmentation | Izumi Fujimori et.al. | 2404.03394 | null |
2024-04-04 | iSeg: Interactive 3D Segmentation via Interactive Attention | Itai Lang et.al. | 2404.03219 | null |
2024-04-04 | CORP: A Multi-Modal Dataset for Campus-Oriented Roadside Perception Tasks | Beibei Wang et.al. | 2404.03191 | null |
2024-04-03 | GPU-Accelerated RSF Level Set Evolution for Large-Scale Microvascular Segmentation | Meher Niger et.al. | 2404.02813 | null |
2024-04-03 | RS-Mamba for Large Remote Sensing Image Dense Prediction | Sijie Zhao et.al. | 2404.02668 | link |
2024-04-03 | A Satellite Band Selection Framework for Amazon Forest Deforestation Detection Task | Eduardo Neto et.al. | 2404.02659 | null |
2024-04-03 | SG-BEV: Satellite-Guided BEV Fusion for Cross-View Semantic Segmentation | Junyan Ye et.al. | 2404.02638 | link |
2024-04-03 | Active learning for efficient annotation in precision agriculture: a use-case on crop-weed semantic segmentation | Bart M. van Marrewijk et.al. | 2404.02580 | null |
2024-04-03 | HENet: Hybrid Encoding for End-to-end Multi-task 3D Perception from Multi-view Cameras | Zhongyu Xia et.al. | 2404.02517 | link |
2024-04-03 | Optimizing traffic signs and lights visibility for the teleoperation of autonomous vehicles through ROI compression | I. Dror et.al. | 2404.02481 | null |
2024-04-03 | RS3Mamba: Visual State Space Model for Remote Sensing Images Semantic Segmentation | Xianping Ma et.al. | 2404.02457 | link |
2024-04-02 | Constrained Robotic Navigation on Preferred Terrains Using LLMs and Speech Instruction: Exploiting the Power of Adverbs | Faraz Lotfi et.al. | 2404.02294 | null |
2024-04-02 | Segment Any 3D Object with Language | Seungjun Lee et.al. | 2404.02157 | null |
2024-04-02 | Multi-Level Label Correction by Distilling Proximate Patterns for Semi-supervised Semantic Segmentation | Hui Xiao et.al. | 2404.02065 | null |
2024-04-01 | What is Point Supervision Worth in Video Instance Segmentation? | Shuaiyi Huang et.al. | 2404.01990 | null |
2024-04-02 | Synthetic Data for Robust Stroke Segmentation | Liam Chalcroft et.al. | 2404.01946 | link |
2024-04-02 | Improving Bird's Eye View Semantic Segmentation by Task Decomposition | Tianhao Zhao et.al. | 2404.01925 | null |
2024-04-02 | Rethinking Annotator Simulation: Realistic Evaluation of Whole-Body PET Lesion Interactive Segmentation Methods | Zdravko Marinov et.al. | 2404.01816 | null |
2024-04-02 | Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model | Qinfeng Zhu et.al. | 2404.01705 | null |
2024-04-02 | Beyond Image Super-Resolution for Image Recognition with Task-Driven Perceptual Loss | Jaeha Kim et.al. | 2404.01692 | null |
2024-04-02 | JRDB-PanoTrack: An Open-world Panoptic Segmentation and Tracking Robotic Dataset in Crowded Human Environments | Duy-Tho Le et.al. | 2404.01686 | null |
2024-04-01 | SUGAR: Pre-training 3D Visual Representations for Robotics | Shizhe Chen et.al. | 2404.01491 | null |
2024-03-29 | ECLIPSE: Efficient Continual Learning in Panoptic Segmentation with Visual Prompt Tuning | Beomyoung Kim et.al. | 2403.20126 | link |
2024-03-29 | Modeling Weather Uncertainty for Multi-weather Co-Presence Estimation | Qi Bi et.al. | 2403.20092 | null |
2024-03-29 | Using Images as Covariates: Measuring Curb Appeal with Deep Learning | Ardyn Nordstrom et.al. | 2403.19915 | null |
2024-03-29 | MambaMixer: Efficient Selective State Space Models with Dual Token and Channel Selection | Ali Behrouz et.al. | 2403.19888 | null |
2024-03-28 | Segmentation Re-thinking Uncertainty Estimation Metrics for Semantic Segmentation | Qitian Ma et.al. | 2403.19826 | null |
2024-04-01 | Efficient 3D Instance Mapping and Localization with Neural Fields | George Tang et.al. | 2403.19797 | null |
2024-03-28 | ENet-21: An Optimized light CNN Structure for Lane Detection | Seyed Rasoul Hosseini et.al. | 2403.19782 | null |
2024-03-29 | Genetic Quantization-Aware Approximation for Non-Linear Operations in Transformers | Pingcheng Dong et.al. | 2403.19591 | link |
2024-03-28 | DenseNets Reloaded: Paradigm Shift Beyond ResNets and ViTs | Donghyun Kim et.al. | 2403.19588 | link |
2024-03-28 | Learning Multiple Representations with Inconsistency-Guided Detail Regularization for Mask-Guided Matting | Weihao Jiang et.al. | 2403.19213 | null |
2024-03-27 | Lift3D: Zero-Shot Lifting of Any 2D Vision Model to 3D | Mukund Varma T et.al. | 2403.18922 | null |
2024-03-27 | Annolid: Annotate, Segment, and Track Anything You Need | Chen Yang et.al. | 2403.18690 | null |
2024-03-27 | I2CKD : Intra- and Inter-Class Knowledge Distillation for Semantic Segmentation | Ayoub Karine et.al. | 2403.18490 | null |
2024-03-28 | ViTAR: Vision Transformer with Any Resolution | Qihang Fan et.al. | 2403.18361 | null |
2024-03-27 | Generating Diverse Agricultural Data for Vision-Based Farming Applications | Mikolaj Cieslak et.al. | 2403.18351 | null |
2024-03-27 | Road Obstacle Detection based on Unknown Objectness Scores | Chihiro Noguchi et.al. | 2403.18207 | null |
2024-03-26 | Spectral Convolutional Transformer: Harmonizing Real vs. Complex Multi-View Spectral Operators for Vision Transformer | Badri N. Patro et.al. | 2403.18063 | link |
2024-03-26 | The Need for Speed: Pruning Transformers with One Recipe | Samir Khaki et.al. | 2403.17921 | link |
2024-03-26 | Compressed Multi-task embeddings for Data-Efficient Downstream training and inference in Earth Observation | Carlos Gomes et.al. | 2403.17886 | null |
2024-03-26 | PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition | Chenhongyi Yang et.al. | 2403.17695 | link |
2024-03-26 | Integrating Mamba Sequence Model and Hierarchical Upsampling Network for Accurate Semantic Segmentation of Multiple Sclerosis Legion | Kazi Shahriar Sanjid et.al. | 2403.17432 | null |
2024-03-25 | Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions | Ye Li et.al. | 2403.17009 | link |
2024-03-25 | DreamLIP: Language-Image Pre-training with Long Captions | Kecheng Zheng et.al. | 2403.17007 | null |
2024-03-25 | TwinLiteNetPlus: A Stronger Model for Real-time Drivable Area and Lane Segmentation | Quang-Huy Che et.al. | 2403.16958 | null |
2024-03-25 | HPL-ESS: Hybrid Pseudo-Labeling for Unsupervised Event-based Semantic Segmentation | Linglin Jing et.al. | 2403.16788 | null |
2024-03-25 | Clustering Propagation for Universal Medical Image Segmentation | Yuhang Ding et.al. | 2403.16646 | null |
2024-03-25 | SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation | Aysim Toker et.al. | 2403.16605 | null |
2024-03-25 | Self-Supervised Learning for Medical Image Data with Anatomy-Oriented Imaging Planes | Tianwei Zhang et.al. | 2403.16499 | null |
2024-03-25 | GoodSAM: Bridging Domain and Capacity Gaps via Segment Anything Model for Distortion-aware Panoramic Semantic Segmentation | Weiming Zhang et.al. | 2403.16370 | null |
2024-03-24 | AutoInst: Automatic Instance-Based Segmentation of LiDAR 3D Scans | Cedric Perauer et.al. | 2403.16318 | null |
2024-03-24 | Dual-modal Prior Semantic Guided Infrared and Visible Image Fusion for Intelligent Transportation System | Jing Li et.al. | 2403.16227 | null |
2024-03-24 | Segment Anything Model for Road Network Graph Extraction | Congrui Hetang et.al. | 2403.16051 | link |
2024-03-24 | SM2C: Boost the Semi-supervised Segmentation for Medical Image by using Meta Pseudo Labels and Mixed Images | Yifei Wang et.al. | 2403.16009 | null |
2024-03-22 | Semantic Gaussians: Open-Vocabulary Scene Understanding with 3D Gaussian Splatting | Jun Guo et.al. | 2403.15624 | null |
2024-03-22 | A2DMN: Anatomy-Aware Dilated Multiscale Network for Breast Ultrasound Semantic Segmentation | Kyle Lucke et.al. | 2403.15560 | null |
2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | null |
2024-03-22 | Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations | Pranav Kulkarni et.al. | 2403.15218 | null |
2024-03-22 | Your Image is My Video: Reshaping the Receptive Field via Image-To-Video Differentiable AutoAugmentation and Fusion | Sofia Casarin et.al. | 2403.15194 | null |
2024-03-22 | IFSENet : Harnessing Sparse Iterations for Interactive Few-shot Segmentation Excellence | Shreyas Chandgothia et.al. | 2403.15089 | null |
2024-03-22 | Towards a Comprehensive, Efficient and Promptable Anatomic Structure Segmentation Model using 3D Whole-body CT Scans | Heng Guo et.al. | 2403.15063 | null |
2024-03-22 | BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation | Jiahao Lu et.al. | 2403.15019 | null |
2024-03-22 | Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation | Wenlve Zhou et.al. | 2403.14995 | null |
2024-03-21 | WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather | Blake Gella et.al. | 2403.14874 | null |
2024-03-21 | PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model | Zheng Zhang et.al. | 2403.14598 | link |
2024-03-21 | Learning to Project for Cross-Task Knowledge Distillation | Dylan Auty et.al. | 2403.14494 | null |
2024-03-21 | OA-CNNs: Omni-Adaptive Sparse CNNs for 3D Semantic Segmentation | Bohao Peng et.al. | 2403.14418 | link |
2024-03-21 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | Pablo Marcos-Manchón et.al. | 2403.14291 | link |
2024-03-21 | OTSeg: Multi-prompt Sinkhorn Attention for Zero-Shot Semantic Segmentation | Kwanyoung Kim et.al. | 2403.14183 | null |
2024-03-21 | Evidential Semantic Mapping in Off-road Environments with Uncertainty-aware Bayesian Kernel Inference | Junyoung Kim et.al. | 2403.14138 | null |
2024-03-21 | Soft Masked Transformer for Point Cloud Processing with Skip Attention-Based Upsampling | Yong He et.al. | 2403.14124 | null |
2024-03-21 | Semantics from Space: Satellite-Guided Thermal Semantic Segmentation Annotation for Aerial Field Robots | Connor Lee et.al. | 2403.14056 | null |
2024-03-20 | When Cars meet Drones: Hyperbolic Federated Learning for Source-Free Domain Adaptation in Adverse Weather | Giulia Rizzoli et.al. | 2403.13762 | null |
2024-03-20 | Next day fire prediction via semantic segmentation | Konstantinos Alexis et.al. | 2403.13545 | null |
2024-03-20 | MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining | Di Wang et.al. | 2403.13430 | link |
2024-03-20 | AMCO: Adaptive Multimodal Coupling of Vision and Proprioception for Quadruped Robot Navigation in Outdoor Environments | Mohamed Elnoor et.al. | 2403.13235 | null |
2024-03-20 | Modeling the Label Distributions for Weakly-Supervised Semantic Segmentation | Linshan Wu et.al. | 2403.13225 | null |
2024-03-19 | Reflectivity Is All You Need!: Advancing LiDAR Semantic Segmentation | Kasi Viswanath et.al. | 2403.13188 | null |
2024-03-19 | As Firm As Their Foundations: Can open-sourced foundation models be used to create adversarial examples for downstream tasks? | Anjun Hu et.al. | 2403.12693 | null |
2024-03-19 | PCT: Perspective Cue Training Framework for Multi-Camera BEV Segmentation | Haruya Ishikawa et.al. | 2403.12530 | null |
2024-03-19 | Semantics, Distortion, and Style Matter: Towards Source-free UDA for Panoramic Segmentation | Xu Zheng et.al. | 2403.12505 | null |
2024-03-19 | CLIP-VIS: Adapting CLIP for Open-Vocabulary Video Instance Segmentation | Wenqi Zhu et.al. | 2403.12455 | link |
2024-03-19 | Multi-Object RANSAC: Efficient Plane Clustering Method in a Clutter | Seunghyeon Lim et.al. | 2403.12449 | null |
2024-03-18 | EffiPerception: an Efficient Framework for Various Perception Tasks | Xinhao Xiang et.al. | 2403.12317 | null |
2024-03-18 | Aerial Lifting: Neural Urban Semantic and Building Instance Lifting from Aerial Imagery | Yuqi Zhang et.al. | 2403.11812 | null |
2024-03-18 | Dynamic Tuning Towards Parameter and Inference Efficiency for ViT Adaptation | Wangbo Zhao et.al. | 2403.11808 | null |
2024-03-18 | LSKNet: A Foundation Lightweight Backbone for Remote Sensing | Yuxuan Li et.al. | 2403.11735 | null |
2024-03-18 | TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models | Lisa Weijler et.al. | 2403.11691 | null |
2024-03-18 | Better (pseudo-)labels for semi-supervised instance segmentation | François Porcher et.al. | 2403.11675 | null |
2024-03-18 | Synthesizing multi-log grasp poses | Arvid Fälldin et.al. | 2403.11623 | null |
2024-03-18 | OurDB: Ouroboric Domain Bridging for Multi-Target Domain Adaptive Semantic Segmentation | Seungbeom Woo et.al. | 2403.11582 | null |
2024-03-18 | MISS: Memory-efficient Instance Segmentation Framework By Visual Inductive Priors Flow Propagation | Chih-Chung Hsu et.al. | 2403.11576 | null |
2024-03-18 | Augment Before Copy-Paste: Data and Memory Efficiency-Oriented Instance Segmentation Framework for Sport-scenes | Chih-Chung Hsu et.al. | 2403.11572 | null |
2024-03-18 | Circle Representation for Medical Instance Object Segmentation | Juming Xiong et.al. | 2403.11507 | link |
2024-03-18 | MCD: Diverse Large-Scale Multi-Campus Dataset for Robot Perception | Thien-Minh Nguyen et.al. | 2403.11496 | null |
2024-03-18 | Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting | Mingkui Tan et.al. | 2403.11491 | null |
2024-03-18 | ShapeFormer: Shape Prior Visible-to-Amodal Transformer-based Amodal Instance Segmentation | Minh Tran et.al. | 2403.11376 | null |
2024-03-14 | PosSAM: Panoptic Open-vocabulary Segment Anything | Vibashan VS et.al. | 2403.09620 | null |
2024-03-14 | WeakSurg: Weakly supervised surgical instrument segmentation using temporal equivariance and semantic continuity | Qiyuan Wang et.al. | 2403.09551 | null |
2024-03-14 | Annotation Free Semantic Segmentation with Vision Foundation Models | Soroush Seifi et.al. | 2403.09307 | null |
2024-03-14 | StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images | Robert Jewsbury et.al. | 2403.09302 | link |
2024-03-14 | Customizing Segmentation Foundation Model via Prompt Learning for Instance Segmentation | Hyung-Il Kim et.al. | 2403.09199 | null |
2024-03-14 | When Semantic Segmentation Meets Frequency Aliasing | Linwei Chen et.al. | 2403.09065 | link |
2024-03-13 | CART: Caltech Aerial RGB-Thermal Dataset in the Wild | Connor Lee et.al. | 2403.08997 | link |
2024-03-13 | SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net | Helin Cao et.al. | 2403.08885 | null |
2024-03-13 | Segmentation of Knee Bones for Osteoarthritis Assessment: A Comparative Analysis of Supervised, Few-Shot, and Zero-Shot Learning Approaches | Yun Xin Teoh et.al. | 2403.08761 | null |
2024-03-13 | Real-time 3D semantic occupancy prediction for autonomous vehicles using memory-efficient sparse convolution | Samuel Sze et.al. | 2403.08748 | null |
2024-03-13 | Semantic Segmentation of Solar Radio Spikes at Low Frequencies | Pearse C. Murphy et.al. | 2403.08546 | null |
2024-03-13 | Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation | Zicheng Zhang et.al. | 2403.08426 | null |
2024-03-13 | LIX: Implicitly Infusing Spatial Geometric Prior Knowledge into Visual Semantic Segmentation for Autonomous Driving | Sicen Guo et.al. | 2403.08215 | null |
2024-03-13 | Multiscale Low-Frequency Memory Network for Improved Feature Extraction in Convolutional Neural Networks | Fuzhi Wu et.al. | 2403.08157 | link |
2024-03-12 | Mitigating the Impact of Attribute Editing on Face Recognition | Sudipta Banerjee et.al. | 2403.08092 | null |
2024-03-12 | Hunting Attributes: Context Prototype-Aware Learning for Weakly Supervised Semantic Segmentation | Feilong Tang et.al. | 2403.07630 | link |
2024-03-12 | PeLK: Parameter-efficient Large Kernel ConvNets with Peripheral Convolution | Honghao Chen et.al. | 2403.07589 | null |
2024-03-12 | Open-World Semantic Segmentation Including Class Similarity | Matteo Sodano et.al. | 2403.07532 | null |
2024-03-11 | Average Calibration Error: A Differentiable Loss for Improved Reliability in Image Segmentation | Theodore Barfoot et.al. | 2403.06759 | link |
2024-03-11 | Forest Inspection Dataset for Aerial Semantic Segmentation and Depth Estimation | Bianca-Cerasela-Zelia Blaga et.al. | 2403.06621 | link |
2024-03-11 | OMH: Structured Sparsity via Optimally Matched Hierarchy for Unsupervised Semantic Segmentation | Baran Ozaydin et.al. | 2403.06546 | null |
2024-03-11 | 3D Semantic Segmentation-Driven Representations for 3D Object Detection | Hayeon O et.al. | 2403.06501 | link |
2024-03-11 | Point Mamba: A Novel Point Cloud Backbone Based on State Space Model with Octree-Based Ordering Strategy | Jiuming Liu et.al. | 2403.06467 | link |
2024-03-11 | Towards the Uncharted: Density-Descending Feature Perturbation for Semi-supervised Semantic Segmentation | Xiaoyang Wang et.al. | 2403.06462 | null |
2024-03-11 | Refining Segmentation On-the-Fly: An Interactive Framework for Point Cloud Semantic Segmentation | Peng Zhang et.al. | 2403.06401 | null |
2024-03-10 | Style Blind Domain Generalized Semantic Segmentation via Covariance Alignment and Semantic Consistence Contrastive Learning | Woo-Jin Ahn et.al. | 2403.06122 | link |
2024-03-09 | Mask-Enhanced Segment Anything Model for Tumor Lesion Semantic Segmentation | Hairong Shi et.al. | 2403.05912 | null |
2024-03-09 | Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration | Jingyun Xue et.al. | 2403.05906 | null |
2024-03-08 | Attention-guided Feature Distillation for Semantic Segmentation | Amir M. Mansourian et.al. | 2403.05451 | link |
2024-03-08 | Generalized Correspondence Matching via Flexible Hierarchical Refinement and Patch Descriptor Distillation | Yu Han et.al. | 2403.05388 | null |
2024-03-08 | Frequency-Adaptive Dilated Convolution for Semantic Segmentation | Linwei Chen et.al. | 2403.05369 | link |
2024-03-08 | Embedded Deployment of Semantic Segmentation in Medicine through Low-Resolution Inputs | Erik Ostrowski et.al. | 2403.05340 | null |
2024-03-08 | LVIC: Multi-modality segmentation by Lifting Visual Info as Cue | Zichao Dong et.al. | 2403.05159 | null |
2024-03-07 | SAM-PD: How Far Can SAM Take Us in Tracking and Segmenting Anything in Videos by Prompt Denoising | Tao Zhou et.al. | 2403.04194 | link |
2024-03-06 | ECAP: Extensive Cut-and-Paste Augmentation for Unsupervised Domain Adaptive Semantic Segmentation | Erik Brorsson et.al. | 2403.03854 | link |
2024-03-06 | Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision | Yajie Liu et.al. | 2403.03707 | null |
2024-03-06 | Causal Prototype-inspired Contrast Adaptation for Unsupervised Domain Adaptive Semantic Segmentation of High-resolution Remote Sensing Imagery | Jingru Zhu et.al. | 2403.03704 | null |
2024-03-06 | GSNeRF: Generalizable Semantic Neural Radiance Fields with Enhanced 3D Scene Understanding | Zi-Ting Chou et.al. | 2403.03608 | null |
2024-03-06 | Multi-task Learning for Real-time Autonomous Driving Leveraging Task-adaptive Attention Generator | Wonhyeok Choi et.al. | 2403.03468 | null |
2024-03-05 | CenterDisks: Real-time instance segmentation with disk covering | Katia Jodogne-Del Litto et.al. | 2403.03296 | link |
2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
2024-03-05 | ActiveAD: Planning-Oriented Active Learning for End-to-End Autonomous Driving | Han Lu et.al. | 2403.02877 | null |
2024-03-05 | DDF: A Novel Dual-Domain Image Fusion Strategy for Remote Sensing Image Semantic Segmentation with Unsupervised Domain Adaptation | Lingyan Ran et.al. | 2403.02784 | null |
2024-03-05 | Learning without Exact Guidance: Updating Large-scale High-resolution Land Cover Maps from Low-resolution Historical Labels | Zhuohong Li et.al. | 2403.02746 | null |
2024-03-05 | FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View | Jiawei Hou et.al. | 2403.02710 | null |
2024-03-05 | Deep Common Feature Mining for Efficient Video Semantic Segmentation | Yaoyan Zheng et.al. | 2403.02689 | null |
2024-03-04 | Self-Supervised Facial Representation Learning with Facial Region Awareness | Zheng Gao et.al. | 2403.02138 | null |
2024-03-04 | Semi-Supervised Semantic Segmentation Based on Pseudo-Labels: A Survey | Lingyan Ran et.al. | 2403.01909 | null |
2024-03-04 | Map-aided annotation for pole base detection | Benjamin Missaoui et.al. | 2403.01868 | null |
2024-03-04 | AllSpark: Reborn Labeled Features from Unlabeled in Transformer for Semi-Supervised Semantic Segmentation | Haonan Wang et.al. | 2403.01818 | link |
2024-03-02 | Benchmarking Segmentation Models with Mask-Preserved Attribute Editing | Zijin Yin et.al. | 2403.01231 | link |
2024-03-02 | Boosting Box-supervised Instance Segmentation with Pseudo Depth | Xinyi Yu et.al. | 2403.01214 | null |
2024-03-02 | Auxiliary Tasks Enhanced Dual-affinity Learning for Weakly Supervised Semantic Segmentation | Lian Xu et.al. | 2403.01156 | null |
2024-03-01 | Rethinking Few-shot 3D Point Cloud Semantic Segmentation | Zhaochong An et.al. | 2403.00592 | link |
2024-03-01 | Small, Versatile and Mighty: A Range-View Perception Framework | Qiang Meng et.al. | 2403.00325 | null |
2024-03-01 | YOLO-MED : Multi-Task Interaction Network for Biomedical Images | Suizhi Huang et.al. | 2403.00245 | null |
2024-02-29 | FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything | Safouane El Ghazouali et.al. | 2403.00175 | link |
2024-02-29 | Leveraging AI Predicted and Expert Revised Annotations in Interactive Segmentation: Continual Tuning or Full Training? | Tiezheng Zhang et.al. | 2402.19423 | null |
2024-03-01 | PEM: Prototype-based Efficient MaskFormer for Image Segmentation | Niccolò Cavagnero et.al. | 2402.19422 | link |
2024-02-29 | RSAM-Seg: A SAM-based Approach with Prior Knowledge Integration for Remote Sensing Image Semantic Segmentation | Jie Zhang et.al. | 2402.19004 | null |
2024-02-28 | Spatial Coherence Loss for Salient and Camouflaged Object Detection and Beyond | Ziyun Yang et.al. | 2402.18698 | null |
2024-02-29 | Separate and Conquer: Decoupling Co-occurrence via Decomposition and Representation for Weakly Supervised Semantic Segmentation | Zhiwei Yang et.al. | 2402.18467 | link |
2024-02-29 | A Modular System for Enhanced Robustness of Multimedia Understanding Networks via Deep Parametric Estimation | Francesco Barbato et.al. | 2402.18402 | null |
2024-02-28 | Enhancing Roadway Safety: LiDAR-based Tree Clearance Analysis | Miriam Louise Carnot et.al. | 2402.18309 | null |
2024-02-28 | Feature Denoising For Low-Light Instance Segmentation Using Weighted Non-Local Blocks | Joanne Lin et.al. | 2402.18307 | null |
2024-02-28 | Self-Supervised Learning in Electron Microscopy: Towards a Foundation Model for Advanced Image Analysis | Bashir Kazimi et.al. | 2402.18286 | null |
2024-02-28 | PRCL: Probabilistic Representation Contrastive Learning for Semi-Supervised Semantic Segmentation | Haoyu Xie et.al. | 2402.18117 | null |
2024-02-28 | Spannotation: Enhancing Semantic Segmentation for Autonomous Navigation with Efficient Image Annotation | Samuel O. Folorunsho et.al. | 2402.18084 | link |
2024-02-27 | Weakly Supervised Co-training with Swapping Assignments for Semantic Segmentation | Xinyu Yang et.al. | 2402.17891 | link |
2024-02-27 | Mitigating Distributional Shift in Semantic Segmentation via Uncertainty Estimation from Unlabelled Data | David S. W. Williams et.al. | 2402.17653 | null |
2024-02-27 | Masked Gamma-SSL: Learning Uncertainty Estimation via Masked Image Modeling | David S. W. Williams et.al. | 2402.17622 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | WebUOT-1M: Advancing Deep Underwater Object Tracking with A Million-Scale Benchmark | Chunhui Zhang et.al. | 2405.19818 | null |
2024-05-30 | FaceLift: Semi-supervised 3D Facial Landmark Localization | David Ferman et.al. | 2405.19646 | null |
2024-05-29 | DGD: Dynamic 3D Gaussians Distillation | Isaac Labe et.al. | 2405.19321 | null |
2024-05-28 | Track Initialization and Re-Identification for~3D Multi-View Multi-Object Tracking | Linh Van Ma et.al. | 2405.18606 | link |
2024-05-28 | Reliable Object Tracking by Multimodal Hybrid Feature Extraction and Transformer-Based Fusion | Hongze Sun et.al. | 2405.17903 | null |
2024-05-28 | Towards a Generalist and Blind RGB-X Tracker | Yuedong Tan et.al. | 2405.17773 | null |
2024-05-29 | BaboonLand Dataset: Tracking Primates in the Wild and Automating Behaviour Recognition from Drone Videos | Isla Duporge et.al. | 2405.17698 | null |
2024-05-27 | Tracking Small Birds by Detection Candidate Region Filtering and Detection History-aware Association | Tingwei Liu et.al. | 2405.17323 | null |
2024-05-24 | ETTrack: Enhanced Temporal Motion Predictor for Multi-Object Tracking | Xudong Han et.al. | 2405.15755 | null |
2024-05-24 | Trackastra: Transformer-based cell tracking for live-cell microscopy | Benjamin Gallusser et.al. | 2405.15700 | link |
2024-05-24 | An Approximate Dynamic Programming Framework for Occlusion-Robust Multi-Object Tracking | Pratyusha Musunuru et.al. | 2405.15137 | null |
2024-05-23 | Awesome Multi-modal Object Tracking | Chunhui Zhang et.al. | 2405.14200 | null |
2024-05-23 | Enhanced Object Tracking by Self-Supervised Auxiliary Depth Estimation Learning | Zhenyu Wei et.al. | 2405.14195 | null |
2024-05-23 | PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking | Chongwei Liu et.al. | 2405.14119 | null |
2024-05-22 | Multi Player Tracking in Ice Hockey with Homographic Projections | Harish Prakash et.al. | 2405.13397 | null |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-19 | Track Anything Rapter(TAR) | Tharun V. Puthanveettil et.al. | 2405.11655 | link |
2024-05-19 | RobMOT: Robust 3D Multi-Object Tracking by Observational Noise and State Estimation Drift Mitigation on LiDAR PointCloud | Mohamed Nagy et.al. | 2405.11536 | null |
2024-05-18 | City-Scale Multi-Camera Vehicle Tracking System with Improved Self-Supervised Camera Link Model | Yuqiang Lin et.al. | 2405.11345 | null |
2024-05-17 | Air Signing and Privacy-Preserving Signature Verification for Digital Documents | P. Sarveswarasarma et.al. | 2405.10868 | null |
2024-05-16 | A Novel Bounding Box Regression Method for Single Object Tracking | Omar Abdelaziz et.al. | 2405.10444 | null |
2024-05-16 | Beyond Traditional Single Object Tracking: A Survey | Omar Abdelaziz et.al. | 2405.10439 | null |
2024-05-16 | Spatial Cognition: a Wave Hypothesis | Robert Worden et.al. | 2405.10112 | null |
2024-05-14 | Learning Correspondence for Deformable Objects | Priya Sundaresan et.al. | 2405.08996 | null |
2024-05-14 | ADA-Track: End-to-End Multi-Camera 3D Multi-Object Tracking with Alternating Detection and Association | Shuxiao Ding et.al. | 2405.08909 | link |
2024-05-12 | MAML MOT: Multiple Object Tracking based on Meta-Learning | Jiayi Chen et.al. | 2405.07272 | null |
2024-05-16 | Common Corruptions for Enhancing and Evaluating Robustness in Air-to-Air Visual Object Detection | Anastasios Arsenos et.al. | 2405.06765 | null |
2024-05-16 | Ensuring UAV Safety: A Vision-only and Real-time Framework for Collision Avoidance Through Object Detection, Tracking, and Distance Estimation | Vasileios Karampinis et.al. | 2405.06749 | null |
2024-05-10 | Multi-Object Tracking in the Dark | Xinzhe Wang et.al. | 2405.06600 | link |
2024-05-09 | Outlier-robust Kalman Filtering through Generalised Bayes | Gerardo Duran-Martin et.al. | 2405.05646 | link |
2024-05-08 | MOTLEE: Collaborative Multi-Object Tracking Using Temporal Consistency for Neighboring Robot Frame Alignment | Mason B. Peterson et.al. | 2405.05210 | link |
2024-05-08 | TENet: Targetness Entanglement Incorporating with Multi-Scale Pooling and Mutually-Guided Fusion for RGB-E Object Tracking | Pengcheng Shao et.al. | 2405.05004 | link |
2024-05-07 | DriveWorld: 4D Pre-trained Scene Understanding via World Models for Autonomous Driving | Chen Min et.al. | 2405.04390 | null |
2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290 | null |
2024-05-06 | Collecting Consistently High Quality Object Tracks with Minimal Human Involvement by Using Self-Supervised Learning to Detect Tracker Errors | Samreen Anjum et.al. | 2405.03643 | null |
2024-05-03 | Learning Robot Soccer from Egocentric Vision with Deep Reinforcement Learning | Dhruva Tirumala et.al. | 2405.02425 | null |
2024-05-03 | DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos | Wen-Hsuan Chu et.al. | 2405.02280 | link |
2024-05-02 | Tracking and classifying objects with DAS data along railway | Simon L. B. Fredriksen et.al. | 2405.01140 | null |
2024-04-29 | Innovative Integration of Visual Foundation Model with a Robotic Arm on a Mobile Platform | Shimian Zhang et.al. | 2404.18720 | null |
2024-04-27 | 3D Extended Object Tracking by Fusing Roadside Sparse Radar Point Clouds and Pixel Keypoints | Jiayin Deng et.al. | 2404.17903 | link |
2024-04-22 | 360VOTS: Visual Object Tracking and Segmentation in Omnidirectional Videos | Yinzhe Xu et.al. | 2404.13953 | null |
2024-04-22 | TeamTrack: A Dataset for Multi-Sport Multi-Object Tracking in Full-pitch Videos | Atom Scott et.al. | 2404.13868 | null |
2024-04-19 | A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics | David Rapado-Rincon et.al. | 2404.12963 | null |
2024-04-18 | Inverse Neural Rendering for Explainable Multi-Object Tracking | Julian Ost et.al. | 2404.12359 | null |
2024-04-24 | On Target Detection in the Presence of Clutter in Joint Communication and Sensing Cellular Networks | Julia Vinogradova et.al. | 2404.12133 | null |
2024-04-18 | MLS-Track: Multilevel Semantic Interaction in RMOT | Zeliang Ma et.al. | 2404.12031 | null |
2024-04-18 | KnotResolver: Tracking self-intersecting filaments in microscopy using directed graphs | Dhruv Khatri et.al. | 2404.12029 | link |
2024-04-17 | How to deal with glare for improved perception of Autonomous Vehicles | Muhammad Z. Alam et.al. | 2404.10992 | null |
2024-04-12 | Into the Fog: Evaluating Multiple Object Tracking Robustness | Nadezda Kirillova et.al. | 2404.10534 | link |
2024-04-15 | 3D Face Tracking from 2D Video through Iterative Dense UV to Image Flow | Felix Taubner et.al. | 2404.09819 | null |
2024-04-12 | IDD-X: A Multi-View Dataset for Ego-relative Important Object Localization and Explanation in Dense and Unstructured Traffic | Chirag Parikh et.al. | 2404.08561 | null |
2024-04-11 | Gaga: Group Any Gaussians via 3D-aware Memory Bank | Weijie Lyu et.al. | 2404.07977 | null |
2024-04-11 | SFSORT: Scene Features-based Simple Online Real-Time Tracker | M. M. Morsali et.al. | 2404.07553 | link |
2024-04-11 | PillarTrack: Redesigning Pillar-based Transformer Network for Single Object Tracking on Point Clouds | Weisheng Xu et.al. | 2404.07495 | link |
2024-04-11 | Trashbusters: Deep Learning Approach for Litter Detection and Tracking | Kashish Jain et.al. | 2404.07467 | null |
2024-04-09 | LRR: Language-Driven Resamplable Continuous Representation against Adversarial Tracking Attacks | Jianlang Chen et.al. | 2404.06247 | link |
2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518 | link |
2024-04-08 | Self-Supervised Multi-Object Tracking with Path Consistency | Zijia Lu et.al. | 2404.05136 | link |
2024-04-07 | Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind | Chiara Plizzari et.al. | 2404.05072 | null |
2024-04-03 | Ego-Motion Aware Target Prediction Module for Robust Multi-Object Tracking | Navid Mahdian et.al. | 2404.03110 | link |
2024-04-03 | Representation Alignment Contrastive Regularization for Multi-Object Tracking | Shujie Chen et.al. | 2404.02562 | link |
2024-03-29 | Bayesian Nonparametrics: An Alternative to Deep Learning | Bahman Moraffah et.al. | 2404.00085 | null |
2024-03-29 | MTMMC: A Large-Scale Real-World Multi-Modal Camera Tracking Benchmark | Sanghyun Woo et.al. | 2403.20225 | null |
2024-03-29 | SceneTracker: Long-term Scene Flow Estimation Network | Bo Wang et.al. | 2403.19924 | null |
2024-03-27 | Enhancing Multiple Object Tracking Accuracy via Quantum Annealing | Yasuyuki Ihara et.al. | 2403.18908 | null |
2024-03-27 | TAFormer: A Unified Target-Aware Transformer for Video and Motion Joint Prediction in Aerial Scenes | Liangyu Xu et.al. | 2403.18238 | null |
2024-03-27 | Middle Fusion and Multi-Stage, Multi-Form Prompts for Robust RGB-T Tracking | Qiming Wang et.al. | 2403.18193 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-26 | Exploring Dynamic Transformer for Efficient Object Tracking | Jiawen Zhu et.al. | 2403.17651 | null |
2024-03-25 | Multiple Object Tracking as ID Prediction | Ruopeng Gao et.al. | 2403.16848 | link |
2024-03-25 | From Two Stream to One Stream: Efficient RGB-T Tracking via Mutual Prompt Learning and Knowledge Distillation | Yang Luo et.al. | 2403.16834 | null |
2024-03-29 | Elysium: Exploring Object-level Perception in Videos via MLLM | Han Wang et.al. | 2403.16558 | link |
2024-03-25 | Spike-NeRF: Neural Radiance Field Based On Spike Camera | Yijia Guo et.al. | 2403.16410 | null |
2024-03-28 | SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking | Xiaojun Hou et.al. | 2403.16002 | link |
2024-03-23 | Spatio-Temporal Bi-directional Cross-frame Memory for Distractor Filtering Point Cloud Single Object Tracking | Shaoyu Sun et.al. | 2403.15831 | null |
2024-03-23 | PNAS-MOT: Multi-Modal Object Tracking with Pareto Neural Architecture Search | Chensheng Peng et.al. | 2403.15712 | link |
2024-03-22 | CR3DT: Camera-RADAR Fusion for 3D Detection and Tracking | Nicolas Baumann et.al. | 2403.15313 | null |
2024-03-22 | Reasoning-Enhanced Object-Centric Learning for Videos | Jian Li et.al. | 2403.15245 | null |
2024-03-20 | Fast-Poly: A Fast Polyhedral Framework For 3D Multi-Object Tracking | Xiaoyu Li et.al. | 2403.13443 | link |
2024-03-19 | Lifting Multi-View Detection and Tracking to the Bird's Eye View | Torben Teepe et.al. | 2403.12573 | link |
2024-03-18 | Pedestrian Tracking with Monocular Camera using Unconstrained 3D Motion Model | Jan Krejčí et.al. | 2403.11978 | null |
2024-03-17 | NetTrack: Tracking Highly Dynamic Objects with a Net | Guangze Zheng et.al. | 2403.11186 | null |
2024-03-16 | View-Centric Multi-Object Tracking with Homographic Matching in Moving UAV | Deyi Ji et.al. | 2403.10830 | null |
2024-03-16 | Exploring Learning-based Motion Models in Multi-Object Tracking | Hsiang-Wei Huang et.al. | 2403.10826 | null |
2024-03-15 | NeuFlow: Real-time, High-accuracy Optical Flow Estimation on Robots Using Edge Devices | Zhiyong Zhang et.al. | 2403.10425 | link |
2024-03-14 | OneTracker: Unifying Visual Object Tracking with Foundation Models and Efficient Tuning | Lingyi Hong et.al. | 2403.09634 | null |
2024-03-13 | Object Permanence Filter for Robust Tracking with Interactive Robots | Shaoting Peng et.al. | 2403.08231 | null |
2024-03-12 | Learning Data Association for Multi-Object Tracking using Only Coordinates | Mehdi Miah et.al. | 2403.08018 | null |
2024-03-12 | A Study on Centralised and Decentralised Swarm Robotics Architecture for Part Delivery System | Angelos Dimakos et.al. | 2403.07635 | null |
2024-03-12 | LiDAR Point Cloud-based Multiple Vehicle Tracking with Probabilistic Measurement-Region Association | Guanhua Ding et.al. | 2403.06423 | null |
2024-03-09 | SSF-Net: Spatial-Spectral Fusion Network with Spectral Angle Awareness for Hyperspectral Object Tracking | Hanzheng Wang et.al. | 2403.05852 | null |
2024-03-09 | Long-term Frame-Event Visual Tracking: Benchmark Dataset and Baseline | Xiao Wang et.al. | 2403.05839 | link |
2024-03-11 | Beyond MOT: Semantic Multi-Object Tracking | Yunhao Li et.al. | 2403.05021 | null |
2024-03-07 | Delving into the Trajectory Long-tail Distribution for Muti-object Tracking | Sijia Chen et.al. | 2403.04700 | link |
2024-03-07 | Towards learning-based planning:The nuPlan benchmark for real-world autonomous driving | Napat Karnchanachari et.al. | 2403.04133 | null |
2024-03-06 | Multi-Object Tracking with Camera-LiDAR Fusion for Autonomous Driving | Riccardo Pieroni et.al. | 2403.04112 | null |
2024-03-06 | VastTrack: Vast Category Visual Object Tracking | Liang Peng et.al. | 2403.03493 | link |
2024-03-05 | DeconfuseTrack:Dealing with Confusion for Multi-Object Tracking | Cheng Huang et.al. | 2403.02767 | null |
2024-03-04 | DiffMOT: A Real-time Diffusion-based Multiple Object Tracker with Non-linear Prediction | Weiyi Lv et.al. | 2403.02075 | null |
2024-03-04 | Integrating Efficient Optimal Transport and Functional Maps For Unsupervised Shape Correspondence Learning | Tung Le et.al. | 2403.01781 | null |
2024-03-01 | Joint Spatial-Temporal Calibration for Camera and Global Pose Sensor | Junlin Song et.al. | 2403.00976 | null |
2024-02-28 | Estimation of railway vehicle response for track geometry evaluation using branch Fourier neural operator | Qingjing Wang et.al. | 2402.18366 | null |
2024-02-28 | EchoTrack: Auditory Referring Multi-Object Tracking for Autonomous Driving | Jiacheng Lin et.al. | 2402.18302 | link |
2024-02-28 | Enhancing Tracking Robustness with Auxiliary Adversarial Defense Networks | Zhewei Wu et.al. | 2402.17976 | null |
2024-02-27 | SWTrack: Multiple Hypothesis Sliding Window 3D Multi-Object Tracking | Sandro Papais et.al. | 2402.17892 | null |
2024-02-27 | In Defense and Revival of Bayesian Filtering for Thermal Infrared Object Tracking | Peng Gao et.al. | 2402.17098 | null |
2024-02-26 | Searching a Lightweight Network Architecture for Thermal Infrared Pedestrian Tracking | Peng Gao et.al. | 2402.16570 | null |
2024-02-26 | SeqTrack3D: Exploring Sequence Information for Robust 3D Point Cloud Tracking | Yu Lin et.al. | 2402.16249 | null |
2024-02-26 | Real-Time Vehicle Detection and Urban Traffic Behavior Analysis Based on UAV Traffic Videos on Mobile Devices | Yuan Zhu et.al. | 2402.16246 | null |
2024-02-24 | Multi-Object Tracking by Hierarchical Visual Representations | Jinkun Cao et.al. | 2402.15895 | null |
2024-02-24 | Detection Is Tracking: Point Cloud Multi-Sweep Deep Learning Models Revisited | Lingji Chen et.al. | 2402.15756 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | From Forest to Zoo: Great Ape Behavior Recognition with ChimpBehave | Michael Fuchs et.al. | 2405.20025 | null |
2024-05-30 | Multimodal Cross-Domain Few-Shot Learning for Egocentric Action Recognition | Masashi Hatano et.al. | 2405.19917 | null |
2024-05-30 | EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos | Ryo Fujii et.al. | 2405.19644 | null |
2024-05-30 | SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation | Junjie Zhang et.al. | 2405.19586 | null |
2024-05-29 | Matrix Manifold Neural Networks++ | Xuan Son Nguyen et.al. | 2405.19206 | null |
2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
2024-05-28 | Flow-Assisted Motion Learning Network for Weakly-Supervised Group Activity Recognition | Muhammad Adi Nugroho et.al. | 2405.18012 | null |
2024-05-30 | Benchmarking Skeleton-based Motion Encoder Models for Clinical Applications: Estimating Parkinson's Disease Severity in Walking Sequences | Vida Adeli et.al. | 2405.17817 | link |
2024-05-28 | Hierarchical Action Recognition: A Contrastive Video-Language Approach with Hierarchical Interactions | Rui Zhang et.al. | 2405.17729 | null |
2024-05-28 | EgoNCE++: Do Egocentric Video-Language Models Really Understand Hand-Object Interactions? | Boshen Xu et.al. | 2405.17719 | link |
2024-05-27 | Advancements in Tactile Hand Gesture Recognition for Enhanced Human-Machine Interaction | Chiara Fumelli et.al. | 2405.17038 | null |
2024-05-27 | A Cross-Dataset Study for Text-based 3D Human Motion Retrieval | Léore Bensabath et.al. | 2405.16909 | null |
2024-05-26 | Flow Snapshot Neurons in Action: Deep Neural Networks Generalize to Biological Motion Perception | Shuangpeng Han et.al. | 2405.16493 | null |
2024-05-25 | Application of Artificial Intelligence in Hand Gesture Recognition with Virtual Reality: Survey and Analysis of Hand Gesture Hardware Selection | Jindi Wang et.al. | 2405.16264 | null |
2024-05-22 | From CNNs to Transformers in Multimodal Human Action Recognition: A Survey | Muhammad Bilal Shaikh et.al. | 2405.15813 | null |
2024-05-24 | V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM | Abdur Rahman et.al. | 2405.15341 | null |
2024-05-23 | Enhanced Spatiotemporal Prediction Using Physical-guided And Frequency-enhanced Recurrent Neural Networks | Xuanle Zhao et.al. | 2405.14504 | null |
2024-05-23 | SpGesture: Source-Free Domain-adaptive sEMG-based Gesture Recognition with Jaccard Attentive Spiking Neural Network | Weiyu Guo et.al. | 2405.14398 | null |
2024-05-23 | MAMBA4D: Efficient Long-Sequence Point Cloud Video Understanding with Disentangled Spatial-Temporal State Space Models | Jiuming Liu et.al. | 2405.14338 | null |
2024-05-22 | Counterfactual Gradients-based Quantification of Prediction Trust in Neural Networks | Mohit Prabhushankar et.al. | 2405.13758 | null |
2024-05-21 | Identity-free Artificial Emotional Intelligence via Micro-Gesture Understanding | Rong Gao et.al. | 2405.13206 | null |
2024-05-22 | Building Temporal Kernels with Orthogonal Polynomials | Yan Ru Pei et.al. | 2405.12179 | link |
2024-05-18 | GestFormer: Multiscale Wavelet Pooling Transformer Network for Dynamic Hand Gesture Recognition | Mallika Garg et.al. | 2405.11180 | link |
2024-05-17 | Air Signing and Privacy-Preserving Signature Verification for Digital Documents | P. Sarveswarasarma et.al. | 2405.10868 | null |
2024-05-17 | MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning Chains | Zhaohuan Zhan et.al. | 2405.10620 | null |
2024-05-06 | MEET: Mixture of Experts Extra Tree-Based sEMG Hand Gesture Identification | Naveen Gehlot et.al. | 2405.09562 | null |
2024-05-14 | Wearable Sensor-Based Few-Shot Continual Learning on Hand Gestures for Motor-Impaired Individuals via Latent Embedding Exploitation | Riyad Bin Rafiq et.al. | 2405.08969 | link |
2024-05-14 | The impact of Compositionality in Zero-shot Multi-label action recognition for Object-based tasks | Carmela Calabrese et.al. | 2405.08695 | null |
2024-05-15 | POWQMIX: Weighted Value Factorization with Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning | Chang Huang et.al. | 2405.08036 | null |
2024-05-13 | Coarse or Fine? Recognising Action End States without Labels | Davide Moltisanti et.al. | 2405.07723 | link |
2024-05-11 | PRENet: A Plane-Fit Redundancy Encoding Point Cloud Sequence Network for Real-Time 3D Action Recognition | Shenglin He et.al. | 2405.06929 | null |
2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
2024-05-09 | A Survey on Backbones for Deep Video Action Recognition | Zixuan Tang et.al. | 2405.05584 | null |
2024-05-06 | OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs | Jiahao Nick Li et.al. | 2405.03901 | null |
2024-05-05 | JOSENet: A Joint Stream Embedding Network for Violence Detection in Surveillance Videos | Pietro Nardelli et.al. | 2405.02961 | null |
2024-05-03 | On the Utility of External Agent Intention Predictor for Human-AI Coordination | Chenxu Wang et.al. | 2405.02229 | null |
2024-05-11 | MVP-Shot: Multi-Velocity Progressive-Alignment Framework for Few-Shot Action Recognition | Hongyu Qu et.al. | 2405.02077 | null |
2024-05-03 | Enhancing Micro Gesture Recognition for Emotion Understanding via Context-aware Visual-Text Contrastive Learning | Deng Li et.al. | 2405.01885 | link |
2024-05-02 | Multi-view Action Recognition via Directed Gromov-Wasserstein Discrepancy | Hoang-Quan Nguyen et.al. | 2405.01337 | null |
2024-05-07 | Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration | Praveen Kumar Chandaliya et.al. | 2405.01273 | null |
2024-04-30 | One-Stage Open-Vocabulary Temporal Action Detection Leveraging Temporal Multi-scale and Action Label Features | Trung Thanh Nguyen et.al. | 2404.19542 | link |
2024-04-30 | Cross-Block Fine-Grained Semantic Cascade for Skeleton-Based Sports Action Recognition | Zhendong Liu et.al. | 2404.19383 | null |
2024-04-28 | Enhancing Action Recognition from Low-Quality Skeleton Data via Part-Level Knowledge Distillation | Cuiwei Liu et.al. | 2404.18206 | null |
2024-04-26 | SDFD: Building a Versatile Synthetic Face Image Dataset with Diverse Attributes | Georgia Baltsou et.al. | 2404.17255 | null |
2024-04-25 | Learning Discriminative Spatio-temporal Representations for Semi-supervised Action Recognition | Yu Wang et.al. | 2404.16416 | null |
2024-04-25 | An Improved Graph Pooling Network for Skeleton-Based Action Recognition | Cong Wu et.al. | 2404.16359 | null |
2024-04-24 | Unimodal and Multimodal Sensor Fusion for Wearable Activity Recognition | Hymalai Bello et.al. | 2404.16005 | null |
2024-04-24 | 3D Face Morphing Attack Generation using Non-Rigid Registration | Jag Mohan Singh et.al. | 2404.15765 | null |
2024-04-25 | HDBN: A Novel Hybrid Dual-branch Network for Robust Skeleton-based Action Recognition | Jinfu Liu et.al. | 2404.15719 | link |
2024-04-23 | Combating Missing Modalities in Egocentric Videos at Test Time | Merey Ramazanova et.al. | 2404.15161 | null |
2024-04-23 | G3R: Generating Rich and Fine-grained mmWave Radar Data from 2D Videos for Generalized Gesture Recognition | Kaikai Deng et.al. | 2404.14934 | null |
2024-04-23 | Driver Activity Classification Using Generalizable Representations from Vision-Language Models | Ross Greer et.al. | 2404.14906 | null |
2024-04-23 | DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition | Haozhe Cheng et.al. | 2404.14890 | null |
2024-04-22 | 1st Place Solution to the 1st SkatingVerse Challenge | Tao Sun et.al. | 2404.14032 | null |
2024-04-22 | CoFInAl: Enhancing Action Quality Assessment with Coarse-to-Fine Instruction Alignment | Kanglei Zhou et.al. | 2404.13999 | link |
2024-04-21 | Attack on Scene Flow using Point Clouds | Haniyeh Ehsani Oskouie et.al. | 2404.13621 | null |
2024-04-20 | STAT: Towards Generalizable Temporal Action Localization | Yangcen Liu et.al. | 2404.13311 | null |
2024-04-19 | Ring-a-Pose: A Ring for Continuous Hand Pose Tracking | Tianhong Catherine Yu et.al. | 2404.12980 | null |
2024-04-19 | VoxAtnNet: A 3D Point Clouds Convolutional Neural Network for Generalizable Face Presentation Attack Detection | Raghavendra Ramachandra et.al. | 2404.12680 | null |
2024-04-18 | DeepLocalization: Using change point detection for Temporal Action Localization | Mohammed Shaiqur Rahman et.al. | 2404.12258 | null |
2024-04-18 | Aligning Actions and Walking to LLM-Generated Textual Descriptions | Radu Chivereanu et.al. | 2404.12192 | link |
2024-04-18 | Simultaneous Detection and Interaction Reasoning for Object-Centric Action Recognition | Xunsong Li et.al. | 2404.11903 | null |
2024-04-18 | sEMG-based Fine-grained Gesture Recognition via Improved LightGBM Model | Xiupeng Qiao et.al. | 2404.11861 | null |
2024-04-17 | VG4D: Vision-Language Model Goes 4D Video Recognition | Zhichao Deng et.al. | 2404.11605 | link |
2024-04-17 | A Data-Driven Representation for Sign Language Production | Harry Walsh et.al. | 2404.11499 | link |
2024-04-17 | Lower Limb Movements Recognition Based on Feature Recursive Elimination and Backpropagation Neural Network | Yongkai Ma et.al. | 2404.11383 | null |
2024-04-17 | Revisiting Noise Resilience Strategies in Gesture Recognition: Short-Term Enhancement in Surface Electromyographic Signal Analysis | Weiyu Guo et.al. | 2404.11213 | null |
2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
2024-04-17 | Learning to Score Sign Language with Two-stage Method | Hongli Wen et.al. | 2404.10383 | null |
2024-04-16 | MK-SGN: A Spiking Graph Convolutional Network with Multimodal Fusion and Knowledge Distillation for Skeleton-based Action Recognition | Naichuan Zheng et.al. | 2404.10210 | null |
2024-04-15 | Design and Analysis of Efficient Attention in Transformers for Social Group Activity Recognition | Masato Tamura et.al. | 2404.09964 | null |
2024-04-15 | A Diffusion-based Data Generator for Training Object Recognition Models in Ultra-Range Distance | Eran Bamani et.al. | 2404.09846 | null |
2024-04-15 | Leveraging Temporal Contextualization for Video Action Recognition | Minji Kim et.al. | 2404.09490 | null |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
2024-04-13 | Exploring Explainability in Video Action Recognition | Avinab Saha et.al. | 2404.09067 | null |
2024-04-12 | MSSTNet: A Multi-Scale Spatio-Temporal CNN-Transformer Network for Dynamic Facial Expression Recognition | Linhuang Wang et.al. | 2404.08433 | null |
2024-04-11 | Graph Integrated Language Transformers for Next Action Prediction in Complex Phone Calls | Amin Hosseiny Marani et.al. | 2404.08155 | null |
2024-04-11 | Simba: Mamba augmented U-ShiftGCN for Skeletal Action Recognition in Videos | Soumyabrata Chaudhuri et.al. | 2404.07645 | null |
2024-04-15 | Fine-Grained Side Information Guided Dual-Prompts for Zero-Shot Skeleton Action Recognition | Yang Chen et.al. | 2404.07487 | null |
2024-04-10 | O-TALC: Steps Towards Combating Oversegmentation within Online Action Segmentation | Matthew Kent Myers et.al. | 2404.06894 | null |
2024-04-10 | An Animation-based Augmentation Approach for Action Recognition from Discontinuous Video | Xingyu Song et.al. | 2404.06741 | null |
2024-04-07 | X-VARS: Introducing Explainability in Football Refereeing with Multi-Modal Large Language Model | Jan Held et.al. | 2404.06332 | null |
2024-04-10 | Algorithms for Caching and MTS with reduced number of predictions | Karim Abdel Sadek et.al. | 2404.06280 | null |
2024-04-09 | ActNetFormer: Transformer-ResNet Hybrid Method for Semi-Supervised Action Recognition in Videos | Sharana Dharshikgan Suresh Dass et.al. | 2404.06243 | link |
2024-04-08 | Localizing Moments of Actions in Untrimmed Videos of Infants with Autism Spectrum Disorder | Halil Ismail Helvaci et.al. | 2404.05849 | null |
2024-04-09 | TIM: A Time Interval Machine for Audio-Visual Action Recognition | Jacob Chalk et.al. | 2404.05559 | link |
2024-04-11 | Test-Time Zero-Shot Temporal Action Localization | Benedetta Liberatori et.al. | 2404.05426 | link |
2024-04-09 | SDFR: Synthetic Data for Face Recognition Competition | Hatef Otroshi Shahreza et.al. | 2404.04580 | null |
2024-04-05 | PhysPT: Physics-aware Pretrained Transformer for Estimating Human Dynamics from Monocular Videos | Yufei Zhang et.al. | 2404.04430 | null |
2024-04-05 | Koala: Key frame-conditioned long video-LLM | Reuben Tan et.al. | 2404.04346 | null |
2024-04-04 | UniAV: Unified Audio-Visual Perception for Multi-Task Video Localization | Tiantian Geng et.al. | 2404.03179 | null |
2024-04-03 | Optimizing the Deployment of Tiny Transformers on Low-Power MCUs | Victor J. B. Jung et.al. | 2404.02945 | link |
2024-04-03 | Multi-Scale Spatial-Temporal Self-Attention Graph Convolutional Networks for Skeleton-based Action Recognition | Ikuo Nakamura et.al. | 2404.02624 | null |
2024-04-02 | PREGO: online mistake detection in PRocedural EGOcentric videos | Alessandro Flaborea et.al. | 2404.01933 | link |
2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | link |
2024-04-02 | Language Model Guided Interpretable Video Action Reasoning | Ning Wang et.al. | 2404.01591 | null |
2024-04-02 | Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery | Christian Limberg et.al. | 2404.01571 | null |
2024-04-01 | LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization | Akshita Gupta et.al. | 2404.01282 | null |
2024-03-31 | LLMs are Good Action Recognizers | Haoxuan Qu et.al. | 2404.00532 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
2024-03-28 | Zero-shot Prompt-based Video Encoder for Surgical Gesture Recognition | Mingxing Rao et.al. | 2403.19786 | link |
2024-03-28 | Hypergraph-based Multi-View Action Recognition using Event Cameras | Yue Gao et.al. | 2403.19316 | null |
2024-03-27 | PLOT-TAL -- Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization | Edward Fish et.al. | 2403.18915 | null |
2024-03-27 | iFace: Hand-Over-Face Gesture Recognition Leveraging Impedance Sensing | Mengxi Liu et.al. | 2403.18433 | null |
2024-03-27 | An Evolutionary Network Architecture Search Framework with Adaptive Multimodal Fusion for Hand Gesture Recognition | Yizhang Xia et.al. | 2403.18208 | null |
2024-03-26 | OmniVid: A Generative Framework for Universal Video Understanding | Junke Wang et.al. | 2403.17935 | link |
2024-03-25 | Understanding Long Videos in One Multimodal Language Model Pass | Kanchana Ranasinghe et.al. | 2403.16998 | link |
2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
2024-03-24 | Emotion Recognition from the perspective of Activity Recognition | Savinay Nagendra et.al. | 2403.16263 | null |
2024-03-22 | InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding | Yi Wang et.al. | 2403.15377 | link |
2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
2024-03-22 | GCN-DevLSTM: Path Development for Skeleton-Based Action Recognition | Lei Jiang et.al. | 2403.15212 | link |
2024-03-21 | Transfer Learning for Cross-dataset Isolated Sign Language Recognition in Under-Resourced Datasets | Ahmet Alp Kindiroglu et.al. | 2403.14534 | link |
2024-03-20 | Hierarchical NeuroSymbolic Approach for Action Quality Assessment | Lauren Okamoto et.al. | 2403.13798 | null |
2024-03-19 | Selective, Interpretable, and Motion Consistent Privacy Attribute Obfuscation for Action Recognition | Filip Ilic et.al. | 2403.12710 | null |
2024-03-19 | ExACT: Language-guided Conceptual Reasoning and Uncertainty Estimation for Event-based Action Recognition and More | Jiazhou Zhou et.al. | 2403.12534 | null |
2024-03-19 | VideoBadminton: A Video Dataset for Badminton Action Recognition | Qi Li et.al. | 2403.12385 | null |
2024-03-19 | Multi-View Video-Based Learning: Leveraging Weak Labels for Frame-Level Perception | Vijay John et.al. | 2403.11616 | null |
2024-03-19 | VIHE: Virtual In-Hand Eye Transformer for 3D Robotic Manipulation | Weiyao Wang et.al. | 2403.11461 | null |
2024-03-17 | A Lie Group Approach to Riemannian Batch Normalization | Ziheng Chen et.al. | 2403.11261 | link |
2024-03-17 | Boosting Semi-Supervised Temporal Action Localization by Learning from Non-Target Classes | Kun Xia et.al. | 2403.11189 | null |
2024-03-16 | CoPlay: Audio-agnostic Cognitive Scaling for Acoustic Sensing | Yin Li et.al. | 2403.10796 | null |
2024-03-15 | CrossGLG: LLM Guides One-shot Skeleton-based 3D Action Recognition in a Cross-level Manner | Tingbing Yan et.al. | 2403.10082 | null |
2024-03-15 | Skeleton-Based Human Action Recognition with Noisy Labels | Yi Xu et.al. | 2403.09975 | null |
2024-03-14 | On the Utility of 3D Hand Poses for Action Recognition | Md Salman Shamil et.al. | 2403.09805 | null |
2024-03-14 | 3D-VLA: A 3D Vision-Language-Action Generative World Model | Haoyu Zhen et.al. | 2403.09631 | null |
2024-03-14 | SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition | Jeonghyeok Do et.al. | 2403.09508 | link |
2024-03-14 | EventRPG: Event Data Augmentation with Relevance Propagation Guidance | Mingyuan Sun et.al. | 2403.09274 | link |
2024-03-14 | Leveraging Foundation Model Automatic Data Augmentation Strategies and Skeletal Points for Hands Action Recognition in Industrial Assembly Lines | Liang Wu et.al. | 2403.09056 | null |
2024-03-13 | Low-Cost and Real-Time Industrial Human Action Recognitions Based on Large-Scale Foundation Models | Wensheng Liang et.al. | 2403.08420 | null |
2024-03-13 | NaturalVLM: Leveraging Fine-grained Natural Language for Affordance-Guided Visual Manipulation | Ran Xu et.al. | 2403.08355 | null |
2024-03-13 | ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation | Guanxing Lu et.al. | 2403.08321 | null |
2024-03-12 | NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning | Bingqian Lin et.al. | 2403.07376 | link |
2024-03-12 | BID: Boundary-Interior Decoding for Unsupervised Temporal Action Localization Pre-Trainin | Qihang Fang et.al. | 2403.07354 | null |
2024-03-11 | Attention Prompt Tuning: Parameter-efficient Adaptation of Pre-trained Models for Spatiotemporal Modeling | Wele Gedara Chaminda Bandara et.al. | 2403.06978 | link |
2024-03-11 | Deep Learning Approaches for Human Action Recognition in Video Data | Yufei Xie et.al. | 2403.06810 | null |
2024-03-11 | Real-Time Multimodal Cognitive Assistant for Emergency Medical Services | Keshara Weerasinghe et.al. | 2403.06734 | null |
2024-03-11 | Multimodal Transformers for Real-Time Surgical Activity Prediction | Keshara Weerasinghe et.al. | 2403.06705 | link |
2024-03-11 | epsilon-Mesh Attack: A Surface-based Adversarial Point Cloud Attack for Facial Expression Recognition | Batuhan Cengiz et.al. | 2403.06661 | null |
2024-03-11 | Density-Guided Label Smoothing for Temporal Localization of Driving Actions | Tunc Alkanat et.al. | 2403.06616 | null |
2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
2024-03-10 | Coherent Temporal Synthesis for Incremental Action Segmentation | Guodong Ding et.al. | 2403.06102 | null |
2024-03-09 | Dissecting Deep RL with High Update Ratios: Combatting Value Overestimation and Divergence | Marcel Hussing et.al. | 2403.05996 | null |
2024-03-08 | Benchmarking Micro-action Recognition: Dataset, Methods, and Applications | Dan Guo et.al. | 2403.05234 | link |
2024-03-06 | Video Relationship Detection Using Mixture of Experts | Ala Shaabana et.al. | 2403.03994 | null |
2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181 | link |
2024-03-05 | Learning to Use Tools via Cooperative and Interactive Agents | Zhengliang Shi et.al. | 2403.03031 | null |
2024-03-04 | Gesture recognition with Brownian reservoir computing using geometrically confined skyrmion dynamics | Grischa Beneke et.al. | 2403.01877 | null |
2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
2024-03-03 | A Unified Model Selection Technique for Spectral Clustering Based Motion Segmentation | Yuxiang Huang et.al. | 2403.01606 | null |
2024-03-03 | Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition | Kun-Yu Lin et.al. | 2403.01560 | link |
2024-03-02 | Dynamic 3D Point Cloud Sequences as 2D Videos | Yiming Zeng et.al. | 2403.01129 | null |
2024-02-29 | On the Design of Human-Robot Collaboration Gestures | Anas Shrinah et.al. | 2402.19058 | null |
2024-02-23 | Multimodal Transformer With a Low-Computational-Cost Guarantee | Sungjin Park et.al. | 2402.15096 | null |
2024-02-17 | Implementation of a Model of the Cortex Basal Ganglia Loop | Naoya Arakawa et.al. | 2402.13275 | null |
2024-02-20 | Radar-Based Recognition of Static Hand Gestures in American Sign Language | Christian Schuessler et.al. | 2402.12800 | null |
2024-02-20 | Learning Domain-Invariant Temporal Dynamics for Few-Shot Action Recognition | Yuke Li et.al. | 2402.12706 | null |
2024-02-19 | Comprehensive Cognitive LLM Agent for Smartphone GUI Automation | Xinbei Ma et.al. | 2402.11941 | null |
2024-02-15 | Hand Shape and Gesture Recognition using Multiscale Template Matching, Background Subtraction and Binary Image Analysis | Ketan Suhaas Saichandran et.al. | 2402.09663 | null |
2024-02-14 | TikTokActions: A TikTok-Derived Video Dataset for Human Action Recognition | Yang Qian et.al. | 2402.08875 | null |
2024-02-13 | BdSLW60: A Word-Level Bangla Sign Language Dataset | Husne Ara Rubaiyeat et.al. | 2402.08635 | link |
2024-02-13 | Vision-Based Hand Gesture Customization from a Single Demonstration | Soroush Shahi et.al. | 2402.08420 | null |
2024-02-12 | PBADet: A One-Stage Anchor-Free Approach for Part-Body Association | Zhongpai Gao et.al. | 2402.07814 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | Infinite 3D Landmarks: Improving Continuous 2D Facial Landmark Detection | Prashanth Chandran et.al. | 2405.20117 | null |
2024-05-30 | Estimating Human Poses Across Datasets: A Unified Skeleton and Multi-Teacher Distillation Approach | Muhammad Saif Ullah Khan et.al. | 2405.20084 | null |
2024-05-30 | TAMBRIDGE: Bridging Frame-Centered Tracking and 3D Gaussian Splatting for Enhanced SLAM | Peifeng Jiang et.al. | 2405.19614 | null |
2024-05-29 | Real-Time Dynamic Robot-Assisted Hand-Object Interaction via Motion Primitives | Mingqi Yuan et.al. | 2405.19531 | null |
2024-05-29 | Exploring AI-based Anonymization of Industrial Image and Video Data in the Context of Feature Preservation | Sabrina Cynthia Triess et.al. | 2405.19173 | null |
2024-05-28 | World Models for General Surgical Grasping | Hongbin Lin et.al. | 2405.17940 | null |
2024-05-27 | MoSca: Dynamic Gaussian Fusion from Casual Videos via 4D Motion Scaffolds | Jiahui Lei et.al. | 2405.17421 | null |
2024-05-27 | Occlusion Handling in 3D Human Pose Estimation with Perturbed Positional Encoding | Niloofar Azizi et.al. | 2405.17397 | null |
2024-05-27 | Weiquan Wang et.al. | 2405.17016 | null | |
2024-05-27 | Clustering-based Learning for UAV Tracking and Pose Estimation | Jiaping Xiao et.al. | 2405.16867 | null |
2024-05-26 | Multi-Modal UAV Detection, Classification and Tracking Algorithm -- Technical Report for CVPR 2024 UG2 Challenge | Tianchen Deng et.al. | 2405.16464 | link |
2024-05-25 | Intensity and Texture Correction of Omnidirectional Image Using Camera Images for Indirect Augmented Reality | Hakim Ikebayashi et.al. | 2405.16008 | null |
2024-05-23 | CoPeD-Advancing Multi-Robot Collaborative Perception: A Comprehensive Dataset in Real-World Environments | Yang Zhou et.al. | 2405.14731 | link |
2024-05-23 | Segformer++: Efficient Token-Merging Strategies for High-Resolution Semantic Segmentation | Daniel Kienzle et.al. | 2405.14467 | null |
2024-05-21 | Geometric Transformation Uncertainty for Improving 3D Fetal Brain Pose Prediction from Freehand 2D Ultrasound Videos | Jayroop Ramesh et.al. | 2405.13235 | null |
2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728 | null |
2024-05-21 | PoseGravity: Pose Estimation from Points and Lines with Axis Prior | Akshay Chandrasekhar et.al. | 2405.12646 | link |
2024-05-19 | Focus on Low-Resolution Information: Multi-Granular Information-Lossless Model for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.12247 | null |
2024-05-20 | AutoSoccerPose: Automated 3D posture Analysis of Soccer Shot Movements | Calvin Yeung et.al. | 2405.12070 | link |
2024-05-19 | Advancing 6-DoF Instrument Pose Estimation in Variable X-Ray Imaging Geometries | Christiaan G. A. Viviers et.al. | 2405.11677 | link |
2024-05-19 | Cross-Domain Knowledge Distillation for Low-Resolution Human Pose Estimation | Zejun Gu et.al. | 2405.11448 | null |
2024-05-18 | PS6D: Point Cloud Based Symmetry-Aware 6D Object Pose Estimation in Robot Bin-Picking | Yifan Yang et.al. | 2405.11257 | null |
2024-05-18 | MotionGS : Compact Gaussian Splatting SLAM by Motion Filter | Xinli Guo et.al. | 2405.11129 | link |
2024-05-17 | Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose Estimation | Yongliang Lin et.al. | 2405.10557 | null |
2024-05-16 | Diversity-Aware Sign Language Production through a Pose Encoding Variational Autoencoder | Mohamed Ilyes Lakhal et.al. | 2405.10423 | null |
2024-05-17 | Toon3D: Seeing Cartoons from a New Perspective | Ethan Weber et.al. | 2405.10320 | null |
2024-05-15 | Task-adaptive Q-Face | Haomiao Sun et.al. | 2405.09059 | null |
2024-05-14 | RDPN6D: Residual-based Dense Point-wise Network for 6Dof Object Pose Estimation Based on RGB-D Images | Zong-Wei Hong et.al. | 2405.08483 | link |
2024-05-14 | TP3M: Transformer-based Pseudo 3D Image Matching with Reference | Liming Han et.al. | 2405.08434 | null |
2024-05-13 | Deep Learning-Based Object Pose Estimation: A Comprehensive Survey | Jian Liu et.al. | 2405.07801 | link |
2024-05-13 | JointLoc: A Real-time Visual Localization Framework for Planetary UAVs Based on Joint Relative and Absolute Pose Estimation | Xubo Luo et.al. | 2405.07429 | link |
2024-05-11 | TD-NeRF: Novel Truncated Depth Prior for Joint Camera Pose and Neural Radiance Field Optimization | Zhen Tan et.al. | 2405.07027 | null |
2024-05-11 | AHPPEBot: Autonomous Robot for Tomato Harvesting based on Phenotyping and Pose Estimation | Xingxu Li et.al. | 2405.06959 | null |
2024-05-10 | CasCalib: Cascaded Calibration for Motion Capture from Sparse Unsynchronized Cameras | James Tang et.al. | 2405.06845 | link |
2024-05-10 | MGS-SLAM: Monocular Sparse Tracking and Gaussian Mapping with Depth Smooth Regularization | Pengcheng Zhu et.al. | 2405.06241 | null |
2024-05-10 | Free-Moving Object Reconstruction and Pose Estimation with Virtual Camera | Haixin Shi et.al. | 2405.05858 | null |
2024-05-09 | Semi-Autonomous Laparoscopic Robot Docking with Learned Hand-Eye Information Fusion | Huanyu Tian et.al. | 2405.05817 | null |
2024-05-09 | NeuRSS: Enhancing AUV Localization and Bathymetric Mapping with Neural Rendering for Sidescan SLAM | Yiping Xie et.al. | 2405.05807 | null |
2024-05-09 | Benchmarking Neural Radiance Fields for Autonomous Robots: An Overview | Yuhang Ming et.al. | 2405.05526 | null |
2024-05-08 | Adversary-Guided Motion Retargeting for Skeleton Anonymization | Thomas Carr et.al. | 2405.05428 | null |
2024-05-08 | FinePOSE: Fine-Grained Prompt-Driven 3D Human Pose Estimation via Diffusion Models | Jinglin Xu et.al. | 2405.05216 | link |
2024-05-08 | ProbRadarM3F: mmWave Radar based Human Skeletal Pose Estimation with Probability Map Guided Multi-Format Feature Fusion | Bing Zhu et.al. | 2405.05164 | null |
2024-05-08 | GISR: Geometric Initialization and Silhouette-based Refinement for Single-View Robot Pose and Configuration Estimation | Ivan Bilić et.al. | 2405.04890 | null |
2024-05-07 | Learning Distributional Demonstration Spaces for Task-Specific Cross-Pose Estimation | Jenny Wang et.al. | 2405.04609 | null |
2024-05-07 | Speak the Same Language: Global LiDAR Registration on BIM Using Pose Hough Transform | Zhijian Qiao et.al. | 2405.03969 | null |
2024-05-07 | Joint Estimation of Identity Verification and Relative Pose for Partial Fingerprints | Xiongjun Guan et.al. | 2405.03959 | null |
2024-05-06 | Pose Priors from Language Models | Sanjay Subramanian et.al. | 2405.03689 | null |
2024-05-06 | Optimizing Hand Region Detection in MediaPipe Holistic Full-Body Pose Estimation to Improve Accuracy and Avoid Downstream Errors | Amit Moryossef et.al. | 2405.03545 | link |
2024-05-05 | Multi-hop graph transformer network for 3D human pose estimation | Zaedul Islam et.al. | 2405.03055 | null |
2024-05-05 | Blending Distributed NeRFs with Tri-stage Robust Pose Optimization | Baijun Ye et.al. | 2405.02880 | null |
2024-05-03 | WeightedPose: Generalizable Cross-Pose Estimation via Weighted SVD | Xuxin Cheng et.al. | 2405.02241 | null |
2024-05-03 | Probablistic Restoration with Adaptive Noise Sampling for 3D Human Pose Estimation | Xianzhou Zeng et.al. | 2405.02114 | link |
2024-05-03 | An Onboard Framework for Staircases Modeling Based on Point Clouds | Chun Qing et.al. | 2405.01918 | null |
2024-05-06 | ShadowNav: Autonomous Global Localization for Lunar Navigation in Darkness | Deegan Atha et.al. | 2405.01673 | null |
2024-05-02 | IntervenGen: Interventional Data Generation for Robust and Data-Efficient Robot Imitation Learning | Ryan Hoque et.al. | 2405.01472 | null |
2024-05-02 | Behavior Imitation for Manipulator Control and Grasping with Deep Reinforcement Learning | Liu Qiyuan et.al. | 2405.01284 | null |
2024-05-02 | Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors | Wenxuan Guo et.al. | 2405.01112 | null |
2024-05-02 | CoViS-Net: A Cooperative Visual Spatial Foundation Model for Multi-Robot Applications | Jan Blumenkamp et.al. | 2405.01107 | null |
2024-05-04 | HandSSCA: 3D Hand Mesh Reconstruction with State Space Channel Attention from RGB images | Zixun Jiao et.al. | 2405.01066 | null |
2024-05-01 | Radar-Based Localization For Autonomous Ground Vehicles In Suburban Neighborhoods | Andrew J. Kramer et.al. | 2405.00600 | null |
2024-04-30 | Ultra Inertial Poser: Scalable Motion Capture and Tracking from Sparse Inertial Sensors and Ultra-Wideband Ranging | Rayan Armani et.al. | 2404.19541 | link |
2024-04-30 | UniFS: Universal Few-shot Instance Perception with Point Representations | Sheng Jin et.al. | 2404.19401 | null |
2024-04-30 | Quater-GCN: Enhancing 3D Human Pose Estimation with Orientation and Semi-supervised Training | Xingyu Song et.al. | 2404.19279 | null |
2024-04-30 | XFeat: Accelerated Features for Lightweight Image Matching | Guilherme Potje et.al. | 2404.19174 | null |
2024-04-29 | Self-Avatar Animation in Virtual Reality: Impact of Motion Signals Artifacts on the Full-Body Pose Reconstruction | Antoine Maiorca et.al. | 2404.18628 | null |
2024-04-29 | Mesh-based Photorealistic and Real-time 3D Mapping for Robust Visual Perception of Autonomous Underwater Vehicle | Jungwoo Lee et.al. | 2404.18395 | null |
2024-04-29 | Reconstructing Satellites in 3D from Amateur Telescope Images | Zhiming Chang et.al. | 2404.18394 | null |
2024-04-27 | Hybrid 3D Human Pose Estimation with Monocular Video and Sparse IMUs | Yiming Bao et.al. | 2404.17837 | null |
2024-04-26 | Localization Through Particle Filter Powered Neural Network Estimated Monocular Camera Poses | Yi Shen et.al. | 2404.17685 | null |
2024-04-26 | SLAM for Indoor Mapping of Wide Area Construction Environments | Vincent Ress et.al. | 2404.17215 | null |
2024-04-25 | WheelPose: Data Synthesis Techniques to Improve Pose Estimation Performance on Wheelchair Users | William Huang et.al. | 2404.17063 | link |
2024-04-25 | Transformer-Based Local Feature Matching for Multimodal Image Registration | Remi Delaunay et.al. | 2404.16802 | null |
2024-04-25 | DeepKalPose: An Enhanced Deep-Learning Kalman Filter for Temporally Consistent Monocular Vehicle Pose Estimation | Leandro Di Bella et.al. | 2404.16558 | null |
2024-04-25 | Efficient Solution of Point-Line Absolute Pose | Petr Hruby et.al. | 2404.16552 | link |
2024-04-25 | COBRA -- COnfidence score Based on shape Regression Analysis for method-independent quality assessment of object pose estimation from single images | Panagiotis Sapoutzoglou et.al. | 2404.16471 | link |
2024-04-25 | MegaParticles: Range-based 6-DoF Monte Carlo Localization with GPU-Accelerated Stein Particle Filter | Kenji Koide et.al. | 2404.16370 | null |
2024-04-24 | 3D Human Pose Estimation with Occlusions: Introducing BlendMimic3D Dataset and GCN Refinement | Filipa Lino et.al. | 2404.16136 | null |
2024-04-23 | SMPLer: Taming Transformers for Monocular 3D Human Shape and Pose Estimation | Xiangyu Xu et.al. | 2404.15276 | link |
2024-04-25 | Domain adaptive pose estimation via multi-level alignment | Yugan Chen et.al. | 2404.14885 | link |
2024-04-23 | Semi-supervised 2D Human Pose Estimation via Adaptive Keypoint Masking | Kexin Meng et.al. | 2404.14835 | null |
2024-04-23 | UPose3D: Uncertainty-Aware 3D Human Pose Estimation with Cross-View and Temporal Cues | Vandad Davoodnia et.al. | 2404.14634 | null |
2024-04-22 | DHRNet: A Dual-Path Hierarchical Relation Network for Multi-Person Pose Estimation | Yonghao Dang et.al. | 2404.14025 | null |
2024-04-23 | CT-NeRF: Incremental Optimizing Neural Radiance Field and Poses with Complex Trajectory | Yunlong Ran et.al. | 2404.13896 | null |
2024-04-21 | Resampling-free Particle Filters in High-dimensions | Akhilan Boopathy et.al. | 2404.13698 | null |
2024-04-20 | EC-SLAM: Real-time Dense Neural RGB-D SLAM System with Effectively Constrained Global Bundle Adjustment | Guanghao Li et.al. | 2404.13346 | link |
2024-04-18 | Spot-Compose: A Framework for Open-Vocabulary Object Retrieval and Drawer Manipulation in Point Clouds | Oliver Lemke et.al. | 2404.12440 | null |
2024-04-18 | Gait Recognition from Highly Compressed Videos | Andrei Niculae et.al. | 2404.12183 | null |
2024-04-17 | Mushroom Segmentation and 3D Pose Estimation from Point Clouds using Fully Convolutional Geometric Features and Implicit Pose Encoding | George Retsinas et.al. | 2404.12144 | link |
2024-04-17 | Kathakali Hand Gesture Recognition With Minimal Data | Kavitha Raju et.al. | 2404.11205 | null |
2024-04-17 | GeoReF: Geometric Alignment Across Shape Variation for Category-level Object Pose Refinement | Linfang Zheng et.al. | 2404.11139 | null |
2024-04-17 | CorrNet+: Sign Language Recognition and Translation via Spatial-Temporal Correlation | Lianyu Hu et.al. | 2404.11111 | link |
2024-04-16 | HumMUSS: Human Motion Understanding using State Space Models | Arnab Kumar Mondal et.al. | 2404.10880 | null |
2024-04-16 | Invariant Kalman Filtering with Noise-Free Pseudo-Measurements | Sven Goffin et.al. | 2404.10687 | null |
2024-04-16 | The Unreasonable Effectiveness of Pre-Trained Features for Camera Pose Refinement | Gabriele Trivigno et.al. | 2404.10438 | null |
2024-04-16 | GaitPoint+: A Gait Recognition Network Incorporating Point Cloud Analysis and Recycling | Huantao Ren et.al. | 2404.10213 | null |
2024-04-16 | LWIRPOSE: A novel LWIR Thermal Image Dataset and Benchmark | Avinash Upadhyay et.al. | 2404.10212 | link |
2024-04-15 | LetsGo: Large-Scale Garage Modeling and Rendering via LiDAR-Assisted Gaussian Primitives | Jiadi Cui et.al. | 2404.09748 | null |
2024-04-14 | In My Perspective, In My Hands: Accurate Egocentric 2D Hand Pose and Action Recognition | Wiktor Mucha et.al. | 2404.09308 | null |
2024-04-13 | DeDoDe v2: Analyzing and Improving the DeDoDe Keypoint Detector | Johan Edstedt et.al. | 2404.08928 | link |
2024-04-16 | 3D Human Scan With A Moving Event Camera | Kai Kohyama et.al. | 2404.08504 | null |
2024-04-11 | Separated Attention: An Improved Cycle GAN Based Under Water Image Enhancement Method | Tashmoy Ghosh et.al. | 2404.07649 | null |
2024-04-11 | GLID: Pre-training a Generalist Encoder-Decoder Vision Model | Jihao Liu et.al. | 2404.07603 | null |
2024-04-10 | Measuring proximity to standard planes during fetal brain ultrasound scanning | Chiara Di Vece et.al. | 2404.07124 | null |
2024-04-10 | MoCap-to-Visual Domain Adaptation for Efficient Human Mesh Estimation from 2D Keypoints | Bedirhan Uguz et.al. | 2404.07094 | null |
2024-04-10 | Gaussian-LIC: Photo-realistic LiDAR-Inertial-Camera SLAM with 3D Gaussian Splatting | Xiaolei Lang et.al. | 2404.06926 | null |
2024-04-09 | Matching 2D Images in 3D: Metric Relative Pose from Metric Correspondences | Axel Barroso-Laguna et.al. | 2404.06337 | link |
2024-04-09 | Incremental Joint Learning of Depth, Pose and Implicit Scene Representation on Monocular Camera in Large-scale Scenes | Tianchen Deng et.al. | 2404.06050 | null |
2024-04-09 | Improving Facial Landmark Detection Accuracy and Efficiency with Knowledge Distillation | Zong-Wei Hong et.al. | 2404.06029 | null |
2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705 | null |
2024-04-08 | Learning a Category-level Object Pose Estimator without Pose Annotations | Fengrui Tian et.al. | 2404.05626 | null |
2024-04-08 | DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker | Jiapeng Wu et.al. | 2404.05518 | link |
2024-04-08 | Two Hands Are Better Than One: Resolving Hand to Hand Intersections via Occupancy Networks | Maksym Ivashechkin et.al. | 2404.05414 | null |
2024-04-08 | STITCH: Augmented Dexterity for Suture Throws Including Thread Coordination and Handoffs | Kush Hari et.al. | 2404.05151 | null |
2024-04-05 | ToolEENet: Tool Affordance 6D Pose Estimation | Yunlong Wang et.al. | 2404.04193 | null |
2024-04-04 | SDPose: Tokenized Pose Estimation via Circulation-Guide Self-Distillation | Sichen Chen et.al. | 2404.03518 | link |
2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256 | null |
2024-04-04 | HandDiff: 3D Hand Pose Estimation with Diffusion on Image-Point Cloud | Wencan Cheng et.al. | 2404.03159 | link |
2024-04-03 | Fusing Multi-sensor Input with State Information on TinyML Brains for Autonomous Nano-drones | Luca Crupi et.al. | 2404.02567 | null |
2024-04-03 | Semi-Supervised Unconstrained Head Pose Estimation in the Wild | Huayi Zhou et.al. | 2404.02544 | link |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125 | null |
2024-04-02 | SelfPose3d: Self-Supervised Multi-Person Multi-View 3d Pose Estimation | Vinkle Srivastav et.al. | 2404.02041 | null |
2024-04-01 | Marrying NeRF with Feature Matching for One-step Pose Estimation | Ronghan Chen et.al. | 2404.00891 | null |
2024-03-31 | Graph-Based vs. Error State Kalman Filter-Based Fusion Of 5G And Inertial Data For MAV Indoor Pose Estimation | Meisam Kabiri et.al. | 2404.00691 | null |
2024-03-31 | OmniLocalRF: Omnidirectional Local Radiance Fields from Dynamic Videos | Dongyoung Choi et.al. | 2404.00676 | null |
2024-04-02 | KTPFormer: Kinematics and Trajectory Prior Knowledge-Enhanced Transformer for 3D Human Pose Estimation | Jihua Peng et.al. | 2404.00658 | link |
2024-03-29 | FetalDiffusion: Pose-Controllable 3D Fetal MRI Synthesis with Conditional Diffusion Model | Molin Zhang et.al. | 2404.00132 | null |
2024-03-29 | Latent Embedding Clustering for Occlusion Robust Head Pose Estimation | José Celestino et.al. | 2403.20251 | null |
2024-03-29 | A Unified Framework for Human-centric Point Cloud Video Understanding | Yiteng Xu et.al. | 2403.20031 | null |
2024-04-01 | Video-Based Human Pose Regression via Decoupled Space-Time Aggregation | Jijie He et.al. | 2403.19926 | link |
2024-03-28 | Instance-Adaptive and Geometric-Aware Keypoint Learning for Category-Level 6D Object Pose Estimation | Xiao Lin et.al. | 2403.19527 | link |
2024-03-27 | Object Pose Estimation via the Aggregation of Diffusion Features | Tianfu Wang et.al. | 2403.18791 | link |
2024-03-27 | RoboKeyGen: Robot Pose and Joint Angles Estimation via Diffusion-based 3D Keypoint Generation | Yang Tian et.al. | 2403.18259 | null |
2024-03-26 | Mathematical Foundation and Corrections for Full Range Head Pose Estimation | Huei-Chung Hu et.al. | 2403.18104 | null |
2024-03-26 | EgoPoseFormer: A Simple Baseline for Egocentric 3D Human Pose Estimation | Chenhongyi Yang et.al. | 2403.18080 | null |
2024-03-26 | A Survey on 3D Egocentric Human Pose Estimation | Md Mushfiqur Azam et.al. | 2403.17893 | null |
2024-03-26 | GTA-HDR: A Large-Scale Synthetic Dataset for HDR Image Reconstruction | Hrishav Bakul Barua et.al. | 2403.17837 | link |
2024-03-26 | DiffH2O: Diffusion-Based Synthesis of Hand-Object Interactions from Textual Descriptions | Sammy Christen et.al. | 2403.17827 | null |
2024-03-26 | System Calibration of a Field Phenotyping Robot with Multiple High-Precision Profile Laser Scanners | Felix Esser et.al. | 2403.17788 | null |
2024-03-25 | Animal Avatars: Reconstructing Animatable 3D Animals from Casual Videos | Remy Sabathier et.al. | 2403.17103 | null |
2024-03-25 | Characterisation of the Intel RealSense D415 Stereo Depth Camera for Motion-Corrected CT Perfusion Imaging | Mahdieh Dashtbani Moghari et.al. | 2403.16490 | null |
2024-03-25 | Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects | Zicong Fan et.al. | 2403.16428 | null |
2024-03-25 | A Geometric Perspective on Fusing Gaussian Distributions on Lie Groups | Yixiao Ge et.al. | 2403.16411 | null |
2024-03-25 | ASDF: Assembly State Detection Utilizing Late Fusion by Integrating 6D Pose Estimation | Hannah Schieber et.al. | 2403.16400 | null |
2024-03-24 | KITchen: A Real-World Benchmark and Dataset for 6D Object Pose Estimation in Kitchen Environments | Abdelrahman Younes et.al. | 2403.16238 | null |
2024-03-24 | Diffusion Model is a Good Pose Estimator from 3D RF-Vision | Junqiao Fan et.al. | 2403.16198 | null |
2024-03-23 | UPNeRF: A Unified Framework for Monocular 3D Object Reconstruction and Pose Estimation | Yuliang Guo et.al. | 2403.15705 | null |
2024-03-22 | InterFusion: Text-Driven Generation of 3D Human-Object Interaction | Sisi Dai et.al. | 2403.15612 | null |
2024-03-22 | Augmented Reality Warnings in Roadway Work Zones: Evaluating the Effect of Modality on Worker Reaction Times | Sepehr Sabeti et.al. | 2403.15571 | null |
2024-03-22 | Gesture-Controlled Aerial Robot Formation for Human-Swarm Interaction in Safety Monitoring Applications | Vít Krátký et.al. | 2403.15333 | null |
2024-03-22 | WSCLoc: Weakly-Supervised Sparse-View Camera Relocalization | Jialu Wang et.al. | 2403.15272 | null |
2024-03-22 | DITTO: Demonstration Imitation by Trajectory Transformation | Nick Heppert et.al. | 2403.15203 | null |
2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048 | null |
2024-03-22 | Trajectory Regularization Enhances Self-Supervised Geometric Representation | Jiayun Wang et.al. | 2403.14973 | null |
2024-03-21 | VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding | Ahmad Mahmood et.al. | 2403.14743 | null |
2024-03-21 | Visibility-Aware Keypoint Localization for 6DoF Object Pose Estimation | Ruyi Lian et.al. | 2403.14559 | null |
2024-03-21 | Exploring 3D Human Pose Estimation and Forecasting from the Robot's Perspective: The HARPER Dataset | Andrea Avogaro. Andrea Toaiari et.al. | 2403.14447 | null |
2024-03-21 | Evaluation and Deployment of LiDAR-based Place Recognition in Dense Forests | Haedam Oh et.al. | 2403.14326 | null |
2024-03-21 | Zero123-6D: Zero-shot Novel View Synthesis for RGB Category-level 6D Pose Estimation | Francesco Di Felice et.al. | 2403.14279 | null |
2024-03-20 | DVMNet: Computing Relative Pose for Unseen Objects Beyond Hypotheses | Chen Zhao et.al. | 2403.13683 | link |
2024-03-20 | Meta-Point Learning and Refining for Category-Agnostic Pose Estimation | Junjie Chen et.al. | 2403.13647 | link |
2024-03-20 | Advancing 6D Pose Estimation in Augmented Reality -- Overcoming Projection Ambiguity with Uncontrolled Imagery | Mayura Manawadu et.al. | 2403.13434 | null |
2024-03-20 | DOR3D-Net: Dense Ordinal Regression Network for 3D Hand Pose Estimation | Yamin Mao et.al. | 2403.13405 | null |
2024-03-20 | ManiPose: A Comprehensive Benchmark for Pose-aware Object Manipulation in Robotics | Qiaojun Yu et.al. | 2403.13365 | null |
2024-03-20 | MULAN-WC: Multi-Robot Localization Uncertainty-aware Active NeRF with Wireless Coordination | Weiying Wang et.al. | 2403.13348 | null |
2024-03-19 | FaceXFormer: A Unified Transformer for Facial Analysis | Kartik Narayan et.al. | 2403.12960 | null |
2024-03-19 | WHAC: World-grounded Humans and Cameras | Wanqi Yin et.al. | 2403.12959 | null |
2024-03-19 | Diffusion-Driven Self-Supervised Learning for Shape Reconstruction and Pose Estimation | Jingtao Sun et.al. | 2403.12728 | link |
2024-03-19 | IFFNeRF: Initialisation Free and Fast 6DoF pose estimation from a single image and a NeRF model | Matteo Bortolon et.al. | 2403.12682 | null |
2024-03-19 | In-Hand Following of Deformable Linear Objects Using Dexterous Fingers with Tactile Sensing | Mingrui Yu et.al. | 2403.12676 | null |
2024-03-19 | Self-learning Canonical Space for Multi-view 3D Human Pose Estimation | Xiaoben Li et.al. | 2403.12440 | null |
2024-03-19 | Human Mesh Recovery from Arbitrary Multi-view Images | Xiaoben Li et.al. | 2403.12434 | null |
2024-03-19 | XPose: eXplainable Human Pose Estimation | Luyu Qiu et.al. | 2403.12370 | null |
2024-03-18 | HOIDiffusion: Generating Realistic 3D Hand-Object Interaction Data | Mengqi Zhang et.al. | 2403.12011 | null |
2024-03-18 | Normalized Validity Scores for DNNs in Regression based Eye Feature Extraction | Wolfgang Fuhl et.al. | 2403.11665 | null |
2024-03-18 | An Accurate and Real-time Relative Pose Estimation from Triple Point-line Images by Decoupling Rotation and Translation | Zewen Xu et.al. | 2403.11639 | null |
2024-03-18 | LoRA-Composer: Leveraging Low-Rank Adaptation for Multi-Concept Customization in Training-Free Diffusion Models | Yang Yang et.al. | 2403.11627 | link |
2024-03-18 | GenFlow: Generalizable Recurrent Flow for 6D Pose Refinement of Novel Objects | Sungphill Moon et.al. | 2403.11510 | null |
2024-03-17 | A Dual-Augmentor Framework for Domain Generalization in 3D Human Pose Estimation | Qucheng Peng et.al. | 2403.11310 | null |
2024-03-17 | Compact 3D Gaussian Splatting For Dense Visual SLAM | Tianchen Deng et.al. | 2403.11247 | null |
2024-03-16 | Robotic Task Success Evaluation Under Multi-modal Non-Parametric Object Pose Uncertainty | Lakshadeep Naik et.al. | 2403.10874 | null |
2024-03-16 | DPPE: Dense Pose Estimation in a Plenoxels Environment using Gradient Approximation | Christopher Kolios et.al. | 2403.10773 | null |
2024-03-15 | GS-Pose: Cascaded Framework for Generalizable Segmentation-based 6D Object Pose Estimation | Dingding Cai et.al. | 2403.10683 | null |
2024-03-15 | CLOSURE: Fast Quantification of Pose Uncertainty Sets | Yihuai Gao et.al. | 2403.09990 | null |
2024-03-14 | Scalable Autonomous Drone Flight in the Forest with Visual-Inertial SLAM and Dense Submaps Built without LiDAR | Sebastián Barbas Laina et.al. | 2403.09596 | null |
2024-03-14 | Improving Real-Time Omnidirectional 3D Multi-Person Human Pose Estimation with People Matching and Unsupervised 2D-3D Lifting | Pawel Knap et.al. | 2403.09437 | null |
2024-03-14 | LM2D: Lyrics- and Music-Driven Dance Synthesis | Wenjie Yin et.al. | 2403.09407 | null |
2024-03-14 | SD-Net: Symmetric-Aware Keypoint Prediction and Domain Adaptation for 6D Pose Estimation In Bin-picking Scenarios | Ding-Tao Huang et.al. | 2403.09317 | link |
2024-03-14 | MOTPose: Multi-object 6D Pose Estimation for Dynamic Video Sequences using Attention-based Temporal Fusion | Arul Selvam Periyasamy et.al. | 2403.09309 | null |
2024-03-13 | Data Augmentation in Human-Centric Vision | Wentao Jiang et.al. | 2403.08650 | null |
2024-03-13 | PRAGO: Differentiable Multi-View Pose Optimization From Objectness Detections | Matteo Taiana et.al. | 2403.08586 | null |
2024-03-13 | NeRF-Supervised Feature Point Detection and Description | Ali Youssef et.al. | 2403.08156 | null |
2024-03-12 | Q-SLAM: Quadric Representations for Monocular SLAM | Chensheng Peng et.al. | 2403.08125 | null |
2024-03-12 | MRC-Net: 6-DoF Pose Estimation with MultiScale Residual Correlation | Yuelong Li et.al. | 2403.08019 | null |
2024-03-12 | Uncertainty Quantification with Deep Ensembles for 6D Object Pose Estimation | Kira Wursthorn et.al. | 2403.07741 | null |
2024-03-12 | Adaptive Fusion of Single-View and Multi-View Depth for Autonomous Driving | JunDa Cheng et.al. | 2403.07535 | null |
2024-03-12 | Category-Agnostic Pose Estimation for Point Clouds | Bowen Liu et.al. | 2403.07437 | null |
2024-03-12 | Monocular Microscope to CT Registration using Pose Estimation of the Incus for Augmented Reality Cochlear Implant Surgery | Yike Zhang et.al. | 2403.07219 | null |
2024-03-11 | Real-Time Simulated Avatar from Head-Mounted Sensors | Zhengyi Luo et.al. | 2403.06862 | null |
2024-03-11 | Transformer-based Fusion of 2D-pose and Spatio-temporal Embeddings for Distracted Driver Action Recognition | Erkut Akdag et.al. | 2403.06577 | null |
2024-03-10 | Platypose: Calibrated Zero-Shot Multi-Hypothesis 3D Human Motion Estimation | Paweł A. Pierzchlewicz et.al. | 2403.06164 | link |
2024-03-10 | Diffusion Models Trained with Large Data Are Transferable Visual Models | Guangkai Xu et.al. | 2403.06090 | null |
2024-03-08 | Prepared for the Worst: A Learning-Based Adversarial Attack for Resilience Analysis of the ICP Algorithm | Ziyu Zhang et.al. | 2403.05666 | null |
2024-03-11 | Exploiting polar symmetry in designing equivariant observers for vision-based motion estimation | Tarek Bouazza et.al. | 2403.05450 | null |
2024-03-07 | Real-Time Planning Under Uncertainty for AUVs Using Virtual Maps | Ivana Collado-Gonzalez et.al. | 2403.04936 | null |
2024-03-07 | That's My Point: Compact Object-centric LiDAR Pose Estimation for Large-scale Outdoor Localisation | Georgi Pramatarov et.al. | 2403.04755 | null |
2024-03-07 | Disentangled Diffusion-Based 3D Human Pose Estimation with Hierarchical Spatial and Temporal Denoiser | Qingyuan Cai et.al. | 2403.04444 | null |
2024-03-09 | Single-to-Dual-View Adaptation for Egocentric 3D Hand Pose Estimation | Ruicong Liu et.al. | 2403.04381 | null |
2024-03-05 | FAR: Flexible, Accurate and Robust 6DoF Relative Camera Pose Estimation | Chris Rockwell et.al. | 2403.03221 | null |
2024-03-05 | NRDF: Neural Riemannian Distance Fields for Learning Articulated Pose Priors | Yannan He et.al. | 2403.03122 | null |
2024-03-05 | Improved LiDAR Odometry and Mapping using Deep Semantic Segmentation and Novel Outliers Detection | Mohamed Afifi et.al. | 2403.03111 | null |
2024-03-05 | Splat-Nav: Safe Real-Time Robot Navigation in Gaussian Splatting Maps | Timothy Chen et.al. | 2403.02751 | null |
2024-03-04 | PowerSkel: A Device-Free Framework Using CSI Signal for Human Skeleton Estimation in Power Station | Cunyi Yin et.al. | 2403.01913 | link |
2024-03-04 | A Simple Baseline for Efficient Hand Mesh Reconstruction | Zhishan Zhou et.al. | 2403.01813 | null |
2024-03-03 | MatchU: Matching Unseen Objects for 6D Pose Estimation from RGB-D Images | Junwen Huang et.al. | 2403.01517 | null |
2024-03-02 | Single-image camera calibration with model-free distortion correction | Katia Genovese et.al. | 2403.01263 | null |
2024-03-02 | Grid-based Fast and Structural Visual Odometry | Zhang Zhihe et.al. | 2403.01110 | null |
2024-03-01 | Optimal Robot Formations: Balancing Range-Based Observability and User-Defined Configurations | Syed Shabbir Ahmed et.al. | 2403.00988 | null |
2024-03-04 | TEXterity -- Tactile Extrinsic deXterity: Simultaneous Tactile Estimation and Control for Extrinsic Dexterity | Sangwoon Kim et.al. | 2403.00049 | null |
2024-03-01 | Graph Convolutional Neural Networks for Automated Echocardiography View Recognition: A Holistic Approach | Sarina Thomas et.al. | 2402.19062 | null |
2024-02-29 | Deep Learning for 3D Human Pose Estimation and Mesh Recovery: A Survey | Yang Liu et.al. | 2402.18844 | link |
2024-02-28 | Attention-Propagation Network for Egocentric Heatmap to 3D Pose Lifting | Taeho Kang et.al. | 2402.18330 | link |
2024-02-28 | Location-guided Head Pose Estimation for Fisheye Image | Bing Li et.al. | 2402.18320 | null |
2024-02-28 | NToP: NeRF-Powered Large-scale Dataset Generation for 2D and 3D Human Pose Estimation in Top-View Fisheye Images | Jingrui Yu et.al. | 2402.18196 | null |
2024-02-28 | Six-Point Method for Multi-Camera Systems with Reduced Solution Space | Banglei Guan et.al. | 2402.18066 | null |
2024-02-27 | Real-Time Estimation of Relative Pose for UAVs Using a Dual-Channel Feature Association | Zhaoying Wang et.al. | 2402.17504 | null |
2024-02-26 | HOISDF: Constraining 3D Hand-Object Pose Estimation with Global Signed Distance Fields | Haozhe Qi et.al. | 2402.17062 | link |
2024-02-26 | DRSI-Net: Dual-Residual Spatial Interaction Network for Multi-Person Pose Estimation | Shang Wu et.al. | 2402.16640 | null |
2024-02-26 | GEA: Reconstructing Expressive 3D Gaussian Avatar from Monocular Video | Xinqi Liu et.al. | 2402.16607 | null |
2024-02-26 | DreamUp3D: Object-Centric Generative Models for Single-View 3D Scene Understanding and Real-to-Sim Transfer | Yizhe Wu et.al. | 2402.16308 | null |
2024-02-25 | XAI-based gait analysis of patients walking with Knee-Ankle-Foot orthosis using video cameras | Arnav Mishra et.al. | 2402.16175 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow | Chaoyang Wang et.al. | 2405.20282 | link |
2024-05-30 | ETHER: Efficient Finetuning of Large-Scale Models with Hyperplane Reflections | Massimo Bini et.al. | 2405.20271 | link |
2024-05-30 | Boost Your Own Human Image Generation Model via Direct Preference Optimization with AI Feedback | Sanghyeon Na et.al. | 2405.20216 | null |
2024-05-30 | RIGID: A Training-free and Model-Agnostic Framework for Robust AI-Generated Image Detection | Zhiyuan He et.al. | 2405.20112 | null |
2024-05-30 | RTGen: Generating Region-Text Pairs for Open-Vocabulary Object Detection | Fangyi Chen et.al. | 2405.19854 | null |
2024-05-30 | Puff-Net: Efficient Style Transfer with Pure Content and Style Feature Fusion Network | Sizhe Zheng et.al. | 2405.19775 | null |
2024-05-30 | MAE-GAN: A Novel Strategy for Simultaneous Super-resolution Reconstruction and Denoising of Post-stack Seismic Profile | Wenshuo Yu et.al. | 2405.19767 | null |
2024-05-30 | Mitigating annotation shift in cancer classification using single image generative models | Marta Buetas Arcas et.al. | 2405.19754 | link |
2024-05-30 | Uncertainty-guided Optimal Transport in Depth Supervised Sparse-View 3D Gaussian | Wei Sun et.al. | 2405.19657 | null |
2024-05-29 | Quo Vadis ChatGPT? From Large Language Models to Large Knowledge Models | Venkat Venkatasubramanian et.al. | 2405.19561 | null |
2024-05-29 | ConceptPrune: Concept Editing in Diffusion Models via Skilled Neuron Pruning | Ruchika Chavhan et.al. | 2405.19237 | link |
2024-05-29 | Going beyond compositional generalization, DDPMs can produce zero-shot interpolation | Justin Deschenaux et.al. | 2405.19201 | link |
2024-05-29 | The ethical situation of DALL-E 2 | Eduard Hogea et.al. | 2405.19176 | null |
2024-05-29 | Patch-enhanced Mask Encoder Prompt Image Generation | Shusong Xu et.al. | 2405.19085 | null |
2024-05-29 | EasyAnimate: A High-Performance Long Video Generation Method based on Transformer Architecture | Jiaqi Xu et.al. | 2405.18991 | link |
2024-05-29 | Topological Perspectives on Optimal Multimodal Embedding Spaces | Abdul Aziz A. B et.al. | 2405.18867 | null |
2024-05-29 | Flow Priors for Linear Inverse Problems via Iterative Corrupted Trajectory Matching | Yasi Zhang et.al. | 2405.18816 | null |
2024-05-29 | SketchTriplet: Self-Supervised Scenarized Sketch-Text-Image Triplet Generation | Zhenbei Wu et.al. | 2405.18801 | null |
2024-05-29 | Inpaint Biases: A Pathway to Accurate and Unbiased Image Generation | Jiyoon Myung et.al. | 2405.18762 | null |
2024-05-29 | SketchDeco: Decorating B&W Sketches with Colour | Chaitat Utintu et.al. | 2405.18716 | null |
2024-05-28 | Phased Consistency Model | Fu-Yun Wang et.al. | 2405.18407 | null |
2024-05-28 | Multi-modal Generation via Cross-Modal In-Context Learning | Amandeep Kumar et.al. | 2405.18304 | link |
2024-05-28 | Are Image Distributions Indistinguishable to Humans Indistinguishable to Classifiers? | Zebin You et.al. | 2405.18029 | null |
2024-05-28 | Cycle-YOLO: A Efficient and Robust Framework for Pavement Damage Detection | Zhengji Li et.al. | 2405.17905 | null |
2024-05-27 | RefDrop: Controllable Consistency in Image or Video Generation via Reference Feature Guidance | Jiaojiao Fan et.al. | 2405.17661 | null |
2024-05-27 | Enhancing Global Sensitivity and Uncertainty Quantification in Medical Image Reconstruction with Monte Carlo Arbitrary-Masked Mamba | Jiahao Huang et.al. | 2405.17659 | null |
2024-05-27 | EM-GANSim: Real-time and Accurate EM Simulation Using Conditional GANs for 3D Indoor Scenes | Ruichen Wang et.al. | 2405.17366 | null |
2024-05-27 | Prompt Optimization with Human Feedback | Xiaoqiang Lin et.al. | 2405.17346 | link |
2024-05-27 | From Text to Blueprint: Leveraging Text-to-Image Tools for Floor Plan Creation | Xiaoyu Li et.al. | 2405.17236 | null |
2024-05-27 | MCGAN: Enhancing GAN Training with Regression-Based Generator Loss | Baoren Xiao et.al. | 2405.17191 | null |
2024-05-27 | Training-free Editioning of Text-to-Image Models | Jinqi Wang et.al. | 2405.17069 | null |
2024-05-27 | The Poisson Midpoint Method for Langevin Dynamics: Provably Efficient Discretization for Diffusion Models | Saravanan Kandasamy et.al. | 2405.17068 | null |
2024-05-27 | Glauber Generative Model: Discrete Diffusion Models via Binary Classification | Harshit Varma et.al. | 2405.17035 | null |
2024-05-27 | A Correlation- and Mean-Aware Loss Function and Benchmarking Framework to Improve GAN-based Tabular Data Synthesis | Minh H. Vu et.al. | 2405.16971 | null |
2024-05-27 | Anonymization Prompt Learning for Facial Privacy-Preserving Text-to-Image Generation | Liang Shi et.al. | 2405.16895 | null |
2024-05-27 | Think Before You Act: A Two-Stage Framework for Mitigating Gender Bias Towards Vision-Language Tasks | Yunqi Zhang et.al. | 2405.16860 | link |
2024-05-24 | Learning to Discretize Denoising Diffusion ODEs | Vinh Tong et.al. | 2405.15506 | null |
2024-05-24 | A Misleading Gallery of Fluid Motion by Generative Artificial Intelligence | Ali Kashefi et.al. | 2405.15406 | null |
2024-05-24 | Stochastic SR for Gaussian microtextures | Emile Pierret et.al. | 2405.15399 | null |
2024-05-24 | Challenges and Opportunities in 3D Content Generation | Ke Zhao et.al. | 2405.15335 | null |
2024-05-24 | Towards Understanding the Working Mechanism of Text-to-Image Diffusion Model | Mingyang Yi et.al. | 2405.15330 | null |
2024-05-24 | SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance | Guibao Shen et.al. | 2405.15321 | null |
2024-05-24 | Decaf: Data Distribution Decompose Attack against Federated Learning | Zhiyang Dai et.al. | 2405.15316 | null |
2024-05-24 | Unlearning Concepts in Diffusion Model via Concept Domain Correction and Concept Preserving Gradient | Yongliang Wu et.al. | 2405.15304 | null |
2024-05-24 | StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models | Chengming Xu et.al. | 2405.15287 | null |
2024-05-24 | Defensive Unlearning with Adversarial Training for Robust Concept Erasure in Diffusion Models | Yimeng Zhang et.al. | 2405.15234 | link |
2024-05-23 | Improved Distribution Matching Distillation for Fast Image Synthesis | Tianwei Yin et.al. | 2405.14867 | null |
2024-05-23 | Semantica: An Adaptable Image-Conditioned Diffusion Model | Manoj Kumar et.al. | 2405.14857 | null |
2024-05-23 | TerDiT: Ternary Diffusion Models with Transformers | Xudong Lu et.al. | 2405.14854 | link |
2024-05-23 | Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models | Katherine Xu et.al. | 2405.14828 | null |
2024-05-24 | Fast-DDPM: Fast Denoising Diffusion Probabilistic Models for Medical Image-to-Image Generation | Hongxu Jiang et.al. | 2405.14802 | null |
2024-05-23 | Membership Inference on Text-to-Image Diffusion Models via Conditional Likelihood Discrepancy | Shengfang Zhai et.al. | 2405.14800 | null |
2024-05-23 | RetAssist: Facilitating Vocabulary Learners with Generative Images in Story Retelling Practices | Qiaoyi Chen et.al. | 2405.14794 | null |
2024-05-23 | OpFlowTalker: Realistic and Natural Talking Face Generation via Optical Flow Guidance | Shuheng Ge et.al. | 2405.14709 | null |
2024-05-23 | Learning Multi-dimensional Human Preference for Text-to-Image Generation | Sixian Zhang et.al. | 2405.14705 | null |
2024-05-23 | RectifID: Personalizing Rectified Flow with Anchored Classifier Guidance | Zhicheng Sun et.al. | 2405.14677 | link |
2024-05-21 | Personalized Residuals for Concept-Driven Text-to-Image Generation | Cusuh Ham et.al. | 2405.12978 | null |
2024-05-21 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan et.al. | 2405.12914 | null |
2024-05-21 | Spatial-aware Attention Generative Adversarial Network for Semi-supervised Anomaly Detection in Medical Image | Zerui Zhang et.al. | 2405.12872 | null |
2024-05-21 | A Dataset and Baselines for Measuring and Predicting the Music Piece Memorability | Li-Yang Tseng et.al. | 2405.12847 | null |
2024-05-21 | Leveraging Neural Radiance Fields for Pose Estimation of an Unknown Space Object during Proximity Operations | Antoine Legrand et.al. | 2405.12728 | null |
2024-05-21 | CustomText: Customized Textual Image Generation using Diffusion Models | Shubham Paliwal et.al. | 2405.12531 | null |
2024-05-20 | Diffusion for World Modeling: Visual Details Matter in Atari | Eloi Alonso et.al. | 2405.12399 | link |
2024-05-20 | Paired Conditional Generative Adversarial Network for Highly Accelerated Liver 4D MRI | Di Xu et.al. | 2405.12357 | null |
2024-05-20 | EGAN: Evolutional GAN for Ransomware Evasion | Daniel Commey et.al. | 2405.12266 | null |
2024-05-20 | Slicedit: Zero-Shot Video Editing With Text-to-Image Diffusion Models Using Spatio-Temporal Slices | Nathaniel Cohen et.al. | 2405.12211 | null |
2024-05-20 | Diffusion Models for Generating Ballistic Spacecraft Trajectories | Tyler Presser et.al. | 2405.11738 | null |
2024-05-19 | URDFormer: A Pipeline for Constructing Articulated Simulation Environments from Real-World Images | Zoey Chen et.al. | 2405.11656 | null |
2024-05-19 | Nickel and Diming Your GAN: A Dual-Method Approach to Enhancing GAN Efficiency via Knowledge Distillation | Sangyeop Yeo et.al. | 2405.11614 | null |
2024-05-19 | A GAN-Based Data Poisoning Attack Against Federated Learning Systems and Its Countermeasure | Wei Sun et.al. | 2405.11440 | null |
2024-05-18 | UPAM: Unified Prompt Attack in Text-to-Image Generation Models Against Both Textual Filters and Visual Checkers | Duo Peng et.al. | 2405.11336 | null |
2024-05-18 | On the Trajectory Regularity of ODE-based Diffusion Sampling | Defang Chen et.al. | 2405.11326 | null |
2024-05-18 | Few-Shot API Attack Detection: Overcoming Data Scarcity with GAN-Inspired Learning | Udi Aharon et.al. | 2405.11258 | null |
2024-05-18 | TriLoRA: Integrating SVD for Advanced Style Personalization in Text-to-Image Generation | Chengcheng Feng et.al. | 2405.11236 | null |
2024-05-17 | Improving face generation quality and prompt following with synthetic captions | Michail Tarasiou et.al. | 2405.10864 | null |
2024-05-17 | Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view Image | Jianshun Zeng et.al. | 2405.10504 | null |
2024-05-17 | Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of Transformers | Rya Sanovar et.al. | 2405.10480 | null |
2024-05-16 | Analogist: Out-of-the-box Visual In-Context Learning with Image Diffusion Model | Zheng Gu et.al. | 2405.10316 | null |
2024-05-16 | UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | Sahel Sharifymoghaddam et.al. | 2405.10311 | null |
2024-05-16 | VirtualModel: Generating Object-ID-retentive Human-object Interaction Image by Diffusion Model for E-commerce Marketing | Binghui Chen et.al. | 2405.09985 | null |
2024-05-16 | KPNDepth: Depth Estimation of Lane Images under Complex Rainy Environment | Zhengxu Shi et.al. | 2405.09964 | null |
2024-05-16 | Chameleon: Mixed-Modal Early-Fusion Foundation Models | Chameleon Team et.al. | 2405.09818 | null |
2024-05-16 | MediSyn: Text-Guided Diffusion Models for Broad Medical 2D and 3D Image Synthesis | Joseph Cho et.al. | 2405.09806 | null |
2024-05-16 | An Autoencoder and Generative Adversarial Networks Approach for Multi-Omics Data Imbalanced Class Handling and Classification | Ibrahim Al-Hurani et.al. | 2405.09756 | null |
2024-05-15 | Towards Evaluating the Robustness of Automatic Speech Recognition Systems via Audio Style Transfer | Weifei Jin et.al. | 2405.09470 | null |
2024-05-16 | Global-Local Image Perceptual Score (GLIPS): Evaluating Photorealistic Quality of AI-Generated Images | Memoona Aziz et.al. | 2405.09426 | null |
2024-05-15 | DeCoDEx: Confounder Detector Guidance for Improved Diffusion-based Counterfactual Explanations | Nima Fathi et.al. | 2405.09288 | link |
2024-05-15 | SOEDiff: Efficient Distillation for Small Object Editing | Qihe Pan et.al. | 2405.09114 | null |
2024-05-15 | Deep Learning in Earthquake Engineering: A Comprehensive Review | Yazhou Xie et.al. | 2405.09021 | null |
2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748 | link |
2024-05-15 | Similarity Metrics for MR Image-To-Image Translation | Melanie Dohmen et.al. | 2405.08431 | null |
2024-05-14 | Compositional Text-to-Image Generation with Dense Blob Representations | Weili Nie et.al. | 2405.08246 | null |
2024-05-13 | RATLIP: Generative Adversarial CLIP Text-to-Image Synthesis Based on Recurrent Affine Transformations | Chengde Lin et.al. | 2405.08114 | link |
2024-05-13 | CTRLorALTer: Conditional LoRAdapter for Efficient 0-Shot Control & Altering of T2I Models | Nick Stracke et.al. | 2405.07913 | null |
2024-05-13 | SAR Image Synthesis with Diffusion Models | Denisa Qosja et.al. | 2405.07776 | null |
2024-05-12 | Semantic Loss Functions for Neuro-Symbolic Structured Prediction | Kareem Ahmed et.al. | 2405.07387 | null |
2024-05-12 | Understanding and Evaluating Human Preferences for AI Generated Images with Instruction Tuning | Jiarui Wang et.al. | 2405.07346 | link |
2024-05-12 | PotatoGANs: Utilizing Generative Adversarial Networks, Instance Segmentation, and Explainable AI for Enhanced Potato Disease Identification and Classification | Mohammad Shafiul Alam et.al. | 2405.07332 | link |
2024-05-12 | Stable Signature is Unstable: Removing Image Watermark from Diffusion Models | Yuepeng Hu et.al. | 2405.07145 | null |
2024-05-12 | MAxPrototyper: A Multi-Agent Generation System for Interactive User Interface Prototyping | Mingyue Yuan et.al. | 2405.07131 | null |
2024-05-11 | Unsupervised Density Neural Representation for CT Metal Artifact Reduction | Qing Wu et.al. | 2405.07047 | null |
2024-05-11 | Semantic Guided Large Scale Factor Remote Sensing Image Super-resolution with Generative Diffusion Prior | Ce Wang et.al. | 2405.07044 | link |
2024-05-11 | Training-free Subject-Enhanced Attention Guidance for Compositional Text-to-image Generation | Shengyuan Liu et.al. | 2405.06948 | null |
2024-05-10 | Controllable Image Generation With Composed Parallel Token Prediction | Jamie Stirling et.al. | 2405.06535 | null |
2024-05-10 | SketchDream: Sketch-based Text-to-3D Generation and Editing | Feng-Lin Liu et.al. | 2405.06461 | null |
2024-05-09 | Photonic quantum generative adversarial networks for classical data | Tigran Sedrakyan et.al. | 2405.06023 | null |
2024-05-09 | Frame Interpolation with Consecutive Brownian Bridge Diffusion | Zonglin Lyu et.al. | 2405.05953 | null |
2024-05-09 | Could It Be Generated? Towards Practical Analysis of Memorization in Text-To-Image Diffusion Models | Zhe Ma et.al. | 2405.05846 | null |
2024-05-10 | MasterWeaver: Taming Editability and Identity for Personalized Text-to-Image Generation | Yuxiang Wei et.al. | 2405.05806 | link |
2024-05-09 | Exploring Text-Guided Single Image Editing for Remote Sensing Images | Fangzhou Han et.al. | 2405.05769 | null |
2024-05-09 | End-to-End Generative Semantic Communication Powered by Shared Semantic Knowledge Base | Shuling Li et.al. | 2405.05738 | null |
2024-05-09 | VM-DDPM: Vision Mamba Diffusion for Medical Image Synthesis | Zhihan Ju et.al. | 2405.05667 | null |
2024-05-09 | A Survey on Personalized Content Synthesis with Diffusion Models | Xulu Zhang et.al. | 2405.05538 | null |
2024-05-09 | Characteristic Learning for Provable One Step Generation | Zhao Ding et.al. | 2405.05512 | link |
2024-05-08 | Cross-Modality Translation with Generative Adversarial Networks to Unveil Alzheimer's Disease Biomarkers | Reihaneh Hassanzadeh et.al. | 2405.05462 | null |
2024-05-08 | DrawL: Understanding the Effects of Non-Mainstream Dialects in Prompted Image Generation | Joshua N. Williams et.al. | 2405.05382 | null |
2024-05-08 | Diffusion-HMC: Parameter Inference with Diffusion Model driven Hamiltonian Monte Carlo | Nayantara Mudur et.al. | 2405.05255 | link |
2024-05-08 | StyleMamba : State Space Model for Efficient Text-driven Image Style Transfer | Zijia Wang et.al. | 2405.05027 | null |
2024-05-08 | Discrepancy-based Diffusion Models for Lesion Detection in Brain MRI | Keqiang Fan et.al. | 2405.04974 | null |
2024-05-08 | Improving Long Text Understanding with Knowledge Distilled from Summarization Model | Yan Liu et.al. | 2405.04955 | null |
2024-05-08 | HAGAN: Hybrid Augmented Generative Adversarial Network for Medical Image Synthesis | Zhihan Ju et.al. | 2405.04902 | null |
2024-05-08 | FlexEControl: Flexible and Efficient Multimodal Control for Text-to-Image Generation | Xuehai He et.al. | 2405.04834 | null |
2024-05-07 | TexControl: Sketch-Based Two-Stage Fashion Image Generation Using Diffusion Model | Yongming Zhang et.al. | 2405.04675 | null |
2024-05-07 | ResNCT: A Deep Learning Model for the Synthesis of Nephrographic Phase Images in CT Urography | Syed Jamal Safdar Gardezi et.al. | 2405.04629 | null |
2024-05-07 | SingIt! Singer Voice Transformation | Amit Eliav et.al. | 2405.04627 | null |
2024-05-07 | Towards Geographic Inclusion in the Evaluation of Text-to-Image Models | Melissa Hall et.al. | 2405.04457 | null |
2024-05-07 | Data augmentation experiments with style-based quantum generative adversarial networks on trapped-ion and superconducting-qubit technologies | Julien Baglio et.al. | 2405.04401 | null |
2024-05-07 | Diffusion-driven GAN Inversion for Multi-Modal Face Image Generation | Jihyun Kim et.al. | 2405.04356 | null |
2024-05-07 | Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer | Zhuoyi Yang et.al. | 2405.04312 | link |
2024-05-07 | Improving Offline Reinforcement Learning with Inaccurate Simulators | Yiwen Hou et.al. | 2405.04307 | null |
2024-05-07 | Bayesian Simultaneous Localization and Multi-Lane Tracking Using Onboard Sensors and a SD Map | Yuxuan Xia et.al. | 2405.04290 | null |
2024-05-07 | Bidirectional Adversarial Autoencoders for the design of Plasmonic Metasurfaces | Yuansan Liu et.al. | 2405.04056 | link |
2024-05-07 | Simple Drop-in LoRA Conditioning on Attention Layers Will Improve Your Diffusion Model | Joo Young Choi et.al. | 2405.03958 | null |
2024-05-06 | Generated Contents Enrichment | Mahdi Naseri et.al. | 2405.03650 | null |
2024-05-06 | CCDM: Continuous Conditional Diffusion Models for Image Generation | Xin Ding et.al. | 2405.03546 | link |
2024-05-06 | GLIP: Electromagnetic Field Exposure Map Completion by Deep Generative Networks | Mohammed Mallik et.al. | 2405.03384 | null |
2024-05-05 | AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection | Aditya Singh et.al. | 2405.03075 | null |
2024-05-05 | Boundary-aware Decoupled Flow Networks for Realistic Extreme Rescaling | Jinmin Li et.al. | 2405.02941 | null |
2024-05-05 | Data-Efficient Molecular Generation with Hierarchical Textual Inversion | Seojin Kim et.al. | 2405.02845 | null |
2024-05-05 | SMCD: High Realism Motion Style Transfer via Mamba-based Diffusion | Ziyun Qian et.al. | 2405.02844 | null |
2024-05-05 | ImageInWords: Unlocking Hyper-Detailed Image Descriptions | Roopal Garg et.al. | 2405.02793 | link |
2024-05-04 | U-DiTs: Downsample Tokens in U-Shaped Diffusion Transformers | Yuchuan Tian et.al. | 2405.02730 | null |
2024-05-03 | Functional Imaging Constrained Diffusion for Brain PET Synthesis from Structural MRI | Minhui Yu et.al. | 2405.02504 | null |
2024-05-03 | Multi-method Integration with Confidence-based Weighting for Zero-shot Image Classification | Siqi Yin et.al. | 2405.02155 | null |
2024-05-03 | Reconstructing the mid-infrared spectra of galaxies using ultraviolet to submillimeter photometry and Deep Generative Networks | Agapi Rissaki et.al. | 2405.02153 | null |
2024-05-03 | Three-Dimensional Amyloid-Beta PET Synthesis from Structural MRI with Conditional Generative Adversarial Networks | Fernando Vega et.al. | 2405.02109 | null |
2024-05-03 | AI-generated art perceptions with GenFrame -- an image-generating picture frame | Peter Kun et.al. | 2405.01901 | null |
2024-05-03 | Defect Image Sample Generation With Diffusion Prior for Steel Surface Defect Recognition | Yichun Tai et.al. | 2405.01872 | null |
2024-05-03 | Report on the AAPM Grand Challenge on deep generative modeling for learning medical image statistics | Rucha Deshpande et.al. | 2405.01822 | null |
2024-05-02 | Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning | Rafael Elberg et.al. | 2405.01705 | link |
2024-05-02 | Investigation on optimal microstructure of dual-phase steel with high strength and ductility by machine learning | Misato Suzuki et.al. | 2405.01689 | null |
2024-05-02 | Improving Subject-Driven Image Synthesis with Subject-Agnostic Guidance | Kelvin C. K. Chan et.al. | 2405.01356 | null |
2024-05-02 | Towards Inclusive Face Recognition Through Synthetic Ethnicity Alteration | Praveen Kumar Chandaliya et.al. | 2405.01273 | null |
2024-05-02 | DiffusionPipe: Training Large Diffusion Models with Efficient Pipelines | Ye Tian et.al. | 2405.01248 | null |
2024-05-02 | On Mechanistic Knowledge Localization in Text-to-Image Generative Models | Samyadeep Basu et.al. | 2405.01008 | null |
2024-05-01 | SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models | Burak Can Biner et.al. | 2405.00878 | null |
2024-05-01 | Guided Conditional Diffusion Classifier (ConDiff) for Enhanced Prediction of Infection in Diabetic Foot Ulcers | Palawat Busaranuvong et.al. | 2405.00858 | null |
2024-05-01 | RGB |
Zheng Zeng et.al. | 2405.00666 | null |
2024-05-01 | UWAFA-GAN: Ultra-Wide-Angle Fluorescein Angiography Transformation via Multi-scale Generation and Registration Enhancement | Ruiquan Ge et.al. | 2405.00542 | link |
2024-05-01 | Compressive Sensing Imaging Using Caustic Lens Mask Generated by Periodic Perturbation in a Ripple Tank | Doğan Tunca Arık et.al. | 2405.00407 | null |
2024-05-01 | Beamforming Inferring by Conditional WGAN-GP for Holographic Antenna Arrays | Fenghao Zhu et.al. | 2405.00391 | null |
2024-05-01 | Streamlining Image Editing with Layered Diffusion Brushes | Peyman Gholami et.al. | 2405.00313 | null |
2024-04-30 | IgCONDA-PET: Implicitly-Guided Counterfactual Diffusion for Detecting Anomalies in PET Images | Shadab Ahamed et.al. | 2405.00239 | link |
2024-04-30 | DOCCI: Descriptions of Connected and Contrasting Images | Yasumasa Onoe et.al. | 2404.19753 | null |
2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
2024-04-30 | SwipeGANSpace: Swipe-to-Compare Image Generation via Efficient Latent Space Exploration | Yuto Nakashima et.al. | 2404.19693 | null |
2024-04-30 | Seeing Through the Clouds: Cloud Gap Imputation with Prithvi Foundation Model | Denys Godwin et.al. | 2404.19609 | null |
2024-04-30 | TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models | Teng Zhou et.al. | 2404.19475 | null |
2024-04-30 | InstantFamily: Masked Attention for Zero-shot Multi-ID Image Generation | Chanran Kim et.al. | 2404.19427 | null |
2024-05-01 | Mapping New Realities: Ground Truth Image Creation with Pix2Pix Image-to-Image Translation | Zhenglin Li et.al. | 2404.19265 | null |
2024-05-01 | FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills | Yongqiang Zhao et.al. | 2404.19217 | null |
2024-04-30 | NeRF-Insert: 3D Local Editing with Multimodal Control Signals | Benet Oriol Sabat et.al. | 2404.19204 | null |
2024-04-29 | DGE: Direct Gaussian 3D Editing by Consistent Multi-view Editing | Minghao Chen et.al. | 2404.18929 | null |
2024-04-29 | TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation | Junhao Cheng et.al. | 2404.18919 | null |
2024-04-29 | Hide and Seek: How Does Watermarking Impact Face Recognition? | Yuguang Yao et.al. | 2404.18890 | null |
2024-04-29 | Learning Mixtures of Gaussians Using Diffusion Models | Khashayar Gatmiry et.al. | 2404.18869 | null |
2024-04-29 | Socially Adaptive Path Planning Based on Generative Adversarial Network | Yao Wang et.al. | 2404.18687 | null |
2024-04-29 | FlexiFilm: Long Video Generation with Flexible Conditions | Yichen Ouyang et.al. | 2404.18620 | link |
2024-04-29 | Anywhere: A Multi-Agent Framework for Reliable and Diverse Foreground-Conditioned Image Inpainting | Tianyidan Xie et.al. | 2404.18598 | null |
2024-04-29 | SIDBench: A Python Framework for Reliably Assessing Synthetic Image Detection Methods | Manos Schinas et.al. | 2404.18552 | link |
2024-04-29 | Towards Image Synthesis with Photon Counting Stellar Intensity Interferometry | Alessia Spolon et.al. | 2404.18507 | null |
2024-04-29 | Autonomous Quality and Hallucination Assessment for Virtual Tissue Staining and Digital Pathology | Luzhe Huang et.al. | 2404.18458 | null |
2024-04-26 | Federated Transfer Component Analysis Towards Effective VNF Profiling | Xunzheng ZhangB et.al. | 2404.17553 | null |
2024-04-26 | Spatial-frequency Dual-Domain Feature Fusion Network for Low-Light Remote Sensing Image Enhancement | Zishu Yao et.al. | 2404.17400 | null |
2024-04-26 | Trinity Detector:text-assisted and attention mechanisms based spectral fusion for diffusion generation image detection | Jiawei Song et.al. | 2404.17254 | null |
2024-04-26 | ObjectAdd: Adding Objects into Image via a Training-Free Diffusion Modification Fashion | Ziyue Zhang et.al. | 2404.17230 | link |
2024-04-26 | DPGAN: A Dual-Path Generative Adversarial Network for Missing Data Imputation in Graphs | Xindi Zheng et.al. | 2404.17164 | null |
2024-04-26 | An Investigation of Time-Frequency Representation Discriminators for High-Fidelity Vocoder | Yicheng Gu et.al. | 2404.17161 | null |
2024-04-26 | Synthesizing Iris Images using Generative Adversarial Networks: Survey and Comparative Analysis | Shivangi Yadav et.al. | 2404.17105 | null |
2024-04-25 | Channel Modeling for FR3 Upper Mid-band via Generative Adversarial Networks | Yaqi Hu et.al. | 2404.17069 | null |
2024-04-25 | DE-CGAN: Boosting rTMS Treatment Prediction with Diversity Enhancing Conditional Generative Adversarial Networks | Matthew Squires et.al. | 2404.16913 | null |
2024-04-25 | REBEL: Reinforcement Learning via Regressing Relative Rewards | Zhaolin Gao et.al. | 2404.16767 | null |
2024-04-25 | Denoising: from classical methods to deep CNNs | Jean-Eric Campagne et.al. | 2404.16617 | link |
2024-04-25 | MuseumMaker: Continual Style Customization without Catastrophic Forgetting | Chenxi Liu et.al. | 2404.16612 | null |
2024-04-25 | Conditional Distribution Modelling for Few-Shot Image Synthesis with Diffusion Models | Parul Gupta et.al. | 2404.16556 | null |
2024-04-25 | OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images | Ye Mao et.al. | 2404.16538 | null |
2024-04-25 | Cross-sensor super-resolution of irregularly sampled Sentinel-2 time series | Aimi Okabayashi et.al. | 2404.16409 | link |
2024-04-24 | Guardians of the Quantum GAN | Archisman Ghosh et.al. | 2404.16156 | null |
2024-04-24 | Quantitative Characterization of Retinal Features in Translated OCTA | Rashadul Hasan Badhon et.al. | 2404.16133 | null |
2024-04-24 | Spinning solar jets explained through the interplay between plasma sheets and vortex columns | Sahel Dey et.al. | 2404.16096 | null |
2024-04-24 | PuLID: Pure and Lightning ID Customization via Contrastive Alignment | Zinan Guo et.al. | 2404.16022 | null |
2024-04-24 | Security Analysis of WiFi-based Sensing Systems: Threats from Perturbation Attacks | Hangcheng Cao et.al. | 2404.15587 | null |
2024-04-23 | Multi-scale Intervention Planning based on Generative Design | Ioannis Kavouras et.al. | 2404.15492 | null |
2024-04-23 | ID-Aligner: Enhancing Identity-Preserving Text-to-Image Generation with Reward Feedback Learning | Weifeng Chen et.al. | 2404.15449 | null |
2024-04-23 | GLoD: Composing Global Contexts and Local Details in Image Generation | Moyuru Yamada et.al. | 2404.15447 | null |
2024-04-23 | From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation | Zehuan Huang et.al. | 2404.15267 | null |
2024-04-23 | Adaptive Mixed-Scale Feature Fusion Network for Blind AI-Generated Image Quality Assessment | Tianwei Zhou et.al. | 2404.15163 | null |
2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
2024-04-23 | CoARF: Controllable 3D Artistic Style Transfer for Radiance Fields | Deheng Zhang et.al. | 2404.14967 | null |
2024-04-23 | Music Style Transfer With Diffusion Model | Hong Huang et.al. | 2404.14771 | null |
2024-04-23 | SkinGEN: an Explainable Dermatology Diagnosis-to-Generation Framework with Interactive Vision-Language Models | Bo Lin et.al. | 2404.14755 | null |
2024-04-23 | Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning | Yuchao Liao et.al. | 2404.14754 | null |
2024-04-23 | FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction | Hang Hua et.al. | 2404.14715 | null |
2024-04-22 | The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking | Yuying Li et.al. | 2404.14581 | null |
2024-04-22 | GeoDiffuser: Geometry-Based Image Editing with Diffusion Models | Rahul Sajnani et.al. | 2404.14403 | null |
2024-04-22 | SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation | Yuying Ge et.al. | 2404.14396 | link |
2024-04-22 | MultiBooth: Towards Generating All Your Concepts in an Image from Text | Chenyang Zhu et.al. | 2404.14239 | link |
2024-04-22 | RHanDS: Refining Malformed Hands for Generated Images with Decoupled Structure and Style Guidance | Chengrui Wang et.al. | 2404.13984 | null |
2024-04-23 | Accelerating Image Generation with Sub-path Linear Approximation Model | Chen Xu et.al. | 2404.13903 | null |
2024-04-22 | Towards Better Text-to-Image Generation Alignment via Attention Modulation | Yihang Wu et.al. | 2404.13899 | null |
2024-04-22 | Regional Style and Color Transfer | Zhicheng Ding et.al. | 2404.13880 | null |
2024-04-22 | Distributional Black-Box Model Inversion Attack with Multi-Agent Reinforcement Learning | Huan Bao et.al. | 2404.13860 | null |
2024-04-22 | A Comparative Study on Enhancing Prediction in Social Network Advertisement through Data Augmentation | Qikai Yang et.al. | 2404.13812 | null |
2024-04-21 | Enforcing Conditional Independence for Fair Representation Learning and Causal Image Generation | Jensen Hwa et.al. | 2404.13798 | null |
2024-04-19 | RadRotator: 3D Rotation of Radiographs with Diffusion Models | Pouria Rouzrokh et.al. | 2404.13000 | null |
2024-04-19 | Robust CLIP-Based Detector for Exposing Diffusion Model-Generated Images | Santosh et.al. | 2404.12908 | link |
2024-04-19 | Explainable Deepfake Video Detection using Convolutional Neural Network and CapsuleNet | Gazi Hasin Ishrak et.al. | 2404.12841 | null |
2024-04-19 | Generative Modelling with High-Order Langevin Dynamics | Ziqiang Shi et.al. | 2404.12814 | null |
2024-04-19 | PATE-TripleGAN: Privacy-Preserving Image Synthesis with Gaussian Differential Privacy | Zepeng Jiang et.al. | 2404.12730 | null |
2024-04-19 | MLSD-GAN -- Generating Strong High Quality Face Morphing Attacks using Latent Semantic Disentanglement | Aravinda Reddy PN et.al. | 2404.12679 | null |
2024-04-19 | How Real Is Real? A Human Evaluation Framework for Unrestricted Adversarial Examples | Dren Fazlija et.al. | 2404.12653 | null |
2024-04-19 | F2FLDM: Latent Diffusion Models with Histopathology Pre-Trained Embeddings for Unpaired Frozen Section to FFPE Translation | Man M. Ho et.al. | 2404.12650 | null |
2024-04-18 | Alleviating Catastrophic Forgetting in Facial Expression Recognition with Emotion-Centered Models | Israel A. Laurensi et.al. | 2404.12260 | null |
2024-04-18 | First 2D electron density measurements using Coherence Imaging Spectroscopy in the MAST-U Super-X divertor | N. Lonigro et.al. | 2404.12021 | null |
2024-04-18 | ©Plug-in Authorization for Human Content Copyright Protection in Text-to-Image Model | Chao Zhou et.al. | 2404.11962 | null |
2024-04-18 | Sketch-guided Image Inpainting with Partial Discrete Diffusion Process | Nakul Sharma et.al. | 2404.11949 | link |
2024-04-18 | LD-Pruner: Efficient Pruning of Latent Diffusion Models using Task-Agnostic Insights | Thibault Castells et.al. | 2404.11936 | null |
2024-04-18 | EdgeFusion: On-Device Text-to-Image Generation | Thibault Castells et.al. | 2404.11925 | null |
2024-04-18 | Multi-view X-ray Image Synthesis with Multiple Domain Disentanglement from CT Scans | Lixing Tan et.al. | 2404.11889 | null |
2024-04-18 | Generating synthetic electroretinogram waveforms using Artificial Intelligence to improve classification of retinal conditions in under-represented populations | Mikhail Kulyabin et.al. | 2404.11842 | null |
2024-04-18 | TextCenGen: Attention-Guided Text-Centric Background Adaptation for Text-to-Image Generation | Tianyi Liang et.al. | 2404.11824 | null |
2024-04-18 | Tailoring Generative Adversarial Networks for Smooth Airfoil Design | Joyjit Chattoraj et.al. | 2404.11816 | null |
2024-04-17 | On the Scalability of GNNs for Molecular Graphs | Maciej Sypetkowski et.al. | 2404.11568 | null |
2024-04-17 | MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | Kuan-Chieh et.al. | 2404.11565 | null |
2024-04-17 | SSDiff: Spatial-spectral Integrated Diffusion Model for Remote Sensing Pansharpening | Yu Zhong et.al. | 2404.11537 | null |
2024-04-17 | Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt | Zhanjie Zhang et.al. | 2404.11474 | link |
2024-04-17 | What-if Analysis Framework for Digital Twins in 6G Wireless Network Management | Elif Ak et.al. | 2404.11394 | null |
2024-04-17 | Image Generative Semantic Communication with Multi-Modal Similarity Estimation for Resource-Limited Networks | Eri Hosonuma et.al. | 2404.11280 | null |
2024-04-17 | Optical Image-to-Image Translation Using Denoising Diffusion Models: Heterogeneous Change Detection as a Use Case | João Gabriel Vinholi et.al. | 2404.11243 | null |
2024-04-17 | KI-GAN: Knowledge-Informed Generative Adversarial Networks for Enhanced Multi-Vehicle Trajectory Forecasting at Signalized Intersections | Chuheng Wei et.al. | 2404.11181 | link |
2024-04-17 | TiNO-Edit: Timestep and Noise Optimization for Robust Diffusion-Based Image Editing | Sherry X. Chen et.al. | 2404.11120 | link |
2024-04-17 | Object Remover Performance Evaluation Methods using Class-wise Object Removal Images | Changsuk Oh et.al. | 2404.11104 | null |
2024-04-16 | RefFusion: Reference Adapted Diffusion Models for 3D Scene Inpainting | Ashkan Mirzaei et.al. | 2404.10765 | null |
2024-04-16 | LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Yuchi Wang et.al. | 2404.10763 | link |
2024-04-16 | AV-GAN: Attention-Based Varifocal Generative Adversarial Network for Uneven Medical Image Translation | Zexin Li et.al. | 2404.10714 | null |
2024-04-16 | Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks | Florian Barthel et.al. | 2404.10625 | null |
2024-04-16 | Adversarial Identity Injection for Semantic Face Image Synthesis | Giuseppe Tarollo et.al. | 2404.10408 | null |
2024-04-16 | Generating Counterfactual Trajectories with Latent Diffusion Models for Concept Discovery | Payal Varshney et.al. | 2404.10356 | null |
2024-04-16 | CanvasPic: An Interactive Tool for Freely Generating Facial Images Based on Spatial Layout | Jiafu Wei et.al. | 2404.10352 | null |
2024-04-16 | OmniSSR: Zero-shot Omnidirectional Image Super-Resolution using Stable Diffusion Model | Runyi Li et.al. | 2404.10312 | null |
2024-04-16 | Learnable Prompt for Few-Shot Semantic Segmentation in Remote Sensing Domain | Steve Andreas Immanuel et.al. | 2404.10307 | link |
2024-04-16 | OneActor: Consistent Character Generation via Cluster-Conditioned Guidance | Jiahao Wang et.al. | 2404.10267 | null |
2024-04-15 | Photo-Realistic Image Restoration in the Wild with Controlled Vision-Language Models | Ziwei Luo et.al. | 2404.09732 | link |
2024-04-15 | VFLGAN: Vertical Federated Learning-based Generative Adversarial Network for Vertically Partitioned Data Publication | Xun Yuan et.al. | 2404.09722 | null |
2024-04-15 | In-Context Translation: Towards Unifying Image Recognition, Processing, and Generation | Han Xue et.al. | 2404.09633 | null |
2024-04-15 | Text-Driven Diverse Facial Texture Generation via Progressive Latent-Space Refinement | Chi Wang et.al. | 2404.09540 | null |
2024-04-15 | Magic Clothing: Controllable Garment-Driven Image Synthesis | Weifeng Chen et.al. | 2404.09512 | link |
2024-04-15 | Improved Object-Based Style Transfer with Single Deep Network | Harshmohan Kulkarni et.al. | 2404.09461 | null |
2024-04-15 | Watermark-embedded Adversarial Examples for Copyright Protection against Diffusion Models | Peifei Zhu et.al. | 2404.09401 | null |
2024-04-14 | Counteracting Concept Drift by Learning with Future Malware Predictions | Branislav Bosansky et.al. | 2404.09352 | null |
2024-04-14 | DreamScape: 3D Scene Creation via Gaussian Splatting joint Correlation Modeling | Xuening Yuan et.al. | 2404.09227 | null |
2024-04-13 | InverseVis: Revealing the Hidden with Curved Sphere Tracing | Kai Lawonn et.al. | 2404.09092 | null |
2024-04-12 | An improved tabular data generator with VAE-GMM integration | Patricia A. Apellániz et.al. | 2404.08434 | null |
2024-04-12 | Counterfactual Explanations for Face Forgery Detection via Adversarial Removal of Artifacts | Yang Li et.al. | 2404.08341 | link |
2024-04-11 | Latent Guard: a Safety Framework for Text-to-image Generation | Runtao Liu et.al. | 2404.08031 | link |
2024-04-11 | Rethinking Artistic Copyright Infringements in the Era of Text-to-Image Generative Models | Mazda Moayeri et.al. | 2404.08030 | null |
2024-04-11 | OpenBias: Open-set Bias Detection in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2404.07990 | null |
2024-04-11 | Taming Stable Diffusion for Text to 360° Panorama Image Generation | Cheng Zhang et.al. | 2404.07949 | link |
2024-04-11 | Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification | Tuong Vy Nguyen et.al. | 2404.07754 | null |
2024-04-11 | Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models | Tuomas Kynkäänniemi et.al. | 2404.07724 | null |
2024-04-11 | Model-based Cleaning of the QUILT-1M Pathology Dataset for Text-Conditional Image Synthesis | Marc Aubreville et.al. | 2404.07676 | null |
2024-04-11 | Implicit and Explicit Language Guidance for Diffusion-based Visual Perception | Hefeng Wang et.al. | 2404.07600 | null |
2024-04-11 | GAN-based iterative motion estimation in HASTE MRI | Mathias S. Feinler et.al. | 2404.07576 | null |
2024-04-11 | ObjBlur: A Curriculum Learning Approach With Progressive Object-Level Blurring for Improved Layout-to-Image Generation | Stanislav Frolov et.al. | 2404.07564 | null |
2024-04-11 | CAT: Contrastive Adapter Training for Personalized Image Generation | Jae Wan Park et.al. | 2404.07554 | link |
2024-04-11 | Enhancing Network Intrusion Detection Performance using Generative Adversarial Networks | Xinxing Zhao et.al. | 2404.07464 | null |
2024-04-10 | RealmDreamer: Text-Driven 3D Scene Generation with Inpainting and Depth Diffusion | Jaidev Shriram et.al. | 2404.07199 | null |
2024-04-10 | A Gauss-Newton Approach for Min-Max Optimization in Generative Adversarial Networks | Neel Mishra et.al. | 2404.07172 | link |
2024-04-10 | Implicit Multi-Spectral Transformer: An Lightweight and Effective Visible to Infrared Image Translation Model | Yijia Chen et.al. | 2404.07072 | link |
2024-04-10 | Fine color guidance in diffusion models and its application to image compression at extremely low bitrates | Tom Bordin et.al. | 2404.06865 | null |
2024-04-10 | UDiFF: Generating Conditional Unsigned Distance Fields with Optimal Wavelet Diffusion | Junsheng Zhou et.al. | 2404.06851 | null |
2024-04-10 | Tuning-Free Adaptive Style Incorporation for Structure-Consistent Text-Driven Style Transfer | Yanqi Ge et.al. | 2404.06835 | null |
2024-04-10 | MedRG: Medical Report Grounding with Multi-modal Large Language Model | Ke Zou et.al. | 2404.06798 | null |
2024-04-10 | CryinGAN: Design and evaluation of point-cloud-based generative adversarial networks using disordered materials |
Adrian Xiao Bin Yong et.al. | 2404.06734 | null |
2024-04-10 | Deep Generative Data Assimilation in Multimodal Setting | Yongquan Qu et.al. | 2404.06665 | link |
2024-04-09 | GeoSynth: Contextually-Aware High-Resolution Satellite Image Synthesis | Srikumar Sastry et.al. | 2404.06637 | link |
2024-04-09 | High Noise Scheduling is a Must | Mahmut S. Gokmen et.al. | 2404.06353 | null |
2024-04-09 | Fortifying Fully Convolutional Generative Adversarial Networks for Image Super-Resolution Using Divergence Measures | Arkaprabha Basu et.al. | 2404.06294 | null |
2024-04-09 | Hyperparameter-Free Medical Image Synthesis for Sharing Data and Improving Site-Specific Segmentation | Alexander Chebykin et.al. | 2404.06240 | link |
2024-04-09 | DiffHarmony: Latent Diffusion Model Meets Image Harmonization | Pengfei Zhou et.al. | 2404.06139 | null |
2024-04-09 | Greedy-DiM: Greedy Algorithms for Unreasonably Effective Face Morphs | Zander W. Blasingame et.al. | 2404.06025 | null |
2024-04-09 | Boosting Digital Safeguards: Blending Cryptography and Steganography | Anamitra Maiti et.al. | 2404.05985 | null |
2024-04-09 | Tackling Structural Hallucination in Image Translation with Local Diffusion | Seunghoi Kim et.al. | 2404.05980 | null |
2024-04-09 | StoryImager: A Unified and Efficient Framework for Coherent Story Visualization and Completion | Ming Tao et.al. | 2404.05979 | link |
2024-04-09 | Quantum Generative Adversarial Networks in a Silicon Photonic Chip with Maximum Expressibility | Haoran Ma et.al. | 2404.05921 | null |
2024-04-08 | SwapAnything: Enabling Arbitrary Object Swapping in Personalized Visual Editing | Jing Gu et.al. | 2404.05717 | null |
2024-04-08 | Learning 3D-Aware GANs from Unposed Images with Template Feature Field | Xinya Chen et.al. | 2404.05705 | null |
2024-04-08 | SphereHead: Stable 3D Full-head Synthesis with Spherical Tri-plane Representation | Heyuan Li et.al. | 2404.05680 | null |
2024-04-08 | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Kunpeng Song et.al. | 2404.05674 | null |
2024-04-08 | Automatic Controllable Colorization via Imagination | Xiaoyan Cong et.al. | 2404.05661 | null |
2024-04-08 | UniFL: Improve Stable Diffusion via Unified Feedback Learning | Jiacheng Zhang et.al. | 2404.05595 | null |
2024-04-08 | Mind-to-Image: Projecting Visual Mental Imagination of the Brain from fMRI | Hugo Caselles-Dupré et.al. | 2404.05468 | null |
2024-04-08 | CDAD-Net: Bridging Domain Gaps in Generalized Category Discovery | Sai Bhargav Rongali et.al. | 2404.05366 | null |
2024-04-08 | Mask-ControlNet: Higher-Quality Image Generation with An Additional Mask Prompt | Zhiqi Huang et.al. | 2404.05331 | null |
2024-04-08 | MC |
Jiaxiu Jiang et.al. | 2404.05268 | null |
2024-04-04 | No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance | Vishaal Udandarao et.al. | 2404.04125 | link |
2024-04-05 | 3D Facial Expressions through Analysis-by-Neural-Synthesis | George Retsinas et.al. | 2404.04104 | null |
2024-04-05 | Dynamic Prompt Optimizing for Text-to-Image Generation | Wenyi Mo et.al. | 2404.04095 | link |
2024-04-05 | Physics-Inspired Synthesized Underwater Image Dataset | Reina Kaneko et.al. | 2404.03998 | null |
2024-04-05 | Concept Weaver: Enabling Multi-Concept Fusion in Text-to-Image Models | Gihyun Kwon et.al. | 2404.03913 | null |
2024-04-04 | RaFE: Generative Radiance Fields Restoration | Zhongkai Wu et.al. | 2404.03654 | null |
2024-04-04 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Dongzhi Jiang et.al. | 2404.03653 | link |
2024-04-04 | Reference-Based 3D-Aware Image Editing with Triplane | Bahri Batuhan Bilecen et.al. | 2404.03632 | null |
2024-04-04 | Robust Concept Erasure Using Task Vectors | Minh Pham et.al. | 2404.03631 | null |
2024-04-04 | Terrain Point Cloud Inpainting via Signal Decomposition | Yizhou Xie et.al. | 2404.03572 | null |
2024-04-04 | Integrating Generative AI into Financial Market Prediction for Improved Decision Making | Chang Che et.al. | 2404.03523 | null |
2024-04-04 | Knowledge Distillation-Based Model Extraction Attack using Private Counterfactual Explanations | Fatima Ezzeddine et.al. | 2404.03348 | null |
2024-04-04 | Multi Positive Contrastive Learning with Pose-Consistent Generated Images | Sho Inayoshi et.al. | 2404.03256 | null |
2024-04-04 | Would Deep Generative Models Amplify Bias in Future Models? | Tianwei Chen et.al. | 2404.03242 | null |
2024-04-04 | Diverse and Tailored Image Generation for Zero-shot Multi-label Classification | Kaixin Zhang et.al. | 2404.03144 | null |
2024-04-03 | Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction | Keyu Tian et.al. | 2404.02905 | link |
2024-04-03 | MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment | Duygu Ceylan et.al. | 2404.02899 | null |
2024-04-03 | On the Scalability of Diffusion-based Text-to-Image Generation | Hao Li et.al. | 2404.02883 | null |
2024-04-03 | MULAN: A Multi Layer Annotated Dataset for Controllable Text-to-Image Generation | Petru-Daniel Tudosiu et.al. | 2404.02790 | null |
2024-04-03 | InstantStyle: Free Lunch towards Style-Preserving in Text-to-Image Generation | Haofan Wang et.al. | 2404.02733 | link |
2024-04-03 | Model-agnostic Origin Attribution of Generated Images with Few-shot Examples | Fengyuan Liu et.al. | 2404.02697 | null |
2024-04-03 | Deep Privacy Funnel Model: From a Discriminative to a Generative Approach with an Application to Face Recognition | Behrooz Razeghi et.al. | 2404.02696 | null |
2024-04-03 | Severity Controlled Text-to-Image Generative Model Bias Manipulation | Jordan Vice et.al. | 2404.02530 | null |
2024-04-03 | Designing a Photonic Physically Unclonable Function Having Resilience to Machine Learning Attacks | Elena R. Henderson et.al. | 2404.02440 | null |
2024-04-02 | Diffusion |
Zeyu Yang et.al. | 2404.02148 | link |
2024-04-02 | 3D Congealing: 3D-Aware Image Alignment in the Wild | Yunzhi Zhang et.al. | 2404.02125 | null |
2024-04-02 | Red-Teaming Segment Anything Model | Krzysztof Jankowski et.al. | 2404.02067 | link |
2024-04-02 | MultiParaDetox: Extending Text Detoxification with Parallel Data to New Languages | Daryna Dementieva et.al. | 2404.02037 | null |
2024-04-02 | Enhancing Portfolio Optimization with Transformer-GAN Integration: A Novel Approach in the Black-Litterman Framework | Enmin Zhu et.al. | 2404.02029 | null |
2024-04-02 | Bi-LORA: A Vision-Language Approach for Synthetic Image Detection | Mamadou Keita et.al. | 2404.01959 | null |
2024-04-02 | Real, fake and synthetic faces -- does the coin have three sides? | Shahzeb Naeem et.al. | 2404.01878 | null |
2024-04-02 | Disentangled Pre-training for Human-Object Interaction Detection | Zhuolong Li et.al. | 2404.01725 | null |
2024-04-01 | PlayFutures: Imagining Civic Futures with AI and Puppets | Supratim Pait et.al. | 2404.01527 | null |
2024-04-01 | Is Model Collapse Inevitable? Breaking the Curse of Recursion by Accumulating Real and Synthetic Data | Matthias Gerstgrasser et.al. | 2404.01413 | null |
2024-03-29 | Benchmarking Counterfactual Image Generation | Thomas Melistas et.al. | 2403.20287 | link |
2024-03-29 | FreeSeg-Diff: Training-Free Open-Vocabulary Segmentation with Diffusion Models | Barbara Toniella Corradini et.al. | 2403.20105 | null |
2024-03-29 | SCINeRF: Neural Radiance Fields from a Snapshot Compressive Image | Yunhao Li et.al. | 2403.20018 | link |
2024-03-29 | FairRAG: Fair Human Generation via Fair Retrieval Augmentation | Robik Shrestha et.al. | 2403.19964 | null |
2024-04-01 | Structure Matters: Tackling the Semantic Discrepancy in Diffusion Models for Image Inpainting | Haipeng Liu et.al. | 2403.19898 | link |
2024-03-28 | Vision-Language Synthetic Data Enhances Echocardiography Downstream Tasks | Pooria Ashrafian et.al. | 2403.19880 | link |
2024-03-28 | Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization | Yuhang Li et.al. | 2403.19866 | null |
2024-03-28 | CLoRA: A Contrastive Approach to Compose Multiple LoRA Models | Tuna Han Salih Meral et.al. | 2403.19776 | null |
2024-03-28 | Detecting Image Attribution for Text-to-Image Diffusion Models in RGB and Beyond | Katherine Xu et.al. | 2403.19653 | link |
2024-03-28 | GANTASTIC: GAN-based Transfer of Interpretable Directions for Disentangled Image Editing in Text-to-Image Diffusion Models | Yusuf Dalva et.al. | 2403.19645 | null |
2024-03-28 | Lane-Change in Dense Traffic with Model Predictive Control and Neural Networks | Sangjae Bae et.al. | 2403.19633 | link |
2024-03-28 | Collaborative Interactive Evolution of Art in the Latent Space of Deep Generative Models | Ole Hall et.al. | 2403.19620 | null |
2024-03-28 | Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model | Zhicai Wang et.al. | 2403.19600 | link |
2024-03-28 | Frame by Familiar Frame: Understanding Replication in Video Diffusion Models | Aimon Rahman et.al. | 2403.19593 | null |
2024-03-28 | Locate, Assign, Refine: Taming Customized Image Inpainting with Text-Subject Guidance | Yulin Pan et.al. | 2403.19534 | null |
2024-03-28 | Imperceptible Protection against Style Imitation from Diffusion Models | Namhyuk Ahn et.al. | 2403.19254 | null |
2024-03-28 | QNCD: Quantization Noise Correction for Diffusion Models | Huanpeng Chu et.al. | 2403.19140 | link |
2024-03-28 | Synthetic Medical Imaging Generation with Generative Adversarial Networks For Plain Radiographs | John R. McNulty et.al. | 2403.19107 | null |
2024-03-27 | Conditional Wasserstein Distances with Applications in Bayesian OT Flow Matching | Jannis Chemseddine et.al. | 2403.18705 | null |
2024-03-27 | Attention Calibration for Disentangled Text-to-Image Personalization | Yanbing Zhang et.al. | 2403.18551 | link |
2024-03-27 | DiffusionFace: Towards a Comprehensive Dataset for Diffusion-Based Face Forgery Analysis | Zhongxi Chen et.al. | 2403.18471 | link |
2024-03-27 | DiffStyler: Diffusion-based Localized Image Style Transfer | Shaoxu Li et.al. | 2403.18461 | null |
2024-03-27 | U-Sketch: An Efficient Approach for Sketch to Image Diffusion Models | Ilias Mitsouras et.al. | 2403.18425 | null |
2024-03-27 | ECNet: Effective Controllable Text-to-Image Diffusion Models | Sicheng Li et.al. | 2403.18417 | null |
2024-03-27 | Colour and Brush Stroke Pattern Recognition in Abstract Art using Modified Deep Convolutional Generative Adversarial Networks | Srinitish Srinivasan et.al. | 2403.18397 | link |
2024-03-27 | Ship in Sight: Diffusion Models for Ship-Image Super Resolution | Luigi Sigillo et.al. | 2403.18370 | link |
2024-03-27 | DSF-GAN: DownStream Feedback Generative Adversarial Network | Oriel Perets et.al. | 2403.18267 | link |
2024-03-27 | Don't Look into the Dark: Latent Codes for Pluralistic Image Inpainting | Haiwei Chen et.al. | 2403.18186 | null |
2024-03-26 | Boosting Diffusion Models with Moving Average Sampling in Frequency Domain | Yurui Qian et.al. | 2403.17870 | null |
2024-03-26 | CT Synthesis with Conditional Diffusion Models for Abdominal Lymph Node Segmentation | Yongrui Yu et.al. | 2403.17770 | null |
2024-03-26 | FaultGuard: A Generative Approach to Resilient Fault Prediction in Smart Electrical Grids | Emad Efatinasab et.al. | 2403.17494 | null |
2024-03-26 | LaRE^2: Latent Reconstruction Error Based Method for Diffusion-Generated Image Detection | Yunpeng Luo et.al. | 2403.17465 | null |
2024-03-26 | An inexact proximal MM method for a class of nonconvex composite image reconstruction models | Bujin Li et.al. | 2403.17450 | null |
2024-03-25 | DiffusionAct: Controllable Diffusion Autoencoder for One-shot Face Reenactment | Stella Bounareli et.al. | 2403.17217 | null |
2024-03-25 | FlashFace: Human Image Personalization with High-fidelity Identity Preservation | Shilong Zhang et.al. | 2403.17008 | null |
2024-03-25 | SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer | Rui Zhu et.al. | 2403.17004 | null |
2024-03-25 | Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation | Omer Dahary et.al. | 2403.16990 | null |
2024-03-25 | Isolated Diffusion: Optimizing Multi-Concept Text-to-Image Generation Training-Freely with Isolated Diffusion Guidance | Jingyuan Zhu et.al. | 2403.16954 | null |
2024-03-25 | Iso-Diffusion: Improving Diffusion Probabilistic Models Using the Isotropy of the Additive Gaussian Noise | Dilum Fernando et.al. | 2403.16790 | null |
2024-03-25 | Diff-Def: Diffusion-Generated Deformation Fields for Conditional Atlases | Sophie Starck et.al. | 2403.16776 | null |
2024-03-25 | Multi-Scale Texture Loss for CT denoising with GANs | Francesco Di Feola et.al. | 2403.16640 | link |
2024-03-25 | SDXS: Real-Time One-Step Latent Diffusion Models with Image Conditions | Yuda Song et.al. | 2403.16627 | null |
2024-03-25 | Enhancing Cross-Dataset EEG Emotion Recognition: A Novel Approach with Emotional EEG Style Transfer Network | Yijin Zhou et.al. | 2403.16540 | null |
2024-03-25 | An Intermediate Fusion ViT Enables Efficient Text-Image Alignment in Diffusion Models | Zizhao Hu et.al. | 2403.16530 | null |
2024-03-25 | Training Generative Adversarial Network-Based Vocoder with Limited Data Using Augmentation-Conditional Discriminator | Takuhiro Kaneko et.al. | 2403.16464 | null |
2024-03-25 | Refining Text-to-Image Generation: Towards Accurate Training-Free Glyph-Enhanced Image Generation | Sanyam Lakhanpal et.al. | 2403.16422 | null |
2024-03-25 | Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation | Yingshan Chang et.al. | 2403.16394 | null |
2024-03-25 | Illuminating Systematic Trends in Nuclear Data with Generative Machine Learning Models | Jordan M. R. Fox et.al. | 2403.16389 | null |
2024-03-25 | FlashEval: Towards Fast and Accurate Evaluation of Text-to-image Diffusion Generative Models | Lin Zhao et.al. | 2403.16379 | null |
2024-03-24 | Fill in the ____ (a Diffusion-based Image Inpainting Pipeline) | Eyoel Gebre et.al. | 2403.16016 | null |
2024-03-22 | DragAPart: Learning a Part-Level Motion Prior for Articulated Objects | Ruining Li et.al. | 2403.15382 | null |
2024-03-22 | Long-CLIP: Unlocking the Long-Text Capability of CLIP | Beichen Zhang et.al. | 2403.15378 | null |
2024-03-22 | A Wasserstein perspective of Vanilla GANs | Lea Kunkel et.al. | 2403.15312 | null |
2024-03-22 | Controlled Training Data Generation with Diffusion Models | Teresa Yeo et.al. | 2403.15309 | null |
2024-03-22 | Robust Utility Optimization via a GAN Approach | Florian Krach et.al. | 2403.15243 | null |
2024-03-22 | A Multimodal Approach for Cross-Domain Image Retrieval | Lucas Iijima et.al. | 2403.15152 | null |
2024-03-22 | MM-Diff: High-Fidelity Image Personalization via Multi-Modal Condition Integration | Zhichao Wei et.al. | 2403.15059 | null |
2024-03-22 | Cartoon Hallucinations Detection: Pose-aware In Context Visual Learning | Bumsoo Kim et.al. | 2403.15048 | null |
2024-03-22 | Generative Active Learning for Image Synthesis Personalization | Xulu Zhang et.al. | 2403.14987 | null |
2024-03-22 | CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model | Seungdae Han et.al. | 2403.14944 | null |
2024-03-21 | Implicit Style-Content Separation using B-LoRA | Yarden Frenkel et.al. | 2403.14572 | null |
2024-03-21 | DesignEdit: Multi-Layered Latent Decomposition and Fusion for Unified & Accurate Image Editing | Yueru Jia et.al. | 2403.14487 | null |
2024-03-21 | AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks | Max Ku et.al. | 2403.14468 | null |
2024-03-21 | Analysing Diffusion Segmentation for Medical Images | Mathias Öttl et.al. | 2403.14440 | null |
2024-03-21 | Style-Extracting Diffusion Models for Semi-Supervised Histopathology Segmentation | Mathias Öttl et.al. | 2403.14429 | null |
2024-03-21 | HySim: An Efficient Hybrid Similarity Measure for Patch Matching in Image Inpainting | Saad Noufel et.al. | 2403.14292 | null |
2024-03-21 | Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models | Pablo Marcos-Manchón et.al. | 2403.14291 | link |
2024-03-21 | Safeguarding Medical Image Segmentation Datasets against Unauthorized Training via Contour- and Texture-Aware Perturbations | Xun Lin et.al. | 2403.14250 | null |
2024-03-21 | StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN | Jongwoo Choi et.al. | 2403.14186 | null |
2024-03-21 | QSMDiff: Unsupervised 3D Diffusion Models for Quantitative Susceptibility Mapping | Zhuang Xiong et.al. | 2403.14070 | null |
2024-03-20 | Learning from Models and Data for Visual Grounding | Ruozhen He et.al. | 2403.13804 | null |
2024-03-20 | Step-Calibrated Diffusion for Biomedical Optical Image Restoration | Yiwei Lyu et.al. | 2403.13680 | null |
2024-03-20 | ReGround: Improving Textual and Spatial Grounding at No Cost | Yuseung Lee et.al. | 2403.13589 | null |
2024-03-20 | Diversity-aware Channel Pruning for StyleGAN Compression | Jiwoo Chung et.al. | 2403.13548 | link |
2024-03-20 | IDAdapter: Learning Mixed Features for Tuning-Free Personalization of Text-to-Image Models | Siying Cui et.al. | 2403.13535 | null |
2024-03-20 | Deepfake Detection without Deepfakes: Generalization via Synthetic Frequency Patterns Injection | Davide Alessandro Coccomini et.al. | 2403.13479 | null |
2024-03-20 | S2DM: Sector-Shaped Diffusion Models for Video Generation | Haoran Lang et.al. | 2403.13408 | null |
2024-03-20 | IIDM: Image-to-Image Diffusion Model for Semantic Image Synthesis | Feng Liu et.al. | 2403.13378 | null |
2024-03-20 | AGFSync: Leveraging AI-Generated Feedback for Preference Optimization in Text-to-Image Generation | Jingkun An et.al. | 2403.13352 | null |
2024-03-20 | TiBiX: Leveraging Temporal Information for Bidirectional X-ray and Report Generation | Santosh Sanjeev et.al. | 2403.13343 | null |
2024-03-19 | FouriScale: A Frequency Perspective on Training-Free High-Resolution Image Synthesis | Linjiang Huang et.al. | 2403.12963 | link |
2024-03-19 | Segment Anything for comprehensive analysis of grapevine cluster architecture and berry properties | Efrain Torres-Lomas et.al. | 2403.12935 | null |
2024-03-19 | You Only Sample Once: Taming One-Step Text-To-Image Synthesis by Self-Cooperative Diffusion GANs | Yihong Luo et.al. | 2403.12931 | link |
2024-03-19 | Ultra-High-Resolution Image Synthesis with Pyramid Diffusion Model | Jiajie Yang et.al. | 2403.12915 | link |
2024-03-19 | Generative Enhancement for 3D Medical Images | Lingting Zhu et.al. | 2403.12852 | link |
2024-03-19 | How Spammers and Scammers Leverage AI-Generated Images on Facebook for Audience Growth | Renee DiResta et.al. | 2403.12838 | null |
2024-03-19 | Total Disentanglement of Font Images into Style and Character Class Features | Daichi Haraguchi et.al. | 2403.12784 | null |
2024-03-19 | Towards Controllable Face Generation with Semantic Latent Diffusion Models | Alex Ergasti et.al. | 2403.12743 | link |
2024-03-19 | Tuning-Free Image Customization with Image and Text Guidance | Pengzhi Li et.al. | 2403.12658 | null |
2024-03-19 | NSGAN: A Non-Dominant Sorting Optimisation-Based Generative Adversarial Design Framework for Alloy Discovery | Zhipeng Li et.al. | 2403.12495 | null |
2024-03-18 | Urban Scene Diffusion through Semantic Occupancy Map | Junge Zhang et.al. | 2403.11697 | null |
2024-03-18 | Binary Noise for Binary Tasks: Masked Bernoulli Diffusion for Unsupervised Anomaly Detection | Julia Wolleb et.al. | 2403.11667 | null |
2024-03-18 | LocalStyleFool: Regional Video Style Transfer Attack Using Segment Anything Model | Yuxin Cao et.al. | 2403.11656 | null |
2024-03-18 | QEAN: Quaternion-Enhanced Attention Network for Visual Dance Generation | Zhizhen Zhou et.al. | 2403.11626 | null |
2024-03-18 | CRS-Diff: Controllable Generative Remote Sensing Foundation Model | Datao Tang et.al. | 2403.11614 | null |
2024-03-18 | VmambaIR: Visual State Space Model for Image Restoration | Yuan Shi et.al. | 2403.11423 | link |
2024-03-17 | StainDiffuser: MultiTask Dual Diffusion Model for Virtual Staining | Tushar Kataria et.al. | 2403.11340 | null |
2024-03-17 | Fast Personalized Text-to-Image Syntheses With Attention Injection | Yuxuan Zhang et.al. | 2403.11284 | null |
2024-03-17 | Forging the Forger: An Attempt to Improve Authorship Verification via Data Augmentation | Silvia Corbara et.al. | 2403.11265 | null |
2024-03-17 | Understanding Diffusion Models by Feynman's Path Integral | Yuji Hirono et.al. | 2403.11262 | null |
2024-03-14 | SCP-Diff: Photo-Realistic Semantic Image Synthesis with Spatial-Categorical Joint Prior | Huan-ang Gao et.al. | 2403.09638 | null |
2024-03-14 | Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering | Zeyu Liu et.al. | 2403.09622 | null |
2024-03-14 | PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation | Yuhan Guo et.al. | 2403.09615 | null |
2024-03-14 | Counterfactual contrastive learning: robust representations via causal image synthesis | Melanie Roschewitz et.al. | 2403.09605 | link |
2024-03-14 | Eta Inversion: Designing an Optimal Eta Function for Diffusion-based Real Image Editing | Wonjun Kang et.al. | 2403.09468 | link |
2024-03-14 | Mitigating attribute amplification in counterfactual image generation | Tian Xia et.al. | 2403.09422 | null |
2024-03-14 | Machine Learning Processes as Sources of Ambiguity: Insights from AI Art | Christian Sivertsen et.al. | 2403.09374 | null |
2024-03-14 | Mitigating Data Consistency Induced Discrepancy in Cascaded Diffusion Models for Sparse-view CT Reconstruction | Hanyu Chen et.al. | 2403.09355 | null |
2024-03-14 | StainFuser: Controlling Diffusion for Faster Neural Style Transfer in Multi-Gigapixel Histology Images | Robert Jewsbury et.al. | 2403.09302 | link |
2024-03-14 | Noise Dimension of GAN: An Image Compression Perspective | Ziran Zhu et.al. | 2403.09196 | null |
2024-03-13 | Ambient Diffusion Posterior Sampling: Solving Inverse Problems with Diffusion Models trained on Corrupted Data | Asad Aali et.al. | 2403.08728 | link |
2024-03-13 | HAIFIT: Human-Centered AI for Fashion Image Translation | Jianan Jiang et.al. | 2403.08651 | link |
2024-03-13 | Gaussian Splatting in Style | Abhishek Saroha et.al. | 2403.08498 | null |
2024-03-13 | An Analysis of Human Alignment of Latent Diffusion Models | Lorenz Linhardt et.al. | 2403.08469 | null |
2024-03-13 | Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report | Evi M. C. Huijben et.al. | 2403.08447 | null |
2024-03-13 | Iterative Online Image Synthesis via Diffusion Model for Imbalanced Classification | Shuhan Li et.al. | 2403.08407 | null |
2024-03-13 | StyleDyRF: Zero-shot 4D Style Transfer for Dynamic Neural Radiance Fields | Hongbin Xu et.al. | 2403.08310 | null |
2024-03-13 | Attack Deterministic Conditional Image Generative Models for Diverse and Controllable Generation | Tianyi Chu et.al. | 2403.08294 | null |
2024-03-13 | VIGFace: Virtual Identity Generation Model for Face Image Synthesis | Minsoo Kim et.al. | 2403.08277 | null |
2024-03-13 | CoroNetGAN: Controlled Pruning of GANs via Hypernetworks | Aman Kumar et.al. | 2403.08261 | null |
2024-03-12 | Bridging Different Language Models and Generative Vision Models for Text-to-Image Generation | Shihao Zhao et.al. | 2403.07860 | link |
2024-03-12 | Quantifying and Mitigating Privacy Risks for Tabular Generative Models | Chaoyi Zhu et.al. | 2403.07842 | null |
2024-03-12 | StyleGaussian: Instant 3D Style Transfer with Gaussian Splatting | Kunhao Liu et.al. | 2403.07807 | null |
2024-03-12 | BraSyn 2023 challenge: Missing MRI synthesis and the effect of different learning objectives | Ivo M. Baltruschat et.al. | 2403.07800 | null |
2024-03-12 | Stable-Makeup: When Real-World Makeup Transfer Meets Diffusion Model | Yuxuan Zhang et.al. | 2403.07764 | null |
2024-03-12 | Synth |
Sahand Sharifzadeh et.al. | 2403.07750 | null |
2024-03-12 | Visual Decoding and Reconstruction via EEG Embeddings with Guided Diffusion | Dongyang Li et.al. | 2403.07721 | link |
2024-03-12 | SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces | Yuta Oshima et.al. | 2403.07711 | link |
2024-03-12 | Towards Model Extraction Attacks in GAN-Based Image Translation via Domain Shift Mitigation | Di Mi et.al. | 2403.07673 | null |
2024-03-12 | Gender-ambiguous voice generation through feminine speaking style transfer in male voices | Maria Koutsogiannaki et.al. | 2403.07661 | null |
2024-03-11 | BrushNet: A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion | Xuan Ju et.al. | 2403.06976 | null |
2024-03-11 | Surface-aware Mesh Texture Synthesis with Pre-trained 2D CNNs | Áron Samuel Kovács et.al. | 2403.06855 | null |
2024-03-11 | Medical Image Synthesis via Fine-Grained Image-Text Alignment and Anatomy-Pathology Prompting | Wenting Chen et.al. | 2403.06835 | null |
2024-03-11 | Data-Independent Operator: A Training-Free Artifact Representation Extractor for Generalizable Deepfake Detection | Chuangchuang Tan et.al. | 2403.06803 | link |
2024-03-11 | FaceChain-SuDe: Building Derived Class to Inherit Category Attributes for One-shot Subject-Driven Generation | Pengchong Qiao et.al. | 2403.06775 | link |
2024-03-11 | Distribution-Aware Data Expansion with Diffusion Models | Haowei Zhu et.al. | 2403.06741 | link |
2024-03-11 | Enhancing Image Caption Generation Using Reinforcement Learning with Human Feedback | Adarsh N L et.al. | 2403.06735 | null |
2024-03-11 | Galaxy Morphologies Revealed with Subaru HSC and Super-Resolution Techniques II: Environmental Dependence of Galaxy Mergers at z~2-5 | Takatoshi Shibuya et.al. | 2403.06729 | null |
2024-03-11 | FFAD: A Novel Metric for Assessing Generated Time Series Data Utilizing Fourier Transform and Auto-encoder | Yang Chen et.al. | 2403.06576 | null |
2024-03-11 | Active Generation for Image Classification | Tao Huang et.al. | 2403.06517 | null |
2024-03-08 | Beyond Finite Data: Towards Data-free Out-of-distribution Generalization via Extrapola | Yijiang Li et.al. | 2403.05523 | null |
2024-03-08 | A Data Augmentation Pipeline to Generate Synthetic Labeled Datasets of 3D Echocardiography Images using a GAN | Cristiana Tiago et.al. | 2403.05384 | null |
2024-03-08 | Federated Learning Method for Preserving Privacy in Face Recognition System | Enoch Solomon et.al. | 2403.05344 | null |
2024-03-08 | Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation | Juan I. Pisula et.al. | 2403.05325 | null |
2024-03-08 | GAN-based Massive MIMO Channel Model Trained on Measured Data | Florian Euchner et.al. | 2403.05321 | null |
2024-03-08 | An Efficient Quasi-Random Sampling for Copulas | Sumin Wang et.al. | 2403.05281 | null |
2024-03-08 | Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation | Junyan Wang et.al. | 2403.05239 | null |
2024-03-08 | Synthetic Privileged Information Enhances Medical Image Representation Learning | Lucas Farndale et.al. | 2403.05220 | null |
2024-03-08 | Denoising Autoregressive Representation Learning | Yazhe Li et.al. | 2403.05196 | null |
2024-03-08 | Robust Semantic Communications for Speech-to-Text Translation | Zhenzi Weng et.al. | 2403.05187 | null |
2024-03-07 | Photonic probabilistic machine learning using quantum vacuum noise | Seou Choi et.al. | 2403.04731 | null |
2024-03-07 | PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation | Junsong Chen et.al. | 2403.04692 | null |
2024-03-07 | A Domain Translation Framework with an Adversarial Denoising Diffusion Model to Generate Synthetic Datasets of Echocardiography Images | Cristiana Tiago et.al. | 2403.04612 | null |
2024-03-07 | Discriminative Probing and Tuning for Text-to-Image Generation | Leigang Qu et.al. | 2403.04321 | null |
2024-03-06 | PromptCharm: Text-to-Image Generation through Multi-modal Prompting and Refinement | Zhijie Wang et.al. | 2403.04014 | link |
2024-03-06 | Unifying Generation and Compression: Ultra-low bitrate Image Coding Via Multi-stage Transformer | Naifu Xue et.al. | 2403.03736 | null |
2024-03-06 | Seamless Virtual Reality with Integrated Synchronizer and Synthesizer for Autonomous Driving | He Li et.al. | 2403.03541 | null |
2024-03-06 | NoiseCollage: A Layout-Aware Text-to-Image Diffusion Model Based on Noise Cropping and Merging | Takahiro Shirakawa et.al. | 2403.03485 | null |
2024-03-06 | FLAME Diffuser: Grounded Wildfire Image Synthesis using Mask Guided Diffusion | Hao Wang et.al. | 2403.03463 | null |
2024-03-07 | DLP-GAN: learning to draw modern Chinese landscape photos with generative adversarial network | Xiangquan Gui et.al. | 2403.03456 | null |
2024-03-06 | Towards Understanding Cross and Self-Attention in Stable Diffusion for Text-Guided Image Editing | Bingyan Liu et.al. | 2403.03431 | null |
2024-03-05 | Scaling Rectified Flow Transformers for High-Resolution Image Synthesis | Patrick Esser et.al. | 2403.03206 | null |
2024-03-05 | Behavior Generation with Latent Actions | Seungjae Lee et.al. | 2403.03181 | link |
2024-03-05 | Doubly Abductive Counterfactual Inference for Text-based Image Editing | Xue Song et.al. | 2403.02981 | null |
2024-03-05 | Bias in Generative AI | Mi Zhou et.al. | 2403.02726 | null |
2024-03-05 | Time Weaver: A Conditional Time Series Generation Model | Sai Shankar Narasimhan et.al. | 2403.02682 | null |
2024-03-04 | Transformer for Times Series: an Application to the S&P500 | Pierre Brugiere et.al. | 2403.02523 | null |
2024-03-04 | NiNformer: A Network in Network Transformer with Token Mixing Generated Gating Function | Abdullah Nazhat Abdullah et.al. | 2403.02411 | link |
2024-03-04 | ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models | Jiaxiang Cheng et.al. | 2403.02084 | null |
2024-03-05 | Matrix Completion with Convex Optimization and Column Subset Selection | Antonina Krajewska et.al. | 2403.01919 | link |
2024-03-04 | PLACE: Adaptive Layout-Semantic Fusion for Semantic Image Synthesis | Zhengyao Lv et.al. | 2403.01852 | link |
2024-03-02 | Bespoke Non-Stationary Solvers for Fast Sampling of Diffusion and Flow Models | Neta Shaul et.al. | 2403.01329 | null |
2024-03-02 | TCIG: Two-Stage Controlled Image Generation with Quality Enhancement through Diffusion | Salaheldin Mohamed et.al. | 2403.01212 | null |
2024-03-02 | A Hybrid Model for Traffic Incident Detection based on Generative Adversarial Networks and Transformer Model | Xinying Lu et.al. | 2403.01147 | null |
2024-03-02 | Distilling Text Style Transfer With Self-Explanation From LLMs | Chiyu Zhang et.al. | 2403.01106 | null |
2024-03-01 | BasedAI: A decentralized P2P network for Zero Knowledge Large Language Models (ZK-LLMs) | Sean Wellington et.al. | 2403.01008 | null |
2024-03-01 | Improving Android Malware Detection Through Data Augmentation Using Wasserstein Generative Adversarial Networks | Kawana Stalin et.al. | 2403.00890 | null |
2024-03-01 | Diff-Plugin: Revitalizing Details for Diffusion-based Low-level Tasks | Yuhao Liu et.al. | 2403.00644 | null |
2024-03-01 | Improving Explicit Spatial Relationships in Text-to-Image Generation through an Automatically Derived Dataset | Ander Salaberria et.al. | 2403.00587 | link |
2024-03-01 | Rethinking cluster-conditioned diffusion models | Nikolas Adaloglou et.al. | 2403.00570 | null |
2024-03-01 | VisionLLaMA: A Unified LLaMA Interface for Vision Tasks | Xiangxiang Chu et.al. | 2403.00522 | link |
2024-02-29 | SeD: Semantic-Aware Discriminator for Image Super-Resolution | Bingchen Li et.al. | 2402.19387 | null |
2024-02-29 | A Novel Approach to Industrial Defect Generation through Blended Latent Diffusion Model with Online Adaptation | Hanxi Li et.al. | 2402.19330 | null |
2024-02-29 | Memory-Augmented Generative Adversarial Transformers | Stephan Raaijmakers et.al. | 2402.19218 | null |
2024-02-29 | Generative models struggle with kirigami metamaterials | Gerrit Felsch et.al. | 2402.19196 | null |
2024-02-29 | Disentangling representations of retinal images with generative models | Sarah Müller et.al. | 2402.19186 | null |
2024-02-29 | Trajectory Consistency Distillation | Jianbin Zheng et.al. | 2402.19159 | link |
2024-02-29 | Leveraging Representations from Intermediate Encoder-blocks for Synthetic Image Detection | Christos Koutlis et.al. | 2402.19091 | null |
2024-02-29 | WDM: 3D Wavelet Diffusion Models for High-Resolution Medical Image Synthesis | Paul Friedrich et.al. | 2402.19043 | link |
2024-02-29 | Lotka-Volterra Model with Mutations and Generative Adversarial Networks | S. V. Kozyrev et.al. | 2402.19035 | null |
2024-02-29 | Generating, Reconstructing, and Representing Discrete and Continuous Data: Generalized Diffusion with Learnable Encoding-Decoding | Guangyi Liu et.al. | 2402.19009 | null |
2024-02-28 | MambaMIR: An Arbitrary-Masked Mamba for Joint Medical Image Reconstruction and Uncertainty Estimation | Jiahao Huang et.al. | 2402.18451 | null |
2024-02-28 | FineDiffusion: Scaling up Diffusion Models for Fine-grained Image Generation with 10,000 Classes | Ziying Pan et.al. | 2402.18331 | null |
2024-02-28 | Balancing Act: Distribution-Guided Debiasing in Diffusion Models | Rishubh Parihar et.al. | 2402.18206 | null |
2024-02-28 | Misalignment-Robust Frequency Distribution Loss for Image Transformation | Zhangkai Ni et.al. | 2402.18192 | null |
2024-02-28 | VulMCI : Code Splicing-based Pixel-row Oversampling for More Continuous Vulnerability Image Generation | Tao Peng et.al. | 2402.18189 | null |
2024-02-28 | Block and Detail: Scaffolding Sketch-to-Image Generation | Vishnu Sarukkai et.al. | 2402.18116 | null |
2024-02-28 | Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis | Yanzuo Lu et.al. | 2402.18078 | link |
2024-02-28 | SynArtifact: Classifying and Alleviating Artifacts in Synthetic Images via Vision-Language Model | Bin Cao et.al. | 2402.18068 | null |
2024-02-28 | Breaking the Black-Box: Confidence-Guided Model Inversion Attack for Distribution Shift | Xinhao Liu et.al. | 2402.18027 | null |
2024-02-27 | CustomSketching: Sketch Concept Extraction for Sketch-based Image Synthesis and Editing | Chufeng Xiao et.al. | 2402.17624 | null |
Publish Date | Title | Authors | Code | |
---|---|---|---|---|
2024-05-30 | MotionLLM: Understanding Human Behaviors from Human Motions and Videos | Ling-Hao Chen et.al. | 2405.20340 | null |
2024-05-30 | Visual Perception by Large Language Model's Weights | Feipeng Ma et.al. | 2405.20339 | null |
2024-05-30 | Xwin-LM: Strong and Scalable Alignment Practice for LLMs | Bolin Ni et.al. | 2405.20335 | link |
2024-05-30 | ParSEL: Parameterized Shape Editing with Language | Aditya Ganeshan et.al. | 2405.20319 | null |
2024-05-30 | CausalQuest: Collecting Natural Causal Questions for AI Agents | Roberto Ceraolo et.al. | 2405.20318 | link |
2024-05-30 | ANAH: Analytical Annotation of Hallucinations in Large Language Models | Ziwei Ji et.al. | 2405.20315 | link |
2024-05-30 | Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Backbone Generation | Guillaume Huguet et.al. | 2405.20313 | null |
2024-05-30 | Large Language Models Can Self-Improve At Web Agent Tasks | Ajay Patel et.al. | 2405.20309 | null |
2024-05-30 | Group Robust Preference Optimization in Reward-free RLHF | Shyam Sundhar Ramesh et.al. | 2405.20304 | null |
2024-05-30 | Who Writes the Review, Human or AI? | Panagiotis C. Theocharopoulos et.al. | 2405.20285 | null |
2024-05-29 | X-VILA: Cross-Modality Alignment for Large Language Model | Hanrong Ye et.al. | 2405.19335 | null |
2024-05-29 | LLMs Meet Multimodal Generation and Editing: A Survey | Yingqing He et.al. | 2405.19334 | link |
2024-05-29 | Multi-Modal Generative Embedding Model | Feipeng Ma et.al. | 2405.19333 | null |
2024-05-29 | Self-Exploring Language Models: Active Preference Elicitation for Online Alignment | Shenao Zhang et.al. | 2405.19332 | link |
2024-05-29 | Normative Modules: A Generative Agent Architecture for Learning Norms that Supports Multi-Agent Cooperation | Atrisha Sarkar et.al. | 2405.19328 | null |
2024-05-29 | MAP-Neo: Highly Capable and Transparent Bilingual Large Language Model Series | Ge Zhang et.al. | 2405.19327 | null |
2024-05-29 | Reasoning3D -- Grounding and Reasoning in 3D: Fine-Grained Zero-Shot Open-Vocabulary 3D Reasoning Part Segmentation via Large Vision-Language Models | Tianrun Chen et.al. | 2405.19326 | null |
2024-05-29 | Nearest Neighbor Speculative Decoding for LLM Generation and Attribution | Minghan Li et.al. | 2405.19325 | null |
2024-05-29 | Are Large Language Models Chameleons? | Mingmeng Geng et.al. | 2405.19323 | null |
2024-05-29 | Value-Incentivized Preference Optimization: A Unified Approach to Online and Offline RLHF | Shicong Cen et.al. | 2405.19320 | null |
2024-05-28 | Don't Forget to Connect! Improving RAG with Graph-based Reranking | Jialin Dong et.al. | 2405.18414 | null |
2024-05-28 | Superposed Decoding: Multiple Generations from a Single Autoregressive Inference Pass | Ethan Shen et.al. | 2405.18400 | link |
2024-05-28 | Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning | Yixiao Zhang et.al. | 2405.18386 | link |
2024-05-28 | OwLore: Outlier-weighed Layerwise Sampled Low-Rank Projection for Memory-Efficient LLM Fine-tuning | Pengxiang Li et.al. | 2405.18380 | link |
2024-05-28 | LLaMA-NAS: Efficient Neural Architecture Search for Large Language Models | Anthony Sarah et.al. | 2405.18377 | null |
2024-05-28 | Empowering Source-Free Domain Adaptation with MLLM-driven Curriculum Learning | Dongjie Chen et.al. | 2405.18376 | link |
2024-05-28 | Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning | Phakphum Artkaew et.al. | 2405.18375 | null |
2024-05-28 | PromptWizard: Task-Aware Agent-driven Prompt Optimization Framework | Eshaan Agarwal et.al. | 2405.18369 | null |
2024-05-28 | Is a 3D-Tokenized LLM the Key to Reliable Autonomous Driving? | Yifan Bai et.al. | 2405.18361 | null |
2024-05-28 | Bridging the Gap: Dynamic Learning Strategies for Improving Multilingual Performance in LLMs | Somnath Kumar et.al. | 2405.18359 | null |
2024-05-27 | Matryoshka Multimodal Models | Mu Cai et.al. | 2405.17430 | null |
2024-05-27 | NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models | Chankyu Lee et.al. | 2405.17428 | null |
2024-05-27 | Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model | Kuan-Chih Huang et.al. | 2405.17427 | link |
2024-05-27 | LARM: Large Auto-Regressive Model for Long-Horizon Embodied Intelligence | Zhuoling Li et.al. | 2405.17424 | null |
2024-05-27 | Self-Corrected Multimodal Large Language Model for End-to-End Robot Manipulation | Jiaming Liu et.al. | 2405.17418 | null |
2024-05-27 | THREAD: Thinking Deeper with Recursive Spawning | Philip Schroeder et.al. | 2405.17402 | null |
2024-05-27 | MindMerger: Efficient Boosting LLM Reasoning in non-English Languages | Zixian Huang et.al. | 2405.17386 | null |
2024-05-27 | ReMoDetect: Reward Models Recognize Aligned LLM's Generations | Hyunseok Lee et.al. | 2405.17382 | null |
2024-05-27 | RTL-Repo: A Benchmark for Evaluating LLMs on Large-Scale RTL Design Projects | Ahmed Allam et.al. | 2405.17378 | null |
2024-05-27 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng et.al. | 2405.17374 | null |
2024-05-24 | Scaling Laws for Discriminative Classification in Large Language Models | Dean Wyatte et.al. | 2405.15765 | null |
2024-05-24 | Large Language Models Reflect Human Citation Patterns with a Heightened Citation Bias | Andres Algaba et.al. | 2405.15739 | null |
2024-05-24 | More Insight from Being More Focused: Analysis of Clustered Market Apps | Maleknaz Nayebi et.al. | 2405.15737 | null |
2024-05-24 | LM4LV: A Frozen Large Language Model for Low-level Vision Tasks | Boyang Zheng et.al. | 2405.15734 | null |
2024-05-24 | Optimizing Large Language Models for OpenAPI Code Completion | Bohdan Petryshyn et.al. | 2405.15729 | null |
2024-05-24 | Prompt-Aware Adapter: Towards Learning Adaptive Visual Tokens for Multimodal Large Language Models | Yue Zhang et.al. | 2405.15684 | null |
2024-05-24 | What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models | Abdelrahman Abdelhamed et.al. | 2405.15668 | null |
2024-05-24 | Class Machine Unlearning for Complex Data via Concepts Inference and Data Poisoning | Wenhan Chang et.al. | 2405.15662 | null |
2024-05-24 | Simen Gaure et.al. | 2405.15652 | null | |
2024-05-24 | LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots | Ruoyu Wang et.al. | 2405.15646 | null |
2024-05-23 | A Nurse is Blue and Elephant is Rugby: Cross Domain Alignment in Large Language Models Reveal Human-like Patterns | Asaf Yehudai et.al. | 2405.14863 | null |
2024-05-23 | Bitune: Bidirectional Instruction-Tuning | Dawid J. Kopiczko et.al. | 2405.14862 | null |
2024-05-23 | PV-Tuning: Beyond Straight-Through Estimation for Extreme LLM Compression | Vladimir Malinovskii et.al. | 2405.14852 | null |
2024-05-23 | HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models | Bernal Jiménez Gutiérrez et.al. | 2405.14831 | null |
2024-05-23 | Can LLMs Solve longer Math Word Problems Better? | Xin Xu et.al. | 2405.14804 | null |
2024-05-23 | Lessons from the Trenches on Reproducible Evaluation of Language Models | Stella Biderman et.al. | 2405.14782 | null |
2024-05-23 | WISE: Rethinking the Knowledge Memory for Lifelong Model Editing of Large Language Models | Peng Wang et.al. | 2405.14768 | link |
2024-05-23 | FinRobot: An Open-Source AI Agent Platform for Financial Applications using Large Language Models | Hongyang Yang et.al. | 2405.14767 | link |
2024-05-23 | Evaluating Large Language Models for Public Health Classification and Extraction Tasks | Joshua Harris et.al. | 2405.14766 | null |
2024-05-23 | Large language models can be zero-shot anomaly detectors for time series? | Sarah Alnegheimish et.al. | 2405.14755 | null |
2024-05-21 | Reducing Transformer Key-Value Cache Size with Cross-Layer Attention | William Brandon et.al. | 2405.12981 | null |
2024-05-21 | Energy Rank Alignment: Using Preference Optimization to Search Chemical Space at Scale | Shriram Chennakesavalu et.al. | 2405.12961 | null |
2024-05-21 | Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models | Zhangyue Yin et.al. | 2405.12939 | null |
2024-05-21 | Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs | Bilgehan Sel et.al. | 2405.12933 | null |
2024-05-21 | Code-mixed Sentiment and Hate-speech Prediction | Anjali Yadav et.al. | 2405.12929 | null |
2024-05-21 | Streamlining Software Reviews: Efficient Predictive Modeling with Minimal Examples | Tim Menzies et.al. | 2405.12920 | null |
2024-05-21 | G-DIG: Towards Gradient-based DIverse and hiGh-quality Instruction Data Selection for Machine Translation | Xingyuan Pan et.al. | 2405.12915 | null |
2024-05-21 | An Empirical Study and Analysis of Text-to-Image Generation Using Large Language Model-Powered Textual Representation | Zhiyu Tan et.al. | 2405.12914 | null |
2024-05-21 | Topic Modelling Case Law Using a Large Language Model and a New Taxonomy for UK Law: AI Insights into Summary Judgment | Holli Sargeant et.al. | 2405.12910 | link |
2024-05-21 | Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents | San Kim et.al. | 2405.12900 | null |
2024-05-20 | Adapting Large Multimodal Models to Distribution Shifts: The Role of In-Context Learning | Guanglin Zhou et.al. | 2405.12217 | link |
2024-05-20 | MathBench: Evaluating the Theory and Application Proficiency of LLMs with a Hierarchical Mathematics Benchmark | Hongwei Liu et.al. | 2405.12209 | link |
2024-05-20 | Developers' Perceptions on the Impact of ChatGPT in Software Development: A Survey | Thiago S. Vaillant et.al. | 2405.12195 | null |
2024-05-20 | CT-Eval: Benchmarking Chinese Text-to-Table Performance in Large Language Models | Haoxiang Shi et.al. | 2405.12174 | null |
2024-05-20 | Fennec: Fine-grained Language Model Evaluation and Correction Extended through Branching and Bridging | Xiaobo Liang et.al. | 2405.12163 | link |
2024-05-20 | Eliciting Problem Specifications via Large Language Models | Robert E. Wray et.al. | 2405.12147 | null |
2024-05-20 | DTLLM-VLT: Diverse Text Generation for Visual Language Tracking Based on LLM | Xuchen Li et.al. | 2405.12139 | null |
2024-05-20 | MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning | Ting Jiang et.al. | 2405.12130 | link |
2024-05-20 | Reindex-Then-Adapt: Improving Large Language Models for Conversational Recommendation | Zhankui He et.al. | 2405.12119 | null |
2024-05-20 | Imp: Highly Capable Large Multimodal Models for Mobile Devices | Zhenwei Shao et.al. | 2405.12107 | link |
2024-05-17 | A Survey on Large Language Models with Multilingualism: Recent Advances and New Frontiers | Kaiyu Huang et.al. | 2405.10936 | link |
2024-05-17 | The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural Networks | Lucius Bushnaq et.al. | 2405.10928 | null |
2024-05-17 | COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domain | Dimitrios P. Panagoulias et.al. | 2405.10893 | null |
2024-05-17 | Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature Review | Hongyi Yang et.al. | 2405.10883 | null |
2024-05-17 | The Future of Large Language Model Pre-training is Federated | Lorenzo Sani et.al. | 2405.10853 | null |
2024-05-17 | Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and Opportunities | Hao Zhou et.al. | 2405.10825 | null |
2024-05-17 | Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive System | Jiawei Feng et.al. | 2405.10818 | null |
2024-05-17 | ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot Scenarios | Markus Bayer et.al. | 2405.10808 | null |
2024-05-17 | Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched Embeddings | Albert Sawczyn et.al. | 2405.10745 | null |
2024-05-17 | Efficient Multimodal Large Language Models: A Survey | Yizhang Jin et.al. | 2405.10739 | link |
2024-05-16 | UniRAG: Universal Retrieval Augmentation for Multi-Modal Large Language Models | Sahel Sharifymoghaddam et.al. | 2405.10311 | null |
2024-05-16 | 4D Panoptic Scene Graph Generation | Jingkang Yang et.al. | 2405.10305 | link |
2024-05-16 | HW-GPT-Bench: Hardware-Aware Architecture Benchmark for Language Models | Rhea Sanjay Sukthanker et.al. | 2405.10299 | link |
2024-05-16 | Timeline-based Sentence Decomposition with In-Context Learning for Temporal Fact Extraction | Jianhao Chen et.al. | 2405.10288 | null |
2024-05-16 | FFF: Fixing Flawed Foundations in contrastive pre-training results in very strong Vision-Language models | Adrian Bulat et.al. | 2405.10286 | null |
2024-05-16 | Revisiting OPRO: The Limitations of Small-Scale LLMs as Optimizers | Tuo Zhang et.al. | 2405.10276 | null |
2024-05-16 | Keep It Private: Unsupervised Privatization of Online Text | Calvin Bao et.al. | 2405.10260 | link |
2024-05-16 | When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models | Xianzheng Ma et.al. | 2405.10255 | null |
2024-05-16 | A Systematic Evaluation of Large Language Models for Natural Language Generation Tasks | Xuanfan Ni et.al. | 2405.10251 | null |
2024-05-16 | IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers | Hao Yan et.al. | 2405.10250 | null |
2024-05-15 | Modeling Bilingual Sentence Processing: Evaluating RNN and Transformer Architectures for Cross-Language Structural Priming | Bushi Xiao et.al. | 2405.09508 | null |
2024-05-15 | ParaNames 1.0: Creating an Entity Name Corpus for 400+ Languages using Wikidata | Jonne Sälevä et.al. | 2405.09496 | null |
2024-05-15 | Beyond Flesch-Kincaid: Prompt-based Metrics Improve Difficulty Classification of Educational Texts | Donya Rooein et.al. | 2405.09482 | null |
2024-05-15 | Tell Me Why: Explainable Public Health Fact-Checking with Large Language Models | Majid Zarharan et.al. | 2405.09454 | link |
2024-05-15 | Facilitating Opinion Diversity through Hybrid NLP Approaches | Michiel van der Meer et.al. | 2405.09439 | null |
2024-05-15 | MicroPython Testbed for Federated Learning Algorithms | Miroslav Popovic et.al. | 2405.09423 | null |
2024-05-15 | Matching domain experts by training from scratch on domain knowledge | Xiaoliang Luo et.al. | 2405.09395 | null |
2024-05-15 | PolygloToxicityPrompts: Multilingual Evaluation of Neural Toxic Degeneration in Large Language Models | Devansh Jain et.al. | 2405.09373 | null |
2024-05-15 | Large Language Model Bias Mitigation from the Perspective of Knowledge Editing | Ruizhe Chen et.al. | 2405.09341 | null |
2024-05-15 | Prompting-based Synthetic Data Generation for Few-Shot Question Answering | Maximilian Schmidt et.al. | 2405.09335 | null |
2024-05-14 | Towards Enhanced RAC Accessibility: Leveraging Datasets and LLMs | Edison Jair Bejarano Sepulveda et.al. | 2405.08792 | null |
2024-05-14 | Incorporating Clinical Guidelines through Adapting Multi-modal Large Language Model for Prostate Cancer PI-RADS Scoring | Tiantian Zhang et.al. | 2405.08786 | null |
2024-05-14 | Is the Pope Catholic? Yes, the Pope is Catholic. Generative Evaluation of Intent Resolution in LLMs | Akhila Yerukola et.al. | 2405.08760 | link |
2024-05-14 | Distributed Threat Intelligence at the Edge Devices: A Large Language Model-Driven Approach | Syed Mhamudul Hasan et.al. | 2405.08755 | null |
2024-05-14 | Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding | Zhimin Li et.al. | 2405.08748 | link |
2024-05-14 | ALMol: Aligned Language-Molecule Translation LLMs through Offline Preference Contrastive Optimisation | Dimitris Gkoumas et.al. | 2405.08619 | null |
2024-05-14 | A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine | Hanguang Xiao et.al. | 2405.08603 | null |
2024-05-14 | EVDA: Evolving Deepfake Audio Detection Continual Learning Benchmark | Xiaohui Zhang et.al. | 2405.08596 | null |
2024-05-14 | Falcon 7b for Software Mention Detection in Scholarly Documents | AmeerAli Khan et.al. | 2405.08514 | null |
2024-05-14 | Archimedes-AUEB at SemEval-2024 Task 5: LLM explains Civil Procedure | Odysseas S. Chlapanis et.al. | 2405.08502 | null |
2024-05-13 | Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots | Chengyue Wu et.al. | 2405.07990 | null |
2024-05-13 | A Generalist Learner for Multifaceted Medical Image Interpretation | Hong-Yu Zhou et.al. | 2405.07988 | null |
2024-05-13 | PyZoBot: A Platform for Conversational Information Extraction and Synthesis from Curated Zotero Reference Libraries through Advanced Retrieval-Augmented Generation | Suad Alshammari et.al. | 2405.07963 | null |
2024-05-13 | AgentClinic: a multimodal agent benchmark to evaluate AI in simulated clinical environments | Samuel Schmidgall et.al. | 2405.07960 | null |
2024-05-13 | EconLogicQA: A Question-Answering Benchmark for Evaluating Large Language Models in Economic Sequential Reasoning | Yinzhu Quan et.al. | 2405.07938 | null |
2024-05-13 | PARDEN, Can You Repeat That? Defending against Jailbreaks via Repetition | Ziyang Zhang et.al. | 2405.07932 | link |
2024-05-13 | Can Better Text Semantics in Prompt Tuning Improve VLM Generalization? | Hari Chandana Kuchibhotla et.al. | 2405.07921 | null |
2024-05-13 | A Systematic Investigation of Distilling Large Language Models into Cross-Encoders for Passage Re-ranking | Ferdinand Schlatt et.al. | 2405.07920 | null |
2024-05-13 | Russian-Language Multimodal Dataset for Automatic Summarization of Scientific Papers | Alena Tsanda et.al. | 2405.07886 | null |
2024-05-13 | Reproducing the Metric-Based Evaluation of a Set of Controllable Text Generation Techniques | Michela Lorandi et.al. | 2405.07875 | null |
2024-05-10 | Linearizing Large Language Models | Jean Mercat et.al. | 2405.06640 | link |
2024-05-10 | Value Augmented Sampling for Language Model Alignment and Personalization | Seungwook Han et.al. | 2405.06639 | link |
2024-05-10 | Federated Document Visual Question Answering: A Pilot Study | Khanh Nguyen et.al. | 2405.06636 | null |
2024-05-10 | Characterizing the Accuracy - Efficiency Trade-off of Low-rank Decomposition in Language Models | Chakshu Moar et.al. | 2405.06626 | null |
2024-05-10 | What Can Natural Language Processing Do for Peer Review? | Ilia Kuznetsov et.al. | 2405.06563 | null |
2024-05-10 | Mitigating Hallucinations in Large Language Models via Self-Refinement-Enhanced Knowledge Retrieval | Mengjia Niu et.al. | 2405.06545 | null |
2024-05-10 | Prompting Large Language Models with Knowledge Graphs for Question Answering Involving Long-tail Facts | Wenyu Huang et.al. | 2405.06524 | null |
2024-05-10 | UniDM: A Unified Framework for Data Manipulation with Large Language Models | Yichen Qian et.al. | 2405.06510 | null |
2024-05-10 | Aspect-based Sentiment Evaluation of Chess Moves (ASSESS): an NLP-based Method for Evaluating Chess Strategies from Textbooks | Haifa Alrdahi et.al. | 2405.06499 | null |
2024-05-10 | Storypark: Leveraging Large Language Models to Enhance Children Story Learning Through Child-AI collaboration Storytelling | Lyumanshan Ye et.al. | 2405.06495 | null |
2024-05-09 | Natural Language Processing RELIES on Linguistics | Juri Opitz et.al. | 2405.05966 | null |
2024-05-09 | OpenBA-V2: Reaching 77.3% High Compression Ratio with Fast Multi-Stage Pruning | Dan Qiao et.al. | 2405.05957 | link |
2024-05-09 | Probing Multimodal LLMs as World Models for Driving | Shiva Sreeram et.al. | 2405.05956 | link |
2024-05-09 | Smurfs: Leveraging Multiple Proficiency Agents with Context-Efficiency for Tool Planning | Junzhi Chen et.al. | 2405.05955 | null |
2024-05-09 | CuMo: Scaling Multimodal LLM with Co-Upcycled Mixture-of-Experts | Jiachen Li et.al. | 2405.05949 | link |
2024-05-09 | Trustworthy AI-Generative Content in Intelligent 6G Network: Adversarial, Privacy, and Fairness | Siyuan Li et.al. | 2405.05930 | null |
2024-05-09 | Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations? | Zorik Gekhman et.al. | 2405.05904 | null |
2024-05-09 | Co-driver: VLM-based Autonomous Driving Assistant with Human-like Behavior and Understanding for Complex Road Scenes | Ziang Guo et.al. | 2405.05885 | null |
2024-05-09 | FlockGPT: Guiding UAV Flocking with Linguistic Orchestration | Artem Lykov et.al. | 2405.05872 | null |
2024-05-09 | Robots Can Feel: LLM-based Framework for Robot Ethical Reasoning | Artem Lykov et.al. | 2405.05824 | link |
2024-05-08 | You Only Cache Once: Decoder-Decoder Architectures for Language Models | Yutao Sun et.al. | 2405.05254 | null |
2024-05-08 | Open Source Language Models Can Provide Feedback: Evaluating LLMs' Ability to Help Students Using GPT-4-As-A-Judge | Charles Koutcheme et.al. | 2405.05253 | link |
2024-05-09 | LLMs with Personalities in Multi-issue Negotiation Games | Sean Noh et.al. | 2405.05248 | null |
2024-05-08 | SuFIA: Language-Guided Augmented Dexterity for Robotic Surgical Assistants | Masoud Moghani et.al. | 2405.05226 | null |
2024-05-08 | Conv-Basis: A New Paradigm for Efficient Attention Inference and Gradient Computation in Transformers | Jiuxiang Gu et.al. | 2405.05219 | null |
2024-05-08 | MIDGARD: Self-Consistency Using Minimum Description Length for Structured Commonsense Reasoning | Inderjeet Nair et.al. | 2405.05189 | null |
2024-05-08 | Air Gap: Protecting Privacy-Conscious Conversational Agents | Eugene Bagdasaryan et.al. | 2405.05175 | null |
2024-05-08 | XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples | Peiqin Lin et.al. | 2405.05116 | null |
2024-05-08 | QFMTS: Generating Query-Focused Summaries over Multi-Table Inputs | Weijia Zhang et.al. | 2405.05109 | null |
2024-05-08 | Concerns on Bias in Large Language Models when Creating Synthetic Personae | Helena A. Haxvig et.al. | 2405.05080 | null |
2024-05-07 | ChatHuman: Language-driven 3D Human Understanding with Retrieval-Augmented Tool Reasoning | Jing Lin et.al. | 2405.04533 | null |
2024-05-07 | QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving | Yujun Lin et.al. | 2405.04532 | link |
2024-05-07 | NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts | Shudan Zhang et.al. | 2405.04520 | null |
2024-05-07 | xLSTM: Extended Long Short-Term Memory | Maximilian Beck et.al. | 2405.04517 | null |
2024-05-07 | A Transformer with Stack Attention | Jiaoda Li et.al. | 2405.04515 | link |
2024-05-08 | Unveiling Disparities in Web Task Handling Between Human and Web Agent | Kihoon Son et.al. | 2405.04497 | null |
2024-05-07 | Toward In-Context Teaching: Adapting Examples to Students' Misconceptions | Alexis Ross et.al. | 2405.04495 | null |
2024-05-07 | The Silicone Ceiling: Auditing GPT's Race and Gender Biases in Hiring | Lena Armstrong et.al. | 2405.04412 | null |
2024-05-07 | Learning To See But Forgetting To Follow: Visual Instruction Tuning Makes LLMs More Prone To Jailbreak Attacks | Georgios Pantazopoulos et.al. | 2405.04403 | link |
2024-05-07 | Large Language Models Cannot Explain Themselves | Advait Sarkar et.al. | 2405.04382 | null |
2024-05-06 | Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs | Muhammad Uzair Khattak et.al. | 2405.03690 | null |
2024-05-06 | Large Language Models Reveal Information Operation Goals, Tactics, and Narrative Frames | Keith Burghardt et.al. | 2405.03688 | null |
2024-05-06 | Language-Image Models with 3D Understanding | Jang Hyun Cho et.al. | 2405.03685 | null |
2024-05-06 | AtomGPT: Atomistic Generative Pre-trained Transformer for Forward and Inverse Materials Design | Kamal Choudhary et.al. | 2405.03680 | null |
2024-05-06 | A New Robust Partial |
Sharath Raghvendra et.al. | 2405.03664 | null |
2024-05-06 | When LLMs Meet Cybersecurity: A Systematic Literature Review | Jie Zhang et.al. | 2405.03644 | null |
2024-05-06 | A Controlled Experiment on the Energy Efficiency of the Source Code Generated by Code Llama | Vlad-Andrei Cursaru et.al. | 2405.03616 | null |
2024-05-06 | Enabling High-Sparsity Foundational Llama Models with Efficient Pretraining and Deployment | Abhinav Agarwalla et.al. | 2405.03594 | null |
2024-05-06 | AlphaMath Almost Zero: process Supervision without process | Guoxin Chen et.al. | 2405.03553 | null |
2024-05-06 | MAmmoTH2: Scaling Instructions from the Web | Xiang Yue et.al. | 2405.03548 | null |
2024-05-03 | Leveraging Large Language Models to Enhance Domain Expert Inclusion in Data Science Workflows | Jasmine Y. Shih et.al. | 2405.02260 | null |
2024-05-03 | What matters when building vision-language models? | Hugo Laurençon et.al. | 2405.02246 | null |
2024-05-03 | REASONS: A benchmark for REtrieval and Automated citationS Of scieNtific Sentences using Public and Proprietary LLMs | Deepa Tilwani et.al. | 2405.02228 | null |
2024-05-03 | Fair Risk Control: A Generalized Framework for Calibrating Multi-group Fairness Risks | Lujing Zhang et.al. | 2405.02225 | null |
2024-05-03 | FairEvalLLM. A Comprehensive Framework for Benchmarking Fairness in Large Language Model Recommender Systems | Yashar Deldjoo et.al. | 2405.02219 | null |
2024-05-03 | Automatic Programming: Large Language Models and Beyond | Michael R. Lyu et.al. | 2405.02213 | null |
2024-05-03 | Assessing and Verifying Task Utility in LLM-Powered Applications | Negar Arabzadeh et.al. | 2405.02178 | null |
2024-05-03 | The AI Review Lottery: Widespread AI-Assisted Peer Reviews Boost Paper Scores and Acceptance Rates | Giuseppe Russo Latona et.al. | 2405.02150 | null |
2024-05-03 | MedReadMe: A Systematic Study for Fine-grained Sentence Readability in Medical Domain | Chao Jiang et.al. | 2405.02144 | null |
2024-05-03 | Optimising Calls to Large Language Models with Uncertainty-Based Two-Tier Selection | Guillem Ramírez et.al. | 2405.02134 | null |
2024-05-02 | Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks | Murtaza Dalal et.al. | 2405.01534 | null |
2024-05-02 | OmniDrive: A Holistic LLM-Agent Framework for Autonomous Driving with 3D Perception, Reasoning and Planning | Shihao Wang et.al. | 2405.01533 | null |
2024-05-02 | FLAME: Factuality-Aware Alignment for Large Language Models | Sheng-Chieh Lin et.al. | 2405.01525 | null |
2024-05-02 | Transformer-Aided Semantic Communications | Matin Mortaheb et.al. | 2405.01521 | null |
2024-05-02 | Analyzing the Role of Semantic Representations in the Era of Large Language Models | Zhijing Jin et.al. | 2405.01502 | link |
2024-05-02 | Supporting Business Document Workflows via Collection-Centric Information Foraging with Large Language Models | Raymond Fok et.al. | 2405.01501 | null |
2024-05-02 | Controllable Text Generation in the Instruction-Tuning Era | Dhananjay Ashok et.al. | 2405.01490 | null |
2024-05-02 | NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Gerald Shen et.al. | 2405.01481 | link |
2024-05-02 | V-FLUTE: Visual Figurative Language Understanding with Textual Explanations | Arkadiy Saakyan et.al. | 2405.01474 | null |
2024-05-02 | Advancing human-centric AI for robust X-ray analysis through holistic self-supervised learning | Théo Moutakanni et.al. | 2405.01469 | null |
2024-05-01 | Is Bigger Edit Batch Size Always Better? -- An Empirical Study on Model Editing with Llama-3 | Junsang Yoon et.al. | 2405.00664 | null |
2024-05-01 | HalluVault: A Novel Logic Programming-aided Metamorphic Testing Framework for Detecting Fact-Conflicting Hallucinations in Large Language Models | Ningke Li et.al. | 2405.00648 | null |
2024-05-01 | When Quantization Affects Confidence of Large Language Models? | Irina Proskurina et.al. | 2405.00632 | null |
2024-05-01 | "I'm Not Sure, But...": Examining the Impact of Large Language Models' Uncertainty Expression on User Reliance and Trust | Sunnie S. Y. Kim et.al. | 2405.00623 | null |
2024-05-01 | Addressing Topic Granularity and Hallucination in Large Language Models for Topic Modelling | Yida Mu et.al. | 2405.00611 | null |
2024-05-01 | Investigating Automatic Scoring and Feedback using Large Language Models | Gloria Ashiya Katuka et.al. | 2405.00602 | null |
2024-05-01 | Are Models Biased on Text without Gender-related Language? | Catarina G Belém et.al. | 2405.00588 | link |
2024-05-01 | The Real, the Better: Aligning Large Language Models with Online Human Behaviors | Guanying Jiang et.al. | 2405.00578 | null |
2024-05-01 | EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model | Deng Li et.al. | 2405.00574 | null |
2024-05-01 | Spherical Linear Interpolation and Text-Anchoring for Zero-shot Composed Image Retrieval | Young Kyun Jang et.al. | 2405.00571 | null |
2024-04-30 | DOCCI: Descriptions of Connected and Contrasting Images | Yasumasa Onoe et.al. | 2404.19753 | null |
2024-04-30 | Visual Fact Checker: Enabling High-Fidelity Detailed Caption Generation | Yunhao Ge et.al. | 2404.19752 | null |
2024-04-30 | PrivComp-KG : Leveraging Knowledge Graph and Large Language Models for Privacy Policy Compliance Verification | Leon Garza et.al. | 2404.19744 | null |
2024-04-30 | Better & Faster Large Language Models via Multi-token Prediction | Fabian Gloeckle et.al. | 2404.19737 | null |
2024-04-30 | A Framework for Leveraging Human Computation Gaming to Enhance Knowledge Graphs for Accuracy Critical Generative AI Applications | Steph Buongiorno et.al. | 2404.19729 | null |
2024-04-30 | PANGeA: Procedural Artificial Narrative using Generative AI for Turn-Based Video Games | Steph Buongiorno et.al. | 2404.19721 | null |
2024-04-30 | Assessing LLMs in Malicious Code Deobfuscation of Real-world Malware Campaigns | Constantinos Patsakis et.al. | 2404.19715 | null |
2024-04-30 | Automated Generation of High-Quality Medical Simulation Scenarios Through Integration of Semi-Structured Data and Large Language Models | Scott Sumpter et.al. | 2404.19713 | null |
2024-04-30 | When to Retrieve: Teaching LLMs to Utilize Information Retrieval Effectively | Tiziano Labruna et.al. | 2404.19705 | null |
2024-04-30 | Naturally Supervised 3D Visual Grounding with Language-Regularized Concept Learners | Chun Feng et.al. | 2404.19696 | null |
2024-04-29 | Hallucination of Multimodal Large Language Models: A Survey | Zechen Bai et.al. | 2404.18930 | link |
2024-04-29 | DPO Meets PPO: Reinforced Token Optimization for RLHF | Han Zhong et.al. | 2404.18922 | null |
2024-04-29 | TheaterGen: Character Management with LLM for Consistent Multi-turn Image Generation | Junhao Cheng et.al. | 2404.18919 | null |
2024-04-29 | Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting | Fangcheng Liu et.al. | 2404.18911 | null |
2024-04-29 | Human-in-the-Loop Synthetic Text Data Inspection with Provenance Tracking | Hong Jin Kang et.al. | 2404.18881 | link |
2024-04-29 | More RLHF, More Trust? On The Impact of Human Preference Alignment On Language Model Trustworthiness | Aaron J. Li et.al. | 2404.18870 | link |
2024-04-29 | Truth-value judgment in language models: belief directions are context sensitive | Stefan F. Schouten et.al. | 2404.18865 | null |
2024-04-29 | Performance-Aligned LLMs for Generating Fast Code | Daniel Nichols et.al. | 2404.18864 | null |
2024-04-29 | VERT: Verified Equivalent Rust Transpilation with Few-Shot Learning | Aidan Z. H. Yang et.al. | 2404.18852 | null |
2024-04-29 | It's Difficult to be Neutral -- Human and LLM-based Sentiment Annotation of Patient Comments | Petter Mæhlum et.al. | 2404.18832 | null |
2024-04-26 | Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo | Stephen Zhao et.al. | 2404.17546 | null |
2024-04-26 | Large Language Model Agent as a Mechanical Designer | Yayati Jadhav et.al. | 2404.17525 | null |
2024-04-26 | On the Use of Large Language Models to Generate Capability Ontologies | Luis Miguel Vieira da Silva et.al. | 2404.17524 | null |
2024-04-26 | Enhancing Legal Compliance and Regulation Analysis with Large Language Models | Shabnam Hassani et.al. | 2404.17522 | null |
2024-04-26 | A Comprehensive Evaluation on Event Reasoning of Large Language Models | Zhengwei Tao et.al. | 2404.17513 | link |
2024-04-26 | Learning text-to-video retrieval from image captioning | Lucas Ventura et.al. | 2404.17498 | null |
2024-04-26 | CEval: A Benchmark for Evaluating Counterfactual Text Generation | Van Bach Nguyen et.al. | 2404.17475 | null |
2024-04-26 | Ruffle&Riley: Insights from Designing and Evaluating a Large Language Model-Based Conversational Tutoring System | Robin Schmucker et.al. | 2404.17460 | null |
2024-04-26 | "ChatGPT Is Here to Help, Not to Replace Anybody" -- An Evaluation of Students' Opinions On Integrating ChatGPT In CS Courses | Bruno Pereira Cipriano et.al. | 2404.17443 | null |
2024-04-26 | InspectorRAGet: An Introspection Platform for RAG Evaluation | Kshitij Fadnis et.al. | 2404.17347 | null |
2024-04-25 | Make-it-Real: Unleashing Large Multimodal Model's Ability for Painting 3D Objects with Realistic Materials | Ye Fang et.al. | 2404.16829 | null |
2024-04-25 | How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites | Zhe Chen et.al. | 2404.16821 | link |
2024-04-25 | IndicGenBench: A Multilingual Benchmark to Evaluate Generation Capabilities of LLMs on Indic Languages | Harman Singh et.al. | 2404.16816 | null |
2024-04-25 | Make Your LLM Fully Utilize the Context | Shengnan An et.al. | 2404.16811 | link |
2024-04-25 | Improving Diversity of Commonsense Generation by Large Language Models via In-Context Learning | Tianhui Zhang et.al. | 2404.16807 | null |
2024-04-25 | Weak-to-Strong Extrapolation Expedites Alignment | Chujie Zheng et.al. | 2404.16792 | link |
2024-04-25 | SEED-Bench-2-Plus: Benchmarking Multimodal Large Language Models with Text-Rich Visual Comprehension | Bohao Li et.al. | 2404.16790 | link |
2024-04-25 | Continual Learning of Large Language Models: A Comprehensive Survey | Haizhou Shi et.al. | 2404.16789 | link |
2024-04-25 | Prefix Text as a Yarn: Eliciting Non-English Alignment in Foundation Language Model | Runzhe Zhan et.al. | 2404.16766 | null |
2024-04-25 | RadGenome-Chest CT: A Grounded Vision-Language Dataset for Chest CT Analysis | Xiaoman Zhang et.al. | 2404.16754 | null |
2024-04-24 | Hybrid LLM/Rule-based Approaches to Business Insights Generation from Structured Data | Aliaksei Vertsel et.al. | 2404.15604 | null |
2024-04-24 | ImplicitAVE: An Open-Source Dataset and Multimodal LLMs Benchmark for Implicit Attribute Value Extraction | Henry Peng Zou et.al. | 2404.15592 | link |
2024-04-24 | Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations? | Hossein Salami et.al. | 2404.15578 | null |
2024-04-23 | PRISM: Patient Records Interpretation for Semantic Clinical Trial Matching using Large Language Models | Shashi Kant Gupta et.al. | 2404.15549 | null |
2024-04-23 | Towards Systematic Evaluation of Logical Reasoning Ability of Large Language Models | Mihir Parmar et.al. | 2404.15522 | link |
2024-04-23 | Visual Delta Generator with Large Multi-modal Models for Semi-supervised Composed Image Retrieval | Young Kyun Jang et.al. | 2404.15516 | null |
2024-04-23 | ToM-LM: Delegating Theory Of Mind Reasoning to External Symbolic Executors in Large Language Models | Weizhi Tang et.al. | 2404.15515 | null |
2024-04-23 | GeoLLM-Engine: A Realistic Environment for Building Geospatial Copilots | Simranjit Singh et.al. | 2404.15500 | null |
2024-04-23 | IryoNLP at MEDIQA-CORR 2024: Tackling the Medical Error Detection & Correction Task On the Shoulders of Medical Agents | Jean-Philippe Corbeil et.al. | 2404.15488 | link |
2024-04-23 | Large Language Models Spot Phishing Emails with Surprising Accuracy: A Comparative Analysis of Performance | Het Patel et.al. | 2404.15485 | null |
2024-04-23 | Aligning LLM Agents by Learning Latent Preference from User Edits | Ge Gao et.al. | 2404.15269 | null |
2024-04-23 | XFT: Unlocking the Power of Code Instruction Tuning by Simply Merging Upcycled Mixture-of-Experts | Yifeng Ding et.al. | 2404.15247 | link |
2024-04-23 | Revisiting Unnaturalness for Automated Program Repair in the Era of Large Language Models | Aidan Z. H. Yang et.al. | 2404.15236 | null |
2024-04-23 | Re-Thinking Inverse Graphics With Large Language Models | Peter Kulits et.al. | 2404.15228 | null |
2024-04-23 | Setting up the Data Printer with Improved English to Ukrainian Machine Translation | Yurii Paniv et.al. | 2404.15196 | null |
2024-04-23 | Regressive Side Effects of Training Language Models to Mimic Student Misconceptions | Shashank Sonkar et.al. | 2404.15156 | null |
2024-04-23 | Bias patterns in the application of LLMs for clinical decision support: A comprehensive study | Raphael Poulain et.al. | 2404.15149 | null |
2024-04-23 | Rethinking LLM Memorization through the Lens of Adversarial Compression | Avi Schwarzschild et.al. | 2404.15146 | null |
2024-04-23 | MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning | Sunan He et.al. | 2404.15127 | null |
2024-04-23 | Multimodal Large Language Model is a Human-Aligned Annotator for Text-to-Image Generation | Xun Wu et.al. | 2404.15100 | null |
2024-04-22 | AutoAD III: The Prequel -- Back to the Pixels | Tengda Han et.al. | 2404.14412 | null |
2024-04-22 | SpaceByte: Towards Deleting Tokenization from Large Language Modeling | Kevin Slagle et.al. | 2404.14408 | link |
2024-04-22 | RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? | Adrian de Wynter et.al. | 2404.14397 | null |
2024-04-22 | A Survey on Self-Evolution of Large Language Models | Zhengwei Tao et.al. | 2404.14387 | null |
2024-04-22 | Beyond Scaling: Predicting Patent Approval with Domain-specific Fine-grained Claim Dependency Graph | Xiaochen Kev Gao et.al. | 2404.14372 | link |
2024-04-22 | Preference Fine-Tuning of LLMs Should Leverage Suboptimal, On-Policy Data | Fahim Tajwar et.al. | 2404.14367 | link |
2024-04-22 | Better Synthetic Data by Retrieving and Transforming Existing Datasets | Saumya Gandhi et.al. | 2404.14361 | link |
2024-04-22 | Rethinking Legal Compliance Automation: Opportunities with Large Language Models | Shabnam Hassani et.al. | 2404.14356 | null |
2024-04-22 | Automated Long Answer Grading with RiceChem Dataset | Shashank Sonkar et.al. | 2404.14316 | null |
2024-04-22 | Explaining Arguments' Strength: Unveiling the Role of Attacks and Supports (Technical Report) | Xiang Yin et.al. | 2404.14304 | null |
2024-04-19 | MoVA: Adapting Mixture of Vision Experts to Multimodal Context | Zhuofan Zong et.al. | 2404.13046 | link |
2024-04-19 | Unified Scene Representation and Reconstruction for 3D Large Language Models | Tao Chu et.al. | 2404.13044 | null |
2024-04-19 | Data Alignment for Zero-Shot Concept Generation in Dermatology AI | Soham Gadgil et.al. | 2404.13043 | null |
2024-04-19 | LaPA: Latent Prompt Assist Model For Medical Visual Question Answering | Tiancheng Gu et.al. | 2404.13039 | link |
2024-04-19 | Sample Design Engineering: An Empirical Study of What Makes Good Downstream Fine-Tuning Samples for LLMs | Biyang Guo et.al. | 2404.13033 | link |
2024-04-19 | When Life gives you LLMs, make LLM-ADE: Large Language Models with Adaptive Data Engineering | Stephen Choi et.al. | 2404.13028 | null |
2024-04-19 | Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models | Chuofan Ma et.al. | 2404.13013 | null |
2024-04-19 | Rethinking the Evaluation of Dialogue Systems: Effects of User Feedback on Crowdworkers and LLMs | Clemencia Siro et.al. | 2404.12994 | link |
2024-04-19 | RedactBuster: Entity Type Recognition from Redacted Documents | Mirco Beltrame et.al. | 2404.12991 | null |
2024-04-19 | FineRec:Exploring Fine-grained Sequential Recommendation | Xiaokun Zhang et.al. | 2404.12975 | null |
2024-04-18 | BLINK: Multimodal Large Language Models Can See but Not Perceive | Xingyu Fu et.al. | 2404.12390 | null |
2024-04-18 | MedThink: Explaining Medical Visual Question Answering via Multimodal Decision-Making Rationale | Xiaotang Gai et.al. | 2404.12372 | null |
2024-04-18 | When LLMs are Unfit Use FastFit: Fast and Effective Text Classification with Many Classes | Asaf Yehudai et.al. | 2404.12365 | null |
2024-04-18 | Towards a Foundation Model for Partial Differential Equation: Multi-Operator Learning and Extrapolation | Jingmin Sun et.al. | 2404.12355 | link |
2024-04-18 | V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning | Hang Hua et.al. | 2404.12353 | null |
2024-04-18 | Large Language Models in Targeted Sentiment Analysis | Nicolay Rusnachenko et.al. | 2404.12342 | link |
2024-04-18 | Normative Requirements Operationalization with Large Language Models | Nick Feng et.al. | 2404.12335 | null |
2024-04-18 | Large Language Models for Synthetic Participatory Planning of Shared Automated Electric Mobility Systems | Jiangbo Yu et.al. | 2404.12317 | null |
2024-04-18 | Simultaneous Interpretation Corpus Construction by Large Language Models in Distant Language Pair | Yusuke Sakai et.al. | 2404.12299 | null |
2024-04-18 | Augmenting emotion features in irony detection with Large language modeling | Yucheng Lin et.al. | 2404.12291 | null |
2024-04-17 | A Deep Dive into Large Language Models for Automated Bug Localization and Repair | Soneya Binta Hossain et.al. | 2404.11595 | null |
2024-04-17 | Related Work and Citation Text Generation: A Survey | Xiangci Li et.al. | 2404.11588 | null |
2024-04-17 | LLMTune: Accelerate Database Knob Tuning with Large Language Models | Xinmei Huang et.al. | 2404.11581 | null |
2024-04-17 | MoA: Mixture-of-Attention for Subject-Context Disentanglement in Personalized Image Generation | Kuan-Chieh et.al. | 2404.11565 | null |
2024-04-17 | Quantifying Multilingual Performance of Large Language Models Across Languages | Zihao Li et.al. | 2404.11553 | null |
2024-04-17 | Evaluating Span Extraction in Generative Paradigm: A Reflection on Aspect-Based Sentiment Analysis | Soyoung Yang et.al. | 2404.11539 | null |
2024-04-17 | Pack of LLMs: Model Fusion at Test-Time via Perplexity Optimization | Costas Mavromatis et.al. | 2404.11531 | null |
2024-04-17 | Embedding Privacy in Computational Social Science and Artificial Intelligence Research | Keenan Jones et.al. | 2404.11515 | null |
2024-04-17 | Towards Coarse-to-Fine Evaluation of Inference Efficiency for Large Language Models | Yushuo Chen et.al. | 2404.11502 | link |
2024-04-17 | Paraphrase and Solve: Exploring and Exploiting the Impact of Surface Form on Mathematical Reasoning in Large Language Models | Yue Zhou et.al. | 2404.11500 | link |
2024-04-16 | Nearly Optimal Algorithms for Contextual Dueling Bandits from Adversarial Feedback | Qiwei Di et.al. | 2404.10776 | null |
2024-04-16 | LaDiC: Are Diffusion Models Really Inferior to Autoregressive Counterparts for Image-to-Text Generation? | Yuchi Wang et.al. | 2404.10763 | link |
2024-04-16 | Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification | Yu-Yang Li et.al. | 2404.10757 | null |
2024-04-16 | Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study | Shusheng Xu et.al. | 2404.10719 | null |
2024-04-16 | An empirical study on code review activity prediction in practice | Doriane Olewicki et.al. | 2404.10703 | null |
2024-04-16 | Automating REST API Postman Test Cases Using LLM | S Deepika Sri et.al. | 2404.10678 | null |
2024-04-16 | ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images | Quan Van Nguyen et.al. | 2404.10652 | link |
2024-04-16 | Self-playing Adversarial Language Game Enhances LLM Reasoning | Pengyu Cheng et.al. | 2404.10642 | link |
2024-04-16 | HLAT: High-quality Large Language Model Pre-trained on AWS Trainium | Haozheng Fan et.al. | 2404.10630 | null |
2024-04-16 | Private Attribute Inference from Images with Vision-Language Models | Batuhan Tömekçe et.al. | 2404.10618 | null |
2024-04-15 | Personalized Collaborative Fine-Tuning for On-Device Large Language Models | Nicolas Wagner et.al. | 2404.09753 | null |
2024-04-15 | Quantization of Large Language Models with an Overdetermined Basis | Daniil Merkulov et.al. | 2404.09737 | null |
2024-04-15 | Unveiling Imitation Learning: Exploring the Impact of Data Falsity to Large Language Model | Hyunsoo Cho et.al. | 2404.09717 | null |
2024-04-15 | Enhancing Robot Explanation Capabilities through Vision-Language Models: a Preliminary Study by Interpreting Visual Inputs for Improved Human-Robot Interaction | David Sobrín-Hidalgo et.al. | 2404.09705 | null |
2024-04-15 | Generative AI for Game Theory-based Mobile Networking | Long He et.al. | 2404.09699 | null |
2024-04-15 | Are Large Language Models Reliable Argument Quality Annotators? | Nailia Mirzakhmedova et.al. | 2404.09696 | null |
2024-04-15 | LoRAP: Transformer Sub-Layers Deserve Differentiated Structured Compression for Large Language Models | Guangyan Li et.al. | 2404.09695 | null |
2024-04-15 | Multi-News+: Cost-efficient Dataset Cleansing via LLM-based Data Annotation | Juhwan Choi et.al. | 2404.09682 | null |
2024-04-15 | Do LLMs Understand Visual Anomalies? Uncovering LLM Capabilities in Zero-shot Anomaly Detection | Jiaqi Zhu et.al. | 2404.09654 | null |
2024-04-15 | Bridging Vision and Language Spaces with Assignment Prediction | Jungin Park et.al. | 2404.09632 | link |
2024-04-12 | Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts | Övgü Özdemir et.al. | 2404.08589 | link |
2024-04-12 | Enhancing Autonomous Vehicle Training with Language Model Integration and Critical Scenario Generation | Hanlin Tian et.al. | 2404.08570 | null |
2024-04-12 | RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs | Shreyas Chaudhari et.al. | 2404.08555 | null |
2024-04-12 | Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward | Xuan Xie et.al. | 2404.08517 | null |
2024-04-12 | Efficient Interactive LLM Serving with Proxy Model-based Sequence Length Prediction | Haoran Qiu et.al. | 2404.08509 | link |
2024-04-12 | LaSagnA: Language-based Segmentation Assistant for Complex Queries | Cong Wei et.al. | 2404.08506 | link |
2024-04-12 | Strategic Interactions between Large Language Models-based Agents in Beauty Contests | Siting Lu et.al. | 2404.08492 | null |
2024-04-12 | Thematic Analysis with Large Language Models: does it work with languages other than English? A targeted test in Italian | Stefano De Paoli et.al. | 2404.08488 | null |
2024-04-12 | Comparing Apples to Oranges: LLM-powered Multimodal Intention Prediction in an Object Categorization Task | Hassan Ali et.al. | 2404.08424 | null |
2024-04-12 | AdapterSwap: Continuous Training of LLMs with Data Removal and Access-Control Guarantees | William Fleshman et.al. | 2404.08417 | null |
2024-04-11 | OpenBias: Open-set Bias Detection in Text-to-Image Generative Models | Moreno D'Incà et.al. | 2404.07990 | null |
2024-04-11 | View Selection for 3D Captioning via Diffusion Ranking | Tiange Luo et.al. | 2404.07984 | null |
2024-04-11 | Manipulating Large Language Models to Increase Product Visibility | Aounon Kumar et.al. | 2404.07981 | link |
2024-04-11 | LLoCO: Learning Long Contexts Offline | Sijun Tan et.al. | 2404.07979 | link |
2024-04-11 | Ferret-v2: An Improved Baseline for Referring and Grounding with Large Language Models | Haotian Zhang et.al. | 2404.07973 | null |
2024-04-11 | Leveraging Large Language Models (LLMs) to Support Collaborative Human-AI Online Risk Data Annotation | Jinkyung Park et.al. | 2404.07926 | null |
2024-04-11 | LaVy: Vietnamese Multimodal Large Language Model | Chi Tran et.al. | 2404.07922 | null |
2024-04-11 | AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs | Zeyi Liao et.al. | 2404.07921 | link |
2024-04-11 | DesignQA: A Multimodal Benchmark for Evaluating Large Language Models' Understanding of Engineering Documentation | Anna C. Doris et.al. | 2404.07917 | link |
2024-04-11 | High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya et.al. | 2404.07900 | null |
2024-04-10 | UMBRAE: Unified Multimodal Decoding of Brain Signals | Weihao Xia et.al. | 2404.07202 | null |
2024-04-10 | Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention | Tsendsuren Munkhdalai et.al. | 2404.07143 | null |
2024-04-11 | Semantically-correlated memories in a dense associative model | Thomas F Burns et.al. | 2404.07123 | null |
2024-04-10 | Continuous Language Model Interpolation for Dynamic and Controllable Text Generation | Sara Kangaslahti et.al. | 2404.07117 | null |
2024-04-11 | From Model-centered to Human-Centered: Revision Distance as a Metric for Text Evaluation in LLMs-based Applications | Yongqiang Ma et.al. | 2404.07108 | null |
2024-04-10 | Graph Chain-of-Thought: Augmenting Large Language Models by Reasoning on Graphs | Bowen Jin et.al. | 2404.07103 | null |
2024-04-10 | Dynamic Generation of Personalities with Large Language Models | Jianzhi Liu et.al. | 2404.07084 | null |
2024-04-10 | VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning | Alexandros Xenos et.al. | 2404.07078 | link |
2024-04-10 | Exploring Concept Depth: How Large Language Models Acquire Knowledge at Different Layers? | Mingyu Jin et.al. | 2404.07066 | link |
2024-04-10 | Groundedness in Retrieval-augmented Long-form Generation: An Empirical Study | Alessandro Stolfo et.al. | 2404.07060 | null |
2024-04-09 | Pitfalls of Conversational LLMs on News Debiasing | Ipek Baris Schlicht et.al. | 2404.06488 | null |
2024-04-09 | Ada-LEval: Evaluating long-context LLMs with length-adaptable benchmarks | Chonghua Wang et.al. | 2404.06480 | link |
2024-04-09 | Automated Federated Pipeline for Parameter-Efficient Fine-Tuning of Large Language Models | Zihan Fang et.al. | 2404.06448 | null |
2024-04-09 | Large Language Models to the Rescue: Deadlock Resolution in Multi-Robot Systems | Kunal Garg et.al. | 2404.06413 | null |
2024-04-09 | AgentQuest: A Modular Benchmark Framework to Measure Progress and Improve LLM Agents | Luca Gioacchini et.al. | 2404.06411 | link |
2024-04-09 | Take a Look at it! Rethinking How to Evaluate Language Model Jailbreak | Hongyu Cai et.al. | 2404.06407 | link |
2024-04-09 | Apprentices to Research Assistants: Advancing Research with Large Language Models | M. Namvarpour et.al. | 2404.06404 | null |
2024-04-09 | MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies | Shengding Hu et.al. | 2404.06395 | link |
2024-04-09 | MuPT: A Generative Symbolic Music Pretrained Transformer | Xingwei Qu et.al. | 2404.06393 | null |
2024-04-09 | Latent Distance Guided Alignment Training for Large Language Models | Haotian Luo et.al. | 2404.06390 | null |
2024-04-08 | MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding | Bo He et.al. | 2404.05726 | null |
2024-04-08 | Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs | Keen You et.al. | 2404.05719 | null |
2024-04-08 | Comprehensive Study on German Language Models for Clinical and Biomedical Text Understanding | Ahmad Idrissi-Yaghir et.al. | 2404.05694 | null |
2024-04-08 | Evaluating Mathematical Reasoning Beyond Accuracy | Shijie Xia et.al. | 2404.05692 | link |
2024-04-08 | Retrieval-Augmented Open-Vocabulary Object Detection | Jooyeon Kim et.al. | 2404.05687 | link |
2024-04-08 | MoMA: Multimodal LLM Adapter for Fast Personalized Image Generation | Kunpeng Song et.al. | 2404.05674 | null |
2024-04-08 | CoReS: Orchestrating the Dance of Reasoning and Segmentation | Xiaoyi Bao et.al. | 2404.05673 | null |
2024-04-08 | Fighting crime with Transformers: Empirical analysis of address parsing methods in payment data | Haitham Hammami et.al. | 2404.05632 | link |
2024-04-08 | LTNER: Large Language Model Tagging for Named Entity Recognition with Contextualized Entity Marking | Faren Yan et.al. | 2404.05624 | null |
2024-04-08 | MedExpQA: Multilingual Benchmarking of Large Language Models for Medical Question Answering | Iñigo Alonso et.al. | 2404.05590 | null |
2024-04-05 | Physical Property Understanding from Language-Embedded Feature Fields | Albert J. Zhai et.al. | 2404.04242 | null |
2024-04-05 | Cleared for Takeoff? Compositional & Conditional Reasoning may be the Achilles Heel to (Flight-Booking) Language Agents | Harsh Kohli et.al. | 2404.04237 | null |
2024-04-05 | Benchmarking and Improving Compositional Generalization of Multi-aspect Controllable Text Generation | Tianqi Zhong et.al. | 2404.04232 | link |
2024-04-05 | Social Skill Training with Large Language Models | Diyi Yang et.al. | 2404.04204 | null |
2024-04-05 | Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model | Xinrun Du et.al. | 2404.04167 | null |
2024-04-05 | Large language models as oracles for instantiating ontologies with domain-specific knowledge | Giovanni Ciatto et.al. | 2404.04108 | link |
2024-04-05 | Improving Factual Accuracy of Neural Table-to-Text Output by Addressing Input Problems in ToTTo | Barkavi Sundararajan et.al. | 2404.04103 | link |
2024-04-05 | Robust Preference Optimization with Provable Noise Tolerance for LLMs | Xize Liang et.al. | 2404.04102 | null |
2024-04-05 | Assessing the quality of information extraction | Filip Seitl et.al. | 2404.04068 | null |
2024-04-05 | CLUE: A Clinical Language Understanding Evaluation for LLMs | Amin Dada et.al. | 2404.04067 | null |
2024-04-04 | CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching | Dongzhi Jiang et.al. | 2404.03653 | link |
2024-04-04 | AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent | Hanyu Lai et.al. | 2404.03648 | link |
2024-04-04 | Capabilities of Large Language Models in Control Engineering: A Benchmark Study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra | Darioush Kevian et.al. | 2404.03647 | null |
2024-04-04 | Training LLMs over Neurally Compressed Text | Brian Lester et.al. | 2404.03626 | null |
2024-04-04 | Unveiling LLMs: The Evolution of Latent Representations in a Temporal Knowledge Graph | Marco Bronzini et.al. | 2404.03623 | null |
2024-04-04 | Visualization-of-Thought Elicits Spatial Reasoning in Large Language Models | Wenshan Wu et.al. | 2404.03622 | null |
2024-04-04 | DeViDe: Faceted medical knowledge for improved medical vision-language pre-training | Haozhe Luo et.al. | 2404.03618 | null |
2024-04-04 | Sailor: Open Language Models for South-East Asia | Longxu Dou et.al. | 2404.03608 | link |
2024-04-04 | Evaluating LLMs at Detecting Errors in LLM Responses | Ryo Kamoi et.al. | 2404.03602 | link |
2024-04-04 | Intent Detection and Entity Extraction from BioMedical Literature | Ankan Mullick et.al. | 2404.03598 | link |
2024-04-03 | ALOHa: A New Measure for Hallucination in Captioning Models | Suzanne Petryk et.al. | 2404.02904 | null |
2024-04-03 | MatAtlas: Text-driven Consistent Geometry Texturing and Material Assignment | Duygu Ceylan et.al. | 2404.02899 | null |
2024-04-03 | ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline | Yifan Xu et.al. | 2404.02893 | null |
2024-04-03 | Integrating Explanations in Learning LTL Specifications from Demonstrations | Ashutosh Gupta et.al. | 2404.02872 | null |
2024-04-03 | Toward Inference-optimal Mixture-of-Expert Large Language Models | Longfei Yun et.al. | 2404.02852 | null |
2024-04-03 | I-Design: Personalized LLM Interior Designer | Ata Çelen et.al. | 2404.02838 | null |
2024-04-03 | Cherry on Top: Parameter Heterogeneity and Quantization in Large Language Models | Wanyun Cui et.al. | 2404.02837 | null |
2024-04-03 | Retrieving Examples from Memory for Retrieval Augmented Neural Machine Translation: A Systematic Comparison | Maxime Bouthors et.al. | 2404.02835 | null |
2024-04-03 | Empowering Biomedical Discovery with AI Agents | Shanghua Gao et.al. | 2404.02831 | null |
2024-04-03 | BAdam: A Memory Efficient Full Parameter Training Method for Large Language Models | Qijun Luo et.al. | 2404.02827 | link |
2024-04-02 | Topic-based Watermarks for LLM-Generated Text | Alexander Nemecek et.al. | 2404.02138 | null |
2024-04-02 | Exploring Automated Distractor Generation for Math Multiple-choice Questions via Large Language Models | Wanyong Feng et.al. | 2404.02124 | null |
2024-04-02 | GINopic: Topic Modeling with Graph Isomorphism Network | Suman Adhya et.al. | 2404.02115 | link |
2024-04-02 | CLAPNQ: Cohesive Long-form Answers from Passages in Natural Questions for RAG systems | Sara Rosenthal et.al. | 2404.02103 | link |
2024-04-02 | Advancing LLM Reasoning Generalists with Preference Trees | Lifan Yuan et.al. | 2404.02078 | link |
2024-04-02 | Digital Forgetting in Large Language Models: A Survey of Unlearning Methods | Alberto Blanco-Justicia et.al. | 2404.02062 | null |
2024-04-02 | Long-context LLMs Struggle with Long In-context Learning | Tianle Li et.al. | 2404.02060 | link |
2024-04-02 | Deconstructing In-Context Learning: Understanding Prompts via Corruption | Namrata Shivagunde et.al. | 2404.02054 | link |
2024-04-02 | BERTopic-Driven Stock Market Predictions: Unraveling Sentiment Insights | Enmin Zhu et.al. | [2404.02053](http://arxiv |