The List of All Adversarial Example Papers appears to have been down for the past few days. Without that valuable resource, keeping up with the latest research in this field has become difficult, so I created this repository to aggregate and maintain the most recent papers in the area. It may not cover every paper, but I have tried to be thorough; if you find any we have missed, just drop me an email. We have included the data from the List of All Adversarial Example Papers through 2023-09-01. We also provide a list of papers about transfer-based attacks here.
-
Decoupling Augmentation Bias in Prompt Learning for Vision-Language Models
Gahyeon Kim, Sohee Kim, Seokju Lee
-
Whisper Leak: a side-channel attack on Large Language Models
Geoff McDonald, Jonathan Bar Or
-
From Insight to Exploit: Leveraging LLM Collaboration for Adaptive Adversarial Text Generation
Najrin Sultana, Md Rafi Ur Rashid, Kang Gu, Shagufta Mehnaz
-
Yize Liu, Yunyun Hou, Aina Sui
-
A Lightweight 3D-CNN for Event-Based Human Action Recognition with Privacy-Preserving Potential
Mehdi Sefidgar Dilmaghani, Francis Fowley, Peter Corcoran
-
Byzantine-Robust Federated Learning with Learnable Aggregation Weights
Javad Parsa, Amir Hossein Daghestani, André M. H. Teixeira, Mikael Johansson
-
Death by a Thousand Prompts: Open Model Vulnerability Analysis
Amy Chang, Nicholas Conley, Harish Santhanalakshmi Ganesan, Adam Swanda
-
Bayesian Advantage of Re-Identification Attack in the Shuffle Model
Pengcheng Su, Haibo Cheng, Ping Wang
-
Auditing M-LLMs for Privacy Risks: A Synthetic Benchmark and Evaluation Framework
Junhao Li, Jiahao Chen, Zhou Feng, Chunyi Zhou
-
When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning
Chenyu Zhang, Minsol Kim, Shohreh Ghorbani, Jingyao Wu, Rosalind Picard, Patricia Maes, Paul Pu Liang
-
Optimizing AI Agent Attacks With Synthetic Data
Chloe Loughridge, Paul Colognese, Avery Griffin, Tyler Tracy, Jon Kutasov, Joe Benton
-
AutoAdv: Automated Adversarial Prompting for Multi-Turn Jailbreaking of Large Language Models
Aashray Reddy, Andrew Zagula, Nicholas Saban
-
On The Dangers of Poisoned LLMs In Security Automation
Patrick Karlsen, Even Eilertsen
-
Ferhat Ozgur Catak, Jungwon Seo, Umit Cali
-
LiveSecBench: A Dynamic and Culturally-Relevant AI Safety Benchmark for LLMs in Chinese Context
Yudong Li, Zhongliang Yang, Kejiang Chen, Wenxuan Wang, Tianxin Zhang, Sifang Wan, Kecheng Wang, Haitian Li, Xu Wang, Lefan Cheng, Youdan Yang, Baocheng Chen, Ziyu Liu, Yufei Sun, Liyan Wu, Wenya Wen, Xingchi Gu, Peiru Yang
-
Hao Li, Daiwei Lu, Jesse d'Almeida, Dilara Isik, Ehsan Khodapanah Aghdam, Nick DiSanto, Ayberk Acar, Susheela Sharma, Jie Ying Wu, Robert J. Webster III, Ipek Oguz
-
Robust Face Liveness Detection for Biometric Authentication using Single Image
Poulami Raha, Yeongnam Chae
-
A Non-Adversarial Approach to Idempotent Generative Modelling
Mohammed Al-Jaff, Giovanni Luca Marchetti, Michael C Welle, Jens Lundell, Mats G. Gustafsson, Gustav Eje Henter, Hossein Azizpour, Danica Kragic
-
Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries
Lihan Xu, Yanjie Dong, Gang Wang, Runhao Zeng, Xiaoyi Fan, Xiping Hu
-
Enhancing Federated Learning Privacy with QUBO
Andras Ferenczi, Sutapa Samanta, Dagen Wang, Todd Hodges
-
Nicolas Riccieri Gardin Assumpcao, Leandro Villas
-
PrivGNN: High-Performance Secure Inference for Cryptographic Graph Neural Networks
Fuyi Wang, Zekai Chen, Mingyuan Fan, Jianying Zhou, Lei Pan, Leo Yu Zhang
-
An Automated Framework for Strategy Discovery, Retrieval, and Evolution in LLM Jailbreak Attacks
Xu Liu, Yan Chen, Kan Ling, Yichi Zhu, Hengrun Zhang, Guisheng Fan, Huiqun Yu
-
Verifying LLM Inference to Prevent Model Weight Exfiltration
Roy Rinberg, Adam Karvonen, Alex Hoover, Daniel Reuter, Keri Warr
-
Evaluating Control Protocols for Untrusted AI Agents
Jon Kutasov, Chloe Loughridge, Yuqi Sun, Henry Sleight, Buck Shlegeris, Tyler Tracy, Joe Benton
-
W.K.M Mithsara, Ning Yang, Ahmed Imteaj, Hussein Zangoti, Abdur R. Shahid
-
Online Learning to Rank under Corruption: A Robust Cascading Bandits Approach
Fatemeh Ghaffari, Siddarth Sitaraman, Xutong Liu, Xuchuang Wang, Mohammad Hajiesmaili
-
PrivyWave: Privacy-Aware Wireless Sensing of Heartbeat
Yixuan Gao, Tanvir Ahmed, Zekun Chang, Thijs Roumen, Rajalakshmi Nandakumar
-
Black-Box Membership Inference Attack for LVLMs via Prior Knowledge-Calibrated Memory Probing
Jinhua Yin, Peiru Yang, Chen Yang, Huili Wang, Zhiyang Hu, Shangguang Wang, Yongfeng Huang, Tao Qi
-
RobustFSM: Submodular Maximization in Federated Setting with Malicious Clients
Duc A. Tran, Dung Truong, Duy Le
-
CryptoMoE: Privacy-Preserving and Scalable Mixture of Experts Inference via Balanced Expert Routing
Yifan Zhou, Tianshi Xu, Jue Hong, Ye Wu, Meng Li
-
EraseFlow: Learning Concept Erasure Policies via GFlowNet-Driven Alignment
Abhiram Kusumba, Maitreya Patel, Kyle Min, Changhoon Kim, Chitta Baral, Yezhou Yang
-
Do Methods to Jailbreak and Defend LLMs Generalize Across Languages?
Berk Atil, Rebecca J. Passonneau, Fred Morstatter
-
Visual Backdoor Attacks on MLLM Embodied Decision Making via Contrastive Trigger Learning
Qiusi Zhan, Hyeonjeong Ha, Rui Yang, Sirui Xu, Hanyang Chen, Liang-Yan Gui, Yu-Xiong Wang, Huan Zhang, Heng Ji, Daniel Kang
-
Consistency Training Helps Stop Sycophancy and Jailbreaks
Alex Irpan, Alexander Matt Turner, Mark Kurzeja, David K. Elson, Rohin Shah
-
ZEBRA: Towards Zero-Shot Cross-Subject Generalization for Universal Brain Visual Decoding
Haonan Wang, Jingyu Lu, Hongrui Li, Xiaomeng Li
-
Adaptive Defense against Harmful Fine-Tuning for Large Language Models via Bayesian Data Scheduler
Zixuan Hu, Li Shen, Zhenyi Wang, Yongxian Wei, Dacheng Tao
-
Sparse Model Inversion: Efficient Inversion of Vision Transformers for Data-Free Applications
Zixuan Hu, Yongxian Wei, Li Shen, Zhenyi Wang, Lei Li, Chun Yuan, Dacheng Tao
-
Contrastive Knowledge Transfer and Robust Optimization for Secure Alignment of Large Language Models
Jiasen Zheng, Huajun Zhang, Xu Yan, Ran Hao, Chong Peng
-
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang, Song Fang
-
Alik Pramanick, Mayank Bansal, Utkarsh Srivastava, Suklav Ghosh, Arijit Sur
-
C-LEAD: Contrastive Learning for Enhanced Adversarial Defense
Suklav Ghosh, Sonal Kumar, Arijit Sur
-
Rethinking Robust Adversarial Concept Erasure in Diffusion Models
Qinghong Yin, Yu Tian, Yue Zhang
-
A Hybrid Deep Learning and Forensic Approach for Robust Deepfake Detection
Sales Aribe Jr
-
Samarup Bhattacharya, Anubhab Bhattacharya, Abir Chakraborty
-
Chenghao Du, Quanfeng Huang, Tingxuan Tang, Zihao Wang, Yue Xiao
-
Prevalence of Security and Privacy Risk-Inducing Usage of AI-based Conversational Agents
Kathrin Grosse, Nico Ebert
-
Jianli Zhao, Tingchen Fu, Rylan Schaeffer, Mrinank Sharma, Fazl Barez
-
The Oversight Game: Learning to Cooperatively Balance an AI Agent's Safety and Autonomy
William Overman, Mohsen Bayati
-
SIRAJ: Diverse and Efficient Red-Teaming for LLM Agents via Distilled Structured Reasoning
Kaiwen Zhou, Ahmed Elgohary, A S M Iftekhar, Amin Saied
-
Do Students Debias Like Teachers? On the Distillability of Bias Mitigation Methods
Jiali Cheng, Chirag Agarwal, Hadi Amiri
-
Security Risk of Misalignment between Text and Image in Multi-modal Model
Xiaosen Wang, Zhijin Ge, Shaokang Wang
-
SSCL-BW: Sample-Specific Clean-Label Backdoor Watermarking for Dataset Ownership Verification
Yingjia Wang, Ting Qiao, Xing Liu, Chongzuo Li, Sixing Wu, Jianbin Li
-
Robust Graph Condensation via Classification Complexity Mitigation
Jiayi Luo, Qingyun Sun, Beining Yang, Haonan Yuan, Xingcheng Fu, Yanbiao Ma, Jianxin Li, Philip S. Yu
-
Model Inversion with Layer-Specific Modeling and Alignment for Data-Free Continual Learning
Ruilin Tong, Haodong Lu, Yuhang Liu, Dong Gong
-
Agent Skills Enable a New Class of Realistic and Trivially Simple Prompt Injections
David Schmotz, Sahar Abdelnabi, Maksym Andriushchenko
-
On Measuring Localization of Shortcuts in Deep Networks
Nikita Tsoy, Nikola Konstantinov
-
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin, Yuxin Cao, Junjie Su, Minhui Xue, Jie Hao, Ke Xu, Jin Song Dong, Derui Wang
-
PEEL: A Poisoning-Exposing Encoding Theoretical Framework for Local Differential Privacy
Lisha Shuai, Jiuling Dong, Nan Zhang, Shaofeng Tan, Haokun Zhang, Zilong Song, Gaoya Dong, Xiaolong Yang
-
A DRL-Empowered Multi-Level Jamming Approach for Secure Semantic Communication
Weixuan Chen, Qianqian Yang
-
Accurate Target Privacy Preserving Federated Learning Balancing Fairness and Utility
Kangkang Sun, Jun Wu, Minyi Guo, Jianhua Li, Jianwei Huang
-
Broken-Token: Filtering Obfuscated Prompts by Counting Characters-Per-Token
Shaked Zychlinski, Yuval Kainan
-
Fine-Grained Iterative Adversarial Attacks with Limited Computation Budget
Zhichao Hou, Weizhi Gao, Xiaorui Liu
-
Unvalidated Trust: Cross-Stage Vulnerabilities in Large Language Model Architectures
Dominik Schwarz
-
Semantically-Aware LLM Agent to Enhance Privacy in Conversational AI Services
Jayden Serenari, Stephen Lee
-
PF-DAformer: Proximal Femur Segmentation via Domain Adaptive Transformer for Dual-Center QCT
Rochak Dhakal, Chen Zhao, Zixin Shi, Joyce H. Keyak, Tadashi S. Kaneko, Kuan-Jui Su, Hui Shen, Hong-Wen Deng, Weihua Zhou
-
Agentic Moderation: Multi-Agent Design for Safer Vision-Language Models
Juan Ren, Mark Dras, Usman Naseem
-
Lipschitz-aware Linearity Grafting for Certified Robustness
Yongjin Han, Suhyun Kim
-
Hasan Akgul, Mari Eplik, Javier Rojas, Aina Binti Abdullah, Pieter van der Merwe
-
DeepShield: Fortifying Deepfake Video Detection with Local and Global Forgery Analysis
Yinqi Cai, Jichang Li, Zhaolun Li, Weikai Chen, Rushi Lan, Xi Xie, Xiaonan Luo, Guanbin Li
-
A Unified Bilevel Model for Adversarial Learning and A Case Study
Yutong Zheng, Qingna Li
-
On the Stability of Neural Networks in Deep Learning
Blaise Delattre
-
Spectral Perturbation Bounds for Low-Rank Approximation with Applications to Privacy
Phuc Tran, Nisheeth K. Vishnoi, Van H. Vu
-
Model Inversion Attacks Meet Cryptographic Fuzzy Extractors
Mallika Prabhakar, Louise Xu, Prateek Saxena
-
NetEcho: From Real-World Streaming Side-Channels to Full LLM Conversation Recovery
Zheng Zhang, Guanlong Wu, Sen Deng, Shuai Wang, Yinqian Zhang
-
Emily Herron, Junqi Yin, Feiyi Wang
-
FakeZero: Real-Time, Privacy-Preserving Misinformation Detection for Facebook and X
Soufiane Essahli, Oussama Sarsar, Imane Fouad, Anas Motii, Ahmed Bentajer
-
Robust GNN Watermarking via Implicit Perception of Topological Invariants
Jipeng Li, Yanning Shen
-
Adversarial Pre-Padding: Generating Evasive Network Traffic Against Transformer-Based Classifiers
Quanliang Jing, Xinxin Fan, Yanyan Liu, Jingping Bi
-
Simon Yu, Peilin Yu, Hongbo Zheng, Huajie Shao, Han Zhao, Lui Sha
-
Layer of Truth: Probing Belief Shifts under Continual Pre-Training Poisoning
Svetlana Churina, Niranjan Chebrolu, Kokil Jaidka
-
Guangzhi Su, Shuchang Huang, Yutong Ke, Zhuohang Liu, Long Qian, Kaizhu Huang
-
OS-Sentinel: Towards Safety-Enhanced Mobile GUI Agents via Hybrid Validation in Realistic Workflows
Qiushi Sun, Mukai Li, Zhoumianze Liu, Zhihui Xie, Fangzhi Xu, Zhangyue Yin, Kanzhi Cheng, Zehao Li, Zichen Ding, Qi Liu, Zhiyong Wu, Zhuosheng Zhang, Ben Kao, Lingpeng Kong
-
Causal-Aware Generative Adversarial Networks with Reinforcement Learning
Tu Anh Hoang Nguyen, Dang Nguyen, Tri-Nhan Vo, Thuc Duy Le, Sunil Gupta
-
The Cost of Robustness: Tighter Bounds on Parameter Complexity for Robust Memorization in ReLU Nets
Yujun Kim, Chaewon Moon, Chulhee Yun
-
Viktoriia Zinkovich, Anton Antonov, Andrei Spiridonov, Denis Shepelev, Andrey Moskalenko, Daria Pugacheva, Elena Tutubalina, Andrey Kuznetsov, Vlad Shakhuro
-
Relative Scaling Laws for LLMs
William Held, David Hall, Percy Liang, Diyi Yang
-
SPICE: Self-Play In Corpus Environments Improves Reasoning
Bo Liu, Chuanyang Jin, Seungone Kim, Weizhe Yuan, Wenting Zhao, Ilia Kulikov, Xian Li, Sainbayar Sukhbaatar, Jack Lanchantin, Jason Weston
-
Heethanjan Kanagalingam, Thenukan Pathmanathan, Mokeeshan Vathanakumar, Tharmakulasingam Mukunthan
-
AutoPrompt: Automated Red-Teaming of Text-to-Image Models via LLM-Driven Adversarial Prompts
Yufan Liu, Wanqian Zhang, Huashan Chen, Lin Wang, Xiaojun Jia, Zheng Lin, Weiping Wang
-
Enhancing CLIP Robustness via Cross-Modality Alignment
Xingyu Zhu, Beier Zhu, Shuo Wang, Kesen Zhao, Hanwang Zhang
-
Vanish into Thin Air: Cross-prompt Universal Adversarial Attacks for SAM2
Ziqi Zhou, Yifan Hu, Yufei Song, Zijing Li, Shengshan Hu, Leo Yu Zhang, Dezhong Yao, Long Zheng, Hai Jin
-
A Dual-Branch CNN for Robust Detection of AI-Generated Facial Forgeries
Xin Zhang, Yuqi Song, Fei Zuo
-
A Pragmatic Way to Measure Chain-of-Thought Monitorability
Scott Emmons, Roland S. Zimmermann, David K. Elson, Rohin Shah
-
Mitigating Negative Transfer via Reducing Environmental Disagreement
Hui Sun, Zheng Xie, Hao-Yuan He, Ming Li
-
SPEAR++: Scaling Gradient Inversion via Sparsely-Used Dictionary Learning
Alexander Bakarsky, Dimitar I. Dimitrov, Maximilian Baader, Martin Vechev
-
PRIVET: Privacy Metric Based on Extreme Value Theory
Antoine Szatkownik, Aurélien Decelle, Beatriz Seoane, Nicolas Bereux, Léo Planche, Guillaume Charpiat, Burak Yelmen, Flora Jay, Cyril Furtlehner
-
A Comprehensive Evaluation Framework for Synthetic Trip Data Generation in Public Transport
Yuanyuan Wu, Zhenlin Qin, Zhenliang Ma
-
A Novel XAI-Enhanced Quantum Adversarial Networks for Velocity Dispersion Modeling in MaNGA Galaxies
Sathwik Narkedimilli, N V Saran Kumar, Aswath Babu H, Manjunath K Vanahalli, Manish M, Vinija Jain, Aman Chadha
-
Self-Concordant Perturbations for Linear Bandits
Lucas Lévy, Jean-Lou Valeau, Arya Akhavan, Patrick Rebeschini
-
Vishal Halder, Alexandre Reiffers-Masson, Abdeldjalil Aïssa-El-Bey, Gugan Thoppe
-
Attack on a PUF-based Secure Binary Neural Network
Bijeet Basak, Nupur Patil, Kurian Polachan, Srinivas Vivek
-
Cybersecurity AI Benchmark (CAIBench): A Meta-Benchmark for Evaluating Cybersecurity AI Agents
María Sanz-Gómez, Víctor Mayoral-Vilches, Francesco Balassone, Luis Javier Navarrete-Lozano, Cristóbal R. J. Veas Chavez, Maite del Mundo de Torres
-
Yan Meng, Jiachun Li, Matthew Pillari, Arjun Deopujari, Liam Brennan, Hafsah Shamsie, Haojin Zhu, Yuan Tian
-
Hammering the Diagnosis: Rowhammer-Induced Stealthy Trojan Attacks on ViT-Based Medical Imaging
Banafsheh Saber Latibari, Najmeh Nazari, Hossein Sayadi, Houman Homayoun, Abhijit Mahalanobis
-
Najmeh Nazari, Banafsheh Saber Latibari, Elahe Hosseini, Fatemeh Movafagh, Chongzhou Fang, Hosein Mohammadi Makrani, Kevin Immanuel Gubbi, Abhijit Mahalanobis, Setareh Rafatirad, Hossein Sayadi, Houman Homayoun
-
Learning to Attack: Uncovering Privacy Risks in Sequential Data Releases
Ziyao Cui, Minxing Zhang, Jian Pei
-
scMRDR: A scalable and flexible framework for unpaired single-cell multi-omics data integration
Jianle Sun, Chaoqi Liang, Ran Wei, Peng Zheng, Lei Bai, Wanli Ouyang, Hongliang Yan, Peng Ye
-
Secure Retrieval-Augmented Generation against Poisoning Attacks
Zirui Cheng, Jikai Sun, Anjun Gao, Yueyang Quan, Zhuqing Liu, Xiaohua Hu, Minghong Fang
-
Multi-Environment POMDPs: Discrete Model Uncertainty Under Partial Observability
Eline M. Bovy, Caleb Probine, Marnix Suilen, Ufuk Topcu, Nils Jansen
-
Agentic AI Security: Threats, Defenses, Evaluation, and Open Challenges
Shrestha Datta, Shahriar Kabir Nahin, Anshuman Chhabra, Prasant Mohapatra
-
MCPGuard: Automatically Detecting Vulnerabilities in MCP Servers
Bin Wang, Zexin Liu, Hao Yu, Ao Yang, Yenan Huang, Jing Guo, Huangsheng Cheng, Hui Li, Huiyu Wu
-
QueryIPI: Query-agnostic Indirect Prompt Injection on Coding Agents
Yuchong Xie, Zesen Liu, Mingyu Luo, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
-
Aryan Mathur, Asaduddin Ahmed, Pushti Amit Vasoya, Simeon Kandan Sonar, Yasir Z, Madesh Kuppusamy
-
Differential Privacy: Gradient Leakage Attacks in Federated Learning Environments
Miguel Fernandez-de-Retana, Unai Zulaika, Rubén Sánchez-Corcuera, Aitor Almeida
-
Cross-Lingual Summarization as a Black-Box Watermark Removal Attack
Gokul Ganesan
-
Hao Liang, Haifeng Wen, Kaishun Wu, Hong Xing
-
Hanyu Zhu, Lance Fiondella, Jiawei Yuan, Kai Zeng, Long Jiao
-
Yingzhi Mao, Chunkang Zhang, Junxiang Wang, Xinyan Guan, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun
-
Reducing the Probability of Undesirable Outputs in Language Models Using Probabilistic Inference
Stephen Zhao, Aidan Li, Rob Brekelmans, Roger Grosse
-
SafetyPairs: Isolating Safety Critical Image Features with Counterfactual Image Generation
Alec Helbling, Shruti Palaskar, Kundan Krishna, Polo Chau, Leon Gatys, Joseph Yitan Cheng
-
DictPFL: Efficient and Private Federated Learning on Encrypted Gradients
Jiaqi Xue, Mayank Kumar, Yuzhang Shang, Shangqian Gao, Rui Ning, Mengxin Zheng, Xiaoqian Jiang, Qian Lou
-
How Hard is it to Confuse a World Model?
Waris Radji, Odalric-Ambrym Maillard
-
PINN Balls: Scaling Second-Order Methods for PINNs with Domain Decomposition and Adaptive Sampling
Andrea Bonfanti, Ismael Medina, Roman List, Björn Staeves, Roberto Santana, Marco Ellero
-
Probe-based Fine-tuning for Reducing Toxicity
Jan Wehner, Mario Fritz
-
FrameShield: Adversarially Robust Video Anomaly Detection
Mojtaba Nafez, Mobina Poulaei, Nikan Vasei, Bardia Soltani Moakhar, Mohammad Sabokrou, MohammadHossein Rohban
-
Soft Instruction De-escalation Defense
Nils Philipp Walter, Chawin Sitawarin, Jamie Hayes, David Stutz, Ilia Shumailov
-
Doubly-Regressing Approach for Subgroup Fairness
Kyungseon Lee, Kunwoong Kim, Jihu Lee, Dongyoon Yang, Yongdai Kim
-
Jie Zhang, Xiaohong Li, Mengke Zhang, Ruitao Feng, Shanshan Xu, Zhe Hou, Guangdong Bai
-
Adjacent Words, Divergent Intents: Jailbreaking Large Language Models via Task Concurrency
Yukun Jiang, Mingjie Li, Michael Backes, Yang Zhang
-
The Trojan Example: Jailbreaking LLMs through Template Filling and Unsafety Reasoning
Mingrui Liu, Sixiao Zhang, Cheng Long, Kwok Yan Lam
-
Enhanced MLLM Black-Box Jailbreaking Attacks and Defenses
Xingwei Zhong, Kar Wai Fok, Vrizlynn L.L. Thing
-
Spatio-Temporal Attention Network for Epileptic Seizure Prediction
Zan Li, Kyongmin Yeo, Wesley Gifford, Lara Marcuse, Madeline Fields, Bülent Yener
-
SAID: Empowering Large Language Models with Self-Activating Internal Defense
Yulong Chen, Yadong Liu, Jiawen Zhang, Mu Li, Chao Huang, Jie Wen
-
Wu Yichao, Wang Yirui, Ding Panpan, Wang Hailong, Zhu Bingqian, Liu Chun
-
Chiyu Chen, Xinhao Song, Yunkai Chai, Yang Yao, Haodong Zhao, Lijun Li, Jie Li, Yan Teng, Gongshen Liu, Yingchun Wang
-
Transferable Black-Box One-Shot Forging of Watermarks via Image Preference Models
Tomáš Souček, Sylvestre-Alvise Rebuffi, Pierre Fernandez, Nikola Jovanović, Hady Elsahar, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
-
Steering Evaluation-Aware Language Models To Act Like They Are Deployed
Tim Tian Hua, Andrew Qin, Samuel Marks, Neel Nanda
-
AdaDoS: Adaptive DoS Attack via Deep Adversarial Reinforcement Learning in SDN
Wei Shao, Yuhao Wang, Rongguang He, Muhammad Ejaz Ahmed, Seyit Camtepe
-
RAGRank: Using PageRank to Counter Poisoning in CTI LLM Pipelines
Austin Jia, Avaneesh Ramesh, Zain Shamsi, Daniel Zhang, Alex Liu
-
Francesca Padovani, Bastian Bunzeck, Manar Ali, Omar Momen, Arianna Bisazza, Hendrik Buschmeier, Sina Zarrieß
-
BadGraph: A Backdoor Attack Against Latent Diffusion Model for Text-Guided Graph Generation
Liang Ye, Shengqin Chen, Jiazhu Dai
-
Causal Debiasing for Visual Commonsense Reasoning
Jiayi Zou, Gengyun Jia, Bing-Kun Bao
-
Dino-Diffusion Modular Designs Bridge the Cross-Domain Gap in Autonomous Parking
Zixuan Wu, Hengyuan Zhang, Ting-Hsuan Chen, Yuliang Guo, David Paz, Xinyu Huang, Liu Ren
-
MEIcoder: Decoding Visual Stimuli from Neural Activity by Leveraging Most Exciting Inputs
Jan Sobotka, Luca Baroni, Ján Antolík
-
H-SPLID: HSIC-based Saliency Preserving Latent Information Decomposition
Lukas Miklautz, Chengzhi Shi, Andrii Shkabrii, Theodoros Thirimachos Davarakis, Prudence Lam, Claudia Plant, Jennifer Dy, Stratis Ioannidis
-
Adversary-Aware Private Inference over Wireless Channels
Mohamed Seif, Malcolm Egan, Andrea J. Goldsmith, H. Vincent Poor
-
Divyanshu Kumar, Shreyas Jena, Nitin Aravind Birur, Tanay Baswa, Sahil Agarwal, Prashanth Harshangi
-
HHEML: Hybrid Homomorphic Encryption for Privacy-Preserving Machine Learning on Edge
Yu Hin Chan, Hao Yang, Shiyu Shen, Xingyu Fan, Shengzhe Lyu, Patrick S. Y. Hung, Ray C. C. Cheung
-
NeuPerm: Disrupting Malware Hidden in Neural Network Parameters by Leveraging Permutation Symmetry
Daniel Gilkarov, Ran Dubin
-
An Experimental Study of Trojan Vulnerabilities in UAV Autonomous Landing
Reza Ahmari, Ahmad Mohammadi, Vahid Hemmati, Mohammed Mynuddin, Mahmoud Nabil Mahmoud, Parham Kebria, Abdollah Homaifar, Mehrdad Saif
-
Fair Representation Learning with Controllable High Confidence Guarantees via Adversarial Inference
Yuhong Luo, Austin Hoag, Xintong Wang, Philip S. Thomas, Przemyslaw A. Grabowicz
-
Kernel Learning with Adversarial Features: Numerical Efficiency and Adaptive Regularization
Antônio H. Ribeiro, David Vävinggren, Dave Zachariah, Thomas B. Schön, Francis Bach
-
Can Current Detectors Catch Face-to-Voice Deepfake Attacks?
Nguyen Linh Bao Nguyen, Alsharif Abuadbba, Kristen Moore, Tingming Wu
-
A new measure for dynamic leakage based on quantitative information flow
Luigi D. C. Soares, Mário S. Alvim, Natasha Fernandes
-
A Reinforcement Learning Framework for Robust and Secure LLM Watermarking
Li An, Yujian Liu, Yepeng Liu, Yuheng Bu, Yang Zhang, Shiyu Chang
-
Adversarially-Aware Architecture Design for Robust Medical AI Systems
Alyssa Gerhart, Balaji Iyangar
-
LAPRAD: LLM-Assisted PRotocol Attack Discovery
R. Can Aygun, Yehuda Afek, Anat Bremler-Barr, Leonard Kleinrock
-
Collaborative penetration testing suite for emerging generative AI algorithms
Petar Radanliev
-
A New Type of Adversarial Examples
Xingyang Nie, Guojie Xiao, Su Pan, Biao Wang, Huilin Ge, Tao Fang
-
Monitoring LLM-based Multi-Agent Systems Against Corruptions via Node Evaluation
Chengcan Wu, Zhixin Zhang, Mingqian Xu, Zeming Wei, Meng Sun
-
Style Attack Disguise: When Fonts Become a Camouflage for Adversarial Intent
Yangshijie Zhang, Xinda Wang, Jialin Liu, Wenqiang Wang, Zhicong Ma, Xingxing Jia
-
Machine Text Detectors are Membership Inference Attacks
Ryuto Koike, Liam Dugan, Masahiro Kaneko, Chris Callison-Burch, Naoaki Okazaki
-
Hubble: a Model Suite to Advance the Study of LLM Memorization
Johnny Tian-Zheng Wei, Ameya Godbole, Mohammad Aflah Khan, Ryan Wang, Xiaoyuan Zhu, James Flemings, Nitya Kashyap, Krishna P. Gummadi, Willie Neiswanger, Robin Jia
-
OpenGuardrails: An Open-Source Context-Aware AI Guardrails Platform
Thomas Wang, Haowen Li
-
LLM Unlearning with LLM Beliefs
Kemou Li, Qizhou Wang, Yue Wang, Fengpeng Li, Jun Liu, Bo Han, Jiantao Zhou
-
Blackbox Model Provenance via Palimpsestic Membership Inference
Rohith Kuditipudi, Jing Huang, Sally Zhu, Diyi Yang, Christopher Potts, Percy Liang
-
Woo Jae Kim, Kyu Beom Han, Yoonki Cho, Youngju Na, Junsik Jung, Sooel Son, Sung-eui Yoon
-
Can You Trust What You See? Alpha Channel No-Box Attacks on Video Object Detection
Ariana Yi, Ce Zhou, Liyang Xiao, Qiben Yan
-
Subliminal Corruption: Mechanisms, Thresholds, and Interpretability
Reya Vir, Sarvesh Bhatnagar
-
ConvXformer: Differentially Private Hybrid ConvNeXt-Transformer for Inertial Navigation
Omer Tariq, Muhammad Bilal, Muneeb Ul Hassan, Dongsoo Han, Jon Crowcroft
-
Revisiting the Relation Between Robustness and Universality
M. Klabunde, L. Caspari, F. Lemmerich
-
Euodia Dodd, Nataša Krčo, Igor Shilov, Yves-Alexandre de Montjoye
-
HAMLOCK: HArdware-Model LOgically Combined attacK
Sanskar Amgain, Daniel Lobo, Atri Chatterjee, Swarup Bhunia, Fnu Suya
-
Exploring the Effect of DNN Depth on Adversarial Attacks in Network Intrusion Detection Systems
Mohamed ElShehaby, Ashraf Matrawy
-
Defending Against Prompt Injection with DataFilter
Yizhu Wang, Sizhe Chen, Raghad Alkhudair, Basel Alomair, David Wagner
-
AegisMCP: Online Graph Intrusion Detection for Tool-Augmented LLMs on Edge Devices
Zhonghao Zhan, Amir Al Sadi, Krinos Li, Hamed Haddadi
-
Privacy-Preserving Spiking Neural Networks: A Deep Dive into Encryption Parameter Optimisation
Mahitha Pulivathi
-
CircuitGuard: Mitigating LLM Memorization in RTL Code Generation Against IP Leakage
Nowfel Mashnoor, Mohammad Akyash, Hadi Kamali, Kimia Azar
-
LLMs can hide text in other text of the same length.ipynb
Antonio Norelli, Michael Bronstein
-
Ask What Your Country Can Do For You: Towards a Public Red Teaming Model
Wm. Matthew Kennedy, Cigdem Patlak, Jayraj Dave, Blake Chambers, Aayush Dhanotiya, Darshini Ramiah, Reva Schwartz, Jack Hagen, Akash Kundu, Mouni Pendharkar, Liam Baisley, Theodora Skeadas, Rumman Chowdhury
-
Xiang Li, Buxin Su, Chendi Wang, Qi Long, Weijie J. Su
-
Towards Strong Certified Defense with Universal Asymmetric Randomization
Hanbin Hong, Ashish Kundu, Ali Payani, Binghui Wang, Yuan Hong
-
Tushar Nayan, Ziqi Zhang, Ruimin Sun
-
Jia Deng, Jin Li, Zhenhua Zhao, Shaowei Wang
-
Rectifying Shortcut Behaviors in Preference-based Reward Learning
Wenqian Ye, Guangtao Zheng, Aidong Zhang
-
DuoLens: A Framework for Robust Detection of Machine-Generated Multilingual Text and Code
Shriyansh Agrawal, Aidan Lau, Sanyam Shah, Ahan M R, Kevin Zhu, Sunishchal Dev, Vasu Sharma
-
FeatureFool: Zero-Query Fooling of Video Models via Feature Map
Duoxun Tang, Xi Xiao, Guangwu Hu, Kangkang Sun, Xiao Yang, Dongyang Chen, Qing Li, Yongjie Yin, Jiyao Wang
-
Yifei Sun
-
Kuai Yu, Xiaoyu Wu, Peishen Yan, Qingqian Yang, Linshan Jiang, Hao Wang, Yang Hua, Tao Song, Haibing Guan
-
Thomas Hofweber, Jefrey Bergl, Ian Reyes, Amir Sadovnik
-
Can Reasoning Models Obfuscate Reasoning? Stress-Testing Chain-of-Thought Monitorability
Artur Zolkowski, Wen Xing, David Lindner, Florian Tramèr, Erik Jenner
-
Extracting alignment data in open models
Federico Barbero, Xiangming Gu, Christopher A. Choquette-Choo, Chawin Sitawarin, Matthew Jagielski, Itay Yona, Petar Veličković, Ilia Shumailov, Jamie Hayes
-
PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar
-
CourtGuard: A Local, Multiagent Prompt Injection Classifier
Isaac Wu, Michael Maslowski
-
GUIDE: Enhancing Gradient Inversion Attacks in Federated Learning with Denoising Models
Vincenzo Carletti, Pasquale Foggia, Carlo Mazzocca, Giuseppe Parrella, Mario Vento
-
SafeSearch: Do Not Trade Safety for Utility in LLM Search Agents
Qiusi Zhan, Angeline Budiman-Chan, Abdelrahman Zayed, Xingzhi Guo, Daniel Kang, Joo-Kyung Kim
-
Fit for Purpose? Deepfake Detection in the Real World
Guangyu Lin, Li Lin, Christina P. Walker, Daniel S. Schiff, Shu Hu
-
DRO-InstructZero: Distributionally Robust Prompt Optimization for Large Language Models
Yangyang Li
-
Ting Qiao, Xing Liu, Wenke Huang, Jianbin Li, Zhaoxin Fan, Yiming Li
-
Learning to Detect Unknown Jailbreak Attacks in Large Vision-Language Models
Shuang Liang, Zhihao Xu, Jialing Tao, Hui Xue, Xiting Wang
-
SoK: Taxonomy and Evaluation of Prompt Security in Large Language Models
Hanbin Hong, Shuya Feng, Nima Naderloui, Shenao Yan, Jingyu Zhang, Biying Liu, Ali Arastehfard, Heqing Huang, Yuan Hong
-
DeceptionBench: A Comprehensive Benchmark for AI Deception Behaviors in Real-world Scenarios
Yao Huang, Yitong Sun, Yichi Zhang, Ruochen Zhang, Yinpeng Dong, Xingxing Wei
-
Language Models are Injective and Hence Invertible
Giorgos Nikolaou, Tommaso Mencattini, Donato Crisostomi, Andrea Santilli, Yannis Panagakis, Emanuele Rodolà
-
Unmasking Facial DeepFakes: A Robust Multiview Detection Framework for Natural Images
Sami Belguesmia, Mohand Saïd Allili, Assia Hamadene
-
Stress-Aware Learning under KL Drift via Trust-Decayed Mirror Descent
Gabriel Nixon Raj
-
Yuyuan Feng, Bin Ma, Enyan Dai
-
Adversary-Free Counterfactual Prediction via Information-Regularized Representations
Shiqin Tang, Rong Feng, Shuxin Zhuang, Hongzong Li, Youzhi Zhang
-
Constrained Adversarial Perturbation
Virendra Nishad, Bhaskar Mukhoty, Hilal AlQuabeh, Sandeep K. Shukla, Sayak Ray Chowdhury
-
Blackwell's Approachability for Sequential Conformal Inference
Guillaume Principato, Gilles Stoltz
-
HarmRLVR: Weaponizing Verifiable Rewards for Harmful LLM Alignment
Yuexiao Liu, Lijun Li, Xingjun Wang, Jing Shao
-
Towards Proactive Defense Against Cyber Cognitive Attacks
Bonnie Rushing, Mac-Rufus Umeokolo, Shouhuai Xu
-
Bridging Symmetry and Robustness: On the Role of Equivariance in Enhancing Adversarial Robustness
Longwei Wang, Ifrat Ikhtear Uddin, KC Santosh, Chaowei Zhang, Xiao Qin, Yang Zhou
-
Echoes of Human Malice in Agents: Benchmarking LLMs for Multi-Turn Online Harassment Attacks
Trilok Padhi, Pinxian Lu, Abdulkadir Erol, Tanmay Sutar, Gauri Sharma, Mina Sonmez, Munmun De Choudhury, Ugur Kursuncu
-
Bingjie Zhang, Yibo Yang, Renzhe, Dandan Guo, Jindong Gu, Philip Torr, Bernard Ghanem
-
Terrarium: Revisiting the Blackboard for Multi-Agent Safety, Privacy, and Security Studies
Mason Nakamura, Abhinav Kumar, Saaduddin Mahmud, Sahar Abdelnabi, Shlomo Zilberstein, Eugene Bagdasarian
-
Jingwen Gu, Yiting He, Zhishuai Liu, Pan Xu
-
TED++: Submanifold-Aware Backdoor Detection via Layerwise Tubular-Neighbourhood Screening
Nam Le, Leo Yu Zhang, Kewen Liao, Shirui Pan, Wei Luo
-
BinCtx: Multi-Modal Representation Learning for Robust Android App Behavior Detection
Zichen Liu, Shao Yang, Xusheng Xiao
-
Are My Optimized Prompts Compromised? Exploring Vulnerabilities of LLM-based Optimizers
Andrew Zhao, Reshmi Ghosh, Vitor Carvalho, Emily Lawton, Keegan Hines, Gao Huang, Jack W. Stokes
-
Stealthy Dual-Trigger Backdoors: Attacking Prompt Tuning in LM-Empowered Graph Foundation Models
Xiaoyu Xue, Yuni Lai, Chenxi Huang, Yulin Zhu, Gaolei Li, Xiaoge Zhang, Kai Zhou
-
Galaxy Morphology Classification with Counterfactual Explanation
Zhuo Cao, Lena Krieger, Hanno Scharr, Ira Assent
-
On the Ability of LLMs to Handle Character-Level Perturbations: How Well and How?
Anyun Zhuo, Xuefei Ning, Ningyuan Li, Yu Wang, Pinyan Lu
-
A Multi-domain Image Translative Diffusion StyleGAN for Iris Presentation Attack Detection
Shivangi Yadav, Arun Ross
-
Structured Universal Adversarial Attacks on Object Detection for Video Sequences
Sven Jacob, Weijia Shao, Gjergji Kasneci
-
Keima Abe, Hayato Muraki, Shuhei Tomoshige, Kenichi Oishi, Hitoshi Iyatomi
-
SteeringTTA: Guiding Diffusion Trajectories for Robust Test-Time-Adaptation
Jihyun Yu, Yoojin Oh, Wonho Bae, Mingyu Kim, Junhyug Noh
-
Backdoor Unlearning by Linear Task Decomposition
Amel Abdelraheem, Alessandro Favero, Gerome Bovet, Pascal Frossard
-
When Flatness Does (Not) Guarantee Adversarial Robustness
Nils Philipp Walter, Linara Adilova, Jilles Vreeken, Michael Kamp
-
Guillaume Rongier, Luk Peeters
-
Redundancy-Aware Test-Time Graph Out-of-Distribution Detection
Yue Hou, He Zhu, Ruomei Liu, Yingke Su, Junran Wu, Ke Xu
-
An Information Asymmetry Game for Trigger-based DNN Model Watermarking
Chaoyue Huang, Gejian Zhao, Hanzhou Wu, Zhihua Xia, Asad Malik
-
Fanchao Meng, Jiaping Gui, Yunbo Li, Yue Wu
-
Certifying optimal MEV strategies with Lean
Massimo Bartoletti, Riccardo Marchesin, Roberto Zunino
-
Lexo: Eliminating Stealthy Supply-Chain Attacks via LLM-Assisted Program Regeneration
Evangelos Lamprou, Julian Dai, Grigoris Ntousakis, Martin C. Rinard, Nikos Vasilakis
-
A Hard-Label Black-Box Evasion Attack against ML-based Malicious Traffic Detection Systems
Zixuan Liu, Yi Zhao, Zhuotao Liu, Qi Li, Chuanpu Fu, Guangmeng Zhou, Ke Xu
-
DeLeaker: Dynamic Inference-Time Reweighting For Semantic Leakage Mitigation in Text-to-Image Models
Mor Ventura, Michael Toker, Or Patashnik, Yonatan Belinkov, Roi Reichart
-
Active Honeypot Guardrail System: Probing and Confirming Multi-Turn LLM Jailbreaks
ChenYu Wu, Yi Wang, Yang Liao
-
Deyue Zhang, Dongdong Yang, Junjie Mu, Quancheng Zou, Zonghao Ying, Wenzhuo Xu, Zhao Liu, Xuan Wang, Xiangzheng Zhang
-
Targeted Attacks and Defenses for Distributed Federated Learning in Vehicular Networks
Utku Demir, Tugba Erpek, Yalin E. Sagduyu, Sastry Kompella, Mengran Xue
-
MAGPIE: A benchmark for Multi-AGent contextual PrIvacy Evaluation
Gurusha Juneja, Jayanth Naga Sai Pasupulati, Alon Albalak, Wenyue Hua, William Yang Wang
-
PoTS: Proof-of-Training-Steps for Backdoor Detection in Large Language Models
Issam Seddik, Sami Souihi, Mohamed Tamaazousti, Sara Tucci Piergiovanni
-
SMOTE and Mirrors: Exposing Privacy Leakage from Synthetic Minority Oversampling
Georgi Ganev, Reza Nazari, Rees Davison, Amir Dizche, Xinmin Wu, Ralph Abbey, Jorge Silva, Emiliano De Cristofaro
-
Aofan Liu, Shiyuan Song, Haoxuan Li, Cehao Yang, Yiyan Qi
-
SAJA: A State-Action Joint Attack Framework on Multi-Agent Deep Reinforcement Learning
Weiqi Guo, Guanjun Liu, Ziyuan Zhou
-
TRUSTVIS: A Multi-Dimensional Trustworthiness Evaluation Framework for Large Language Models
Ruoyu Sun, Da Song, Jiayang Song, Yuheng Huang, Lei Ma
-
Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning
Baogang Song, Dongdong Zhao, Jianwen Xiang, Qiben Xu, Zizhuo Yu
-
Personal Attribute Leakage in Federated Speech Models
Hamdan Al-Ali, Ali Reza Ghavamipour, Tommaso Caselli, Fatih Turkmen, Zeerak Talat, Hanan Aldarmaki
-
Adversarial Fine-tuning in Offline-to-Online Reinforcement Learning for Robust Robot Control
Shingo Ayabe, Hiroshi Kera, Kazuhiko Kawamoto
-
Generalist++: A Meta-learning Framework for Mitigating Trade-off in Adversarial Training
Yisen Wang, Yichuan Mo, Hongjun Wang, Junyi Li, Zhouchen Lin
-
In-Browser LLM-Guided Fuzzing for Real-Time Prompt Injection Testing in Agentic AI Browsers
Avihay Cohen
-
Ziqing Lu, Lifeng Lai, Weiyu Xu
-
SHIELD: Classifier-Guided Prompting for Robust and Safer LVLMs
Juan Ren, Mark Dras, Usman Naseem
-
Taming the Fragility of KV Cache Eviction in LLM Inference
Yuan Feng, Haoyu Guo, JunLin Lv, S. Kevin Zhou, Xike Xie
-
GAPS: A Clinically Grounded, Automated Benchmark for Evaluating AI Clinicians
Xiuyuan Chen, Tao Sun, Dexin Su, Ailing Yu, Junwei Liu, Zhe Chen, Gangzeng Jin, Xin Wang, Jingnan Liu, Hansong Xiao, Hualei Zhou, Dongjie Tao, Chunxiao Guo, Minghui Yang, Yuan Xia, Jing Zhao, Qianrui Fan, Yanyun Wang, Shuai Zhen, Kezhong Chen, Jun Wang, Zewen Sun, Heng Zhao, Tian Guan, Shaodong Wang, Geyun Chang, Jiaming Deng, Hongchengcheng Chen, Kexin Feng, Ruzhen Li, Jiayi Geng, Changtai Zhao, Jun Wang, Guihu Lin, Peihao Li, Liqi Liu, Peng Wei, Jian Wang, Jinjie Gu, Ping Wang, Fan Yang
-
LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models
Senyu Fei, Siyin Wang, Junhao Shi, Zihao Dai, Jikun Cai, Pengfang Qian, Li Ji, Xinzhe He, Shiduo Zhang, Zhaoye Fei, Jinlan Fu, Jingjing Gong, Xipeng Qiu
-
Model-agnostic Adversarial Attack and Defense for Vision-Language-Action Models
Haochuan Xu, Yun Sing Koh, Shuhuai Huang, Zirun Zhou, Di Wang, Jun Sakuma, Jingfeng Zhang
-
Akib Mohammed Khan, Bartosz Krawczyk
-
Risk-adaptive Activation Steering for Safe Multimodal Large Language Models
Jonghyun Park, Minhyuk Seo, Jonghyun Choi
-
Selective Adversarial Attacks on LLM Benchmarks
Ivan Dubrovsky, Anastasia Orlova, Illarion Iov, Nina Gubina, Irena Gureeva, Alexey Zaytsev
-
Robust Minimax Boosting with Performance Guarantees
Santiago Mazuelas, Veronica Alvarez
-
From base cases to backdoors: An Empirical Study of Unnatural Crypto-API Misuse
Victor Olaiya, Adwait Nadkarni
-
Tan Le, Van Le, Sachin Shetty
-
Toward Efficient Inference Attacks: Shadow Model Sharing via Mixture-of-Experts
Li Bai, Qingqing Ye, Xinwei Zhang, Sen Zhang, Zi Liang, Jianliang Xu, Haibo Hu
-
Who Speaks for the Trigger? Dynamic Expert Routing in Backdoored Mixture-of-Experts Transformers
Xin Zhao, Xiaojun Chen, Bingshan Liu, Haoyu Gao, Zhendong Zhao, Yilong Chen
-
Cyber-Resilient System Identification for Power Grid through Bayesian Integration
Shimiao Li, Guannan Qu, Bryan Hooi, Vyas Sekar, Soummya Kar, Larry Pileggi
-
Every Language Model Has a Forgery-Resistant Signature
Matthew Finlayson, Xiang Ren, Swabha Swayamdipta
-
Robust or Suggestible? Exploring Non-Clinical Induction in LLM Drug-Safety Decisions
Siying Liu, Shisheng Zhang, Indu Bala
-
NAPPure: Adversarial Purification for Robust Image Classification under Non-Additive Perturbations
Junjie Nan, Jianing Li, Wei Chen, Mingkun Zhang, Xueqi Cheng
-
Signature in Code Backdoor Detection, how far are we?
Quoc Hung Le, Thanh Le-Cong, Bach Le, Bowen Xu
-
PIShield: Detecting Prompt Injection Attacks via Intrinsic LLM Features
Wei Zou, Yupei Liu, Yanting Wang, Ying Chen, Neil Gong, Jinyuan Jia
-
Wissam Salhab, Darine Ameyed, Hamid Mcheick, Fehmi Jaafar
-
SafeMT: Multi-turn Safety for Multimodal Language Models
Han Zhu, Juntao Dai, Jiaming Ji, Haoran Li, Chengkun Cai, Pengcheng Wen, Chi-Min Chan, Boyuan Chen, Yaodong Yang, Sirui Han, Yike Guo
-
PromptLocate: Localizing Prompt Injection Attacks
Yuqi Jia, Yupei Liu, Zedian Shao, Jinyuan Jia, Neil Gong
-
Shallow Robustness, Deep Vulnerabilities: Multi-Turn Evaluation of Medical LLMs
Blazej Manczak, Eric Lin, Francisco Eiras, James O'Neill, Vaikkunth Mugunthan
-
LLM-REVal: Can We Trust LLM Reviewers Yet?
Rui Li, Jia-Chen Gu, Po-Nien Kung, Heming Xia, Junfeng Liu, Xiangwen Kong, Zhifang Sui, Nanyun Peng
-
Lang Gao, Xuhui Li, Chenxi Wang, Mingzhe Li, Wei Liu, Zirui Song, Jinghui Zhang, Rui Yan, Preslav Nakov, Xiuying Chen
-
StyleDecipher: Robust and Explainable Detection of LLM-Generated Texts with Stylistic Analysis
Siyuan Li, Aodu Wulianghai, Xi Lin, Guangyan Li, Xiang Chen, Jun Wu, Jianhua Li
-
Vision Language Models Map Logos to Text via Semantic Entanglement in the Visual Projector
Sifan Li, Hongkai Chen, Yujun Cai, Qingwen Ye, Liyang Chen, Junsong Yuan, Yiwei Wang
-
Content Anonymization for Privacy in Long-form Audio
Cristina Aggazzotti, Ashi Garg, Zexin Cai, Nicholas Andrews
-
ImageSentinel: Protecting Visual Datasets from Unauthorized Retrieval-Augmented Image Generation
Ziyuan Luo, Yangyi Zhao, Ka Chun Cheung, Simon See, Renjie Wan
-
MS-GAGA: Metric-Selective Guided Adversarial Generation Attack
Dion J. X. Ho, Gabriel Lee Jun Rong, Niharika Shrivastava, Harshavardhan Abichandani, Pai Chet Ng, Xiaoxiao Miao
-
Fairness-Constrained Optimization Attack in Federated Learning
Harsh Kasyap, Minghong Fang, Zhuqing Liu, Carsten Maple, Somanath Tripathy
-
Bowen Fan, Zhilin Guo, Xunkai Li, Yihan Zhou, Bing Zhou, Zhenjun Li, Rong-Hua Li, Guoren Wang
-
Keep Calm and Avoid Harmful Content: Concept Alignment and Latent Manipulation Towards Safer Answers
Ruben Belo, Claudia Soares, Marta Guimaraes
-
KoALA: KL-L0 Adversarial Detector via Label Agreement
Siqi Li, Yasser Shoukry
-
Sample-Efficient Omniprediction for Proper Losses
Isaac Gibbs, Ryan J. Tibshirani
-
Daniel Pulido-Cortázar, Daniel Gibert, Felip Manyà
-
Leaking Queries On Secure Stream Processing Systems
Hung Pham, Viet Vo, Tien Tuan Anh Dinh, Duc Tran, Shuhao Zhang
-
Ye Tian, Yanqiu Yu, Liangliang Song, Zhiquan Liu, Yanbin Wang, Jianguo Sun
-
Targeted Pooled Latent-Space Steganalysis Applied to Generative Steganography, with a Fix
Etienne Levecque, Aurélien Noirault, Tomáš Pevný, Jan Butora, Patrick Bas, Rémi Cogranne
-
Who's Asking? Evaluating LLM Robustness to Inquiry Personas in Factual Question Answering
Nil-Jana Akpinar, Chia-Jung Lee, Vanessa Murdock, Pietro Perona
-
João A. Leite, Arnav Arora, Silvia Gargova, João Luz, Gustavo Sampaio, Ian Roberts, Carolina Scarton, Kalina Bontcheva
-
Pruning Cannot Hurt Robustness: Certified Trade-offs in Reinforcement Learning
James Pedley, Benjamin Etheridge, Stephen J. Roberts, Francesco Quinzan
-
An Investigation of Memorization Risk in Healthcare Foundation Models
Sana Tonekaboni, Lena Stempfle, Adibvafa Fallahpour, Walter Gerych, Marzyeh Ghassemi
-
Reference-Specific Unlearning Metrics Can Hide the Truth: A Reality Check
Sungjun Cho, Dasol Hwang, Frederic Sala, Sangheum Hwang, Kyunghyun Cho, Sungmin Cha
-
Rithwik Gupta, Daniel Muthukrishna, Jeroen Audenaert
-
Local Differential Privacy for Federated Learning with Fixed Memory Usage and Per-Client Privacy
Rouzbeh Behnia, Jeremiah Birrell, Arman Riasi, Reza Ebrahimi, Kaushik Dutta, Thang Hoang
-
Guarding the Guardrails: A Taxonomy-Driven Approach to Jailbreak Detection
Olga E. Sorokoletova, Francesco Giarrusso, Vincenzo Suriani, Daniele Nardi
-
RAID: Refusal-Aware and Integrated Decoding for Jailbreaking LLMs
Tuan T. Nguyen, John Le, Thai T. Vu, Willy Susilo, Heath Cooper
-
GAR: Generative Adversarial Reinforcement Learning for Formal Theorem Proving
Ruida Wang, Jiarui Yao, Rui Pan, Shizhe Diao, Tong Zhang
-
PHANTOM RECALL: When Familiar Puzzles Fool Smart Models
Souradeep Mukhopadhyay, Rishabh Baral, Nimeesh Mahajan, Samhitha Harish, Aswin RRV, Mihir Parmar, Mutsumi Nakamura, Chitta Baral
-
BlackIce: A Containerized Red Teaming Toolkit for AI Security Testing
Caelin Kaplan, Alexander Warnecke, Neil Archibald
-
Countermind: A Multi-Layered Security Architecture for Large Language Models
Dominik Schwarz
-
LLM Knowledge is Brittle: Truthfulness Representations Rely on Superficial Resemblance
Patrick Haller, Mark Ibrahim, Polina Kirichenko, Levent Sagun, Samuel J. Bell
-
Don't Walk the Line: Boundary Guidance for Filtered Generation
Sarah Ball, Andreas Haupt
-
Deep Research Brings Deeper Harm
Shuo Chen, Zonggen Li, Zhen Han, Bailan He, Tong Liu, Haokun Chen, Georg Groh, Philip Torr, Volker Tresp, Jindong Gu
-
Robust Adversarial Reinforcement Learning in Stochastic Games via Sequence Modeling
Xiaohang Tang, Zhuowen Cheng, Satyabrat Kumar
-
High-Probability Bounds For Heterogeneous Local Differential Privacy
Maryam Aliakbarpour, Alireza Fallah, Swaha Roy, Ria Stevens
-
Yuwen Cui, Guangjing Wang, Khanh Vu, Kai Wei, Kehan Shen, Zhengyuan Jiang, Xiao Han, Ning Wang, Zhuo Lu, Yao Liu
-
Deeksha Hareesha Kulal, Chidozie Princewill Arannonu, Afsah Anwar, Nidhi Rastogi, Quamar Niyaz
-
LLMAtKGE: Large Language Models as Explainable Attackers against Knowledge Graph Embeddings
Ting Li, Yang Yang, Yipeng Yu, Liang Yao, Guoqing Chao, Ruifeng Xu
-
TabVLA: Targeted Backdoor Attacks on Vision-Language-Action Models
Zonghuan Xu, Xiang Zheng, Xingjun Ma, Yu-Gang Jiang
-
Catch-Only-One: Non-Transferable Examples for Model-Specific Authorization
Zihan Wang, Zhiyong Ma, Zhongkui Ma, Shuofeng Liu, Akide Liu, Derui Wang, Minhui Xue, Guangdong Bai
-
DITTO: A Spoofing Attack Framework on Watermarked LLMs via Knowledge Distillation
Hyeseon Ahn, Shinwoo Park, Yo-Sub Han
-
PhysHSI: Towards a Real-World Generalizable and Natural Humanoid-Scene Interaction System
Huayi Wang, Wentao Zhang, Runyi Yu, Tao Huang, Junli Ren, Feiyu Jia, Zirui Wang, Xiaojie Niu, Xiao Chen, Jiahe Chen, Qifeng Chen, Jingbo Wang, Jiangmiao Pang
-
RAG-Pull: Imperceptible Attacks on RAG Systems for Code Generation
Vasilije Stambolic, Aritra Dhar, Lukas Cavigelli
-
Attacks by Content: Automated Fact-checking is an AI Security Issue
Michael Schlichtkrull
-
Large Language Models Are Effective Code Watermarkers
Rui Xu, Jiawei Chen, Zhaoxia Yin, Cong Kong, Xinpeng Zhang
-
Multi-View Graph Feature Propagation for Privacy Preservation and Feature Sparsity
Etzion Harari, Moshe Unger
-
Early Detection and Reduction of Memorisation for Domain Adaptation and Instruction Tuning
Dean L. Slack, Noura Al Moubayed
-
Living Off the LLM: How LLMs Will Change Adversary Tactics
Sean Oesch, Jack Hutchins, Luke Koch, Kevin Kurian
-
PACEbench: A Framework for Evaluating Practical AI Cyber-Exploitation Capabilities
Zicheng Liu, Lige Huang, Jie Zhang, Dongrui Liu, Yuan Tian, Jing Shao
-
Adversarial Attacks Leverage Interference Between Features in Superposition
Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal
-
Information-Preserving Reformulation of Reasoning Traces for Antidistillation
Jiayu Ding, Lei Cui, Li Dong, Nanning Zheng, Furu Wei
-
Bag of Tricks for Subverting Reasoning-based Safety Guardrails
Shuo Chen, Zhen Han, Haokun Chen, Bailan He, Shengyun Si, Jingpei Wu, Philip Torr, Volker Tresp, Jindong Gu
-
ROFI: A Deep Learning-Based Ophthalmic Sign-Preserving and Reversible Patient Face Anonymizer
Yuan Tian, Min Zhou, Yitong Chen, Fang Li, Lingzi Qi, Shuo Wang, Xieyang Xu, Yu Yu, Shiqiong Xu, Chaoyu Lei, Yankai Jiang, Rongzhao Zhang, Jia Tan, Li Wu, Hong Chen, Xiaowei Liu, Wei Lu, Lin Li, Huifang Zhou, Xuefei Song, Guangtao Zhai, Xianqun Fan
-
CoDefend: Cross-Modal Collaborative Defense via Diffusion Purification and Prompt Optimization
Fengling Zhu, Boshi Liu, Jingyu Hua, Sheng Zhong
-
Exploring and Leveraging Class Vectors for Classifier Editing
Jaeik Kim, Jaeyoung Do
-
The Easy Path to Robustness: Coreset Selection using Sample Hardness
Pranav Ramesh, Arjun Roy, Deepak Ravikumar, Kaushik Roy, Gopalakrishnan Srinivasan
-
Quantifying Information Disclosure During Gradient Descent Using Gradient Uniqueness
Mahmoud Abdelghafar, Maryam Aliakbarpour, Chris Jermaine
-
Qizhou Peng, Yang Zheng, Yu Wen, Yanna Wu, Yingying Du
-
Adversarial Robustness in One-Stage Learning-to-Defer
Yannis Montreuil, Letian Yu, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi
-
CoSPED: Consistent Soft Prompt Targeted Data Extraction and Defense
Yang Zhuochen, Fok Kar Wai, Thing Vrizlynn
-
TraceAegis: Securing LLM-Based Agents via Hierarchical and Behavioral Anomaly Detection
Jiahao Liu, Bonan Ruan, Xianglin Yang, Zhiwei Lin, Yan Liu, Yang Wang, Tao Wei, Zhenkai Liang
-
Collaborative Shadows: Distributed Backdoor Attacks in LLM-Based Multi-Agent Systems
Pengyu Zhu, Lijun Li, Yaxing Lyu, Li Sun, Sen Su, Jing Shao
-
Joint Discriminative-Generative Modeling via Dual Adversarial Training
Xuwang Yin, Claire Zhang, Julie Steele, Nir Shavit, Tony T. Wang
-
Generative AI for Biosciences: Emerging Threats and Roadmap to Biosecurity
Zaixi Zhang, Souradip Chakraborty, Amrit Singh Bedi, Emilin Mathew, Varsha Saravanan, Le Cong, Alvaro Velasquez, Sheng Lin-Gibson, Megan Blewett, Dan Hendrycks, Alex John London, Ellen Zhong, Ben Raphael, Adji Bousso Dieng, Jian Ma, Eric Xing, Russ Altman, George Church, Mengdi Wang
-
The Irrational Machine: Neurosis and the Limits of Algorithmic Safety
Daniel Howard
-
SASER: Stego attacks on open-source LLMs
Ming Tan, Wei Li, Hu Tao, Hailong Ma, Aodi Liu, Qian Chen, Zilong Wang
-
f-INE: A Hypothesis Testing Framework for Estimating Influence under Training Randomness
Subhodip Panda, Dhruv Tarsadiya, Shashwat Sourav, Prathosh A.P, Sai Praneeth Karimireddy
-
From Detection to Mitigation: Addressing Bias in Deep Learning Models for Chest X-Ray Diagnosis
Clemence Mottez, Louisa Fay, Maya Varma, Sophie Ostmeier, Curtis Langlotz
-
Merlin's Whisper: Enabling Efficient Reasoning in LLMs via Black-box Adversarial Prompting
Heming Xia, Cunxiao Du, Rui Li, Chak Tou Leong, Yongqi Li, Wenjie Li
-
DUAL-Bench: Measuring Over-Refusal and Robustness in Vision-Language Models
Kaixuan Ren, Preslav Nakov, Usman Naseem
-
ImpMIA: Leveraging Implicit Bias for Membership Inference Attack under Realistic Scenarios
Yuval Golbari, Navve Wasserman, Gal Vardi, Michal Irani
-
Weiming Zhao, Xulong Wang, Jun Qi, Yun Yang, Po Yang
-
Meng Xi, Sihan Lv, Yechen Jin, Guanjie Cheng, Naibo Wang, Ying Li, Jianwei Yin
-
The Achilles' Heel of LLMs: How Altering a Handful of Neurons Can Cripple Language Abilities
Zixuan Qin, Kunlin Lyu, Qingchen Yu, Yifan Sun, Zhaoxin Fan
-
Pharmacist: Safety Alignment Data Curation for Large Language Models against Harmful Fine-tuning
Guozhi Liu, Qi Mu, Tiansheng Huang, Xinhua Wang, Li Shen, Weiwei Lin, Zhang Li
-
MetaBreak: Jailbreaking Online LLM Services via Special Token Manipulation
Wentian Zhu, Zhen Xiang, Wei Niu, Le Guan
-
ArtPerception: ASCII Art-based Jailbreak on LLMs with Recognition Pre-test
Guan-Yan Yang, Tzu-Yu Cheng, Ya-Wen Teng, Farn Wang, Kuo-Hui Yeh
-
Path Drift in Large Reasoning Models: How First-Person Commitments Override Safety
Yuyi Huang, Runzhe Zhan, Lidia S. Chao, Ailin Tao, Derek F. Wong
-
A-IPO: Adaptive Intent-driven Preference Optimization
Wenqing Wang, Muhammad Asif Ali, Ali Shoker, Ruohan Yang, Junyang Chen, Ying Sha, Huan Wang
-
Backdoor Collapse: Eliminating Unknown Threats via Known Backdoor Aggregation in Language Models
Liang Lin, Miao Yu, Moayad Aloqaily, Zhenhong Zhou, Kun Wang, Linsey Pang, Prakhar Mehrotra, Qingsong Wen
-
SecureWebArena: A Holistic Security Evaluation Benchmark for LVLM-based Web Agents
Zonghao Ying, Yangguang Shao, Jianle Gan, Gan Xu, Junjie Shen, Wenxin Zhang, Quanchen Zou, Junzheng Shi, Zhenfei Yin, Mingchuan Zhang, Aishan Liu, Xianglong Liu
-
Tight Robustness Certificates and Wasserstein Distributional Attacks for Deep Neural Networks
Bach C. Le, Tung V. Dao, Binh T. Nguyen, Hong T.M. Chu
-
Yue Deng, Francisco Santos, Pang-Ning Tan, Lifeng Luo
-
An information theorist's tour of differential privacy
Anand D. Sarwate, Flavio P. Calmon, Oliver Kosut, Lalitha Sankar
-
Scheming Ability in LLM-to-LLM Strategic Interactions
Thao Pham
-
ADMIT: Few-shot Knowledge Poisoning Attacks on RAG-based Fact Checking
Yutao Wu, Xiao Liu, Yinghui Li, Yifeng Gao, Yifan Ding, Jiale Ding, Xiang Zheng, Xingjun Ma
-
Near-Optimal Second-Order Guarantees for Model-Based Adversarial Imitation Learning
Shangzhe Li, Dongruo Zhou, Weitong Zhang
-
RO-Bench: Large-scale robustness evaluation of MLLMs with text-driven counterfactual videos
Zixi Yang, Jiapeng Li, Muxi Diao, Yinuo Jing, Kongming Liang
-
On Epistemic Uncertainty of Visual Tokens for Object Hallucinations in Large Vision-Language Models
Hoigi Seo, Dong Un Kang, Hyunjin Cho, Joohoon Lee, Se Young Chun
-
Junchao Fan, Xiaolin Chang
-
MemLoss: Enhancing Adversarial Training with Recycling Adversarial Examples
Soroush Mahdi, Maryam Amirmazlaghani, Saeed Saravani, Zahra Dehghanian
-
Zhi Yang, Changwu Huang, Ke Tang, Xin Yao
-
On the Implicit Adversariality of Catastrophic Forgetting in Deep Continual Learning
Ze Peng, Jian Zhang, Jintao Guo, Lei Qi, Yang Gao, Yinghuan Shi
-
Adaptive Attacks on Trusted Monitors Subvert AI Control Protocols
Mikhail Terekhov, Alexander Panfilov, Daniil Dzenhaliou, Caglar Gulcehre, Maksym Andriushchenko, Ameya Prabhu, Jonas Geiping
-
SeCon-RAG: A Two-Stage Semantic Filtering and Conflict-Free Framework for Trustworthy RAG
Xiaonan Si, Meilin Zhu, Simeng Qin, Lijia Yu, Lijun Zhang, Shuaitong Liu, Xinfeng Li, Ranjie Duan, Yang Liu, Xiaojun Jia
-
Group-Adaptive Adversarial Learning for Robust Fake News Detection Against Malicious Comments
Zhao Tong, Chunlin Gong, Yimeng Gu, Haichao Shi, Qiang Liu, Shu Wu, Xiao-Yu Zhang
-
All Code, No Thought: Current Language Models Struggle to Reason in Ciphered Language
Shiyuan Guo, Henry Sleight, Fabien Roger
-
Text Prompt Injection of Vision Language Models
Ruizhe Zhu
-
CoBia: Constructed Conversations Can Trigger Otherwise Concealed Societal Biases in LLMs
Nafiseh Nikeghbal, Amir Hossein Kargaran, Jana Diesner
-
Exploration of Incremental Synthetic Non-Morphed Images for Single Morphing Attack Detection
David Benavente-Rios, Juan Ruiz Rodriguez, Gustavo Gatica
-
Robustness and Regularization in Hierarchical Re-Basin
Benedikt Franke, Florian Heinrich, Markus Lange, Arne Raulf
-
Yuki Nii, Futa Waseda, Ching-Chun Chang, Isao Echizen
-
Profit Mirage: Revisiting Information Leakage in LLM-based Financial Agents
Xiangyu Li, Yawen Zeng, Xiaofen Xing, Jin Xu, Xiangmin Xu
-
VoiceAgentBench: Are Voice Assistants ready for agentic tasks?
Dhruv Jain, Harshit Shukla, Gautam Rajeev, Ashish Kulkarni, Chandra Khatri, Shubham Agarwal
-
Chain-of-Trigger: An Agentic Backdoor that Paradoxically Enhances Agentic Robustness
Jiyang Qiu, Xinbei Ma, Yunqing Xu, Zhuosheng Zhang, Hai Zhao
-
Rethinking Reasoning: A Survey on Reasoning-based Backdoors in LLMs
Man Hu, Xinyi Wu, Zuofeng Suo, Jinbo Feng, Linghui Meng, Yanhao Jia, Anh Tuan Luu, Shuai Zhao
-
Effective and Stealthy One-Shot Jailbreaks on Deployed Mobile Vision-Language Agents
Renhua Ding, Xiao Yang, Zhengwei Fang, Jun Luo, Kun He, Jun Zhu
-
MetaDefense: Defending Finetuning-based Jailbreak Attack Before and During Generation
Weisen Jiang, Sinno Jialin Pan
-
Fewer Weights, More Problems: A Practical Attack on LLM Pruning
Kazuki Egashira, Robin Staab, Thibaud Gloaguen, Mark Vero, Martin Vechev
-
Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses
Stanisław Pawlak, Jan Dubiński, Daniel Marczak, Bartłomiej Twardowski
-
Interpreting LLM-as-a-Judge Policies via Verifiable Global Explanations
Jasmina Gajcin, Erik Miehling, Rahul Nair, Elizabeth Daly, Radu Marinescu, Seshu Tirupathi
-
XuHao Hu, Peng Wang, Xiaoya Lu, Dongrui Liu, Xuanjing Huang, Jing Shao
-
Provably Robust Adaptation for Language-Empowered Foundation Models
Yuni Lai, Xiaoyu Xue, Linghui Shen, Yulun Wu, Gaolei Li, Song Guo, Kai Zhou, Bin Xiao
-
SAFER-AiD: Saccade-Assisted Foveal-peripheral vision Enhanced Reconstruction for Adversarial Defense
Jiayang Liu, Daniel Tso, Yiming Bu, Qinru Qiu
-
Deceptive Exploration in Multi-armed Bandits
I. Arda Vurankaya, Mustafa O. Karabag, Wesley A. Suttle, Jesse Milzman, David Fridovich-Keil, Ufuk Topcu
-
CommandSans: Securing AI Agents with Surgical Precision Prompt Sanitization
Debeshee Das, Luca Beurer-Kellner, Marc Fischer, Maximilian Baader
-
Ragib Amin Nihal, Rui Wen, Kazuhiro Nakadai, Jun Sakuma
-
Haoran Ou, Kangjie Chen, Xingshuo Han, Gelei Deng, Jie Zhang, Han Qiu, Tianwei Zhang
-
VisualDAN: Exposing Vulnerabilities in VLMs with Visual-Driven DAN Commands
Aofan Liu, Lulu Tang
-
The Framework That Survives Bad Models: Human-AI Collaboration For Clinical Trials
Yao Chen, David Ohlssen, Aimee Readie, Gregory Ligozio, Ruvie Martin, Thibaud Coroller
-
Are LLMs Reliable Rankers? Rank Manipulation via Two-Stage Token Optimization
Tiancheng Xing, Jerry Li, Yixuan Du, Xiyang Hu
-
Ke Guo, Haochen Liu, Xiaojun Wu, Chen Lv
-
A Multi-Agent Framework for Stateful Inference-Time Search
Arshika Lalan, Rajat Ghosh, Aditya Kolsur, Debojyoti Dutta
-
Cocoon: A System Architecture for Differentially Private Training with Correlated Noises
Donghwan Kim, Xin Gu, Jinho Baek, Timothy Lo, Younghoon Min, Kwangsik Shin, Jongryool Kim, Jongse Park, Kiwan Maeng
-
Do Internal Layers of LLMs Reveal Patterns for Jailbreak Detection?
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
-
AWM: Accurate Weight-Matrix Fingerprint for Large Language Models
Boyi Zeng, Lin Chen, Ziwei He, Xinbing Wang, Zhouhan Lin
-
Quantifying Data Contamination in Psychometric Evaluations of LLMs
Jongwook Han, Woojung Song, Jonggeun Lee, Yohan Jo
-
Red-Bandit: Test-Time Adaptation for LLM Red-Teaming via Bandit-Guided LoRA Experts
Christos Ziakas, Nicholas Loo, Nishita Jain, Alessandra Russo
-
XLSR-Kanformer: A KAN-Integrated model for Synthetic Speech Detection
Phuong Tuan Dat, Tran Huy Dat
-
Differentially Private Synthetic Text Generation for Retrieval-Augmented Generation (RAG)
Junki Mori, Kazuya Kakizaki, Taiki Miyagawa, Jun Sakuma
-
Exposing Citation Vulnerabilities in Generative Engines
Riku Mochizuki, Shusuke Komatsu, Souta Noguchi, Kazuto Ataka
-
RedTWIZ: Diverse LLM Red Teaming via Adaptive Attack Planning
Artur Horal, Daniel Pina, Henrique Paz, Iago Paulo, João Soares, Rafael Ferreira, Diogo Tavares, Diogo Glória-Silva, João Magalhães, David Semedo
-
StyleKeeper: Prevent Content Leakage using Negative Visual Query Guidance
Jaeseok Jeong, Junho Kim, Gayoung Lee, Yunjey Choi, Youngjung Uh
-
Label-frugal satellite image change detection with generative virtual exemplar learning
Hichem Sahbi
-
OBJVanish: Physically Realizable Text-to-3D Adv. Generation of LiDAR-Invisible Objects
Bing Li, Wuqi Wang, Yanan Zhang, Jingzheng Li, Haigen Min, Wei Feng, Xingyu Zhao, Jie Zhang, Qing Guo
-
SpecGuard: Spectral Projection-based Advanced Invisible Watermarking
Inzamamul Alam, Md Tanvir Islam, Khan Muhammad, Simon S. Woo
-
Unsupervised Backdoor Detection and Mitigation for Spiking Neural Networks
Jiachen Li, Bang Wu, Xiaoyu Xia, Xiaoning Liu, Xun Yi, Xiuzhen Zhang
-
SaFeR-VLM: Toward Safety-aware Fine-grained Reasoning in Multimodal Models
Huahui Yi, Kun Wang, Qiankun Li, Miao Yu, Liang Lin, Gongli Xi, Hao Wu, Xuming Hu, Kang Li, Yang Liu
-
Revisiting Mixout: An Overlooked Path to Robust Finetuning
Masih Aminbeidokhti, Heitor Rapela Medeiros, Eric Granger, Marco Pedersoli
-
Is the Hard-Label Cryptanalytic Model Extraction Really Polynomial?
Akira Ito, Takayuki Miura, Yosuke Todo
-
Get RICH or Die Scaling: Profitably Trading Inference Compute for Robustness
Tavish McDonald, Bo Lei, Stanislav Fort, Bhavya Kailkhura, Brian Bartoldson
-
Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples
Alexandra Souly, Javier Rando, Ed Chapman, Xander Davies, Burak Hasircioglu, Ezzeldin Shereen, Carlos Mougan, Vasilios Mavroudis, Erik Jones, Chris Hicks, Nicholas Carlini, Yarin Gal, Robert Kirk
-
Falsification-Driven Reinforcement Learning for Maritime Motion Planning
Marlon Müller, Florian Finkeldei, Hanna Krasowski, Murat Arcak, Matthias Althoff
-
Code Agent can be an End-to-end System Hacker: Benchmarking Real-world Threats of Computer-use Agent
Weidi Luo, Qiming Zhang, Tianyu Lu, Xiaogeng Liu, Bin Hu, Hung-Chun Chiu, Siyuan Ma, Yizhe Zhang, Xusheng Xiao, Yinzhi Cao, Zhen Xiang, Chaowei Xiao
-
Yixiang Zhang, Xinhao Deng, Zhongyi Gu, Yihao Chen, Ke Xu, Qi Li, Jianping Wu
-
Yuhua Xu, Wei Sun, Chengpei Tang, Jiaxing Lu, Jingying Zhou, Chen Gu
-
Benchmarking Fake Voice Detection in the Fake Voice Generation Arms Race
Xutao Mao, Ke Li, Cameron Baird, Ezra Xuanru Tao, Dan Lin
-
Muhammad Usman, Yugyung Lee
-
PEAR: Planner-Executor Agent Robustness Benchmark
Shen Dong, Mingxuan Zhang, Pengfei He, Li Ma, Bhavani Thuraisingham, Hui Liu, Yue Xing
-
A2AS: Agentic AI Runtime Security and Self-Defense
Eugene Neelou, Ivan Novikov, Max Moroz, Om Narayan, Tiffany Saade, Mika Ayenson, Ilya Kabanov, Jen Ozmen, Edward Lee, Vineeth Sai Narajala, Emmanuel Guilherme Junior, Ken Huang, Huseyin Gulsin, Jason Ross, Marat Vyshegorodtsev, Adelin Travers, Idan Habler, Rahul Jadav
-
RGBD Gaze Tracking Using Transformer for Feature Fusion
Tobias J. Bauer
-
Protecting De-identified Documents from Search-based Linkage Attacks
Pierre Lison, Mark Anderson
-
Geometry-Aware Backdoor Attacks: Leveraging Curvature in Hyperbolic Embeddings
Ali Baheri
-
A Survey on Agentic Security: Applications, Threats and Defenses
Asif Shahriar, Md Nafiu Rahman, Sadif Ahmed, Farig Sadeque, Md Rizwan Parvez
-
Text-to-Image Models Leave Identifiable Signatures: Implications for Leaderboard Security
Ali Naseh, Anshuman Suri, Yuefeng Peng, Harsh Chaudhari, Alina Oprea, Amir Houmansadr
-
Empirical Comparison of Membership Inference Attacks in Deep Transfer Learning
Yuxuan Bai, Gauri Pradhan, Marlon Tobaben, Antti Honkela
-
Automated Repeatable Adversary Threat Emulation with Effects Language (EL)
Suresh K. Damodaran, Paul D. Rowe
-
Vipul Goyal, Justin Raizes
-
LatentBreak: Jailbreaking Large Language Models through Latent Space Feedback
Raffaele Mura, Giorgio Piras, Kamilė Lukošiūtė, Maura Pintor, Amin Karbasi, Battista Biggio
-
Toward a Safer Web: Multilingual Multi-Agent LLMs for Mitigating Adversarial Misinformation Attacks
Nouar Aldahoul, Yasir Zaki
-
Adversarial-Resilient RF Fingerprinting: A CNN-GAN Framework for Rogue Transmitter Detection
Raju Dhakal, Prashant Shekhar, Laxima Niure Kandel
-
RareAgent: Self-Evolving Reasoning for Drug Repurposing in Rare Diseases
Lang Qin, Zijian Gan, Xu Cao, Pengcheng Jiang, Yankai Jiang, Jiawei Han, Kaishun Wu, Jintai Chen
-
The Role of Federated Learning in Improving Financial Security: A Survey
Cade Houston Kennedy, Amr Hilal, Morteza Momeni
-
Khartik Uppalapati, Shakeel Abdulkareem, Bora Yimenicioglu
-
A Calibration-Free Fixed Point of Curved Boolean Logic Matching the Fine-Structure Constant
Maximilian R. P. von Liechtenstein
-
AutoDAN-Reasoning: Enhancing Strategies Exploration based Jailbreak Attacks with Test-Time Scaling
Xiaogeng Liu, Chaowei Xiao
-
Inoculation Prompting: Instructing LLMs to misbehave at train-time improves test-time alignment
Nevan Wichers, Aram Ebtekar, Ariana Azarbal, Victor Gillioz, Christine Ye, Emil Ryd, Neil Rathi, Henry Sleight, Alex Mallen, Fabien Roger, Samuel Marks
-
SafeGuider: Robust and Practical Content Safety Control for Text-to-Image Models
Peigui Qi, Kunsheng Tang, Wenbo Zhou, Weiming Zhang, Nenghai Yu, Tianwei Zhang, Qing Guo, Jie Zhang
-
Rounding-Guided Backdoor Injection in Deep Learning Model Quantization
Xiangxiang Chen, Peixin Zhang, Jun Sun, Wenhai Wang, Jingyi Wang
-
Inoculation Prompting: Eliciting traits from LLMs during training can suppress them at test-time
Daniel Tan, Anders Woodruff, Niels Warncke, Arun Jose, Maxime Riché, David Demitri Africa, Mia Taylor
-
Agentic Misalignment: How LLMs Could Be Insider Threats
Aengus Lynch, Benjamin Wright, Caleb Larson, Stuart J. Ritchie, Soren Mindermann, Evan Hubinger, Ethan Perez, Kevin Troy
-
Shahriar Kabir Nahin, Hadi Askari, Muhao Chen, Anshuman Chhabra
-
Detecting Malicious Pilot Contamination in Multiuser Massive MIMO Using Decision Trees
Pedro Ivo da Cruz, Dimitri Silva, Tito Spadini, Ricardo Suyama, Murilo Bellezoni Loiola
-
Uncertainty Quantification In Surface Landmines and UXO Classification Using MC Dropout
Sagar Lekhak, Emmett J. Ientilucci, Dimah Dera, Susmita Ghosh
-
Malice in Agentland: Down the Rabbit Hole of Backdoors in the AI Supply Chain
Léo Boisvert, Abhay Puri, Chandra Kiran Reddy Evuru, Nicolas Chapados, Quentin Cappart, Alexandre Lacoste, Krishnamurthy Dj Dvijotham, Alexandre Drouin
-
Attack via Overfitting: 10-shot Benign Fine-tuning to Jailbreak LLMs
Zhixin Xie, Xurui Song, Jun Luo
-
Machine Unlearning Meets Adversarial Robustness via Constrained Interventions on LLMs
Fatmazohra Rezkellah, Ramzi Dakhmouche
-
Xinzhe Huang, Wenjing Hu, Tianhang Zheng, Kedong Xiu, Xiaojun Jia, Di Wang, Zhan Qin, Kui Ren
-
Kedong Xiu, Churui Zeng, Tianhang Zheng, Xinzhe Huang, Xiaojun Jia, Di Wang, Puning Zhao, Zhan Qin, Kui Ren
-
Vicinity-Guided Discriminative Latent Diffusion for Privacy-Preserving Domain Adaptation
Jing Wang, Wonho Bae, Jiahong Chen, Wenxu Wang, Junhyug Noh
-
Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers
Xin-Qiang Cai, Wei Wang, Feng Liu, Tongliang Liu, Gang Niu, Masashi Sugiyama
-
Eliciting Secret Knowledge from Language Models
Bartosz Cywiński, Emil Ryd, Rowan Wang, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy, Samuel Marks
-
Baseline Systems For The 2025 Low-Resource Audio Codec Challenge
Yusuf Ziya Isik, Rafał Łaganowski
-
A Generalized Information Bottleneck Theory of Deep Learning
Charles Westphal, Stephen Hailes, Mirco Musolesi
-
AdvChain: Adversarial Chain-of-Thought Tuning for Robust Safety Alignment of Large Reasoning Models
Zihao Zhu, Xinyu Wu, Gehan Hu, Siwei Lyu, Ke Xu, Baoyuan Wu
-
Towards Safe Reasoning in Large Reasoning Models via Corrective Intervention
Yichi Zhang, Yue Ding, Jingwen Yang, Tianwei Luo, Dongbai Li, Ranjie Duan, Qiang Liu, Hang Su, Yinpeng Dong, Jun Zhu
-
UniAPL: A Unified Adversarial Preference Learning Framework for Instruct-Following
FaQiang Qian, WeiKun Zhang, Ziliang Wang, Kang An, Xuhui Zheng, Liangjian Wen, Mengya Gao, Yong Dai, Yichao Wu
-
Stable Forgetting: Bounded Parameter-Efficient Unlearning in LLMs
Arpit Garg, Hemanth Saratchandran, Ravi Garg, Simon Lucey
-
Metamorphic Testing for Audio Content Moderation Software
Wenxuan Wang, Yongjiang Wu, Junyuan Zhang, Shuqing Li, Yun Peng, Wenting Chen, Shuai Wang, Michael R. Lyu
-
Adversarial Reinforcement Learning Framework for ESP Cheater Simulation
Inkyu Park, Jeong-Gwan Lee, Taehwan Kwon, Juheon Choi, Seungku Kim, Junsu Kim, Kimin Lee
-
DiffuGuard: How Intrinsic Safety is Lost and Found in Diffusion Large Language Models
Zherui Li, Zheng Nie, Zhenhong Zhou, Yufei Guo, Yue Liu, Yitong Zhang, Yu Cheng, Qingsong Wen, Kun Wang, Jiaheng Zhang
-
HarmMetric Eval: Benchmarking Metrics and Judges for LLM Harmfulness Assessment
Langqi Yang, Tianhang Zheng, Kedong Xiu, Yixuan Chen, Di Wang, Puning Zhao, Zhan Qin, Kui Ren
-
Community detection robustness of graph neural networks
Jaidev Goel, Pablo Moriano, Ramakrishnan Kannan, Yulia R. Gel
-
Robust Policy Expansion for Offline-to-Online RL under Diverse Data Corruption
Longxiang He, Deheng Ye, Junbo Tan, Xueqian Wang, Li Shen
-
Scalable GANs with Transformers
Sangeek Hyun, MinKyu Lee, Jae-Pil Heo
-
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Yupei Liu, Yanting Wang, Yuqi Jia, Jinyuan Jia, Neil Zhenqiang Gong
-
GHOST: Hallucination-Inducing Image Generation for Multimodal LLMs
Aryan Yazdan Parast, Parsa Hosseini, Hesam Asadollahzadeh, Arshia Soltani Moakhar, Basim Azam, Soheil Feizi, Naveed Akhtar
-
Sanitize Your Responses: Mitigating Privacy Leakage in Large Language Models
Wenjie Fu, Huandong Wang, Junyao Gao, Guoan Wan, Tao Jiang
-
SemanticShield: LLM-Powered Audits Expose Shilling Attacks in Recommender Systems
Kaihong Li, Huichi Zhou, Bin Ma, Fangjun Huang
-
DRIFT: Divergent Response in Filtered Transformations for Robust Adversarial Defense
Amira Guesmi, Muhammad Shafique
-
TokenSwap: Backdoor Attack on the Compositional Understanding of Large Vision-Language Models
Zhifang Zhang, Qiqi Tao, Jiaqi Lv, Na Zhao, Lei Feng, Joey Tianyi Zhou
-
VAGUEGAN: Stealthy Poisoning and Backdoor Attacks on Image Generative Pipelines
Mostafa Mohaimen Akand Faisal, Rabeya Amin Jhuma
-
MANI-Pure: Magnitude-Adaptive Noise Injection for Adversarial Purification
Xiaoyi Huang, Junwei Wu, Kejia Zhang, Carl Yang, Zhiming Luo
-
Score-based Membership Inference on Diffusion Models
Mingxing Rao, Bowen Qu, Daniel Moyer
-
H+: An Efficient Similarity-Aware Aggregation for Byzantine Resilient Federated Learning
Shiyuan Zuo, Rongfei Fan, Cheng Zhan, Jie Xu, Puning Zhao, Han Hu
-
Distributionally Robust Federated Learning with Outlier Resilience
Zifan Wang, Xinlei Yi, Xenia Konti, Michael M. Zavlanos, Karl H. Johansson
-
Guided Uncertainty Learning Using a Post-Hoc Evidential Meta-Model
Charmaine Barker, Daniel Bethell, Simos Gerasimou
-
Learning in an Echo Chamber: Online Learning with Replay Adversary
Daniil Dmitriev, Harald Eskelund Franck, Carolin Heinzler, Amartya Sanyal
-
FuncPoison: Poisoning Function Library to Hijack Multi-agent Autonomous Driving Systems
Yuzhen Long, Songze Li
-
Takedown: How It's Done in Modern Coding Agent Exploits
Eunkyu Lee, Donghyeon Kim, Wonyoung Kim, Insu Yun
-
When MCP Servers Attack: Taxonomy, Feasibility, and Mitigation
Weibo Zhao, Jiahao Liu, Bonan Ruan, Shaofei Li, Zhenkai Liang
-
GSPR: Aligning LLM Safeguards as Generalizable Safety Policy Reasoners
Haoran Li, Yulin Chen, Jingru Zeng, Hao Peng, Huihao Jing, Wenbin Hu, Xi Yang, Ziqian Zeng, Sirui Han, Yangqiu Song
-
PRIVMARK: Private Large Language Models Watermarking with MPC
Thomas Fargues, Ye Dong, Tianwei Zhang, Jin-Song Dong
-
Tereza Burianová, Martin Perešíni, Ivan Homoliak
-
Formalization Driven LLM Prompt Jailbreaking via Reinforcement Learning
Zhaoqi Wang, Daqing He, Zijian Zhang, Xin Li, Liehuang Zhu, Meng Li, Jiamou Liu
-
SafeSearch: Automated Red-Teaming for the Safety of LLM-Based Search Agents
Jianshuo Dong, Sheng Guo, Hao Wang, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu
-
Quant Fever, Reasoning Blackholes, Schrodinger's Compliance, and More: Probing GPT-OSS-20B
Shuyi Lin, Tian Lu, Zikai Wang, Bo Wen, Yibo Zhao, Cheng Tan
-
Uncovering Vulnerabilities of LLM-Assisted Cyber Threat Intelligence
Yuqiao Meng, Luoxi Tang, Feiyang Yu, Jinyuan Jia, Guanhua Yan, Ping Yang, Zhaohan Xi
-
BioVessel-Net and RetinaMix: Unsupervised Retinal Vessel Segmentation from OCTA Images
Cheng Huang, Weizheng Xie, Fan Gao, Yutong Liu, Ruoling Wu, Zeyu Han, Jingxi Qiu, Xiangxiang Wang, Zhenglin Yang, Hao Wang, Yongbin Yu
-
Generalizable Speech Deepfake Detection via Information Bottleneck Enhanced Adversarial Alignment
Pu Huang, Shouguang Wang, Siya Yao, Mengchu Zhou
-
Accuracy-Robustness Trade Off via Spiking Neural Network Gradient Sparsity Trail
Nhan T. Luu
-
HFuzzer: Testing Large Language Models for Package Hallucinations via Phrase-based Fuzzing
Yukai Zhao, Menghan Wu, Xing Hu, Xin Xia
-
Adversarial Diffusion for Robust Reinforcement Learning
Daniele Foffano, Alessio Russo, Alexandre Proutiere
-
Taught Well Learned Ill: Towards Distillation-conditional Backdoor Attack
Yukun Chen, Boheng Li, Yu Yuan, Leyi Qi, Yiming Li, Tianwei Zhang, Zhan Qin, Kui Ren
-
Towards Understanding Subliminal Learning: When and How Hidden Biases Transfer
Simon Schrodi, Elias Kempf, Fazl Barez, Thomas Brox
-
Preserving Cross-Modal Stability for Visual Unlearning in Multimodal Scenarios
Jinghan Xu, Yuyang Zhang, Qixuan Cai, Jiancheng Chen, Keqiu Li
-
Beyond Magic Words: Sharpness-Aware Prompt Evolving for Robust Large Language Models with TARE
Guancheng Wan, Lucheng Fu, Haoxin Liu, Yiqiao Jin, Hui Yi Leong, Eric Hanchen Jiang, Hejia Geng, Jinhe Bi, Yunpu Ma, Xiangru Tang, B. Aditya Prakash, Yizhou Sun, Wei Wang
-
Efficient Domain-Adaptive Multi-Task Dense Prediction with Vision Foundation Models
Beomseok Kang, Niluthpol Chowdhury Mithun, Mikhail Sizintsev, Han-Pang Chiu, Supun Samarasekera
-
Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan
-
Djamel Eddine Boukhari
-
You Zhou, Lijiang Chen, Shuchang Lyu, Guangxia Cui, Wenpei Bai, Zheng Zhou, Meng Li, Guangliang Cheng, Huiyu Zhou, Qi Zhao
-
Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives
Kuanrong Liu, Siyuan Liang, Cheng Qian, Ming Zhang, Xiaochun Cao
-
StolenLoRA: Exploring LoRA Extraction Attacks via Synthetic Data
Yixu Wang, Yan Teng, Yingchun Wang, Xingjun Ma
-
FedDAPL: Toward Client-Private Generalization in Federated Learning
Soroosh Safari Loaliyan, Jose-Luis Ambite, Paul M. Thompson, Neda Jahanshad, Greg Ver Steeg
-
Merge Now, Regret Later: The Hidden Cost of Model Merging is Adversarial Transferability
Ankit Gangwal, Aaryan Ajay Sharma
-
Visual CoT Makes VLMs Smarter but More Fragile
Chunxue Xu, Yiwei Wang, Yujun Cai, Bryan Hooi, Songze Li
-
Influence-Guided Concolic Testing of Transformer Robustness
Chih-Duo Hong, Yu Wang, Yao-Chen Chang, Fang Yu
-
Sheikh Md Mushfiqur Rahman, Nasir Eisty
-
AutoML in Cybersecurity: An Empirical Study
Sherif Saad, Kevin Shi, Mohammed Mamun, Hythem Elmiligi
-
A First Look at Privacy Risks of Android Task-executable Voice Assistant Applications
Shidong Pan, Yikai Ge, Xiaoyu Sun
-
GPM: The Gaussian Pancake Mechanism for Planting Undetectable Backdoors in Differential Privacy
Haochen Sun, Xi He
-
Binary Diff Summarization using Large Language Models
Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami
-
Analyzing and Evaluating Unbiased Language Model Watermark
Yihan Wu, Xuehao Cui, Ruibo Chen, Heng Huang
-
Performance of Machine Learning Methods for Gravity Inversion: Successes and Challenges
Vahid Negahdari, Shirin Samadi Bahrami, Seyed Reza Moghadasi, Mohammad Reza Razvan
-
Deceive, Detect, and Disclose: Large Language Models Play Mini-Mafia
Davi Bastos Costa, Renato Vicente
-
LLM Watermark Evasion via Bias Inversion
Jeongyeon Hwang, Sangdon Park, Jungseul Ok
-
DPFNAS: Differential Privacy-Enhanced Federated Neural Architecture Search for 6G Edge Intelligence
Yang Lv, Jin Cao, Ben Niu, Zhe Sun, Fengwei Wang, Fenghua Li, Hui Li
-
Virus Infection Attack on LLMs: Your Poisoning Can Spread "VIA" Synthetic Data
Zi Liang, Qingqing Ye, Xuan Liu, Yanyun Wang, Jianliang Xu, Haibo Hu
-
Patch Rebirth: Toward Fast and Transferable Model Inversion of Vision Transformers
Seongsoo Heo, Dong-Wan Choi
-
Adaptive Token-Weighted Differential Privacy for LLMs: Not All Tokens Require Equal Protection
Manjiang Yu, Priyanka Singh, Xue Li, Yang Cao
-
Vid-Freeze: Protecting Images from Malicious Image-to-Video Generation via Temporal Freezing
Rohit Chowdhury, Aniruddha Bala, Rohan Jaiswal, Siddharth Roheda
-
A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
Wonje Jeung, Sangyeon Yoon, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, Albert No
-
Jonas Ngnawé, Maxime Heuillet, Sabyasachi Sahoo, Yann Pequignot, Ola Ahmad, Audrey Durand, Frédéric Precioso, Christian Gagné
-
Dual-Space Smoothness for Robust and Balanced LLM Unlearning
Han Yan, Zheyuan Liu, Meng Jiang
-
Factor Decorrelation Enhanced Data Removal from Deep Predictive Models
Wenhao Yang, Lin Li, Xiaohui Tao, Kaize Shi
-
ReliabilityRAG: Effective and Provably Robust Defense for RAG-based Web-Search
Zeyu Shen, Basileal Imana, Tong Wu, Chong Xiang, Prateek Mittal, Aleksandra Korolova
-
Wonhyuk Lee, Youngchol Kim, Yunjin Park, Junhyung Moon, Dongyoung Jeong, Wanjin Park
-
MaskSQL: Safeguarding Privacy for LLM-Based Text-to-SQL via Abstraction
Sepideh Abedini, Shubhankar Mohapatra, D. B. Emerson, Masoumeh Shafieinejad, Jesse C. Cresswell, Xi He
-
Zhiqiang Tian, Weigang Li, Chunhua Deng, Junwei Hu, Yongqiang Wang, Wenping Liu
-
Real-World Transferable Adversarial Attack on Face-Recognition Systems
Andrey Kaznacheev, Matvey Mikhalchuk, Andrey Kuznetsov, Aleksandr Petiushko, Anton Razzhigaev
-
Ming-Tsung Hsu, Fang-Yu Hsu, Yi-Ting Lin, Kai-Heng Chien, Jun-Ren Chen, Cheng-Hsiang Su, Yi-Chen Ou, Chiou-Ting Hsu, Pei-Kai Huang
-
Nikolas McNeal, N. Apurva Ratan Murty
-
GuardNet: Graph-Attention Filtering for Jailbreak Defense in Large Language Models
Javad Forough, Mohammad Maheri, Hamed Haddadi
-
CoSIFL: Collaborative Secure and Incentivized Federated Learning with Differential Privacy
Zhanhong Xie, Meifan Zhang, Lihua Yin
-
NanoFlux: Adversarial Dual-LLM Evaluation and Distillation For Multi-Domain Reasoning
Raviteja Anantha, Soheil Hor, Teodor Nicola Antoniu, Layne C. Price
-
Xiangchen Meng, Yangdi Lyu
-
Bartosz Burgiel
-
Benchmarking MLLM-based Web Understanding: Reasoning, Robustness and Safety
Junliang Liu, Jingyu Xiao, Wenxin Tang, Wenxuan Wang, Zhixian Wang, Minrui Zhang, Shuanghe Yu
-
Backdoor Attribution: Elucidating and Controlling Backdoor in Language Models
Miao Yu, Zhenhong Zhou, Moayad Aloqaily, Kun Wang, Biwei Huang, Stephen Wang, Yueming Jin, Qingsong Wen
-
You Can't Steal Nothing: Mitigating Prompt Leakages in LLMs via System Vectors
Bochuan Cao, Changjiang Li, Yuanpu Cao, Yameng Ge, Ting Wang, Jinghui Chen
-
Active Attacks: Red-teaming LLMs via Adaptive Environments
Taeyoung Yun, Pierre-Luc St-Charles, Jinkyoo Park, Yoshua Bengio, Minsu Kim
-
Benchmarking and Mitigating Psychological Sycophancy in Medical Vision-Language Models
Zikun Guo, Xinyue Xu, Pei Xiang, Shu Yang, Xin Han, Di Wang, Lijie Hu
-
Aravindhan G, Yuvaraj Govindarajulu, Parin Shah
-
The Rogue Scalpel: Activation Steering Compromises LLM Safety
Anton Korznikov, Andrey Galichin, Alexey Dontsov, Oleg Y. Rogov, Ivan Oseledets, Elena Tutubalina
-
Jailbreaking on Text-to-Video Models via Scene Splitting Strategy
Wonjun Lee, Haon Park, Doehyeon Lee, Bumsub Ham, Suhyun Kim
-
Evaluating the Limits of Large Language Models in Multilingual Legal Reasoning
Antreas Ioannou, Andreas Shiamishis, Nora Hollenstein, Nezihe Merve Gürel
-
Mixture of Detectors: A Compact View of Machine-Generated Text Detection
Sai Teja Lekkala, Yadagiri Annepaka, Arun Kumar Challa, Samatha Reddy Machireddy, Partha Pakray, Chukhu Chunka
-
Context Parametrization with Compositional Adapters
Josip Jukić, Martin Tutek, Jan Šnajder
-
SBFA: Single Sneaky Bit Flip Attack to Break Large Language Models
Jingkai Guo, Chaitali Chakrabarti, Deliang Fan
-
Deepfakes: we need to re-think the concept of "real" images
Janis Keuper, Margret Keuper
-
FailureAtlas: Mapping the Failure Landscape of T2I Models via Active Exploration
Muxi Chen, Zhaohua Zhang, Chenchen Zhao, Mingyang Chen, Wenyu Jiang, Tianwen Jiang, Jianhuan Zhuo, Yu Tang, Qiuyong Xiao, Jihong Zhang, Qiang Xu
-
RAPID^3: Tri-Level Reinforced Acceleration Policies for Diffusion Transformer
Wangbo Zhao, Yizeng Han, Zhiwei Tang, Jiasheng Tang, Pengfei Zhou, Kai Wang, Bohan Zhuang, Zhangyang Wang, Fan Wang, Yang You
-
Text Adversarial Attacks with Dynamic Outputs
Wenqiang Wang, Siyuan Liang, Xiao Yan, Xiaochun Cao
-
Closing the Safety Gap: Surgical Concept Erasure in Visual Autoregressive Models
Xinhao Zhong, Yimin Zhou, Zhiqi Zhang, Junhao Li, Yi Sun, Bin Chen, Shu-Tao Xia, Ke Xu
-
Zubov-Net: Adaptive Stability for Neural ODEs Reconciling Accuracy with Robustness
Chaoyang Luo, Yan Zou, Nanjing Huang
-
Concept-SAE: Active Causal Probing of Visual Model Behavior
Jianrong Ding, Muxi Chen, Chenchen Zhao, Qiang Xu
-
Non-Linear Trajectory Modeling for Multi-Step Gradient Inversion Attacks in Federated Learning
Li Xia, Zheng Liu, Sili Huang, Wei Tang, Xuan Liu
-
Countering adversarial evasion in regression analysis
David Benfield, Phan Tu Vuong, Alain Zemkoho
-
A Law of Data Reconstruction for Random Features (and Beyond)
Leonardo Iurada, Simone Bombari, Tatiana Tommasi, Marco Mondelli
-
Erase or Hide? Suppressing Spurious Unlearning Neurons for Robust Unlearning
Nakyeong Yang, Dong-Kyum Kim, Jea Kwon, Minsung Kim, Kyomin Jung, Meeyoung Cha
-
Nonlinear Optimization with GPU-Accelerated Neural Network Constraints
Robert Parker, Oscar Dowson, Nicole LoGiudice, Manuel Garcia, Russell Bent
-
"Your AI, My Shell": Demystifying Prompt Injection Attacks on Agentic AI Coding Editors
Yue Liu, Yanjie Zhao, Yunbo Lyu, Ting Zhang, Haoyu Wang, David Lo
-
Collusion-Driven Impersonation Attack on Channel-Resistant RF Fingerprinting
Zhou Xu, Guyue Li, Zhe Peng, Aiqun Hu
-
Privacy Mechanism Design based on Empirical Distributions
Leonhard Grosse, Sara Saeidian, Mikael Skoglund, Tobias J. Oechtering
-
Gaurav Bagwe, Saket S. Chaturvedi, Xiaolong Ma, Xiaoyong Yuan, Kuang-Ching Wang, Lan Zhang
-
Hassen Dhrif
-
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
Jaehan Kim, Minkyoo Song, Seungwon Shin, Sooel Son
-
Seeing Isn't Believing: Context-Aware Adversarial Patch Synthesis via Conditional GAN
Roie Kazoom, Alon Goldberg, Hodaya Cohen, Ofer Hadar
-
Boundary on the Table: Efficient Black-Box Decision-Based Attacks for Structured Data
Roie Kazoom, Yuval Ratzabi, Etamar Rothstein, Ofer Hadar
-
Observation-Free Attacks on Online Learning to Rank
Sameep Chattopadhyay, Nikhil Karamchandani, Sharayu Mohair
-
Soft-Di[M]O: Improving One-Step Discrete Image Generation with Soft Embeddings
Yuanzhi Zhu, Xi Wang, Stéphane Lathuilière, Vicky Kalogeiton
-
Unsupervised Speech Enhancement using Data-defined Priors
Dominik Klement, Matthew Maciejewski, Sanjeev Khudanpur, Jan Černocký, Lukáš Burget
-
ChatInject: Abusing Chat Templates for Prompt Injection in LLM Agents
Hwan Chang, Yonghyun Jun, Hwanhee Lee
-
Concept activation vectors: a unifying view and adversarial attacks
Ekkehard Schnoor, Malik Tiomoko, Jawher Said, Alex Jung, Wojciech Samek
-
Model Context Protocol for Vision Systems: Audit, Security, and Protocol Extensions
Aditi Tiwari, Akshit Bhalla, Darshan Prasad
-
Eduardo Chielle, Manaar Alam, Jinting Liu, Jovan Kascelan, Michail Maniatakos
-
AntiFLipper: A Secure and Efficient Defense Against Label-Flipping Attacks in Federated Learning
Aashnan Rahman, Abid Hasan, Sherajul Arifin, Faisal Haque Bappy, Tahrim Hossain, Tariqul Islam, Abu Raihan Mostofa Kamal, Md. Azam Hossain
-
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
Jianing Guo, Zhenhong Wu, Chang Tu, Yiyao Ma, Xiangqi Kong, Zhiqian Liu, Jiaming Ji, Shuning Zhang, Yuanpei Chen, Kai Chen, Xianglong Liu, Qi Dou, Yaodong Yang, Huijie Zhao, Weifeng Lv, Simin Li
-
SAGE: A Realistic Benchmark for Semantic Understanding
Samarth Goel, Reagan J. Lee, Kannan Ramchandran
-
A Framework for Rapidly Developing and Deploying Protection Against Large Language Model Attacks
Adam Swanda, Amy Chang, Alexander Chen, Fraser Burch, Paul Kassianik, Konstantin Berlin
-
Addressing Gradient Misalignment in Data-Augmented Training for Robust Speech Deepfake Detection
Duc-Tuan Truong, Tianchi Liu, Junjie Li, Ruijie Tao, Kong Aik Lee, Eng Siong Chng
-
DAC-LoRA: Dynamic Adversarial Curriculum for Efficient and Robust Few-Shot Adaptation
Ved Umrajkar
-
Trustworthy Semantic Communication for Vehicular Networks: Challenges and Solutions
Yanghe Pan, Yuntao Wang, Shaolong Guo, Chengyu Yin, Ruidong Li, Zhou Su, Yuan Wu
-
Security-aware Semantic-driven ISAC via Paired Adversarial Residual Networks
Yu Liu, Boxiang He, Fanggang Wang
-
Automatic Red Teaming LLM-based Agents with Model Context Protocol Tools
Ping He, Changjiang Li, Binbin Zhao, Tianyu Du, Shouling Ji
-
The Use of the Simplex Architecture to Enhance Safety in Deep-Learning-Powered Autonomous Systems
Federico Nesti, Niko Salamini, Mauro Marinoni, Giorgio Maria Cicero, Gabriele Serra, Alessandro Biondi, Giorgio Buttazzo
-
Vision Transformers: the threat of realistic adversarial patches
Kasper Cools, Clara Maathuis, Alexander M. van Oers, Claudia S. Hübner, Nikos Deligiannis, Marijke Vandewal, Geert De Cubber
-
Evading Overlapping Community Detection via Proxy Node Injection
Dario Loi, Matteo Silvestri, Fabrizio Silvestri, Gabriele Tolomei
-
No Prior, No Leakage: Revisiting Reconstruction Attacks in Trained Neural Networks
Yehonatan Refael, Guy Smorodinsky, Ofir Lindenbaum, Itay Safran
-
RedHerring Attack: Testing the Reliability of Attack Detection
Jonathan Rusert
-
Overcoming Black-box Attack Inefficiency with Hybrid and Dynamic Select Algorithms
Abhinay Shankar Belde, Rohit Ramkumar, Jonathan Rusert
-
Zero-Shot Privacy-Aware Text Rewriting via Iterative Tree Search
Shuo Huang, Xingliang Yuan, Gholamreza Haffari, Lizhen Qu
-
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
-
Learning the Wrong Lessons: Syntactic-Domain Spurious Correlations in Language Models
Chantal Shaib, Vinith M. Suriyakumar, Levent Sagun, Byron C. Wallace, Marzyeh Ghassemi
-
Jieli Zhu, Vi Ngoc-Nha Tran
-
Bounds of Chain-of-Thought Robustness: Reasoning Steps, Embed Norms, and Beyond
Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng
-
Wenkai Guo, Xuefeng Liu, Haolin Wang, Jianwei Niu, Shaojie Tang, Jing Yuan
-
CLUE: Conflict-guided Localization for LLM Unlearning Framework
Hang Chen, Jiaying Zhu, Xinyu Yang, Wenya Wang
-
Poisoning Prompt-Guided Sampling in Video Large Language Models
Yuxin Cao, Wei Song, Jingling Xue, Jin Song Dong
-
The Unanticipated Asymmetry Between Perceptual Optimization and Assessment
Jiabei Zhang, Qi Wang, Siyu Wu, Du Chen, Tianhe Wu
-
A Single Neuron Works: Precise Concept Erasure in Text-to-Image Diffusion Models
Qinqin He, Jiaqi Weng, Jialing Tao, Hui Xue
-
The Unwinnable Arms Race of AI Image Detection
Till Aczel, Lorenzo Vettor, Andreas Plesner, Roger Wattenhofer
-
FERD: Fairness-Enhanced Data-Free Robustness Distillation
Zhengxiao Li, Liming Lu, Xu Zheng, Siyuan Liang, Zhenghan Chen, Yongbin Zhou, Shuchao Pang
-
Sparse Representations Improve Adversarial Robustness of Neural Network Classifiers
Killian Steunou, Sigurd Saue, Théo Druilhe
-
The Impact of Audio Watermarking on Audio Anti-Spoofing Countermeasures
Zhenshan Zhang, Xueping Zhang, Yechen Wang, Liwei Jin, Ming Li
-
FORCE: Transferable Visual Jailbreaking Attacks via Feature Over-Reliance CorrEction
Runqi Lin, Alasdair Paren, Suqin Yuan, Muyang Li, Philip Torr, Adel Bibi, Tongliang Liu
-
EvoMail: Self-Evolving Cognitive Agents for Adaptive Spam and Phishing Email Defense
Wei Huang, De-Tian Chu, Lin-Yuan Bai, Wei Kang, Hai-Tao Zhang, Bo Li, Zhi-Mo Han, Jing Ge, Hai-Feng Lin
-
Optimal Robust Recourse with $L^p$-Bounded Model Change
Phone Kyaw, Kshitij Kayastha, Shahin Jabbari
-
Cryptographic Backdoor for Neural Networks: Boon and Bane
Anh Tu Ngo, Anupam Chattopadhyay, Subhamoy Maitra
-
Are Modern Speech Enhancement Systems Vulnerable to Adversarial Attacks?
Rostislav Makarov, Lea Schönherr, Timo Gerkmann
-
RLCracker: Exposing the Vulnerability of LLM Watermarks with Adaptive RL Attacks
Hanbo Huang, Yiran Zhang, Hao Zheng, Xuan Gong, Yihan Li, Lin Liu, Shiyu Liang
-
TRiCo: Triadic Game-Theoretic Co-Training for Robust Semi-Supervised Learning
Hongyang He, Xinyuan Song, Yangfan He, Zeyu Zhang, Yanshu Li, Haochen You, Lifan Sun, Wenqiao Zhang
-
Saurabh Kataria, Davood Fattahi, Minxiao Wang, Ran Xiao, Matthew Clark, Timothy Ruchti, Mark Mai, Xiao Hu
-
Functional Encryption in Secure Neural Network Training: Data Leakage and Practical Mitigations
Alexandru Ioniţă, Andreea Ioniţă
-
Aurosweta Mahapatra, Ismail Rasim Ulgen, Berrak Sisman
-
Bidirectional Intention Inference Enhances LLMs' Defense Against Multi-Turn Jailbreak Attacks
Haibo Tong, Dongcheng Zhao, Guobin Shen, Xiang He, Dachuan Lin, Feifei Zhao, Yi Zeng
-
Responsible Diffusion: A Comprehensive Survey on Safety, Ethics, and Trust in Diffusion Models
Kang Wei, Xin Yuan, Fushuo Huo, Chuan Ma, Long Yuan, Songze Li, Ming Ding, Dacheng Tao
-
Huizhen Shu, Xuying Li, Zhuo Li
-
CON-QA: Privacy-Preserving QA using cloud LLMs in Contract Domain
Ajeet Kumar Singh, Rajsabi Surya, Anurag Tripathi, Santanu Choudhury, Sudhir Bisane
-
Steerable Adversarial Scenario Generation through Test-Time Preference Alignment
Tong Nie, Yuewen Mei, Yihong Tang, Junlin He, Jie Sun, Haotian Shi, Wei Ma, Jian Sun
-
bi-GRPO: Bidirectional Optimization for Jailbreak Backdoor Injection on LLMs
Wence Ji, Jiancan Wu, Aiying Li, Shuyi Zhang, Junkang Wu, An Zhang, Xiang Wang, Xiangnan He
-
Zhixiao Wu, Yao Lu, Jie Wen, Hao Sun, Qi Zhou, Guangming Lu
-
Lubos Mjachky, Ivan Homoliak
-
Beyond Sharp Minima: Robust LLM Unlearning via Feedback-Guided Multi-Point Optimization
Wenhan Wu, Zheyuan Liu, Chongyang Gao, Ren Wang, Kaize Ding
-
RAG Security and Privacy: Formalizing the Threat Model and Attack Surface
Atousa Arzanipour, Rouzbeh Behnia, Reza Ebrahimi, Kaushik Dutta
-
Benchmarking Gaslighting Attacks Against Speech Large Language Models
Jinyang Wu, Bin Zhu, Xiandong Zou, Qiquan Zhang, Xu Fang, Pan Zhou
-
Yixun Zhang, Feng Zhou, Jianqin Yin
-
FreezeVLA: Action-Freezing Attacks against Vision-Language-Action Models
Xin Wang, Jie Li, Zejia Weng, Yixu Wang, Yifeng Gao, Tianyu Pang, Chao Du, Yan Teng, Yingchun Wang, Zuxuan Wu, Xingjun Ma, Yu-Gang Jiang
-
Zhifang Zhang, Jiahan Zhang, Shengjie Zhou, Qi Wei, Shuo He, Feng Liu, Lei Feng
-
Xuekang Zhu, Ji-Zhe Zhou, Kaiwen Feng, Chenfan Qu, Yunfei Wang, Liting Zhou, Jian Liu
-
Smaller is Better: Enhancing Transparency in Vehicle AI Systems via Pruning
Sanish Suwal, Shaurya Garg, Dipkamal Bhusal, Michael Clifford, Nidhi Rastogi
-
Universal Camouflage Attack on Vision-Language Models for Autonomous Driving
Dehong Kong, Sifan Yu, Siyuan Liang, Jiawei Liang, Jianhou Gan, Aishan Liu, Wenqi Ren
-
Puning Zhao, Zhikun Zhang, Bo Sun, Li Shen, Liang Zhang, Shaowei Wang, Zhe Liu
-
On the Fragility of Contribution Score Computation in Federated Learning
Balazs Pejo, Marcell Frank, Krisztian Varga, Peter Veliczky
-
Generative Model Inversion Through the Lens of the Manifold Hypothesis
Xiong Peng, Bo Han, Fengfei Yu, Tongliang Liu, Feng Liu, Mingyuan Zhou
-
Staying on the Manifold: Geometry-Aware Noise Injection
Albert Kjøller Jacobsen, Johanna Marie Gegenfurtner, Georgios Arvanitidis
-
Monitoring Violations of Differential Privacy over Time
Önder Askin, Tim Kutta, Holger Dette
-
FlyTrap: Physical Distance-Pulling Attack Towards Camera-based Autonomous Target Tracking Systems
Shaoyuan Xie, Mohamad Habib Fakih, Junchi Lu, Fayzah Alshammari, Ningfei Wang, Takami Sato, Halima Bouzidi, Mohammad Abdullah Al Faruque, Qi Alfred Chen
-
Are Neural Networks Collision Resistant?
Marco Benedetti, Andrej Bogdanov, Enrico M. Malatesta, Marc Mézard, Gianmarco Perrupato, Alon Rosen, Nikolaj I. Schwartzbach, Riccardo Zecchina
-
Tharcisse Ndayipfukamiye, Jianguo Ding, Doreen Sebastian Sarwatt, Adamu Gaston Philipo, Huansheng Ning
-
Understanding and Improving Adversarial Robustness of Neural Probabilistic Circuits
Weixin Chen, Han Zhao
-
Perspectra: Choosing Your Experts Enhances Critical Thinking in Multi-Agent Research Ideation
Yiren Liu, Viraj Shah, Sangho Suh, Pao Siangliulue, Tal August, Yun Huang
-
Every Character Counts: From Vulnerability to Defense in Phishing Detection
Maria Chiper, Radu Tudor Ionescu
-
Bridging Privacy and Utility: Synthesizing anonymized EEG with constraining utility functions
Kay Fuhrmeister, Arne Pelzer, Fabian Radke, Julia Lechinger, Mahzad Gharleghi, Thomas Köllmer, Insa Wolf
-
Efficiently Attacking Memorization Scores
Tue Do, Varun Chandrasekaran, Daniel Alabi
-
Differential Privacy of Network Parameters from a System Identification Perspective
Andrew Campbell, Anna Scaglione, Hang Liu, Victor Elvira, Sean Peisert, Daniel Arnold
-
Ren-Yi Huang, Dumindu Samaraweera, Prashant Shekhar, J. Morris Chang
-
JaiLIP: Jailbreaking Vision-Language Models via Loss Guided Image Perturbation
Md Jueal Mia, M. Hadi Amini
-
Dynamic Dual-level Defense Routing for Continual Adversarial Training
Wenxuan Wang, Chenglei Wang, Xuelin Qian
-
SafeSteer: Adaptive Subspace Steering for Efficient Jailbreak Defense in Vision-Language Models
Xiyu Zeng, Siyuan Liang, Liming Lu, Haotian Zhu, Enguang Liu, Jisheng Dang, Yongbin Zhou, Shuchao Pang
-
Large Language Models for Real-World IoT Device Identification
Rameen Mahmood, Tousif Ahmed, Sai Teja Peddinti, Danny Yuxing Huang
-
TIMED: Adversarial and Autoregressive Refinement of Diffusion-Based Time Series Generation
MohammadReza EskandariNasab, Shah Muhammad Hamdi, Soukaina Filali Boubrahimi
-
The Pareto Frontier of Resilient Jet Tagging
Rikab Gambhir, Matt LeBlanc, Yuanchen Zhou
-
Stochastic Path Planning in Correlated Obstacle Fields
Li Zhou, Elvan Ceyhan
-
Improving Credit Card Fraud Detection through Transformer-Enhanced GAN Oversampling
Kashaf Ul Emaan
-
The Secret Agenda: LLMs Strategically Lie and Our Current Safety Tools Are Blind
Caleb DeLeeuw, Gaurav Chawla, Aniket Sharma, Vanessa Dietze
-
Defending against Stegomalware in Deep Neural Networks with Permutation Symmetry
Birk Torpmann-Hagen, Michael A. Riegler, Pål Halvorsen, Dag Johansen
-
Why Speech Deepfake Detectors Won't Generalize: The Limits of Detection in an Open World
Visar Berisha, Prad Kadambi, Isabella Lenz
-
SAEmnesia: Erasing Concepts in Diffusion Models with Sparse Autoencoders
Enrico Cassano, Riccardo Renzulli, Marco Nurisso, Mirko Zaffaroni, Alan Perotti, Marco Grangetto
-
Localizing Adversarial Attacks To Produce More Imperceptible Noise
Pavan Reddy, Aditya Sanjay Gujral
-
Diversity Boosts AI-Generated Text Detection
Advik Raj Basani, Pin-Yu Chen
-
Uncovering Privacy Vulnerabilities through Analytical Gradient Inversion Attacks
Tamer Ahmed Eltaras, Qutaibah Malluhi, Alessandro Savino, Stefano Di Carlo, Adnan Qayyum
-
Rule Encoding and Compliance in Large Language Models: An Information-Theoretic Analysis
Joachim Diederich
-
DevFD: Developmental Face Forgery Detection by Learning Shared and Orthogonal LoRA Subspaces
Tianshuo Zhang, Li Gao, Siran Peng, Xiangyu Zhu, Zhen Lei
-
Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem
Neslihan Kose, Anthony Rhodes, Umur Aybars Ciftci, Ilke Demir
-
Distributionally Robust Safety Verification of Neural Networks via Worst-Case CVaR
Masako Kishida
-
Privacy in Action: Towards Realistic Privacy Mitigation and Evaluation for LLM-Powered Agents
Shouju Wang, Fenglin Yu, Xirui Liu, Xiaoting Qin, Jue Zhang, Qingwei Lin, Dongmei Zhang, Saravan Rajmohan
-
Saeid Sheikhi, Panos Kostakos, Lauri Loven
-
Strategic Dishonesty Can Undermine AI Safety Evaluations of Frontier LLM
Alexander Panfilov, Evgenii Kortukov, Kristina Nikolić, Matthias Bethge, Sebastian Lapuschkin, Wojciech Samek, Ameya Prabhu, Maksym Andriushchenko, Jonas Geiping
-
D-REX: A Benchmark for Detecting Deceptive Reasoning in Large Language Models
Satyapriya Krishna, Andy Zou, Rahul Gupta, Eliot Krzysztof Jones, Nick Winter, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson, Spyros Matsoukas
-
An Unlearning Framework for Continual Learning
Sayanta Adhikari, Vishnuprasadh Kumaravelu, P. K. Srijith
-
Budgeted Adversarial Attack against Graph-Based Anomaly Detection in Sensor Networks
Sanju Xaviar, Omid Ardakanian
-
SilentStriker: Toward Stealthy Bit-Flip Attacks on Large Language Models
Haotian Xu, Qingsong Peng, Jie Shi, Huadi Zheng, Yu Li, Cheng Zhuo
-
Lipschitz-Based Robustness Certification for Recurrent Neural Networks via Convex Relaxation
Paul Hamelbeck, Johannes Schiffer
-
Shilling Recommender Systems by Generating Side-feature-aware Fake User Profiles
Yuanrong Wang, Yingpeng Du
-
TextCrafter: Optimization-Calibrated Noise for Defending Against Text Embedding Inversion
Duoxun Tang, Xinhang Jiang, Jiajun Niu
-
B-Privacy: Defining and Enforcing Privacy in Weighted Voting
Samuel Breckenridge, Dani Vilardell, Andrés Fábrega, Amy Zhao, Patrick McCorry, Rafael Solari, Ari Juels
-
Synth-MIA: A Testbed for Auditing Privacy Leakage in Tabular Data Synthesis
Joshua Ward, Xiaofeng Lin, Chi-Hua Wang, Guang Cheng
-
Quickest Change Detection in Continuous-Time in Presence of a Covert Adversary
Amir Reza Ramtin, Philippe Nain, Don Towsley
-
Yu-Kai Shih, You-Kai Kang
-
Ilham Wicaksono, Zekun Wu, Rahul Patel, Theo King, Adriano Koshiyama, Philip Treleaven
-
AdaptiveGuard: Towards Adaptive Runtime Safety for LLM-Powered Software
Rui Yang, Michael Fu, Chakkrit Tantithamthavorn, Chetan Arora, Gunel Gulmammadova, Joey Chua
-
Jiahe Qian, Yaoyu Fang, Ziqiao Weng, Xinkun Wang, Lee A. Cooper, Bo Zhou
-
Localizing Malicious Outputs from CodeLLM
Mayukh Borana, Junyi Liang, Sai Sathiesh Rajan, Sudipta Chattopadhyay
-
SVeritas: Benchmark for Robust Speaker Verification under Diverse Conditions
Massa Baali, Sarthak Bisht, Francisco Teixeira, Kateryna Shapovalenko, Rita Singh, Bhiksha Raj
-
TraceHiding: Scalable Machine Unlearning for Mobility Data
Ali Faraji, Manos Papagelis
-
Xuan Chen, Shiwei Feng, Zikang Xiong, Shengwei An, Yunshu Mao, Lu Yan, Guanhong Tao, Wenbo Guo, Xiangyu Zhang
-
Seeing is Deceiving: Mirror-Based LiDAR Spoofing for Autonomous Vehicle Deception
Selma Yahia, Ildi Alla, Girija Bangalore Mohan, Daniel Rau, Mridula Singh, Valeria Loscri
-
Lightweight MobileNetV1+GRU for ECG Biometric Authentication: Federated and Adversarial Evaluation
Dilli Hang Rai, Sabin Kafley
-
MARS: A Malignity-Aware Backdoor Defense in Federated Learning
Wei Wan, Yuxuan Ning, Zhicong Huang, Cheng Hong, Shengshan Hu, Ziqi Zhou, Yechao Zhang, Tianqing Zhu, Wanlei Zhou, Leo Yu Zhang
-
Multimodal Prompt Decoupling Attack on the Safety Filters in Text-to-Image Models
Xingkai Peng, Jun Jiang, Meng Tong, Shuai Li, Weiming Zhang, Nenghai Yu, Kejiang Chen
-
Can an Individual Manipulate the Collective Decisions of Multi-Agents?
Fengyuan Liu, Rui Zhao, Shuo Chen, Guohao Li, Philip Torr, Lei Han, Jindong Gu
-
Train to Defend: First Defense Against Cryptanalytic Neural Network Parameter Extraction Attacks
Ashley Kurian, Aydin Aysu
-
V-CECE: Visual Counterfactual Explanations via Conceptual Edits
Nikolaos Spanos, Maria Lymperaiou, Giorgos Filandrianos, Konstantinos Thomas, Athanasios Voulodimos, Giorgos Stamou
-
FakeChain: Exposing Shallow Cues in Multi-Step Deepfake Detection
Minji Heo, Simon S. Woo
-
MoRoVoc: A Large Dataset for Geographical Variation Identification of the Spoken Romanian Language
Andrei-Marius Avram, Ema-Ioana Bănescu, Anda-Teodora Robea, Dumitru-Clementin Cercel, Mihaela-Claudia Cercel
-
Hanting Li, Huaao Tang, Jianhong Han, Tianxiong Zhou, Jiulong Cui, Haizhen Xie, Yan Chen, Jie Hu
-
A Novel Metric for Detecting Memorization in Generative Models for Brain MRI Synthesis
Antonio Scardace, Lemuel Puglisi, Francesco Guarnera, Sebastiano Battiato, Daniele Ravì
-
ADVEDM: Fine-grained Adversarial Attack against VLM-based Embodied Agents
Yichen Wang, Hangtao Zhang, Hewen Pan, Ziqi Zhou, Xianlong Wang, Peijin Guo, Lulu Xue, Shengshan Hu, Minghui Li, Leo Yu Zhang
-
SOLAR: Switchable Output Layer for Accuracy and Robustness in Once-for-All Training
Shaharyar Ahmed Khan Tareen, Lei Fan, Xiaojing Yuan, Qin Lin, Bin Hu
-
FairTune: A Bias-Aware Fine-Tuning Framework Towards Fair Heart Rate Prediction from PPG
Lovely Yeswanth Panchumarthi, Saurabh Kataria, Yi Wu, Xiao Hu, Alex Fedorov, Hyunjung Gloria Kwak
-
Delving into Cryptanalytic Extraction of PReLU Neural Networks
Yi Chen, Xiaoyang Dong, Ruijie Ma, Yantian Shen, Anyu Wang, Hongbo Yu, Xiaoyun Wang
-
"Digital Camouflage": The LLVM Challenge in LLM-Based Malware Detection
Ekin Böke, Simon Torka
-
Stress Testing Deliberative Alignment for Anti-Scheming Training
Bronson Schoen, Evgenia Nitishinskaya, Mikita Balesni, Axel Højmark, Felix Hofstätter, Jérémy Scheurer, Alexander Meinke, Jason Wolfe, Teun van der Weij, Alex Lloyd, Nicholas Goldowsky-Dill, Angela Fan, Andrei Matveiakin, Rusheb Shah, Marcus Williams, Amelia Glaese, Boaz Barak, Wojciech Zaremba, Marius Hobbhahn
-
Krati Saxena, Federico Jurado Ruiz, Guido Manzi, Dianbo Liu, Alex Lamb
-
Reward Hacking Mitigation using Verifiable Composite Rewards
Mirza Farhan Bin Tarek, Rahmatollah Beheshti
-
Robust Vision-Language Models via Tensor Decomposition: A Defense Against Adversarial Attacks
Het Patel, Muzammil Allie, Qian Zhang, Jia Chen, Evangelos E. Papalexakis
-
DNA-DetectLLM: Unveiling AI-Generated Text via a DNA-Inspired Mutation-Repair Paradigm
Xiaowei Zhu, Yubing Ren, Fang Fang, Qingfeng Tan, Shi Wang, Yanan Cao
-
Sparse-Autoencoder-Guided Internal Representation Unlearning for Large Language Models
Tomoya Yamashita, Akira Ito, Yuuki Yamanaka, Masanori Yamada, Takayuki Miura, Toshiki Shibahara
-
SABER: Uncovering Vulnerabilities in Safety Alignment via Cross-Layer Residual Connection
Maithili Joshi, Palash Nandi, Tanmoy Chakraborty
-
Backdoor Mitigation via Invertible Pruning Masks
Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
-
PolyJuice Makes It Real: Black-Box, Universal Red Teaming for Synthetic Image Detectors
Sepehr Dehdashtian, Mashrur M. Morshed, Jacob H. Seidman, Gaurav Bharaj, Vishnu Naresh Boddeti
-
Zhangqi Jiang, Tingjin Luo, Xu Yang, Xinyan Liang
-
Randomized Smoothing Meets Vision-Language Models
Emmanouil Seferis, Changshun Wu, Stefanos Kollias, Saddek Bensalem, Chih-Hong Cheng
-
Zhengxing Li, Guangmingmei Yang, Jayaram Raghuram, David J. Miller, George Kesidis
-
Adversarially Robust Assembly Language Model for Packed Executables Detection
Shijia Li, Jiang Ming, Lanqing Liu, Longwei Yang, Ni Zhang, Chunfu Jia
-
Cuckoo Attack: Stealthy and Persistent Attacks Against AI-IDE
Xinpeng Liu, Junming Liu, Peiyu Liu, Han Zheng, Qinying Wang, Mathias Payer, Shouling Ji, Wenhai Wang
-
Inference Attacks on Encrypted Online Voting via Traffic Analysis
Anastasiia Belousova, Francesco Marchiori, Mauro Conti
-
Dongyang Zhan, Kai Tan, Lin Ye, Xiangzhan Yu, Hongli Zhang, Zheng He
-
Secure Confidential Business Information When Sharing Machine Learning Models
Yunfan Yang, Jiarong Xu, Hongzhe Zhang, Xiao Fang
-
Evaluating CxG Generalisation in LLMs via Construction-Based NLI Fine Tuning
Tom Mackintosh, Harish Tayyar Madabushi, Claire Bonial
-
Overfitting in Adaptive Robust Optimization
Karl Zhu, Dimitris Bertsimas
-
Davide Ettori, Nastaran Darabi, Sina Tayebati, Ranganath Krishnan, Mahesh Subedar, Omesh Tickoo, Amit Ranjan Trivedi
-
SynBench: A Benchmark for Differentially Private Text Generation
Yidan Sun, Viktor Schlegel, Srinivasan Nandakumar, Iqra Zahid, Yuping Wu, Yulong Wu, Hao Li, Jie Zhang, Warren Del-Pinto, Goran Nenadic, Siew Kei Lam, Anil Anthony Bharath
-
Enhancing Retrieval Augmentation via Adversarial Collaboration
Letian Zhang, Guanghao Meng, Xudong Ren, Yiming Wang, Shu-Tao Xia
-
Sentinel Agents for Secure and Trustworthy Agentic AI in Multi-Agent Systems
Diego Gosmar, Deborah A. Dahl
-
LLM Jailbreak Detection for (Almost) Free!
Guorui Chen, Yifan Xia, Xiaojun Jia, Zhijiang Li, Philip Torr, Jindong Gu
-
Enterprise AI Must Enforce Participant-Aware Access Control
Shashank Shreedhar Bhatt, Tanmay Rajore, Khushboo Aggarwal, Ganesh Ananthanarayanan, Ranveer Chandra, Nishanth Chandran, Suyash Choudhury, Divya Gupta, Emre Kiciman, Sumit Kumar Pandey, Srinath Setty, Rahul Sharma, Teijia Zhao
-
Adversarial Distilled Retrieval-Augmented Guarding Model for Online Malicious Intent Detection
Yihao Guo, Haocheng Bian, Liutong Zhou, Ze Wang, Zhaoyi Zhang, Francois Kawala, Milan Dean, Ian Fischer, Yuantao Peng, Noyan Tokgozoglu, Ivan Barrientos, Riyaaz Shaik, Rachel Li, Chandru Venkataraman, Reza Shifteh Far, Moses Pawar, Venkat Sundaranatha, Michael Xu, Frank Chu
-
Reveal and Release: Iterative LLM Unlearning with Self-generated Data
Linxi Xie, Xin Teng, Shichang Ke, Hongyi Wen, Shengjie Wang
-
Siyu Yan, Long Zeng, Xuecheng Wu, Chengcheng Han, Kongcheng Zhang, Chong Peng, Xuezhi Cao, Xunliang Cai, Chenjuan Guo
-
[Re] Improving Interpretation Faithfulness for Vision Transformers
Izabela Kurek, Wojciech Trejter, Stipe Frkovic, Andro Erdelez
-
Discrete optimal transport is a strong audio adversarial attack
Anton Selitskiy, Akib Shahriyar, Jishnuraj Prakasan
-
Vulnerable Agent Identification in Large-Scale Multi-Agent Reinforcement Learning
Simin Li, Zheng Yuwei, Zihao Mao, Linhao Wang, Ruixiao Xu, Chengdong Ma, Xin Yu, Yuqing Ma, Qi Dou, Xin Wang, Jie Luo, Bo An, Yaodong Yang, Weifeng Lv, Xianglong Liu
-
Watermarking and Anomaly Detection in Machine Learning Models for LORA RF Fingerprinting
Aarushi Mahajan, Wayne Burleson
-
Ruijie Hou, Yueyang Jiao, Hanxu Hu, Yingming Li, Wai Lam, Huajian Zhang, Hongyuan Lu
-
AIP: Subverting Retrieval-Augmented Generation via Adversarial Instructional Prompt
Saket S. Chaturvedi, Gaurav Bagwe, Lan Zhang, Xiaoyong Yuan
-
Edge-Aware Normalized Attention for Efficient and Detail-Preserving Single Image Super-Resolution
Penghao Rao, Tieyong Zeng
-
Geometric Image Synchronization with Deep Watermarking
Pierre Fernandez, Tomáš Souček, Nikola Jovanović, Hady Elsahar, Sylvestre-Alvise Rebuffi, Valeriu Lacatusu, Tuan Tran, Alexandre Mourachko
-
Xingchen Wang, Feijie Wu, Chenglin Miao, Tianchun Li, Haoyu Hu, Qiming Cao, Jing Gao, Lu Su
-
CUFG: Curriculum Unlearning Guided by the Forgetting Gradient
Jiaxing Miao, Liang Hu, Qi Zhang, Lai Zhong Yuan, Usman Naseem
-
STEP: Structured Training and Evaluation Platform for benchmarking trajectory prediction models
Julian F. Schumann, Anna Mészáros, Jens Kober, Arkady Zgonnikov
-
Yigit E. Yildirim, Samet Demir, Zafer Dogan
-
Yuanbo Xie, Yingjie Zhang, Tianyun Liu, Duohe Ma, Tingwen Liu
-
Evil Vizier: Vulnerabilities of LLM-Integrated XR Systems
Yicheng Zhang, Zijian Huang, Sophie Chen, Erfan Shayegani, Jiasi Chen, Nael Abu-Ghazaleh
-
Acoustic Simulation Framework for Multi-channel Replay Speech Detection
Michael Neri, Tuomas Virtanen
-
ORCA: Agentic Reasoning For Hallucination and Adversarial Robustness in Vision-Language Models
Chung-En Johnny Yu, Hsuan-Chih (Neil) Chen, Brian Jalaian, Nathaniel D. Bastian
-
Impact of Phonetics on Speaker Identity in Adversarial Voice Attack
Daniyal Kabir Dar, Qiben Yan, Li Xiao, Arun Ross
-
Toxicity Red-Teaming: Benchmarking LLM Safety in Singapore's Low-Resource Languages
Yujia Hu, Ming Shan Hee, Preslav Nakov, Roy Ka-Wei Lee
-
Red Teaming Multimodal Language Models: Evaluating Harm Across Prompt Modalities and Models
Madison Van Doren, Casey Ford, Emily Dix
-
Stochastic Sample Approximations of (Local) Moduli of Continuity
Rodion Nazarov, Allen Gehret, Robert Shorten, Jakub Marecek
-
Adversarial generalization of unfolding (model-based) networks
Vicky Kouni
-
Assessing metadata privacy in neuroimaging
Emilie Kibsgaard, Anita Sue Jwa, Christopher J Markiewicz, David Rodriguez Gonzalez, Judith Sainz Pardo, Russell A. Poldrack, Cyril R. Pernet
-
Benchmarking and Improving LLM Robustness for Personalized Generation
Chimaobi Okite, Naihao Deng, Kiran Bodipati, Huaidian Hou, Joyce Chai, Rada Mihalcea
-
Semantic Representation Attack against Aligned Large Language Models
Jiawei Lian, Jianhong Pan, Lefan Wang, Yi Wang, Shaohui Mei, Lap-Pui Chau
-
DSCC-HS: A Dynamic Self-Reinforcing Framework for Hallucination Suppression in Large Language Models
Xiao Zheng
-
Scrub It Out! Erasing Sensitive Memorization in Code Language Models via Machine Unlearning
Zhaoyang Chu, Yao Wan, Zhikun Zhang, Di Wang, Zhou Yang, Hongyu Zhang, Pan Zhou, Xuanhua Shi, Hai Jin, David Lo
-
Differential Privacy in Federated Learning: Mitigating Inference Attacks with Randomized Response
Ozer Ozturk, Busra Buyuktanir, Gozde Karatas Baydogmus, Kazim Yildiz
-
Privacy-Aware In-Context Learning for Large Language Models
Bishnu Bhusal, Manoj Acharya, Ramneet Kaur, Colin Samplawski, Anirban Roy, Adam D. Cobb, Rohit Chadha, Susmit Jha
-
StyleProtect: Safeguarding Artistic Identity in Fine-tuned Diffusion Models
Qiuyu Tang, Joshua Krinsky, Aparna Bharati
-
Wenkui Yang, Jie Cao, Junxian Duan, Ran He
-
Niruthiha Selvanayagam, Ted Kurti
-
Secure UAV-assisted Federated Learning: A Digital Twin-Driven Approach with Zero-Knowledge Proofs
Md Bokhtiar Al Zami, Md Raihan Uddin, Dinh C. Nguyen
-
ParaAegis: Parallel Protection for Flexible Privacy-preserved Federated Learning
Zihou Wu, Yuecheng Li, Tianchi Liao, Jian Lou, Chuan Chen
-
Differentially private federated learning for localized control of infectious disease dynamics
Raouf Kerkouche, Henrik Zunker, Mario Fritz, Martin J. Kühn
-
Defending Diffusion Models Against Membership Inference Attacks via Higher-Order Langevin Dynamics
Benjamin Sterling, Yousef El-Laham, Mónica F. Bugallo
-
Mert Gürbüzbalaban, Yasa Syed, Necdet Serhat Aybat
-
Baolei Zhang, Haoran Xin, Yuxi Chen, Zhuqing Liu, Biao Yi, Tong Li, Lihai Nie, Zheli Liu, Minghong Fang
-
Cybersecurity AI: Humanoid Robots as Attack Vectors
Víctor Mayoral-Vilches
-
VCBench: Benchmarking LLMs in Venture Capital
Rick Chen, Joseph Ternasky, Afriyie Samuel Kwesi, Ben Griffin, Aaron Ontoyin Yin, Zakari Salifu, Kelvin Amoaba, Xianling Mu, Fuat Alican, Yigit Ihlamur
-
A Simple and Efficient Jailbreak Method Exploiting LLMs' Helpfulness
Xuan Luo, Yue Wang, Zefeng He, Geng Tu, Jing Li, Ruifeng Xu
-
RLBind: Adversarial-Invariant Cross-Modal Alignment for Unified Robust Embeddings
Yuhong Lu
-
Bihao Zhan, Jie Zhou, Junsong Li, Yutao Yang, Shilian Chen, Qianjun Pan, Xin Li, Wen Wu, Xingjiao Wu, Qin Chen, Hang Yan, Liang He
-
RepIt: Representing Isolated Targets to Steer Language Models
Vincent Siu, Nathan W. Henry, Nicholas Crispino, Yang Liu, Dawn Song, Chenguang Wang
-
DisorientLiDAR: Physical Attacks on LiDAR-based Localization
Yizhen Lao, Yu Zhang, Ziting Wang, Chengbo Wang, Yifei Xue, Wanpeng Shao
-
CIARD: Cyclic Iterative Adversarial Robustness Distillation
Liming Lu, Shuchao Pang, Xu Zheng, Xiang Gu, Anan Du, Yunhuai Liu, Yongbin Zhou
-
A Systematic Evaluation of Parameter-Efficient Fine-Tuning Methods for the Security of Code LLMs
Kiho Lee, Jungkon Kim, Doowon Kim, Hyoungshick Kim
-
Jinjie Shen, Yaxiong Wang, Lechao Cheng, Nan Pu, Zhun Zhong
-
Defense-to-Attack: Bypassing Weak Defenses Enables Stronger Jailbreaks in Vision-Language Models
Yunhan Zhao, Xiang Zheng, Xingjun Ma
-
Jailbreaking Large Language Models Through Content Concretization
Johan Wahréus, Ahmed Hussain, Panos Papadimitratos
-
Sy-FAR: Symmetry-based Fair Adversarial Robustness
Haneen Najjar, Eyal Ronen, Mahmood Sharif
-
MIA-EPT: Membership Inference Attack via Error Prediction for Tabular Data
Eyal German, Daniel Samira, Yuval Elovici, Asaf Shabtai
-
JANUS: A Dual-Constraint Generative Framework for Stealthy Node Injection Attacks
Jiahao Zhang, Xiaobing Pei, Zhaokun Zhong, Wenqiang Hao, Zhenghao Tang
-
Shaz Furniturewala, Arkaitz Zubiaga
-
Empowering LLMs with Parameterized Skills for Adversarial Long-Horizon Planning
Sijia Cui, Shuai Xu, Aiyao He, Yanna Wang, Bo Xu
-
Do Natural Language Descriptions of Model Activations Convey Privileged Information?
Millicent Li, Alberto Mario Ceballos Arroyo, Giordano Rogers, Naomi Saphra, Byron C. Wallace
-
When Inverse Data Outperforms: Exploring the Pitfalls of Mixed Data in Multi-Stage Fine-Tuning
Mengyi Deng, Xin Li, Tingyu Zhu, Zhicheng Yang, Zhijiang Guo, Wei Wang
-
Agent4FaceForgery: Multi-Agent LLM Framework for Realistic Face Forgery Detection
Yingxin Lai, Zitong Yu, Jun Wang, Linlin Shen, Yong Xu, Xiaochun Cao
-
End4: End-to-end Denoising Diffusion for Diffusion-Based Inpainting Detection
Fei Wang, Xuecheng Wu, Zheng Zhang, Danlei Huang, Yuheng Huang, Bo Wang
-
Sanjeda Akter, Ibne Farabi Shihab, Anuj Sharma
-
BAPFL: Exploring Backdoor Attacks Against Prototype-based Federated Learning
Honghong Zeng, Jiong Lou, Zhe Wang, Hefeng Zhou, Chentao Wu, Wei Zhao, Jie Li
-
On the Out-of-Distribution Backdoor Attack for Federated Learning
Jiahao Xu, Zikai Zhang, Rui Hu
-
Zhen Li, Zijian Zhang, Wenjin Yang, Pengbo Wang, Zhaoqi Wang, Meng Li, Yan Wu, Xuyang Liu, Jing Sun, Liehuang Zhu
-
Bridging Threat Models and Detections: Formal Verification via CADP
Dumitru-Bogdan Prelipcean, Cătălin Dima
-
Artem Savkin, Thomas Lapotre, Kevin Strauss, Uzair Akbar, Federico Tombari
-
Valuation of Exotic Options and Counterparty Games Based on Conditional Diffusion
Helin Zhao, Junchi Shen
-
Onat Gungor, Roshan Sood, Harold Wang, Tajana Rosing
-
FedMentor: Domain-Aware Differential Privacy for Heterogeneous Federated LLMs in Mental Health
Nobin Sarwar, Shubhashis Roy Dipta
-
Beyond Data Privacy: New Privacy Risks for Large Language Models
Yuntao Du, Zitao Li, Ninghui Li, Bolin Ding
-
Vaidehi Patil, Elias Stengel-Eskin, Mohit Bansal
-
A Multi-Agent LLM Defense Pipeline Against Prompt Injection Attacks
S M Asif Hossain, Ruksat Khan Shayoni, Mohd Ruhul Ameen, Akif Islam, M. F. Mridha, Jungpil Shin
-
Towards mitigating information leakage when evaluating safety monitors
Gerard Boxo, Aman Neelappa, Shivam Raval
-
SteeringSafety: A Systematic Safety Evaluation Framework of Representation Steering in LLMs
Vincent Siu, Nicholas Crispino, David Park, Nathan W. Henry, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
-
Wei Cai, Shujuan Liu, Jian Zhao, Ziyan Shi, Yusheng Zhao, Yuchen Yuan, Tianle Zhang, Chi Zhang, Xuelong Li
-
Inducing Uncertainty for Test-Time Privacy
Muhammad H. Ashiq, Peter Triantafillou, Hung Yun Tseng, Grigoris G. Chrysos
-
Reasoned Safety Alignment: Ensuring Jailbreak Defense via Answer-Then-Check
Chentao Cao, Xiaojun Xu, Bo Han, Hang Li
-
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
Filip Sondej, Yushi Yang
-
Navid Hashemi, Samuel Sasaki, Diego Manzanas Lopez, Ipek Oguz, Meiyi Ma, Taylor T. Johnson
-
James C. Ward, Alex Bott, Connor York, Edmund R. Hunt
-
Poison to Detect: Detection of Targeted Overfitting in Federated Learning
Soumia Zohra El Mestari, Maciej Krzysztof Zuziak, Gabriele Lenzini
-
Beyond PII: How Users Attempt to Estimate and Mitigate Implicit LLM Inference
Synthia Wang, Sai Teja Peddinti, Nina Taft, Nick Feamster
-
A Controllable 3D Deepfake Generation Framework with Gaussian Splatting
Wending Liu, Siyun Liang, Huy H. Nguyen, Isao Echizen
-
Robust Concept Erasure in Diffusion Models: A Theoretical Perspective on Security and Robustness
Zixuan Fu, Yan Ren, Finn Carter, Chenyue Wen, Le Ku, Daheng Yu, Emily Davis, Bo Zhang
-
DRAG: Data Reconstruction Attack using Guided Diffusion
Wa-Kin Lei, Jun-Cheng Chen, Shang-Tse Chen
-
DARD: Dice Adversarial Robustness Distillation against Adversarial Attacks
Jing Zou, Shungeng Zhang, Meikang Qiu, Chong Li
-
From Autoencoders to CycleGAN: Robust Unpaired Face Manipulation via Adversarial Learning
Collin Guo
-
Removal Attack and Defense on AI-generated Content Latent-based Watermarking
De Zhang Lee, Han Fang, Hanyi Wang, Ee-Chien Chang
-
A Practical Adversarial Attack against Sequence-based Deep Learning Malware Classifiers
Kai Tan, Dongyang Zhan, Lin Ye, Hongli Zhang, Binxing Fang
-
NeuroStrike: Neuron-Level Attacks on Aligned LLMs
Lichao Wu, Sasha Behrouzi, Mohamadreza Rostami, Maximilian Thang, Stjepan Picek, Ahmad-Reza Sadeghi
-
Efficient Byzantine-Robust Privacy-Preserving Federated Learning via Dimension Compression
Xian Qin, Xue Yang, Xiaohu Tang
-
MORABLES: A Benchmark for Assessing Abstract Moral Reasoning in LLMs with Fables
Matteo Marcuzzo, Alessandro Zangari, Andrea Albarelli, Jose Camacho-Collados, Mohammad Taher Pilehvar
-
Geometric Red-Teaming for Robotic Manipulation
Divyam Goel, Yufei Wang, Tiancheng Wu, Guixiu Qiao, Pavel Piliptchak, David Held, Zackory Erickson
-
Amulet: a Python Library for Assessing Interactions Among ML Defenses and Risks
Asim Waheed, Vasisht Duddu, Rui Zhang, Sebastian Szyller, N. Asokan
-
Phi: Preference Hijacking in Multi-modal Large Language Models at Inference Time
Yifan Lan, Yuanpu Cao, Weitong Zhang, Lu Lin, Jinghui Chen
-
Secure Human Oversight of AI: Exploring the Attack Surface of Human Oversight
Jonas C. Ditz, Veronika Lazar, Elmar Lichtmeß, Carola Plesch, Matthias Heck, Kevin Baum, Markus Langer
-
Redefining Website Fingerprinting Attacks With Multiagent LLMs
Chuxu Song, Dheekshith Dev Manohar Mekala, Hao Wang, Richard Martin
-
Gustavo Sandoval, Denys Fenchenko, Junyao Chen
-
Free-MAD: Consensus-Free Multi-Agent Debate
Yu Cui, Hang Fu, Haibin Zhang, Licheng Wang, Cong Zuo
-
Membership Inference Attacks on Recommender System: A Survey
Jiajie He, Yuechun Gu, Keke Chen, Xintong Chen
-
ENJ: Optimizing Noise with Genetic Algorithms to Jailbreak LSMs
Yibo Zhang, Liang Lin
-
Feature Space Topology Control via Hopkins Loss
Einari Vaaras, Manu Airaksinen
-
Simin Chen, Jinjun Peng, Yixin He, Junfeng Yang, Baishakhi Ray
-
From Firewalls to Frontiers: AI Red-Teaming is a Domain-Specific Evolution of Cyber Red-Teaming
Anusha Sinha, Keltin Grimes, James Lucassen, Michael Feffer, Nathan VanHoudnos, Zhiwei Steven Wu, Hoda Heidari
-
When Smiley Turns Hostile: Interpreting How Emojis Trigger LLMs' Toxicity
Shiyao Cui, Xijia Feng, Yingkang Wang, Junxiao Yang, Zhexin Zhang, Biplab Sikdar, Hongning Wang, Han Qiu, Minlie Huang
-
RanAT4BIE: Random Adversarial Training for Biomedical Information Extraction
Jian Chen, Shengyi Lv, Leilei Su
-
Beyond Sliders: Mastering the Art of Diffusion-based Image Manipulation
Yufei Tang, Daiheng Gao, Pingyu Wu, Wenbo Zhou, Bang Zhang, Weiming Zhang
-
Gao Yu Lee, Tanmoy Dam, Md Meftahul Ferdaus, Daniel Puiu Poenar, Vu N. Duong
-
Realistic Environmental Injection Attacks on GUI Agents
Yitong Zhang, Ximo Li, Liyi Cai, Jia Li
-
Stabilizing Data-Free Model Extraction
Dat-Thinh Nguyen, Kim-Hung Le, Nhien-An Le-Khac
-
On the Escaping Efficiency of Distributed Adversarial Training Algorithms
Ying Cao, Kun Yuan, Ali H. Sayed
-
Qingzhao Zhang, Shaocheng Luo, Z. Morley Mao, Miroslav Pajic, Michael K. Reiter
-
Doan Minh Trung, Tien Duc Anh Hao, Luong Hoang Minh, Nghi Hoang Khoa, Nguyen Tan Cam, Van-Hau Pham, Phan The Duy
-
Tao Wang, Yushu Zhang, Xiangli Xiao, Kun Xu, Lin Yuan, Wenying Wen, Yuming Fang
-
MAUI: Reconstructing Private Client Data in Federated Transfer Learning
Ahaan Dabholkar, Atul Sharma, Z. Berkay Celik, Saurabh Bagchi
-
Syed Emad Uddin Shubha, Tasnuva Farheen
-
Hybrid Quantum-Classical Model for Image Classification
Muhammad Adnan Shahzad
-
Self-Evolving LLMs via Continual Instruction Tuning
Jiazheng Kang, Le Huang, Cheng Hou, Zhe Zhao, Zhenxiang Yan, Chuan Shi, Ting Bai
-
Pathological Truth Bias in Vision-Language Models
Yash Thube
-
Harmful Prompt Laundering: Jailbreaking LLMs with Abductive Styles and Symbolic Encoding
Seongho Joo, Hyukhun Koh, Kyomin Jung
-
Public Data Assisted Differentially Private In-Context Learning
Seongho Joo, Hyukhun Koh, Kyomin Jung
-
Farhan Sadik, Christopher L. Newman, Stuart J. Warden, Rachel K. Surowiec
-
Robustifying Diffusion-Denoised Smoothing Against Covariate Shift
Ali Hedayatnia, Mostafa Tavassolipour, Babak Nadjar Araabi, Abdol-Hossein Vahabie
-
A Modern Look at Simplicity Bias in Image Classification Tasks
Xiaoguang Chang, Teng Wang, Changyin Sun
-
A Biosecurity Agent for Lifecycle LLM Biosecurity Alignment
Meiyin Meng, Zaixi Zhang
-
Hailong Yang, Renhuo Zhao, Guanjin Wang, Zhaohong Deng
-
Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge
Omar Erak, Omar Alhussein, Hatem Abou-Zeid, Mehdi Bennis, Sami Muhaidat
-
Adversarial robustness through Lipschitz-Guided Stochastic Depth in Neural Networks
Laith Nayal, Mahmoud Mousatat, Bader Rasheed
-
Immunizing Images from Text to Image Editing via Adversarial Cross-Attention
Matteo Trippodo, Federico Becattini, Lorenzo Seidenari
-
Mohammad Hasan Narimani, Mostafa Tavassolipour
-
Prompt Injection Attacks on LLM Generated Reviews of Scientific Publications
Janis Keuper
-
When Your Reviewer is an LLM: Biases, Divergence, and Prompt Injection Risks in Peer Review
Changjia Zhu, Junjie Xiong, Renkai Ma, Zhicong Lu, Yao Liu, Lingyao Li
-
Machine Unlearning for Responsible and Adaptive AI in Education
Betty Mayeku, Sandra Hummel, Parisa Memarmoshrefi
-
LLM in the Middle: A Systematic Review of Threats and Mitigations to Real-World LLM-based Systems
Vitor Hugo Galhardo Moia, Igor Jochem Sanz, Gabriel Antonio Fontes Rebello, Rodrigo Duarte de Meneses, Briland Hitaj, Ulf Lindqvist
-
Privacy-Preserving Decentralized Federated Learning via Explainable Adaptive Differential Privacy
Fardin Jalil Piran, Zhiling Chen, Yang Zhang, Qianyu Zhou, Jiong Tang, Farhad Imani
-
Jingyu Tang, Chaoran Chen, Jiawen Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret Araya Gebreegziabher, Bingsheng Yao, Dakuo Wang, Yanfang Ye, Tianshi Li, Ziang Xiao, Yaxing Yao, Toby Jia-Jun Li
-
Safety and Security Analysis of Large Language Models: Risk Profile and Harm Potential
Charankumar Akiri, Harrison Simpson, Kshitiz Aryal, Aarav Khanna, Maanak Gupta
-
Side-channel Inference of User Activities in AR/VR Using GPU Profiling
Seonghun Son, Chandrika Mukherjee, Reham Mohamed Aburas, Berk Gulmezoglu, Z. Berkay Celik
-
JU-NLP at Touché: Covert Advertisement in Conversational AI-Generation and Detection Strategies
Arka Dutta, Agrik Majumdar, Sombrata Biswas, Dipankar Das, Sivaji Bandyopadhyay
-
Enabling Regulatory Multi-Agent Collaboration: Architecture, Challenges, and Solutions
Qinnan Hu, Yuntao Wang, Yuan Gao, Zhou Su, Linkang Du
-
Towards Confidential and Efficient LLM Inference with Dual Privacy Protection
Honglan Yu, Yibin Wang, Feifei Dai, Dong Liu, Haihui Fan, Xiaoyan Gu
-
Character-Level Perturbations Disrupt LLM Watermarks
Zhaoxi Zhang, Xiaomei Zhang, Yanjun Zhang, He Zhang, Shirui Pan, Bo Liu, Asif Qumer Gill, Leo Yu Zhang
-
Prompt Pirates Need a Map: Stealing Seeds helps Stealing Prompts
Felix Mächtle, Ashwath Shetty, Jonas Sander, Nils Loose, Sören Pirk, Thomas Eisenbarth
-
OpenFake: An Open Dataset and Platform Toward Large-Scale Deepfake Detection
Victor Livernoche, Akshatha Arodi, Andreea Musulan, Zachary Yang, Adam Salvail, Gaétan Marceau Caron, Jean-François Godbout, Reihaneh Rabbany
-
Steering MoE LLMs via Expert (De)Activation
Mohsen Fayyaz, Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Ryan Rossi, Trung Bui, Hinrich Schütze, Nanyun Peng
-
Balancing Utility and Privacy: Dynamically Private SGD with Random Projection
Zhanhong Jiang, Md Zahid Hasan, Nastaran Saadati, Aditya Balu, Chao Liu, Soumik Sarkar
-
ProDiGy: Proximity- and Dissimilarity-Based Byzantine-Robust Federated Learning
Sena Ergisi, Luis Maßny, Rawad Bitar
-
Representation-Aware Distributionally Robust Optimization: A Knowledge Transfer Framework
Zitao Wang, Nian Si, Molei Liu
-
ZORRO: Zero-Knowledge Robustness and Privacy for Split Learning (Full Version)
Nojan Sheybani, Alessandro Pegoraro, Jonathan Knauer, Phillip Rieger, Elissa Mollakuqe, Farinaz Koushanfar, Ahmad-Reza Sadeghi
-
Images in Motion?: A First Look into Video Leakage in Collaborative Deep Learning
Md Fazle Rasul, Alanood Alqobaisi, Bruhadeshwar Bezawada, Indrakshi Ray
-
Chengyu Yang, Rishik Reddy Yesgari, Chengjun Liu
-
Jiaqi Weng, Han Zheng, Hanyu Zhang, Qinqin He, Jialing Tao, Hui Xue, Zhixuan Chu, Xiting Wang
-
Symmetry-Guided Multi-Agent Inverse Reinforcement Learning
Yongkai Tian, Yirong Qi, Xin Yu, Wenjun Wu, Jie Luo
-
Adversarial Attacks Against Automated Fact-Checking: A Survey
Fanzhen Liu, Alsharif Abuadbba, Kristen Moore, Surya Nepal, Cecile Paris, Jia Wu, Jian Yang, Quan Z. Sheng
-
Architecting Resilient LLM Agents: A Guide to Secure Plan-then-Execute Implementations
Ron F. Del Rosario, Klaudia Krawiecka, Christian Schroeder de Witt
-
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park
-
Vivek Oommen, Siavash Khodakarami, Aniruddha Bora, Zhicheng Wang, George Em Karniadakis
-
Seongho Kim, Sejong Ryu, Hyoukjun You, Je Hyeong Hong
-
Ensemble Distribution Distillation for Self-Supervised Human Activity Recognition
Matthew Nolan, Lina Yao, Robert Davidson
-
Perfectly-Private Analog Secure Aggregation in Federated Learning
Delio Jaramillo-Velez, Charul Rajput, Ragnar Freij-Hollanti, Camilla Hollanti, Alexandre Graell i Amat
-
Shun Takagi, Satoshi Hasegawa
-
Tight Privacy Audit in One Run
Zihang Xiang, Tianhao Wang, Hanshen Xiao, Yuan Tian, Di Wang
-
Approximate Algorithms for Verifying Differential Privacy with Gaussian Distributions
Bishnu Bhusal, Rohit Chadha, A. Prasad Sistla, Mahesh Viswanathan
-
Corruption-Tolerant Asynchronous Q-Learning with Near-Optimal Rates
Sreejeet Maity, Aritra Mitra
-
Group Distributionally Robust Machine Learning under Group Level Distributional Uncertainty
Xenia Konti, Yi Shen, Zifan Wang, Karl Henrik Johansson, Michael J. Pencina, Nicoleta J. Economou-Zavlanos, Michael M. Zavlanos
-
Quantum Error Correction in Adversarial Regimes
Rahul Arvind, Nikhil Bansal, Dax Enshan Koh, Tobias Haug, Kishor Bharti
-
AVEC: Bootstrapping Privacy for Local LLMs
Madhava Gaikwad
-
How Far Are We from True Unlearnability?
Kai Ye, Liangcai Su, Chenxiong Qian
-
Nearest Neighbor Projection Removal Adversarial Training
Himanshu Singh, A. V. Subramanyam, Shivank Rajput, Mohan Kankanhalli
-
Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning
Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum
-
Sketched Gaussian Mechanism for Private Federated Learning
Qiaobo Li, Zhijie Chen, Arindam Banerjee
-
SAGE: Sample-Aware Guarding Engine for Robust Intrusion Detection Against Adversarial Attacks
Jing Chen, Onat Gungor, Zhengli Shang, Tajana Rosing
-
Asynchronous Gossip Algorithms for Rank-Based Statistical Methods
Anna Van Elst, Igor Colin, Stephan Clémençon
-
Meryem Malak Dif, Mouhamed Amine Bouchiha, Abdelaziz Amara Korba, Yacine Ghamri-Doudane
-
When Secure Isn't: Assessing the Security of Machine Learning Model Sharing
Gabriele Digregorio, Marco Di Gennaro, Stefano Zanero, Stefano Longari, Michele Carminati
-
RetinaGuard: Obfuscating Retinal Age in Fundus Images for Biometric Privacy Preserving
Zhengquan Luo, Chi Liu, Dongfu Xiao, Zhen Yu, Yueye Wang, Tianqing Zhu
-
Beyond I'm Sorry, I Can't: Dissecting Large Language Model Refusal
Nirmalendu Prakash, Yeo Wei Jie, Amir Abdullah, Ranjan Satapathy, Erik Cambria, Roy Ka Wei Lee
-
Robust DDoS-Attack Classification with 3D CNNs Against Adversarial Methods
Landon Bragg, Nathan Dorsey, Josh Prior, John Ajit, Ben Kim, Nate Willis, Pablo Rivas
-
Uncovering the Vulnerability of Large Language Models in the Financial Domain via Risk Concealment
Gang Cheng, Haibo Jin, Wenbin Zhang, Haohan Wang, Jun Zhuang
-
AntiDote: Bi-level Adversarial Training for Tamper-Resistant LLMs
Debdeep Sanyal, Manodeep Ray, Murari Mandal
-
EchoLeak: The First Real-World Zero-Click Prompt Injection Exploit in a Production LLM System
Pavan Reddy, Aditya Sanjay Gujral
-
Exploit Tool Invocation Prompt for Tool Behavior Hijacking in LLM-Based Agentic System
Yu Liu, Yuchong Xie, Mingyu Luo, Zesen Liu, Zhixiang Zhang, Kaikai Zhang, Zongjie Li, Ping Chen, Shuai Wang, Dongdong She
-
Decoding Latent Attack Surfaces in LLMs: Prompt Injection via HTML in Web Summarization
Ishaan Verma
-
Zhenhua Xu, Xixiang Zhao, Xubin Yue, Shengwei Tian, Changting Lin, Meng Han
-
Taniya Gidatkar, Oluwaseun Ajao, Matthew Shardlow
-
Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines
Jean-Pierre Magnot
-
Pre-Forgettable Models: Prompt Learning as a Native Mechanism for Unlearning
Rutger Hendrix, Giovanni Patanè, Leonardo G. Russo, Simone Carnemolla, Giovanni Bellitto, Federica Proietto Salanitri, Concetto Spampinato, Matteo Pennisi
-
A Comprehensive Survey on Trustworthiness in Reasoning with Large Language Models
Yanbo Wang, Yongcan Yu, Jian Liang, Ran He
-
NeuroBreak: Unveil Internal Jailbreak Mechanisms in Large Language Models
Chuhan Zhang, Ye Zhang, Bowen Shi, Yuyou Gan, Tianyu Du, Shouling Ji, Dazhan Deng, Yingcai Wu
-
Detecting Regional Spurious Correlations in Vision Transformers via Token Discarding
Solha Kang, Esla Timothy Anzaku, Wesley De Neve, Arnout Van Messem, Joris Vankerschaver, Francois Rameau, Utku Ozbulak
-
False Sense of Security: Why Probing-based Malicious Input Detection Fails to Generalize
Cheng Wang, Zeming Wei, Qin Liu, Muhao Chen
-
Inverse IFEval: Can LLMs Unlearn Stubborn Training Conventions to Follow Real Instructions?
Qinyan Zhang, Xinping Lei, Ruijie Miao, Yu Fu, Haojie Fan, Le Chang, Jiafan Hou, Dingling Zhang, Zhongfei Hou, Ziqiang Yang, Changxin Pu, Fei Hu, Jingkai Liu, Mengyun Liu, Yang Liu, Xiang Gao, Jiaheng Liu, Tong Yang, Zaiyuan Wang, Ge Zhang, Wenhao Huang
-
Shakiba Amirshahi, Amin Bigdeli, Charles L. A. Clarke, Amira Ghenai
-
Self-adaptive Dataset Construction for Real-World Multimodal Safety Scenarios
Jingen Qu, Lijun Li, Bo Zhang, Yichen Yan, Jing Shao
-
Formal Verification of Local Robustness of a Classification Algorithm for a Spatial Use Case
Delphine Longuet, Amira Elouazzani, Alejandro Penacho Riveiros, Nicola Bastianello
-
Privacy Risks in Time Series Forecasting: User- and Record-Level Membership Inference
Nicolas Johansson, Tobias Olsson, Daniel Nilsson, Johan Östman, Fazeleh Hoseini
-
Qifeng Tan, Shusen Yang, Xuebin Ren, Yikai Zhang
-
Peekaboo, I See Your Queries: Passive Attacks Against DSSE Via Intermittent Observations
Hao Nie, Wei Wang, Peng Xu, Wei Chen, Laurence T. Yang, Mauro Conti, Kaitai Liang
-
An Automated, Scalable Machine Learning Model Inversion Assessment Pipeline
Tyler Shumaker, Jessica Carpenter, David Saranchak, Nathaniel D. Bastian
-
Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs
Shei Pern Chua, Zhen Leng Thai, Teh Kai Jun, Xiao Li, Xiaolin Hu
-
Variational Gaussian Mixture Manifold Models for Client-Specific Federated Personalization
Sai Puppala, Ismail Hossain, Md Jahangir Alam, Sajedul Talukder
-
Xin Tong, Zhi Lin, Jingya Wang, Meng Han, Bo Jin
-
Visible Yet Unreadable: A Systematic Blind Spot of Vision Language Models Across Writing Systems
Jie Zhang, Ting Xu, Gelei Deng, Runyi Hu, Han Qiu, Tianwei Zhang, Qing Guo, Ivor Tsang
-
Yunbo Long, Liming Xu, Lukas Beckenbauer, Yuhan Liu, Alexandra Brintrup
-
ANNIE: Be Careful of Your Robots
Yiyang Huang, Zixuan Wang, Zishen Wan, Yapeng Tian, Haobo Xu, Yinhe Han, Yiming Gan
-
Alma M. Liezenga, Stefan Wijnja, Puck de Haan, Niels W. T. Brink, Jip J. van Stijn, Yori Kamphuis, Klamer Schutte
-
On the MIA Vulnerability Gap Between Private GANs and Diffusion Models
Ilana Sebag, Jean-Yves Franceschi, Alain Rakotomamonjy, Alexandre Allauzen, Jamal Atif
-
DPQuant: Efficient and Differentially-Private Model Training via Dynamic Quantization Scheduling
Yubo Gao, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar
-
SafeProtein: Red-Teaming Framework and Benchmark for Protein Foundation Models
Jigang Fan, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang, Zaixi Zhang
-
Tzuhsuan Huang, Cheng Yu Yeo, Tsai-Ling Huang, Hong-Han Shuai, Wen-Huang Cheng, Jun-Cheng Chen
-
Background Matters Too: A Language-Enhanced Adversarial Framework for Person Re-Identification
Kaicong Huang, Talha Azfar, Jack M. Reilly, Thomas Guggisberg, Ruimin Ke
-
High Cursive Complex Character Recognition using GAN External Classifier
S M Rafiuddin
-
Backdoor Poisoning Attack Against Face Spoofing Attack Detection Methods
Shota Iwamatsu, Koichi Ito, Takafumi Aoki
-
Hania Ghouse, Muzammil Behzad
-
Kaoru Otsuka, Yuki Takezawa, Makoto Yamada
-
LSAM: Asynchronous Distributed Training with Landscape-Smoothed Sharpness-Aware Minimization
Yunfei Teng, Sixin Zhang
-
Can LLMs Lie? Investigation beyond Hallucination
Haoran Huan, Mihir Prabhudesai, Mengning Wu, Shantanu Jaiswal, Deepak Pathak
-
EverTracer: Hunting Stolen Large Language Models via Stealthy and Robust Probabilistic Fingerprint
Zhenhua Xu, Meng Han, Wenpeng Xing
-
Somiya Chhillar, Mary K. Righi, Rebecca E. Sutter, Evgenios M. Kornaropoulos
-
Federated Learning: An approach with Hybrid Homomorphic Encryption
Pedro Correia, Ivan Silva, Ivone Amorim, Eva Maia, Isabel Praça
-
PersonaTeaming: Exploring How Introducing Personas Can Improve Automated AI Red-Teaming
Wesley Hanwen Deng, Sunnie S. Y. Kim, Akshita Jha, Ken Holstein, Motahhare Eslami, Lauren Wilcox, Leon A Gatys
-
Learning an Adversarial World Model for Automated Curriculum Generation in MARL
Brennen Hill
-
Stealth by Conformity: Evading Robust Aggregation through Adaptive Poisoning
Ryan McGaughey, Jesus Martinez del Rincon, Ihsen Alouani
-
Silent Until Sparse: Backdoor Attacks on Semi-Structured Sparsity
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
-
Prototype-Guided Robust Learning against Backdoor Attacks
Wei Guo, Maura Pintor, Ambra Demontis, Battista Biggio
-
From Injection to Defense: Constructing Edit-Based Fingerprints for Large Language Models
Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang
-
Kesen Wang, Daulet Toibazar, Pedro J. Moreno
-
Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe, Matthew Magimai Doss
-
Halima Bouzidi, Haoyu Liu, Mohammad Abdullah Al Faruque
-
Sai Teja Reddy Adapala
-
Jian Chen, Jiabao Dou, Jinbao Tian, Yunqi Yang, Zhou Li
-
PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation
Baiqiang Wang, Qian Lou, Mengxin Zheng, Dongfang Zhao
-
Distributed Gossip-GAN for Low-overhead CSI Feedback Training in FDD mMIMO-OFDM Systems
Yuwen Cao, Guijun Liu, Tomoaki Ohtsuki, Howard H. Yang, Tony Q. S. Quek
-
Deep opacity and AI: A threat to XAI and to privacy protection mechanisms
Vincent C. Müller
-
Partially Functional Dynamic Backdoor Diffusion-based Causal Model
Xinwen Liu, Lei Qian, Song Xi Chen, Niansheng Tang
-
When Thinking Backfires: Mechanistic Insights Into Reasoning-Induced Misalignment
Hanqi Yan, Hainiu Xu, Siya Qi, Shu Yang, Yulan He
-
GCAV: A Global Concept Activation Vector Framework for Cross-Layer Consistency in Interpretability
Zhenghao He, Sanchit Sinha, Guangzhi Xiong, Aidong Zhang
-
PromptSleuth: Detecting Prompt Injection via Semantic Intent Invariance
Mengxiao Wang, Yuxuan Zhang, Guofei Gu
-
Enhancing Resilience for IoE: A Perspective of Networking-Level Safeguard
Guan-Yan Yang, Jui-Ning Chen, Farn Wang, Kuo-Hui Yeh
-
Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models
Xiangtao Meng, Yingkai Dong, Ning Yu, Li Wang, Zheng Li, Shanqing Guo
-
Network-Level Prompt and Trait Leakage in Local Research Agents
Hyejun Jeong, Mohammadreza Teymoorianfard, Abhinav Kumar, Amir Houmansadr, Eugene Bagdasarian
-
Pruning Weights but Not Truth: Safeguarding Truthfulness While Pruning LLMs
Yao Fu, Runchao Li, Xianxuan Long, Haotian Yu, Xiaotian Han, Yu Yin, Pan Li
-
Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents
Zhixin Lin, Jungang Li, Shidong Pan, Yibo Shi, Yue Yao, Dongliang Xu
-
AR²: Adversarial Reinforcement Learning for Abstract Reasoning in Large Language Models
Cheng-Kai Yeh, Hsing-Wang Lee, Chung-Hung Kuo, Hen-Hsen Huang
-
Sheng Liu, Qiang Sheng, Danding Wang, Yang Li, Guang Yang, Juan Cao
-
Language Models Identify Ambiguities and Exploit Loopholes
Jio Choi, Mohit Bansal, Elias Stengel-Eskin
-
AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema
Ting-Chun Liu, Ching-Yu Hsu, Kuan-Yi Lee, Chi-An Fu, Hung-yi Lee
-
Robustness Assessment and Enhancement of Text Watermarking for Google's SynthID
Xia Han, Qi Li, Jianbing Ni, Mohammad Zulkernine
-
Robustness is Important: Limitations of LLMs for Data Fitting
Hejia Liu, Mochen Yang, Gediminas Adomavicius
-
PRISM: Robust VLM Alignment with Principled Reasoning for Integrated Safety in Multimodality
Nanxi Li, Zhengyue Zhao, Chaowei Xiao
-
Membership Inference Attacks on LLM-based Recommender Systems
Jiajie He, Yuechun Gu, Min-Chun Chen, Keke Chen
-
Auditing Approximate Machine Unlearning for Differentially Private Models
Yuechun Gu, Jiajie He, Keke Chen
-
FLAegis: A Two-Layer Defense Framework for Federated Learning Against Poisoning Attacks
Enrique Mármol Campos, Aurora González Vidal, José Luis Hernández Ramos, Antonio Skarmeta
-
SegReConcat: A Data Augmentation Method for Voice Anonymization Attack
Ridwan Arefeen, Xiaoxiao Miao, Rong Tong, Aik Beng Ng, Simon See
-
Enhancing Model Privacy in Federated Learning with Random Masking and Quantization
Zhibo Xu, Jianhao Zhu, Jingwen Xu, Changze Lv, Zisu Huang, Xiaohua Wang, Muling Wu, Qi Qian, Xiaoqing Zheng, Xuanjing Huang
-
Tackling Federated Unlearning as a Parameter Estimation Problem
Antonio Balordi, Lorenzo Manini, Fabio Stella, Alessio Merlo
-
Attackers Strike Back? Not Anymore - An Ensemble of RL Defenders Awakens for APT Detection
Sidahmed Benabderrahmane, Talal Rahwan
-
SecureV2X: An Efficient and Privacy-Preserving System for Vehicle-to-Everything (V2X) Applications
Joshua Lee, Ali Arastehfard, Weiran Liu, Xuegang Ban, Yuan Hong
-
UniC-RAG: Universal Knowledge Corruption Attacks to Retrieval-Augmented Generation
Runpeng Geng, Yanting Wang, Ying Chen, Jinyuan Jia
-
Stephen Meisenbacher, Alexandra Klymenko, Andreea-Elena Bodea, Florian Matthes
-
Flatness-aware Curriculum Learning via Adversarial Difficulty
Hiroaki Aizawa, Yoshikazu Hayashi
-
A Closer Look at Edema Area Segmentation in SD-OCT Images Using Adversarial Framework
Yuhui Tao, Yizhe Zhang, Qiang Chen
-
Can we make NeRF-based visual localization privacy-preserving?
Maxime Pietrantoni, Martin Humenberger, Torsten Sattler, Gabriela Csurka
-
Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models
Rui Zhang, Zihan Wang, Tianli Yang, Hongwei Li, Wenbo Jiang, Qingchuan Zhao, Yang Liu, Guowen Xu
-
Saddle Hierarchy in Dense Associative Memory
Robin Thériault, Daniele Tantari
-
Get Global Guarantees: On the Probabilistic Nature of Perturbation Robustness
Wenchuan Mu, Kwan Hui Lim
-
A Tight Context-aware Privacy Bound for Histogram Publication
Sara Saeidian, Ata Yavuzyılmaz, Leonhard Grosse, Georg Schuppe, Tobias J. Oechtering
-
Memorization in Graph Neural Networks
Adarsh Jamadandi, Jing Xu, Adam Dziedzic, Franziska Boenisch
-
On Surjectivity of Neural Networks: Can you elicit any behavior from your model?
Haozhe Jiang, Nika Haghtalab
-
Attacking LLMs and AI Agents: Advertisement Embedding Attacks Against Large Language Models
Qiming Guo, Jinwen Tang, Xingran Huang
-
Robustness Feature Adapter for Efficient Adversarial Training
Quanwei Wu, Jun Guo, Wei Wang, Yi Wang
-
Speculative Safety-Aware Decoding
Xuekang Wang, Shengyu Zhu, Xueqi Cheng
-
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
-
Vocoder-Projected Feature Discriminator
Takuhiro Kaneko, Hirokazu Kameoka, Kou Tanaka, Yuto Kondo
-
Learning from Few Samples: A Novel Approach for High-Quality Malcode Generation
Haijian Ma, Daizong Liu, Xiaowen Cai, Pan Zhou, Yulai Xie
-
ISACL: Internal State Analyzer for Copyrighted Training Data Leakage
Guangwei Zhang, Qisheng Su, Jiateng Liu, Cheng Qian, Yanzhou Pan, Yanjie Fu, Denghui Zhang
-
CATformer: Contrastive Adversarial Transformer for Image Super-Resolution
Qinyi Tian, Spence Cox, Laura E. Dalton
-
SCOUT: Semi-supervised Camouflaged Object Detection by Utilizing Text and Adaptive Data Selection
Weiqi Yan, Lvhai Chen, Shengchuan Zhang, Yan Zhang, Liujuan Cao
-
Does simple trump complex? Comparing strategies for adversarial robustness in DNNs
William Brooks, Marelie H. Davel, Coenraad Mouton
-
FedGreed: A Byzantine-Robust Loss-Based Aggregation Method for Federated Learning
Emmanouil Kritharakis, Antonios Makris, Dusan Jakovetic, Konstantinos Tserpes
-
Quantum-Classical Hybrid Framework for Zero-Day Time-Push GNSS Spoofing Detection
Abyad Enan, Mashrur Chowdhury, Sagar Dasgupta, Mizanur Rahman
-
PhantomLint: Principled Detection of Hidden LLM Prompts in Structured Documents
Toby Murray
-
ClearMask: Noise-Free and Naturalness-Preserving Protection Against Voice Deepfake Attacks
Yuanda Wang, Bocheng Chen, Hanqing Guo, Guangjing Wang, Weikang Ding, Qiben Yan
-
Backprompting: Leveraging Synthetic Production Data for Health Advice Guardrails
Kellen Tan Cheng, Anna Lisa Gentile, Chad DeLuca, Guang-Jie Ren
-
Analysis of Machine Unlearning in Medical Image Classification Models
Andreza M. C. Falcao, Filipe R. Cordeiro
-
Training Language Model Agents to Find Vulnerabilities with CTF-Dojo
Terry Yue Zhuo, Dingmin Wang, Hantian Ding, Varun Kumar, Zijian Wang
-
Privacy-Preserving Federated Learning Framework for Risk-Based Adaptive Authentication
Yaser Baseri, Abdelhakim Senhaji Hafid, Dimitrios Makrakis, Hamidreza Fereidouni
-
A.S.E: A Repository-Level Benchmark for Evaluating Security in AI-Generated Code
Keke Lian, Bin Wang, Lei Zhang, Libo Chen, Junjie Wang, Ziming Zhao, Yujiu Yang, Miaoqian Lin, Haotong Duan, Haoran Zhao, Shuang Liao, Mingda Guo, Jiazheng Quan, Yilu Zhong, Chenhao He, Zichuan Chen, Jie Wu, Haoling Li, Zhaoxuan Li, Jiongchi Yu, Hui Li, Dong Zhang
-
School of Reward Hacks: Hacking harmless tasks generalizes to misaligned behavior in LLMs
Mia Taylor, James Chua, Jan Betley, Johannes Treutlein, Owain Evans
-
Kaiwen Zuo, Zelin Liu, Raman Dutt, Ziyang Wang, Zhongtian Sun, Yeming Wang, Fan Mo, Pietro Liò
-
Exposing Privacy Risks in Graph Retrieval-Augmented Generation
Jiale Liu, Jiahao Zhang, Suhang Wang
-
Agent-Testing Agent: A Meta-Agent for Automated Testing and Evaluation of Conversational AI Agents
Sameer Komoravolu, Khalil Mrini
-
Activation Transport Operators
Andrzej Szablewski, Marek Masiak
-
Bryan Chen Zhengyu Tan, Daniel Wai Kit Chin, Zhengyuan Liu, Nancy F. Chen, Roy Ka-Wei Lee
-
Advancing Weakly-Supervised Change Detection in Satellite Images via Adversarial Class Prompting
Zhenghui Zhao, Chen Wu, Di Wang, Hongruixuan Chen, Cuiqun Chen, Zhuo Zheng, Bo Du, Liangpei Zhang
-
Uncovering and Mitigating Destructive Multi-Embedding Attacks in Deepfake Proactive Forensics
Lixin Jia, Haiyang Sun, Zhiqing Guo, Yunfeng Diao, Dan Ma, Gaobo Yang
-
AdaGAT: Adaptive Guidance Adversarial Training for the Robustness of Deep Neural Networks
Zhenyu Liu, Huizhi Liang, Xinrun Li, Vaclav Snasel, Varun Ojha
-
Defending Deepfake via Texture Feature Perturbation
Xiao Zhang, Changfang Chen, Tianyi Wang
-
Sharpness-Aware Geometric Defense for Robust Out-Of-Distribution Detection
Jeng-Lin Li, Ming-Ching Chang, Wei-Chao Chen
-
MetaFed: Advancing Privacy, Performance, and Sustainability in Federated Metaverse Systems
Muhammet Anil Yagiz, Zeynep Sude Cengiz, Polat Goktas
-
Trust Me, I Know This Function: Hijacking LLM Static Analysis using Bias
Shir Bernstein, David Beste, Daniel Ayzenshteyn, Lea Schonherr, Yisroel Mirsky
-
FRAME: Comprehensive Risk Assessment Framework for Adversarial Machine Learning Threats
Avishag Shapira, Simon Shigol, Asaf Shabtai
-
Adversarial Examples Are Not Bugs, They Are Superposition
Liv Gorton, Owen Lewis
-
Risk Assessment and Security Analysis of Large Language Models
Xiaoyan Zhang, Dongyang Lyu, Xiaoqi Li
-
SoK: Cybersecurity Assessment of Humanoid Ecosystem
Priyanka Prakash Surve, Asaf Shabtai, Yuval Elovici
-
LLMs Can't Handle Peer Pressure: Crumbling under Multi-Agent Social Interactions
Maojia Song, Tej Deep Pala, Weisheng Jin, Amir Zadeh, Chuan Li, Dorien Herremans, Soujanya Poria
-
WildSpoof Challenge Evaluation Plan
Yihan Wu, Jee-weon Jung, Hye-jin Shim, Xin Cheng, Xin Wang
-
Hyunjun Kim, Junwoo Ha, Sangyoon Yu, Haon Park
-
NAT: Learning to Attack Neurons for Enhanced Adversarial Transferability
Krishna Kanth Nakka, Alexandre Alahi
-
Unveiling the Latent Directions of Reflection in Large Language Models
Fu-Chieh Chang, Yu-Ting Lee, Pei-Yuan Wu
-
Towards Safeguarding LLM Fine-tuning APIs against Cipher Attacks
Jack Youstra, Mohammed Mahfoud, Yang Yan, Henry Sleight, Ethan Perez, Mrinank Sharma
-
Carlos Soto
-
SALMAN: Stability Analysis of Language Models Through the Maps Between Graph-based Manifolds
Wuxinlin Cheng, Yupeng Cao, Jinwen Wu, Koduvayur Subbalakshmi, Tian Han, Zhuo Feng
-
STA-GANN: A Valid and Generalizable Spatio-Temporal Kriging Approach
Yujie Li, Zezhi Shao, Chengqing Yu, Tangwen Qian, Zhao Zhang, Yifan Du, Shaoming He, Fei Wang, Yongjun Xu
-
An Investigation of Visual Foundation Models Robustness
Sandeep Gupta, Roberto Passerone
-
From Confidence to Collapse in LLM Factual Robustness
Alina Fastowski, Bardh Prenkaj, Gjergji Kasneci
-
LLMSymGuard: A Symbolic Safety Guardrail Framework Leveraging Interpretable Jailbreak Concepts
Darpan Aswal, Céline Hudelot
-
Yu Yan, Sheng Sun, Zhe Wang, Yijun Lin, Zenghao Duan, Zhifei Zheng, Min Liu, Zhiyi Yin, Jianping Zhang
-
HAMSA: Hijacking Aligned Compact Models via Stealthy Automation
Alexey Krylov, Iskander Vagizov, Dmitrii Korzh, Maryam Douiba, Azidine Guezzaz, Vladimir Kokh, Sergey D. Erokhin, Elena V. Tutubalina, Oleg Y. Rogov
-
Guangyu Yang, Jinghong Chen, Jingbiao Mei, Weizhe Lin, Bill Byrne
-
Domain Adaptation via Feature Refinement
Savvas Karatsiolis, Andreas Kamilaris
-
PromptFlare: Prompt-Generalized Defense via Cross-Attention Decoy in Diffusion-Based Inpainting
Hohyun Na, Seunghoo Hong, Simon S. Woo
-
Benchmarking the Robustness of Agentic Systems to Adversarially-Induced Harms
Jonathan Nöther, Adish Singla, Goran Radanovic
-
Quality control in sublinear time: a case study via random graphs
Cassandra Marcussen, Ronitt Rubinfeld, Madhu Sudan
-
Evaluating the Defense Potential of Machine Unlearning against Membership Inference Attacks
Aristeidis Sidiropoulos, Christos Chrysanthos Nikolaidis, Theodoros Tsiolakis, Nikolaos Pavlidis, Vasilis Perifanis, Pavlos S. Efraimidis
-
How to Beat Nakamoto in the Race
Shu-Jie Cao, Dongning Guo
-
Guarding Your Conversations: Privacy Gatekeepers for Secure Interactions with Cloud-Based AI Models
GodsGift Uzor, Hasan Al-Qudah, Ynes Ineza, Abdul Serwadda
-
A Survey of Threats Against Voice Authentication and Anti-Spoofing Systems
Kamel Kamel, Keshav Sood, Hridoy Sankar Dutta, Sunil Aryal
-
Aligning Distributionally Robust Optimization with Practical Deep Learning Needs
Dmitrii Feoktistov, Igor Ignashin, Andrey Veprikov, Nikita Borovko, Alexander Bogdanov, Savelii Chezhegov, Aleksandr Beznosikov
-
Nesrine Benchoubane, Olfa Ben Yahia, William Ferguson, Gurkan Gur, Sumit Chakravarty, Gregory Falco, Gunes Karabulut Kurt
-
Conflict-Aware Soft Prompting for Retrieval-Augmented Generation
Eunseong Choi, June Park, Hyeri Lee, Jongwuk Lee
-
Hengyu An, Jinghuai Zhang, Tianyu Du, Chunyi Zhou, Qingming Li, Tao Lin, Shouling Ji
-
VideoEraser: Concept Erasure in Text-to-Video Diffusion Models
Naen Xu, Jinghuai Zhang, Changjiang Li, Zhi Chen, Chunyi Zhou, Qingming Li, Tianyu Du, Shouling Ji
-
Unveiling Trust in Multimodal Large Language Models: Evaluation, Analysis, and Mitigation
Yichi Zhang, Yao Huang, Yifan Wang, Yitong Sun, Chang Liu, Zhe Zhao, Zhengwei Fang, Huanran Chen, Xiao Yang, Xingxing Wei, Hang Su, Yinpeng Dong, Jun Zhu
-
Reliable Unlearning Harmful Information in LLMs with Metamorphosis Representation Projection
Chengcan Wu, Zeming Wei, Huanran Chen, Yinpeng Dong, Meng Sun
-
Towards a 3D Transfer-based Black-box Attack via Critical Feature Guidance
Shuchao Pang, Zhenghan Chen, Shen Zhang, Liming Lu, Siyuan Liang, Anan Du, Yongbin Zhou
-
A Study of Privacy-preserving Language Modeling Approaches
Pritilata Saha, Abhirup Sinha
-
SafetyFlow: An Agent-Flow System for Automated LLM Safety Benchmarking
Xiangyang Zhu, Yuan Tian, Chunyi Li, Kaiwei Zhang, Wei Sun, Guangtao Zhai
-
SDGO: Self-Discrimination-Guided Optimization for Consistent Safety in Large Language Models
Peng Ding, Wen Sun, Dailin Li, Wei Zou, Jiaming Wang, Jiajun Chen, Shujian Huang
-
Retrieval-Augmented Review Generation for Poisoning Recommender Systems
Shiyi Yang, Xinshu Li, Guanglin Zhou, Chen Wang, Xiwei Xu, Liming Zhu, Lina Yao
-
Adversarial Attacks against Neural Ranking Models via In-Context Learning
Amin Bigdeli, Negar Arabzadeh, Ebrahim Bagheri, Charles L. A. Clarke
-
Adversarial Agent Behavior Learning in Autonomous Driving Using Deep Reinforcement Learning
Arjun Srinivasan, Anubhav Paras, Aniket Bera
-
Fast globally optimal Truncated Least Squares point cloud registration with fixed rotation axis
Ivo Ivanov, Carsten Markgraf
-
DoSReMC: Domain Shift Resilient Mammography Classification using Batch Normalization Adaptation
Uğurcan Akyüz, Deniz Katircioglu-Öztürk, Emre K. Süslü, Burhan Keleş, Mete C. Kaya, Gamze Durhan, Meltem G. Akpınar, Figen B. Demirkazık, Gözde B. Akar
-
SafeLLM: Unlearning Harmful Outputs from Large Language Models against Jailbreak Attacks
Xiangman Li, Xiaodong Wu, Qi Li, Jianbing Ni, Rongxing Lu
-
Mini-Batch Robustness Verification of Deep Neural Networks
Saar Tzour-Shaday, Dana Drachsler Cohen
-
Kiarash Kazari, Ezzeldin Shereen, György Dán
-
BadFU: Backdoor Federated Learning through Adversarial Machine Unlearning
Bingguang Lu, Hongsheng Hu, Yuantian Miao, Shaleeza Sohail, Chaoxiang He, Shuo Wang, Xiao Chen
-
Strategic Sample Selection for Improved Clean-Label Backdoor Attacks in Text Classification
Onur Alp Kirci, M. Emre Gursoy
-
Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection
Jan Lum Fok, Qingwen Zeng, Shiping Chen, Oscar Fawkes, Huaming Chen
-
Jiangfan Liu, Yongkang Guo, Fangzhi Zhong, Tianyuan Zhang, Zonglei Jing, Siyuan Liang, Jiakai Wang, Mingchuan Zhang, Aishan Liu, Xianglong Liu
-
Adversarial Hospital-Invariant Feature Learning for WSI Patch Classification
Mengliang Zhang, Jacob M. Luber
-
Improving Fairness in Graph Neural Networks via Counterfactual Debiasing
Zengyi Wo, Chang Liu, Yumeng Wang, Minglai Shao, Wenjun Wang
-
Sajib Biswas, Mao Nishino, Samuel Jacob Chacko, Xiuwen Liu
-
Distributional Adversarial Attacks and Training in Deep Hedging
Guangyi He, Tobias Sutter, Lukas Gonon
-
Xuezheng Qin, Ruwei Huang, Xiaolong Tang, Feng Li
-
A Lightweight Incentive-Based Privacy-Preserving Smart Metering Protocol for Value-Added Services
Farid Zaredar, Morteza Amini
-
TAIGen: Training-Free Adversarial Image Generation via Diffusion Models
Susim Roy, Anubhooti Jain, Mayank Vatsa, Richa Singh
-
A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives
Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong
-
MoEcho: Exploiting Side-Channel Attacks to Compromise User Privacy in Mixture-of-Experts LLMs
Ruyi Ding, Tianhong Xu, Xinyi Shen, Aidong Adam Ding, Yunsi Fei
-
Paired-Sampling Contrastive Framework for Joint Physical-Digital Face Attack Detection
Andrei Balykin, Anvar Ganiev, Denis Kondranin, Kirill Polevoda, Nikolai Liudkevich, Artem Petrov
-
Side Effects of Erasing Concepts from Diffusion Models
Shaswati Saha, Sourajit Saha, Manas Gaur, Tejas Gokhale
-
Aura-CAPTCHA: A Reinforcement Learning and GAN-Enhanced Multi-Modal CAPTCHA System
Joydeep Chandra, Prabal Manhas, Ramanjot Kaur, Rashi Sahay
-
Robust Estimation Under Heterogeneous Corruption Rates
Syomantak Chaudhuri, Jerry Li, Thomas A. Courtade
-
Potential and challenges of generative adversarial networks for super-resolution in 4D Flow MRI
Oliver Welin Odeback, Arivazhagan Geetha Balasubramanian, Jonas Schollenberger, Edward Ferdian, Alistair A. Young, C. Alberto Figueroa, Susanne Schnell, Outi Tammisola, Ricardo Vinuesa, Tobias Granberg, Alexander Fyrdahl, David Marlevi
-
Self-Disguise Attack: Induce the LLM to disguise itself for AIGT detection evasion
Yinghan Zhou, Juan Wen, Wanli Peng, Zhengxian Wu, Ziwei Zhang, Yiming Xue
-
Linkage Attacks Expose Identity Risks in Public ECG Data Sharing
Ziyu Wang, Elahe Khatibi, Farshad Firouzi, Sanaz Rahimi Mousavi, Krishnendu Chakrabarty, Amir M. Rahmani
-
Ashwath Vaithinathan Aravindan, Abha Jha, Matthew Salaway, Atharva Sandeep Bhide, Duygu Nur Yaldiz
-
The AI Risk Spectrum: From Dangerous Capabilities to Existential Threats
Markov Grey, Charbel-Raphaël Segerie
-
Daniel M. Jimenez-Gutierrez, Yelizaveta Falkouskaya, Jose L. Hernandez-Ramos, Aris Anagnostopoulos, Ioannis Chatzigiannakis, Andrea Vitaletti
-
Evaluating Identity Leakage in Speaker De-Identification Systems
Seungmin Seo, Oleg Aulov, Afzal Godil, Kevin Mangold
-
Unintended Misalignment from Agentic Fine-Tuning: Risks and Mitigation
Dongyoon Hahm, Taywon Min, Woogyeol Jin, Kimin Lee
-
CRISP: Persistent Concept Unlearning via Sparse Autoencoders
Tomer Ashuach, Dana Arad, Aaron Mueller, Martin Tutek, Yonatan Belinkov
-
Kaiwei Zhang, Qi Jia, Zijian Chen, Wei Sun, Xiangyang Zhu, Chunyi Li, Dandan Zhu, Guangtao Zhai
-
Enhancing Robustness of Implicit Neural Representations Against Weight Perturbations
Wenyong Zhou, Yuxin Cheng, Zhengwu Liu, Taiqiang Wu, Chen Zhang, Ngai Wong
-
Yiming Cao, Yanjie Li, Kaisheng Liang, Yuni Lai, Bin Xiao
-
Timestep-Compressed Attack on Spiking Neural Networks through Timestep-Level Backpropagation
Donghwa Kang, Doohyun Kim, Sang-Ki Ko, Jinkyu Lee, Hyeongboo Baek, Brent ByungHoon Kang
-
Backdooring Self-Supervised Contrastive Learning by Noisy Alignment
Tuo Chen, Jie Gui, Minjing Dong, Ju Jia, Lanting Fang, Jian Liu
-
Xiaopeng Peng, Heath Gemar, Erin Fleet, Kyle Novak, Abbie Watnik, Grover Swartzlander
-
Text2Weight: Bridging Natural Language and Neural Network Weight Spaces
Bowen Tian, Wenshuo Chen, Zexi Li, Songning Lai, Jiemin Wu, Yutao Yue
-
Heavy-tailed Linear Bandits: Adversarial Robustness, Best-of-both-worlds, and Beyond
Canzhe Zhao, Shinji Ito, Shuai Li
-
FedUP: Efficient Pruning-based Federated Unlearning for Model Poisoning Attacks
Nicolò Romandini, Cristian Borcea, Rebecca Montanari, Luca Foschini
-
Mohamed Elmahallawy, Tie Luo
-
Beneath the Mask: Can Contribution Data Unveil Malicious Personas in Open-Source Projects?
Ruby Nealon
-
Red Teaming Methodology for Design Obfuscation
Yuntao Liu, Abir Akib, Zelin Lu, Qian Xu, Ankur Srivastava, Gang Qu, David Kehlet, Nij Dorairaj
-
CCFC: Core & Core-Full-Core Dual-Track Defense for LLM Jailbreak Protection
Jiaming Hu, Haoyu Wang, Debarghya Mukherjee, Ioannis Ch. Paschalidis
-
Xin Wu, Fei Teng, Ji Zhang, Xingwang Li, Yuxuan Liang
-
Your Reward Function for RL is Your Best PRM for Search: Unifying RL and Search-Based TTS
Can Jin, Yang Zhou, Qixin Zhang, Hongwu Peng, Di Zhang, Marco Pavone, Ligong Han, Zhang-Wei Hong, Tong Che, Dimitris N. Metaxas
-
MMReview: A Multidisciplinary and Multimodal Benchmark for LLM-Based Peer Review Automation
Xian Gao, Jiacheng Ruan, Zongyun Zhang, Jingsheng Gao, Ting Liu, Yuzhuo Fu
-
Two Birds with One Stone: Multi-Task Detection and Attribution of LLM-Generated Text
Zixin Rao, Youssef Mohamed, Shang Liu, Zeyan Liu
-
Noise Robust One-Class Intrusion Detection on Dynamic Graphs
Aleksei Liuliakov, Alexander Schulz, Luca Hermes, Barbara Hammer
-
MCPTox: A Benchmark for Tool Poisoning Attack on Real-World MCP Servers
Zhiqiang Wang, Yichao Gao, Yanting Wang, Suyuan Liu, Haifeng Sun, Haoran Cheng, Guanquan Shi, Haohua Du, Xiangyang Li
-
CIA+TA Risk Assessment for AI Reasoning Vulnerabilities
Yuksel Aydin
-
Mechanistic Exploration of Backdoored Large Language Model Attention Patterns
Mohammed Abu Baker, Lakshmi Babu-Saheer
-
Unveiling Unicode's Unseen Underpinnings in Undermining Authorship Attribution
Robert Dilworth
-
Systematic Analysis of MCP Security
Yongjian Guo, Puzhuo Liu, Wanlun Ma, Zehang Deng, Xiaogang Zhu, Peng Di, Xi Xiao, Sheng Wen
-
Robust Federated Learning under Adversarial Attacks via Loss-Based Client Clustering
Emmanouil Kritharakis, Dusan Jakovetic, Antonios Makris, Konstantinos Tserpes
-
RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns
Xin Chen, Junchao Wu, Shu Yang, Runzhe Zhan, Zeyu Wu, Ziyang Luo, Di Wang, Min Yang, Lidia S. Chao, Derek F. Wong
-
Drifting Away from Truth: GenAI-Driven News Diversity Challenges LVLM-Based Misinformation Detection
Fanxiao Li, Jiaying Wu, Tingchao Fu, Yunyun Dong, Bingbing Song, Wei Zhou
-
Jaeung Lee, Suhyeon Yu, Yurim Jang, Simon S. Woo, Jaemin Jo
-
Jinyu Lu, Xinrong Sun, Yunting Tao, Tong Ji, Fanyu Kong, Guoqiang Yang
-
The Hidden Cost of Correlation: Rethinking Privacy Leakage in Local Differential Privacy
Sandaru Jayawardana, Sennur Ulukus, Ming Ding, Kanchana Thilakarathna
-
MAJIC: Markovian Adaptive Jailbreaking via Iterative Composition of Diverse Innovative Strategies
Weiwei Qi, Shuo Shao, Wei Gu, Tianhang Zheng, Puning Zhao, Zhan Qin, Kui Ren
-
Yangyang Guo, Yangyan Li, Mohan Kankanhalli
-
DAASH: A Meta-Attack Framework for Synthesizing Effective and Stealthy Adversarial Examples
Abdullah Al Nomaan Nafi, Habibur Rahaman, Zafaryab Haider, Tanzim Mahfuz, Fnu Suya, Swarup Bhunia, Prabuddha Chakraborty
-
Efficient Constraint-Aware Flow Matching via Randomized Exploration
Zhengyan Huan, Jacob Boerma, Li-Ping Liu, Shuchin Aeron
-
DAIQ: Auditing Demographic Attribute Inference from Question in LLMs
Srikant Panda, Hitesh Laxmichand Patel, Shahad Al-Khalifa, Amit Agarwal, Hend Al-Khalifa, Sharefah Al-Ghamdi
-
Distribution Matching via Generalized Consistency Models
Sagar Shrestha, Rajesh Shrestha, Tri Nguyen, Subash Timilsina
-
CRoC: Context Refactoring Contrast for Graph Anomaly Detection with Limited Supervision
Siyue Xie, Da Sun Handason Tam, Wing Cheong Lau
-
Where to Start Alignment? Diffusion Large Language Model May Demand a Distinct Position
Zhixin Xie, Xurui Song, Jun Luo
-
Yahsin Yeh, Yilun Wu, Bokai Ruan, Honghan Shuai
-
EXOTIC: An Exact, Optimistic, Tree-Based Algorithm for Min-Max Optimization
Chinmay Maheshwari, Chinmay Pimpalkhare, Debasish Chatterjee
-
Rethinking Safety in LLM Fine-tuning: An Optimization Perspective
Minseon Kim, Jin Myung Kwak, Lama Alssum, Bernard Ghanem, Philip Torr, David Krueger, Fazl Barez, Adel Bibi
-
Hanwen Cao, Haobo Lu, Xiaosen Wang, Kun He
-
CryptPEFT: Efficient and Private Neural Network Inference via Parameter-Efficient Fine-Tuning
Saisai Xia, Wenhao Wang, Zihao Wang, Yuhui Zhang, Yier Jin, Dan Meng, Rui Hou
-
Adjustable AprilTags For Identity Secured Tasks
Hao Li
-
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
Yixuan Yang, Daoyuan Wu, Yufan Chen
-
Passive Hack-Back Strategies for Cyber Attribution: Covert Vectors in Denied Environment
Abraham Itzhak Weinberg
-
Rigorous Feature Importance Scores based on Shapley Value and Banzhaf Index
Xuanxiang Huang, Olivier Létoffé, Joao Marques-Silva
-
Xiaojin Zhang, Mingcong Xu, Yiming Li, Wei Chen, Qiang Yang
-
CAMF: Collaborative Adversarial Multi-agent Framework for Machine Generated Text Detection
Yue Wang, Liesheng Wei, Yuxiang Wang
-
Mitigating Jailbreaks with Intent-Aware LLMs
Wei Jie Yeo, Ranjan Satapathy, Erik Cambria
-
Matthew Hull, Haoyang Yang, Pratham Mehta, Mansi Phute, Aeree Cho, Haorang Wang, Matthew Lau, Wenke Lee, Wilian Lunardi, Martin Andreoni, Polo Chau
-
Amira Guesmi, Bassem Ouni, Muhammad Shafique
-
An Improved Algorithm for Adversarial Linear Contextual Bandits via Reduction
Tim van Erven, Jack Mayo, Julia Olkhovskaya, Chen-Yu Wei
-
Adversarial Robustness in Distributed Quantum Machine Learning
Pouya Kananian, Hans-Arno Jacobsen
-
Ben Nassi, Stav Cohen, Or Yair
-
Too Easily Fooled? Prompt Injection Breaks LLMs on Frustratingly Simple Multiple-Choice Questions
Xuyang Guo, Zekai Huang, Zhao Song, Jiahao Zhang
-
When Punctuation Matters: A Large-Scale Comparison of Prompt Robustness Methods for LLMs
Mikhail Seleznyov, Mikhail Chaichuk, Gleb Ershov, Alexander Panchenko, Elena Tutubalina, Oleg Somov
-
Noise Matters: Optimizing Matching Noise for Diffusion Classifiers
Yanghao Wang, Long Chen
-
Semantically Guided Adversarial Testing of Vision Models Using Language Models
Katarzyna Filus, Jorge M. Cruz-Duarte
-
Remove360: Benchmarking Residuals After Object Removal in 3D Gaussian Splatting
Simona Kocour, Assia Benbihi, Torsten Sattler
-
Boosting the Robustness-Accuracy Trade-off of SNNs by Robust Temporal Self-Ensemble
Jihang Wang, Dongcheng Zhao, Ruolin Chen, Qian Zhang, Yi Zeng
-
Robust Convolution Neural ODEs via Contractivity-promoting regularization
Muhammad Zakwan, Liang Xu, Giancarlo Ferrari-Trecate
-
Ruijia Zhang, Xinyan Zhao, Ruixiang Wang, Sigen Chen, Guibin Zhang, An Zhang, Kun Wang, Qingsong Wen
-
Limitation Learning: Catching Adverse Dialog with GAIL
Noah Kasmanoff, Rahul Zalkikar
-
Assessing User Privacy Leakage in Synthetic Packet Traces: An Attack-Grounded Approach
Minhao Jin, Hongyu He, Maria Apostolaki
-
Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu
-
Layer-Wise Perturbations via Sparse Autoencoders for Adversarial Text Generation
Huizhen Shu, Xuying Li, Qirui Wang, Yuji Kosuga, Mengqiu Tian, Zhuo Li
-
Contrastive ECOC: Learning Output Codes for Adversarial Defense
Che-Yu Chou, Hung-Hsuan Chen
-
Hybrid Generative Fusion for Efficient and Privacy-Preserving Face Recognition Dataset Generation
Feiran Li, Qianqian Xu, Shilong Bao, Boyu Han, Zhiyong Yang, Qingming Huang
-
Enhancing Fairness in Autoencoders for Node-Level Graph Anomaly Detection
Shouju Wang, Yuchen Song, Sheng'en Li, Dongmian Zou
-
Searching for Privacy Risks in LLM Agents via Simulation
Yanzhe Zhang, Diyi Yang
-
Jailbreaking Commercial Black-Box LLMs with Explicitly Harmful Prompts
Chiyu Zhang, Lu Zhou, Xiaogang Xu, Jiafei Wu, Liming Fang, Zhe Liu
-
Towards Powerful and Practical Patch Attacks for 2D Object Detection in Autonomous Driving
Yuxin Cao, Yedi Zhang, Wentao He, Yifan Liao, Yan Xiao, Chang Li, Zhiyong Huang, Jin Song Dong
-
Pruning and Malicious Injection: A Retraining-Free Backdoor Attack on Transformer Models
Taibiao Zhao, Mingxuan Sun, Hao Wang, Xiaobing Chen, Xiangwei Zhou
-
Oops!... They Stole it Again: Attacks on Split Learning
Tanveer Khan, Antonis Michalas
-
BERTector: Intrusion Detection Based on Joint-Dataset Learning
Haoyang Hu, Xun Huang, Chenyu Wu, Shiwen Liu, Zhichao Lian, Shuangquan Zhang
-
Anyuan Sang, Lu Zhou, Li Yang, Junbo Jia, Huipeng Yang, Pengbin Feng, Jianfeng Ma
-
Bistochastically private release of longitudinal data
Nicolas Ruiz
-
Wenpeng Xing, Zhonghao Qi, Yupeng Qin, Yilin Li, Caini Chang, Jiahui Yu, Changting Lin, Zhenzhen Xie, Meng Han
-
SproutBench: A Benchmark for Safe and Ethical Large Language Models for Youth
Wenpeng Xing, Lanyi Wei, Haixiao Hu, Rongchang Li, Mohan Li, Changting Lin, Meng Han
-
Failures to Surface Harmful Contents in Video Large Language Models
Yuxin Cao, Wei Song, Derui Wang, Jingling Xue, Jin Song Dong
-
SHLIME: Foiling adversarial attacks fooling SHAP and LIME
Sam Chauhan, Estelle Duguet, Karthik Ramakrishnan, Hugh Van Deventer, Jack Kruger, Ranjan Subbaraman
-
Javier Muñoz-Haro, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez
-
Contrast Sensitivity in Multimodal Large Language Models: A Psychophysics-Inspired Evaluation
Pablo Hernández-Cámara, Alexandra Gomez-Villa, Jose Manuel Jaén-Lorites, Jorge Vila-Tomás, Valero Laparra, Jesus Malo
-
Keke Gai, Dongjue Wang, Jing Yu, Liehuang Zhu, Qi Wu
-
Shadow in the Cache: Unveiling and Mitigating Privacy Risks of KV-cache in LLM Inference
Zhifan Luo, Shuo Shao, Su Zhang, Lijing Zhou, Yuke Hu, Chenxu Zhao, Zhihao Liu, Zhan Qin
-
NeuronTune: Fine-Grained Neuron Modulation for Balanced Safety-Utility Alignment in LLMs
Birong Pan, Mayi Xu, Qiankun Pi, Jianhao Chen, Yuanyuan Zhu, Ming Zhong, Tieyun Qian
-
Generation of Indian Sign Language Letters, Numbers, and Words
Ajeet Kumar Yadav, Nishant Kumar, Rathna G N
-
Demystifying the Role of Rule-based Detection in AI Systems for Windows Malware Detection
Andrea Ponte, Luca Demetrio, Luca Oneto, Ivan Tesfai Ogbu, Battista Biggio, Fabio Roli
-
The Surprising Effectiveness of Membership Inference with Simple N-Gram Coverage
Skyler Hallinan, Jaehun Jung, Melanie Sclar, Ximing Lu, Abhilasha Ravichander, Sahana Ramnath, Yejin Choi, Sai Praneeth Karimireddy, Niloofar Mireshghallah, Xiang Ren
-
Slow Tuning and Low-Entropy Masking for Safe Chain-of-Thought Distillation
Ziyang Ma, Qingyue Yuan, Linhai Zhang, Deyu Zhou
-
The Perils of Chart Deception: How Misleading Visualizations Affect Vision-Language Models
Ridwan Mahbub, Mohammed Saidul Islam, Md Tahmid Rahman Laskar, Mizanur Rahman, Mir Tafseer Nayeem, Enamul Hoque
-
IAG: Input-aware Backdoor Attack on VLMs for Visual Grounding
Junxian Li, Beining Xu, Di Zhang
-
CLIP-Flow: A Universal Discriminator for AI-Generated Images Inspired by Anomaly Detection
Zhipeng Yuan, Kai Wang, Weize Quan, Dong-Ming Yan, Tieru Wu
-
Carlos Franzreb, Arnab Das, Tim Polzehl, Sebastian Möller
-
Security Analysis of ChatGPT: Threats and Privacy Risks
Yushan Xiang, Zhongwen Li, Xiaoqi Li
-
Klaudia Krawiecka, Christian Schroeder de Witt
-
Amazon Nova AI Challenge -- Trusted AI: Advancing secure, AI-assisted software development
Sattvik Sahai, Prasoon Goyal, Michael Johnston, Anna Gottardi, Yao Lu, Lucy Hu, Luke Dai, Shaohua Liu, Samyuth Sagi, Hangjie Shi, Desheng Zhang, Lavina Vaz, Leslie Ball, Maureen Murray, Rahul Gupta, Shankar Ananthakrishna
-
Empowering Morphing Attack Detection using Interpretable Image-Text Foundation Model
Sushrut Patwardhan, Raghavendra Ramachandra, Sushma Venkatesh
-
Md Sazedur Rahman, Mohamed Elmahallawy, Sanjay Madria, Samuel Frimpong
-
IPG: Incremental Patch Generation for Generalized Adversarial Patch Training
Wonho Lee, Hyunsik Na, Jisu Lee, Daeseon Choi
-
Do Language Models Agree with Human Perceptions of Suspense in Stories?
Glenn Matlin, Devin Zhang, Rodrigo Barroso Loza, Diana M. Popescu, Joni Isbell, Chandreyi Chakraborty, Mark Riedl
-
Wei Cai, Jian Zhao, Yuchu Jiang, Tianle Zhang, Xuelong Li
-
SMA: Who Said That? Auditing Membership Leakage in Semi-Black-box RAG Controlling
Shixuan Sun, Siyuan Liang, Ruoyu Chen, Jianjie Huang, Jingzhi Li, Xiaochun Cao
-
AI Security Map: Holistic Organization of AI Security Technologies and Impacts on Stakeholders
Hiroya Kato, Kentaro Kita, Kento Hasegawa, Seira Hidano
-
Aydin Zaboli, Junho Hong
-
Securing Educational LLMs: A Generalised Taxonomy of Attacks on LLMs and DREAD Risk Assessment
Farzana Zahid, Anjalika Sewwandi, Lee Brandon, Vimal Kumar, Roopak Sinha
-
SafeFix: Targeted Model Repair via Controlled Image Generation
Ouyang Xu, Baoming Zhang, Ruiyu Mao, Yunhui Guo
-
EditMF: Drawing an Invisible Fingerprint for Your Large Language Models
Jiaxuan Wu, Yinghan Zhou, Wanli Peng, Yiming Xue, Juan Wen, Ping Zhong
-
Oblivionis: A Lightweight Learning and Unlearning Framework for Federated Large Language Models
Fuyao Zhang, Xinyu Yan, Tiantong Wu, Wenjie Li, Tianxiang Chen, Yang Cao, Ran Yan, Longtao Huang, Wei Yang Bryan Lim, Qiang Yang
-
Attacks and Defenses Against LLM Fingerprinting
Kevin Kurian, Ethan Holland, Sean Oesch
-
Zhiqiang Yang, Renshuai Tao, Xiaolong Zheng, Guodong Yang, Chunjie Zhang
-
Privacy-protected Retrieval-Augmented Generation for Knowledge Graph Question Answering
Yunfeng Ning, Mayi Xu, Jintao Wen, Qiankun Pi, Yuanyuan Zhu, Ming Zhong, Jiawei Jiang, Tieyun Qian
-
MADPromptS: Unlocking Zero-Shot Morphing Attack Detection with Multiple Prompt Aggregation
Eduarda Caldeira, Fadi Boutros, Naser Damer
-
Deep Learning Models for Robust Facial Liveness Detection
Oleksandr Kuznetsov, Emanuele Frontoni, Luca Romeo, Riccardo Rosati, Andrea Maranesi, Alessandro Muscatello
-
Exploring Cross-Stage Adversarial Transferability in Class-Incremental Continual Learning
Jungwoo Kim, Jong-Seok Lee
-
Fre-CW: Targeted Attack on Time Series Forecasting using Frequency Domain Loss
Naifu Feng, Lixing Chen, Junhua Tang, Hua Ding, Jianhua Li, Yang Bai
-
Multi-Target Backdoor Attacks Against Speaker Recognition
Alexandrine Fortier, Sonal Joshi, Thomas Thebaud, Jesus Villalba Lopez, Najim Dehak, Patrick Cardinal
-
Image selective encryption analysis using mutual information in CNN based embedding space
Ikram Messadi, Giulia Cervia, Vincent Itier
-
Evasive Ransomware Attacks Using Low-level Behavioral Adversarial Examples
Manabu Hirano, Ryotaro Kobayashi
-
Never Compromise to Vulnerabilities: A Comprehensive Survey on AI Governance
Yuchu Jiang, Jian Zhao, Yuchen Yuan, Tianle Zhang, Yao Huang, Yanghao Zhang, Yan Wang, Yanshu Li, Xizhong Guo, Yusheng Zhao, Jun Zhang, Zhi Zhang, Xiaojian Lin, Yixiu Zou, Haoxuan Ma, Yuhu Shang, Yuzhi Hu, Keshu Cai, Ruochen Zhang, Boyuan Chen, Yilan Gao, Ziheng Jiao, Yi Qin, Shuangjun Du, Xiao Tong, Zhekun Liu, Yu Chen, Xuankun Rong, Rui Wang, Yejie Zheng, Zhaoxin Fan, Hongyuan Zhang, Pan Zhou, Lei Jin, Hao Zhao, Xu Yang, Jiaojiao Zhao, Jianshu Li, Joey Tianyi Zhou, Zhi-Qi Cheng, Longtao Huang, Zhiyi Liu, Zheng Zhu, Jianan Li, Gang Wang, Qi Li, Xu-Yao Zhang, Yaodong Yang, Mang Ye, Wenqi Ren, Zhaofeng He, Hang Su, Rongrong Ni, Liping Jing, Xingxing Wei, Junliang Xing, Massimo Alioto, Shengmei Shen, Petia Radeva, Dacheng Tao, Ya-Qin Zhang, Shuicheng Yan, Chi Zhang, Zhongjiang He, Xuelong Li
-
Cowpox: Towards the Immunity of VLM-based Multi-Agent Systems
Yutong Wu, Jie Zhang, Yiming Li, Chao Zhang, Qing Guo, Nils Lukas, Tianwei Zhang
-
Can AI Keep a Secret? Contextual Integrity Verification: A Provable Security Architecture for LLMs
Aayush Gupta
-
Exact Verification of Graph Neural Networks with Incremental Constraint Solving
Minghao Liu, Chia-Hsuan Lu, Marta Kwiatkowska
-
Collective dynamics of strategic classification
Marta C. Couto, Flavia Barsotti, Fernando P. Santos
-
Jeffri Murrugarra-LLerena, Haoran Niu, K. Suzanne Barber, Hal Daumé III, Yang Trista Cao, Paola Cascante-Bonilla
-
Constrained Black-Box Attacks Against Multi-Agent Reinforcement Learning
Amine Andam, Jamal Bentahar, Mustapha Hedabou
-
Securing Agentic AI: Threat Modeling and Risk Analysis for Network Monitoring Agentic AI System
Pallavi Zambare, Venkata Nikhil Thanikella, Ying Liu
-
Search-Time Data Contamination
Ziwen Han, Meher Mankikar, Julian Michael, Zifan Wang
-
Special-Character Adversarial Attacks on Open-Source Language Model
Ephraiem Sarabamoun
-
Maxime Heuillet, Rishika Bhagwatkar, Jonas Ngnawé, Yann Pequignot, Alexandre Larouche, Christian Gagné, Irina Rish, Ola Ahmad, Audrey Durand
-
Privacy Preserving Inference of Personalized Content for Out of Matrix Users
Michael Sun, Tai Vu, Andrew Wang
-
Wenjing Zhang, Ye Hu, Tao Luo, Zhilong Zhang, Mingzhe Chen
-
1-2-3 Check: Enhancing Contextual Privacy in LLM via Multi-Agent Reasoning
Wenkai Li, Liwen Sun, Zhenxiang Guan, Xuhui Zhou, Maarten Sap
-
Best-Effort Policies for Robust Markov Decision Processes
Alessandro Abate, Thom Badings, Giuseppe De Giacomo, Francesco Fabiano
-
BlindGuard: Safeguarding LLM-based Multi-Agent Systems under Unknown Attacks
Rui Miao, Yixin Liu, Yili Wang, Xu Shen, Yue Tan, Yiwei Dai, Shirui Pan, Xin Wang
-
Stephan Rabanser
-
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Maozhen Zhang, Mengnan Zhao, Bo Wang
-
Can You Trick the Grader? Adversarial Persuasion of LLM Judges
Yerin Hwang, Dongryeol Lee, Taegwan Kang, Yongil Kim, Kyomin Jung
-
Jinx: Unlimited LLMs for Probing Alignment Failures
Jiahao Zhao, Liwei Dong
-
Runze Wang, Zeli Chen, Zhiyun Song, Wei Fang, Jiajin Zhang, Danyang Tu, Yuxing Tang, Minfeng Xu, Xianghua Ye, Le Lu, Dakai Jin
-
Hongrui Zheng, Yuezun Li, Liejun Wang, Yunfeng Diao, Zhiqing Guo
-
MIMIC: Multimodal Inversion for Model Interpretation and Conceptualization
Animesh Jain, Alexandros Stergiou
-
VOIDFace: A Privacy-Preserving Multi-Network Face Recognition With Enhanced Security
Ajnas Muhammed, Iurii Medvedev, Nuno Gonçalves
-
Mitigating Biases in Surgical Operating Rooms with Geometry
Tony Danjun Wang, Tobias Czempiel, Nassir Navab, Lennart Bastian
-
Nicholas Klein, Hemlata Tak, James Fullwood, Krishna Regmi, Leonidas Spinoulas, Ganesh Sivaraman, Tianxiang Chen, Elie Khoury
-
Yan Wang, Da-Wei Zhou, Han-Jia Ye
-
IPBA: Imperceptible Perturbation Backdoor Attack in Federated Self-Supervised Learning
Jiayao Wang, Yang Song, Zhendong Zhao, Jiale Zhang, Qilin Wu, Junwu Zhu, Dongfang Zhao
-
FairDRL-ST: Disentangled Representation Learning for Fair Spatio-Temporal Mobility Prediction
Sichen Zhao, Wei Shao, Jeffrey Chan, Ziqi Xu, Flora Salim
-
Multi-Turn Jailbreaks Are Simpler Than They Seem
Xiaoxue Yang, Jaeha Lee, Anna-Katharina Dick, Jasper Timm, Fei Xie, Diogo Cruz
-
Multi-Hop Privacy Propagation for Differentially Private Federated Learning in Social Networks
Chenchen Lin, Xuehe Wang
-
EFU: Enforcing Federated Unlearning via Functional Encryption
Samaneh Mohammadi, Vasileios Tsouvalas, Iraklis Symeonidis, Ali Balador, Tanir Ozcelebi, Francesco Flammini, Nirvana Meratnia
-
Robust Anomaly Detection in O-RAN: Leveraging LLMs against Data Manipulation Attacks
Thusitha Dayaratne, Ngoc Duy Pham, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph
-
False Reality: Uncovering Sensor-induced Human-VR Interaction Vulnerability
Yancheng Jiang, Yan Jiang, Ruochen Zhou, Yi-Chao Chen, Xiaoyu Ji, Wenyuan Xu
-
Fully-Fluctuating Participation in Sleepy Consensus
Yuval Efron, Joachim Neu, Toniann Pitassi
-
Vibeke Binz Vallevik, Anne Kjersti C. Befring, Severin Elvatun, Jan Franz Nygaard
-
VISOR: Visual Input-based Steering for Output Redirection in Vision-Language Models
Mansi Phute, Ravikumar Balakrishnan
-
Selective KV-Cache Sharing to Mitigate Timing Side-Channels in LLM Inference
Kexin Chu, Zecheng Lin, Dawei Xiang, Zixu Shen, Jianchang Su, Cheng Chu, Yiwei Yang, Wenhui Zhang, Wenfei Wu, Wei Zhang
-
Designing with Deception: ML- and Covert Gate-Enhanced Camouflaging to Thwart IC Reverse Engineering
Junling Fan, David Koblah, Domenic Forte
-
Towards Effective MLLM Jailbreaking Through Balanced On-Topicness and OOD-Intensity
Zuoou Li, Weitong Zhang, Jingyuan Wang, Shuyuan Zhang, Wenjia Bai, Bernhard Kainz, Mengyun Qiao
-
FIDELIS: Blockchain-Enabled Protection Against Poisoning Attacks in Federated Learning
Jane Carney, Kushal Upreti, Gaby G. Dagher, Tim Andersen
-
Hallucination as a Computational Boundary: A Hierarchy of Inevitability and the Oracle Escape
Quan Shi, Wang Xi, Zenghui Ding, Jianqing Gao, Xianjun Yang
-
A Real-Time, Self-Tuning Moderator Framework for Adversarial Prompt Detection
Ivan Zhang
-
Representation Understanding via Activation Maximization
Hongbo Zhu, Angelo Cangelosi
-
ObfusQAte: A Proposed Framework to Evaluate LLM Robustness on Obfuscated Factual Question Answering
Shubhra Ghosh, Abhilekh Borah, Aditya Kumar Guru, Kripabandhu Ghosh
-
A Spin Glass Characterization of Neural Networks
Jun Li
-
Gradient Surgery for Safe LLM Fine-Tuning
Biao Yi, Jiahao Li, Baolei Zhang, Lihai Nie, Tong Li, Tiansheng Huang, Zheli Liu
-
HaDM-ST: Histology-Assisted Differential Modeling for Spatial Transcriptomics Generation
Xuepeng Liu, Zheng Jiang, Pinan Zhu, Hanyu Liu, Chao Li
-
Rongxuan Peng, Shunquan Tan, Chenqi Kong, Anwei Luo, Alex C. Kot, Jiwu Huang
-
Fading the Digital Ink: A Universal Black-Box Attack Framework for 3DGS Watermarking Systems
Qingyuan Zeng, Shu Jiang, Jiajing Lin, Zhenzhong Wang, Kay Chen Tan, Min Jiang
-
Towards Unveiling Predictive Uncertainty Vulnerabilities in the Context of the Right to Be Forgotten
Wei Qian, Chenxu Zhao, Yangyi Li, Wenqian Ye, Mengdi Huai
-
Enhancing Privacy in Decentralized Min-Max Optimization: A Differentially Private Approach
Yueyang Quan, Chang Wang, Shengjie Zhai, Minghong Fang, Zhuqing Liu
-
Certifiably robust malware detectors by design
Pierre-Francois Gimenez, Sarath Sivaprasad, Mario Fritz
-
Multi-task Adversarial Attacks against Black-box Model with Few-shot Queries
Wenqiang Wang, Yan Xiao, Hao Lin, Yangshijie Zhang, Xiaochun Cao
-
Assessing and Mitigating Data Memorization Risks in Fine-Tuned Large Language Models
Badrinath Ramakrishnan, Akshaya Balaji
-
Xianjun Yang, Liqiang Xiao, Shiyang Li, Faisal Ladhak, Hyokun Yun, Linda Ruth Petzold, Yi Xu, William Yang Wang
-
PROPS: Progressively Private Self-alignment of Large Language Models
Noel Teku, Fengwei Tian, Payel Bhattacharjee, Souradip Chakraborty, Amrit Singh Bedi, Ravi Tandon
-
Who's the Evil Twin? Differential Auditing for Undesired Behavior
Ishwar Balappanawar, Venkata Hasith Vattikuti, Greta Kintzley, Ronan Azimi-Mancel, Satvik Golechha
-
Balancing Privacy and Efficiency: Music Information Retrieval via Additive Homomorphic Encryption
William Zerong Wang, Dongfang Zhao
-
Membership and Memorization in LLM Knowledge Distillation
Ziqi Zhang, Ali Shahin Shamsabadi, Hanxiao Lu, Yifeng Cai, Hamed Haddadi
-
Model-Agnostic Sentiment Distribution Stability Analysis for Robust LLM-Generated Texts Detection
Siyuan Li, Xi Lin, Guangyan Li, Zehao Liu, Aodu Wulianghai, Li Ding, Jun Wu, Jianhua Li
-
Adversarial Video Promotion Against Text-to-Video Retrieval
Qiwei Tian, Chenhao Lin, Zhengyu Zhao, Qian Li, Shuai Liu, Chao Shen
-
Membership Inference Attacks with False Discovery Rate Control
Chenxu Zhao, Wei Qian, Aobo Chen, Mengdi Huai
-
Sensory robustness through top-down feedback and neural stochasticity in recurrent vision models
Antonino Greco, Marco D'Alessandro, Karl J. Friston, Giovanni Pezzulo, Markus Siegel
-
SEF-MK: Speaker-Embedding-Free Voice Anonymization through Multi-k-means Quantization
Beilong Tang, Xiaoxiao Miao, Xin Wang, Ming Li
-
Label Inference Attacks against Federated Unlearning
Wei Wang, Xiangyun Tang, Yajie Wang, Yijing Lin, Tao Zhang, Meng Shen, Dusit Niyato, Liehuang Zhu
-
Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models
Shiqian Zhao, Chong Wang, Yiming Li, Yihao Huang, Wenjie Qu, Siew-Kei Lam, Yi Xie, Kangjie Chen, Jie Zhang, Tianwei Zhang
-
Context Misleads LLMs: The Role of Context Filtering in Maintaining Safe Alignment of LLMs
Jinhwa Kim, Ian G. Harris
-
The Cost of Thinking: Increased Jailbreak Risk in Large Language Models
Fan Yang
-
LLM Robustness Leaderboard v1 -- Technical Report
Pierre Peigné-Lefebvre, Quentin Feuillade-Montixi, Tom David, Nicolas Miailhe
-
ETA: Energy-based Test-time Adaptation for Depth Completion
Younjoon Chung, Hyoungseob Park, Patrick Rim, Xiaoran Zhang, Jihe He, Ziyao Zeng, Safa Cicek, Byung-Woo Hong, James S. Duncan, Alex Wong
-
Differentially Private Federated Clustering with Random Rebalancing
Xiyuan Yang, Shengyuan Hu, Soyeon Kim, Tian Li
-
Membership Inference Attack with Partial Features
Xurun Wang, Guangrui Liu, Xinjie Li, Haoyu He, Lin Yao, Weizhe Zhang
-
In-Training Defenses against Emergent Misalignment in Language Models
David Kaczér, Magnus Jørgenvåg, Clemens Vetter, Lucie Flek, Florian Mai
-
FedMeNF: Privacy-Preserving Federated Meta-Learning for Neural Fields
Junhyeog Yun, Minui Hong, Gunhee Kim
-
ScamAgents: How AI Agents Can Simulate Human-Level Scam Calls
Sanket Badhe
-
Sofiane Bouaziz, Adel Hafiane, Raphael Canals, Rachid Nedjai
-
Adversarial Topic-aware Prompt-tuning for Cross-topic Automated Essay Scoring
Chunyun Zhang, Hongyan Zhao, Chaoran Cui, Qilong Song, Zhiqing Lu, Shuai Gong, Kailin Liu
-
Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation
Lai Jiang, Yuekang Li, Xiaohan Zhang, Youtao Ding, Li Pan
-
Quantifying Conversation Drift in MCP via Latent Polytope
Haoran Shi, Hongwei Yao, Shuo Shao, Shaopeng Jiao, Ziqi Peng, Zhan Qin, Cong Wang
-
Fact2Fiction: Targeted Poisoning Attack to Agentic Fact-checking System
Haorui He, Yupeng Li, Bin Benjamin Zhu, Dacheng Wen, Reynold Cheng, Francis C. M. Lau
-
SAM Encoder Breach by Adversarial Simplicial Complex Triggers Downstream Model Failures
Yi Qin, Rui Wang, Tao Huang, Tong Xiao, Liping Jing
-
SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Hanqing Wang, Yuan Tian, Mingyu Liu, Zhenhao Zhang, Xiangyang Zhu
-
FVGen: Accelerating Novel-View Synthesis with Adversarial Video Diffusion Distillation
Wenbin Teng, Gonglin Chen, Haiwei Chen, Yajie Zhao
-
Adaptive Backtracking for Privacy Protection in Large Language Models
Zhihao Yao, Yuxuan Gu, Xiachong Feng, Weitao Ma, Bo Li, Xiaocheng Feng
-
ProvX: Generating Counterfactual-Driven Attack Explanations for Provenance-Based Detection
Weiheng Wu, Wei Qiao, Teng Li, Yebo Feng, Zhuo Ma, Jianfeng Ma, Yang Liu
-
Zhengxian Wu, Juan Wen, Wanli Peng, Haowei Chang, Yinghan Zhou, Yiming Xue
-
When AIOps Become "AI Oops": Subverting LLM-driven IT Operations via Telemetry Manipulation
Dario Pasquini, Evgenios M. Kornaropoulos, Giuseppe Ateniese, Omer Akgul, Athanasios Theocharis, Petros Efstathopoulos
-
Deep Ignorance: Filtering Pretraining Data Builds Tamper-Resistant Safeguards into Open-Weight LLMs
Kyle O'Brien, Stephen Casper, Quentin Anthony, Tomek Korbak, Robert Kirk, Xander Davies, Ishan Mishra, Geoffrey Irving, Yarin Gal, Stella Biderman
-
Train It and Forget It: Merge Lists are Unnecessary for BPE Inference in Language Models
Tomohiro Sawada, Kartik Goyal
-
Sihan Ma, Qiming Wu, Ruotong Jiang, Frank Burns
-
Learning to Forget with Information Divergence Reweighted Objectives for Noisy Labels
Jeremiah Birrell, Reza Ebrahimi
-
Privacy-Preserving Tabular Synthetic Data Generation Using TabularARGN
Andrey Sidorenko, Paul Tiwald
-
Fine-Grained Safety Neurons with Training-Free Continual Projection to Reduce LLM Fine Tuning Risks
Bing Han, Feifei Zhao, Dongcheng Zhao, Guobin Shen, Ping Wu, Yu Shi, Yi Zeng
-
Latent Fusion Jailbreak: Blending Harmful and Harmless Representations to Elicit Unsafe LLM Outputs
Wenpeng Xing, Mohan Li, Chunqiang Hu, Haitao Xu, Ningyu Zhang, Bo Lin, Meng Han
-
Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts
Zhaomin Wu, Mingzhe Du, See-Kiong Ng, Bingsheng He
-
MedMKEB: A Comprehensive Knowledge Editing Benchmark for Medical Multimodal Large Language Models
Dexuan Xu, Jieyi Wang, Zhongyan Chai, Yongzhi Cao, Hanpin Wang, Huamin Zhang, Yu Huang
-
Automatic Image Colorization with Convolutional Neural Networks and Generative Adversarial Networks
Ruiyu Li, Changyuan Qiu, Hangrui Cao, Qihan Ren, Yuqing Qiu
-
Renmiao Chen, Shiyao Cui, Xuancheng Huang, Chengwei Pan, Victor Shea-Jay Huang, QingLin Zhang, Xuan Ouyang, Zhexin Zhang, Hongning Wang, Minlie Huang
-
Zane Xu, Jason Sun
-
Building Effective Safety Guardrails in AI Education Tools
Hannah-Beth Clark, Laura Benton, Emma Searle, Margaux Dowland, Matthew Gregory, Will Gayne, John Roberts
-
Qi Guo, Xiaojun Jia, Shanmin Pang, Simeng Qin, Lin Wang, Ju Jia, Yang Liu, Qing Guo
-
Farah Wahida, M.A.P. Chamikara, Yashothara Shanmugarasa, Mohan Baruwal Chhetri, Thilina Ranbaduge, Ibrahim Khalil
-
Physical Adversarial Camouflage through Gradient Calibration and Regularization
Jiawei Liang, Siyuan Liang, Jianjie Huang, Chenxi Si, Ming Zhang, Xiaochun Cao
-
Keep It Real: Challenges in Attacking Compression-Based Adversarial Purification
Samuel Räber, Till Aczel, Andreas Plesner, Roger Wattenhofer
-
FS-IQA: Certified Feature Smoothing for Robust Image Quality Assessment
Ekaterina Shumitskaya, Dmitriy Vatolin, Anastasia Antsiferova
-
Don't Reach for the Stars: Rethinking Topology for Resilient Federated Learning
Mirko Konstantin, Anirban Mukhopadhyay
-
NT-ML: Backdoor Defense via Non-target Label Training and Mutual Learning
Wenjie Huo, Katinka Wolter
-
Let's Measure Information Step-by-Step: LLM-Based Evaluation Beyond Vibes
Zachary Robertson, Sanmi Koyejo
-
Thorsten Peinemann, Paula Arnold, Sebastian Berndt, Thomas Eisenbarth, Esfandiar Mohammadi
-
Anti-Jamming Sensing with Distributed Reconfigurable Intelligent Metasurface Antennas
Zhaowei Wang, Yunsong Huang, Weicheng Liu, Hui-Ming Wang
-
Necessity of Block Designs for Optimal Locally Private Distribution Estimation
Abigail Gentle
-
Safety of Embodied Navigation: A Survey
Zixia Wang, Jia Hu, Ronghui Mu
-
Guardians and Offenders: A Survey on Harmful Content Generation and Safety Mitigation
Chi Zhang, Changjia Zhu, Junjie Xiong, Xiaoran Xu, Lingyao Li, Yao Liu, Zhuo Lu
-
Sasa Maric, Rasil Baidar, Robert Abbas, Sam Reisenfeld
-
A Neurosymbolic Framework for Interpretable Cognitive Attack Detection in Augmented Reality
Rongqian Chen, Allison Andreyev, Yanming Xiu, Mahdi Imani, Bin Li, Maria Gorlatova, Gang Tan, Tian Lan
-
RL-MoE: An Image-Based Privacy Preserving Approach In Intelligent Transportation System
Abdolazim Rezaei, Mehdi Sookhak, Mahboobeh Haghparast
-
Robust Market Making: To Quote, or not To Quote
Ziyi Wang, Carmine Ventre, Maria Polukarov
-
Adversarial Attacks and Defenses on Graph-aware Large Language Models (LLMs)
Iyiola E. Olatunji, Franziska Boenisch, Jing Xu, Adam Dziedzic
-
IFDECORATOR: Wrapping Instruction Following Reinforcement Learning with Verifiable Rewards
Xu Guo, Tianyi Liang, Tong Jian, Xiaogui Yang, Ling-I Wu, Chenhui Li, Zhihui Lu, Qipeng Guo, Kai Chen
-
ANPrompt: Anti-noise Prompt Tuning for Vision-Language Models
Yansheng Gao, Yufei Zheng, Jinghan Qu, Zixi Zhu, Yukuan Zhang, Shengsheng Wang
-
Boosting Adversarial Transferability via Residual Perturbation Attack
Jinjia Peng, Zeze Tao, Huibing Wang, Meng Wang, Yang Wang
-
AuthPrint: Fingerprinting Generative Models Against Malicious Model Providers
Kai Yao, Marc Juarez
-
Communication-Learning Co-Design for Differentially Private Over-the-Air Federated Distillation
Zihao Hu, Jia Yan, Ying-Jun Angela Zhang
-
SEAgent: Self-Evolving Computer Use Agent with Autonomous Learning from Experience
Zeyi Sun, Ziyu Liu, Yuhang Zang, Yuhang Cao, Xiaoyi Dong, Tong Wu, Dahua Lin, Jiaqi Wang
-
Jiayi Wen, Tianxin Chen, Zhirun Zheng, Cheng Huang
-
An Audit and Analysis of LLM-Assisted Health Misinformation Jailbreaks Against LLMs
Ayana Hussain, Patrick Zhao, Nicholas Vincent
-
Assessing Representation Stability for Transformer Models
Bryan E. Tuck, Rakesh M. Verma
-
Guangli Li, Canbiao Wu, Zhen Liang
-
Per-element Secure Aggregation against Data Reconstruction Attacks in Federated Learning
Takumi Suimon, Yuki Koizumi, Junji Takemasa, Toru Hasegawa
-
Rui Pu, Chaozhuo Li, Rui Ha, Litian Zhang, Lirong Qiu, Xi Zhang
-
T2UE: Generating Unlearnable Examples from Text Descriptions
Xingjun Ma, Hanxun Huang, Tianwei Song, Ye Sun, Yifeng Gao, Yu-Gang Jiang
-
Rui Zou, Mengqi Wei, Yutao Zhu, Jirong Wen, Xin Zhao, Jing Chen
-
VCNet: Recreating High-Level Visual Cortex Principles for Robust Artificial Vision
Brennen A. Hill, Zhang Xinyu, Timothy Putra Prasetio
-
Untraceable DeepFakes via Traceable Fingerprint Elimination
Jiewei Lai, Lan Zhang, Chen Tang, Pengcheng Sun, Xinming Wang, Yunhao Wang
-
VFLAIR-LLM: A Comprehensive Framework and Benchmark for Split Learning of LLMs
Zixuan Gu, Qiufeng Fan, Long Sun, Yang Liu, Xiaojun Ye
-
Attack the Messages, Not the Agents: A Multi-round Adaptive Stealthy Tampering Framework for LLM-MAS
Bingyu Yan, Ziyi Zhou, Xiaoming Zhang, Chaozhuo Li, Ruilin Zeng, Yirui Qi, Tianbo Wang, Litian Zhang
-
Xinwei Liu, Xiaojun Jia, Yuan Xun, Simeng Qin, Xiaochun Cao
-
Wang Yu-Hang, Shiwei Li, Jianxiang Liao, Li Bohan, Jian Liu, Wenfei Yin
-
When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs
Bodam Kim, Hiskias Dingeto, Taeyoun Kwon, Dasol Choi, DongGeon Lee, Haon Park, JaeHoon Lee, Jongho Shin
-
VideoGuard: Protecting Video Content from Unauthorized Editing
Junjie Cao, Kaizhou Li, Xinchun Yu, Hongxiang Li, Xiaoping Zhang
-
Sri Durga Sai Sowmya Kadali, Evangelos E. Papalexakis
-
Haoran Wang, Xiongxiao Xu, Baixiang Huang, Kai Shu
-
Token-Level Precise Attack on RAG: Searching for the Best Alternatives to Mislead Generation
Zizhong Li, Haopeng Zhang, Jiawei Zhang
-
Adversarial Attention Perturbations for Large Object Detection Transformers
Zachary Yahn, Selim Furkan Tekin, Fatih Ilhan, Sihao Hu, Tiansheng Huang, Yichang Xu, Margaret Loper, Ling Liu
-
Seeing It Before It Happens: In-Generation NSFW Detection for Diffusion-Based Text-to-Image Models
Fan Yang, Yihao Huang, Jiayi Zhu, Ling Shi, Geguang Pu, Jin Song Dong, Kailong Wang
-
evTransFER: A Transfer Learning Framework for Event-based Facial Expression Recognition
Rodrigo Verschae, Ignacio Bugueno-Cordova
-
BadBlocks: Low-Cost and Stealthy Backdoor Attacks Tailored for Text-to-Image Diffusion Models
Yu Pan, Jiahao Chen, Lin Wang, Bingrong Dai, Yi Du
-
Heterogeneity-Oblivious Robust Federated Learning
Weiyao Zhang, Jinyang Li, Qi Song, Miao Wang, Chungang Lin, Haitong Luo, Xuying Meng, Yujun Zhang
-
What If, But Privately: Private Counterfactual Retrieval
Shreya Meel, Mohamed Nomeir, Pasan Dissanayake, Sanghamitra Dutta, Sennur Ulukus
-
BDFirewall: Towards Effective and Expeditious Black-Box Backdoor Defense in MLaaS
Ye Li, Chengcheng Zhu, Yanchao Zhao, Jiale Zhang
-
Probing and Enhancing the Robustness of GNN-based QEC Decoders with Reinforcement Learning
Ryota Ikeda
-
Peizhuo Liu
-
Selection-Based Vulnerabilities: Clean-Label Backdoor Attacks in Active Learning
Yuhan Zhi, Longtian Wang, Xiaofei Xie, Chao Shen, Qiang Hu, Xiaohong Guan
-
Anti-Tamper Protection for Unauthorized Individual Image Generation
Zelin Li, Ruohan Zong, Yifan Liu, Ruichen Yao, Yaokun Liu, Yang Zhang, Dong Wang
-
EvaDrive: Evolutionary Adversarial Policy Optimization for End-to-End Autonomous Driving
Siwen Jiao, Kangan Qian, Hao Ye, Yang Zhong, Ziang Luo, Sicong Jiang, Zilin Huang, Yangyi Fang, Jinyu Miao, Zheng Fu, Yunlong Wang, Kun Jiang, Diange Yang, Rui Fan, Baoyun Peng
-
Defend LLMs Through Self-Consciousness
Boshi Huang, Fabio Nonato de Paula
-
Secure mmWave Beamforming with Proactive-ISAC Defense Against Beam-Stealing Attacks
Seyed Bagher Hashemi Natanzi, Hossein Mohammadi, Bo Tang, Vuk Marojevic
-
Highlight & Summarize: RAG without the jailbreaks
Giovanni Cherubin, Andrew Paverd
-
Liang Lin, Miao Yu, Kaiwen Luo, Yibo Zhang, Lilan Peng, Dexian Wang, Xuehai Tang, Yuanhe Zhang, Xikang Yang, Zhenhong Zhou, Kun Wang, Yang Liu
-
Defending Against Knowledge Poisoning Attacks During Retrieval-Augmented Generation
Kennedy Edemacu, Vinay M. Shashidhar, Micheal Tuape, Dan Abudu, Beakcheol Jang, Jong Wook Kim
-
Online Robust Multi-Agent Reinforcement Learning under Model Uncertainties
Zain Ulabedeen Farhat, Debamita Ghosh, George K. Atia, Yue Wang
-
Ko-Wei Chuang, Hen-Hsen Huang, Tsai-Yen Li
-
MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving
Aishan Liu, Jiakai Wang, Tianyuan Zhang, Hainan Li, Jiangfan Liu, Siyuan Liang, Yilong Ren, Xianglong Liu, Dacheng Tao
-
Yifan Liao, Yuxin Cao, Yedi Zhang, Wentao He, Yan Xiao, Xianglong Du, Zhiyong Huang, Jin Song Dong
-
Is Uncertainty Quantification a Viable Alternative to Learned Deferral?
Anna M. Wundram, Christian F. Baumgartner
-
Mitigating Attention Hacking in Preference-Based Reward Modeling via Interaction Distillation
Jianxiang Zang, Meiling Ning, Shihan Dou, Jiazheng Zhang, Tao Gui, Qi Zhang, Xuanjing Huang
-
What Makes "Good" Distractors for Object Hallucination Evaluation in Large Vision-Language Models?
Ming-Kun Xie, Jia-Hao Xiao, Gang Niu, Lei Feng, Zhiqiang Kou, Min-Ling Zhang, Masashi Sugiyama
-
IMU: Influence-guided Machine Unlearning
Xindi Fan, Jing Wu, Mingyi Zhou, Pengwei Liang, Dinh Phung
-
Mingyu Wang, Haojie Liu, Zhiyong Li, Wei Jiang
-
Joint Lossless Compression and Steganography for Medical Images via Large Language Models
Pengcheng Zheng, Xiaorong Pu, Kecheng Chen, Jiaxin Huang, Meng Yang, Bai Feng, Yazhou Ren, Jianan Jiang, Chaoning Zhang, Yang Yang, Heng Tao Shen
-
BlockA2A: Towards Secure and Verifiable Agent-to-Agent Interoperability
Zhenhua Zou, Zhuotao Liu, Lepeng Zhao, Qiuyang Zhan
-
ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models
Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu
-
PromptSafe: Gated Prompt Tuning for Safe Text-to-Image Generation
Zonglei Jing, Xiao Yang, Xiaoqian Li, Siyuan Liang, Aishan Liu, Mingchuan Zhang, Xianglong Liu
-
R1-ACT: Efficient Reasoning Model Safety Alignment by Activating Safety Knowledge
Yeonjun In, Wonjoong Kim, Sangwu Park, Chanyoung Park
-
Pro2Guard: Proactive Runtime Enforcement of LLM Agent Safety via Probabilistic Model Checking
Haoyu Wang, Chris M. Poskitt, Jun Sun, Jiali Wei
-
CyGATE: Game-Theoretic Cyber Attack-Defense Engine for Patch Strategy Optimization
Yuning Jiang, Nay Oo, Qiaoran Meng, Lu Lin, Dusit Niyato, Zehui Xiong, Hoon Wei Lim, Biplab Sikdar
-
Activation-Guided Local Editing for Jailbreaking Attacks
Jiecong Wang, Haoran Li, Hao Peng, Ziqian Zeng, Zihao Wang, Haohua Du, Zhengtao Yu
-
Wukong Framework for Not Safe For Work Detection in Text-to-Image systems
Mingrui Liu, Sixiao Zhang, Cheng Long
-
LeakSealer: A Semisupervised Defense for LLMs Against Prompt Injection and Leakage Attacks
Francesco Panebianco, Stefano Bonfanti, Francesco Trovò, Michele Carminati
-
Backdoor Attacks on Deep Learning Face Detection
Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi
-
Laura Pedrouzo-Rodriguez, Pedro Delgado-DeRobles, Luis F. Gomez, Ruben Tolosana, Ruben Vera-Rodriguez, Aythami Morales, Julian Fierrez
-
Qiyao Xue, Yuchen Dou, Ryan Shi, Xiang Lorraine Li, Wei Gao
-
DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models
Shantanu Thorat, Andrew Caines
-
Privacy-Preserving Driver Drowsiness Detection with Spatial Self-Attention and Federated Learning
Tran Viet Khoa, Do Hai Son, Mohammad Abu Alsheikh, Yibeltal F Alem, Dinh Thai Hoang
-
IN2OUT: Fine-Tuning Video Inpainting Model for Video Outpainting Using Hierarchical Discriminator
Sangwoo Youn, Minji Lee, Nokap Tony Park, Yeonggyoo Jeon, Taeyoung Na
-
DBLP: Noise Bridge Consistency Distillation For Efficient And Reliable Adversarial Purification
Chihan Huang, Belal Alsinglawi, Islam Al-qudah
-
Junhao Zheng, Jiahao Sun, Chenhao Lin, Zhengyu Zhao, Chen Ma, Chong Zhang, Cong Wang, Qian Wang, Chao Shen
-
STF: Shallow-Level Temporal Feedback to Enhance Spiking Transformers
Zeqi Zheng, Zizheng Zhu, Yingchao Yu, Yanchen Huang, Changze Lv, Junfeng Tang, Zhaofei Yu, Yaochu Jin
-
Young-ho Cho, Hao Zhu, Duehee Lee, Ross Baldick
-
FedGuard: A Diverse-Byzantine-Robust Mechanism for Federated Learning with Major Malicious Clients
Haocheng Jiang, Hua Shen, Jixin Zhang, Willy Susilo, Mingwu Zhang
-
LeakyCLIP: Extracting Training Data from CLIP
Yunhao Chen, Shujie Wang, Xin Wang, Xingjun Ma
-
Random Walk Learning and the Pac-Man Attack
Xingran Chen, Parimal Parag, Rohit Bhagat, Zonghong Liu, Salim El Rouayheb
-
Privacy Enhancement for Gaze Data Using a Noise-Infused Autoencoder
Samantha Aziz, Oleg Komogortsev
-
Hyperproperty-Constrained Secure Reinforcement Learning
Ernest Bonnah, Luan Viet Nguyen, Khaza Anuarul Hoque
-
Watch the Weights: Unsupervised monitoring and control of fine-tuned LLMs
Ziqian Zhong, Aditi Raghunathan
-
On the Risk of Misleading Reports: Diagnosing Textual Biases in Multimodal Clinical AI
David Restrepo, Ira Ktena, Maria Vakalopoulou, Stergios Christodoulidis, Enzo Ferrante
-
Improved Robustness and Functional Localization in Topographic CNNs Through Weight Similarity
Nhut Truong, Uri Hasson
-
Data-driven global ocean model resolving ocean-atmosphere coupling dynamics
Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham
-
Gaussian Splatting Feature Fields for Privacy-Preserving Visual Localization
Maxime Pietrantoni, Gabriela Csurka, Torsten Sattler
-
Foundations and Models in Modern Computer Vision: Key Building Blocks in Landmark Architectures
Radu-Andrei Bourceanu, Neil De La Fuente, Jan Grimm, Andrei Jardan, Andriy Manucharyan, Cornelius Weiss, Daniel Cremers, Roman Pflugfelder
-
FastDriveVLA: Efficient End-to-End Driving via Plug-and-Play Reconstruction-based Token Pruning
Jiajun Cao, Qizhe Zhang, Peidong Jia, Xuhui Zhao, Bo Lan, Xiaoan Zhang, Zhuo Li, Xiaobao Wei, Sixiang Chen, Liyun Li, Xianming Liu, Ming Lu, Yang Wang, Shanghang Zhang
-
Measuring Harmfulness of Computer-Using Agents
Aaron Xuxiang Tian, Ruofan Zhang, Janet Tang, Ji Wang, Tianyu Shi, Jiaxin Wen
-
Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level
Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov
-
LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring
Chloe Li, Mary Phuong, Noah Y. Siegel
-
Yunrui Yu, Hang Su, Cheng-zhong Xu, Zhizhong Su, Jun Zhu
-
RCR-AF: Enhancing Model Generalization via Rademacher Complexity Reduction Activation Function
Yunrui Yu, Kafeng Wang, Hang Su, Jun Zhu
-
LoReUn: Data Itself Implicitly Provides Cues to Improve Machine Unlearning
Xiang Li, Qianli Shen, Haonan Wang, Kenji Kawaguchi
-
Exploiting Synergistic Cognitive Biases to Bypass Safety in LLMs
Xikang Yang, Biyu Zhou, Xuehai Tang, Jizhong Han, Songlin Hu
-
Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning
Afshin Khadangi, Amir Sartipi, Igor Tchappi, Ramin Bahmani, Gilbert Fridgen
-
Metamorphic Testing of Deep Code Models: A Systematic Literature Review
Ali Asgari, Milan de Koning, Pouria Derakhshanfar, Annibale Panichella
-
Of Good Demons and Bad Angels: Guaranteeing Safe Control under Finite Precision
Samuel Teuber, Debasmita Lohar, Bernhard Beckert
-
CapRecover: A Cross-Modality Feature Inversion Attack Framework on Vision Language Models
Kedong Xiu, Saiqian Zhang
-
On the Reliability of Vision-Language Models Under Adversarial Frequency-Domain Perturbations
Jordan Vice, Naveed Akhtar, Yansong Gao, Richard Hartley, Ajmal Mian
-
Shenghao Zhu, Yifei Chen, Weihong Chen, Yuanhan Wang, Chang Liu, Shuo Jiang, Feiwei Qin, Changmiao Wang
-
DISTIL: Data-Free Inversion of Suspicious Trojan Inputs via Latent Diffusion
Hossein Mirzaei, Zeinab Taghavi, Sepehr Rezaee, Masoud Hadi, Moein Madadi, Mackenzie W. Mathis
-
LCS: An AI-based Low-Complexity Scaler for Power-Efficient Super-Resolution of Game Content
Simon Pochinda, Momen K. Tageldeen, Mark Thompson, Tony Rinaldi, Troy Giorshev, Keith Lee, Jie Zhou, Frederick Walls
-
Hate in Plain Sight: On the Risks of Moderating AI-Generated Hateful Illusions
Yiting Qu, Ziqing Yang, Yihan Ma, Michael Backes, Savvas Zannettou, Yang Zhang
-
Invisible Injections: Exploiting Vision-Language Models Through Steganographic Prompt Embedding
Chetan Pathade
-
Benchmarking Fraud Detectors on Private Graph Data
Alexander Goldberg, Giulia Fanti, Nihar Shah, Zhiwei Steven Wu
-
Low-Communication Resilient Distributed Estimation Algorithm Based on Memory Mechanism
Wei Li, Limei Hu, Feng Chen, Ye Yao
-
Song Yan, Hui Wei, Jinlong Fei, Guoliang Yang, Zhengyu Zhao, Zheng Wang
-
Whisper Smarter, not Harder: Adversarial Attack on Partial Suppression
Zheng Jie Wong, Bingquan Shen
-
When Truthful Representations Flip Under Deceptive Instructions?
Xianxuan Long, Yao Fu, Runchao Li, Mu Sheng, Haotian Yu, Xiaotian Han, Pan Li
-
Strategic Deflection: Defending LLMs from Logit Manipulation
Yassine Rachidy, Jihad Rbaiti, Youssef Hmamouche, Faissal Sehbaoui, Amal El Fallah Seghrouchni
-
Quantum-Inspired Audio Unlearning: Towards Privacy-Preserving Voice Biometrics
Shreyansh Pathak, Sonu Shreshtha, Richa Singh, Mayank Vatsa
-
Prompt Optimization and Evaluation for LLM Automated Red Teaming
Michael Freenor, Lauren Alvarez, Milton Leal, Lily Smith, Joel Garrett, Yelyzaveta Husieva, Madeline Woodruff, Ryan Miller, Erich Kummerfeld, Rafael Medeiros, Sander Schulhoff
-
Towards Privacy-preserving Photorealistic Self-avatars in Mixed Reality
Ethan Wilson, Vincent Bindschaedler, Sophie Jörg, Sean Sheikholeslam, Kevin Butler, Eakta Jain
-
Cascading and Proxy Membership Inference Attacks
Yuntao Du, Jiacheng Li, Yuetian Chen, Kaiyuan Zhang, Zhizhen Yuan, Hanshen Xiao, Bruno Ribeiro, Ninghui Li
-
Yang Wang, Chenghao Xiao, Yizhi Li, Stuart E. Middleton, Noura Al Moubayed, Chenghua Lin
-
Enhancing Jailbreak Attacks on LLMs via Persona Prompts
Zheng Zhang, Peilin Zhao, Deheng Ye, Hao Wang
-
Harnessing Diffusion-Yielded Score Priors for Image Restoration
Xinqi Lin, Fanghua Yu, Jinfan Hu, Zhiyuan You, Wu Shi, Jimmy S. Ren, Jinjin Gu, Chao Dong
-
Security Challenges in AI Agent Deployment: Insights from a Large Scale Public Competition
Andy Zou, Maxwell Lin, Eliot Jones, Micha Nowak, Mateusz Dziemian, Nick Winter, Alexander Grattan, Valent Nathanael, Ayla Croft, Xander Davies, Jai Patel, Robert Kirk, Nate Burnikell, Yarin Gal, Dan Hendrycks, J. Zico Kolter, Matt Fredrikson
-
Core Safety Values for Provably Corrigible Agents
Aran Nayebi
-
Text2VLM: Adapting Text-Only Datasets to Evaluate Alignment Training in Visual Language Models
Gabriel Downer, Sean Craven, Damian Ruck, Jake Thomas
-
Rongyao Cai, Ming Jin, Qingsong Wen, Kexin Zhang
-
Security Tensors as a Cross-Modal Bridge: Extending Text-Aligned Safety to Vision in LVLM
Shen Li, Liuyi Yao, Wujia Niu, Lan Zhang, Yaliang Li
-
Memorization in Fine-Tuned Large Language Models
Danil Savine, Muni Sreenivas Pydi, Jamal Atif, Olivier Cappé
-
Verification Cost Asymmetry in Cognitive Warfare: A Complexity-Theoretic Framework
Joshua Luberisse
-
The Blessing and Curse of Dimensionality in Safety Alignment
Rachel S.Y. Teo, Laziz U. Abdullaev, Tan M. Nguyen
-
Do Not Mimic My Voice: Speaker Identity Unlearning for Zero-Shot Text-to-Speech
Taesoo Kim, Jinju Kim, Dongchan Kim, Jong Hwan Ko, Gyeong-Moon Park
-
WBHT: A Generative Attention Architecture for Detecting Black Hole Anomalies in Backbone Networks
Kiymet Kaya, Elif Ak, Sule Gunduz Oguducu
-
VAE-GAN Based Price Manipulation in Coordinated Local Energy Markets
Biswarup Mukherjee, Li Zhou, S. Gokul Krishnan, Milad Kabirifar, Subhash Lakshminarayana, Charalambos Konstantinou
-
Trivial Trojans: How Minimal MCP Servers Enable Cross-Tool Exfiltration of Sensitive Data
Nicola Croce, Tobin South
-
Graph Structure Learning with Privacy Guarantees for Open Graph Data
Muhao Guo, Jiaqi Wu, Yang Weng, Yizheng Liao, Shengzhe Chen
-
Tarek Gasmi, Ramzi Guesmi, Mootez Aloui, Jihene Bennaceur
-
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Chaymaa Abbas, Mariette Awad, Razane Tajeddine
-
Running in CIRCLE? A Simple Benchmark for LLM Code Interpreter Security
Gabriel Chua
-
Yuanhe Zhang, Fangzhou Xie, Zhenhong Zhou, Zherui Li, Hao Chen, Kun Wang, Yufei Guo
-
Transferable and Undefendable Point Cloud Attacks via Medial Axis Transform
Keke Tang, Yuze Gao, Weilong Peng, Xiaofei Wang, Meie Fang, Peican Zhu
-
Secure Best Arm Identification in the Presence of a Copycat
Asaf Cohen, Onur Günlü
-
Clustering-Oriented Generative Attribute Graph Imputation
Mulin Chen, Bocheng Wang, Jiaxin Zhong, Zongcheng Miao, Xuelong Li
-
Game-Theoretic Gradient Control for Robust Neural Network Training
Maria Zaitseva, Ivan Tomilov, Natalia Gusarova
-
MOCHA: Are Code Language Models Robust Against Multi-Turn Malicious Coding Prompts?
Muntasir Wahed, Xiaona Zhou, Kiet A. Nguyen, Tianjiao Yu, Nirav Diwan, Gang Wang, Dilek Hakkani-Tür, Ismini Lourentzou
-
ViGText: Deepfake Image Detection with Vision-Language Model Explanations and Graph Neural Networks
Ahmad ALBarqawi, Mahmoud Nazzal, Issa Khalil, Abdallah Khreishah, NhatHai Phan
-
San Kim, Jonghwi Kim, Yejin Jeon, Gary Geunbae Lee
-
LoRA-Leak: Membership Inference Attacks Against LoRA Fine-tuned Language Models
Delong Ran, Xinlei He, Tianshuo Cong, Anyu Wang, Qi Li, Xiaoyun Wang
-
Luo Cheng, Hanwei Zhang, Lijun Zhang, Holger Hermanns
-
Xiao Yang, Lingxuan Wu, Lizhong Wang, Chengyang Ying, Hang Su, Jun Zhu
-
Privacy-Preserving Synthetic Review Generation with Diverse Writing Styles Using LLMs
Tevin Atwal, Chan Nam Tieu, Yefeng Yuan, Zhan Shi, Yuhong Liu, Liang Cheng
-
Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation
Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim
-
BadReasoner: Planting Tunable Overthinking Backdoors into Large Reasoning Models for Fun or Profit
Biao Yi, Zekun Fei, Jianing Geng, Tong Li, Lihai Nie, Zheli Liu, Yiming Li
-
RECALLED: An Unbounded Resource Consumption Attack on Large Vision-Language Models
Haoran Gao, Yuanhe Zhang, Zhenhong Zhou, Lei Jiang, Fanyu Meng, Yujia Xiao, Kun Wang, Yang Liu, Junlan Feng
-
Facial Demorphing from a Single Morph Using a Latent Conditional GAN
Nitish Shukla, Arun Ross
-
Yanzuo Lu, Yuxi Ren, Xin Xia, Shanchuan Lin, Xing Wang, Xuefeng Xiao, Andy J. Ma, Xiaohua Xie, Jian-Huang Lai
-
NWaaS: Nonintrusive Watermarking as a Service for X-to-Image DNN
Haonan An, Guang Hua, Yu Guo, Hangcheng Cao, Susanto Rahardja, Yuguang Fang
-
Ryusei Fujimoto, Yugo Nakamura, Yutaka Arakawa
-
Junyong Jiang, Buwei Tian, Chenxing Xu, Songze Li, Lu Dong
-
On Reconstructing Training Data From Bayesian Posteriors and Trained Models
George Wynne
-
Removing Box-Free Watermarks for Image-to-Image Models via Query-Based Reverse Engineering
Haonan An, Guang Hua, Hangcheng Cao, Zhengru Fang, Guowen Xu, Susanto Rahardja, Yuguang Fang
-
Layer-Aware Representation Filtering: Purifying Finetuning Data to Preserve LLM Safety Alignment
Hao Li, Lijun Li, Zhenghao Lu, Xianyi Wei, Rui Li, Jing Shao, Lei Sha
-
RecPS: Privacy Risk Scoring for Recommender Systems
Jiajie He, Yuechun Gu, Keke Chen
-
The Right to be Forgotten in Pruning: Unveil Machine Unlearning on Sparse Models
Yang Xiao, Gen Li, Jie Ji, Ruimeng Ye, Xiaolong Ma, Bo Hui
-
Rainbow Noise: Stress-Testing Multimodal Harmful-Meme Detectors on LGBTQ Content
Ran Tong, Songtao Wei, Jiaqi Liu, Lanruo Wang
-
Joobin Jin, Seokjun Hong, Gyeongseon Baek, Yeeun Kim, Byeongjoon Noh
-
P3SL: Personalized Privacy-Preserving Split Learning on Heterogeneous Edge Devices
Wei Fan, JinYi Yoon, Xiaochang Li, Huajie Shao, Bo Ji
-
Investigating Training Data Detection in AI Coders
Tianlin Li, Yunxiang Wei, Zhiming Li, Aishan Liu, Qing Guo, Xianglong Liu, Dongning Sun, Yang Liu
-
On the Interaction of Compressibility and Adversarial Robustness
Melih Barsbey, Antônio H. Ribeiro, Umut Şimşekli, Tolga Birdal
-
Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks
Linbo Cao, Jinman Zhao
-
Tab-MIA: A Benchmark Dataset for Membership Inference Attacks on Tabular Data in LLMs
Eyal German, Sagiv Antebi, Daniel Samira, Asaf Shabtai, Yuval Elovici
-
An h-space Based Adversarial Attack for Protection Against Few-shot Personalization
Xide Xu, Sandesh Kamath, Muhammad Atif Butt, Bogdan Raducanu
-
Boosting Ray Search Procedure of Hard-label Attacks with Transfer-based Priors
Chen Ma, Xinjie Xu, Shuyu Cheng, Qi Xuan
-
BetterCheck: Towards Safeguarding VLMs for Automotive Perception Systems
Malsha Ashani Mahawatta Dona, Beatriz Cabrero-Daniel, Yinan Yu, Christian Berger
-
Kagan Ozturk, Louisa Conwill, Jacob Gutierrez, Kevin Bowyer, Walter J. Scheirer
-
Efficient Neural Network Verification via Order Leading Exploration of Branch-and-Bound Trees
Guanqin Zhang, Kota Fukuda, Zhenya Zhang, H.M.N. Dilum Bandara, Shiping Chen, Jianjun Zhao, Yulei Sui
-
A Privacy-Preserving Data Collection Method for Diversified Statistical Analysis
Hao Jiang, Quan Zhou, Dongdong Zhao, Shangshang Yang, Wenjian Luo, Xingyi Zhang
-
Ruoyang Rykie Guo
-
From Seed to Harvest: Augmenting Human Creativity with AI for Red-teaming Text-to-Image Models
Jessica Quaye, Charvi Rastogi, Alicia Parrish, Oana Inel, Minsuk Kahng, Lora Aroyo, Vijay Janapa Reddi
-
Bob's Confetti: Phonetic Memorization Attacks in Music and Video Generation
Jaechul Roh, Zachary Novack, Yuefeng Peng, Niloofar Mireshghallah, Taylor Berg-Kirkpatrick, Amir Houmansadr
-
Minimax Data Sanitization with Distortion Constraint and Adversarial Inference
Amirarsalan Moatazedian, Yauhen Yakimenka, Rémi A. Chou, Jörg Kliewer
-
Hulayyil Alshammari, Praveen Rao
-
Kenta Shiraishi, Yuka Muto, Atsushi Okazaki, Shunji Kotsuki
-
Lower Bounds for Public-Private Learning under Distribution Shift
Amrith Setlur, Pratiksha Thaker, Jonathan Ullman
-
CompLeak: Deep Learning Model Compression Exacerbates Privacy Leakage
Na Li, Yansong Gao, Hongsheng Hu, Boyu Kuang, Anmin Fu
-
Finding Dori: Memorization in Text-to-Image Diffusion Models Is Less Local Than Assumed
Antoni Kowalczuk, Dominik Hintersdorf, Lukas Struppek, Kristian Kersting, Adam Dziedzic, Franziska Boenisch
-
Towards Trustworthy AI: Secure Deepfake Detection using CNNs and Zero-Knowledge Proofs
H M Mohaimanul Islam, Huynh Q. N. Vo, Aditya Rane
-
Advancing Robustness in Deep Reinforcement Learning with an Ensemble Defense Approach
Adithya Mohan, Dominik Rößle, Daniel Cremers, Torsten Schön
-
Jessup Byun, Xiaofeng Lin, Joshua Ward, Guang Cheng
-
GATEBLEED: Exploiting On-Core Accelerator Power Gating for High Performance & Stealthy Attacks on AI
Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz
-
LENS-DF: Deepfake Detection and Temporal Localization for Long-Form Noisy Speech
Xuechen Liu, Wanying Ge, Xin Wang, Junichi Yamagishi
-
Muhammad Zaeem Shahzad, Muhammad Abdullah Hanif, Bassem Ouni, Muhammad Shafique
-
Alaa Alhamzeh, Mays Al Rebdawi
-
The Cost of Compression: Tight Quadratic Black-Box Attacks on Sketches for $\ell_2$ Norm Estimation
Sara Ahmadian, Edith Cohen, Uri Stemmer
-
Robustifying Learning-Augmented Caching Efficiently without Compromising 1-Consistency
Peng Chen, Hailiang Zhao, Jiaji Zhang, Xueyan Tang, Yixuan Wang, Shuiguang Deng
-
Challenges of Trustworthy Federated Learning: What's Done, Current Trends and Remaining Work
Nuria Rodríguez-Barroso, Mario García-Márquez, M. Victoria Luzón, Francisco Herrera
-
PromptArmor: Simple yet Effective Prompt Injection Defenses
Tianneng Shi, Kaijie Zhu, Zhun Wang, Yuqi Jia, Will Cai, Weida Liang, Haonan Wang, Hend Alzahrani, Joshua Lu, Kenji Kawaguchi, Basel Alomair, Xuandong Zhao, William Yang Wang, Neil Gong, Wenbo Guo, Dawn Song
-
Scaling Decentralized Learning with FLock
Zehua Cheng, Rui Sun, Jiahao Sun, Yike Guo
-
Red-Team Multi-Agent Reinforcement Learning for Emergency Braking Scenario
Yinsong Chen, Kaifeng Wang, Xiaoqiang Meng, Xueyuan Li, Zirui Li, Xin Gao
-
Multi-Stage Prompt Inference Attacks on Enterprise LLM Systems
Andrii Balashov, Olena Ponomarova, Xiaohua Zhai
-
Missing value imputation with adversarial random forests -- MissARF
Pegah Golchian, Jan Kapar, David S. Watson, Marvin N. Wright
-
Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection
Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee
-
Exploiting Context-dependent Duration Features for Voice Anonymization Attack Systems
Natalia Tomashenko, Emmanuel Vincent, Marc Tommasi
-
Lazaro Janier Gonzalez-Soler, Maciej Salwowski, Christoph Busch
-
Rethinking Occlusion in FER: A Semantic-Aware Perspective and Go Beyond
Huiyu Zhai, Xingxing Yang, Yalan Ye, Chenyang Li, Bin Fan, Changze Li
-
Optimizing Canaries for Privacy Auditing with Metagradient Descent
Matteo Boglioni, Terrance Liu, Andrew Ilyas, Zhiwei Steven Wu
-
Robust and Differentially Private PCA for non-Gaussian data
Minwoo Kim, Sungkyu Jung
-
Weak Links in LinkedIn: Enhancing Fake Profile Detection in the Age of LLMs
Apoorva Gulati, Rajesh Kumar, Vinti Agarwal, Aditya Sharma
-
Security study based on the ChatGPT plugin system: Identifying Security Vulnerabilities
Ruomai Ren
-
Jerry Wang, Fang Yu
-
Manipulating LLM Web Agents with Indirect Prompt Injection Attack via HTML Accessibility Tree
Sam Johnson, Viet Pham, Thai Le
-
Subliminal Learning: Language models transmit behavioral traits via hidden signals in data
Alex Cloud, Minh Le, James Chua, Jan Betley, Anna Sztyber-Betley, Jacob Hilton, Samuel Marks, Owain Evans
-
Byzantine-Robust Decentralized Coordination of LLM Agents
Yongrae Jo, Chanik Park
-
Robust Control with Gradient Uncertainty
Qian Qi
-
Towards Video Thinking Test: A Holistic Benchmark for Advanced Video Reasoning and Understanding
Yuanhan Zhang, Yunice Chew, Yuhao Dong, Aria Leo, Bo Hu, Ziwei Liu
-
Clustered Federated Learning for Generalizable FDIA Detection in Smart Grids with Heterogeneous Data
Yunfeng Li, Junhong Liu, Zhaohui Yang, Guofu Liao, Chuyun Zhang
-
ROBAD: Robust Adversary-aware Local-Global Attended Bad Actor Detection Sequential Model
Bing He, Mustaque Ahamad, Srijan Kumar
-
Distributional Unlearning: Forgetting Distributions, Not Just Samples
Youssef Allouah, Rachid Guerraoui, Sanmi Koyejo
-
Differentially Private Synthetic Graphs Preserving Triangle-Motif Cuts
Pan Peng, Hangyu Xu
-
Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective
Yubo Wang, Min Tang, Nuo Shen, Shujie Cui, Weiqing Wang
-
Juan Manuel Contreras
-
Juntao Tan, Anran Li, Quanchao Liu, Peng Ran, Lan Zhang
-
VMask: Tunable Label Privacy Protection for Vertical Federated Learning via Layer Masking
Juntao Tan, Lan Zhang, Zhonghao Hu, Kai Yang, Peng Ran, Bo Li
-
GCC-Spam: Spam Detection via GAN, Contrastive Learning, and Character Similarity Networks
Zixin Xu, Zhijie Wang, Zhiyuan Pan
-
Analyzing Internal Activity and Robustness of SNNs Across Neuron Parameter Space
Szymon Mazurek, Jakub Caputa, Maciej Wielgosz
-
MultiRetNet: A Multimodal Vision Model and Deferral System for Staging Diabetic Retinopathy
Jeannie She, Katie Spivakovsky
-
Glitches in Decision Tree Ensemble Models
Satyankar Chandra, Ashutosh Gupta, Kaushik Mallik, Krishna Shankaranarayanan, Namrita Varshney
-
FORTA: Byzantine-Resilient FL Aggregation via DFT-Guided Krum
Usayd Shahul, J. Harshan
-
Towards Urban Planning AI Agent in the Age of Agentic AI
Rui Liu, Tao Zhe, Zhong-Ren Peng, Necati Catbas, Xinyue Ye, Dongjie Wang, Yanjie Fu
-
Amro Abdalla, Ismail Shaheen, Dan DeGenaro, Rupayan Mallick, Bogdan Raita, Sarah Adel Bargal
-
Large Language Models in Cybersecurity: Applications, Vulnerabilities, and Defense Techniques
Niveen O. Jaffal, Mohammed Alkhanafseh, David Mohaisen
-
Innocence in the Crossfire: Roles of Skip Connections in Jailbreaking Visual Language Models
Palash Nandi, Maithili Joshi, Tanmoy Chakraborty
-
Learning Deblurring Texture Prior from Unpaired Data with Diffusion Model
Chengxu Liu, Lu Qi, Jinshan Pan, Xueming Qian, Ming-Hsuan Yang
-
Adversarial Training Improves Generalization Under Distribution Shifts in Bioacoustics
René Heinrich, Lukas Rauch, Bernhard Sick, Christoph Scholz
-
Byzantine-resilient federated online learning for Gaussian process regression
Xu Zhang, Zhenyuan Yuan, Minghui Zhu
-
FuSeFL: Fully Secure and Scalable Cross-Silo Federated Learning
Sahar Ghoflsaz Ghinani, Elaheh Sadredini
-
An Adversarial-Driven Experimental Study on Deep Learning for RF Fingerprinting
Xinyu Cao, Bimal Adhikari, Shangqing Zhao, Jingxian Wu, Yanjun Pan
-
TopicAttack: An Indirect Prompt Injection Attack via Topic Transition
Yulin Chen, Haoran Li, Yuexin Li, Yue Liu, Yangqiu Song, Bryan Hooi
-
Eldor Abdukhamidov, Mohammed Abuhamad, Simon S. Woo, Hyoungshick Kim, Tamer Abuhmed
-
Hallucination Score: Towards Mitigating Hallucinations in Generative Image Super-Resolution
Weiming Ren, Raghav Goyal, Zhiming Hu, Tristan Ty Aumentado-Armstrong, Iqbal Mohomed, Alex Levinshtein
-
FedStrategist: A Meta-Learning Framework for Adaptive and Robust Aggregation in Federated Learning
Md Rafid Haque, Abu Raihan Mostofa Kamal, Md. Azam Hossain
-
Manipulation Attacks by Misaligned AI: Risk Analysis and Safety Case Framework
Rishane Dassanayake, Mario Demetroudi, James Walpole, Lindley Lentati, Jason R. Brown, Edward James Young
-
Youssef Tawfilis, Hossam Amer, Minar El-Aasser, Tallal Elshabrawy
-
Prompt Injection 2.0: Hybrid AI Threats
Jeremy McHugh, Kristina Šekrst, Jon Cefalu
-
Kutub Uddin, Awais Khan, Muhammad Umar Farooq, Khalid Malik
-
Automating Steering for Safe Multimodal Large Language Models
Lyucheng Wu, Mengru Wang, Ziwen Xu, Tri Cao, Nay Oo, Bryan Hooi, Shumin Deng
-
DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation
Ekta Balkrishna Gavas, Chinmay Hegde, Nasir Memon, Sudipta Banerjee
-
Taming Diffusion Transformer for Real-Time Mobile Video Generation
Yushu Wu, Yanyu Li, Anil Kag, Ivan Skorokhodov, Willi Menapace, Ke Ma, Arpit Sahni, Ju Hu, Aliaksandr Siarohin, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov
-
Training Transformers with Enforced Lipschitz Constants
Laker Newhouse, R. Preston Hess, Franz Cesista, Andrii Zahorodnii, Jeremy Bernstein, Phillip Isola
-
Architectural Backdoors in Deep Learning: A Survey of Vulnerabilities, Detection, and Defense
Victoria Childress, Josh Collyer, Jodie Knapp
-
MAD-Spear: A Conformity-Driven Prompt Injection Attack on Multi-Agent Debate Systems
Yu Cui, Hongyang Du
-
IConMark: Robust Interpretable Concept-Based Watermark For AI Images
Vinu Sankar Sadasivan, Mehrdad Saberi, Soheil Feizi
-
Paper Summary Attack: Jailbreaking LLMs through LLM Safety Papers
Liang Lin, Zhihao Xu, Xuehai Tang, Shi Liu, Biyu Zhou, Fuqing Zhu, Jizhong Han, Songlin Hu
-
Revisiting LLM Value Probing Strategies: Are They Robust and Expressive?
Siqi Shen, Mehar Singh, Lajanugen Logeswaran, Moontae Lee, Honglak Lee, Rada Mihalcea
-
Fake or Real: The Impostor Hunt in Texts for Space Operations
Agata Kaczmarek, Dawid Płudowski, Piotr Wilczyński, Przemysław Biecek, Krzysztof Kotowski, Ramez Shendy, Jakub Nalepa, Artur Janicki, Evridiki Ntagiou
-
Spatial Frequency Modulation for Semantic Segmentation
Linwei Chen, Ying Fu, Lin Gu, Dezhi Zheng, Jifeng Dai
-
Robust Planning for Autonomous Vehicles with Diffusion-Based Failure Samplers
Juanran Wang, Marc R. Schlichting, Mykel J. Kochenderfer
-
InstructFLIP: Exploring Unified Vision-Language Model for Face Anti-spoofing
Kun-Hsiang Lin, Yu-Wen Tseng, Kang-Yang Huang, Jhih-Ciang Wu, Wen-Huang Cheng
-
Non-Adaptive Adversarial Face Generation
Sunpill Kim, Seunghun Paik, Chanwoo Hwang, Minsu Kim, Jae Hong Seo
-
Thought Purity: Defense Paradigm For Chain-of-Thought Attack
Zihao Xue, Zhen Bi, Long Ma, Zhenlin Hu, Yan Wang, Zhenfang Liu, Qing Sheng, Jie Xiao, Jungang Lou
-
LLMs Encode Harmfulness and Refusal Separately
Jiachen Zhao, Jing Huang, Zhengxuan Wu, David Bau, Weiyan Shi
-
Overview of the Sensemaking Task at the ELOQUENT 2025 Lab: LLMs as Teachers, Students and Evaluators
Pavel Šindelář, Ondřej Bojar
-
Nonlinear Concept Erasure: a Density Matching Approach
Antoine Saillenfest, Pirmin Lemberger
-
Sahid Hossain Mustakim, S M Jishanul Islam, Ummay Maria Muna, Montasir Chowdhury, Mohammed Jawwadul Islam, Sadia Ahmmed, Tashfia Sikder, Syed Tasdid Azam Dhrubo, Swakkhar Shatabda
-
FADE: Adversarial Concept Erasure in Flow Models
Zixuan Fu, Yan Ren, Finn Carter, Chenyue Wang, Ze Niu, Dacheng Yu, Emily Davis, Bo Zhang
-
Self-Adaptive and Robust Federated Spectrum Sensing without Benign Majority for Cellular Networks
Ngoc Duy Pham, Thusitha Dayaratne, Viet Vo, Shangqi Lai, Sharif Abuadbba, Hajime Suzuki, Xingliang Yuan, Carsten Rudolph
-
Bo Wen, Guoyun Gao, Zhicheng Xu, Ruibin Mao, Xiaojuan Qi, X. Sharon Hu, Xunzhao Yin, Can Li
-
A Bayesian Incentive Mechanism for Poison-Resilient Federated Learning
Daniel Commey, Rebecca A. Sarpong, Griffith S. Klogo, Winful Bagyl-Bac, Garth V. Crosby
-
Xiang Li, Yifan Lin, Yuanzhe Zhang
-
Rina Mishra, Gaurav Varshney
-
Benchmarking Deception Probes via Black-to-White Performance Boosts
Avi Parrack, Carlo Leonardo Attubato, Stefan Heimersheim
-
Safeguarding Federated Learning-based Road Condition Classification
Sheng Liu, Panos Papadimitratos
-
Minimalist Concept Erasure in Generative Models
Yang Zhang, Er Jin, Yanfei Dong, Yixuan Wu, Philip Torr, Ashkan Khakzar, Johannes Stegmaier, Kenji Kawaguchi
-
How to Protect Models against Adversarial Unlearning?
Patryk Jasiorski, Marek Klonowski, Michał Woźniak
-
Crafting Imperceptible On-Manifold Adversarial Attacks for Tabular Data
Zhipeng He, Alexander Stevens, Chun Ouyang, Johannes De Smedt, Alistair Barros, Catarina Moreira
-
The Devil behind the mask: An emergent safety vulnerability of Diffusion LLMs
Zichen Wen, Jiashu Qu, Dongrui Liu, Zhiyuan Liu, Ruixi Wu, Yicun Yang, Xiangqi Jin, Haoyun Xu, Xuyang Liu, Weijia Li, Chaochao Lu, Jing Shao, Conghui He, Linfeng Zhang
-
Multi-Trigger Poisoning Amplifies Backdoor Vulnerabilities in LLMs
Sanhanat Sivapiromrat, Caiqi Zhang, Marco Basaldella, Nigel Collier
-
What Should LLMs Forget? Quantifying Personal Data in LLMs for Right-to-Be-Forgotten Requests
Dimitri Staufer
-
Deepak Kumar Panda, Weisi Guo
-
Shao-Bo Lin, Xiaotong Liu, Yao Wang
-
Robust-Multi-Task Gradient Boosting
Seyedsaman Emami, Gonzalo Martínez-Muñoz, Daniel Hernández-Lobato
-
Yuan Yao, Jin Song, Jian Jin
-
Taemin Kim, James P. Bailey
-
Jailbreak-Tuning: Models Efficiently Learn Jailbreak Susceptibility
Brendan Murphy, Dillon Bowen, Shahrad Mohammadzadeh, Julius Broomfield, Adam Gleave, Kellin Pelrine
-
Subgraph Generation for Generalizing on Out-of-Distribution Links
Jay Revolinsky, Harry Shomer, Jiliang Tang
-
Challenges in GenAI and Authentication: a scoping review
Wesley dos Reis Bezerra, Lais Machado Bezerra, Carlos Becker Westphall
-
ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs
Daniel Commey, Benjamin Appiah, Griffith S. Klogo, Garth V. Crosby
-
Evasion Under Blockchain Sanctions
Endong Liu, Mark Ryan, Liyi Zhou, Pascal Berrang
-
Differentially Private Conformal Prediction via Quantile Binary Search
Ogonnaya M. Romanus, Roberto Molinari
-
Richard M. Charles, James H. Curry, Richard B. Charles
-
Secure Goal-Oriented Communication: Defending against Eavesdropping Timing Attacks
Federico Mason, Federico Chiariotti, Pietro Talli, Andrea Zanella
-
BlueGlass: A Framework for Composite AI Safety
Harshal Nandigramwar, Syed Qutub, Kay-Ulrich Scholl
-
Differentially Private Federated Low Rank Adaptation Beyond Fixed-Matrix
Ming Wen, Jiaqi Zhu, Yuedong Xu, Yipeng Zhou, Dingding Han
-
Learning Private Representations through Entropy-based Adversarial Training
Tassilo Klein, Moin Nabi
-
Logic layer Prompt Control Injection (LPCI): A Novel Security Vulnerability Class in Agentic Systems
Hammad Atta, Ken Huang, Manish Bhatt, Kamal Ahmed, Muhammad Aziz Ul Haq, Yasir Mehmood
-
Can You Detect the Difference?
İsmail Tarım, Aytuğ Onan
-
Mohammed Bouri, Adnane Saoud
-
Counterfactual Visual Explanation via Causally-Guided Adversarial Steering
Yiran Qiao, Disheng Liu, Yiren Lu, Yu Yin, Mengnan Du, Jing Ma
-
3DGAA: Realistic and Robust 3D Gaussian-based Adversarial Attack for Autonomous Driving
Yixun Zhang, Lizhi Wang, Junjun Zhao, Wending Zhao, Feng Zhou, Yonghao Dang, Jianqin Yin
-
Navigating the Challenges of AI-Generated Image Detection in the Wild: What Truly Matters?
Despina Konstantinidou, Dimitrios Karageorgiou, Christos Koutlis, Olga Papadopoulou, Emmanouil Schinas, Symeon Papadopoulos
-
Ben Hamscher, Edgar Heinert, Annika Mütze, Kira Maag, Matthias Rottmann
-
Kaleidoscopic Background Attack: Disrupting Pose Estimation with Multi-Fold Radial Symmetry Textures
Xinlong Ding, Hongwei Yu, Jiawei Li, Feifan Li, Yu Shang, Bochao Zou, Huimin Ma, Jiansheng Chen
-
Test-Time Canonicalization by Foundation Models for Robust Perception
Utkarsh Singhal, Ryan Feng, Stella X. Yu, Atul Prakash
-
On the Efficiency of Training Robust Decision Trees
Benedict Gerlach, Marie Anastacio, Holger H. Hoos
-
Mahmoud Bekhit, Ahmad Salah, Ahmed Salim Alrawahi, Tarek Attia, Ahmed Ali, Esraa Eldesokey, Ahmed Fathalla
-
Split Happens: Combating Advanced Threats with Split Learning and Function Secret Sharing
Tanveer Khan, Mindaugas Budzys, Antonis Michalas
-
Lixu Wang, Kaixiang Yao, Xinfeng Li, Dong Yang, Haoyang Li, Xiaofeng Wang, Wei Dong
-
HASSLE: A Self-Supervised Learning Enhanced Hijacking Attack on Vertical Federated Learning
Weiyang He, Chip-Hong Chang
-
BURN: Backdoor Unlearning via Adversarial Boundary Analysis
Yanghao Su, Jie Zhang, Yiming Li, Tianwei Zhang, Qing Guo, Weiming Zhang, Nenghai Yu, Nils Lukas, Wenbo Zhou
-
AdvGrasp: Adversarial Attacks on Robotic Grasping from a Physical Perspective
Xiaofei Wang, Mingliang Han, Tianyu Hao, Cegang Li, Yunbo Zhao, Keke Tang
-
Game Theory Meets LLM and Agentic AI: Reimagining Cybersecurity for the Age of Intelligent Threats
Quanyan Zhu
-
HKGAI-V1: Towards Regional Sovereign Large Language Model for Hong Kong
Sirui Han, Junqi Zhu, Ruiyuan Zhang, Yike Guo
-
Distributionally Robust Optimization with Adversarial Data Contamination
Shuyao Li, Ilias Diakonikolas, Jelena Diakonikolas
-
Formal Verification of Variational Quantum Circuits
Nicola Assolini, Luca Marzari, Isabella Mastroeni, Alessandra di Pierro
-
3S-Attack: Spatial, Spectral and Semantic Invisible Backdoor Attack Against DNN Models
Jianyao Yin, Luca Arnaboldi, Honglong Chen, Pascal Berrang
-
REAL-IoT: Characterizing GNN Intrusion Detection Robustness under Practical Adversarial Attack
Zhonghao Zhan, Huichi Zhou, Hamed Haddadi
-
ARMOR: Aligning Secure and Safe Large Language Models via Meticulous Reasoning
Zhengyue Zhao, Yingzi Ma, Somesh Jha, Marco Pavone, Chaowei Xiao
-
Optimal Debiased Inference on Privatized Data via Indirect Estimation and Parametric Bootstrap
Zhanyu Wang, Arin Chang, Jordan Awan
-
PRM-Free Security Alignment of Large Models via Red Teaming and Adversarial Training
Pengfei Du
-
DRAGD: A Federated Unlearning Data Reconstruction Attack Based on Gradient Differences
Bocheng Ju, Junchao Fan, Jiaqi Liu, Xiaolin Chang
-
Conformal Prediction for Privacy-Preserving Machine Learning
Alexander David Balinsky, Dominik Krzeminski, Alexander Balinsky
-
Large Language Models Encode Semantics in Low-Dimensional Linear Subspaces
Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, Amin Karbasi
-
Efficient Private Inference Based on Helper-Assisted Malicious Security Dishonest Majority MPC
Kaiwen Wang, Yuehan Dong, Junchao Fan, Xiaolin Chang
-
LaSM: Layer-wise Scaling Mechanism for Defending Pop-up Attack on GUI Agents
Zihe Yan, Zhuosheng Zhang
-
Ronghua Shi, Yiou Liu, Xinyu Ying, Yang Tan, Yuchun Feng, Lynn Ai, Bill Shi, Xuhui Wang, Zhuang Liu
-
LLM-Stackelberg Games: Conjectural Reasoning Equilibria and Their Applications to Spearphishing
Quanyan Zhu
-
Santhosh Kumar Ravindran
-
ClaritySpeech: Dementia Obfuscation in Speech
Dominika Woszczyk, Ranya Aloufi, Soteris Demetriou
-
On the Fragility of Multimodal Perception to Temporal Misalignment in Autonomous Driving
Md Hasan Shahriar, Md Mohaimin Al Barat, Harshavardhan Sundar, Naren Ramakrishnan, Y. Thomas Hou, Wenjing Lou
-
Digital Twin-Assisted Explainable AI for Robust Beam Prediction in mmWave MIMO Systems
Nasir Khan, Asmaa Abdallah, Abdulkadir Celik, Ahmed M. Eltawil, Sinem Coleri
-
Agent Safety Alignment via Reinforcement Learning
Zeyang Sha, Hanling Tian, Zhuoer Xu, Shiwen Cui, Changhua Meng, Weiqiang Wang
-
Lightweight Safety Guardrails via Synthetic Data and RL-guided Adversarial Training
Aleksei Ilin, Gor Matevosyan, Xueying Ma, Vladimir Eremin, Suhaa Dada, Muqun Li, Riyaaz Shaik, Haluk Noyan Tokgozoglu
-
Invariant-based Robust Weights Watermark for Large Language Models
Qingxiao Guo, Xinjie Zhu, Yilong Ma, Hui Jin, Yunhao Wang, Weifeng Zhang, Xiaobing Guo
-
One Token to Fool LLM-as-a-Judge
Yulai Zhao, Haolin Liu, Dian Yu, S.Y. Kung, Haitao Mi, Dong Yu
-
Junxue Yang, Xin Liao, Weixuan Tang, Jianhua Yang, Zheng Qin
-
Peter Crowley, Zachary Serlin, Tyler Paine, Makai Mann, Michael Benjamin, Calin Belta
-
Admissibility of Stein Shrinkage for Batch Normalization in the Presence of Adversarial Attacks
Sofia Ivolgina, P. Thomas Fletcher, Baba C. Vemuri
-
Entangled Threats: A Unified Kill Chain Model for Quantum Machine Learning Security
Pascal Debus, Maximilian Wendlinger, Kilian Tscharke, Daniel Herr, Cedric Brügmann, Daniel Ohl de Mello, Juris Ulmanis, Alexander Erhard, Arthur Schmidt, Fabian Petsch
-
Detecting Deepfake Talking Heads from Facial Biometric Anomalies
Justin D. Norman, Hany Farid
-
VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models
Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Déforges
-
Exploiting Leaderboards for Large-Scale Distribution of Malicious Models
Anshuman Suri, Harsh Chaudhari, Yuefeng Peng, Ali Naseh, Amir Houmansadr, Alina Oprea
-
When and Where do Data Poisons Attack Textual Inversion?
Jeremy Styborski, Mingzhi Lyu, Jiayou Lu, Nupur Kapur, Adams Kong
-
$\texttt{Droid}$: A Resource Suite for AI-Generated Code Detection
Daniil Orel, Indraneil Paul, Iryna Gurevych, Preslav Nakov
-
Nishit V. Pandya, Andrey Labunets, Sicun Gao, Earlence Fernandes
-
OPC: One-Point-Contraction Unlearning Toward Deep Feature Forgetting
Jaeheun Jung, Bosung Jung, Suhyun Bae, Donghun Lee
-
Mitigating Watermark Stealing Attacks in Generative Models via Multi-Key Watermarking
Toluwani Aremu, Noor Hussein, Munachiso Nwadike, Samuele Poppi, Jie Zhang, Karthik Nandakumar, Neil Gong, Nils Lukas
-
Low Resource Reconstruction Attacks Through Benign Prompts
Sol Yarkoni, Roi Livni
-
Dominykas Seputis, Yongkang Li, Karsten Langerak, Serghei Mihailov
-
GuardVal: Dynamic Large Language Model Jailbreak Evaluation for Comprehensive Safety Testing
Peiyan Zhang, Haibo Jin, Liying Kang, Haohan Wang
-
Qiangqiang Wu, Yi Yu, Chenqi Kong, Ziquan Liu, Jia Wan, Haoliang Li, Alex C. Kot, Antoni B. Chan
-
Jiale Zhao, Xinyang Jiang, Junyao Gao, Yuhao Xue, Cairong Zhao
-
SCOOTER: A Human Evaluation Framework for Unrestricted Adversarial Examples
Dren Fazlija, Monty-Maximilian Zühlke, Johanna Schrader, Arkadij Orlov, Clara Stein, Iyiola E. Olatunji, Daniel Kudenko
-
TRIX- Trading Adversarial Fairness via Mixed Adversarial Training
Tejaswini Medi, Steffen Jung, Margret Keuper
-
Rainbow Artifacts from Electromagnetic Signal Injection Attacks on Image Sensors
Youqian Zhang, Xinyu Ji, Zhihao Wang, Qinhong Jiang
-
Defending Against Prompt Injection With a Few DefensiveTokens
Sizhe Chen, Yizhu Wang, Nicholas Carlini, Chawin Sitawarin, David Wagner
-
A Dynamic Stackelberg Game Framework for Agentic AI Defense Against LLM Jailbreaking
Zhengye Han, Quanyan Zhu
-
Ming Wang, Zhaoyang Duan, Dong Xue, Fangzhou Liu, Zhongheng Zhang
-
Quantum Properties Trojans (QuPTs) for Attacking Quantum Neural Networks
Sounak Bhowmik, Travis S. Humble, Himanshu Thapliyal
-
Simple Mechanistic Explanations for Out-Of-Context Reasoning
Atticus Wang, Joshua Engels, Oliver Clive-Griffin
-
Frederick Shpilevskiy, Saiyue Lyu, Krishnamurthy Dj Dvijotham, Mathias Lécuyer, Pierre-André Noël
-
EvA: Evolutionary Attacks on Graphs
Mohammad Sadegh Akhondzadeh, Soroush H. Zargarbashi, Jimin Cao, Aleksandar Bojchevski
-
Beyond the Worst Case: Extending Differential Privacy Guarantees to Realistic Adversaries
Marika Swanberg, Meenatchi Sundaram Muthu Selva Annamalai, Jamie Hayes, Borja Balle, Adam Smith
-
Towards Privacy-Preserving and Personalized Smart Homes via Tailored Small Language Models
Xinyu Huang, Leming Shen, Zijing Ma, Yuanqing Zheng
-
Exploiting Edge Features for Transferable Adversarial Attacks in Distributed Machine Learning
Giulio Rossolini, Fabio Brau, Alessandro Biondi, Battista Biggio, Giorgio Buttazzo
-
The Dark Side of LLMs: Agent-based Attacks for Complete Computer Takeover
Matteo Lupinacci, Francesco Aurelio Pironti, Francesco Blefari, Francesco Romeo, Luigi Arena, Angelo Furfaro
-
Renyang Liu, Guanlin Li, Tianwei Zhang, See-Kiong Ng
-
Concept Unlearning by Modeling Key Steps of Diffusion Process
Chaoshuo Zhang, Chenhao Lin, Zhengyu Zhao, Le Yang, Qian Wang, Chao Shen
-
An attention-aware GNN-based input defender against multi-turn jailbreak on LLMs
Zixuan Huang, Kecheng Huang, Lihao Yin, Bowei He, Huiling Zhen, Mingxuan Yuan, Zili Shao
-
Dongyu Wei, Xiaoren Xu, Yuchen Liu, H. Vincent Poor, Mingzhe Chen
-
Sarah Ball, Greg Gluch, Shafi Goldwasser, Frauke Kreuter, Omer Reingold, Guy N. Rothblum
-
Privacy-Utility-Fairness: A Balanced Approach to Vehicular-Traffic Management System
Poushali Sengupta, Sabita Maharjan, Frank Eliassen, Yan Zhang
-
RAG Safety: Exploring Knowledge Poisoning Attacks to Retrieval-Augmented Generation
Tianzhe Zhao, Jiaoyan Chen, Yanchi Ru, Haiping Zhu, Nan Hu, Jun Liu, Qika Lin
-
VisualTrap: A Stealthy Backdoor Attack on GUI Agents via Visual Grounding Manipulation
Ziang Ye, Yang Zhang, Wentao Shi, Xiaoyu You, Fuli Feng, Tat-Seng Chua
-
DATABench: Evaluating Dataset Auditing in Deep Learning from an Adversarial Perspective
Shuo Shao, Yiming Li, Mengren Zheng, Zhiyang Hu, Yukun Chen, Boheng Li, Yu He, Junfeng Guo, Tianwei Zhang, Dacheng Tao, Zhan Qin
-
How Not to Detect Prompt Injections with an LLM
Sarthak Choudhary, Divyam Anshumaan, Nils Palumbo, Somesh Jha
-
TuneShield: Mitigating Toxicity in Conversational AI while Fine-tuning on Untrusted Data
Aravind Cheruvu, Shravya Kanchi, Sifat Muhammad Abdullah, Nicholas Kong, Daphne Yao, Murtuza Jadliwala, Bimal Viswanath
-
Xiaohu Li, Yunfeng Ning, Zepeng Bao, Mayi Xu, Jianhao Chen, Tieyun Qian
-
Gabriel Chua, Leanne Tan, Ziyu Ge, Roy Ka-Wei Lee
-
The Landscape of Memorization in LLMs: Mechanisms, Measurement, and Mitigation
Alexander Xiong, Xuandong Zhao, Aneesh Pappu, Dawn Song
-
ScoreAdv: Score-based Targeted Generation of Natural Adversarial Examples via Diffusion Models
Chihan Huang, Hao Tang
-
On the Inherent Privacy of Zeroth Order Projected Gradient Descent
Devansh Gupta, Meisam Razaviyayn, Vatsal Sharan
-
Asynchronous Event Error-Minimizing Noise for Safeguarding Event Dataset
Ruofei Wang, Peiqi Duan, Boxin Shi, Renjie Wan
-
Circumventing Safety Alignment in Large Language Models Through Embedding Space Toxicity Attenuation
Zhibo Zhang, Yuxi Li, Kailong Wang, Shuai Yuan, Ling Shi, Haoyu Wang
-
The Safety Gap Toolkit: Evaluating Hidden Dangers of Open-Source Models
Ann-Kathrin Dombrowski, Dillon Bowen, Adam Gleave, Chris Cundy
-
Poupak Azad, Jiahua Xu, Yebo Feng, Preston Strowbridge, Cuneyt Akcora
-
Trojan Horse Prompting: Jailbreaking Conversational Multimodal Models by Forging Assistant Message
Wei Duan, Li Qian
-
Losing Control: Data Poisoning Attack on Guided Diffusion via ControlNet
Raz Lapid, Almog Dubin
-
Sanyam Vyas, Alberto Caron, Chris Hicks, Pete Burnap, Vasilios Mavroudis
-
BackFed: An Efficient & Standardized Benchmark Suite for Backdoor Attacks in Federated Learning
Thinh Dao, Dung Thuy Nguyen, Khoa D Doan, Kok-Seng Wong
-
ICAS: Detecting Training Data from Autoregressive Image Generative Models
Hongyao Yu, Yixiang Qiu, Yiheng Yang, Hao Fang, Tianqu Zhuang, Jiaxin Hong, Bin Chen, Hao Wu, Shu-Tao Xia
-
The Hidden Threat in Plain Text: Attacking RAG Data Loaders
Alberto Castagnaro, Umberto Salviati, Mauro Conti, Luca Pajola, Simeone Pizzi
-
Response Attack: Exploiting Contextual Priming to Jailbreak Large Language Models
Ziqi Miao, Lijun Li, Yuan Xiong, Zhenhua Liu, Pengyu Zhu, Jing Shao
-
Robustifying 3D Perception through Least-Squares Multi-Agent Graphs Object Tracking
Maria Damanaki, Ioulia Kapsali, Nikos Piperigkos, Alexandros Gkillas, Aris S. Lalos
-
Yong Zhang, Feng Liang, Guanghu Yuan, Min Yang, Chengming Li, Xiping Hu
-
Cascade: Token-Sharded Private LLM Inference
Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal
-
CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
-
Subhabrata Majumdar, Brian Pendleton, Abhishek Gupta
-
Controlling What You Share: Assessing Language Model Adherence to Privacy Preferences
Guillem Ramírez, Alexandra Birch, Ivan Titov
-
Edward Raff, Karen Kukla, Michel Benaroch, Joseph Comprix
-
Bit-Flip Fault Attack: Crushing Graph Neural Networks via Gradual Bit Search
Sanaz Kazemi Abharian, Sai Manoj Pudukotai Dinakarrao
-
Towards integration of Privacy Enhancing Technologies in Explainable Artificial Intelligence
Sonal Allana, Rozita Dara, Xiaodong Lin, Pulei Xiong
-
Hijacking JARVIS: Benchmarking Mobile GUI Agents against Unprivileged Third Parties
Guohong Liu, Jialei Ye, Jiacheng Liu, Yuanchun Li, Wei Liu, Pengzhi Gao, Jian Luan, Yunxin Liu
-
Attention Slipping: A Mechanistic Understanding of Jailbreak Attacks and Defenses in LLMs
Xiaomeng Hu, Pin-Yu Chen, Tsung-Yi Ho
-
Model Inversion Attacks on Llama 3: Extracting PII from Large Language Models
Sathesh P. Sivashanmugam
-
DP-Fusion: Token-Level Differentially Private Inference for Large Language Models
Rushil Thareja, Preslav Nakov, Praneeth Vepakomma, Nils Lukas
-
Tail-aware Adversarial Attacks: A Distributional Approach to Efficient LLM Jailbreaking
Tim Beyer, Yan Scholten, Stephan Günnemann, Leo Schwinn
-
Mass-Scale Analysis of In-the-Wild Conversations Reveals Complexity Bounds on LLM Jailbreaking
Aldan Creo, Raul Castro Fernandez, Manuel Cebrian
-
Enhancing Robustness of LLM-Driven Multi-Agent Systems through Randomized Smoothing
Jinwei Hu, Yi Dong, Zhengtao Ding, Xiaowei Huang
-
Evaluating Adversarial Protections for Diffusion Personalization: A Comprehensive Study
Kai Ye, Tianyi Chen, Zhen Wang
-
Hierarchical Testing with Rabbit Optimization for Industrial Cyber-Physical Systems
Jinwei Hu, Zezhi Tang, Xin Jin, Benyuan Zhang, Yi Dong, Xiaowei Huang
-
Addressing The Devastating Effects Of Single-Task Data Poisoning In Exemplar-Free Continual Learning
Stanisław Pawlak, Bartłomiej Twardowski, Tomasz Trzciński, Joost van de Weijer
-
Ziming Hong, Runnan Chen, Zengmao Wang, Bo Han, Bo Du, Tongliang Liu
-
On Jailbreaking Quantized Language Models Through Fault Injection Attacks
Noureldin Zahran, Ahmad Tahmasivand, Ihsen Alouani, Khaled Khasawneh, Mohammed E. Fouda
-
De-Fake: Style based Anomaly Deepfake Detection
Sudev Kumar Padhi, Harshit Kumar, Umesh Kashyap, Sk. Subidh Ali
-
Evaluating the Evaluators: Trust in Adversarial Robustness Tests
Antonio Emanuele Cinà, Maura Pintor, Luca Demetrio, Ambra Demontis, Battista Biggio, Fabio Roli
-
Beyond Weaponization: NLP Security for Medium and Lower-Resourced Languages in Their Own Right
Heather Lent
-
Rectifying Adversarial Sample with Low Entropy Prior for Test-Time Defense
Lina Ma, Xiaowei Fu, Fuxiang Huang, Xinbo Gao, Lei Zhang
-
SecureT2I: No More Unauthorized Manipulation on AI Generated Images from Prompts
Xiaodong Wu, Xiangman Li, Qi Li, Jianbing Ni, Rongxing Lu
-
Action Robust Reinforcement Learning via Optimal Adversary Aware Policy Optimization
Buqing Nie, Yangqing Fu, Jingtian Ji, Yue Gao
-
Blackbox Dataset Inference for LLM
Ruikai Zhou, Kang Yang, Xun Chen, Wendy Hui Wang, Guanhong Tao, Jun Xu
-
When There Is No Decoder: Removing Watermarks from Stable Diffusion Models in a No-box Setting
Xiaodong Wu, Tianyi Tang, Xiangman Li, Jianbing Ni, Yong Yu
-
De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks
Wei Fan, Kejiang Chen, Chang Liu, Weiming Zhang, Nenghai Yu
-
Meta SecAlign: A Secure Foundation LLM Against Prompt Injection Attacks
Sizhe Chen, Arman Zharmagambetov, David Wagner, Chuan Guo
-
Is Reasoning All You Need? Probing Bias in the Age of Reasoning Language Models
Riccardo Cantini, Nicola Gabriele, Alessio Orsino, Domenico Talia
-
LLM Hypnosis: Exploiting User Feedback for Unauthorized Knowledge Injection to All Users
Almog Hilel, Idan Shenfeld, Leshem Choshen, Jacob Andreas
-
Visual Contextual Attack: Jailbreaking MLLMs with Image-Driven Context Injection
Ziqi Miao, Yi Ding, Lijun Li, Jing Shao
-
Fluid Democracy in Federated Data Aggregation
Aditya Vema Reddy Kesari, Krishna Reddy Kesari
-
PII Jailbreaking in LLMs via Activation Steering Reveals Personal Information Leakage
Krishna Kanth Nakka, Xue Jiang, Xuebing Zhou
-
On the Mathematical Impossibility of Safe Universal Approximators
Jasper Yao
-
Adversarial Manipulation of Reasoning Models using Internal Representations
Kureha Yamaguchi, Benjamin Etheridge, Andy Arditi
-
Adopting a human developmental visual diet yields robust, shape-based AI vision
Zejin Lu, Sushrut Thorat, Radoslaw M Cichy, Tim C Kietzmann
-
Rethinking Data Protection in the (Generative) Artificial Intelligence Era
Yiming Li, Shuo Shao, Yu He, Junfeng Guo, Tianwei Zhang, Zhan Qin, Pin-Yu Chen, Michael Backes, Philip Torr, Dacheng Tao, Kui Ren
-
CyberRAG: An Agentic RAG cyber attack classification and reporting tool
Francesco Blefari, Cristian Cosentino, Francesco Aurelio Pironti, Angelo Furfaro, Fabrizio Marozzo
-
ICLShield: Exploring and Mitigating In-Context Learning Backdoor Attacks
Zhiyao Ren, Siyuan Liang, Aishan Liu, Dacheng Tao
-
Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems
Quentin Le Roux, Yannick Teglia, Teddy Furon, Philippe Loubet-Moundi, Eric Bourbao
-
GPT, But Backwards: Exactly Inverting Language Model Outputs
Adrians Skapars, Edoardo Manino, Youcheng Sun, Lucas C. Cordeiro
-
Tuning without Peeking: Provable Privacy and Generalization Bounds for LLM Post-Training
Ismail Labiad, Mathurin Videau, Matthieu Kowalski, Marc Schoenauer, Alessandro Leite, Julia Kempe, Olivier Teytaud
-
Are Vision Transformer Representations Semantically Meaningful? A Case Study in Medical Imaging
Montasir Shams, Chashi Mahiul Islam, Shaeke Salman, Phat Tran, Xiuwen Liu
-
3D Gaussian Splatting Driven Multi-View Robust Physical Adversarial Camouflage Generation
Tianrui Lou, Xiaojun Jia, Siyuan Liang, Jiawei Liang, Ming Zhang, Yanjun Xiao, Xiaochun Cao
-
Gradient Short-Circuit: Efficient Out-of-Distribution Detection via Feature Intervention
Jiawei Gu, Ziyue Qiao, Zechao Li
-
Boosting Adversarial Transferability Against Defenses via Multi-Scale Transformation
Zihong Guo, Chen Wan, Yayin Zheng, Hailing Kuang, Xiaohai Lu
-
SafePTR: Token-Level Jailbreak Defense in Multimodal LLMs via Prune-then-Restore Mechanism
Beitao Chen, Xinyu Lyu, Lianli Gao, Jingkuan Song, Heng Tao Shen
-
Graph Representation-based Model Poisoning on Federated LLMs in CyberEdge Networks
Hanlin Cai, Haofan Dong, Houtianfu Wang, Kai Li, Ozgur B. Akan
-
Towards Better Attribute Inference Vulnerability Measures
Paul Francis, David Wagner
-
Subversion via Focal Points: Investigating Collusion in LLM Monitoring
Olli Järviniemi
-
Following the Clues: Experiments on Person Re-ID using Cross-Modal Intelligence
Robert Aufschläger, Youssef Shoeb, Azarm Nowzad, Michael Heigl, Fabian Bally, Martin Schramm
-
PNAct: Crafting Backdoor Attacks in Safe Reinforcement Learning
Weiran Guo, Guanjun Liu, Ziyuan Zhou, Ling Wang
-
BadViM: Backdoor Attack against Vision Mamba
Yinghao Wu, Liyan Zhang
-
CAVALRY-V: A Large-Scale Generator Framework for Adversarial Attacks on Video MLLMs
Jiaming Zhang, Rui Hu, Qing Guo, Wei Yang Bryan Lim
-
Reasoning as an Adaptive Defense for Safety
Taeyoun Kim, Fahim Tajwar, Aditi Raghunathan, Aviral Kumar
-
Cage-Based Deformation for Transferable and Undefendable Point Cloud Attack
Keke Tang, Ziyong Du, Weilong Peng, Xiaofei Wang, Peican Zhu, Ligang Liu, Zhihong Tian
-
Find a Scapegoat: Poisoning Membership Inference Attack and Defense to Federated Learning
Wenjin Mo, Zhiyuan Li, Minghong Fang, Mingwei Fang
-
Annika M Schoene, Cansu Canca
-
Xingke Yang, Liang Li, Zhiyi Wan, Sicong Li, Xiaoqi Qi, Jiang Liu, Tomoaki Ohtsuki, Xin Fu, Miao Pan
-
Yimin Dou, Xinming Wu, Nathan L Bangs, Harpreet Singh Sethi, Jintao Li, Hang Gao, Zhixiang Guo
-
Evaluating Multi-Agent Defences Against Jailbreaking Attacks on Large Language Models
Maria Carolina Cornelia Wit, Jun Pang
-
Xiao Li, Yiming Zhu, Yifan Huang, Wei Zhang, Yingzhe He, Jie Shi, Xiaolin Hu
-
SoK: Semantic Privacy in Large Language Models
Baihe Ma, Yanna Jiang, Xu Wang, Guangshen Yu, Qin Wang, Caijun Sun, Chen Li, Xuelei Qi, Ying He, Wei Ni, Ren Ping Liu
-
AutoEvoEval: An Automated Framework for Evolving Close-Ended LLM Evaluation Data
JiaRu Wu, Mingwei Liu
-
STACK: Adversarial Attacks on LLM Safeguard Pipelines
Ian R. McKenzie, Oskar J. Hollinsworth, Tom Tseng, Xander Davies, Stephen Casper, Aaron D. Tucker, Robert Kirk, Adam Gleave
-
SQUASH: A SWAP-Based Quantum Attack to Sabotage Hybrid Quantum Neural Networks
Rahul Kumar, Wenqi Wei, Ying Mao, Junaid Farooq, Ying Wang, Juntao Chen
-
Robustness of Misinformation Classification Systems to Adversarial Examples Through BeamAttack
Arnisa Fazla, Lucas Krauter, David Guzman Piedrahita, Andrianos Michail
-
AdFair-CLIP: Adversarial Fair Contrastive Language-Image Pre-training for Chest X-rays
Chenlang Yi, Zizhan Xiong, Qi Qi, Xiyuan Wei, Girish Bathla, Ching-Long Lin, Bobak Jack Mortazavi, Tianbao Yang
-
Gaozheng Pei, Ke Ma, Dongpeng Zhang, Chengzhi Sun, Qianqian Xu, Qingming Huang
-
A Scalable Approach for Safe and Robust Learning via Lipschitz-Constrained Networks
Zain ul Abdeen, Vassilis Kekatos, Ming Jin
-
Tim Roith, Leon Bungert, Philipp Wacker
-
Jiahui Wu, Fucai Luo, Tiecheng Sun, Haiyan Wang, Weizhe Zhang
-
Poisoning Attacks to Local Differential Privacy for Ranking Estimation
Pei Zhan, Peng Tang, Yangzhuo Li, Puwen Wei, Shanqing Guo
-
Impact of Fine-Tuning Methods on Memorization in Large Language Models
Jie Hou, Chuxiong Wu, Lannan Luo, Qiang Zeng
-
Peilin He, James Joshi
-
Concept-based Adversarial Attack: a Probabilistic Perspective
Andi Zhang, Xuan Ding, Steven McDonagh, Samuel Kaski
-
From Prompt Injections to Protocol Exploits: Threats in LLM-Powered AI Agents Workflows
Mohamed Amine Ferrag, Norbert Tihanyi, Djallel Hamouda, Leandros Maglaras, Merouane Debbah
-
Securing AI Systems: A Guide to Known Attacks and Impacts
Naoto Kiribuchi, Kengo Zenitani, Takayuki Semitsu
-
TuCo: Measuring the Contribution of Fine-Tuning to Individual Responses of LLMs
Felipe Nuti, Tim Franzmeyer, João Henriques
-
Trident: Detecting Face Forgeries with Adversarial Triplet Learning
Mustafa Hakan Kara, Aysegul Dundar, Uğur Güdükbay
-
Forget-MI: Machine Unlearning for Forgetting Multimodal Information in Healthcare Settings
Shahad Hardan, Darya Taratynova, Abdelmajid Essofi, Karthik Nandakumar, Mohammad Yaqub
-
A Practical and Secure Byzantine Robust Aggregator
De Zhang Lee, Aashish Kolluri, Prateek Saxena, Ee-Chien Chang
-
A Representation Engineering Perspective on the Effectiveness of Multi-Turn Jailbreaks
Blake Bullwinkel, Mark Russinovich, Ahmed Salem, Santiago Zanella-Beguelin, Daniel Jones, Giorgio Severi, Eugenia Kim, Keegan Hines, Amanda Minnich, Yonatan Zunger, Ram Shankar Siva Kumar
-
Mirror Descent Policy Optimisation for Robust Constrained Markov Decision Processes
David Bossens, Atsushi Nitanda
-
Anmin Fu, Fanyu Meng, Huaibing Peng, Hua Ma, Zhi Zhang, Yifeng Zheng, Willy Susilo, Yansong Gao
-
Smaller = Weaker? Benchmarking Robustness of Quantized LLMs in Code Generation
Sen Fang, Weiyuan Ding, Antonio Mastropaolo, Bowen Xu
-
Oguzhan Baser, Ahmet Ege Tanriverdi, Sriram Vishwanath, Sandeep P. Chinchali
-
Oguzhan Baser, Ahmet Ege Tanriverdi, Kaan Kale, Sandeep P. Chinchali, Sriram Vishwanath
-
Concept Pinpoint Eraser for Text-to-image Diffusion Models via Residual Attention Gate
Byung Hyun Lee, Sungjin Lim, Seunggyu Lee, Dong Un Kang, Se Young Chun
-
Shreyas Dixit, Ashhar Aziz, Shashwat Bajpai, Vasu Sharma, Aman Chadha, Vinija Jain, Amitava Das
-
Atharv Mittal, Agam Pandey, Amritanshu Tiwari, Sukrit Jindal, Swadesh Swain
-
Zain ul Abdeen, Ming Jin
-
Yueyang Li, Shengyu Gong, Weiming Zeng, Nizhuan Wang, Wai Ting Siok
-
On the Feasibility of Poisoning Text-to-Image AI Models via Adversarial Mislabeling
Stanley Wu, Ronik Bhaskar, Anna Yoo Jeong Ha, Shawn Shan, Haitao Zheng, Ben Y. Zhao
-
Mohamed Ahmed, Mohamed Abdelmouty, Mingyu Kim, Gunvanth Kandula, Alex Park, James C. Davis
-
ARMOR: Robust Reinforcement Learning-based Control for UAVs under Physical Attacks
Pritam Dash, Ethan Chan, Nathan P. Lawrence, Karthik Pattabiraman
-
Adversarial Threats in Quantum Machine Learning: A Survey of Attacks and Defenses
Archisman Ghosh, Satwik Kundu, Swaroop Ghosh
-
VERA: Variational Inference Framework for Jailbreaking Large Language Models
Anamika Lochab, Lu Yan, Patrick Pynadath, Xiangyu Zhang, Ruqi Zhang
-
Are Fast Methods Stable in Adversarially Robust Transfer Learning?
Joshua C. Zhao, Saurabh Bagchi
-
Boyuan Chen, Minghao Shao, Abdul Basit, Siddharth Garg, Muhammad Shafique
-
Deepak Kumar Panda, Weisi Guo
-
Deepak Kumar Panda, Adolfo Perrusquia, Weisi Guo
-
TITAN: Query-Token based Domain Adaptive Adversarial Learning
Tajamul Ashraf, Janibul Bashir
-
Boosting Generative Adversarial Transferability with Self-supervised Vision Transformer Features
Shangbo Wu, Yu-an Tan, Ruinan Ma, Wencong Ma, Dehua Zhu, Yuanzhang Li
-
GANet-Seg: Adversarial Learning for Brain Tumor Segmentation with Hybrid Generative Models
Qifei Cui, Xinyu Lu
-
Generative Adversarial Evasion and Out-of-Distribution Detection for UAV Cyber-Attacks
Deepak Kumar Panda, Weisi Guo
-
CodeGuard: A Generalized and Stealthy Backdoor Watermarking for Generative Code Models
Haoxuan Li, Jiale Zhang, Xiaobing Sun, Xiapu Luo
-
SPA: Towards More Stealth and Persistent Backdoor Attacks in Federated Learning
Chengcheng Zhu, Ye Li, Bosen Rao, Jiale Zhang, Yunlong Mao, Sheng Zhong
-
PrivacyGo: Privacy-Preserving Ad Measurement with Multidimensional Intersection
Jian Du, Haohao Qian, Shikun Zhang, Wen-jie Lu, Donghang Lu, Yongchuan Niu, Bo Jiang, Yongjun Zhao, Qiang Yan
-
AgentStealth: Reinforcing Large Language Model for Anonymizing User-generated Text
Chenyang Shao, Tianxing Li, Chenhao Pu, Fengli Xu, Yong Li
-
A Survey on Model Extraction Attacks and Defenses for Large Language Models
Kaixiang Zhao, Lincan Li, Kaize Ding, Neil Zhenqiang Gong, Yue Zhao, Yushun Dong
-
Balancing Privacy and Utility in Correlated Data: A Study of Bayesian Differential Privacy
Martin Lange, Patricia Guerra-Balboa, Javier Parra-Arnau, Thorsten Strufe
-
Mohammad Mahdi Maheri, Denys Herasymuk, Hamed Haddadi
-
Manyi Li, Renshuai Tao, Yufan Liu, Chuangchuang Tan, Haotong Qin, Bing Li, Yunchao Wei, Yao Zhao
-
Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS
Sabrine Ennaji, Elhadj Benkhelifa, Luigi V. Mancini
-
InvZW: Invariant Feature Learning via Noise-Adversarial Training for Robust Image Zero-Watermarking
Abdullah All Tanvir, Xin Zhong
-
AdvMIM: Adversarial Masked Image Modeling for Semi-Supervised Medical Image Segmentation
Lei Zhu, Jun Zhou, Rick Siow Mong Goh, Yong Liu
-
Hear No Evil: Detecting Gradient Leakage by Malicious Servers in Federated Learning
Fei Wang, Baochun Li
-
Autonomous Cyber Resilience via a Co-Evolutionary Arms Race within a Fortified Digital Twin Sandbox
Malikussaid, Sutiyo
-
Berkay Kemal Balioglu, Alireza Khodaie, Mehmet Emre Gursoy
-
Poster: Enhancing GNN Robustness for Network Intrusion Detection via Agent-based Analysis
Zhonghao Zhan, Huichi Zhou, Hamed Haddadi
-
Leaner Training, Lower Leakage: Revisiting Memorization in LLM Fine-Tuning with LoRA
Fei Wang, Baochun Li
-
Universal and Efficient Detection of Adversarial Data through Nonuniform Impact on Network Layers
Furkan Mumcu, Yasin Yilmaz
-
On the Necessity of Output Distribution Reweighting for Effective Class Unlearning
Yian Wang, Ali Ebrahimpour-Boroojeny, Hari Sundaram
-
SABRE-FL: Selective and Accurate Backdoor Rejection for Federated Prompt Learning
Momin Ahmad Khan, Yasra Chandio, Fatima Muhammad Anwar
-
VSF-Med: A Vulnerability Scoring Framework for Medical Vision-Language Models
Binesh Sadanandan, Vahid Behzadan
-
RedCoder: Automated Multi-Turn Red Teaming for Code LLMs
Wenjie Jacky Mo, Qin Liu, Xiaofei Wen, Dongwon Jung, Hadi Askari, Wenxuan Zhou, Zhe Zhao, Muhao Chen
-
On Convolutions, Intrinsic Dimension, and Diffusion Models
Kin Kwan Leung, Rasa Hosseinzadeh, Gabriel Loaiza-Ganem
-
Automated Detection of Pre-training Text in Black-box LLMs
Ruihan Hu, Yu-Ming Shang, Jiankun Peng, Wei Luo, Yazhe Wang, Xi Zhang
-
Recalling The Forgotten Class Memberships: Unlearned Models Can Be Noisy Labelers to Leak Privacy
Zhihao Sui, Liang Hu, Jian Cao, Dora D. Liu, Usman Naseem, Zhongyuan Lai, Qi Zhang
-
Jinwen He, Yiyang Lu, Zijin Lin, Kai Chen, Yue Zhao
-
MSR-Align: Policy-Grounded Multimodal Alignment for Safety-Aware Reasoning in Vision-Language Models
Yinan Xia, Yilei Jiang, Yingshui Tan, Xiaoyong Zhu, Xiangyu Yue, Bo Zheng
-
Self-Paced Collaborative and Adversarial Network for Unsupervised Domain Adaptation
Weichen Zhang, Dong Xu, Wanli Ouyang, Wen Li
-
Assessing Risk of Stealing Proprietary Models for Medical Imaging Tasks
Ankita Raj, Harsh Swaika, Deepankar Varma, Chetan Arora
-
Identifying Physically Realizable Triggers for Backdoored Face Recognition Networks
Ankita Raj, Ambar Pal, Chetan Arora
-
Yunsung Chung, Yunbei Zhang, Nassir Marrouche, Jihun Hamm
-
Adversarial Attacks on Deep Learning-Based False Data Injection Detection in Differential Relays
Ahmad Mohammad Saber, Aditi Maheshwari, Amr Youssef, Deepa Kundur
-
Network Structures as an Attack Surface: Topology-Based Privacy Leakage in Federated Learning
Murtaza Rangwala, Richard O. Sinnott, Rajkumar Buyya
-
KnowML: Improving Generalization of ML-NIDS with Attack Knowledge Graphs
Xin Fan Guo, Albert Merono Penuela, Sergio Maffeis, Fabio Pierazzi
-
Wanli Peng, Xin Chen, Hang Fu, XinYu He, Xue Yiming, Juan Wen
-
RepuNet: A Reputation System for Mitigating Malicious Clients in DFL
Isaac Marroqui Penalva, Enrique Tomás Martínez Beltrán, Manuel Gil Pérez, Alberto Huertas Celdrán
-
Diffusion-based Task-oriented Semantic Communications with Model Inversion Attack
Xuesong Wang, Mo Li, Xingyan Shi, Zhaoqian Liu, Shenghao Yang
-
Linghui Zhu, Yiming Li, Haiqin Weng, Yan Liu, Tianwei Zhang, Shu-Tao Xia, Zhi Wang
-
Robust Behavior Cloning Via Global Lipschitz Regularization
Shili Wu, Yizhao Jin, Puhua Niu, Aniruddha Datta, Sean B. Andersson
-
Model Guidance via Robust Feature Attribution
Mihnea Ghitu, Vihari Piratla, Matthew Wicker
-
Semantic Structure-Aware Generative Attacks for Enhanced Adversarial Transferability
Jongoh Jeong, Hunmin Yang, Jaeseok Jeong, Kuk-Jin Yoon
-
Junchao Fan, Xuyang Lei, Xiaolin Chang
-
Security Assessment of DeepSeek and GPT Series Models against Jailbreak Attacks
Xiaodong Wu, Xiangman Li, Jianbing Ni
-
SpaNN: Detecting Multiple Adversarial Patches on CNNs by Spanning Saliency Thresholds
Mauricio Byrd Victorica, György Dán, Henrik Sandberg
-
Xin An, Ruijie Li, Qiao Ning, Shikai Guo, Hui Li, Qian Ma
-
Multi-Agent Online Control with Adversarial Disturbances
Anas Barakat, John Lazarsfeld, Georgios Piliouras, Antonios Varvitsiotis
-
DUMB and DUMBer: Is Adversarial Training Worth It in the Real World?
Francesco Marchiori, Marco Alecci, Luca Pajola, Mauro Conti
-
Amplifying Machine Learning Attacks Through Strategic Compositions
Yugeng Liu, Zheng Li, Hai Huang, Michael Backes, Yang Zhang
-
Enhancing Security in LLM Applications: A Performance Evaluation of Early Detection Systems
Valerii Gakh, Hayretdin Bahsi
-
Georgii Bychkov, Khaled Abud, Egor Kovalev, Alexander Gushchin, Dmitriy Vatolin, Anastasia Antsiferova
-
Towards Provable (In)Secure Model Weight Release Schemes
Xing Yang, Bingtao Wang, Yuhao Wang, Zimo Ji, Terry Jingchen Zhang, Wenyuan Jiang
-
Multi-turn Jailbreaking via Global Refinement and Active Fabrication
Hua Tang, Lingyong Yan, Yukun Zhao, Shuaiqiang Wang, Jizhou Huang, Dawei Yin
-
Huaiying Luo, Cheng Ji
-
Bugra Kilictas, Faruk Alpay
-
Quan Zhou, Gan Luo, Qiang Hu, Qingyong Zhang, Jinhua Zhang, Yinjiao Tian, Qiang Li, Zhiwei Wang
-
Jiaming Hu, Debarghya Mukherjee, Ioannis Ch. Paschalidis
-
Thomas Boudou, Batiste Le Bars, Nirupam Gupta, Aurélien Bellet
-
An Attack Method for Medical Insurance Claim Fraud Detection based on Generative Adversarial Network
Yining Pang, Chenghan Li
-
Exploiting Efficiency Vulnerabilities in Dynamic Deep Learning Systems
Ravishka Rathnasuriya, Wei Yang
-
Optimization-Free Patch Attack on Stereo Depth Estimation
Hangcheng Liu, Xu Kuang, Xingshuo Han, Xingwan Wu, Haoran Ou, Shangwei Guo, Xingyi Huang, Tao Xiang, Tianwei Zhang
-
CEGA: A Cost-Effective Approach for Graph-Based Model Extraction and Acquisition
Zebin Wang, Menghan Lin, Bolin Shen, Ken Anderson, Molei Liu, Tianxi Cai, Yushun Dong
-
Md. Kamrul Hossain, Walid Aljoby, Anis Elgabli, Ahmed M. Abdelmoniem, Khaled A. Harras
-
LastingBench: Defend Benchmarks Against Knowledge Leakage
Yixiong Fang, Tianran Sun, Yuling Shi, Min Wang, Xiaodong Gu
-
Yuping Yan, Yizhi Wang, Yuanshuai Li, Yaochu Jin
-
Kosuke Nakanishi, Akihiro Kubo, Yuji Yasui, Shin Ishii
-
MIST: Jailbreaking Black-box Large Language Models via Iterative Semantic Tuning
Muyang Zheng, Yuanzhi Yao, Changting Lin, Rui Wang, Meng Han
-
Robust Training with Data Augmentation for Medical Imaging Classification
Josué Martínez-Martínez, Olivia Brown, Mostafa Karami, Sheida Nabavi
-
Cross-Modal Obfuscation for Jailbreak Attacks on Large Vision-Language Models
Lei Jiang, Zixun Zhang, Zizhou Wang, Xiaobing Sun, Zhen Li, Liangli Zhen, Xiaohua Xu
-
Better Language Model Inversion by Compactly Representing Next-Token Distributions
Murtaza Nazir, Matthew Finlayson, John X. Morris, Xiang Ren, Swabha Swayamdipta
-
DepthVanish: Optimizing Adversarial Interval Structures for Stereo-Depth-Invisible Patches
Yun Xing, Yue Cao, Nhat Chung, Jie Zhang, Ivor Tsang, Ming-Ming Cheng, Yang Liu, Lei Ma, Qing Guo
-
Lorenzo Tausani, Paolo Muratore, Morgan B. Talbot, Giacomo Amerio, Gabriel Kreiman, Davide Zoccolan
-
Navigating the Deep: Signature Extraction on Deep Neural Networks
Haolin Liu, Adrien Siproudhis, Samuel Experton, Peter Lorenz, Christina Boura, Thomas Peyrin
-
Side Liu, Jiang Ming, Guodong Zhou, Xinyi Liu, Jianming Fu, Guojun Peng
-
CUBA: Controlled Untargeted Backdoor Attack against Deep Neural Networks
Yinghao Wu, Liyan Zhang
-
Differentiation-Based Extraction of Proprietary Data from Fine-Tuned LLMs
Zongjie Li, Daoyuan Wu, Shuai Wang, Zhendong Su
-
SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification
Zhenglin Lai, Mengyao Liao, Dong Xu, Zebin Zhao, Zhihang Yuan, Chao Fan, Jianqiang Li, Bingzhe Wu
-
A workflow for generating synthetic LiDAR datasets in simulation environments
Abhishek Phadke, Shakib Mahmud Dipto, Pratip Rana
-
Probing the Robustness of Large Language Models Safety to Latent Perturbations
Tianle Gu, Kexin Huang, Zongqi Wang, Yixu Wang, Jie Li, Yuanqi Yao, Yang Yao, Yujiu Yang, Yan Teng, Yingchun Wang
-
Dong Nguyen Tien, Dung D. Le
-
Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors
Riccardo Ziglio, Cecilia Pasquini, Silvio Ranise
-
Latent Noise Injection for Private and Statistically Aligned Synthetic Data Generation
Rex Shen, Lu Tian
-
PL-Guard: Benchmarking Language Model Safety for Polish
Aleksandra Krasnodębska, Karolina Seweryn, Szymon Łukasik, Wojciech Kusa
-
Biao Yi, Tiansheng Huang, Sishuo Chen, Tong Li, Zheli Liu, Zhixuan Chu, Yiming Li
-
Adversarial Attacks and Detection in Visual Place Recognition for Safer Robot Navigation
Connor Malone, Owen Claxton, Iman Shames, Michael Milford
-
MBA: Multimodal Bidirectional Attack for Referring Expression Segmentation Models
Xingbai Chen, Tingchao Fu, Renyang Liu, Wei Zhou, Chao Yi
-
Black-Box Privacy Attacks on Shared Representations in Multitask Learning
John Abascal, Nicolás Berrios, Alina Oprea, Jonathan Ullman, Adam Smith, Matthew Jagielski
-
Mr. Snuffleupagus at SemEval-2025 Task 4: Unlearning Factual Knowledge from LLMs Using Adaptive RMU
Arjun Dosajh, Mihika Sanghi
-
SecureFed: A Two-Phase Framework for Detecting Malicious Clients in Federated Learning
Likhitha Annapurna Kavuri, Akshay Mhatre, Akarsh K Nair, Deepti Gupta
-
Xinting Liao, Weiming Liu, Jiaming Qian, Pengyang Zhou, Jiahe Xu, Wenjie Wang, Chaochao Chen, Xiaolin Zheng, Tat-Seng Chua
-
From Teacher to Student: Tracking Memorization Through Model Distillation
Simardeep Singh
-
PRISON: Unmasking the Criminal Potential of Large Language Models
Xinyi Wu, Geng Hong, Pei Chen, Yueyue Chen, Xudong Pan, Min Yang
-
RAS-Eval: A Comprehensive Benchmark for Security Evaluation of LLM Agents in Real-World Environments
Yuchuan Fu, Xiaohan Yuan, Dongxia Wang
-
Pixel-level Certified Explanations via Randomized Smoothing
Alaa Anani, Tobias Lorenz, Mario Fritz, Bernt Schiele
-
LoX: Low-Rank Extrapolation Robustifies LLM Safety Against Fine-tuning
Gabriel J. Perin, Runjin Chen, Xuxi Chen, Nina S. T. Hirata, Zhangyang Wang, Junyuan Hong
-
Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers
Tommaso Green, Martin Gubri, Haritz Puerto, Sangdoo Yun, Seong Joon Oh
-
Approximating Language Model Training Data from Weights
John X. Morris, Junjie Oscar Yin, Woojeong Kim, Vitaly Shmatikov, Alexander M. Rush
-
Xuelin Shen, Jiayin Xu, Kangsheng Yin, Wenhan Yang
-
ImprovDML: Improved Trade-off in Private Byzantine-Resilient Distributed Machine Learning
Bing Liu, Chengcheng Zhao, Li Chai, Peng Cheng, Yaonan Wang
-
Enhancing One-run Privacy Auditing with Quantile Regression-Based Membership Inference
Terrance Liu, Matteo Boglioni, Yiwei Fu, Shengyuan Hu, Pratiksha Thaker, Zhiwei Steven Wu
-
Insights on Adversarial Attacks for Tabular Machine Learning via a Systematic Literature Review
Salijona Dyrmishi, Mohamed Djilani, Thibault Simonetto, Salah Ghamizi, Maxime Cordy
-
PDLRecover: Privacy-preserving Decentralized Model Recovery with Machine Unlearning
Xiangman Li, Xiaodong Wu, Jianbing Ni, Mohamed Mahmoud, Maazen Alsabaan
-
Yanxu Mao, Tiehan Cui, Peipei Liu, Datao You, Hongsong Zhu
-
Sysformer: Safeguarding Frozen Large Language Models with Adaptive System Prompts
Kartik Sharma, Yiqiao Jin, Vineeth Rakesh, Yingtong Dou, Menghai Pan, Mahashweta Das, Srijan Kumar
-
VLMInferSlow: Evaluating the Efficiency Robustness of Large Vision-Language Models as a Service
Xiasi Wang, Tianliang Yao, Simin Chen, Runqi Wang, Lei YE, Kuofeng Gao, Yi Huang, Yuan Yao
-
Context manipulation attacks: Web agents are susceptible to corrupted memory
Atharv Singh Patlan, Ashwin Hebbar, Pramod Viswanath, Prateek Mittal
-
PolyGuard: Massive Multi-Domain Safety Policy-Grounded Guardrail Dataset
Mintong Kang, Zhaorun Chen, Chejian Xu, Jiawei Zhang, Chengquan Guo, Minzhou Pan, Ivan Revilla, Yu Sun, Bo Li
-
Frequency-Calibrated Membership Inference Attacks on Medical Image Diffusion Models
Xinkai Zhao, Yuta Tokuoka, Junichiro Iwasawa, Keita Oda
-
Winter Soldier: Backdooring Language Models at Pre-Training with Indirect Data Poisoning
Wassim Bouaziz, Mathurin Videau, Nicolas Usunier, El-Mahdi El-Mhamdi
-
RL-Obfuscation: Can Language Models Learn to Evade Latent-Space Monitors?
Rohan Gupta, Erik Jenner
-
Shuyi Lin, Anshuman Suri, Alina Oprea, Cheng Tan
-
Daewon Kang, YeongHwan Shin, Doyeon Kim, Kyu-Hwan Jung, Meong Hi Son
-
ROSE: Toward Reality-Oriented Safety Evaluation of Large Language Models
Jiale Ding, Xiang Zheng, Cong Wang, Wei-Bin Lee, Xingjun Ma, Yu-Gang Jiang
-
AGENTSAFE: Benchmarking the Safety of Embodied Agents on Hazardous Instructions
Aishan Liu, Zonghao Ying, Le Wang, Junjie Mu, Jinyang Guo, Jiakai Wang, Yuqing Ma, Siyuan Liang, Mingchuan Zhang, Xianglong Liu, Dacheng Tao
-
Empirical Evidence for Alignment Faking in a Small LLM and Prompt-Based Mitigation Techniques
J. Koorndijk
-
Yaqiao Zhu, Hongkai Wen, Geyong Min, Man Luo
-
OS-Harm: A Benchmark for Measuring Safety of Computer Use Agents
Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, Maksym Andriushchenko
-
Navigating the Black Box: Leveraging LLMs for Effective Text-Level Graph Injection Attacks
Yuefei Lyu, Chaozhuo Li, Xi Zhang, Tianle Zhang
-
Weakest Link in the Chain: Security Vulnerabilities in Advanced Reasoning Models
Arjun Krishna, Aaditya Rastogi, Erick Galinkin
-
CertDW: Towards Certified Dataset Ownership Verification via Conformal Prediction
Ting Qiao, Yiming Li, Jianbin Li, Yingjia Wang, Leyi Qi, Junfeng Guo, Ruili Feng, Dacheng Tao
-
Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments
Xuan Wang, Siyuan Liang, Zhe Liu, Yi Yu, Yuliang Lu, Xiaochun Cao, Ee-Chien Chang
-
Thought Crime: Backdoors and Emergent Misalignment in Reasoning Models
James Chua, Jan Betley, Mia Taylor, Owain Evans
-
Lorenzo Bini, Stephane Marchand-Maillet
-
EBS-CFL: Efficient and Byzantine-robust Secure Clustered Federated Learning
Zhiqiang Li, Haiyong Bao, Menghong Guan, Hao Pan, Cheng Huang, Hong-Ning Dai
-
Mitigating Safety Fallback in Editing-based Backdoor Injection on LLMs
Houcheng Jiang, Zetong Zhao, Junfeng Fang, Haokai Ma, Ruipeng Wang, Yang Deng, Xiang Wang, Xiangnan He
-
Perfect Privacy for Discriminator-Based Byzantine-Resilient Federated Learning
Yue Xia, Christoph Hofmeister, Maximilian Egger, Rawad Bitar
-
Nima Naderloui, Shenao Yan, Binghui Wang, Jie Fu, Wendy Hui Wang, Weiran Liu, Yuan Hong
-
Position: Certified Robustness Does Not (Yet) Imply Model Security
Andrew C. Cullen, Paul Montague, Sarah M. Erfani, Benjamin I.P. Rubinstein
-
From Promise to Peril: Rethinking Cybersecurity Red and Blue Teaming in the Age of LLMs
Alsharif Abuadbba, Chris Hicks, Kristen Moore, Vasilios Mavroudis, Burak Hasircioglu, Diksha Goel, Piers Jennings
-
Unlearning-Enhanced Website Fingerprinting Attack: Against Backdoor Poisoning in Anonymous Networks
Yali Yuan, Kai Xu, Ruolin Ma, Yuchen Zhang
-
Theoretically Unmasking Inference Attacks Against LDP-Protected Clients in Federated Vision Models
Quan Nguyen, Minh N. Vu, Truc Nguyen, My T. Thai
-
Membership Inference Attacks as Privacy Tools: Reliability, Disparity and Ensemble
Zhiqi Wang, Chengyu Zhang, Yuetian Chen, Nathalie Baracaldo, Swanand Kadhe, Lei Yu
-
Unlearning Isn't Invisible: Detecting Unlearning Traces in LLMs from Model Outputs
Yiwei Chen, Soumyadeep Pal, Yimeng Zhang, Qing Qu, Sijia Liu
-
Online Selective Generation with Adversarial Bandit Feedback
Minjae Lee, Yoonjae Jung, Sangdon Park
-
Adversarial Disentanglement by Backpropagation with Physics-Informed Variational Autoencoder
Ioannis Christoforos Koune, Alice Cicirello
-
Constraint-Guided Prediction Refinement via Deterministic Diffusion Trajectories
Pantelis Dogoulis, Fabien Bernier, Félix Fourreau, Karim Tit, Maxime Cordy
-
Alphabet Index Mapping: Jailbreaking LLMs through Semantic Dissimilarity
Bilal Saleh Husain
-
NAP-Tuning: Neural Augmented Prompt Tuning for Adversarially Robust Vision-Language Models
Jiaming Zhang, Xin Wang, Xingjun Ma, Lingyu Qiu, Yu-Gang Jiang, Jitao Sang
-
Nina Cai, Jinguang Han
-
Transforming Chatbot Text: A Sequence-to-Sequence Approach
Natesh Reddy, Mark Stamp
-
SecurityLingua: Efficient Defense of LLM Jailbreak Attacks via Security-Aware Prompt Compression
Yucheng Li, Surin Ahn, Huiqiang Jiang, Amir H. Abdi, Yuqing Yang, Lili Qiu
-
Active Adversarial Noise Suppression for Image Forgery Localization
Rongxuan Peng, Shunquan Tan, Xianbo Mo, Alex C. Kot, Jiwu Huang
-
Intriguing Frequency Interpretation of Adversarial Robustness for CNNs and ViTs
Lu Chen, Han Yang, Hu Wang, Yuxin Cao, Shaofeng Li, Yuan Luo
-
Free Privacy Protection for Wireless Federated Learning: Enjoy It or Suffer from It?
Weicai Li, Tiejun Lv, Xiyu Zhao, Xin Yuan, Wei Ni
-
TrojanTO: Action-Level Backdoor Attacks against Trajectory Optimization Models
Yang Dai, Oubo Ma, Longfei Zhang, Xingxing Liang, Xiaochun Cao, Shouling Ji, Jiaheng Zhang, Jincai Huang, Li Shen
-
Jailbreak Strength and Model Similarity Predict Transferability
Rico Angell, Jannik Brinkmann, He He
-
Universal Jailbreak Suffixes Are Strong Attention Hijackers
Matan Ben-Tov, Mor Geva, Mahmood Sharif
-
The Safety Reminder: A Soft Prompt to Reactivate Delayed Safety Awareness in Vision-Language Models
Peiyuan Tang, Haojie Xin, Xiaodong Zhang, Jun Sun, Qin Xia, Zijiang Yang
-
Reasoning Model Unlearning: Forgetting Traces, Not Just Answers, While Preserving Reasoning Skills
Changsheng Wang, Chongyu Fan, Yihua Zhang, Jinghan Jia, Dennis Wei, Parikshit Ram, Nathalie Baracaldo, Sijia Liu
-
MEraser: An Effective Fingerprint Erasure Approach for Large Language Models
Jingxuan Zhang, Zhenhua Xu, Rui Hu, Wenpeng Xing, Xuhong Zhang, Meng Han
-
Image Corruption-Inspired Membership Inference Attacks against Large Vision-Language Models
Zongyu Wu, Minhua Lin, Zhiwei Zhang, Fali Wang, Xianren Zhang, Xiang Zhang, Suhang Wang
-
Restoring Gaussian Blurred Face Images for Deanonymization Attacks
Haoyu Zhai, Shuo Wang, Pirouz Naghavi, Qingying Hao, Gang Wang
-
Mengyuan Sun, Yu Li, Yuchen Liu, Bo Du, Yunjie Ge
-
Pushing the Limits of Safety: A Technical Report on the ATLAS Challenge 2025
Zonghao Ying, Siyang Wu, Run Hao, Peng Ying, Shixuan Sun, Pengyu Chen, Junze Chen, Hao Du, Kaiwen Shen, Shangkun Wu, Jiwei Wei, Shiyuan He, Yang Yang, Xiaohai Xu, Ke Ma, Qianqian Xu, Qingming Huang, Shi Lin, Xun Wang, Changting Lin, Meng Han, Yilei Jiang, Siqi Lai, Yaozhi Zheng, Yifei Song, Xiangyu Yue, Zonglei Jing, Tianyuan Zhang, Zhilei Zhu, Aishan Liu, Jiakai Wang, Siyuan Liang, Xianglong Kong, Hainan Li, Junjie Mu, Haotong Qin, Yue Yu, Lei Chen, Felix Juefei-Xu, Qing Guo, Xinyun Chen, Yew Soon Ong, Xianglong Liu, Dawn Song, Alan Yuille, Philip Torr, Dacheng Tao
-
Amit Daniely
-
On the existence of consistent adversarial attacks in high-dimensional linear classification
Matteo Vilucchio, Lenka Zdeborová, Bruno Loureiro
-
Information-theoretic Estimation of the Risk of Privacy Leaks
Kenneth Odoh
-
Exploiting AI for Attacks: On the Interplay between Adversarial AI and Offensive AI
Saskia Laura Schröer, Luca Pajola, Alberto Castagnaro, Giovanni Apruzzese, Mauro Conti
-
When Forgetting Triggers Backdoors: A Clean Unlearning Attack
Marco Arazzi, Antonino Nocera, Vinod P
-
Step-by-Step Reasoning Attack: Revealing 'Erased' Knowledge in Large Language Models
Yash Sinha, Manit Baser, Murari Mandal, Dinil Mon Divakaran, Mohan Kankanhalli
-
Understanding and Benchmarking the Trustworthiness in Multimodal LLMs for Video Understanding
Youze Wang, Zijun Chen, Ruoyu Chen, Shishen Gu, Wenbo Hu, Jiayang Liu, Yinpeng Dong, Hang Su, Jun Zhu, Meng Wang, Richang Hong
-
Robust LLM Unlearning with MUDMAN: Meta-Unlearning with Disruption Masking And Normalization
Filip Sondej, Yushi Yang, Mikołaj Kniejski, Marcel Windys
-
LoRA Users Beware: A Few Spurious Tokens Can Manipulate Your Finetuned Model
Pradyut Sekhsaria, Marcel Mateos Salles, Hai Huang, Randall Balestriero
-
Jinming Wen, Xinyi Wu, Shuai Zhao, Yanhao Jia, Yuwen Li
-
Differential Privacy in Machine Learning: From Symbolic AI to LLMs
Francisco Aguilera-Martínez, Fernando Berzal
-
TrustGLM: Evaluating the Robustness of GraphLLMs Against Prompt, Text, and Structure Attacks
Qihai Zhang, Xinyue Sheng, Yuanfu Sun, Qiaoyu Tan
-
Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Basil AsSadhan, Fabio Roli
-
A Neural Rejection System Against Universal Adversarial Perturbations in Radio Signal Classification
Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Fabio Roli
-
Improving Large Language Model Safety with Contrastive Representation Learning
Samuel Simko, Mrinmaya Sachan, Bernhard Schölkopf, Zhijing Jin
-
Bias Amplification in RAG: Poisoning Knowledge Retrieval to Steer LLMs
Linlin Wang, Tianqing Zhu, Laiqiao Qin, Longxiang Gao, Wanlei Zhou
-
Pedram MohajerAnsari, Amir Salarpour, Michael Kühr, Siyu Huang, Mohammad Hamad, Sebastian Steinhorst, Habeeb Olufowobi, Mert D. Pesé
-
Byzantine Outside, Curious Inside: Reconstructing Data Through Malicious Updates
Kai Yue, Richeng Jin, Chau-Wai Wong, Huaiyu Dai
-
KCES: Training-Free Defense for Robust Graph Neural Networks via Kernel Complexity
Yaning Jia, Shenyang Deng, Chiyu Ma, Yaoqing Yang, Soroush Vosoughi
-
InfoFlood: Jailbreaking Large Language Models with Information Overload
Advait Yadav, Haibo Jin, Man Luo, Jun Zhuang, Haohan Wang
-
EgoPrivacy: What Your First-Person Camera Says About You?
Yijiang Li, Genpei Zhang, Jiacheng Cheng, Yi Li, Xiaojun Shan, Dashan Gao, Jiancheng Lyu, Yuan Li, Ning Bi, Nuno Vasconcelos
-
Lu Zhang, Sangarapillai Lambotharan, Gan Zheng, Guisheng Liao, Xuekang Liu, Fabio Roli, Carsten Maple
-
DRIFT: Dynamic Rule-Based Defense with Injection Isolation for Securing LLM Agents
Hao Li, Xiaogeng Liu, Hung-Chun Chiu, Dianqi Li, Ning Zhang, Chaowei Xiao
-
Xue Zhou, Dapeng Man, Chen Xu, Fanyi Zeng, Tao Liu, Huan Wang, Shucheng He, Chaoyang Gao, Wu Yang
-
SOFT: Selective Data Obfuscation for Protecting LLM Fine-tuning against Membership Inference Attacks
Kaiyuan Zhang, Siyuan Cheng, Hanxi Guo, Yuetian Chen, Zian Su, Shengwei An, Yuntao Du, Charles Fleming, Ashish Kundu, Xiangyu Zhang, Ninghui Li
-
SoK: Evaluating Jailbreak Guardrails for Large Language Models
Xunguang Wang, Zhenlan Ji, Wenxuan Wang, Zongjie Li, Daoyuan Wu, Shuai Wang
-
TED-LaST: Towards Robust Backdoor Defense Against Adaptive Attacks
Xiaoxing Mo, Yuxuan Cheng, Nan Sun, Leo Yu Zhang, Wei Luo, Shang Gao
-
ME: Trigger Element Combination Backdoor Attack on Copyright Infringement
Feiyu Yang, Siyuan Liang, Aishan Liu, Dacheng Tao
-
Efficiency Robustness of Dynamic Deep Learning Systems
Ravishka Rathnasuriya, Tingxi Li, Zexin Xu, Zihe Song, Mirazul Haque, Simin Chen, Wei Yang
-
Monitoring Decomposition Attacks in LLMs with Lightweight Sequential Monitors
Chen Yueh-Han, Nitish Joshi, Yulin Chen, Maksym Andriushchenko, Rico Angell, He He
-
Can We Infer Confidential Properties of Training Data from LLMs?
Penguin Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri
-
Chun Liu, Bingqian Zhu, Tao Xu, Zheng Zheng, Zheng Li, Wei Yang, Zhigang Han, Jiayao Wang
-
Unsourced Adversarial CAPTCHA: A Bi-Phase Adversarial CAPTCHA Framework
Xia Du, Xiaoyuan Liu, Jizhe Zhou, Zheng Lin, Chi-man Pun, Cong Wu, Tao Li, Zhe Chen, Wei Ni, Jun Luo
-
Lattice Climber Attack: Adversarial attacks for randomized mixtures of classifiers
Lucas Gnecco-Heredia, Benjamin Negrevergne, Yann Chevaleyre
-
Distributionally-Constrained Adversaries in Online Learning
Moïse Blanchard, Samory Kpotufe
-
A Crack in the Bark: Leveraging Public Knowledge to Remove Tree-Ring Watermarks
Junhua Lin, Marc Juarez
-
Assessing the Resilience of Automotive Intrusion Detection Systems to Adversarial Manipulation
Stefano Longari, Paolo Cerracchio, Michele Carminati, Stefano Zanero
-
ObfusBFA: A Holistic Approach to Safeguarding DNNs from Different Types of Bit-Flip Attacks
Xiaobei Yan, Han Qiu, Tianwei Zhang
-
Saad Alqithami
-
Token Constraint Decoding Improves Robustness on Question Answering for Large Language Models
Jui-Ming Yao, Hao-Yuan Chen, Zi-Xian Tang, Bing-Jia Tan, Sheng-Wei Peng, Bing-Cheng Xie, Shun-Feng Su
-
Effective Red-Teaming of Policy-Adherent Agents
Itay Nakash, George Kour, Koren Lazar, Matan Vetzler, Guy Uziel, Ateret Anaby-Tavor
-
Reasoning Models Are More Easily Gaslighted Than You Think
Bin Zhu, Hailong Yin, Jingjing Chen, Yu-Gang Jiang
-
Inverting Black-Box Face Recognition Systems via Zero-Order Optimization in Eigenface Space
Anton Razzhigaev, Matvey Mikhalchuk, Klim Kireev, Igor Udovichenko, Andrey Kuznetsov, Aleksandr Petiushko
-
LLMail-Inject: A Dataset from a Realistic Adaptive Prompt Injection Challenge
Sahar Abdelnabi, Aideen Fay, Ahmed Salem, Egor Zverev, Kai-Chieh Liao, Chi-Huang Liu, Chun-Chih Kuo, Jannis Weigend, Danyael Manlangit, Alex Apostolov, Haris Umair, João Donato, Masayuki Kawakita, Athar Mahboob, Tran Huu Bach, Tsun-Han Chiang, Myeongjin Cho, Hajin Choi, Byeonghyeon Kim, Hyeonjin Lee, Benjamin Pannell, Conor McCauley, Mark Russinovich, Andrew Paverd, Giovanni Cherubin
-
Memorization in Language Models through the Lens of Intrinsic Dimension
Stefan Arnold
-
You Are What You Say: Exploiting Linguistic Content for VoicePrivacy Attacks
Ünal Ege Gaznepoglu, Anna Leschanowsky, Ahmad Aloradi, Prachi Singh, Daniel Tenbrinck, Emanuël A. P. Habets, Nils Peters
-
AngleRoCL: Angle-Robust Concept Learning for Physically View-Invariant T2I Adversarial Patches
Wenjun Ji, Yuxiang Fu, Luyang Ying, Deng-Ping Fan, Yuyi Wang, Ming-Ming Cheng, Ivor Tsang, Qing Guo
-
DAVSP: Safety Alignment for Large Vision-Language Models via Deep Aligned Visual Safety Prompt
Yitong Zhang, Jia Li, Liyi Cai, Ge Li
-
Canonical Latent Representations in Conditional Diffusion Models
Yitao Xu, Tong Zhang, Ehsan Pajouheshgar, Sabine Süsstrunk
-
Adversarial Surrogate Risk Bounds for Binary Classification
Natalie S. Frank
-
In-Context Bias Propagation in LLM-Based Tabular Data Generation
Pol G. Recasens, Alberto Gutierrez, Jordi Torres, Josep Ll. Berral, Anisa Halimi, Kieran Fraser
-
Devil's Hand: Data Poisoning Attacks to Locally Private Graph Learning Protocols
Longzhu He, Chaozhuo Li, Peng Tang, Litian Zhang, Sen Su
-
A look at adversarial attacks on radio waveforms from discrete latent space
Attanasia Garuso, Silvija Kokalj-Filipovic, Yagna Kaasaragadda
-
Apollo: A Posteriori Label-Only Membership Inference Attack Towards Machine Unlearning
Liou Tang, James Joshi, Ashish Kundu
-
Songze Li, Mingxuan Zhang, Oubo Ma, Kang Wei, Shouling Ji
-
Evasion Attacks Against Bayesian Predictive Models
Pablo G. Arce, Roi Naveiro, David Ríos Insua
-
LLMs Cannot Reliably Judge (Yet?): A Comprehensive Assessment on the Robustness of LLM-as-a-Judge
Songze Li, Chuokun Xu, Jiaying Wang, Xueluan Gong, Chen Chen, Jirui Zhang, Jun Wang, Kwok-Yan Lam, Shouling Ji
-
Disclosure Audits for LLM Agents
Saswat Das, Jameson Sandler, Ferdinando Fioretto
-
Prompt Attacks Reveal Superficial Knowledge Removal in Unlearning Methods
Yeonwoo Jang, Shariqah Hossain, Ashwin Sreevatsa, Diogo Cruz
-
GenBreak: Red Teaming Text-to-Image Generators Using Large Language Models
Zilong Wang, Xiang Zheng, Xiaosen Wang, Bo Wang, Xingjun Ma, Yu-Gang Jiang
-
Songze Li, Mingxuan Zhang, Kang Wei, Shouling Ji
-
Revisiting Visual Understanding in Multimodal Reasoning through a Lens of Image Perturbation
Yuting Li, Lai Wei, Kaipeng Zheng, Jingyuan Huang, Guilin Li, Bo Wang, Linghe Kong, Lichao Sun, Weiran Huang
-
Adv-BMT: Bidirectional Motion Transformer for Safety-Critical Traffic Scenario Generation
Yuxin Liu, Zhenghao Peng, Xuanhao Cui, Bolei Zhou
-
Single-Node Trigger Backdoor Attacks in Graph-Based Recommendation Systems
Runze Li, Di Jin, Xiaobao Wang, Dongxiao He, Bingdao Feng, Zhen Wang
-
Your Agent Can Defend Itself against Backdoor Attacks
Changjiang Li, Jiacheng Liang, Bochuan Cao, Jinghui Chen, Ting Wang
-
SPBA: Utilizing Speech Large Language Model for Backdoor Attacks on Speech Classification Models
Wenhan Yao, Fen Xiao, Xiarun Chen, Jia Liu, YongQiang He, Weiping Wen
-
WGLE: Backdoor-free and Multi-bit Black-box Watermarking for Graph Neural Networks
Tingzhi Li, Xuefeng Liu
-
Towards Robust Deep Reinforcement Learning against Environmental State Perturbation
Chenxu Wang, Huaping Liu
-
Yahan Li, Jifan Yao, John Bosco S. Bunyi, Adam C. Frank, Angel Hwang, Ruishan Liu
-
Danush Khanna, Krishna Kumar, Basab Ghosh, Vinija Jain, Vasu Sharma, Aman Chadha, Amitava Das
-
Towards Class-wise Fair Adversarial Training via Anti-Bias Soft Label Distillation
Shiji Zhao, Chi Chen, Ranjie Duan, Xizhe Wang, Xingxing Wei
-
Boosting Gradient Leakage Attacks: Data Reconstruction in Realistic FL Settings
Mingyuan Fan, Fuyi Wang, Cen Chen, Jianying Zhou
-
DiffGradCAM: A Universal Class Activation Map Resistant to Adversarial Training
Jacob Piland, Chris Sweet, Adam Czajka
-
Design Patterns for Securing LLM Agents against Prompt Injections
Luca Beurer-Kellner, Beat Buesser, Ana-Maria Creţu, Edoardo Debenedetti, Daniel Dobos, Daniel Fabian, Marc Fischer, David Froelicher, Kathrin Grosse, Daniel Naeff, Ezinwanne Ozoani, Andrew Paverd, Florian Tramèr, Václav Volhejn
-
GPS Spoofing Attacks on AI-based Navigation Systems with Obstacle Avoidance in UAV
Ji Hyuk Jung, Mi Yeon Hong, Ji Won Yoon
-
One Patch to Rule Them All: Transforming Static Patches into Dynamic Attacks in the Physical World
Xingshuo Han, Chen Ling, Shiyi Yao, Haozhao Wang, Hangcheng Liu, Yutong Wu, Shengmin Xu, Changhai Ou, Xinyi Huang, Tianwei Zhang
-
Adversarial Text Generation with Dynamic Contextual Perturbation
Hetvi Waghela, Jaydip Sen, Sneha Rakshit, Subhasis Dasgupta
-
Mojtaba Nafez, Amirhossein Koochakian, Arad Maleki, Jafar Habibi, Mohammad Hossein Rohban
-
ASRJam: Human-Friendly AI Speech Jamming to Prevent Automated Phone Scams
Freddie Grabovski, Gilad Gressel, Yisroel Mirsky
-
Rafaël Nouailles
-
Towards Cross-Subject EMG Pattern Recognition via Dual-Branch Adversarial Feature Disentanglement
Xinyue Niu, Akira Furui
-
Does Multimodal Large Language Model Truly Unlearn? Stealthy MLLM Unlearning Attack
Xianren Zhang, Hui Liu, Delvin Ce Zhang, Xianfeng Tang, Qi He, Dongwon Lee, Suhang Wang
-
HeTa: Relation-wise Heterogeneous Graph Foundation Attack Model
Yuling Wang, Zihui Chen, Pengfei Jiao, Xiao Wang
-
RSafe: Incentivizing proactive reasoning to build robust and adaptive LLM safeguards
Jingnan Zheng, Xiangtian Ji, Yijun Lu, Chenhang Cui, Weixiang Zhao, Gelei Deng, Zhenkai Liang, An Zhang, Tat-Seng Chua
-
JavelinGuard: Low-Cost Transformer Architectures for LLM Security
Yash Datta, Sharath Rajasekar
-
MrM: Black-Box Membership Inference Attacks against Multimodal RAG Systems
Peiru Yang, Jinhua Yin, Haoran Zheng, Xueying Bai, Huili Wang, Yufei Sun, Xintian Li, Shangguang Wang, Yongfeng Huang, Tao Qi
-
When Style Breaks Safety: Defending Language Models Against Superficial Style Alignment
Yuxin Xiao, Sana Tonekaboni, Walter Gerych, Vinith Suriyakumar, Marzyeh Ghassemi
-
Jie Bao, Chuangyin Dang, Rui Luo, Hanwei Zhang, Zhixin Zhou
-
Evaluating LLMs Robustness in Less Resourced Languages with Proxy Models
Maciej Chrabąszcz, Katarzyna Lorenc, Karolina Seweryn
-
Yukai Zhou, Sibei Yang, Wenjie Wang
-
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models
Mickel Liu, Liwei Jiang, Yancheng Liang, Simon Shaolei Du, Yejin Choi, Tim Althoff, Natasha Jaques
-
Explore the vulnerability of black-box models via diffusion models
Jiacheng Shi, Yanfu Zhang, Huajie Shao, Ashley Gao
-
Circumventing Backdoor Space via Weight Symmetry
Jie Peng, Hongwei Yang, Jing Zhao, Hengji Dong, Hui He, Weizhe Zhang, Haoyu He
-
TwinBreak: Jailbreaking LLM Security Alignments based on Twin Prompts
Torsten Krauß, Hamid Dashtbani, Alexandra Dmitrienko
-
ProARD: progressive adversarial robustness distillation: provide wide range of robust students
Seyedhamidreza Mousavi, Seyedali Mousavi, Masoud Daneshtalab
-
TokenBreak: Bypassing Text Classification Models Through Token Manipulation
Kasimir Schulz, Kenneth Yeung, Kieran Evans
-
Marco Di Gennaro, Giovanni De Lucia, Stefano Longari, Stefano Zanero, Michele Carminati
-
SoK: Data Reconstruction Attacks Against Machine Learning Models: Definition, Metrics, and Benchmark
Rui Wen, Yiyong Liu, Michael Backes, Yang Zhang
-
Muhammad Ali Najjar, Ren-Yi Huang, Dumindu Samaraweera, Prashant Shekhar
-
Huixin Zhan, Jason H. Moore
-
SHIELD: Secure Hypernetworks for Incremental Expansion Learning Defense
Patryk Krukowski, Łukasz Gorczyca, Piotr Helm, Kamil Książek, Przemysław Spurek
-
GradEscape: A Gradient-Based Evader Against AI-Generated Text Detectors
Wenlong Meng, Shuguo Fan, Chengkun Wei, Min Chen, Yuwei Li, Yuanchao Zhang, Zhikun Zhang, Wenzhi Chen
-
LLMs Caught in the Crossfire: Malware Requests and Jailbreak Challenges
Haoyang Li, Huan Gao, Zhiyuan Zhao, Zhiyu Lin, Junyu Gao, Xuelong Li
-
Elena Sofia Ruzzetti, Giancarlo A. Xompero, Davide Venditti, Fabio Massimo Zanzotto
-
QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA
Jacob Dineen, Aswin RRV, Qin Liu, Zhikun Xu, Xiao Ye, Ming Shen, Zhaonan Li, Shijie Lu, Chitta Baral, Muhao Chen, Ben Zhou
-
InverseScope: Scalable Activation Inversion for Interpreting Large Language Models
Yifan Luo, Zhennan Zhou, Bin Dong
-
Seokil Ham, Yubin Choi, Yujin Yang, Seungju Cho, Younghun Kim, Changick Kim
-
TAI3: Testing Agent Integrity in Interpreting User Intent
Shiwei Feng, Xiangzhe Xu, Xuan Chen, Kaiyuan Zhang, Syed Yusuf Ahmed, Zian Su, Mingwei Zheng, Xiangyu Zhang
-
Auditing Black-Box LLM APIs with a Rank-Based Uniformity Test
Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger
-
AlphaSteer: Learning Refusal Steering with Principled Null-Space Constraint
Leheng Sheng, Changshuo Shen, Weixiang Zhao, Junfeng Fang, Xiaohao Liu, Zhenkai Liang, Xiang Wang, An Zhang, Tat-Seng Chua
-
HauntAttack: When Attack Follows Reasoning as a Shadow
Jingyuan Ma, Rui Li, Zheng Li, Junfeng Liu, Lei Sha, Zhifang Sui
-
Ren-Jian Wang, Ke Xue, Zeyu Qin, Ziniu Li, Sheng Tang, Hao-Tian Li, Shengcai Liu, Chao Qian
-
Break-The-Chain: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation
Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg
-
Adversarial Paraphrasing: A Universal Attack for Humanizing AI-Generated Text
Yize Cheng, Vinu Sankar Sadasivan, Mehrdad Saberi, Shoumik Saha, Soheil Feizi
-
Boosting Adversarial Transferability via Commonality-Oriented Gradient Optimization
Yanting Gao, Yepeng Liu, Junming Liu, Qi Zhang, Hongyun Zhang, Duoqian Miao, Cairong Zhao
-
D2R: dual regularization loss with collaborative adversarial generation for model robustness
Zhenyu Liu, Huizhi Liang, Rajiv Ranjan, Zhanxing Zhu, Vaclav Snasel, Varun Ojha
-
UCOD-DPL: Unsupervised Camouflaged Object Detection via Dynamic Pseudo-label Learning
Weiqi Yan, Lvhai Chen, Huaijia Kou, Shengchuan Zhang, Yan Zhang, Liujuan Cao
-
Backdoor Attack on Vision Language Models with Stealthy Semantic Manipulation
Zhiyuan Zhong, Zhen Sun, Yepang Liu, Xinlei He, Guanhong Tao
-
PASS: Private Attributes Protection with Stochastic Data Substitution
Yizhuo Chen, Chun-Fu (Richard) Chen, Hsiang Hsu, Shaohan Hu, Tarek Abdelzaher
-
Enhancing the Safety of Medical Vision-Language Models by Synthetic Demonstrations
Zhiyu Xue, Reza Abbasi-Asl, Ramtin Pedarsani
-
Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions
Kun Zhang, Le Wu, Kui Yu, Guangyi Lv, Dacao Zhang
-
Tzu-Ling Lin, Wei-Chih Chen, Teng-Fang Hsiao, Hou-I Liu, Ya-Hsin Yeh, Yu Kai Chan, Wen-Sheng Lien, Po-Yen Kuo, Philip S. Yu, Hong-Han Shuai
-
Chain-of-Code Collapse: Reasoning Failures in LLMs via Adversarial Prompting in Code Generation
Jaechul Roh, Varun Gandhi, Shivani Anilkumar, Arin Garg
-
Towards Interpretable Adversarial Examples via Sparse Adversarial Attack
Fudong Lin, Jiadong Lou, Hao Wang, Brian Jalaian, Xu Yuan
-
Rewriting the Budget: A General Framework for Black-Box Attacks Under Cost Asymmetry
Mahdi Salmani, Alireza Abdollahpoorrostam, Seyed-Mohsen Moosavi-Dezfooli
-
KNN-Defense: Defense against 3D Adversarial Point Clouds using Nearest-Neighbor Search
Nima Jamali, Matina Mahdizadeh Sani, Hanieh Naderi, Shohreh Kasaei
-
FREE: Fast and Robust Vision Language Models with Early Exits
Divya Jyoti Bajpai, Manjesh Kumar Hanawal
-
Rescaled Influence Functions: Accurate Data Attribution in High Dimension
Ittai Rubinstein, Samuel B. Hopkins
-
Can In-Context Reinforcement Learning Recover From Reward Poisoning Attacks?
Paulius Sasnauskas, Yiğit Yalın, Goran Radanović
-
Robust Learnability of Sample-Compressible Distributions under Noisy or Adversarial Perturbations
Arefe Boushehrian, Amir Najafi
-
Stochastic Training for Side-Channel Resilient AI
Anuj Dubey, Aydin Aysu
-
Zeyu Yan, Yifei Yao, Xuanbing Wen, Juli Zhang, Kai Fan
-
From Threat to Tool: Leveraging Refusal-Aware Injection Attacks for Safety Alignment
Kyubyung Chae, Hyunbin Jin, Taesup Kim
-
Neural Spectral Band Generation for Audio Coding
Woongjib Choi, Byeong Hyeon Kim, Hyungseob Lim, Inseon Jang, Hong-Goo Kang
-
To Protect the LLM Agent Against the Prompt Injection Attack with Polymorphic Prompt
Zhilong Wang, Neha Nagaraja, Lan Zhang, Hayretdin Bahsi, Pawan Patil, Peng Liu
-
When Better Features Mean Greater Risks: The Performance-Privacy Trade-Off in Contrastive Learning
Ruining Sun, Hongsheng Hu, Wei Luo, Zhaoxi Zhang, Yanjun Zhang, Haizhuan Yuan, Leo Yu Zhang
-
DeepFake Doctor: Diagnosing and Treating Audio-Video Fake Detection
Marcel Klemt, Carlotta Segna, Anna Rohrbach
-
Quantifying Adversarial Uncertainty in Evidential Deep Learning using Conflict Resolution
Charmaine Barker, Daniel Bethell, Simos Gerasimou
-
Hey, That's My Data! Label-Only Dataset Inference in Large Language Models
Chen Xiong, Zihao Wang, Rui Zhu, Tsung-Yi Ho, Pin-Yu Chen, Jingwei Xiong, Haixu Tang, Lucila Ohno-Machado
-
Yingqi Hu, Zhuo Zhang, Jingyuan Zhang, Lizhen Qu, Zenglin Xu
-
Towards Lifecycle Unlearning Commitment Management: Measuring Sample-level Unlearning Completeness
Cheng-Long Wang, Qi Li, Zihang Xiang, Yinzhi Cao, Di Wang
-
Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
-
AdvSumm: Adversarial Training for Bias Mitigation in Text Summarization
Mukur Gupta, Nikhil Reddy Varimalla, Nicholas Deas, Melanie Subbiah, Kathleen McKeown
-
MCA-Bench: A Multimodal Benchmark for Evaluating CAPTCHA Robustness Against VLM-based Attacks
Zonglin Wu, Yule Xue, Xin Wei, Yiren Song
-
Sample-Specific Noise Injection For Diffusion-Based Adversarial Purification
Yuhao Sun, Jiacheng Zhang, Zesheng Ye, Chaowei Xiao, Feng Liu
-
What Really is a Member? Discrediting Membership Inference via Poisoning
Neal Mangaokar, Ashish Hooda, Zhuohang Li, Bradley A. Malin, Kassem Fawaz, Somesh Jha, Atul Prakash, Amrita Roy Chowdhury
-
Synthetic Tabular Data: Methods, Attacks and Defenses
Graham Cormode, Samuel Maddock, Enayat Ullah, Shripad Gade
-
Stealix: Model Stealing via Prompt Evolution
Zhixiong Zhuang, Hui-Po Wang, Maria-Irina Nicolae, Mario Fritz
-
FedShield-LLM: A Secure and Scalable Federated Fine-Tuned Large Language Model
Md Jueal Mia, M. Hadi Amini
-
SATversary: Adversarial Attacks on Satellite Fingerprinting
Joshua Smailes, Sebastian Köhler, Simon Birnbach, Martin Strohmeier, Ivan Martinovic
-
Benchmarking Misuse Mitigation Against Covert Adversaries
Davis Brown, Mahdi Sabbaghi, Luze Sun, Alexander Robey, George J. Pappas, Eric Wong, Hamed Hassani
-
Securing Traffic Sign Recognition Systems in Autonomous Vehicles
Thushari Hapuarachchi, Long Dang, Kaiqi Xiong
-
Membership Inference Attacks for Unseen Classes
Pratiksha Thaker, Neil Kale, Zhiwei Steven Wu, Virginia Smith
-
Long Dang, Thushari Hapuarachchi, Kaiqi Xiong, Yi Li
-
A Systematic Review of Poisoning Attacks Against Large Language Models
Neil Fendley, Edward W. Staley, Joshua Carney, William Redman, Marie Chau, Nathan Drenkow
-
Adapting Under Fire: Multi-Agent Reinforcement Learning for Adversarial Drift in Network Security
Emilia Rivas, Sabrina Saika, Ahtesham Bakht, Aritran Piplai, Nathaniel D. Bastian, Ankit Shah
-
A Certified Unlearning Approach without Access to Source Data
Umit Yigit Basaran, Sk Miraj Ahmed, Amit Roy-Chowdhury, Basak Guler
-
The Scales of Justitia: A Comprehensive Survey on Safety Evaluation of LLMs
Songyang Liu, Chaozhuo Li, Jiameng Qiu, Xi Zhang, Feiran Huang, Litian Zhang, Yiming Hei, Philip S. Yu
-
Control Tax: The Price of Keeping AI in Check
Mikhail Terekhov, Zhen Ning David Liu, Caglar Gulcehre, Samuel Albanie
-
BESA: Boosting Encoder Stealing Attack with Perturbation Recovery
Xuhao Ren, Haotian Liang, Yajie Wang, Chuan Zhang, Zehui Xiong, Liehuang Zhu
-
Hongjun Liu, Yilun Zhao, Arman Cohan, Chen Zhao
-
Influence Functions for Edge Edits in Non-Convex Graph Neural Networks
Jaeseung Heo, Kyeongheung Yun, Seokwon Yoon, MoonJeong Park, Jungseul Ok, Dongwoo Kim
-
Robustness as Architecture: Designing IQA Models to Withstand Adversarial Perturbations
Igor Meleshin, Anna Chistyakova, Anastasia Antsiferova, Dmitriy Vatolin
-
Identifying and Understanding Cross-Class Features in Adversarial Training
Zeming Wei, Yiwen Guo, Yisen Wang
-
Normative Conflicts and Shallow AI Alignment
Raphaël Millière
-
Wenxi Li
-
Lei Hsiung, Tianyu Pang, Yung-Chen Tang, Linyue Song, Tsung-Yi Ho, Pin-Yu Chen, Yaoqing Yang
-
SRD: Reinforcement-Learned Semantic Perturbation for Backdoor Defense in VLMs
Shuhan Xu, Siyuan Liang, Hongling Zheng, Yong Luo, Aishan Liu, Dacheng Tao
-
Fool the Stoplight: Realistic Adversarial Patch Attacks on Traffic Light Detectors
Svetlana Pavlitska, Jamie Robb, Nikolai Polley, Melih Yazgan, J. Marius Zöllner
-
Invisible Backdoor Triggers in Image Editing Model via Deep Watermarking
Yu-Feng Chen, Tzuhsuan Huang, Pin-Yen Chiu, Jun-Cheng Chen
-
Privacy Amplification Through Synthetic Data: Insights from Linear Regression
Clément Pierquin, Aurélien Bellet, Marc Tommasi, Matthieu Boussard
-
Membership Inference Attacks on Sequence Models
Lorenzo Rossi, Michael Aerni, Jie Zhang, Florian Tramèr
-
Coordinated Robustness Evaluation Framework for Vision-Language Models
Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
-
Explainer-guided Targeted Adversarial Attacks against Binary Code Similarity Detection Models
Mingjie Chen, Tiancheng Zhu, Mingxue Zhang, Yiling He, Minghao Lin, Penghui Li, Kui Ren
-
Robustness Evaluation for Video Models with Reinforcement Learning
Ashwin Ramesh Babu, Sajad Mousavi, Vineet Gundecha, Sahand Ghorbanpour, Avisek Naug, Antonio Guillen, Ricardo Luna Gutierrez, Soumyendu Sarkar
-
Efficient Robust Conformal Prediction via Lipschitz-Bounded Networks
Thomas Massena, Léo Andéol, Thibaut Boissin, Franck Mamalet, Corentin Friedrich, Mathieu Serrurier, Sébastien Gerchinovitz
-
Sentinel: SOTA model to protect against prompt injections
Dror Ivry, Oran Nahum
-
SoK: Are Watermarks in LLMs Ready for Deployment?
Kieu Dang, Phung Lai, NhatHai Phan, Yelong Shen, Ruoming Jin, Abdallah Khreishah, My Thai
-
Arnesh Batra, Anushk Kumar, Jashn Khemani, Arush Gumber, Arhan Jain, Somil Gupta
-
Al Nahian Bin Emran, Dhiman Goswami, Md Hasan Ullah Sadi, Sanchari Das
-
Breaking Anonymity at Scale: Re-identifying the Trajectories of 100K Real Users in Japan
Abhishek Kumar Mishra, Mathieu Cunche, Heber H. Arcolezi
-
Yi Ji, Runzhi Li, Baolei Mao
-
Rifat Sadik, Tanvir Rahman, Arpan Bhattacharjee, Bikash Chandra Halder, Ismail Hossain
-
RIVAL: Reinforcement Learning with Iterative and Adversarial Optimization for Machine Translation
Tianjiao Li, Mengran Yu, Chenyu Shi, Yanjun Zhao, Xiaojing Liu, Qiang Zhang, Qi Zhang, Xuanjing Huang, Jiayin Wang
-
Towards Better Generalization via Distributional Input Projection Network
Yifan Hao, Yanxin Lu, Hanning Zhang, Xinwei Shen, Tong Zhang
-
Pierre Tholoniat, Alison Caulfield, Giorgio Cavicchioli, Mark Chen, Nikos Goutzoulias, Benjamin Case, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer, Martin Thomson
-
A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
Arnav Kumar Jain, Vibhakar Mohta, Subin Kim, Atiksh Bhardwaj, Juntao Ren, Yunhai Feng, Sanjiban Choudhury, Gokul Swamy
-
VLMs Can Aggregate Scattered Training Patches
Zhanhui Zhou, Lingjie Chen, Chao Yang, Chaochao Lu
-
Robustness of Prompting: Enhancing Robustness of Large Language Models Against Prompting Attacks
Lin Mu, Guowei Chu, Li Ni, Lei Sang, Zhize Wu, Peiquan Jin, Yiwen Zhang
-
DiffCAP: Diffusion-based Cumulative Adversarial Purification for Vision Language Models
Jia Fu, Yongtao Wu, Yihang Chen, Kunyu Peng, Xiao Zhang, Volkan Cevher, Sepideh Pashami, Anders Holst
-
Privacy and Security Threat for OpenAI GPTs
Wei Wenying, Zhao Kaifa, Xue Lei, Fan Ming
-
RAID: A Dataset for Testing the Adversarial Robustness of AI-Generated Image Detectors
Hicham Eddoubi, Jonas Ricker, Federico Cocchi, Lorenzo Baraldi, Angelo Sotgiu, Maura Pintor, Marcella Cornia, Lorenzo Baraldi, Asja Fischer, Rita Cucchiara, Battista Biggio
-
Vulnerability-Aware Alignment: Mitigating Uneven Forgetting in Harmful Fine-Tuning
Liang Chen, Xueting Han, Li Shen, Jing Bai, Kam-Fai Wong
-
Mohd. Farhan Israk Soumik, Syed Mhamudul Hasan, Abdur R. Shahid
-
Prediction Inconsistency Helps Achieve Generalizable Detection of Adversarial Examples
Sicong Han, Chenhao Lin, Zhengyu Zhao, Xiyuan Wang, Xinlei He, Qian Li, Cong Wang, Qian Wang, Chao Shen
-
Through the Stealth Lens: Rethinking Attacks and Defenses in RAG
Sarthak Choudhary, Nils Palumbo, Ashish Hooda, Krishnamurthy Dj Dvijotham, Somesh Jha
-
Is Perturbation-Based Image Protection Disruptive to Image Editing?
Qiuyu Tang, Bonor Ayambem, Mooi Choo Chuah, Aparna Bharati
-
Gradient Inversion Attacks on Parameter-Efficient Fine-Tuning
Hasin Us Sami, Swapneel Sen, Amit K. Roy-Chowdhury, Srikanth V. Krishnamurthy, Basak Guler
-
Robust Anti-Backdoor Instruction Tuning in LVLMs
Yuan Xun, Siyuan Liang, Xiaojun Jia, Xinwei Liu, Xiaochun Cao
-
Tianyu Qi, Lei Xue, Yufeng Zhan, Xiaobo Ma
-
Ruba Nasser, Ahmed Alagha, Shakti Singh, Rabeb Mizouni, Hadi Otrok, Jamal Bentahar
-
RedDebate: Safer Responses through Multi-Agent Red Teaming Debates
Ali Asad, Stephen Obadinma, Radin Shayanfar, Xiaodan Zhu
-
Shaina Raza, Ranjan Sapkota, Manoj Karkee, Christos Emmanouilidis
-
macOSWorld: A Multilingual Interactive Benchmark for GUI Agents
Pei Yang, Hai Ci, Mike Zheng Shou
-
VPI-Bench: Visual Prompt Injection Attacks for Computer-Use Agents
Tri Cao, Bennett Lim, Yue Liu, Yuan Sui, Yuexin Li, Shumin Deng, Lin Lu, Nay Oo, Shuicheng Yan, Bryan Hooi
-
Shaking to Reveal: Perturbation-Based Detection of LLM Hallucinations
Jinyuan Luo, Zhen Fang, Yixuan Li, Seongheon Park, Ling Chen
-
MISLEADER: Defending against Model Extraction with Ensembles of Distilled Models
Xueqi Cheng, Minxing Zheng, Shixiang Zhu, Yushun Dong
-
ATAG: AI-Agent Application Threat Assessment with Attack Graphs
Parth Atulbhai Gandhi, Akansha Shukla, David Tayouri, Beni Ifland, Yuval Elovici, Rami Puzis, Asaf Shabtai
-
How Explanations Leak the Decision Logic: Stealing Graph Neural Networks via Explanation Alignment
Bin Ma, Yuyuan Feng, Minhua Lin, Enyan Dai
-
Should LLM Safety Be More Than Refusing Harmful Instructions?
Utsav Maskey, Mark Dras, Usman Naseem
-
BitBypass: A New Direction in Jailbreaking Aligned Large Language Models with Bitstream Camouflage
Kalyan Nakka, Nitesh Saxena
-
Synthetic Iris Image Databases and Identity Leakage: Risks and Mitigation Strategies
Ada Sawilska, Mateusz Trokielewicz
-
Explicitly Modeling Subcortical Vision with a Neuro-Inspired Front-End Improves CNN Robustness
Lucas Piper, Arlindo L. Oliveira, Tiago Marques
-
On the Robustness of Tabular Foundation Models: Test-Time Attacks and In-Context Defenses
Mohamed Djilani, Thibault Simonetto, Karim Tit, Florian Tambon, Paul Récamier, Salah Ghamizi, Maxime Cordy, Mike Papadakis
-
Agnostic Learning under Targeted Poisoning: Optimal Rates and the Role of Randomness
Bogdan Chornomaz, Yonatan Koren, Shay Moran, Tom Waknine
-
On the Benefits of Accelerated Optimization in Robust and Private Estimation
Laurentiu Andrei Marchis, Po-Ling Loh
-
Tarallo: Evading Behavioral Malware Detectors in the Problem Space
Gabriele Digregorio, Salvatore Maccarrone, Mario D'Onghia, Luigi Gallo, Michele Carminati, Mario Polino, Stefano Zanero
-
Poster: FedBlockParadox -- A Framework for Simulating and Securing Decentralized Federated Learning
Gabriele Digregorio, Francesco Bleggi, Federico Caroli, Michele Carminati, Stefano Zanero, Stefano Longari
-
Privacy Leaks by Adversaries: Adversarial Iterations for Membership Inference Attack
Jing Xue, Zhishen Sun, Haishan Ye, Luo Luo, Xiangyu Chang, Ivor Tsang, Guang Dai
-
Helpful Agent Meets Deceptive Judge: Understanding Vulnerabilities in Agentic Workflows
Yifei Ming, Zixuan Ke, Xuan-Phi Nguyen, Jiayu Wang, Shafiq Joty
-
BadReward: Clean-Label Poisoning of Reward Models in Text-to-Image RLHF
Kaiwen Duan, Hongwei Yao, Yufei Chen, Ziyun Li, Tong Qiao, Zhan Qin, Cong Wang
-
Adversarial Attacks on Robotic Vision Language Action Models
Eliot Krzysztof Jones, Alexander Robey, Andy Zou, Zachary Ravichandran, George J. Pappas, Hamed Hassani, Matt Fredrikson, J. Zico Kolter
-
Robustness in Both Domains: CLIP Needs a Robust Text Encoder
Elias Abad Rocamora, Christian Schlarmann, Naman Deep Singh, Yongtao Wu, Matthias Hein, Volkan Cevher
-
Dynamic Epsilon Scheduling: A Multi-Factor Adaptive Perturbation Budget for Adversarial Training
Alan Mitkiy, James Smith, Hana Satou, Hiroshi Tanaka, Emily Johnson, F Monkey
-
How stealthy is stealthy? Studying the Efficacy of Black-Box Adversarial Attacks in the Real World
Francesco Panebianco, Mario D'Onghia, Stefano Zanero, Michele Carminati
-
Attacking Attention of Foundation Models Disrupts Downstream Tasks
Hondamunige Prasanna Silva, Federico Becattini, Lorenzo Seidenari
-
Unlearning's Blind Spots: Over-Unlearning and Prototypical Relearning Attack
SeungBum Ha, Saerom Park, Sung Whan Yoon
-
Rishi Raj Sahoo, Rucha Bhalchandra Joshi, Subhankar Mishra
-
Mitigating Data Poisoning Attacks to Local Differential Privacy
Xiaolin Li, Ninghui Li, Boyang Wang, Wenhai Sun
-
Fingerprinting Deep Learning Models via Network Traffic Patterns in Federated Learning
Md Nahid Hasan Shuvo, Moinul Hossain
-
Dirty and Clean-Label attack detection using GAN discriminators
John W. Smutny
-
Comprehensive Vulnerability Analysis is Necessary for Trustworthy LLM-MAS
Pengfei He, Yue Xing, Shen Dong, Juanhui Li, Zhenwei Dai, Xianfeng Tang, Hui Liu, Han Xu, Zhen Xiang, Charu C. Aggarwal, Hui Liu
-
Variance-Based Defense Against Blended Backdoor Attacks
Sujeevan Aseervatham, Achraf Kerzazi, Younès Bennani
-
MMD-Flagger: Leveraging Maximum Mean Discrepancy to Detect Hallucinations
Kensuke Mitsuzawa, Damien Garreau
-
Self-Refining Language Model Anonymizers via Adversarial Distillation
Kyuyoung Kim, Hyunjun Jeon, Jinwoo Shin
-
CoP: Agentic Red-teaming for Large Language Models using Composition of Principles
Chen Xiong, Pin-Yu Chen, Tsung-Yi Ho
-
Jailbreak-R1: Exploring the Jailbreak Capabilities of LLMs via Reinforcement Learning
Weiyang Guo, Zesheng Shi, Zhuo Li, Yequan Wang, Xuebo Liu, Wenya Wang, Fangming Liu, Min Zhang, Jing Li
-
Unlearning Inversion Attacks for Graph Neural Networks
Jiahao Zhang, Yilong Wang, Zhiwei Zhang, Xiaorui Liu, Suhang Wang
-
SafeGenes: Evaluating the Adversarial Robustness of Genomic Foundation Models
Huixin Zhan, Jason H. Moore
-
Wenshuo Dong, Qingsong Yang, Shu Yang, Lijie Hu, Meng Ding, Wanyu Lin, Tianhang Zheng, Di Wang
-
CAPAA: Classifier-Agnostic Projector-Based Adversarial Attack
Zhan Li, Mingyu Zhao, Xin Dong, Haibin Ling, Bingyao Huang
-
Yudong Zhang, Ruobing Xie, Yiqing Huang, Jiansheng Chen, Xingwu Sun, Zhanhui Kang, Di Wang, Yu Wang
-
Monitoring Robustness and Individual Fairness
Ashutosh Gupta, Thomas A. Henzinger, Konstantin Kueffner, Kaushik Mallik, David Pape
-
The Security Threat of Compressed Projectors in Large Vision-Language Models
Yudong Zhang, Ruobing Xie, Xingwu Sun, Jiansheng Chen, Zhanhui Kang, Di Wang, Yu Wang
-
Bayesian Inference of Training Dataset Membership
Yongchao Huang
-
Spectral Insights into Data-Oblivious Critical Layers in Large Language Models
Xuyuan Liu, Lei Hsiung, Yaoqing Yang, Yujun Yan
-
LoRA as a Flexible Framework for Securing Large Vision Systems
Zander W. Blasingame, Richard E. Neddo, Chen Liu
-
Beyond the Protocol: Unveiling Attack Vectors in the Model Context Protocol (MCP) Ecosystem
Hao Song, Yiming Shen, Wenxuan Luo, Leixin Guo, Ting Chen, Jiashui Wang, Beibei Li, Xiaosong Zhang, Jiachi Chen
-
SEAR: A Multimodal Dataset for Analyzing AR-LLM-Driven Social Engineering Behaviors
Tianlong Yu, Chenghang Ye, Zheyu Yang, Ziyi Zhou, Cui Tang, Zui Tao, Jun Zhang, Kailong Wang, Liting Zhou, Yang Yang, Ting Bi
-
The Butterfly Effect in Pathology: Exploring Security in Pathology Foundation Models
Jiashuai Liu, Yingjia Shang, Yingkang Zhan, Di Zhang, Yi Niu, Dong Wei, Xian Wu, Zeyu Gao, Chen Li, Yefeng Zheng
-
From Hallucinations to Jailbreaks: Rethinking the Vulnerability of Large Foundation Models
Haibo Jin, Peiyan Zhang, Peiran Wang, Man Luo, Haohan Wang
-
An Adversary-Resistant Multi-Agent LLM System via Credibility Scoring
Sana Ebrahimi, Mohsen Dehghankar, Abolfazl Asudeh
-
Exploring Multimodal Challenges in Toxic Chinese Detection: Taxonomy, Benchmark, and Findings
Shujian Yang, Shiyao Cui, Chuanrui Hu, Haicheng Wang, Tianwei Zhang, Minlie Huang, Jialiang Lu, Han Qiu
-
Adversarial Preference Learning for Robust LLM Alignment
Yuanfu Wang, Pengyu Wang, Chenyang Xi, Bo Tang, Junyi Zhu, Wenqiang Wei, Chen Chen, Chao Yang, Jingfeng Zhang, Chaochao Lu, Yijun Niu, Keming Mao, Zhiyu Li, Feiyu Xiong, Jie Hu, Mingchuan Yang
-
Xiaoyu Wu, Yifei Pang, Terrance Liu, Zhiwei Steven Wu
-
Learning Safety Constraints for Large Language Models
Xin Chen, Yarden As, Andreas Krause
-
Andrea Pedrotti, Michele Papucci, Cristiano Ciaccio, Alessio Miaschi, Giovanni Puccetti, Felice Dell'Orletta, Andrea Esuli
-
A Flat Minima Perspective on Understanding Augmentations and Model Robustness
Weebum Yoo, Sung Whan Yoon
-
Model Unlearning via Sparse Autoencoder Subspace Guided Projections
Xu Wang, Zihao Li, Benyou Wang, Yan Hu, Difan Zou
-
AMIA: Automatic Masking and Joint Intention Analysis Makes LVLMs Robust Jailbreak Defenders
Yuqi Zhang, Yuchun Miao, Zuchao Li, Liang Ding
-
Light as Deception: GPT-driven Natural Relighting Against Vision-Language Pre-training Models
Ying Yang, Jie Zhang, Xiao Lv, Di Lin, Tao Xiang, Qing Guo
-
Leveraging Intermediate Features of Vision Transformer for Face Anti-Spoofing
Mika Feng, Koichi Ito, Takafumi Aoki, Tetsushi Ohki, Masakatsu Nishigaki
-
Black-box Adversarial Attacks on CNN-based SLAM Algorithms
Maria Rafaela Gkeka, Bowen Sun, Evgenia Smirni, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas
-
PatchDEMUX: A Certifiably Robust Framework for Multi-label Classifiers Against Adversarial Patches
Dennis Jacob, Chong Xiang, Prateek Mittal
-
Practical Bayes-Optimal Membership Inference Attacks
Marcus Lassila, Johan Östman, Khac-Hoang Ngo, Alexandre Graell i Amat
-
Robust Federated Learning against Model Perturbation in Edge Networks
Dongzi Jin, Yong Xiao, Yingyu Li
-
ByzFL: Research Framework for Robust Federated Learning
Marc González, Rachid Guerraoui, Rafael Pinot, Geovani Rizk, John Stephan, François Taïani
-
Cascading Adversarial Bias from Injection to Distillation in Language Models
Harsh Chaudhari, Jamie Hayes, Matthew Jagielski, Ilia Shumailov, Milad Nasr, Alina Oprea
-
COSMIC: Generalized Refusal Direction Identification in LLM Activations
Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
-
TRAPDOC: Deceiving LLM Users by Injecting Imperceptible Phantom Tokens into Documents
Hyundong Jin, Sicheol Sung, Shinwoo Park, SeungYeop Baik, Yo-Sub Han
-
Heterogeneous Graph Backdoor Attack
Jiawei Chen, Lusi Li, Daniel Takabi, Masha Sosonkina, Rui Ning
-
Adversarial Threat Vectors and Risk Mitigation for Retrieval-Augmented Generation Systems
Chris M. Ward, Josh Harguess
-
Towards Secure MLOps: Surveying Attacks, Mitigation Strategies, and Research Challenges
Raj Patel, Himanshu Tripathi, Jasper Stone, Noorbakhsh Amiri Golilarz, Sudip Mittal, Shahram Rahimi, Vini Chaudhary
-
A Red Teaming Roadmap Towards System-Level Safety
Zifan Wang, Christina Q. Knight, Jeremy Kritz, Willow E. Primack, Julian Michael
-
An Independent Discriminant Network Towards Identification of Counterfeit Images and Videos
Shayantani Kar, B. Shresth Bhimrajka, Aditya Kumar, Sahil Gupta, Sourav Ghosh, Subhamita Mukherjee, Shauvik Paul
-
How much do language models memorize?
John X. Morris, Chawin Sitawarin, Chuan Guo, Narine Kokhlikyan, G. Edward Suh, Alexander M. Rush, Kamalika Chaudhuri, Saeed Mahloujifar
-
Shadow defense against gradient inversion attack in federated learning
Le Jiang, Liyan Ma, Guang Yang
-
TRAP: Targeted Redirecting of Agentic Preferences
Hangoo Kang, Jehyeok Yeon, Gagandeep Singh
-
SafeScientist: Toward Risk-Aware Scientific Discoveries by LLM Agents
Kunlun Zhu, Jiaxun Zhang, Ziheng Qi, Nuoxing Shang, Zijia Liu, Peixuan Han, Yue Su, Haofei Yu, Jiaxuan You
-
Fooling the Watchers: Breaking AIGC Detectors via Semantic Prompt Attacks
Run Hao, Peng Ying
-
Disrupting Vision-Language Model-Driven Navigation Services via Adversarial Object Fusion
Chunlong Xie, Jialing He, Shangwei Guo, Jiacheng Wang, Shudong Zhang, Tianwei Zhang, Tao Xiang
-
Adversarial Semantic and Label Perturbation Attack for Pedestrian Attribute Recognition
Weizhe Kong, Xiao Wang, Ruichong Gao, Chenglong Li, Yu Zhang, Xing Yang, Yaowei Wang, Jin Tang
-
Keyed Chaotic Tensor Transformations for Secure And Attributable Neural Inference
Peter David Fagan
-
Utku Demir, Yalin E. Sagduyu, Tugba Erpek, Hossein Jafari, Sastry Kompella, Mengran Xue
-
DyePack: Provably Flagging Test Set Contamination in LLMs Using Backdoors
Yize Cheng, Wenxiao Wang, Mazda Moayeri, Soheil Feizi
-
Detecting Stealthy Backdoor Samples based on Intra-class Distance for Large Language Models
Jinwen Chen, Hainan Zhang, Fei Sun, Qinnan Zhang, Sijia Wen, Ziwei Wang, Zhiming Zheng
-
Mingyu Yu, Wei Wang, Yanjie Wei, Sujuan Qin
-
John Halloran
-
Model Immunization from a Condition Number Perspective
Amber Yijia Zheng, Cedar Site Bai, Brian Bullins, Raymond A. Yeh
-
Bayesian Perspective on Memorization and Reconstruction
Haim Kaplan, Yishay Mansour, Kobbi Nissim, Uri Stemmer
-
Merge Hijacking: Backdoor Attacks to Model Merging of Large Language Models
Zenghui Yuan, Yangming Xu, Jiawen Shi, Pan Zhou, Lichao Sun
-
Confidential Guardian: Cryptographically Prohibiting the Abuse of Model Abstention
Stephan Rabanser, Ali Shahin Shamsabadi, Olive Franzese, Xiao Wang, Adrian Weller, Nicolas Papernot
-
LLM Agents Should Employ Security Principles
Kaiyuan Zhang, Zian Su, Pin-Yu Chen, Elisa Bertino, Xiangyu Zhang, Ninghui Li
-
Keyed Chaotic Masking: A Functional Privacy Framework for Neural Inference
Peter David Fagan
-
NeuronTune: Towards Self-Guided Spurious Bias Mitigation
Guangtao Zheng, Wenqian Ye, Aidong Zhang
-
Can Emotion Fool Anti-spoofing?
Aurosweta Mahapatra, Ismail Rasim Ulgen, Abinay Reddy Naini, Carlos Busso, Berrak Sisman
-
Vid-SME: Membership Inference Attacks against Large Video Understanding Models
Qi Li, Runpeng Yu, Xinchao Wang
-
Securing AI Agents with Information-Flow Control
Manuel Costa, Boris Köpf, Aashish Kolluri, Andrew Paverd, Mark Russinovich, Ahmed Salem, Shruti Tople, Lukas Wutschitz, Santiago Zanella-Béguelin
-
Noise-Robustness Through Noise: Asymmetric LoRA Adaption with Poisoning Expert
Zhaokun Wang, Jinyu Guo, Jingwen Pu, Lingfeng Chen, Hongli Pu, Jie Ou, Libo Qin, Wenhong Tian
-
Differential Gated Self-Attention
Elpiniki Maria Lygizou, Mónika Farsang, Radu Grosu
-
ROTATE: Regret-driven Open-ended Training for Ad Hoc Teamwork
Caroline Wang, Arrasy Rahman, Jiaxun Cui, Yoonchang Sung, Peter Stone
-
Rethinking Gradient-based Adversarial Attacks on Point Cloud Classification
Jun Chen, Xinke Li, Mingyue Xu, Tianrui Li, Chongshou Li
-
Practical Adversarial Attacks on Stochastic Bandits via Fake Data Injection
Qirun Zeng, Eric He, Richard Hoffmann, Xuchuang Wang, Jinhang Zuo
-
Yongcan Yu, Yanbo Wang, Ran He, Jian Liang
-
From Dormant to Deleted: Tamper-Resistant Unlearning Through Weight-Space Regularization
Shoaib Ahmed Siddiqui, Adrian Weller, David Krueger, Gintare Karolina Dziugaite, Michael Curtis Mozer, Eleni Triantafillou
-
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments
Zeyi Liao, Jaylen Jones, Linxi Jiang, Eric Fosler-Lussier, Yu Su, Zhiqiang Lin, Huan Sun
-
Seeing the Threat: Vulnerabilities in Vision-Language Models to Adversarial Attack
Juan Ren, Mark Dras, Usman Naseem
-
Yujin Choi, Youngjoo Park, Junyoung Byun, Jaewook Lee, Jinseong Park
-
Yifan Lu, Jing Li, Yigeng Zhou, Yihui Zhang, Wenya Wang, Xiucheng Li, Meishan Zhang, Fangming Liu, Jun Yu, Min Zhang
-
Wenwen Qiang, Ziyin Gu, Lingyu Si, Jiangmeng Li, Changwen Zheng, Fuchun Sun, Hui Xiong
-
Md Touhidul Islam, Imran Kabir, Md Alimoor Reza, Syed Masum Billah
-
The Meeseeks Mesh: Spatially Consistent 3D Adversarial Objects for BEV Detector
Aixuan Li, Mochu Xiang, Jing Zhang, Yuchao Dai
-
Adversarially Robust AI-Generated Image Detection for Free: An Information Theoretic Perspective
Ruixuan Zhang, He Wang, Zhengyu Zhao, Zhiqing Guo, Xun Yang, Yunfeng Diao, Meng Wang
-
Ruiguo Yu, Yiyang Zhang, Yuan Tian, Yujie Diao, Di Jin, Witold Pedrycz
-
Understanding Adversarial Training with Energy-based Models
Mujtaba Hussain Mirza, Maria Rosaria Briglia, Filippo Bartolucci, Senad Beadini, Giuseppe Lisanti, Iacopo Masi
-
A Closer Look on Memorization in Tabular Diffusion Model: A Data-Centric Perspective
Zhengyu Fang, Zhimeng Jiang, Huiyuan Chen, Xiaoge Zhang, Kaiyu Tang, Xiao Li, Jing Li
-
Privacy-preserving Prompt Personalization in Federated Learning for Multimodal Large Language Models
Sizai Hou, Songze Li, Baturalp Buyukates
-
Efficient Preimage Approximation for Neural Network Certification
Anton Björklund, Mykola Zaitsev, Marta Kwiatkowska
-
How Do Diffusion Models Improve Adversarial Robustness?
Liu Yuezhang, Xue-Xin Wei
-
Operationalizing CaMeL: Strengthening LLM Defenses for Enterprise Deployment
Krti Tallam, Emma Miller
-
Jaewoo Ahn, Heeseung Yun, Dayoon Ko, Gunhee Kim
-
Machine Learning Models Have a Supply Chain Problem
Sarah Meiklejohn, Hayden Blauzvern, Mihai Maruseac, Spencer Schrock, Laurent Simon, Ilia Shumailov
-
TensorShield: Safeguarding On-Device Inference by Shielding Critical DNN Tensors with TEE
Tong Sun, Bowen Jiang, Hailong Lin, Borui Li, Yixiao Teng, Yi Gao, Wei Dong
-
Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, Jing Wang
-
GeneBreaker: Jailbreak Attacks against DNA Language Models with Pathogenicity Guidance
Zaixi Zhang, Zhenghong Zhou, Ruofan Jin, Le Cong, Mengdi Wang
-
Are classical deep neural networks weakly adversarially robust?
Nuolin Sun, Linyuan Wang, Dongyang Li, Bin Yan, Lei Li
-
PALADIN: Robust Neural Fingerprinting for Text-to-Image Diffusion Models
Murthy L, Subarna Tripathi
-
Seven Security Challenges That Must be Solved in Cross-domain Multi-agent LLM Systems
Ronny Ko, Jiseong Jeong, Shuyuan Zheng, Chuan Xiao, Tae-Wan Kim, Makoto Onizuka, Won-Yong Shin
-
Precise In-Parameter Concept Erasure in Large Language Models
Yoav Gur-Arieh, Clara Suslik, Yihuai Hong, Fazl Barez, Mor Geva
-
Preventing Adversarial AI Attacks Against Autonomous Situational Awareness: A Maritime Case Study
Mathew J. Walter, Aaron Barrett, Kimberly Tam
-
VideoMarkBench: Benchmarking Robustness of Video Watermarking
Zhengyuan Jiang, Moyang Guo, Kecen Li, Yuepeng Hu, Yupu Wang, Zhicong Huang, Cheng Hong, Neil Zhenqiang Gong
-
Breaking the Ceiling: Exploring the Potential of Jailbreak Attacks through Expanding Strategy Space
Yao Huang, Yitong Sun, Shouwei Ruan, Yichi Zhang, Yinpeng Dong, Xingxing Wei
-
Calibrating LLM Confidence by Probing Perturbed Representation Stability
Reza Khanmohammadi, Erfan Miahi, Mehrsa Mardikoraem, Simerjot Kaur, Ivan Brugere, Charese H. Smiley, Kundan Thind, Mohammad M. Ghassemi
-
What is Adversarial Training for Diffusion Models?
Briglia Maria Rosaria, Mujtaba Hussain Mirza, Giuseppe Lisanti, Iacopo Masi
-
Faster Rates for Private Adversarial Bandits
Hilal Asi, Vinod Raman, Kunal Talwar
-
System Prompt Extraction Attacks and Defenses in Large Language Models
Badhan Chandra Das, M. Hadi Amini, Yanzhao Wu
-
Multi-level Certified Defense Against Poisoning Attacks in Offline Reinforcement Learning
Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein
-
MedSentry: Understanding and Mitigating Safety Risks in Medical LLM Multi-Agent Systems
Kai Chen, Taihang Zhen, Hewei Wang, Kailai Liu, Xinfeng Li, Jing Huo, Tianpei Yang, Jinfeng Xu, Wei Dong, Yang Gao
-
TabAttackBench: A Benchmark for Adversarial Attacks on Tabular Data
Zhipeng He, Chun Ouyang, Lijie Wen, Cong Liu, Catarina Moreira
-
Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling
Yichuan Cao, Yibo Miao, Xiao-Shan Gao, Yinpeng Dong
-
HeteroBA: A Structure-Manipulating Backdoor Attack on Heterogeneous Graphs
Honglin Gao, Xiang Li, Lan Zhao, Gaoxi Xiao
-
PoisonSwarm: Universal Harmful Information Synthesis via Model Crowdsourcing
Yu Yan, Sheng Sun, Zhifei Zheng, Ziji Hao, Teli Liu, Min Liu
-
A Framework for Adversarial Analysis of Decision Support Systems Prior to Deployment
Brett Bissey, Kyle Gatesman, Walker Dimon, Mohammad Alam, Luis Robaina, Joseph Weissman
-
AdInject: Real-World Black-Box Attacks on Web Agents via Advertising Delivery
Haowei Wang, Junjie Wang, Xiaojun Jia, Rupeng Zhang, Mingyang Li, Zhe Liu, Yang Liu, Qing Wang
-
Adversarial bandit optimization for approximately linear functions
Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto
-
Label Leakage in Federated Inertial-based Human Activity Recognition
Marius Bock, Maximilian Hopp, Kristof Van Laerhoven, Michael Moeller
-
Automated Privacy Information Annotation in Large Language Model Interactions
Hang Zeng, Xiangyu Liu, Yong Hu, Chaoyue Niu, Fan Wu, Shaojie Tang, Guihai Chen
-
Mitigating Hallucination in Large Vision-Language Models via Adaptive Attention Calibration
Mehrdad Fazli, Bowen Wei, Ahmet Sari, Ziwei Zhu
-
Concealment of Intent: A Game-Theoretic Analysis
Xinbo Wu, Abhishek Umrawal, Lav R. Varshney
-
Learnable Kernel Density Estimation for Graphs
Xudong Wang, Ziheng Sun, Chris Ding, Jicong Fan
-
Unveiling Impact of Frequency Components on Membership Inference Attacks for Diffusion Models
Puwei Lian, Yujun Cai, Songze Li, Bingkun Bao
-
Capability-Based Scaling Laws for LLM Red-Teaming
Alexander Panfilov, Paul Kassianik, Maksym Andriushchenko, Jonas Geiping
-
Your Classifier Can Do More: Towards Bridging the Gaps in Classification, Robustness, and Generation
Kaichao Jiang, He Wang, Xiaoshuai Hao, Xiulong Yang, Ajian Liu, Qi Chu, Yunfeng Diao
-
DOGe: Defensive Output Generation for LLM Protection Against Knowledge Distillation
Pingzhi Li, Zhen Tan, Huaizhi Qu, Huan Liu, Tianlong Chen
-
Diagnosing and Mitigating Modality Interference in Multimodal Large Language Models
Rui Cai, Bangzheng Li, Xiaofei Wen, Muhao Chen, Zhe Zhao
-
Pengcheng Sun, Erwu Liu, Wei Ni, Rui Wang, Yuanzhe Geng, Lijuan Lai, Abbas Jamalipour
-
Novel Loss-Enhanced Universal Adversarial Patches for Sustainable Speaker Privacy
Elvir Karimov, Alexander Varlamov, Danil Ivanov, Dmitrii Korzh, Oleg Y. Rogov
-
Xinping Chen, Chen Liu
-
Lifelong Safety Alignment for Language Models
Haoyu Wang, Zeyu Qin, Yifei Zhao, Chao Du, Min Lin, Xueqian Wang, Tianyu Pang
-
Holes in Latent Space: Topological Signatures Under Adversarial Influence
Aideen Fay, Inés García-Redondo, Qiquan Wang, Haim Dubossarsky, Anthea Monod
-
Benign-to-Toxic Jailbreaking: Inducing Harmful Responses from Harmless Prompts
Hee-Seon Kim, Minbeom Kim, Wonjun Lee, Kihyun Kim, Changick Kim
-
VisCRA: A Visual Chain Reasoning Attack for Jailbreaking Multimodal Large Language Models
Bingrui Sima, Linhua Cong, Wenxuan Wang, Kun He
-
Erasing Concepts, Steering Generations: A Comprehensive Survey of Concept Suppression
Yiwei Xie, Ping Liu, Zheng Zhang
-
Kai Li, Conggai Li, Xin Yuan, Shenghong Li, Sai Zou, Syed Sohail Ahmed, Wei Ni, Dusit Niyato, Abbas Jamalipour, Falko Dressler, Ozgur B. Akan
-
MultiPhishGuard: An LLM-based Multi-Agent System for Phishing Email Detection
Yinuo Xue, Eric Spero, Yun Sing Koh, Giovanni Russello
-
JailBound: Jailbreaking Internal Safety Boundaries of Vision-Language Models
Jiaxin Song, Yixu Wang, Jie Li, Rui Yu, Yan Teng, Xingjun Ma, Yingchun Wang
-
Spurious Privacy Leakage in Neural Networks
Chenxiang Zhang, Jun Pang, Sjouke Mauw
-
Poison in the Well: Feature Embedding Disruption in Backdoor Attacks
Zhou Feng, Jiahao Chen, Chunyi Zhou, Yuwen Pu, Qingming Li, Shouling Ji
-
One Surrogate to Fool Them All: Universal, Transferable, and Targeted Adversarial Attacks with CLIP
Binyan Xu, Xilin Dai, Di Tang, Kehuan Zhang
-
CPA-RAG: Covert Poisoning Attacks on Retrieval-Augmented Generation in Large Language Models
Chunyang Li, Junwei Zhang, Anda Cheng, Zhuo Ma, Xinghua Li, Jianfeng Ma
-
TrojanStego: Your Language Model Can Secretly Be A Steganographic Privacy Leaking Agent
Dominik Meier, Jan Philip Wahle, Paul Röttger, Terry Ruas, Bela Gipp
-
Yuhao He, Jinyu Tian, Haiwei Wu, Jianqing Li
-
Guard Me If You Know Me: Protecting Specific Face-Identity from Deepfakes
Kaiqing Lin, Zhiyuan Yan, Ke-Yue Zhang, Li Hao, Yue Zhou, Yuzhen Lin, Weixiang Li, Taiping Yao, Shouhong Ding, Bin Li
-
Amira Guesmi, Bassem Ouni, Muhammad Shafique
-
Attention! You Vision Language Model Could Be Maliciously Manipulated
Xiaosen Wang, Shaokang Wang, Zhijin Ge, Yuyang Luo, Shudong Zhang
-
Jiawen Zhang, Zhenwei Zhang, Shun Zheng, Xumeng Wen, Jia Li, Jiang Bian
-
Fox in the Henhouse: Supply-Chain Backdoor Attacks Against Reinforcement Learning
Shijie Liu, Andrew C. Cullen, Paul Montague, Sarah Erfani, Benjamin I. P. Rubinstein
-
An Out-Of-Distribution Membership Inference Attack Approach for Cross-Domain Graph Attacks
Jinyan Wang, Liu Yang, Yuecen Wei, Jiaxuan Si, Chenhao Guo, Qingyun Sun, Xianxian Li, Xingcheng Fu
-
Evaluating Robustness of Large Audio Language Models to Audio Injection: An Empirical Study
Guanyu Hou, Jiaming He, Yinhang Zhou, Ji Guo, Yitong Qiao, Rui Zhang, Wenbo Jiang
-
What Really Matters in Many-Shot Attacks? An Empirical Study of Long-Context Vulnerabilities in LLMs
Sangyeop Kim, Yohan Lee, Yongwoo Song, Kimin Lee
-
Efficient and Stealthy Jailbreak Attacks via Adversarial Prompt Distillation from LLMs to SLMs
Xiang Li, Chong Zhang, Jia Wang, Fangyu Wu, Yushi Li, Xiaobo Jin
-
Semantic-Preserving Adversarial Attacks on LLMs: An Adaptive Greedy Binary Search Approach
Chong Zhang, Xiang Li, Jia Wang, Shan Liang, Haochen Xue, Xiaobo Jin
-
Stronger Enforcement of Instruction Hierarchy via Augmented Intermediate Representations
Sanjay Kariyappa, G. Edward Suh
-
An Embarrassingly Simple Defense Against LLM Abliteration Attacks
Harethah Abu Shairah, Hasan Abed Al Kader Hammoud, Bernard Ghanem, George Turkiyyah
-
CloneShield: A Framework for Universal Perturbation Against Zero-Shot Voice Cloning
Renyuan Li, Zhibo Liang, Haichuan Zhang, Tianyu Shi, Zhiyuan Cheng, Jia Shi, Carl Yang, Mingjie Tang
-
Peiran Sun
-
Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Marc Vucovich, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty
-
Towards Generalized Proactive Defense against Face Swapping with Contour-Hybrid Watermark
Ruiyang Xia, Dawei Zhou, Decheng Liu, Lin Yuan, Jie Li, Nannan Wang, Xinbo Gao
-
Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaoj
-
GhostPrompt: Jailbreaking Text-to-image Generative Models based on Dynamic Optimization
Zixuan Chen, Hao Lin, Ke Xu, Xinghao Jiang, Tanfeng Sun
-
Querying Kernel Methods Suffices for Reconstructing their Training Data
Daniel Barzilai, Yuval Margalit, Eitan Gronich, Gilad Yehudai, Meirav Galun, Ronen Basri
-
Shiyu Xiang, Tong Zhang, Ronghao Chen
-
RADEP: A Resilient Adaptive Defense Framework Against Model Extraction Attacks
Amit Chakraborty, Sayyed Farid Ahamed, Sandip Roy, Soumya Banerjee, Kevin Choi, Abdul Rahman, Alison Hu, Edward Bowen, Sachin Shetty
-
A Comprehensive Survey on the Risks and Limitations of Concept-based Models
Sanchit Sinha, Aidong Zhang
-
Ignition Phase: Standard Training for Fast Adversarial Robustness
Wang Yu-Hang, Liu ying, Fang liang, Wang Xuelin, Junkang Guo, Shiwei Li, Lei Gao, Jian Liu, Wenfei Yin
-
JEDI: The Force of Jensen-Shannon Divergence in Disentangling Diffusion Models
Eric Tillmann Bill, Enis Simsar, Thomas Hofmann
-
GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
Jialong Zhou, Lichao Wang, Xiao Yang
-
AssistedDS: Benchmarking How External Domain Knowledge Assists LLMs in Automated Data Science
An Luo, Xun Xian, Jin Du, Fangqiao Tian, Ganghua Wang, Ming Zhong, Shengchun Zhao, Xuan Bi, Zirui Liu, Jiawei Zhou, Jayanth Srinivasa, Ashish Kundu, Charles Fleming, Mingyi Hong, Jie Ding
-
EdgeAgentX: A Novel Framework for Agentic AI at the Edge in Military Communication Networks
Abir Ray
-
Jun Zhuang, Haibo Jin, Ye Zhang, Zhengjian Kang, Wenbin Zhang, Gaby G. Dagher, Haohan Wang
-
Robustness in Large Language Models: A Survey of Mitigation Strategies and Evaluation Metrics
Pankaj Kumar, Subhankar Mishra
-
StyleGuard: Preventing Text-to-Image-Model-based Style Mimicry Attacks by Style Perturbations
Yanjie Li, Wenxuan Zhang, Xinqi Lyu, Yihao Liu, Bin Xiao
-
Strong Membership Inference Attacks on Massive Datasets and (Moderately) Large Language Models
Jamie Hayes, Ilia Shumailov, Christopher A. Choquette-Choo, Matthew Jagielski, George Kaissis, Katherine Lee, Milad Nasr, Sahra Ghalebikesabi, Niloofar Mireshghallah, Meenatchi Sundaram Mutu Selva Annamalai, Igor Shilov, Matthieu Meeus, Yves-Alexandre de Montjoye, Franziska Boenisch, Adam Dziedzic, A. Feder Cooper
-
LORE: Lagrangian-Optimized Robust Embeddings for Visual Encoders
Borna Khodabandeh, Amirabbas Afzali, Amirhossein Afsharrad, Seyed Shahabeddin Mousavi, Sanjay Lall, Sajjad Amini, Seyed-Mohsen Moosavi-Dezfooli
-
Security Concerns for Large Language Models: A Survey
Miles Q. Li, Benjamin C. M. Fung
-
Mind the Gap: A Practical Attack on GGUF Quantization
Kazuki Egashira, Robin Staab, Mark Vero, Jingxuan He, Martin Vechev
-
Audio Jailbreak Attacks: Exposing Vulnerabilities in SpeechGPT in a White-Box Framework
Binhao Ma, Hanqing Guo, Zhengping Jay Luo, Rui Duan
-
Does Representation Intervention Really Identify Desired Concepts and Elicit Alignment?
Hongzheng Yang, Yongqiang Chen, Zeyu Qin, Tongliang Liu, Chaowei Xiao, Kun Zhang, Bo Han
-
Benchmarking Poisoning Attacks against Retrieval-Augmented Generation
Baolei Zhang, Haoran Xin, Jiatong Li, Dongzhe Zhang, Minghong Fang, Zhuqing Liu, Lihai Nie, Zheli Liu
-
Mal-D2GAN: Double-Detector based GAN for Malware Generation
Nam Hoang Thanh, Trung Pham Duy, Lam Bui Thu
-
$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking
Peijie Yu, Yifan Yang, Jinjian Li, Zelong Zhang, Haorui Wang, Xiao Feng, Feng Zhang
-
Chen Han, Wenzhen Zheng, Xijin Tang
-
How Is LLM Reasoning Distracted by Irrelevant Context? An Analysis Using a Controlled Benchmark
Minglai Yang, Ethan Huang, Liang Zhang, Mihai Surdeanu, William Wang, Liangming Pan
-
Misaligning Reasoning with Answers -- A Framework for Assessing LLM CoT Robustness
Enyi Jiang, Changming Xu, Nischay Singh, Gagandeep Singh
-
Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?
Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu
-
RoHyDR: Robust Hybrid Diffusion Recovery for Incomplete Multimodal Emotion Recognition
Yuehan Jin, Xiaoqing Liu, Yiyuan Yang, Zhiwen Yu, Tong Zhang, Kaixiang Yang
-
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
Zifan Peng, Yule Liu, Zhen Sun, Mingchen Li, Zeren Luo, Jingyi Zheng, Wenhan Dong, Xinlei He, Xuechao Wang, Yingjie Xue, Shengmin Xu, Xinyi Huang
-
Jiawei Kong, Hao Fang, Xiaochen Yang, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang
-
What You Read Isn't What You Hear: Linguistic Sensitivity in Deepfake Speech Detection
Binh Nguyen, Shuji Shi, Ryan Ofman, Thai Le
-
Chain-of-Lure: A Synthetic Narrative-Driven Approach to Compromise Large Language Models
Wenhan Chang, Tianqing Zhu, Yu Zhao, Shuangyong Song, Ping Xiong, Wanlei Zhou, Yongxiang Li
-
One Model Transfer to All: On Robust Jailbreak Prompts Generation against LLMs
Linbao Li, Yannan Liu, Daojing He, Yu Li
-
VEAttack: Downstream-agnostic Vision Encoder Attack against Large Vision Language Models
Hefei Mei, Zirui Wang, Shen You, Minjing Dong, Chang Xu
-
The Coherence Trap: When MLLM-Crafted Narratives Exploit Manipulated Visual Contexts
Yuchen Zhang, Yaxiong Wang, Yujiao Wu, Lianwei Wu, Li Zhu
-
Enhancing Adversarial Robustness of Vision Language Models via Adversarial Mixture Prompt Tuning
Shiji Zhao, Qihui Zhu, Shukun Xiong, Shouwei Ruan, Yize Fan, Ranjie Duan, Qing Guo, Xingxing Wei
-
Ping Li, Jianan Ni, Bo Pang
-
SemSegBench & DetecBench: Benchmarking Reliability and Generalization Beyond Classification
Shashank Agnihotri, David Schader, Jonas Jakubassa, Nico Sharei, Simon Kral, Mehmet Ege Kaçar, Ruben Weber, Margret Keuper
-
CAMME: Adaptive Deepfake Image Detection with Multi-Modal Cross-Attention
Naseem Khan, Tuan Nguyen, Amine Bermak, Issa Khalil
-
Mahalanobis++: Improving OOD Detection via Feature Normalization
Maximilian Mueller, Matthias Hein
-
Towards more transferable adversarial attack in black-box manner
Chun Tong Lei, Zhongliang Guo, Hon Chung Lee, Minh Quoc Duong, Chun Pong Lau
-
Adversarial Robustness of Nonparametric Regression
Parsa Moradi, Hanzaleh Akabrinodehi, Mohammad Ali Maddah-Ali
-
Improved and Oracle-Efficient Online $\ell_1$-Multicalibration
Rohan Ghuge, Vidya Muthukumar, Sahil Singla
-
Teruki Sano, Minoru Kuribayashi, Masao Sakai, Shuji Isobe, Eisuke Koizumi
-
Sec5GLoc: Securing 5G Indoor Localization via Adversary-Resilient Deep Learning Architecture
Ildi Alla, Valeria Loscri
-
Architectural Backdoors for Within-Batch Data Stealing and Model Inference Manipulation
Nicolas Küchler, Ivan Petrov, Conrad Grobler, Ilia Shumailov
-
A Critical Evaluation of Defenses against Prompt Injection Attacks
Yuqi Jia, Zedian Shao, Yupei Liu, Jinyuan Jia, Dawn Song, Neil Zhenqiang Gong
-
An Attack to Break Permutation-Based Private Third-Party Inference Schemes for LLMs
Rahul Thomas, Louai Zahran, Erica Choi, Akilesh Potti, Micah Goldblum, Arka Pal
-
AI/ML for 5G and Beyond Cybersecurity
Sandeep Pirbhulal, Habtamu Abie, Martin Jullum, Didrik Nielsen, Anders Løland
-
EVADE: Multimodal Benchmark for Evasive Content Detection in E-Commerce Applications
Ancheng Xu, Zhihao Yang, Jingpeng Li, Guanghu Yuan, Longze Chen, Liang Yan, Jiehui Zhou, Zhen Qin, Hengyun Chang, Hamid Alinejad-Rokny, Bo Zheng, Min Yang
-
How Can I Publish My LLM Benchmark Without Giving the True Answers Away?
Takashi Ishida, Thanawat Lodkaew, Ikko Yamane
-
Reward Model Overoptimisation in Iterated RLHF
Lorenz Wolf, Robert Kirk, Mirco Musolesi
-
T2VUnlearning: A Concept Erasing Method for Text-to-Video Diffusion Models
Xiaoyu Ye, Songjie Cheng, Yongtao Wang, Yajiao Xiong, Yishen Li
-
Unveiling the Basin-Like Loss Landscape in Large Language Models
Huanran Chen, Yinpeng Dong, Zeming Wei, Yao Huang, Yichi Zhang, Hang Su, Jun Zhu
-
Nicolas Castanet, Olivier Sigaud, Sylvain Lamprier
-
SafeKey: Amplifying Aha-Moment Insights for Safety Reasoning
Kaiwen Zhou, Xuandong Zhao, Gaowen Liu, Jayanth Srinivasa, Aosong Feng, Dawn Song, Xin Eric Wang
-
Finetuning-Activated Backdoors in LLMs
Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev
-
Xueyang Zhou, Guiyao Tie, Guowen Zhang, Hechang Wang, Pan Zhou, Lichao Sun
-
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Yiwei Sun, Peiqi Jiang, Chuanbin Liu, Luohao Lin, Zhiying Lu, Hongtao Xie
-
BitHydra: Towards Bit-flip Inference Cost Attack against Large Language Models
Xiaobei Yan, Yiming Li, Zhaoxin Fan, Han Qiu, Tianwei Zhang
-
Mitigating Fine-tuning Risks in LLMs via Safety-Aware Probing Optimization
Chengcan Wu, Zhixin Zhang, Zeming Wei, Yihao Zhang, Meng Sun
-
Jianing Geng, Biao Yi, Zekun Fei, Tongxi Wu, Lihai Nie, Zheli Liu
-
CoTSRF: Utilize Chain of Thought as Stealthy and Robust Fingerprint of Large Language Models
Zhenzhen Ren, GuoBiao Li, Sheng Li, Zhenxing Qian, Xinpeng Zhang
-
Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin
-
Viet Pham, Thai Le
-
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
Csaba Dékány, Stefan Balauca, Robin Staab, Dimitar I. Dimitrov, Martin Vechev
-
Junjie Xiong, Changjia Zhu, Shuhang Lin, Chong Zhang, Yongfeng Zhang, Yao Liu, Lingyao Li
-
Three Minds, One Legend: Jailbreak Large Reasoning Model with Adaptive Stacked Ciphers
Viet-Anh Nguyen, Shiqian Zhao, Gia Dao, Runyi Hu, Yi Xie, Luu Anh Tuan
-
All You Need is "Leet": Evading Hate-speech Detection AI
Sampanna Yashwant Kahu, Naman Ahuja
-
CTRAP: Embedding Collapse Trap to Safeguard Large Language Models from Harmful Fine-Tuning
Biao Yi, Tiansheng Huang, Baolei Zhang, Tong Li, Lihai Nie, Zheli Liu, Li Shen
-
BadDepth: Backdoor Attacks Against Monocular Depth Estimation in the Physical World
Ji Guo, Long Zhou, Zhijin Wang, Jiaming He, Qiyang Song, Aiguo Chen, Wenbo Jiang
-
TRAIL: Transferable Robust Adversarial Images via Latent Diffusion
Yuhao Xue, Zhifei Zhang, Xinyang Jiang, Yifei Shen, Junyao Gao, Wentao Gu, Jiale Zhao, Miaojing Shi, Cairong Zhao
-
Accelerating Targeted Hard-Label Adversarial Attacks in Low-Query Black-Box Settings
Arjhun Swaminathan, Mete Akgün
-
Hossein Khalili, Seongbin Park, Venkat Bollapragada, Nader Sehatbakhsh
-
Yuanhao Huang, Yilong Ren, Jinlei Wang, Lujia Huo, Xuesong Bai, Jinchuan Zhang, Haiyan Yu
-
Backdoor Cleaning without External Guidance in MLLM Fine-tuning
Xuankun Rong, Wenke Huang, Jian Liang, Jinhe Bi, Xun Xiao, Yiming Li, Bo Du, Mang Ye
-
When Are Concepts Erased From Diffusion Models?
Kevin Lu, Nicky Kriplani, Rohit Gandikota, Minh Pham, David Bau, Chinmay Hegde, Niv Cohen
-
Performance Guaranteed Poisoning Attacks in Federated Learning: A Sliding Mode Approach
Huazi Pan, Yanjun Zhang, Leo Yu Zhang, Scott Adams, Abbas Kouzani, Suiyang Khoo
-
Implicit Jailbreak Attacks via Cross-Modal Information Concealment on Vision-Language Models
Zhaoxin Wang, Handing Wang, Cong Tian, Yaochu Jin
-
Experimental robustness benchmark of quantum neural network on a superconducting quantum processor
Hai-Feng Zhang, Zhao-Yun Chen, Peng Wang, Liang-Liang Guo, Tian-Le Wang, Xiao-Yan Yang, Ren-Ze Zhao, Ze-An Zhao, Sheng Zhang, Lei Du, Hao-Ran Tao, Zhi-Long Jia, Wei-Cheng Kong, Huan-Yu Liu, Athanasios V. Vasilakos, Yang Yang, Yu-Chun Wu, Ji Guan, Peng Duan, Guo-Ping Guo
-
Robust LLM Fingerprinting via Domain-Specific Watermarks
Thibaud Gloaguen, Robin Staab, Nikola Jovanović, Martin Vechev
-
Privacy-Aware Cyberterrorism Network Analysis using Graph Neural Networks and Federated Learning
Anas Ali, Mubashar Husain, Peter Hans
-
MTSA: Multi-turn Safety Alignment for LLMs through Multi-round Red-teaming
Weiyang Guo, Jing Li, Wenya Wang, Yu Li, Daojing He, Jun Yu, Min Zhang
-
Bang Trinh Tran To, Thai Le
-
Robustifying Vision-Language Models via Dynamic Token Reweighting
Tanqiu Jiang, Jiacheng Liang, Rongyi Zhu, Jiawei Zhou, Fenglong Ma, Ting Wang
-
Secure and Private Federated Learning: Achieving Adversarial Resilience through Robust Aggregation
Kun Yang, Neena Imam
-
Backdoors in DRL: Four Environments Focusing on In-distribution Triggers
Chace Ashcraft, Ted Staley, Josh Carney, Cameron Hickert, Derek Juba, Kiran Karra, Nathan Drenkow
-
Towards medical AI misalignment: a preliminary study
Barbara Puccio, Federico Castagna, Allan Tucker, Pierangelo Veltri
-
Training on Plausible Counterfactuals Removes Spurious Correlations
Shpresim Sadiku, Kartikeya Chitranshi, Hiroshi Kera, Sebastian Pokutta
-
Accidental Vulnerability: Factors in Fine-Tuning that Shift Model Safeguards
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin
-
Ercong Nie, Helmut Schmid, Hinrich Schütze
-
Erased or Dormant? Rethinking Concept Erasure Through Reversibility
Ping Liu, Chi Zhang
-
Guiding Giants: Lightweight Controllers for Weighted Activation Steering in LLMs
Amr Hegazy, Mostafa Elhoushi, Amr Alanwar
-
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
Thibaud Gloaguen, Mark Vero, Robin Staab, Martin Vechev
-
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
Xiang Li, Yong Tao, Siyuan Zhang, Siwei Liu, Zhitong Xiong, Chunbo Luo, Lu Liu, Mykola Pechenizkiy, Xiao Xiang Zhu, Tianjin Huang
-
Shape it Up! Restoring LLM Safety during Finetuning
ShengYun Peng, Pin-Yu Chen, Jianfeng Chi, Seongmin Lee, Duen Horng Chau
-
Blind Spot Navigation: Evolutionary Discovery of Sensitive Semantic Concepts for LVLMs
Zihao Pan, Yu Tong, Weibin Wu, Jingyi Wang, Lifeng Chen, Zhe Zhao, Jiajia Wei, Yitong Qiao, Zibin Zheng
-
BadSR: Stealthy Label Backdoor Attacks on Image Super-Resolution
Ji Guo, Xiaolei Wen, Wenbo Jiang, Cheng Huang, Jinjin Li, Hongwei Li
-
Hao Fang, Jiawei Kong, Tianqu Zhuang, Yixiang Qiu, Kuofeng Gao, Bin Chen, Shu-Tao Xia, Yaowei Wang, Min Zhang
-
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
Zirui Song, Qian Jiang, Mingxuan Cui, Mingzhe Li, Lang Gao, Zeyu Zhang, Zixiang Xu, Yanbo Wang, Chenxi Wang, Guangxian Ouyang, Zhenhao Chen, Xiuying Chen
-
Silent Leaks: Implicit Knowledge Extraction Attack on RAG Systems through Benign Queries
Yuhao Wang, Wenjie Qu, Yanze Jiang, Zichen Liu, Yue Liu, Shengfang Zhai, Yinpeng Dong, Jiaheng Zhang
-
Beyond Classification: Evaluating Diffusion Denoised Smoothing for Security-Utility Trade off
Yury Belousov, Brian Pulfer, Vitaliy Kinakh, Slava Voloshynovskiy
-
A Federated Splitting Framework for LLMs: Security, Efficiency, and Adaptability
Zishuai Zhang, Hainan Zhang, Jiaying Zheng, Ziwei Wang, Yongxin Tong, Jin Dong, Zhiming Zheng
-
A Unified Theoretical Analysis of Private and Robust Offline Alignment: from RLHF to DPO
Xingyu Zhou, Yulian Wu, Francesco Orabona
-
Alignment Under Pressure: The Case for Informed Adversaries When Evaluating LLM Defenses
Xiaoxue Yang, Bozhidar Stevanoski, Matthieu Meeus, Yves-Alexandre de Montjoye
-
Scalable Defense against In-the-wild Jailbreaking Attacks with Safety Context Retrieval
Taiye Chen, Zeming Wei, Ang Li, Yisen Wang
-
Fooling the LVLM Judges: Visual Biases in LVLM-Based Evaluation
Yerin Hwang, Dongryeol Lee, Kyungmin Min, Taegwan Kang, Yong-il Kim, Kyomin Jung
-
Improving LLM First-Token Predictions in Multiple-Choice Question Answering via Prefilling Attack
Silvia Cappelletti, Tobia Poppi, Samuele Poppi, Zheng-Xin Yong, Diego Garcia-Olano, Marcella Cornia, Lorenzo Baraldi, Rita Cucchiara
-
Are Vision-Language Models Safe in the Wild? A Meme-Based Benchmark Study
DongGeon Lee, Joonwon Jang, Jihae Jeong, Hwanjo Yu
-
Be Careful When Fine-tuning On Open-Source LLMs: Your Fine-tuning Data Could Be Secretly Stolen!
Zhexin Zhang, Yuhao Sun, Junxiao Yang, Shiyao Cui, Hongning Wang, Minlie Huang
-
Advancing LLM Safe Alignment with Safety Representation Ranking
Tianqi Du, Zeming Wei, Quan Chen, Chenheng Zhang, Yisen Wang
-
Reverse Engineering Human Preferences with Reinforcement Learning
Lisa Alazraki, Tan Yi-Chern, Jon Ander Campos, Maximilian Mozes, Marek Rei, Max Bartolo
-
Hwan Chang, Yumin Kim, Yonghyun Jun, Hwanhee Lee
-
Geometrically Regularized Transfer Learning with On-Manifold and Off-Manifold Perturbation
Hana Satou, Alan Mitkiy, F Monkey
-
Hana Satou, F Monkey
-
My Face Is Mine, Not Yours: Facial Protection Against Diffusion Model Face Swapping
Hon Ming Yam, Zhongliang Guo, Chun Pong Lau
-
Few-Shot Adversarial Low-Rank Fine-Tuning of Vision-Language Models
Sajjad Ghiasvand, Haniyeh Ehsani Oskouie, Mahnoosh Alizadeh, Ramtin Pedarsani
-
Tong Cheng, Fu Jie, Xinpeng Ling, Huifa Li, Zhili Chen
-
Enhancing Certified Robustness via Block Reflector Orthogonal Layers and Logit Annealing Loss
Bo-Han Lai, Pin-Han Huang, Bo-Han Kung, Shang-Tse Chen
-
A Linear Approach to Data Poisoning
Diego Granziol, Donald Flynn
-
Rina Tazaki, Tomoyuki Akiyama, Akira Furui
-
A Survey On Secure Machine Learning
Taobo Liao, Taoran Li, Prathamesh Nadkarni
-
Aaron J. Li, Suraj Srinivas, Usha Bhalla, Himabindu Lakkaraju
-
Wenrui Yu, Yiyi Chen, Johannes Bjerva, Sokol Kosta, Qiongxiu Li
-
Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains
Yash Saxena, Ankur Padia, Mandar S Chaudhary, Kalpa Gunaratna, Srinivasan Parthasarathy, Manas Gaur
-
Challenger: Affordable Adversarial Driving Video Generation
Zhiyuan Xu, Bohan Li, Huan-ang Gao, Mingju Gao, Yong Chen, Ming Liu, Chenxu Yan, Hang Zhao, Shuo Feng, Hao Zhao
-
RRTL: Red Teaming Reasoning Large Language Models in Tool Learning
Yifei Liu, Yu Cui, Haibin Zhang
-
Covert Attacks on Machine Learning Training in Passively Secure MPC
Matthew Jagielski, Daniel Escudero, Rahul Rachuri, Peter Scholl
-
Neuromorphic Mimicry Attacks Exploiting Brain-Inspired Computing for Covert Cyber Intrusions
Hemanth Ravipati
-
CrossRF: A Domain-Invariant Deep Learning Approach for RF Fingerprinting
Fahrettin Emin Tiras, Hayriye Serra Altinoluk
-
MAPS: A Multilingual Benchmark for Global Agent Performance and Security
Omer Hofman, Jonathan Brokman, Oren Rachmil, Shamik Bose, Vikas Pahuja, Toshiya Shimizu, Trisha Starostina, Kelly Marchisio, Seraphina Goldfarb-Tarrant, Roman Vainshtein
-
Yiming Huang, Junyan Zhang, Zihao Wang, Biquan Bie, Yunzhong Qiu, Yi R. Fung, Xinlei He
-
EVA: Red-Teaming GUI Agents via Evolving Indirect Prompt Injection
Yijie Lu, Tianjie Ju, Manman Zhao, Xinbei Ma, Yuan Guo, ZhuoSheng Zhang
-
SafetyNet: Detecting Harmful Outputs in LLMs by Modeling and Monitoring Deceptive Behaviors
Maheep Chaudhary, Fazl Barez
-
Safety2Drive: Safety-Critical Scenario Benchmark for the Evaluation of Autonomous Driving
Jingzheng Li, Tiancheng Wang, Xingyu Peng, Jiacheng Chen, Zhijun Chen, Bing Li, Xianglong Liu
-
FedGraM: Defending Against Untargeted Attacks in Federated Learning via Embedding Gram Matrix
Di Wu, Qian Li, Heng Yang, Yong Han
-
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
Guangke Chen, Fu Song, Zhe Zhao, Xiaojun Jia, Yang Liu, Yanchen Qiao, Weizhe Zhang
-
"Haet Bhasha aur Diskrimineshun": Phonetic Perturbations in Code-Mixed Hinglish to Red-Team LLMs
Darpan Aswal, Siddharth D Jaiswal
-
Exploring Jailbreak Attacks on LLMs through Intent Concealment and Diversion
Tiehan Cui, Yanxu Mao, Peipei Liu, Congying Liu, Datao You
-
Can Large Language Models Really Recognize Your Name?
Dzung Pham, Peter Kairouz, Niloofar Mireshghallah, Eugene Bagdasarian, Chau Minh Pham, Amir Houmansadr
-
Language Models Optimized to Fool Detectors Still Have a Distinct Style (And How to Change It)
Rafael Rivera Soto, Barry Chen, Nicholas Andrews
-
Universal Acoustic Adversarial Attacks for Flexible Control of Speech-LLMs
Rao Ma, Mengjie Qian, Vyas Raina, Mark Gales, Kate Knill
-
Hidden Ghost Hand: Unveiling Backdoor Vulnerabilities in MLLM-Powered Mobile GUI Agents
Pengzhou Cheng, Haowen Hu, Zheng Wu, Zongru Wu, Tianjie Ju, Daizong Ding, Zhuosheng Zhang, Gongshen Liu
-
PandaGuard: Systematic Evaluation of LLM Safety in the Era of Jailbreaking Attacks
Guobin Shen, Dongcheng Zhao, Linghao Feng, Xiang He, Jihang Wang, Sicheng Shen, Haibo Tong, Yiting Dong, Jindong Li, Xiang Zheng, Yi Zeng
-
Beyond Text: Unveiling Privacy Vulnerabilities in Multi-modal Retrieval-Augmented Generation
Jiankun Zhang, Shenglai Zeng, Jie Ren, Tianqi Zheng, Hui Liu, Xianfeng Tang, Hui Liu, Yi Chang
-
Is Your Prompt Safe? Investigating Prompt Injection Attacks Against Open-Source LLMs
Jiawen Wang, Pritha Gupta, Ivan Habernal, Eyke Hüllermeier
-
Domain Adaptation for Multi-label Image Classification: a Discriminator-free Approach
Inder Pal Singh, Enjie Ghorbel, Anis Kacem, Djamila Aouada
-
Adversarial Training from Mean Field Perspective
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
-
Adversarially Pretrained Transformers may be Universally Robust In-Context Learners
Soichiro Kumano, Hiroshi Kera, Toshihiko Yamasaki
-
Fragments to Facts: Partial-Information Fragment Inference from LLMs
Lucas Rosenblatt, Bin Han, Robert Wolfe, Bill Howe
-
ShortcutProbe: Probing Prediction Shortcuts for Learning Robust Models
Guangtao Zheng, Wenqian Ye, Aidong Zhang
-
Adverseness vs. Equilibrium: Exploring Graph Adversarial Resilience through Dynamic Equilibrium
Xinxin Fan, Wenxiong Chen, Mengfan Li, Wenqi Wei, Ling Liu
-
SifterNet: A Generalized and Model-Agnostic Trigger Purification Approach
Shaoye Luo, Xinxin Fan, Quanliang Jing, Chi Lin, Mengfan Li, Yunfeng Lu, Yongjun Xu
-
Tomasz Maciążek, Robert Allison
-
Lessons from Defending Gemini Against Indirect Prompt Injections
Chongyang Shi, Sharon Lin, Shuang Song, Jamie Hayes, Ilia Shumailov, Itay Yona, Juliette Pluto, Aneesh Pappu, Christopher A. Choquette-Choo, Milad Nasr, Chawin Sitawarin, Gena Gibson, Andreas Terzis, John "Four" Flynn
-
D4+: Emergent Adversarial Driving Maneuvers with Approximate Functional Optimization
Diego Ortiz Barbosa, Luis Burbano, Carlos Hernandez, Zengxiang Lei, Younghee Park, Satish Ukkusuri, Alvaro A Cardenas
-
Replay Attacks Against Audio Deepfake Detection
Nicolas Müller, Piotr Kawa, Wei-Herng Choong, Adriana Stan, Aditya Tirumala Bukkapatnam, Karla Pizzi, Alexander Wagner, Philip Sperl
-
Anomaly Detection Based on Critical Paths for Deep Neural Networks
Fangzhen Zhao, Chenyi Zhang, Naipeng Dong, Ming Li, Jinxiao Shan
-
Efficient Privacy-Preserving Cross-Silo Federated Learning with Multi-Key Homomorphic Encryption
Abdullah Al Omar, Xin Yang, Euijin Choo, Omid Ardakanian
-
GloSS over Toxicity: Understanding and Mitigating Toxicity in LLMs via Global Toxic Subspace
Zenghao Duan, Zhiyi Yin, Zhichao Shi, Liang Pang, Shaoling Jing, Jiayi Wu, Yu Yan, Huawei Shen, Xueqi Cheng
-
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Ye Wang, Gang Tan, Shagufta Mehnaz
-
GraphemeAug: A Systematic Approach to Synthesized Hard Negative Keyword Spotting Examples
Harry Zhang, Kurt Partridge, Pai Zhu, Neng Chen, Hyun Jin Park, Dhruuv Agarwal, Quan Wang
-
Dual Data Alignment Makes AI-Generated Image Detector Easier Generalizable
Ruoxin Chen, Junwei Xi, Zhiyuan Yan, Ke-Yue Zhang, Shuang Wu, Jingyi Xie, Xu Chen, Lei Xu, Isabel Guan, Taiping Yao, Shouhong Ding
-
Wenbin Hu, Haoran Li, Huihao Jing, Qi Hu, Ziqian Zeng, Sirui Han, Heli Xu, Tianshu Chu, Peizhao Hu, Yangqiu Song
-
Causes and Consequences of Representational Similarity in Machine Learning Models
Zeyu Michael Li, Hung Anh Vu, Damilola Awofisayo, Emily Wenger
-
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Chenxi Liu, Tianyi Xiong, Yanshuo Chen, Ruibo Chen, Yihan Wu, Junfeng Guo, Tianyi Zhou, Heng Huang
-
SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
Wonje Jeung, Sangyeon Yoon, Minsuk Kahng, Albert No
-
Bullying the Machine: How Personas Increase LLM Vulnerability
Ziwei Xu, Udit Sanghi, Mohan Kankanhalli
-
Language Models That Walk the Talk: A Framework for Formal Fairness Certificates
Danqing Chen, Tobias Ladner, Ahmed Rayen Mhadhbi, Matthias Althoff
-
Adversarial Testing in LLMs: Insights into Decision-Making Vulnerabilities
Lili Zhang, Haomiaomiao Wang, Long Cheng, Libao Deng, Tomas Ward
-
FLTG: Byzantine-Robust Federated Learning via Angle-Based Defense and Non-IID-Aware Weighting
Yanhua Wen, Lu Ai, Gang Liu, Chuang Li, Jianhao Wei
-
Does Low Rank Adaptation Lead to Lower Robustness against Training-Time Attacks?
Zi Liang, Haibo Hu, Qingqing Ye, Yaxin Xiao, Ronghua Li
-
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu, Chao Wang, Tianming Liu, Yanjie Zhao, Haoyu Wang
-
Yimao Guo, Zuomin Qu, Wei Lu, Xiangyang Luo
-
Evaluating the efficacy of LLM Safety Solutions: The Palit Benchmark Dataset
Sayon Palit, Daniel Woods
-
FlowPure: Continuous Normalizing Flows for Adversarial Purification
Elias Collaert, Abel Rodríguez, Sander Joos, Lieven Desmet, Vera Rimmer
-
Recommender Systems for Democracy: Toward Adversarial Robustness in Voting Advice Applications
Frédéric Berdoz, Dustin Brunner, Yann Vonlanthen, Roger Wattenhofer
-
Policy-Driven World Model Adaptation for Robust Offline Model-based Reinforcement Learning
Jiayu Chen, Aravind Venugopal, Jeff Schneider
-
Robust learning of halfspaces under log-concave marginals
Jane Lange, Arsen Vasilyan
-
A Few Large Shifts: Layer-Inconsistency Based Minimal Overhead Adversarial Example Detection
Sanggeon Yun, Ryozo Masukawa, Hyunwoo Oh, Nathaniel D. Bastian, Mohsen Imani
-
BeamClean: Language Aware Embedding Reconstruction
Kaan Kale, Kyle Mylonakis, Jay Roberts, Sidhartha Roy
-
SVAFD: A Secure and Verifiable Co-Aggregation Protocol for Federated Distillation
Tian Wen, Sheng Sun, Yuwei Wang, Peiyan Chen, Zhiyuan Wu, Min Liu, Bo Gao
-
Safety Alignment Can Be Not Superficial With Explicit Safety Signals
Jianwei Li, Jung-Eng Kim
-
Benchmarking Unified Face Attack Detection via Hierarchical Prompt Tuning
Ajian Liu, Haocheng Yuan, Xiao Guo, Hui Ma, Wanyi Zhuang, Changtao Miao, Yan Hong, Chuanbiao Song, Jun Lan, Qi Chu, Tao Gong, Yanyan Liang, Weiqiang Wang, Jun Wan, Xiaoming Liu, Zhen Lei
-
Jiaqi Tan, Xu Zheng, Yang Liu
-
CURE: Concept Unlearning via Orthogonal Representation Editing in Diffusion Models
Shristi Das Biswas, Arani Roy, Kaushik Roy
-
3D Visual Illusion Depth Estimation
Chengtang Yao, Zhidan Liu, Jiaxi Zeng, Lidong Yu, Yuwei Wu, Yunde Jia
-
Language Models Are Capable of Metacognitive Monitoring and Control of Their Internal Activations
Li Ji-An, Hua-Dong Xiong, Robert C. Wilson, Marcelo G. Mattar, Marcus K. Benna
-
Self-Destructive Language Model
Yuhui Wang, Rongyi Zhu, Ting Wang
-
PANORAMA: A synthetic PII-laced dataset for studying sensitive data memorization in LLMs
Sriram Selvam, Anneswa Ghosh
-
The Tower of Babel Revisited: Multilingual Jailbreak Prompts on Closed-Source Large Language Models
Linghan Huang, Haolin