ControlNET

A Firewall for RAG-based LLMs

Hongwei Yao, Haoran Shi, Yidou Chen, Yixin Jiang, Cong Wang, Zhan Qin

Zhejiang University
City University of Hong Kong
China Southern Power Grid


Abstract

Retrieval-Augmented Generation (RAG) has significantly improved the factual accuracy and domain adaptability of Large Language Models (LLMs), enabling their widespread deployment in sensitive domains such as healthcare, finance, and enterprise applications. RAG mitigates hallucinations by integrating external knowledge, but it also introduces privacy and security risks, notably data breaches and data poisoning. While recent studies have explored prompt injection and poisoning attacks, there remains a significant gap in research on controlling inbound and outbound query flows to mitigate these threats. In this paper, we propose ControlNET, an AI firewall designed to protect RAG-based LLM systems from these vulnerabilities. ControlNET controls query flows by leveraging the activation shift phenomenon to detect malicious queries, and it mitigates their impact through semantic divergence. We conduct comprehensive experiments using state-of-the-art open-source LLMs (Llama3, Vicuna, and Mistral) across four benchmark datasets (MS MARCO, HotpotQA, FinQA, and MedicalSys). Our empirical results demonstrate that ControlNET is both effective and harmless: it achieves an AUROC exceeding 0.909 for risk detection, and risk mitigation reduces Precision and Recall by less than 0.03 and 0.09, respectively. Overall, ControlNET offers an effective, harmless, and robust defense mechanism, marking a significant advance toward the secure deployment of RAG-based LLM systems.

What's ControlNET?

ControlNET is a novel AI firewall designed for RAG-based LLM systems that detects and mitigates privacy and security threats by analyzing neuron activation shifts. It identifies malicious queries through distinct activation vector patterns and redirects model behavior to prevent harmful responses, enabling secure and privacy-preserving interactions.
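
To make the idea of an activation shift concrete, the following is a minimal sketch and not the ControlNET implementation. It assumes that each query has already been mapped to a hidden-state activation vector, and it uses a simple stand-in rule: flag a query when its vector lies far (in cosine distance) from the centroid of known benign activations. The toy 3-dimensional vectors, the function names, and the threshold value are all hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a list of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, larger means more shifted."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def is_flagged(activation, benign_centroid, threshold):
    """Flag a query whose activation is shifted away from the benign centroid."""
    return cosine_distance(activation, benign_centroid) > threshold

# Toy activation vectors (assumption: real ones would come from an LLM layer).
benign_activations = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1], [1.1, 0.0, 0.1]]
centroid = mean_vector(benign_activations)

THRESHOLD = 0.2  # hypothetical value; in practice it would be tuned on held-out data

benign_query = [1.0, 0.1, 0.1]
suspicious_query = [0.1, 1.0, 0.9]

print(is_flagged(benign_query, centroid, THRESHOLD))
print(is_flagged(suspicious_query, centroid, THRESHOLD))
```

A centroid-and-threshold rule is only one of many ways to separate two clusters of vectors; it is used here because it is easy to inspect, not because the paper prescribes it.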

/figures/rag_pipeline.jpg

Experiment Results

Across Llama3, Vicuna, and Mistral on four benchmark datasets (MS MARCO, HotpotQA, FinQA, and MedicalSys), ControlNET achieves an AUROC exceeding 0.909 for risk detection. Risk mitigation is close to harmless: Precision and Recall drop by less than 0.03 and 0.09, respectively.
/figures/request_examples.jpg
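
For readers unfamiliar with the AUROC metric reported above, here is a small, self-contained sketch of how it can be computed from detector scores. It uses the standard pairwise definition: the probability that a randomly chosen malicious query receives a higher risk score than a randomly chosen benign one, with ties counted as one half. The labels and scores below are made-up toy values, not results from the paper.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positive and negative scores.

    labels: 1 for malicious, 0 for benign.
    scores: higher means the detector considers the query riskier.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # malicious query ranked above benign query
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(positives) * len(negatives))

# Toy example: three malicious and three benign queries.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]

print(auroc(toy_labels, toy_scores))  # 8 of 9 pairs are ranked correctly
```

The quadratic loop is fine for illustration; for large evaluation sets a rank-based formulation or a library routine would be the usual choice.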

t-SNE visualizations of hidden-state activations for unauthorized access and conversation hijacking. Benign and malicious queries form distinct clusters, illustrating the activation shift phenomenon.

/figures/hidden_state.jpg

Future Work

Future work will focus on extending ControlNET to LLM agentic networks. These environments introduce new security challenges, such as multi-agent interactions, which current models do not fully address.

/figures/firewall_pipeline.jpg

Citation

@article{yao2025control,
  title={ControlNET: A Firewall for RAG-based LLM System},
  author={Hongwei Yao and Haoran Shi and Yidou Chen and Yixin Jiang and Cong Wang and Zhan Qin},
  journal={arXiv preprint arXiv:2504.09593},
  year={2025}
}
