TY - GEN
T1 - Contamination Prevention in Agentic Workflow Assessments
AU - Condon, Gary
AU - Jilani, Musfira
N1 - Publisher Copyright:
© 2026 Copyright held by the owner/author(s).
PY - 2026/2/16
Y1 - 2026/2/16
N2 - Large Language Models (LLMs) may infer that they are being evaluated and strategically modify their responses to appear more aligned than they actually are, a phenomenon described as alignment faking or evaluation awareness. This presents a fundamental challenge for bias evaluation in AI systems, particularly in multi-agent workflows, where sequential decision-making steps occur with minimal human supervision. When LLMs infer that they are being evaluated, they may suppress or diminish discriminatory behaviours that could otherwise manifest in production AI agent workflow environments, making traditional bias assessment techniques less effective. This work presents a contamination prevention architecture developed in response to assessment awareness observed during initial experimentation. Direct assessments via Model Context Protocol (MCP) revealed that LLMs were inferring bias evaluation intent from task framing, which required the implementation of a dual presentation system that separates evaluation context from agent workflow activities. The architecture additionally addresses data contamination risks by isolating evaluation scenarios from public exposure, which prevents assessment data from being incorporated into future training datasets and compromising assessment integrity. This is a critical consideration given documented contamination in existing AI benchmarking frameworks.
AB - Large Language Models (LLMs) may infer that they are being evaluated and strategically modify their responses to appear more aligned than they actually are, a phenomenon described as alignment faking or evaluation awareness. This presents a fundamental challenge for bias evaluation in AI systems, particularly in multi-agent workflows, where sequential decision-making steps occur with minimal human supervision. When LLMs infer that they are being evaluated, they may suppress or diminish discriminatory behaviours that could otherwise manifest in production AI agent workflow environments, making traditional bias assessment techniques less effective. This work presents a contamination prevention architecture developed in response to assessment awareness observed during initial experimentation. Direct assessments via Model Context Protocol (MCP) revealed that LLMs were inferring bias evaluation intent from task framing, which required the implementation of a dual presentation system that separates evaluation context from agent workflow activities. The architecture additionally addresses data contamination risks by isolating evaluation scenarios from public exposure, which prevents assessment data from being incorporated into future training datasets and compromising assessment integrity. This is a critical consideration given documented contamination in existing AI benchmarking frameworks.
KW - Agentic Artificial Intelligence
KW - Alignment Faking
KW - Bias Assessment
KW - Ethical AI
KW - Human-Centred AI
UR - https://www.scopus.com/pages/publications/105031774381
U2 - 10.1145/3777490.3777513
DO - 10.1145/3777490.3777513
M3 - Conference contribution
AN - SCOPUS:105031774381
T3 - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
SP - 121
BT - HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
PB - Association for Computing Machinery (ACM)
T2 - 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
Y2 - 21 January 2026 through 22 January 2026
ER -