
Contamination Prevention in Agentic Workflow Assessments

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Large Language Models (LLMs) may infer that they are being evaluated and strategically modify their responses to appear more aligned than they actually are, a phenomenon described as alignment faking or evaluation awareness. This presents a fundamental challenge for bias evaluation in AI systems, particularly in multi-agent workflows, where sequential decision-making steps occur with minimal human supervision. When LLMs infer that they are being evaluated, they may suppress or diminish discriminatory behaviours that could otherwise manifest in production AI agent workflow environments, making traditional bias assessment techniques less effective.

This work presents a contamination prevention architecture developed in response to assessment awareness observed during initial experimentation. Direct assessments via Model Context Protocol (MCP) revealed that LLMs were inferring bias evaluation intent from task framing, which required the implementation of a dual presentation system that separates evaluation context from agent workflow activities.

The architecture additionally addresses data contamination risks by isolating evaluation scenarios from public exposure. This prevents assessment data from being incorporated into future training datasets, which would compromise assessment integrity, a critical consideration given documented contamination in existing AI benchmarking frameworks.
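The separation between evaluation context and agent workflow activities described in the abstract could be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Scenario` fields, the `agent_view` and `evaluator_view` helpers, and the `EVAL_TERMS` list are all hypothetical names introduced here, and the abstract does not specify how the dual presentation system is actually built.

```python
from dataclasses import dataclass

# Hypothetical terms that might reveal evaluation intent through task framing.
EVAL_TERMS = ("bias", "fairness", "evaluation", "assessment", "benchmark")


@dataclass(frozen=True)
class Scenario:
    scenario_id: str       # evaluator-only: key used to join outputs to metadata
    task_text: str         # the only field the agent workflow receives
    probe_attribute: str   # evaluator-only: attribute under study (assumed field)
    variant: str           # evaluator-only: which scenario variant this is


def agent_view(scenario: Scenario) -> dict:
    """Build the agent-facing presentation: task text only, no evaluation metadata."""
    return {"task": scenario.task_text}


def evaluator_view(scenario: Scenario, agent_output: str) -> dict:
    """Build the evaluator-facing record: metadata joined with the agent's output."""
    return {
        "scenario_id": scenario.scenario_id,
        "probe_attribute": scenario.probe_attribute,
        "variant": scenario.variant,
        "output": agent_output,
    }


def leaks_intent(text: str) -> bool:
    """Crude check for framing words that could signal an evaluation to the agent."""
    lowered = text.lower()
    return any(term in lowered for term in EVAL_TERMS)
```

In this sketch, a scenario would be checked with `leaks_intent(agent_view(s)["task"])` before being handed to the workflow, while the evaluator-side record stays in storage the agent never reads. A keyword filter is only a stand-in here; task framing can signal evaluation intent in ways that no fixed word list would catch.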

Original language: English
Title of host publication: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice
Publisher: Association for Computing Machinery (ACM)
Pages: 121
Number of pages: 1
ISBN (Electronic): 9798400721533
DOIs
Publication status: Published - 16 Feb 2026
Event: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026 - Kildare, Ireland
Duration: 21 Jan 2026 to 22 Jan 2026

Publication series

Name: HCAI-ep 2026 - Proceedings of the 2026 Conference on Human Centered Artificial Intelligence - Education and Practice

Conference

Conference: 3rd International Conference on Human-Centred AI - Education and Practice, HCAI-ep 2026
Country/Territory: Ireland
City: Kildare
Period: 21/01/26 to 22/01/26

Keywords

  • Agentic Artificial Intelligence
  • Alignment Faking
  • Bias Assessment
  • Ethical AI
  • Human-Centred AI
