Designing Human-in-the-Loop (HITL) Correctly: Approval Patterns for AI Agents
Human-in-the-Loop (HITL) refers to the deliberate insertion of human approvals into the action chain of an AI agent. Before the agent takes an irreversible, costly or legally relevant action, a human reviews and authorises it. HITL is the operational implementation of the human oversight required by EU AI Act Art. 14.
Key Takeaways
- HITL is not a blanket mechanism: approval belongs before irreversible, costly or legally relevant actions, not before every step.
- Three core patterns cover most cases: pre-action approval, checkpoint approval and escalation-on-anomaly.
- The biggest weakness is not the missing gate but its degradation through automation bias (OWASP ASI09) - the human rubber-stamps it.
- EU AI Act Art. 14 requires human oversight for high-risk AI; HITL is the technical implementation, complemented by audit logs.
- The UX must compel review: display reasoning, source provenance and confidence instead of just an Approve button.
- Every override and every approval belongs in a tamper-proof log (WORM logs, cryptographic signature).
Beyond its regulatory role, HITL is a central countermeasure against the OWASP agentic risk ASI09 (Human-Agent Trust Exploitation). This article covers:
- When HITL is necessary: before irreversible, costly or legally relevant actions - not before every step.
- Which patterns exist: pre-action approval, checkpoint approval and escalation-on-anomaly, usually combined.
- What goes wrong most often: the gate degenerates into rubber-stamping (automation bias) - UX and audit must actively prevent this.
Why naive autonomy is a security problem
Agentic systems plan, reason, select tools and act with minimal step-by-step human confirmation. It is precisely this autonomy that enlarges the attack surface. The OWASP entry ASI02 (Tool Misuse & Exploitation) cites auto-approve or "YOLO" modes, which disable confirmation dialogues, as a distinct attack vector. The recommendation in the research is unambiguous: disable auto-approve for any tool that touches databases, payments, communications or deployments, and replace it with Human-in-the-Loop gates.
HITL is therefore not a UX convenience but a security control. It limits the damage when an agent is induced - through goal hijack (ASI01), poisoned memory (ASI06) or a compromised supply chain (ASI04) - to act against the intentions of its operators.
When is human approval necessary?
Whether an action gets a gate should not be decided ad hoc; it should follow from a documented risk classification. Three criteria cover most cases:
- Irreversibility. Is the action impossible to undo, or reversible only with considerable effort? Examples: deletions, payouts, messages sent to customers, production setpoints.
- Cost. Does the action incur significant or hard-to-cap costs? Examples: mass API calls, orders, infrastructure provisioning.
- Legal relevance. Does the action create a legal consequence or affect personal data? Examples: contracts, credit or benefit decisions, suspicious activity reports, export of master data.
Conversely, read-only operations or easily reversible actions in a sandbox generally do not require a gate - otherwise "approval fatigue" sets in, and reviewers wave things through out of exhaustion.
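The three criteria can be turned into a small, testable classification function. The following is a minimal sketch, not a reference implementation: the `Action` fields, the `COST_THRESHOLD` value and the two-level `Risk` enum are assumptions made for illustration, and each organisation would substitute its own documented classification.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    HIGH = 2


@dataclass(frozen=True)
class Action:
    name: str
    reversible: bool        # can it be undone with little effort?
    estimated_cost: float   # expected cost in the organisation's currency
    legal_effect: bool      # creates a legal consequence or touches personal data


# Hypothetical threshold; each organisation sets its own.
COST_THRESHOLD = 1_000.0


def classify(action: Action) -> Risk:
    """Return HIGH if any of the three criteria applies, else LOW."""
    if not action.reversible:
        return Risk.HIGH
    if action.estimated_cost >= COST_THRESHOLD:
        return Risk.HIGH
    if action.legal_effect:
        return Risk.HIGH
    return Risk.LOW


def needs_gate(action: Action) -> bool:
    # Only HIGH-risk actions are gated, which keeps approval fatigue in check.
    return classify(action) is Risk.HIGH
```

Because any single criterion is enough to trigger the gate, a cheap but irreversible deletion is treated the same way as an expensive order.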
HITL patterns: pre-action, checkpoint, escalation
Three basic patterns cover most architectures and are combined in practice.
Pre-action approval. The agent halts immediately before a single critical action. The planned action, together with its arguments, goes into an approval queue; only after human consent is it executed. Suitable for individual, high-risk operations (triggering a payment, sending a contract).
Checkpoint approval. For multi-stage plans, a human reviews at defined milestones before the next segment starts. Suitable for longer workflows in which individual steps are harmless on their own but the cumulative path becomes critical - a response to the "boiling frog" pattern described by OWASP, in which each step is plausible but the overall trajectory is harmful.
Escalation-on-anomaly. The agent works autonomously and brings in a human only when thresholds or anomaly detectors trigger: unusual tool-call frequencies, destructive operations shortly after ingesting external content, rate-limit violations. Technically, this can be implemented via policy-as-code (according to the research, e.g. OPA or Cedar) as a tool-use approval gate.
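As an illustration of escalation-on-anomaly, the sketch below checks two of the signals named above: tool-call frequency and a destructive operation shortly after external content was ingested. It is plain Python rather than OPA or Cedar policy, and the `ToolCall` fields and all thresholds are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical thresholds; real values depend on the workload.
MAX_CALLS_PER_WINDOW = 20
WINDOW_SECONDS = 60.0
TAINT_WINDOW_SECONDS = 30.0


@dataclass(frozen=True)
class ToolCall:
    tool: str
    timestamp: float                 # seconds on any monotonic clock
    destructive: bool = False        # deletes or overwrites data
    ingests_external: bool = False   # reads untrusted external content


def should_escalate(history: List[ToolCall], call: ToolCall) -> Optional[str]:
    """Return a reason for bringing in a human, or None to proceed autonomously."""
    recent = [c for c in history if 0 <= call.timestamp - c.timestamp <= WINDOW_SECONDS]
    if len(recent) + 1 > MAX_CALLS_PER_WINDOW:
        return "tool-call frequency above threshold"
    tainted = any(
        c.ingests_external and 0 <= call.timestamp - c.timestamp <= TAINT_WINDOW_SECONDS
        for c in history
    )
    if call.destructive and tainted:
        return "destructive operation shortly after ingesting external content"
    return None
```

The function only decides whether to escalate; routing the escalation to a reviewer and pausing the agent would be handled by the surrounding orchestration.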
An effective HITL architecture tiers trust: higher-impact decisions require more reviewers or stronger evidence ("graduated trust"). The decisive point, according to OWASP, is that the gate demands independent verification of the evidence - not merely the agent's recommendation.
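Graduated trust can be sketched as a tier table that maps impact to the number of independent approvals required. The tiers, the `Approval` type and the `verified_evidence` flag below are assumptions for illustration; the point is that duplicate reviewers and approvals without independent evidence checks do not count.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tiers: (upper bound of amount, approvals required).
TIERS = [(1_000.0, 0), (50_000.0, 1), (float("inf"), 2)]


@dataclass(frozen=True)
class Approval:
    reviewer: str
    verified_evidence: bool   # reviewer checked the sources, not only the agent's summary


def required_approvals(amount: float) -> int:
    for upper, count in TIERS:
        if amount <= upper:
            return count
    raise ValueError(f"invalid amount: {amount}")   # e.g. NaN


def is_authorised(amount: float, approvals: List[Approval]) -> bool:
    # Count distinct reviewers who independently verified the evidence.
    independent = {a.reviewer for a in approvals if a.verified_evidence}
    return len(independent) >= required_approvals(amount)
```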
The action-risk-HITL matrix
The following matrix shows, by way of example, how actions can be classified. It is intended as a template; the specific thresholds must be adapted by each organisation to its risk appetite.
| Action | Risk | HITL required? | Recommended pattern |
|---|---|---|---|
| Read record / generate report | Low (reversible) | No | Audit log is sufficient |
| Create internal note / draft | Low | No | Audit log is sufficient |
| Send email to external customer | Medium (reputation/data risk) | Yes | Pre-action approval |
| Change database entry (write) | Medium-high | Yes | Pre-action or checkpoint |
| Trigger payment / order | High (cost, irreversible) | Yes, four-eyes if applicable | Pre-action, tiered by amount |
| Permanently delete a record | High (irreversible) | Yes | Pre-action approval |
| Issue credit/benefit decision | High (legal consequence) | Yes | Pre-action + independent review |
| Submit suspicious activity report (SAR) | High (legal) | Yes | Pre-action + four-eyes |
| Production setpoint / OT write | Critical (safety) | Yes | Pre-action + plausibility check |
| Mass API calls above threshold | Variable (cost) | Conditional | Escalation-on-anomaly + cost cap |
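One way to make such a matrix enforceable is to keep it as data that the agent runtime consults before every tool call. The sketch below encodes a few rows; the action identifiers and the `Pattern` enum are hypothetical, and the choice to send unknown actions to the strictest gate is an assumption of this example rather than something the matrix prescribes.

```python
from enum import Enum


class Pattern(Enum):
    AUDIT_ONLY = "audit log only"
    PRE_ACTION = "pre-action approval"
    PRE_ACTION_FOUR_EYES = "pre-action approval, four-eyes"
    ESCALATE_ON_ANOMALY = "escalation-on-anomaly with cost cap"


# Hypothetical action identifiers mapped to rows of the matrix.
POLICY = {
    "read_record": Pattern.AUDIT_ONLY,
    "create_draft": Pattern.AUDIT_ONLY,
    "send_external_email": Pattern.PRE_ACTION,
    "delete_record": Pattern.PRE_ACTION,
    "submit_sar": Pattern.PRE_ACTION_FOUR_EYES,
    "mass_api_calls": Pattern.ESCALATE_ON_ANOMALY,
}


def pattern_for(action_name: str) -> Pattern:
    # Unknown actions fall back to the strictest gate (assumption of this sketch).
    return POLICY.get(action_name, Pattern.PRE_ACTION_FOUR_EYES)


def requires_human(action_name: str) -> bool:
    """True if a human must approve before the action runs at all."""
    return pattern_for(action_name) in {Pattern.PRE_ACTION, Pattern.PRE_ACTION_FOUR_EYES}
```

Keeping the table in version-controlled configuration also gives auditors a single place to see which actions are gated and when that changed.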
The most common failure: the gate degenerates
The greatest risk is not the missing gate but its gradual erosion. OWASP lists this as a standalone Top 10 risk: ASI09 - Human-Agent Trust Exploitation. Agents produce polished, authoritative-sounding output; people tend to adopt it. The oversight layer, intended as a security control, thus becomes a vulnerability in its own right.
The relevant mechanisms, according to the research:
- Automation bias - output is accepted without critical review.
- Authority deference - the confident tone discourages questioning.
- Gradual trust escalation - the agent builds credibility with correct outputs before manipulated ones follow.
- Cognitive overload - volume and complexity exceed human review capacity.
- "Polished hallucination" - confident, incorrect and convincing output.
Degradation can be measured through two detection signals: reviewers' decision times fall while task complexity remains constant, and approval decisions correlate almost perfectly with the agent's recommendation. Both indicate rubber-stamping.
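Both signals can be computed from review records that the approval queue already produces. The following sketch assumes a simple `Review` record and two hypothetical alert thresholds; it also assumes task complexity is roughly constant across the sample, which a real monitor would need to check separately.

```python
from dataclasses import dataclass
from statistics import mean
from typing import List


@dataclass(frozen=True)
class Review:
    agent_recommended_approve: bool
    human_approved: bool
    decision_seconds: float


# Hypothetical alert thresholds.
MAX_AGREEMENT_RATE = 0.98
MAX_SPEEDUP_RATIO = 0.5   # later reviews taking under half the earlier time


def agreement_rate(reviews: List[Review]) -> float:
    agree = sum(r.agent_recommended_approve == r.human_approved for r in reviews)
    return agree / len(reviews)


def speedup_ratio(reviews: List[Review]) -> float:
    """Mean decision time of the later half divided by that of the earlier half."""
    half = len(reviews) // 2
    earlier = mean(r.decision_seconds for r in reviews[:half])
    later = mean(r.decision_seconds for r in reviews[half:])
    return later / earlier if earlier > 0 else 1.0


def degradation_signals(reviews: List[Review]) -> List[str]:
    """Return the names of the rubber-stamping signals present in the sample."""
    if len(reviews) < 2:
        return []
    signals = []
    if agreement_rate(reviews) > MAX_AGREEMENT_RATE:
        signals.append("high agreement with agent")
    if speedup_ratio(reviews) < MAX_SPEEDUP_RATIO:
        signals.append("decision times falling")
    return signals
```

Neither signal proves rubber-stamping on its own; a reviewer may legitimately agree with a well-behaved agent. The signals are a trigger for a closer look, for example a fire drill.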
UX design that compels review
HITL stands or falls with the interface. A bare "Approve" button invites rubber-stamping. The research recommends force-engagement patterns:
- Make reasoning visible. Display the rationale, source provenance and confidence intervals; do not hide them. According to OWASP, a UI that hides the reasoning trace is itself an ASI09 warning sign.
- Adversarial output highlighting. Notices along the lines of "this output may have been influenced by external content" where content originates from untrusted sources.
- Time-delay enforcement. For high-risk approvals, enforce a deliberate delay to prevent reflexive confirmation.
- Automation-bias training of reviewers and periodic test injections ("fire drills") that check whether the gate still catches anything at all.
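Two of these patterns, visible reasoning and time-delay enforcement, can be enforced on the server side so that a client cannot skip them. The sketch below is one possible shape: `ReviewSession`, its fields and the 15-second minimum are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical minimum review time for high-risk approvals.
MIN_REVIEW_SECONDS = 15.0


@dataclass
class ReviewSession:
    action_id: str
    opened_at: float = field(default_factory=time.monotonic)
    reasoning_viewed: bool = False

    def mark_reasoning_viewed(self) -> None:
        # Called when the reviewer has opened the rationale and provenance panel.
        self.reasoning_viewed = True

    def can_approve(self, now: Optional[float] = None) -> bool:
        """Approve is only allowed after the reasoning was viewed and the delay elapsed."""
        now = time.monotonic() if now is None else now
        return self.reasoning_viewed and now - self.opened_at >= MIN_REVIEW_SECONDS
```

The `now` parameter exists so the rule can be tested without waiting; in production the default monotonic clock would be used.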
A concrete example: the procurement cascade
The research documents a case that makes HITL failure tangible. Over the course of three weeks, a procurement agent was gradually convinced that its approval limit was USD 500,000. The attacker then placed fraudulent orders worth USD 5 million across 10 transactions. The human approver trusted the agent's increasingly confident claim to a higher authorisation limit - a textbook case of ASI09 combined with cascading failures (ASI08).
In pseudocode, an effective pre-action gate could be sketched as follows:
```text
on agent_action(action):
    risk = classify(action)                        # reversibility, cost, legal relevance
    if risk in {HIGH, CRITICAL}:
        evidence = gather_provenance(action)       # sources, confidence, reasoning
        approver = route_by_amount(action)         # above limit -> four-eyes / escalation
        decision = require_human(approver, evidence,
                                 enforce_delay=True,
                                 show_reasoning=True)
        audit_log.write_worm(action, evidence, decision)   # signed
        if not decision.approved: abort()
    execute(action)
```
Important: the limit that route_by_amount checks must live outside the agent's memory and be protected against manipulation. Otherwise the agent can be talked into rewriting its own limit, as in the example.
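A runnable variant of this idea might look as follows. It is a sketch under stated assumptions: `APPROVAL_LIMITS`, `Action`, `gated_execute` and the callback signatures are hypothetical names, and the limit is modelled as a read-only mapping standing in for operator-controlled configuration. The agent's own belief about its limit (`claimed_limit`) is carried along only to show that the gate ignores it.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Optional

# Hypothetical limit, loaded from operator-controlled configuration,
# never from agent memory or from the action's own arguments.
APPROVAL_LIMITS = MappingProxyType({"single_approver_max": 50_000.0})


@dataclass(frozen=True)
class Action:
    tool: str
    amount: float
    claimed_limit: Optional[float] = None   # what the agent believes; ignored by the gate


class Rejected(Exception):
    """Raised when the required approvals were not obtained."""


def approvers_needed(action: Action) -> int:
    # Above the configured limit, four-eyes approval is required.
    return 1 if action.amount <= APPROVAL_LIMITS["single_approver_max"] else 2


def gated_execute(action, ask_human, execute, audit):
    needed = approvers_needed(action)
    approvals = ask_human(action, needed)   # expected to return a set of approver names
    approved = len(approvals) >= needed
    audit.append((action, sorted(approvals), approved))   # logged before any execution
    if not approved:
        raise Rejected(action.tool)
    return execute(action)
```

In the procurement scenario, an order of USD 500,000 would need two approvers regardless of what the agent had been convinced of, because `approvers_needed` never reads `claimed_limit`.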
Audit: no oversight without a log
Human oversight is only robust if it is demonstrable. The research names as the forensic minimum per agent action: the full prompt (user, system, injected context), the model version and configuration hash, the tool-call sequence with arguments, retrieval queries and document IDs, the output and decision rationale, human-override events, memory read/write operations as well as cost and latency.
As a pattern, the research recommends WORM logs (write-once-read-many), a cryptographic signature for tamper detection and sector-appropriate retention (banking and insurance, according to the research, 10 years each). These logs are at the same time the evidence with which one can demonstrate to auditors that HITL not only exists but works.
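Tamper detection can be illustrated with a hash-chained log in which every entry is authenticated together with its predecessor's signature. The sketch below uses an HMAC from the Python standard library as a stand-in for a cryptographic signature; the key handling, entry layout and function names are assumptions, and the write-once property itself would have to come from the storage layer, which this code does not provide.

```python
import hashlib
import hmac
import json
from typing import Dict, List

# Hypothetical key for the example; in practice it would live in a KMS or HSM.
SIGNING_KEY = b"demo-key-do-not-use"
GENESIS = "0" * 64


def _sign(prev_sig: str, payload: str) -> str:
    # Chaining: each signature covers the previous signature plus the new payload.
    message = (prev_sig + payload).encode()
    return hmac.new(SIGNING_KEY, message, hashlib.sha256).hexdigest()


def append_entry(log: List[Dict[str, str]], event: dict) -> None:
    prev_sig = log[-1]["sig"] if log else GENESIS
    payload = json.dumps(event, sort_keys=True)
    log.append({"payload": payload, "sig": _sign(prev_sig, payload)})


def verify(log: List[Dict[str, str]]) -> bool:
    """Return False if any entry was altered, reordered or removed from the middle."""
    prev_sig = GENESIS
    for entry in log:
        expected = _sign(prev_sig, entry["payload"])
        if not hmac.compare_digest(expected, entry["sig"]):
            return False
        prev_sig = entry["sig"]
    return True
```

A chain like this does not detect truncation of the most recent entries on its own; periodically anchoring the latest signature in a separate system is one common way to close that gap.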
Relation to regulation (not legal advice)
EU AI Act Art. 14 requires effective human oversight for high-risk AI systems. According to the underlying research, Art. 14 covers the OWASP risk ASI09 well; the specific UI and automation-bias controls, however, remain expressly the responsibility of the operator (deployer). In parallel, the research signals that the BSI is developing dedicated security criteria for AI agents. Deployer obligations are already emerging from that work, relating among other things to zero trust, sandboxing, identity management, transparent decision logging and HITL for critical actions.
Note: this article is a technical and subject-matter assessment and is no substitute for legal advice. Whether your specific system qualifies as high-risk AI within the meaning of the EU AI Act, and which obligations follow from that, should be clarified with qualified legal counsel.
For agencies and B2B decision-makers
Anyone building agents for clients or deploying them in their own operations should treat HITL not as a downstream feature but as an architectural decision. Concretely: an action-risk matrix as part of every agent design, graduated approvals instead of blanket auto-approve, force-engagement UI against automation bias and tamper-proof audit logs from the outset. Blck Alpaca from Vienna supports DACH companies in designing agentic workflows so that autonomy and human oversight are held in the right balance - secure, demonstrable and compatible with the EU AI Act and the OWASP Agentic Top 10.