Skip to content
16.7 · Intermediate · 8 min

Designing Human-in-the-Loop (HITL) Correctly: Approval Patterns for AI Agents

Blck Alpaca
Definition

Human-in-the-Loop (HITL) refers to the deliberate insertion of human approvals into the action chain of an AI agent. Before the agent carries out an irreversible, costly or legally relevant action, a human reviews and authorises it. HITL is the operational implementation of the human oversight required by EU AI Act Art. 14.

Key Takeaways

  • HITL is not a blanket mechanism: approval belongs before irreversible, costly or legally relevant actions, not before every step.
  • Three core patterns cover most cases: pre-action approval, checkpoint approval and escalation-on-anomaly.
  • The biggest weakness is not a missing gate but its degradation through automation bias (OWASP ASI09): the human ends up rubber-stamping the agent's proposals.
  • EU AI Act Art. 14 requires human oversight for high-risk AI; HITL is the technical implementation, complemented by audit logs.
  • The UX must compel review: display reasoning, source provenance and confidence instead of just an Approve button.
  • Every override and every approval belongs in a tamper-proof log (WORM logs, cryptographic signature).

In an agent architecture, HITL does double duty: it is the operational implementation of the human oversight required by EU AI Act Art. 14 and, at the same time, a central countermeasure against the OWASP agentic risk ASI09 (Human-Agent Trust Exploitation).

  • When HITL is necessary: before irreversible, costly or legally relevant actions - not before every step.
  • Which patterns exist: pre-action approval, checkpoint approval and escalation-on-anomaly, usually combined.
  • What goes wrong most often: the gate degenerates into rubber-stamping (automation bias) - UX and audit must actively prevent this.

Why naive autonomy is a security problem

Agentic systems plan, reason, select tools and act with minimal step-by-step human confirmation. It is precisely this autonomy that enlarges the attack surface. The OWASP entry ASI02 (Tool Misuse & Exploitation) cites auto-approve or "YOLO" modes, which disable confirmation dialogues, as a distinct attack vector. The recommendation in the research is unambiguous: disable auto-approve for any tool that touches databases, payments, communications or deployments, and replace it with Human-in-the-Loop gates.
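
A minimal sketch of that recommendation, assuming a hypothetical tool registry in which every tool is tagged with a category. `ToolCategory`, `GATED_CATEGORIES` and `may_auto_approve` are illustrative names, not the API of any particular agent framework; the point is that a global auto-approve switch cannot reach the sensitive categories.

```python
from enum import Enum


class ToolCategory(Enum):
    READ_ONLY = "read_only"
    DATABASE = "database"
    PAYMENTS = "payments"
    COMMUNICATIONS = "communications"
    DEPLOYMENTS = "deployments"


# Tool categories for which auto-approve ("YOLO mode") is never honoured.
GATED_CATEGORIES = frozenset({
    ToolCategory.DATABASE,
    ToolCategory.PAYMENTS,
    ToolCategory.COMMUNICATIONS,
    ToolCategory.DEPLOYMENTS,
})


def may_auto_approve(category: ToolCategory, auto_approve_enabled: bool) -> bool:
    """Return True only if a tool call may run without a human gate.

    A global auto-approve switch cannot override the gated categories.
    """
    if category in GATED_CATEGORIES:
        return False
    return auto_approve_enabled


print(may_auto_approve(ToolCategory.PAYMENTS, auto_approve_enabled=True))   # False
print(may_auto_approve(ToolCategory.READ_ONLY, auto_approve_enabled=True))  # True
```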

HITL is therefore not a UX convenience but a security control. It limits the damage when an agent is induced - through goal hijack (ASI01), poisoned memory (ASI06) or a compromised supply chain (ASI04) - to act against the intentions of its operators.

When is human approval necessary?

The decision whether to place a gate should not be made ad hoc; it should follow from a documented risk classification. Three criteria cover most cases:

  • Irreversibility. Is the action impossible to undo, or reversible only with considerable effort? Examples: deletions, payouts, messages sent to customers, production setpoints.
  • Cost. Does the action incur significant or hard-to-cap costs? Examples: mass API calls, orders, infrastructure provisioning.
  • Legal relevance. Does the action create a legal consequence or affect personal data? Examples: contracts, credit or benefit decisions, suspicious activity reports, exports of master data.

Conversely, read-only operations or easily reversible actions in a sandbox generally do not require a gate - otherwise "approval fatigue" sets in, and reviewers wave things through out of exhaustion.
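
The three criteria can be encoded directly, so that the gate decision comes from the documented classification rather than from an ad-hoc call. This is a sketch under the assumption that each action has a hand-maintained profile; `ActionProfile` and `requires_gate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ActionProfile:
    """Hypothetical risk profile that the operator documents for each agent action."""
    name: str
    irreversible: bool      # cannot be undone, or only with considerable effort
    significant_cost: bool  # significant or hard-to-cap costs
    legally_relevant: bool  # creates a legal consequence or affects personal data


def requires_gate(profile: ActionProfile) -> bool:
    """Any single criterion is enough to require human approval."""
    return profile.irreversible or profile.significant_cost or profile.legally_relevant


actions = [
    ActionProfile("read_record", irreversible=False, significant_cost=False, legally_relevant=False),
    ActionProfile("trigger_payment", irreversible=True, significant_cost=True, legally_relevant=False),
    ActionProfile("export_master_data", irreversible=False, significant_cost=False, legally_relevant=True),
]
for action in actions:
    print(action.name, requires_gate(action))
# read_record False
# trigger_payment True
# export_master_data True
```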

HITL patterns: pre-action, checkpoint, escalation

Three basic patterns cover most architectures and are combined in practice.

Pre-action approval. The agent halts immediately before a single critical action. The planned action, together with its arguments, goes into an approval queue; only after human consent is it executed. Suitable for individual, high-risk operations (triggering a payment, sending a contract).
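
A minimal in-memory sketch of a pre-action approval queue, assuming a single process and a caller-supplied `runner` that actually performs the tool call. `ApprovalQueue` and its methods are hypothetical; a production queue would be persistent and authenticated. The sketch shows two properties: arguments are copied at submission, and an approval can be consumed only once.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class PendingAction:
    action_id: int
    tool: str
    args: dict[str, Any]
    status: str = "pending"  # pending -> approved | rejected; approved -> executed


class ApprovalQueue:
    """Critical actions wait here; nothing runs without an explicit human decision."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._items: dict[int, PendingAction] = {}

    def submit(self, tool: str, args: dict[str, Any]) -> int:
        # Take a (shallow) copy so later changes by the agent do not alter what was approved.
        item = PendingAction(next(self._ids), tool, dict(args))
        self._items[item.action_id] = item
        return item.action_id

    def decide(self, action_id: int, approved: bool) -> None:
        item = self._items[action_id]
        if item.status != "pending":
            raise ValueError(f"action {action_id} is already {item.status}")
        item.status = "approved" if approved else "rejected"

    def execute(self, action_id: int, runner: Callable[[str, dict[str, Any]], Any]) -> Any:
        item = self._items[action_id]
        if item.status != "approved":
            raise PermissionError(f"action {action_id} is {item.status}, not approved")
        item.status = "executed"  # an approval can be used exactly once
        return runner(item.tool, item.args)


queue = ApprovalQueue()
action_id = queue.submit("send_contract", {"contract_id": "C-17"})
queue.decide(action_id, approved=True)
print(queue.execute(action_id, lambda tool, args: f"{tool} executed for {args['contract_id']}"))
# send_contract executed for C-17
```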

Checkpoint approval. For multi-stage plans, a human reviews at defined milestones before the next segment starts. Suitable for longer workflows in which individual steps are harmless on their own but the cumulative path becomes critical - a response to the "boiling frog" pattern described by OWASP, in which each step is plausible but the overall trajectory is harmful.
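
A sketch of checkpoint approval, assuming the plan has already been split into segments of callable steps. The `review` callback stands in for the human at each milestone (simulated here by a lambda); the names are hypothetical. What matters for the "boiling frog" pattern is that the reviewer receives the whole history, not just the last step.

```python
from typing import Callable, Sequence

Step = Callable[[], str]
# A reviewer sees the segment index and the full history so far.
Reviewer = Callable[[int, list[str]], bool]


def run_with_checkpoints(segments: Sequence[Sequence[Step]], review: Reviewer) -> list[str]:
    """Run a multi-stage plan segment by segment, pausing at each milestone.

    Before every segment after the first, the reviewer judges the cumulative
    trajectory, not just the latest step, and can stop the plan.
    """
    history: list[str] = []
    for index, segment in enumerate(segments):
        if index > 0 and not review(index, list(history)):
            history.append(f"halted before segment {index}")
            break
        for step in segment:
            history.append(step())
    return history


plan = [
    [lambda: "read supplier list"],
    [lambda: "draft order for supplier A"],
    [lambda: "submit order for supplier A"],
]
# Hypothetical reviewer policy: stop once the plan has produced an order draft.
print(run_with_checkpoints(plan, lambda index, history: not any("draft order" in h for h in history)))
# ['read supplier list', 'draft order for supplier A', 'halted before segment 2']
```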

Escalation-on-anomaly. The agent works autonomously and brings in a human only when thresholds or anomaly detectors trigger: unusual tool-call frequencies, destructive operations shortly after ingesting external content, rate-limit violations. According to the research, this can be implemented as a tool-use approval gate using policy-as-code, e.g. OPA or Cedar.
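
A plain-Python sketch of an escalation-on-anomaly gate covering two of the triggers named above: tool-call frequency in a sliding window, and destructive operations shortly after external content was ingested. It is not OPA or Cedar policy syntax, only an illustration of the logic such a policy could express; the thresholds, the tool names and the `AnomalyGate` class are assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List, Optional

# Hypothetical tool names treated as destructive.
DESTRUCTIVE_TOOLS = frozenset({"delete_record", "drop_table", "transfer_funds"})


@dataclass
class AnomalyGate:
    """Plain-Python stand-in for policy-as-code rules (this is not OPA or Cedar syntax)."""

    max_calls_per_window: int = 20
    window_seconds: float = 60.0
    taint_seconds: float = 300.0  # how long ingested external content counts as "recent"
    _calls: Deque[float] = field(default_factory=deque, init=False, repr=False)
    _last_external_ingest: Optional[float] = field(default=None, init=False, repr=False)

    def record_external_ingest(self, now: float) -> None:
        self._last_external_ingest = now

    def check(self, tool: str, now: float) -> List[str]:
        """Return escalation reasons; an empty list means the agent may proceed autonomously."""
        reasons = []
        self._calls.append(now)
        # Drop calls that fall outside the sliding window.
        while self._calls and now - self._calls[0] > self.window_seconds:
            self._calls.popleft()
        if len(self._calls) > self.max_calls_per_window:
            reasons.append("unusual tool-call frequency")
        recently_tainted = (
            self._last_external_ingest is not None
            and now - self._last_external_ingest <= self.taint_seconds
        )
        if tool in DESTRUCTIVE_TOOLS and recently_tainted:
            reasons.append("destructive operation shortly after external content")
        return reasons


gate = AnomalyGate()
gate.record_external_ingest(now=100.0)  # e.g. the agent has just read an inbound e-mail
print(gate.check("delete_record", now=150.0))
# ['destructive operation shortly after external content']
```

A non-empty result would route the pending action to a human instead of executing it.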

An effective HITL architecture tiers trust: higher-impact decisions require more reviewers or stronger evidence ("graduated trust"). The decisive point, according to OWASP, is that the gate demands independent verification of the evidence - not merely the agent's recommendation.
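
Graduated trust can be expressed as a small rule: the required number of distinct reviewers grows with the impact tier, and an approval only counts if the reviewer verified the evidence. The tier names, reviewer counts and the `verified_evidence` flag are hypothetical; how that flag is established (for example, by the review UI) is left open here.

```python
from dataclasses import dataclass
from typing import Iterable

# Hypothetical tiers: higher impact requires more independent reviewers.
REVIEWERS_REQUIRED = {"medium": 1, "high": 2}


@dataclass(frozen=True)
class Approval:
    reviewer: str
    verified_evidence: bool  # reviewer checked the sources, not just the agent's recommendation


def gate_passes(tier: str, approvals: Iterable[Approval], agent_id: str) -> bool:
    """Count distinct human reviewers who verified evidence; unknown tiers raise KeyError."""
    needed = REVIEWERS_REQUIRED[tier]
    valid_reviewers = {
        a.reviewer
        for a in approvals
        if a.verified_evidence and a.reviewer != agent_id  # no self-approval by the agent
    }
    return len(valid_reviewers) >= needed


print(gate_passes("high", [Approval("alice", True), Approval("alice", True)], agent_id="agent-7"))  # False
print(gate_passes("high", [Approval("alice", True), Approval("bob", True)], agent_id="agent-7"))    # True
```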

The action-risk-HITL matrix

The following matrix shows, by way of example, how actions can be classified. It is intended as a template; the specific thresholds must be adapted by each organisation to its risk appetite.

| Action | Risk | HITL required? | Recommended pattern |
|---|---|---|---|
| Read record / generate report | Low (reversible) | No | Audit log is sufficient |
| Create internal note / draft | Low | No | Audit log is sufficient |
| Send email to external customer | Medium (reputation/data risk) | Yes | Pre-action approval |
| Change database entry (write) | Medium-high | Yes | Pre-action or checkpoint |
| Trigger payment / order | High (cost, irreversible) | Yes, four-eyes if applicable | Pre-action, tiered by amount |
| Permanently delete a record | High (irreversible) | Yes | Pre-action approval |
| Issue credit/benefit decision | High (legal consequence) | Yes | Pre-action + independent review |
| Submit suspicious activity report (SAR) | High (legal) | Yes | Pre-action + four-eyes |
| Production setpoint / OT write | Critical (safety) | Yes | Pre-action + plausibility check |
| Mass API calls above threshold | Variable (cost) | Conditional | Escalation-on-anomaly + cost cap |
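
To keep the classification auditable, the matrix can live as data next to the agent code. A minimal sketch, assuming hypothetical action identifiers and covering only part of the table; the fallback to pre-action approval for unclassified actions is a design assumption (fail closed), not something the matrix itself prescribes.

```python
from enum import Enum


class Pattern(Enum):
    AUDIT_ONLY = "audit log is sufficient"
    PRE_ACTION = "pre-action approval"
    PRE_ACTION_OR_CHECKPOINT = "pre-action or checkpoint"
    PRE_ACTION_FOUR_EYES = "pre-action + four-eyes"
    ESCALATION_ON_ANOMALY = "escalation-on-anomaly + cost cap"


# Excerpt of the matrix as data; the action keys are hypothetical identifiers.
HITL_MATRIX = {
    "read_record": Pattern.AUDIT_ONLY,
    "create_draft": Pattern.AUDIT_ONLY,
    "send_external_email": Pattern.PRE_ACTION,
    "write_database_entry": Pattern.PRE_ACTION_OR_CHECKPOINT,
    "delete_record": Pattern.PRE_ACTION,
    "submit_sar": Pattern.PRE_ACTION_FOUR_EYES,
    "bulk_api_calls": Pattern.ESCALATION_ON_ANOMALY,
}


def pattern_for(action: str) -> Pattern:
    """Look up the HITL pattern; unclassified actions fall back to pre-action approval."""
    return HITL_MATRIX.get(action, Pattern.PRE_ACTION)


print(pattern_for("create_draft").value)     # audit log is sufficient
print(pattern_for("rotate_api_keys").value)  # pre-action approval (not in the matrix)
```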

The most common failure: the gate degenerates

The greatest risk is not the missing gate but its gradual erosion. OWASP lists this as a standalone Top 10 risk: ASI09 - Human-Agent Trust Exploitation. Agents produce polished, authoritative-sounding output; people tend to adopt it. The oversight layer, intended as a security control, thus becomes a vulnerability in its own right.

The relevant mechanisms, according to the research:

  • Automation bias - output is accepted without critical review.
  • Authority deference - the confident tone discourages questioning.
  • Gradual trust escalation - the agent builds credibility with correct outputs before manipulated ones follow.
  • Cognitive overload - volume and complexity exceed human review capacity.
  • "Polished hallucination" - confident, incorrect and convincing output.

Degradation can be measured through two detection signals: reviewers' decision times fall while task complexity stays constant, and approval decisions correlate extremely strongly with the agent's recommendation - an indication of rubber-stamping.
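
Both signals can be computed from review logs. A sketch, assuming each review records a period, a complexity score, the time to decide, the agent's recommendation and the human decision; the `Review` fields, the 50% time-drop ratio and the 98% agreement threshold are illustrative assumptions, not values from the research.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Sequence


@dataclass(frozen=True)
class Review:
    period: int               # e.g. calendar week
    complexity: int           # hypothetical complexity score, e.g. 1-5
    seconds_to_decide: float
    agent_recommended_approve: bool
    human_approved: bool


def degradation_signals(
    reviews: Sequence[Review],
    early: int,
    late: int,
    complexity: int,
    time_drop_ratio: float = 0.5,
    agreement_threshold: float = 0.98,
) -> List[str]:
    """Compare an earlier and a later period at the same complexity level."""
    before = [r for r in reviews if r.period == early and r.complexity == complexity]
    after = [r for r in reviews if r.period == late and r.complexity == complexity]
    if not before or not after:
        return []  # not enough data to compare
    signals = []
    median_before = median(r.seconds_to_decide for r in before)
    median_after = median(r.seconds_to_decide for r in after)
    if median_after < time_drop_ratio * median_before:
        signals.append("decision time falling at constant complexity")
    agreement = sum(r.human_approved == r.agent_recommended_approve for r in after) / len(after)
    if agreement >= agreement_threshold:
        signals.append("approvals track the agent's recommendation")
    return signals
```

High agreement alone is ambiguous, because a consistently correct agent also earns it; combining the metric with periodic test injections helps tell the two apart.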

UX design that compels review

HITL stands or falls with the interface. A bare "Approve" button invites rubber-stamping. The research recommends force-engagement patterns:

  • Make reasoning visible. Display the rationale, source provenance and confidence intervals instead of hiding them. According to OWASP, hiding the reasoning trace in the UI is itself an ASI09 detection signal.
  • Adversarial output highlighting. Where content originates from untrusted sources, show a notice along the lines of "this output may have been influenced by external content".
  • Time-delay enforcement. For high-risk approvals, enforce a deliberate delay to prevent reflexive confirmation.
  • Reviewer training and fire drills. Train reviewers on automation bias, and run periodic test injections ("fire drills") that check whether the HITL gate still takes effect at all.
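
A sketch of a force-engagement review card that combines three of these patterns: it refuses to open without visible reasoning, marks untrusted sources, and rejects approvals submitted before a minimum review time. `ReviewCard`, the 30-second default and the single confidence score (standing in for confidence intervals) are assumptions for illustration.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ReviewCard:
    action: str
    reasoning: str
    sources: List[Tuple[str, bool]]  # (source id, comes from a trusted source?)
    confidence: float                # 0.0-1.0, as reported by the agent
    min_review_seconds: float = 30.0
    opened_at: float = field(default_factory=time.monotonic)

    def __post_init__(self) -> None:
        # Refuse to open a review without a visible rationale.
        if not self.reasoning.strip():
            raise ValueError("no reasoning to display; review cannot be opened")

    def render(self) -> str:
        lines = [
            f"Action: {self.action}",
            f"Reasoning: {self.reasoning}",
            f"Confidence: {self.confidence:.0%}",
        ]
        for source, trusted in self.sources:
            warning = "" if trusted else "  [!] may have been influenced by external content"
            lines.append(f"Source: {source}{warning}")
        return "\n".join(lines)

    def approve(self, now: Optional[float] = None) -> bool:
        """Reject approvals submitted before the minimum review time has elapsed."""
        now = time.monotonic() if now is None else now
        elapsed = now - self.opened_at
        if elapsed < self.min_review_seconds:
            raise PermissionError(f"approval locked for another {self.min_review_seconds - elapsed:.0f} s")
        return True


card = ReviewCard(
    action="pay invoice INV-42",
    reasoning="Amount and supplier match purchase order PO-9.",
    sources=[("erp:PO-9", True), ("email:inbound-311", False)],
    confidence=0.8,
)
print(card.render())
```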

A concrete example: the procurement cascade

The research documents a case that makes HITL failure tangible. Over the course of three weeks, a procurement agent was gradually convinced that its approval limit was USD 500,000. The attacker then placed fraudulent orders worth USD 5 million across 10 transactions. The human approver trusted the agent's growing claim of a higher authorisation - a textbook case of ASI09 combined with cascading failures (ASI08).

In pseudocode, an effective pre-action gate could be sketched as follows:

```text
on agent_action(action):
    risk = classify(action)                    # reversibility, cost, legal relevance
    if risk in {HIGH, CRITICAL}:
        evidence = gather_provenance(action)   # sources, confidence, reasoning
        approver = route_by_amount(action)     # above limit -> four-eyes / escalation
        decision = require_human(approver, evidence,
                                 enforce_delay=True,
                                 show_reasoning=True)
        audit_log.write_worm(action, evidence, decision)   # signed
        if not decision.approved:
            abort()
    execute(action)
```

Important: the limit that route_by_amount checks against must sit outside the agent's memory and be protected against manipulation - otherwise the agent effectively rewrites its own limit, as in the procurement case.
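
A sketch of this point, assuming the limit lives in operator-controlled configuration. `APPROVAL_LIMITS_USD`, its value and the `Route` type are hypothetical. The read-only mapping only illustrates the idea; real protection comes from keeping the limit in a component the agent cannot write to at all.

```python
from dataclasses import dataclass
from types import MappingProxyType

# Operator-owned limits, loaded from configuration outside the agent's memory.
# MappingProxyType is a read-only view: code holding it cannot modify the limits in place.
APPROVAL_LIMITS_USD = MappingProxyType({"procurement": 50_000})  # hypothetical value


@dataclass(frozen=True)
class Route:
    reviewers: int
    escalate: bool


def route_by_amount(department: str, amount_usd: float) -> Route:
    """Route an order by amount against the configured limit.

    There is deliberately no parameter through which the agent could pass
    in a limit it 'believes' it has.
    """
    limit = APPROVAL_LIMITS_USD[department]
    if amount_usd > limit:
        return Route(reviewers=2, escalate=True)  # four-eyes plus escalation
    return Route(reviewers=1, escalate=False)


# An order at the USD 500,000 "limit" the agent was persuaded of still escalates.
print(route_by_amount("procurement", 500_000))  # Route(reviewers=2, escalate=True)
```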

Audit: no oversight without a log

Human oversight is only robust if it is demonstrable. According to the research, the forensic minimum per agent action comprises: the full prompt (user, system, injected context), the model version and configuration hash, the tool-call sequence with arguments, retrieval queries and document IDs, the output and decision rationale, human-override events, memory read/write operations, and cost and latency.
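
A sketch of an audit record carrying these fields, assuming one record per agent action and JSON as the storage format. The field names are hypothetical; `hash_config` shows one way to derive the configuration hash (SHA-256 over canonical JSON).

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, List


def hash_config(config: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not change the hash."""
    return hashlib.sha256(json.dumps(config, sort_keys=True).encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class AgentActionRecord:
    """One audit entry per agent action; the fields follow the forensic minimum listed above."""

    user_prompt: str
    system_prompt: str
    injected_context: List[str]
    model_version: str
    config_hash: str
    tool_calls: List[Dict[str, Any]]      # ordered; tool name plus arguments
    retrieval_queries: List[str]
    document_ids: List[str]
    output: str
    decision_rationale: str
    human_overrides: List[Dict[str, Any]]
    memory_ops: List[Dict[str, Any]]      # memory reads and writes
    cost_usd: float
    latency_ms: float

    def to_json(self) -> str:
        # Canonical encoding (sorted keys, no extra whitespace) keeps later signatures stable.
        return json.dumps(asdict(self), sort_keys=True, separators=(",", ":"))
```

A canonical encoding matters because any later signature is computed over exactly these bytes.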

As a pattern, the research recommends WORM (write-once-read-many) logs, a cryptographic signature for tamper detection and sector-appropriate retention (according to the research, 10 years for banking and insurance). These logs also serve as the evidence that shows auditors HITL not only exists but works.
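
A sketch of the tamper-detection part, assuming an in-memory list: each entry stores the MAC of its predecessor, so altering or deleting an entry breaks verification. HMAC-SHA256 with a shared key stands in for the cryptographic signature here; with asymmetric signatures (for example Ed25519), auditors could verify entries without holding the signing key. The class name and key handling are hypothetical, and WORM retention itself has to be provided by the storage layer.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # placeholder "previous MAC" for the first entry


class SignedAppendOnlyLog:
    """In-memory illustration of a hash-chained, MAC-protected audit log.

    Write-once guarantees must come from the storage layer; this class only
    shows how modifications become detectable during verification.
    """

    def __init__(self, key: bytes) -> None:
        self._key = key
        self._entries: List[Dict[str, Any]] = []

    def _mac(self, prev: str, record: Dict[str, Any]) -> str:
        body = json.dumps({"prev": prev, "record": record}, sort_keys=True)
        return hmac.new(self._key, body.encode("utf-8"), hashlib.sha256).hexdigest()

    def append(self, record: Dict[str, Any]) -> None:
        record = json.loads(json.dumps(record))  # store a private copy of the record
        prev = self._entries[-1]["mac"] if self._entries else GENESIS
        self._entries.append({"prev": prev, "record": record, "mac": self._mac(prev, record)})

    def verify(self) -> bool:
        """Recompute every MAC and check the chain links; False means tampering."""
        prev = GENESIS
        for entry in self._entries:
            expected = self._mac(prev, entry["record"])
            if entry["prev"] != prev or not hmac.compare_digest(entry["mac"], expected):
                return False
            prev = entry["mac"]
        return True
```

One limitation of the sketch: removing the most recent entries is not detected unless the latest MAC is also anchored somewhere outside the log.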

Relation to regulation (not legal advice)

EU AI Act Art. 14 requires effective human oversight for high-risk AI systems. According to the underlying research, Art. 14 covers the OWASP risk ASI09 well; the specific UI and automation-bias controls, however, remain expressly the responsibility of the operator (deployer). In parallel, the research indicates that the BSI (Germany's Federal Office for Information Security) is developing dedicated security criteria for AI agents, with deployer obligations already discernible in areas such as zero trust, sandboxing, identity management, transparent decision logging and HITL for critical actions.

Note: this article is a technical and subject-matter assessment and is no substitute for legal advice. Whether your specific system qualifies as high-risk AI within the meaning of the EU AI Act, and which obligations follow from that, should be clarified with qualified legal counsel.

For agencies and B2B decision-makers

Anyone building agents for clients or deploying them in their own operations should treat HITL not as a downstream feature but as an architectural decision. Concretely: an action-risk matrix as part of every agent design, graduated approvals instead of blanket auto-approve, force-engagement UI against automation bias and tamper-proof audit logs from the outset. Blck Alpaca from Vienna supports DACH companies in designing agentic workflows so that autonomy and human oversight are held in the right balance - secure, demonstrable and compatible with the EU AI Act and the OWASP Agentic Top 10.

FAQ

When does an AI agent absolutely require human approval?
Whenever an action is irreversible (e.g. deletion, payout, sending to customers), incurs significant costs or is legally relevant (e.g. contracts, credit or benefit decisions, reports to authorities). Read-only actions or actions that are easily reversible in sandboxes generally do not require a gate. The decision should follow from a documented action-risk-HITL matrix.
What is the difference between the pre-action, checkpoint and escalation patterns?
Pre-action approval halts the agent immediately before a single critical action and waits for approval. Checkpoint approval reviews at defined milestones of a multi-stage plan before the next segment runs. Escalation-on-anomaly lets the agent work autonomously and brings in a human only when thresholds or anomaly detectors trigger. In practice, the three are combined according to risk class.
Which EU AI Act article governs human oversight?
EU AI Act Art. 14 requires effective human oversight for high-risk AI systems. According to the research, Art. 14 covers the OWASP risk ASI09 (Human-Agent Trust Exploitation) well; however, UI and automation-bias controls remain the responsibility of the operator (deployer). This is not legal advice - the specific classification of your system should be clarified with qualified legal counsel.
Why does Human-in-the-Loop often fail in practice?
Because the gate degenerates into a mere formality. Under ASI09, OWASP describes automation bias and authority deference: people trust the confidently phrased agent output and wave it through. Detection signals are falling decision times at the same complexity and a high correlation between recommendation and approval. Countermeasures are force-engagement UI, graduated approvals and periodic test injections.
How do I demonstrate to auditors that HITL works?
Through tamper-proof audit logs of every critical action: the full prompt including injected context, the tool-call sequence with arguments, the decision rationale, human-override events and memory accesses. WORM logs (write-once-read-many) and a cryptographic signature for tamper detection are recommended, with sector-appropriate retention (10 years for banking and insurance, according to the research).
