Open Standard · v1.0 · 2026

FAILSAFE.md

// AI Agent Safe Fallback Protocol

A plain-text file convention for defining safe fallback states and recovery procedures in AI agent projects. Place it in your repo root — alongside AGENTS.md — and define what "safe" means for your project.

FAILSAFE.md
    # FAILSAFE

    > Safe fallback protocol.
    > Spec: https://failsafe.md

    ---

    ## TRIGGERS

    trigger_on:
      - unexpected_error_count: 3
      - data_integrity_failure: true
      - memory_context_loss: true
      - contradictory_instructions: true
      - cost_spike:
          threshold_multiplier: 3.0

    ## SAFE STATE

    safe_state:
      code:
        revert_to: last_clean_commit
        branch: main
        stash_work: true
      data:
        revert_to: last_verified_snapshot
        max_snapshot_age_hours: 24

    ## SNAPSHOTS

    auto_snapshot:
      enabled: true
      frequency_minutes: 30
      on_significant_action: true
      retention_count: 10

3x: cost spike threshold that triggers an automatic failsafe in the default spec
30 min: default auto-snapshot frequency to preserve recoverable state
24 hrs: maximum snapshot age before a fallback is considered too stale to use
10: snapshots retained by default, roughly the last five hours of scheduled snapshots at the default frequency

AGENTS.md tells it what to do.
FAILSAFE.md tells it how to recover.

FAILSAFE.md is a plain-text Markdown file you place in the root of any repository that contains an AI agent. It defines the safe fallback state your agent returns to when something unexpected happens — and how to capture the moment so a human can understand what went wrong.

The problem it solves

AI agents fail in unexpected ways — losing context mid-session, receiving contradictory instructions, encountering data inconsistencies, or experiencing sudden cost spikes. Without a defined recovery protocol, a confused agent either keeps going (making things worse) or stops with no way back.

How it works

Drop FAILSAFE.md in your repo root and define: what triggers a fallback (error counts, context loss, cost spikes), what "safe state" means for your project (last clean git commit, last verified data snapshot), how to capture the incident for review, and what a human must do before the agent can resume.
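As an illustration of the incident-capture step, the sketch below writes a small JSON record that a human can review later. It is a minimal example under stated assumptions: the `capture_incident` helper, the record fields, and the `.failsafe/incidents/` directory are hypothetical and are not defined by the spec text on this page.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def capture_incident(root: Path, trigger: str, details: dict, safe_state: dict) -> Path:
    """Write a JSON incident record under `root` and return its path."""
    now = datetime.now(timezone.utc)
    record = {
        "timestamp": now.isoformat(),
        "trigger": trigger,          # e.g. "cost_spike"
        "details": details,          # whatever the agent observed
        "safe_state": safe_state,    # what the agent reverted to
        "status": "awaiting_human_review",
    }
    # Assumed location; the page only names .failsafe/snapshots/ explicitly.
    incident_dir = root / ".failsafe" / "incidents"
    incident_dir.mkdir(parents=True, exist_ok=True)
    path = incident_dir / f"incident-{now.strftime('%Y%m%dT%H%M%S%fZ')}.json"
    path.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return path
```

Writing the record before notifying anyone means the report survives even if the notification channel is the thing that failed.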

The regulatory context

ISO/IEC 42001 (AI management systems) requires documented recovery procedures. The EU AI Act (Article 15) requires high-risk AI systems to be robust and resilient to errors, faults, and inconsistencies, and names backup or fail-safe plans as one way to get there. FAILSAFE.md gives you a documented, version-controlled recovery protocol to support both: it defines not just what fails, but how the agent finds its way back.

How to use it

Copy the template from GitHub and place it in your project root:

your-project/
├── AGENTS.md
├── CLAUDE.md
├── FAILSAFE.md ← add this
├── README.md
└── src/

What it replaces

Before FAILSAFE.md, recovery procedures were ad-hoc: manual rollback steps in a wiki, undocumented assumptions about which snapshots to keep, or no plan at all. FAILSAFE.md makes recovery version-controlled, predictable, and co-located with your code.

Who reads it

The AI agent reads it on startup to learn how to recover. Your engineer reads it when planning fallback strategy. Your ops team reads it when deciding snapshot retention. Your auditor reads it to verify resilience requirements are met. One file serves all four audiences.

A complete protocol.
From slow down to shut down.

FAILSAFE.md is one file in a complete open specification for AI agent safety. Each file addresses a different level of intervention.

Frequently asked questions.

What is FAILSAFE.md?

A plain-text Markdown file defining what "safe state" means for an AI agent project and how to reach it when something goes wrong. It configures automatic snapshots during normal operation, defines fallback triggers, and specifies the recovery steps, including human notification and approval before resumption.

How does FAILSAFE.md differ from KILLSWITCH.md?

FAILSAFE.md is a recovery protocol. The agent falls back to a known good state and can resume after human review. KILLSWITCH.md is an emergency stop — the agent halts immediately. FAILSAFE.md handles unexpected failures; KILLSWITCH.md handles limit breaches and safety violations.

What triggers a failsafe?

Configurable. Common triggers: three unexpected errors in a session, detected data integrity failures, loss of memory context, contradictory instructions the agent can't resolve, unexpected external service failures, and sudden cost spikes (3x the rolling average by default).
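A trigger check along these lines can be sketched in a few lines of Python. The `SessionStats` fields and the `fired_triggers` helper are hypothetical names chosen for illustration. The use of `>=` for both thresholds and the way the rolling average is supplied are also assumptions, since the page does not specify them.

```python
from dataclasses import dataclass


@dataclass
class SessionStats:
    unexpected_errors: int = 0
    data_integrity_failure: bool = False
    memory_context_loss: bool = False
    contradictory_instructions: bool = False
    current_cost: float = 0.0
    rolling_average_cost: float = 0.0


def fired_triggers(stats, max_errors=3, cost_multiplier=3.0):
    """Return the names of all triggers that fire for the given stats."""
    fired = []
    if stats.unexpected_errors >= max_errors:
        fired.append("unexpected_error_count")
    if stats.data_integrity_failure:
        fired.append("data_integrity_failure")
    if stats.memory_context_loss:
        fired.append("memory_context_loss")
    if stats.contradictory_instructions:
        fired.append("contradictory_instructions")
    # Skip the cost check until a baseline exists, so we never compare against zero.
    has_baseline = stats.rolling_average_cost > 0
    if has_baseline and stats.current_cost >= cost_multiplier * stats.rolling_average_cost:
        fired.append("cost_spike")
    return fired


stats = SessionStats(unexpected_errors=3, current_cost=0.90, rolling_average_cost=0.25)
print(fired_triggers(stats))  # ['unexpected_error_count', 'cost_spike']
```

Returning every trigger that fired, rather than stopping at the first, gives the incident report a fuller picture of what went wrong.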

What does "safe state" mean?

You define it per project. For code: the last clean git commit on the main branch, with in-progress work stashed. For data: the most recent verified snapshot, no older than 24 hours. For config: the last known-good configuration backup. FAILSAFE.md stores all of these definitions in one place.
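The data rule above (most recent verified snapshot, no older than 24 hours) can be expressed as a small selection function. This is a sketch only: the `Snapshot` type and `pick_safe_snapshot` helper are hypothetical, and how a snapshot comes to be marked verified is left to the project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    name: str
    taken_at: datetime
    verified: bool


def pick_safe_snapshot(snapshots, now, max_age_hours=24) -> Optional[Snapshot]:
    """Newest verified snapshot within the age limit, or None if none qualifies."""
    cutoff = now - timedelta(hours=max_age_hours)
    candidates = [s for s in snapshots if s.verified and cutoff <= s.taken_at <= now]
    return max(candidates, key=lambda s: s.taken_at, default=None)


now = datetime(2026, 3, 10, 12, 0, tzinfo=timezone.utc)
snaps = [
    Snapshot("a", now - timedelta(hours=30), True),   # verified but too old
    Snapshot("b", now - timedelta(hours=5), True),    # verified and fresh
    Snapshot("c", now - timedelta(hours=1), False),   # newest but unverified
]
print(pick_safe_snapshot(snaps, now).name)  # b
```

A `None` result means no data snapshot meets the definition of safe. Treating that as a reason to stop and involve a human, rather than falling back to a stale snapshot, is an assumption of this sketch.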

How do auto-snapshots work?

Every 30 minutes during active sessions (configurable), the agent captures a full state snapshot to .failsafe/snapshots/. It also snapshots automatically before significant actions — database migrations, production deployments, bulk file operations. The last 10 snapshots are retained.
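The schedule and retention rules can be sketched as two small helpers. Both function names are hypothetical, and the sketch assumes one file per snapshot with names that sort chronologically (for example a UTC timestamp prefix). The page does not describe the snapshot file layout.

```python
from datetime import datetime, timedelta
from pathlib import Path


def snapshot_due(last_snapshot_at, now, frequency_minutes=30, significant_action=False):
    """True if a snapshot should be taken now."""
    # Always snapshot before a significant action, or if none exists yet.
    if significant_action or last_snapshot_at is None:
        return True
    return now - last_snapshot_at >= timedelta(minutes=frequency_minutes)


def prune_snapshots(snapshot_dir: Path, retention_count: int = 10) -> list:
    """Delete all but the newest `retention_count` files; return the removed names."""
    # Relies on file names sorting oldest to newest (an assumption of this sketch).
    files = sorted(p for p in snapshot_dir.iterdir() if p.is_file())
    stale = files[:-retention_count] if retention_count > 0 else files
    for path in stale:
        path.unlink()
    return [p.name for p in stale]
```

Calling `prune_snapshots(Path(".failsafe/snapshots"))` after each new snapshot keeps the directory at the default of 10 entries.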

Can the agent restart itself after a failsafe?

No — by default, restart requires human approval. The agent saves an incident report, notifies the operator, and waits. A human must review the incident, confirm the safe state is intact, and explicitly approve resumption. This is the key difference from an automatic retry.
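The approval requirement can be modeled as a gate with no self-restart path. The sketch below is illustrative: the `FailsafeGate` class, its method names, and the `notify` callback are hypothetical, and a real agent would save the incident report and snapshot where the comment indicates.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    AWAITING_APPROVAL = "awaiting_approval"


class FailsafeGate:
    def __init__(self):
        self.state = State.RUNNING

    def trip(self, notify):
        """Enter failsafe: record the incident, notify the operator, then wait."""
        # A real agent would save its incident report and snapshot here.
        notify("failsafe triggered; human review required")
        self.state = State.AWAITING_APPROVAL

    def resume(self, approved_by: str, safe_state_confirmed: bool):
        """Resume only with a named approver and a confirmed safe state."""
        if self.state is not State.AWAITING_APPROVAL:
            raise RuntimeError("nothing to resume")
        if not approved_by or not safe_state_confirmed:
            raise PermissionError("resume requires a named approver and a confirmed safe state")
        self.state = State.RUNNING
```

Because `resume` is the only way back to `RUNNING` and it demands an approver's name, an automatic retry loop cannot restart the agent by accident.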

FAILSAFE.md is an open specification for AI agent safe fallback protocols. It defines TRIGGERS (error counts, context loss, cost spikes, data integrity failures), SAFE STATE (last clean git commit, last verified data snapshot, last known-good config), RECOVERY steps (snapshot → notify → await → resume), and an AUTO-SNAPSHOT schedule (every 30 minutes, before significant actions, 10 snapshots retained). It addresses ISO/IEC 42001 and EU AI Act resilience requirements and is part of a stack: THROTTLE → ESCALATE → FAILSAFE → KILLSWITCH → TERMINATE → ENCRYPT. MIT licence.

Last updated: 2026-03-10