Error Handling

FlowState provides structured error handling through exit codes and the on-error mapping. This enables workflows to recover gracefully from failures rather than terminating immediately.

Every tool execution produces an exit code:

Code  Meaning
0     Success - proceed to next state
1     General error
2     Misuse or invalid arguments
124   Timeout exceeded
130   User cancelled (Ctrl+C)
-1    Pause requested (internal)

Tools may define additional codes. Check individual tool documentation for specifics.

The on-error field maps exit codes to error-handling states:

risky-operation:
  tool: bash
  arguments:
    command: ./might-fail.sh
  on-error:
    1: handle-general-error
    2: handle-invalid-args
    _: handle-unknown-error
  next: success-path

When the tool exits with a non-zero code:

  1. FlowState checks on-error for that specific code
  2. If found, transitions to the mapped state
  3. If not found, checks for the _ wildcard
  4. If no handler matches, the workflow halts with that exit code
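
The lookup order above can be modeled in a few lines. This is a Python sketch of the documented behavior, not FlowState's actual implementation: the function name `resolve_transition` and the dict representation of a state are illustrative assumptions, and the internal pause code (-1) is not modeled.

```python
from typing import Optional

WILDCARD = "_"


def resolve_transition(exit_code: int, state: dict) -> Optional[str]:
    """Return the name of the next state, or None if nothing is mapped."""
    if exit_code == 0:
        # Success: follow the normal path.
        return state.get("next")
    on_error = state.get("on-error", {})
    if exit_code in on_error:
        # Steps 1-2: an explicit handler for this code wins.
        return on_error[exit_code]
    # Step 3: fall back to the wildcard. Step 4: None means the workflow halts.
    return on_error.get(WILDCARD)


# Hypothetical in-memory form of the risky-operation state shown earlier.
risky_operation = {
    "on-error": {
        1: "handle-general-error",
        2: "handle-invalid-args",
        "_": "handle-unknown-error",
    },
    "next": "success-path",
}

print(resolve_transition(0, risky_operation))    # success-path
print(resolve_transition(2, risky_operation))    # handle-invalid-args
print(resolve_transition(124, risky_operation))  # handle-unknown-error
```

Checking explicit codes before the wildcard is what lets a specific handler (for example, 124 for timeouts) coexist with a catch-all.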

The _ key is a catch-all that matches any exit code not explicitly mapped:

fetch-data:
  tool: bash
  arguments:
    command: curl https://api.example.com/data
  on-error:
    _: handle-network-error
  next: process-data

Best practice: Always include a _ handler unless you want the workflow to halt on unexpected errors.

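A simple retry routes any failure to a state that waits and then loops back to the failed state. Note that this version retries indefinitely:

fetch-data:
  tool: bash
  arguments:
    command: curl https://api.example.com/data
  output: var(data)
  on-error:
    _: retry-fetch
  next: process-data
retry-fetch:
  tool: bash
  arguments:
    command: sleep 5
  next: fetch-data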

For more sophisticated retry logic, track the attempt count in a variable and give up once it reaches a limit:

variables:
  max_retries: "3"
  attempt: "0"
fetch-data:
  tool: bash
  arguments:
    command: curl https://api.example.com/data
  output: var(data)
  on-error:
    _: check-retry
  next: process-data
check-retry:
  tool: bash
  arguments:
    command: |
      attempt=$(({{ attempt }} + 1))
      echo $attempt
  output: var(attempt)
  next: evaluate-retry
evaluate-retry:
  tool: switch
  arguments:
    value: "{{ attempt == '{{ max_retries }}' }}"
    goto:
      "true": give-up
      _: wait-and-retry
wait-and-retry:
  tool: bash
  arguments:
    command: sleep 5
  next: fetch-data
give-up:
  tool: bash
  arguments:
    command: 'echo "Failed after {{ max_retries }} attempts" >&2'
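
The control flow of the counted retry can be easier to follow as an ordinary loop. The sketch below is a Python model of that flow under stated assumptions: `run_with_retries` and its parameters are hypothetical names, the operation is any callable returning an exit code, and the injectable `sleep` exists only so the example runs instantly. Unlike the give-up state, which only prints a message, the sketch returns the last failing code.

```python
import time
from typing import Callable


def run_with_retries(
    operation: Callable[[], int],
    max_retries: int = 3,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run operation until it returns 0 or max_retries attempts have failed."""
    attempt = 0
    while True:
        exit_code = operation()      # fetch-data
        if exit_code == 0:
            return 0                 # next: process-data
        attempt += 1                 # check-retry
        if attempt == max_retries:   # evaluate-retry
            return exit_code         # give-up (reports the last failing code)
        sleep(delay_seconds)         # wait-and-retry


# Usage: an operation that fails twice, then succeeds on the third attempt.
results = iter([1, 1, 0])
print(run_with_retries(lambda: next(results), sleep=lambda _: None))  # 0
```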

To fall back to a default value when a step fails, route the error to a state that writes the same output variable:

get-config:
  tool: bash
  arguments:
    command: cat /etc/myapp/config.json
  output: var(config)
  on-error:
    _: use-default-config
  next: apply-config
use-default-config:
  tool: bash
  arguments:
    command: echo '{"mode": "default"}'
  output: var(config)
  next: apply-config
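
For comparison, the same fallback idea in general-purpose code looks like the following Python sketch. The function name `load_config` is hypothetical; only a failure to read the file triggers the fallback, mirroring the way the workflow falls back only when `cat` exits with a non-zero code.

```python
import json
from pathlib import Path

DEFAULT_CONFIG = {"mode": "default"}


def load_config(path: str) -> dict:
    """Read a JSON config file, falling back to defaults if it cannot be read."""
    try:
        text = Path(path).read_text()
    except OSError:
        # Mirrors `cat` failing: missing file, permission denied, and so on.
        return dict(DEFAULT_CONFIG)
    return json.loads(text)


print(load_config("/nonexistent-directory/config.json"))  # {'mode': 'default'}
```

Both branches produce a value of the same shape, so the consumer (apply-config in the workflow) does not need to know which path was taken.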

To clean up temporary resources whether or not a step succeeds, run the same cleanup on both the success path and the error path:

create-temp-resources:
  tool: bash
  arguments:
    command: mkdir -p ./temp && touch ./temp/lock
  next: risky-operation
risky-operation:
  tool: bash
  arguments:
    command: ./might-fail.sh
  on-error:
    _: cleanup-and-fail
  next: cleanup-and-succeed
cleanup-and-succeed:
  tool: bash
  arguments:
    command: rm -rf ./temp
  next: done
cleanup-and-fail:
  tool: bash
  arguments:
    command: rm -rf ./temp
  next: report-failure
report-failure:
  tool: bash
  arguments:
    command: 'echo "Operation failed" >&2 && exit 1'
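
Readers used to try/finally may find it helpful to see the cleanup pattern in that form. The Python sketch below is an analogy, not FlowState code: `run_with_cleanup` is a hypothetical name, and the operation is any callable that receives the temporary directory and returns an exit code.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_with_cleanup(operation: Callable[[Path], int], workdir: Path) -> int:
    """Create temp resources, run operation, and always remove the resources."""
    temp_dir = workdir / "temp"
    temp_dir.mkdir(parents=True, exist_ok=True)      # create-temp-resources
    (temp_dir / "lock").touch()
    try:
        return operation(temp_dir)                   # risky-operation
    finally:
        # Runs on success and on failure, like the two cleanup states.
        shutil.rmtree(temp_dir, ignore_errors=True)


# Usage: a failing operation still leaves no temp directory behind.
workdir = Path(tempfile.mkdtemp())
exit_code = run_with_cleanup(lambda temp_dir: 1, workdir)
print(exit_code, (workdir / "temp").exists())  # 1 False
```

In the workflow, the two cleanup states play the role of the `finally` clause; they are duplicated because each must continue to a different state afterwards.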

To send a notification before the workflow ends on failure:

important-task:
  tool: bash
  arguments:
    command: ./critical-operation.sh
  on-error:
    _: notify-and-fail
  next: complete
notify-and-fail:
  tool: bash
  arguments:
    command: |
      curl -X POST https://hooks.slack.com/... \
        -d '{"text": "Workflow failed in important-task"}'
  # No next - workflow ends after notification

Exit code 124 indicates a timeout. Handle it explicitly:

long-running-task:
  tool: claude
  arguments:
    prompt: "Analyze this large codebase..."
  timeout: 10m
  on-error:
    124: handle-timeout
    _: handle-other-error
  next: use-result
handle-timeout:
  tool: bash
  arguments:
    command: echo "Task timed out - using cached result"
  next: use-cached-result
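
How a runner might turn an expired timeout into exit code 124 can be sketched in Python. This is an assumption about the general technique, not a description of FlowState's internals: `run_with_timeout` and `TIMEOUT_EXIT_CODE` are hypothetical names, and 124 is also the code the GNU `timeout` utility uses for the same situation.

```python
import subprocess
import sys
from typing import List

TIMEOUT_EXIT_CODE = 124


def run_with_timeout(args: List[str], timeout_seconds: float) -> int:
    """Run a command, returning 124 if it exceeds the time limit."""
    try:
        return subprocess.run(args, timeout=timeout_seconds).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising.
        return TIMEOUT_EXIT_CODE


# Usage: a child process that sleeps longer than the limit allows.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, timeout_seconds=0.5))  # 124
```

Because the timeout surfaces as an ordinary exit code, it can be mapped in on-error like any other failure.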

Exit code 130 indicates the user pressed Ctrl+C during interactive prompts:

get-user-input:
  tool: ask-user
  arguments:
    question: "Enter the deployment target:"
  output: var(target)
  on-error:
    130: user-cancelled
    _: input-error
  next: deploy
user-cancelled:
  tool: bash
  arguments:
    command: echo "Deployment cancelled by user"
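
The value 130 matches the common shell convention of 128 plus the signal number, where SIGINT (Ctrl+C) is signal 2. The Python sketch below shows how an interactive prompt could report cancellation that way. It is an illustration only: `ask_user` here is a hypothetical function, not the FlowState tool of the same name, and the `reader` parameter exists so the example can simulate Ctrl+C.

```python
import signal
from typing import Callable, Optional, Tuple

CANCELLED_EXIT_CODE = 128 + signal.SIGINT  # 130 by shell convention


def ask_user(
    question: str, reader: Callable[[str], str] = input
) -> Tuple[int, Optional[str]]:
    """Return (exit_code, answer); a Ctrl+C during the prompt yields 130."""
    try:
        return 0, reader(question)
    except KeyboardInterrupt:
        return int(CANCELLED_EXIT_CODE), None


def press_ctrl_c(_prompt: str) -> str:
    # Stand-in for a user pressing Ctrl+C at the prompt.
    raise KeyboardInterrupt


print(ask_user("Enter the deployment target:", reader=press_ctrl_c))  # (130, None)
```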

When writing error-handling states:

  1. Name clearly: Use prefixes like handle-, recover-, fallback-

  2. Log context: Include enough information to diagnose issues

    handle-error:
      tool: bash
      arguments:
        command: 'echo "Error in fetch-data: check network connectivity" >&2'

  3. Preserve state: Avoid modifying variables that might be needed for debugging

  4. Consider idempotency: Error handlers may run multiple times if retrying

  5. Exit cleanly: Terminal error states should produce meaningful exit codes

    fatal-error:
      tool: bash
      arguments:
        command: exit 1

When a workflow halts due to an unhandled error:

  1. Check the instance’s context.json for the current state
  2. Review the exit code from the failed tool
  3. Examine any captured output for error messages
  4. Resume with --resume after fixing the issue, or clean up and start fresh