Error Handling

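This guide covers best practices for handling errors in NudgeLang applications: the kinds of errors a state can raise, strategies for handling them (retry, fallback, circuit breaker), and how to recover from and log failures.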

Error Types

1. LLM Errors

states:
  - id: llm_state
    type: llm
    model: gpt-4
    prompt: "Process: {input}"
    input:
      input: "${input.data}"
    error:
      next: handle_llm_error  # state that handles the failure (presumably once retries are exhausted)
      retry:
        max_attempts: 3
        delay: 1000           # presumably milliseconds
        backoff: exponential

2. Tool Errors

states:
  - id: tool_state
    type: tool
    tool: process_data
    input:
      data: "${input.data}"
    error:
      next: handle_tool_error
      retry:
        max_attempts: 2
        delay: 500
        backoff: linear
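
To make the difference between the two backoff settings concrete, here is a minimal Python sketch of the wait schedule each one might produce. It is not NudgeLang's implementation. It assumes that delay is in milliseconds, that exponential doubles the wait on each retry, and that linear adds the base delay on each retry. The function name retry_delays and its retries parameter are hypothetical, and the guide does not say whether max_attempts includes the initial call.

```python
def retry_delays(retries: int, delay_ms: int, backoff: str) -> list[int]:
    """Return the assumed wait in milliseconds before each retry."""
    if backoff == "exponential":
        # Assumed: the wait doubles each time (delay, 2*delay, 4*delay, ...).
        return [delay_ms * 2 ** i for i in range(retries)]
    if backoff == "linear":
        # Assumed: the wait grows by one base delay each time (delay, 2*delay, ...).
        return [delay_ms * (i + 1) for i in range(retries)]
    raise ValueError(f"unknown backoff strategy: {backoff!r}")


# Values taken from the two examples above.
llm_schedule = retry_delays(3, 1000, "exponential")
tool_schedule = retry_delays(2, 500, "linear")
```

Under these assumptions the LLM example would wait 1000, 2000, and 4000 ms, and the tool example 500 and 1000 ms.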

3. State Transition Errors

states:
  - id: transition_state
    type: llm
    model: classifier
    prompt: "Classify: {input}"
    input:
      input: "${input.data}"
    transitions:
      - when: "${output === 'category1'}"
        next: handle_category1
      - when: "${output === 'category2'}"
        next: handle_category2
      - next: handle_unknown  # default when no condition above matches
    error:
      next: handle_transition_error

Error Handling Strategies

1. Retry Mechanism

states:
  - id: retry_state
    type: tool
    tool: api_call
    input:
      data: "${input.data}"
    error:
      retry:
        max_attempts: 3
        delay: 1000
        backoff: exponential
        conditions:
          - status_code: 429
          - status_code: 503
      next: handle_final_error
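
The conditions list restricts retries to specific status codes. As a rough illustration of that behavior outside NudgeLang, the Python sketch below retries only on 429 and 503 and re-raises everything else immediately. ApiError, call_with_retry, and the injectable sleep parameter are hypothetical names, and the sketch assumes max_attempts counts the initial call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

RETRYABLE_STATUS = {429, 503}  # the two codes listed under `conditions`


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"API call failed with status {status_code}")
        self.status_code = status_code


def call_with_retry(
    call: Callable[[], T],
    max_attempts: int = 3,
    delay_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying retryable failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except ApiError as exc:
            out_of_attempts = attempt == max_attempts
            if exc.status_code not in RETRYABLE_STATUS or out_of_attempts:
                raise  # the caller plays the role of handle_final_error
            sleep(delay_s * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing sleep as a parameter keeps the sketch testable without real waiting.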

2. Fallback Pattern

states:
  - id: primary_state
    type: tool
    tool: primary_api
    input:
      data: "${input.data}"
    error:
      next: fallback_state
  
  - id: fallback_state
    type: tool
    tool: backup_api
    input:
      data: "${input.data}"
    error:
      next: handle_final_error
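
The same primary-then-backup chain can be sketched in plain Python to show the control flow. This is only an illustration: run_with_fallback is a hypothetical helper, and primary_api and backup_api are stand-in functions named after the tools in the example, not real NudgeLang tools.

```python
from typing import Callable, Sequence

Handler = Callable[[dict], dict]


def run_with_fallback(handlers: Sequence[Handler], payload: dict) -> dict:
    """Try each handler in order and return the first successful result."""
    errors: list[Exception] = []
    for handler in handlers:
        try:
            return handler(payload)
        except Exception as exc:  # broad on purpose: any failure triggers the next handler
            errors.append(exc)
    # Every handler failed; this corresponds to reaching handle_final_error.
    raise RuntimeError(f"all {len(handlers)} handlers failed: {errors!r}")


def primary_api(payload: dict) -> dict:
    raise ConnectionError("primary unavailable")  # simulated outage


def backup_api(payload: dict) -> dict:
    return {"source": "backup", "data": payload["data"]}


result = run_with_fallback([primary_api, backup_api], {"data": 42})
```

Collecting the intermediate errors means the final handler can report why each option failed, not just the last one.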

3. Circuit Breaker

states:
  - id: api_state
    type: tool
    tool: external_api
    input:
      data: "${input.data}"
    error:
      circuit_breaker:
        failure_threshold: 5
        reset_timeout: 30000
        half_open_timeout: 5000
      next: handle_circuit_breaker
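
For readers unfamiliar with the pattern, the following Python sketch shows one common way a circuit breaker behaves: it opens after a number of consecutive failures, rejects calls while open, and allows a trial call once the reset timeout has passed. It is a generic illustration under assumed semantics, not NudgeLang's implementation. The class and error names are hypothetical, timeouts are in seconds here, and half_open_timeout is not modeled because the guide does not define it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is skipped because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout_s = reset_timeout_s
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout_s:
            return "half_open"  # a single trial call is allowed through
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call skipped")
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

The clock is injectable so the timing logic can be exercised without waiting for real time to pass.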

Error Recovery

1. State Recovery

states:
  - id: process_state
    type: tool
    tool: process_data
    input:
      data: "${input.data}"
    error:
      next: recover_state
      save_state: true
  
  - id: recover_state
    type: tool
    tool: recover_process
    input:
      saved_state: "${error.saved_state}"
      error: "${error}"
    error:
      next: handle_final_error

2. Data Recovery

states:
  - id: data_state
    type: tool
    tool: process_data
    input:
      data: "${input.data}"
    error:
      next: recover_data
      save_data: true
  
  - id: recover_data
    type: tool
    tool: recover_data
    input:
      saved_data: "${error.saved_data}"
      error: "${error}"
    error:
      next: handle_final_error
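
Both recovery examples follow the same idea: capture what was known at the moment of failure and hand it to a recovery step together with the error. The Python sketch below illustrates that idea in isolation. All names are hypothetical, and the shape of the saved state (the original input plus completed results) is an assumption, since the guide does not say what save_state or save_data actually store.

```python
from typing import Callable


def process_items(items: list[int], progress: dict) -> list[int]:
    """Double each item, recording results in `progress` as it goes."""
    for item in items:
        if item < 0:
            raise ValueError(f"cannot process {item}")
        progress["completed"].append(item * 2)
    return progress["completed"]


def skip_bad_items(saved_state: dict, error: Exception) -> list[int]:
    """Recovery step: keep finished work and skip items that cannot be processed."""
    done = saved_state["completed"]
    # The item right after the completed ones is the one that failed.
    remaining = saved_state["input"][len(done) + 1:]
    return done + [item * 2 for item in remaining if item >= 0]


def run_with_recovery(
    process: Callable[[list[int], dict], list[int]],
    recover: Callable[[dict, Exception], list[int]],
    data: list[int],
) -> list[int]:
    progress = {"input": data, "completed": []}  # the assumed "saved state"
    try:
        return process(data, progress)
    except Exception as exc:
        # Hand the saved state and the error to the recovery step.
        return recover(progress, exc)


recovered = run_with_recovery(process_items, skip_bad_items, [1, -2, 3])
```

In this sketch the recovery step salvages the finished work instead of restarting from scratch.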

Error Logging

1. Basic Logging

states:
  - id: log_state
    type: tool
    tool: process_data
    input:
      data: "${input.data}"
    error:
      next: log_error
      log:
        level: error
        message: "Error processing data"
        context:
          input: "${input}"
          error: "${error}"

2. Structured Logging

states:
  - id: structured_log_state
    type: tool
    tool: process_data
    input:
      data: "${input.data}"
    error:
      next: log_structured_error
      log:
        level: error
        message: "Error processing data"
        context:
          input: "${input}"
          error: "${error}"
        metadata:
          timestamp: "${timestamp}"
          request_id: "${request_id}"
          environment: "${environment}"
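
To show what a structured log entry with the same fields could look like once emitted, here is a small Python sketch using the standard logging and json modules. The record layout mirrors the level, message, context, and metadata keys above. The function name build_error_record, the logger name, and the sample values are hypothetical, and the actual output format NudgeLang produces is not specified in this guide.

```python
import json
import logging
from datetime import datetime, timezone


def build_error_record(
    message: str,
    error: Exception,
    input_data: dict,
    request_id: str,
    environment: str,
) -> dict:
    """Assemble a JSON-serializable log record with context and metadata."""
    return {
        "level": "error",
        "message": message,
        "context": {
            "input": input_data,
            "error": f"{type(error).__name__}: {error}",
        },
        "metadata": {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "request_id": request_id,
            "environment": environment,
        },
    }


logger = logging.getLogger("workflow")

record = build_error_record(
    message="Error processing data",
    error=ValueError("bad payload"),
    input_data={"data": "example"},
    request_id="req-123",  # sample value
    environment="staging",  # sample value
)
# One JSON object per line is easy for log aggregators to parse and filter.
logger.error(json.dumps(record))
```

Keeping the request ID and environment in a separate metadata object makes it possible to filter logs by those fields without parsing the message text.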

Best Practices

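  1. Error Classification: Separate transient failures (for example, HTTP 429 or 503 responses) from permanent ones, and retry only the transient kind
  2. Retry Strategy: Cap max_attempts and use a backoff so retries do not overload a struggling service
  3. Fallback Plans: Give critical operations a fallback state, such as a backup API
  4. Error Logging: Log each error with its input, the error itself, and request metadata
  5. Recovery Plans: Save state or data on failure so a recovery state can pick up from it
  6. Monitoring: Track error rates and recurring patterns over time
  7. Documentation: Document which states retry, fall back, or recover, and why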

Common Pitfalls

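  1. Silent Failures: Leaving a state without an error handler, so failures go unnoticed
  2. Infinite Retries: Retrying without a max_attempts limit
  3. Missing Context: Logging an error without the input or request details needed to diagnose it
  4. No Recovery: Routing every failure to a final handler with no attempt to recover
  5. Poor Monitoring: Not tracking error rates, so recurring failures are never spotted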
