Operations
Operational guidance for running Excalibur in production environments, including resilience, recovery procedures, and maintenance runbooks.
Before You Start
- .NET 8.0+ (or .NET 9/10 for latest features)
- A deployed Dispatch application
- Familiarity with configuration and observability
Guides
| Topic | Description |
|---|---|
| Runtime Contract | Canonical runtime semantics for dispatch ordering, cancellation, retries, and context propagation |
| Reliability Guarantees | Delivery, ordering, deduplication, and dead-letter guarantees by execution path and provider family |
| SLO, SLI, and Telemetry | Production objectives and telemetry schema for release readiness and operations |
| Incident Runbooks | Escalation model and step-by-step response playbooks for common runtime incidents |
| Operational Resilience | Transient error handling, retry policies, and recovery strategies |
| Recovery Runbooks | Step-by-step recovery procedures for common failure scenarios |
Quick Reference
Provider Resilience Matrix
| Provider | Retry Policy | Recovery Options | CDC Position Recovery |
|---|---|---|---|
| SQL Server | SqlServerRetryPolicy | Automatic reconnect | CdcRecoveryOptions |
| PostgreSQL | PostgresRetryPolicy | Automatic reconnect | PostgresCdcRecoveryOptions |
| CosmosDB | SDK-managed | Automatic | Continuation token |
| DynamoDB | SDK-managed | Automatic | Stream ARN |
| MongoDB | Driver pool | Automatic | Resume token |
| Redis | Manual reconnect | ConnectionMultiplexer | N/A |
Key Error Codes
SQL Server Transient Errors:
- 596 - Session killed by backup/restore (critical for CDC)
- 9001, 9002 - Transaction log unavailable
- 1205 - Deadlock victim
- 40613 - Database unavailable
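As a rough illustration of how the SQL Server codes above might drive a retry decision, the sketch below checks an error number against that list. The helper names `IsTransientSqlServerError` and `RequiresCdcAttention` are hypothetical and are not part of Excalibur; this is not how `SqlServerRetryPolicy` is implemented. It assumes the number comes from something like `SqlException.Number` in Microsoft.Data.SqlClient.

```csharp
using System;
using System.Collections.Generic;

// Error numbers copied from the list above; descriptions are for logging only.
var transientSqlServerErrors = new Dictionary<int, string>
{
    [596] = "Session killed by backup/restore",
    [1205] = "Deadlock victim",
    [9001] = "Transaction log unavailable",
    [9002] = "Transaction log unavailable",
    [40613] = "Database unavailable",
};

// Hypothetical helper: true when the error number is in the transient set.
bool IsTransientSqlServerError(int errorNumber) =>
    transientSqlServerErrors.ContainsKey(errorNumber);

// Hypothetical helper: 596 is singled out because the list marks it as critical for CDC.
bool RequiresCdcAttention(int errorNumber) => errorNumber == 596;

// 2627 (primary key violation) is not in the list, so it is treated as non-transient.
foreach (var number in new[] { 596, 1205, 2627 })
{
    Console.WriteLine(
        $"{number}: transient={IsTransientSqlServerError(number)}, cdc={RequiresCdcAttention(number)}");
}
```

Keeping the codes in a single lookup makes it easier to compare the set against what the configured retry policy actually handles.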
PostgreSQL Transient Errors:
- 08xxx - Connection errors
- 40001, 40P01 - Serialization/deadlock
- 57Pxx - Admin/crash shutdown
- 53xxx - Insufficient resources
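The PostgreSQL entries above are SQLSTATE classes, not single numbers, so a check has to match on prefixes. The following is a minimal sketch under that assumption. `IsTransientPostgresError` is a hypothetical helper, not the `PostgresRetryPolicy` implementation, and it assumes the five-character code is read from something like `PostgresException.SqlState` in Npgsql.

```csharp
using System;

// Hypothetical helper: matches a five-character SQLSTATE against the classes listed above.
bool IsTransientPostgresError(string sqlState)
{
    if (string.IsNullOrEmpty(sqlState) || sqlState.Length != 5)
    {
        return false;
    }

    return sqlState.StartsWith("08", StringComparison.Ordinal)   // 08xxx: connection errors
        || sqlState == "40001"                                    // serialization failure
        || sqlState == "40P01"                                    // deadlock detected
        || sqlState.StartsWith("57P", StringComparison.Ordinal)  // 57Pxx: admin/crash shutdown
        || sqlState.StartsWith("53", StringComparison.Ordinal);  // 53xxx: insufficient resources
}

// 23505 (unique violation) is not in the list, so it is treated as non-transient.
foreach (var state in new[] { "08006", "40P01", "57P01", "23505" })
{
    Console.WriteLine($"{state}: {IsTransientPostgresError(state)}");
}
```

Ordinal comparison is used because SQLSTATE values are fixed ASCII codes and should not be subject to culture-specific matching.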
Related Documentation
- Observability - Monitoring and alerting
- Deployment - Deployment configurations
- Event Sourcing - Event store operations
- Testing Overview - Conformance and integration quality expectations
See Also
- Resilience with Polly - Polly-based retry policies, circuit breakers, and resilience pipelines
- Performance Tuning - Optimize event store, outbox, and projection performance
- Health Checks - Application health monitoring and diagnostics