Container Deployment Guide
Deploy Excalibur.Dispatch applications in containers with production-ready health probes, GC tuning, graceful shutdown, and Native AOT support.
This guide focuses specifically on container-optimized deployment. For non-container scenarios (IIS, Windows Service, Azure Functions), see Deployment. For AOT source generator setup, see Native AOT.
1. Choosing a Build Strategy
Pick the strategy that matches your workload:
| Strategy | Best For | Startup | Memory | Dispatch Compatibility |
|---|---|---|---|---|
| JIT | Long-running APIs, plugin architectures | ~600 ms | Higher | All 170 packages |
| ReadyToRun | APIs needing fast startup + full features | ~380 ms | Medium | All 170 packages |
| Native AOT | Workers, jobs, event handlers, sidecars | <100 ms | Lowest | 150/170 packages |
150 of 170 packages are AOT-compatible. Core dispatch, pipeline, handlers, and most transports work fully in AOT. Packages depending on external SDKs without AOT support (Kafka/Confluent, AWS SDK) remain JIT-only -- see the AOT Compatibility Matrix.
AOT eliminates JIT warmup, giving faster startup. However, source-generated deserialization can be 41-93% slower for complex message types compared to JIT-optimized paths (measured in AotPathSerializationBenchmarks). For most container workloads, the startup improvement outweighs the per-message overhead. Profile your specific workload to decide.
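Setting the publish mode in the project file (rather than only on the `dotnet publish` command line) keeps local, CI, and Docker builds consistent. A minimal sketch using standard MSBuild properties; `InvariantGlobalization` is an optional extra for smaller images, not a Dispatch requirement:

```xml
<PropertyGroup>
  <PublishAot>true</PublishAot>
  <!-- Surfaces trim/AOT warnings during regular builds, not just at publish -->
  <IsAotCompatible>true</IsAotCompatible>
  <!-- Optional: drops ICU for a smaller image if culture-specific behavior is not needed -->
  <InvariantGlobalization>true</InvariantGlobalization>
</PropertyGroup>
```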
2. Dockerfile Recipes
Three production-ready Dockerfiles. All examples use .NET 10. For .NET 8 or 9, replace image tags accordingly (e.g., sdk:10.0 to sdk:9.0).
2.1 JIT (Default)
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY ["src/MyApp/MyApp.csproj", "src/MyApp/"]
RUN dotnet restore src/MyApp/MyApp.csproj
COPY . .
RUN dotnet publish src/MyApp -c Release -o /app/publish
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app
COPY --from=build /app/publish .
USER app
EXPOSE 8080
ENTRYPOINT ["dotnet", "MyApp.dll"]
Standard multi-stage build. USER app runs as non-root. Compatible with all packages.
2.2 ReadyToRun
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
WORKDIR /src
COPY ["src/MyApp/MyApp.csproj", "src/MyApp/"]
RUN dotnet restore src/MyApp/MyApp.csproj
COPY . .
RUN dotnet publish src/MyApp -c Release -o /app/publish \
    -r linux-x64 -p:PublishReadyToRun=true
FROM mcr.microsoft.com/dotnet/aspnet:10.0 AS final
WORKDIR /app
COPY --from=build /app/publish .
USER app
EXPOSE 8080
ENTRYPOINT ["dotnet", "MyApp.dll"]
20-40% faster startup. Images are 20-60% larger. Architecture-specific (linux-x64).
2.3 Native AOT
FROM mcr.microsoft.com/dotnet/sdk:10.0 AS build
# Native AOT compilation needs a native toolchain in the build stage
RUN apt-get update && apt-get install -y --no-install-recommends clang zlib1g-dev
WORKDIR /src
COPY ["src/MyApp/MyApp.csproj", "src/MyApp/"]
RUN dotnet restore src/MyApp/MyApp.csproj
COPY . .
RUN dotnet publish src/MyApp -c Release -o /app/publish \
    -r linux-x64 -p:PublishAot=true
FROM mcr.microsoft.com/dotnet/runtime-deps:10.0-noble-chiseled
WORKDIR /app
COPY --from=build /app/publish .
# runtime-deps does not set ASPNETCORE_HTTP_PORTS (aspnet images do), so pin the port
ENV ASPNETCORE_HTTP_PORTS=8080
USER app
EXPOSE 8080
ENTRYPOINT ["./MyApp"]
Chiseled base image (~10 MB). No .NET runtime needed. Requires AddGeneratedServices() in Program.cs. Expect zero IL2xxx/IL3xxx warnings for supported packages.
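The AddGeneratedServices() requirement means the AOT entry point differs from the JIT one only by the generated-registration call. A minimal Program.cs sketch; the exact placement of the call is an assumption here, so follow the Native AOT guide for the generator setup:

```csharp
var builder = WebApplication.CreateBuilder(args);

builder.Services.AddExcalibur(excalibur => { /* configure dispatch */ });

// Replaces reflection-based discovery with source-generated registrations
// so the trimmer/AOT compiler can see every handler. Placement is illustrative.
builder.Services.AddGeneratedServices();

var app = builder.Build();
app.Run();
```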
3. Health Checks and Kubernetes Probes
3.1 Framework Health Check Setup
var builder = WebApplication.CreateBuilder(args);
builder.Services.AddExcalibur(excalibur => { /* configure dispatch */ });
// Option A: Unified registration with transport health
builder.Services.AddExcaliburHealthChecks(withHealthChecks: checks =>
    checks.AddTransportHealthChecks());
// Option B: Separate registration
builder.Services.AddHealthChecks()
    .AddTransportHealthChecks(); // Tags: "transports"
builder.Services.AddExcaliburHealthChecks();
var app = builder.Build();
app.Run();
AddExcaliburHealthChecks() maps the readiness endpoint to /.well-known/ready by default. Customize via the endpointUri parameter.
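If the default path conflicts with an existing route, move it via that parameter. A sketch (parameter name from the text above; the exact signature may differ, and "/ready" is an arbitrary example path):

```csharp
// Serve readiness from /ready instead of the /.well-known/ready default
builder.Services.AddExcaliburHealthChecks(endpointUri: "/ready");
```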
3.2 Kubernetes Probe Mapping
startupProbe:
  httpGet:
    path: /.well-known/ready
    port: 8080
  periodSeconds: 3
  failureThreshold: 20 # 60s max startup window
  # Allows transport connections to establish before
  # the readiness probe takes over
readinessProbe:
  httpGet:
    path: /.well-known/ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /healthz/live # Dedicated liveness -- NO dependency checks
    port: 8080
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
The liveness probe must NOT check transport or database dependencies. If a broker goes down temporarily and liveness fails, Kubernetes restarts the pod -- making the outage worse. Use a dedicated /healthz/live endpoint that returns 200 if the process is running. Use /.well-known/ready (with transport health) only for readiness.
Liveness endpoint setup:
app.MapHealthChecks("/healthz/live", new HealthCheckOptions
{
    Predicate = _ => false // No dependency checks -- just "process is alive"
});
Key guidance:
- Startup probe prevents Kubernetes from killing the pod while transports initialize
- MultiTransportHealthCheck reports Unhealthy when transports are not running
- MultiTransportHealthCheck also reports Unhealthy during transport initialization (before StartAsync completes), which is the correct behavior for startup probes
- For AOT apps, startup is fast (under 100 ms) but transport connections still take time
- Adjust failureThreshold based on your broker's connection time
3.3 ThrowOnStartupFailure
TransportAdapterHostedServiceOptions.ThrowOnStartupFailure defaults to true. If a transport cannot connect at startup, the app crashes. In Kubernetes, this triggers CrashLoopBackOff with exponential backoff -- the standard pattern for "wait until dependency is ready."
Do not set ThrowOnStartupFailure = false unless you have a specific degraded-operation scenario where the app should continue without its transport.
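If you do have such a scenario, the opt-out follows the same options pattern used elsewhere in this guide:

```csharp
builder.Services.Configure<TransportAdapterHostedServiceOptions>(options =>
{
    // Degraded operation only: the process stays up without its transport,
    // so readiness checks become the sole signal that messaging is down.
    options.ThrowOnStartupFailure = false;
});
```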
4. Graceful Shutdown and Drain Alignment
spec:
  terminationGracePeriodSeconds: 35 # >= DrainTimeoutSeconds + 5
  containers:
    - name: api
      # ...
Rule: terminationGracePeriodSeconds must be >= DrainTimeoutSeconds (default: 30) plus a small buffer (5s) for SIGTERM propagation.
What happens on shutdown:
- Kubernetes sends SIGTERM -- the .NET host triggers ApplicationStopping
- TransportAdapterHostedService.StopAsync begins the drain (reverse start order)
- Each adapter gets up to DrainTimeoutSeconds to finish in-flight messages
- If drain exceeds the timeout, the adapter is forcefully stopped (logged as a warning)
- After terminationGracePeriodSeconds, Kubernetes sends SIGKILL
Configuration:
builder.Services.Configure<TransportAdapterHostedServiceOptions>(options =>
{
    options.DrainTimeoutSeconds = 30; // Default
});
5. GC Tuning Recipes
GC behavior is highly workload-dependent -- allocation rate, object lifetime distribution, and message size all affect optimal settings. Use these profiles as baselines, then profile your specific workload with dotnet-counters and adjust.
5.1 API Service (512 MiB limit)
env:
  - name: DOTNET_gcServer
    value: "1"
  - name: DOTNET_GCHeapHardLimitPercent
    value: "41" # hex for 65% -- GC environment variable values are parsed as hexadecimal
resources:
  requests:
    memory: "384Mi"
    cpu: "500m"
  limits:
    memory: "512Mi"
    cpu: "1000m"
Server GC with 65% heap cap. Leaves headroom for native allocations, thread stacks, and runtime overhead.
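The resulting ceiling is simply the container limit times the percentage. A quick shell sanity check for this profile:

```shell
# Managed-heap ceiling for a 512 MiB limit with a 65% hard-limit percent
LIMIT_MIB=512
PERCENT=65
echo "$(( LIMIT_MIB * PERCENT / 100 )) MiB heap cap"   # -> 332 MiB heap cap
```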
5.2 Background Worker (256 MiB limit)
env:
  - name: DOTNET_gcServer
    value: "0" # Workstation GC -- lower overhead for single-core
  - name: DOTNET_GCHeapHardLimitPercent
    value: "3C" # hex for 60% -- GC environment variable values are parsed as hexadecimal
resources:
  requests:
    memory: "192Mi"
    cpu: "250m"
  limits:
    memory: "256Mi"
    cpu: "500m"
Workstation GC for workers processing queue messages. Lower memory overhead, good for scale-to-zero scenarios.
5.3 High-Throughput gRPC Service (1 GiB limit)
env:
  - name: DOTNET_gcServer
    value: "1"
  - name: DOTNET_GCHeapHardLimitPercent
    value: "4B" # hex for 75% -- GC environment variable values are parsed as hexadecimal
resources:
  requests:
    memory: "768Mi"
    cpu: "2000m"
  limits:
    memory: "1Gi"
    cpu: "2000m"
Server GC with higher heap allowance for sustained throughput. Pin CPU requests = limits to avoid throttling.
5.4 cgroup v2 Note
AKS and Azure Linux node pools now default to cgroup v2. The .NET runtime correctly reads v2 memory limits, but RSS metrics from kubectl top pod may differ from v1 by 5-15% due to different page cache accounting. If you see unexpected OOMs after a cluster upgrade, revalidate your DOTNET_GCHeapHardLimitPercent settings -- the runtime now perceives less available memory under v2.
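To confirm which cgroup version a pod actually sees, inspect the filesystem type mounted at /sys/fs/cgroup (cgroup2fs indicates v2; tmpfs indicates a v1 hierarchy):

```shell
# Inside a running pod
kubectl exec -it <pod> -- stat -fc %T /sys/fs/cgroup
```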
6. Running with Sidecars
Without per-container limits, each sidecar process sees the full pod memory budget and sizes its allocations against it. Combined, the containers can exceed the pod limit and trigger an OOMKill.
Fix: Split limits explicitly per container:
containers:
  - name: api
    image: myapi:aot
    resources:
      limits:
        memory: "384Mi"
    env:
      - name: DOTNET_GCHeapHardLimitPercent
        value: "41" # hex for 65% -- 65% of 384 MiB = ~250 MiB managed heap
  - name: dapr
    image: daprio/daprd:1.13.2
    resources:
      limits:
        memory: "128Mi"
When to use sidecars vs in-process:
| Concern | Sidecar | In-Process |
|---|---|---|
| mTLS between services | Envoy/Istio | N/A |
| Retry/circuit breaker | Dapr (if polyglot) | Polly (recommended for .NET-only) |
| Observability | OTEL Collector | Direct OTLP export (leaner) |
| Pub/sub | Dapr bindings | Direct transport SDK (lower latency) |
For .NET-only deployments using Excalibur.Dispatch transports, in-process is almost always leaner. Dispatch already handles retry, transport abstraction, and observability.
7. Azure Container Apps
7.1 Cold-Start Optimization
- Use Native AOT for event-triggered workloads (queue, timer)
- Keep images under 100 MB (achievable with AOT + chiseled base)
- Set minReplicas: 1 for latency-sensitive APIs to avoid scale-from-zero
7.2 KEDA Scaling for Queue Workers
scale:
  minReplicas: 0
  maxReplicas: 10
  rules:
    - name: queue-trigger
      azureQueue:
        queueName: dispatch-messages
        queueLength: 5
AOT workers start in under 100 ms, making scale-from-zero practical for bursty workloads.
8. Observability in Containers
Excalibur.Dispatch includes built-in OpenTelemetry instrumentation. In containers, add these container-specific considerations:
Live GC diagnostics:
# Inside a running pod
kubectl exec -it <pod> -- dotnet-counters monitor \
--counters System.Runtime,Microsoft.Extensions.Hosting
Key metrics to monitor:
- gc-heap-size -- managed heap usage against your GCHeapHardLimitPercent cap
- threadpool-queue-length -- saturation indicator for dispatch throughput
- Transport health status via /.well-known/ready
OpenTelemetry export:
Excalibur.Dispatch's ActivitySource and Meter instrumentation (prefixed Excalibur.Dispatch.*) integrates with the standard OpenTelemetry.Extensions.Hosting pipeline. Export traces and metrics directly via OTLP (leaner than a sidecar collector for .NET-only deployments):
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddSource("Excalibur.Dispatch.*")
        .AddOtlpExporter())
    .WithMetrics(metrics => metrics
        .AddMeter("Excalibur.Dispatch.*")
        .AddOtlpExporter());
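In containers, point the OTLP exporters at your collector via the standard OpenTelemetry environment variables rather than hard-coding endpoints. A sketch (the hostname and service name below are examples):

```yaml
env:
  - name: OTEL_EXPORTER_OTLP_ENDPOINT
    value: "http://otel-collector.observability.svc:4317" # example address
  - name: OTEL_SERVICE_NAME
    value: "myapp"
```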
For detailed observability setup, see Observability.
Quick Reference
| Setting | Default | Where |
|---|---|---|
| Health endpoint | /.well-known/ready | AddExcaliburHealthChecks() |
| Drain timeout | 30 seconds | TransportAdapterHostedServiceOptions.DrainTimeoutSeconds |
| Throw on startup failure | true | TransportAdapterHostedServiceOptions.ThrowOnStartupFailure |
| AOT packages | 150/170 | Compatibility Matrix |