Performance Overview
Excalibur.Dispatch is designed for low-latency messaging with explicit performance profiles for local and transport paths.
Before You Start
- .NET 8.0+ (benchmarks validated on .NET 10.0.103)
- Familiarity with pipeline profiles and middleware
Key Performance Metrics
Results: benchmarks/runs/BenchmarkDotNet.Artifacts/results/
| Metric | Value | Source |
|---|---|---|
| Dispatch single command (standard) | 62.8 ns / 240 B | MediatRWarmPathComparisonBenchmarks (April 13, 2026) |
| Dispatch ultra-local command | 33.3 ns / 24 B | MediatRWarmPathComparisonBenchmarks |
| Dispatch vs MediatR (ultra-local) | 33.3 ns vs 44.0 ns (1.3x faster) | MediatRWarmPathComparisonBenchmarks |
| Handler activation (precreated) | 24.4 ns / 0 B | DispatchHotPathBreakdownBenchmarks |
| Handler invocation | 6.0 ns / 0 B | DispatchHotPathBreakdownBenchmarks |
| Dispatch vs Wolverine InvokeAsync | 70.35 ns vs 183.56 ns (2.6x faster) | WolverineInProcessWarmPathComparisonBenchmarks |
Diagnostics Baseline (April 13, 2026)
| Component | Time | Allocated |
|---|---|---|
| Single command dispatch (full) | 58.5 ns | 208 B |
| Query with response | 74.8 ns | 400 B |
| Middleware invoker direct | 44.2 ns | 280 B |
| FinalDispatchHandler action | 58.7 ns | 208 B |
| LocalMessageBus send | 38.9 ns | 64 B |
| Handler activator (precreated) | 24.4 ns | 0 B |
| Handler invocation | 6.0 ns | 0 B |
| Handler registry lookup | 6.1 ns | 0 B |
The diagnostics baseline above comes from DispatchHotPathBreakdownBenchmarks, which isolates each component. The comparison figures in Key Performance Metrics, such as the standard command result, come from MediatRWarmPathComparisonBenchmarks. That benchmark measures the full end-to-end path, including context factory creation and return, which matches how consumers use the framework.
Comparison Snapshot
| Track | Summary |
|---|---|
| MediatR WarmPath parity | MediatR ~1.4x faster than standard Dispatch; Dispatch ultra-local 1.3x faster than MediatR with 6.3x less memory |
| Wolverine in-process parity | Dispatch ~2.6x faster on commands and ~61x faster on notifications |
| Pipeline parity (3 middleware) | MediatR 2.7x faster and Wolverine 1.2x faster than Dispatch; Dispatch 6.8x faster than MassTransit and allocates the least |
See Competitor Comparison for full tables and methodology notes.
Quick Wins
1. Use Ultra-Local for local hot paths
```csharp
// Local hot-path dispatch through the standard DispatchAsync call
var result = await dispatcher.DispatchAsync(new CreateOrderAction(...), ct);
```
For explicit control, see Ultra-Local Dispatch.
2. Keep messages deterministic where possible
```csharp
public record CreateOrderCommand(Guid OrderId, string CustomerId) : IDispatchAction;

public class CreateOrderHandler : IActionHandler<CreateOrderCommand>
{
    // Handler members omitted
}
```
3. Keep auto-freeze enabled
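```csharp
// No explicit freeze call: leave auto-freeze at its default and build/run the host as usual
var host = builder.Build();
await host.RunAsync();
```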
4. Prefer direct IMessageContext properties
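```csharp
// Update the strongly typed property directly instead of going through a keyed lookup
context.ProcessingAttempts++;
```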
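Tip 4 rests on a general .NET cost difference: a strongly typed property is a plain field access, while a string-keyed property bag pays for hashing and, for value types, boxing. The sketch below is a minimal illustration of that difference. It assumes a hypothetical `SketchContext` type with a hypothetical `IncrementViaItems` helper; neither is the real `IMessageContext`, and the sketch says nothing about how Excalibur.Dispatch stores context state internally.

```csharp
using System;
using System.Collections.Generic;

var context = new SketchContext();

context.ProcessingAttempts++;              // direct property: plain field read/write
SketchContext.IncrementViaItems(context);  // property bag: lookups plus boxing

Console.WriteLine($"{context.ProcessingAttempts} {context.Items[SketchContext.AttemptsKey]}"); // 1 1

// Hypothetical stand-in for a message context; NOT the real IMessageContext API.
public sealed class SketchContext
{
    public const string AttemptsKey = "ProcessingAttempts";

    // Auto-property backed by an int field: no hashing, no allocation.
    public int ProcessingAttempts { get; set; }

    // String-keyed bag: flexible, but values are stored as object (value types get boxed).
    public Dictionary<string, object> Items { get; } = new();

    public static void IncrementViaItems(SketchContext ctx)
    {
        // Read: hash lookup, then unbox the stored int (0 if absent).
        int current = ctx.Items.TryGetValue(AttemptsKey, out var boxed) ? (int)boxed : 0;

        // Write: a second hash lookup, plus a new heap allocation to box current + 1.
        ctx.Items[AttemptsKey] = current + 1;
    }
}
```

Both paths end with the same count. Only the property path avoids per-access lookups and allocations.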
Performance Guides
| Guide | Description |
|---|---|
| Ultra-Local Dispatch | Lowest-overhead local command/query path |
| Auto-Freeze | Automatic cache optimization |
| MessageContext Best Practices | Hot-path optimization patterns |
| Competitor Comparison | Multi-track benchmarks vs MediatR/Wolverine/MassTransit |
Hot-Path Optimizations
Nine micro-optimizations targeting the dispatch hot path:
| Optimization | Pattern |
|---|---|
| Dual-write elimination in RoutingDecisionAccessor | Single-write via CachedRoutingDecision field with Features dictionary fallback |
| RoutingDecision.Local singleton | Cached static property (like Task.CompletedTask) |
| Lock removal on MessageContext.Success | Volatile fields + AggressiveInlining |
| Single-lookup GetOrCreateFeature | TryGetValue + direct store |
| Lightweight context init (Sprint 660) | Skip GetTransportBinding for outbound dispatches when no transport correlation needed |
| Per-profile middleware bypass (Sprint 660) | Pre-computed _hasAnyNonRoutingMiddleware flag skips FrozenDictionary chain lookup |
| Single transport bus pre-resolution (Sprint 660) | Pre-resolve single non-local bus at construction, bypass ConcurrentDictionary lookup |
| Routing decision cache (Sprint 660) | ConcurrentDictionary<Type, RoutingDecision> for deterministic single-route types |
| Combined transport fast path (Sprint 660) | The four Sprint 660 optimizations above compose: Wolverine parity improved from 0.59x to 2.3x on SingleCommand |
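Three of the patterns in the table are standard .NET techniques, and they are easier to see in isolation. The sketch below uses only hypothetical types:

- `RoutingDecision` here is a stand-in, not the library's class.
- `SketchRouter`, `FeatureSketch`, `PlaceOrder`, `Ping` and `RetryFeature` are invented for illustration.

It shows a cached static `Local` decision and a per-type `ConcurrentDictionary<Type, RoutingDecision>` cache. That cache is only safe when a type always resolves to the same single route. The sketch also includes a `GetOrCreateFeature` helper built on `TryGetValue` plus a direct store.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

var router = new SketchRouter(new Dictionary<Type, string> { [typeof(PlaceOrder)] = "orders" });

RoutingDecision first = router.Resolve(typeof(PlaceOrder));
RoutingDecision second = router.Resolve(typeof(PlaceOrder));

Console.WriteLine(object.ReferenceEquals(first, second)); // True: second call hits the cache
Console.WriteLine(first.Destination);                     // orders
Console.WriteLine(object.ReferenceEquals(router.Resolve(typeof(Ping)), RoutingDecision.Local)); // True

var features = new Dictionary<Type, object>();
Console.WriteLine(object.ReferenceEquals(
    FeatureSketch.GetOrCreateFeature<RetryFeature>(features),
    FeatureSketch.GetOrCreateFeature<RetryFeature>(features))); // True

// Everything below is a hypothetical illustration, not Excalibur.Dispatch's own types.
public sealed class PlaceOrder { }
public sealed class Ping { }
public sealed class RetryFeature { public int Attempts { get; set; } }

public sealed class RoutingDecision
{
    // Cached static instance (like Task.CompletedTask): local dispatches share one object.
    public static RoutingDecision Local { get; } = new("local");

    public RoutingDecision(string destination) => Destination = destination;

    public string Destination { get; }
}

public sealed class SketchRouter
{
    private readonly IReadOnlyDictionary<Type, string> _routes;

    // Valid only because each message type here always resolves to the same single route.
    private readonly ConcurrentDictionary<Type, RoutingDecision> _cache = new();

    public SketchRouter(IReadOnlyDictionary<Type, string> routes) => _routes = routes;

    public RoutingDecision Resolve(Type messageType) =>
        _cache.GetOrAdd(
            messageType,
            static (type, routes) => routes.TryGetValue(type, out var destination)
                ? new RoutingDecision(destination)
                : RoutingDecision.Local,
            _routes);
}

public static class FeatureSketch
{
    // Hit path: a single TryGetValue (rather than ContainsKey followed by an indexer read).
    // Miss path: create the feature and store it directly with the indexer.
    public static T GetOrCreateFeature<T>(Dictionary<Type, object> features) where T : class, new()
    {
        if (features.TryGetValue(typeof(T), out var existing))
        {
            return (T)existing;
        }

        var created = new T();
        features[typeof(T)] = created;
        return created;
    }
}
```

`ConcurrentDictionary.GetOrAdd` may invoke the factory more than once under contention. It always returns the value that was actually stored, so callers still observe one cached decision per type.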
Memory Allocation Strategy
Dispatch reduces allocations through:
- Object pooling for MessageContext
- ArrayPool<T> on batch-style paths
- Lazy initialization for optional context state
- ValueTask-based local fast paths
- Hot-path single-write patterns eliminating redundant dictionary allocations
- Package extraction reducing dependency graph complexity (64.88 MB at 100K ops)
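The `ArrayPool<T>` and `ValueTask` items above follow common .NET patterns. The sketch below assumes a hypothetical `BatchSketch` helper that is not part of Excalibur.Dispatch and shows both patterns:

- A batch-style method rents its scratch buffer from `ArrayPool<int>.Shared` and returns it in a `finally` block.
- A handler method returns an already-completed `ValueTask<int>` on its local path, so only the non-local path goes through an async method that allocates a `Task<int>`.

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

Console.WriteLine(BatchSketch.SumDoubled(new[] { 1, 2, 3, 4 })); // 20

ValueTask<int> fast = BatchSketch.HandleAsync(21, isLocal: true);
Console.WriteLine(fast.IsCompletedSuccessfully); // True: completed synchronously
Console.WriteLine(await fast);                   // 42

Console.WriteLine(await BatchSketch.HandleAsync(21, isLocal: false)); // 42, via the async path

// Hypothetical helpers; names and logic are illustrative, not Excalibur.Dispatch internals.
public static class BatchSketch
{
    // Batch-style path: rent a scratch buffer from the shared pool instead of allocating per batch.
    public static int SumDoubled(ReadOnlySpan<int> items)
    {
        // Rent can return a longer array than requested, so only the first items.Length slots are used.
        int[] scratch = ArrayPool<int>.Shared.Rent(items.Length);
        try
        {
            for (int i = 0; i < items.Length; i++)
            {
                scratch[i] = items[i] * 2; // per-item intermediate result
            }

            int sum = 0;
            for (int i = 0; i < items.Length; i++)
            {
                sum += scratch[i];
            }

            return sum;
        }
        finally
        {
            // Returning the array lets the next batch reuse it.
            ArrayPool<int>.Shared.Return(scratch);
        }
    }

    // Local fast path: a synchronous result is wrapped directly in ValueTask<int>,
    // so no Task<int> is allocated. Only the non-local path goes through an async method.
    public static ValueTask<int> HandleAsync(int value, bool isLocal) =>
        isLocal ? new ValueTask<int>(value * 2) : new ValueTask<int>(SlowPathAsync(value));

    private static async Task<int> SlowPathAsync(int value)
    {
        await Task.Yield(); // stands in for real asynchronous work
        return value * 2;
    }
}
```

`ArrayPool<T>.Return` does not clear the array by default. Code that pools buffers holding references or sensitive data would pass `clearArray: true`.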
Running Benchmarks
```shell
# Full matrix refresh
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild

# In-process parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes MediatRComparisonBenchmarks,WolverineInProcessComparisonBenchmarks,MassTransitMediatorComparisonBenchmarks

# Queued/bus end-to-end parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes TransportQueueParityComparisonBenchmarks
```
Results default to benchmarks/runs/BenchmarkDotNet.Artifacts/results/.