Performance Overview

Excalibur.Dispatch is designed for low-latency messaging with explicit performance profiles for local and transport paths.

Key Performance Metrics

Results: benchmarks/runs/BenchmarkDotNet.Artifacts/results/

| Metric | Value | Source |
| --- | --- | --- |
| Dispatch single command (standard) | 62.8 ns / 240 B | MediatRWarmPathComparisonBenchmarks (April 13, 2026) |
| Dispatch ultra-local command | 33.3 ns / 24 B | MediatRWarmPathComparisonBenchmarks |
| Dispatch vs MediatR (ultra-local) | 33.3 ns vs 44.0 ns (1.3x faster) | MediatRWarmPathComparisonBenchmarks |
| Handler activation (precreated) | 24.4 ns / 0 B | DispatchHotPathBreakdownBenchmarks |
| Handler invocation | 6.0 ns / 0 B | DispatchHotPathBreakdownBenchmarks |
| vs Wolverine InvokeAsync | 70.35 ns vs 183.56 ns (2.6x faster) | WolverineInProcessWarmPathComparisonBenchmarks |

Diagnostics Baseline (April 13, 2026)

| Component | Value | Allocated |
| --- | --- | --- |
| Single command dispatch (full) | 58.5 ns | 208 B |
| Query with response | 74.8 ns | 400 B |
| Middleware invoker direct | 44.2 ns | 280 B |
| FinalDispatchHandler action | 58.7 ns | 208 B |
| LocalMessageBus send | 38.9 ns | 64 B |
| Handler activator (precreated) | 24.4 ns | 0 B |
| Handler invocation | 6.0 ns | 0 B |
| Handler registry lookup | 6.1 ns | 0 B |
Breakdown vs Comparison

The diagnostics baseline above comes from DispatchHotPathBreakdownBenchmarks, which isolates each component. The comparison numbers (59.5 ns for a standard command) come from MediatRWarmPathComparisonBenchmarks, which measures the full end-to-end path including context factory creation and return, matching how consumers actually use the framework.

Comparison Snapshot

| Track | Summary |
| --- | --- |
| MediatR WarmPath parity | MediatR ~1.4x faster on standard; Dispatch ultra-local 1.3x faster with 6.3x less memory |
| Wolverine in-process parity | Dispatch ~2.6x faster on command, ~61x on notifications |
| Pipeline parity (3 middleware) | MediatR 2.7x faster; Wolverine 1.2x faster; Dispatch 6.8x faster than MassTransit; Dispatch allocates least |

See Competitor Comparison for full tables and methodology notes.

Quick Wins

1. Use Ultra-Local for local hot paths

```csharp
var result = await dispatcher.DispatchAsync(new CreateOrderAction(...), ct);
```

For explicit control, see Ultra-Local Dispatch.

2. Keep messages deterministic where possible

```csharp
public record CreateOrderCommand(Guid OrderId, string CustomerId) : IDispatchAction;

public class CreateOrderHandler : IActionHandler<CreateOrderCommand> { }
```

3. Keep auto-freeze enabled

```csharp
var host = builder.Build();
await host.RunAsync();
```

4. Prefer direct IMessageContext properties

```csharp
context.ProcessingAttempts++;
```
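The trade-off can be sketched with a toy context type. The member names below are illustrative assumptions; the real IMessageContext shape is defined by the library:

```csharp
using System.Collections.Generic;

public sealed class ToyMessageContext
{
    // Direct property: a plain field read/write on the hot path, no allocation.
    public int ProcessingAttempts { get; set; }

    // Feature bag: flexible, but each access pays a hash lookup and a cast,
    // and storing an int in an object-valued dictionary boxes it.
    public Dictionary<string, object> Features { get; } = new();
}

public static class Demo
{
    public static void Main()
    {
        var context = new ToyMessageContext();

        // Preferred on hot paths: direct property access.
        context.ProcessingAttempts++;

        // Equivalent via the feature bag (hypothetical; shown for contrast only):
        context.Features["ProcessingAttempts"] =
            (int)context.Features.GetValueOrDefault("ProcessingAttempts", 0) + 1;
    }
}
```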

Performance Guides

| Guide | Description |
| --- | --- |
| Ultra-Local Dispatch | Lowest-overhead local command/query path |
| Auto-Freeze | Automatic cache optimization |
| MessageContext Best Practices | Hot-path optimization patterns |
| Competitor Comparison | Multi-track benchmarks vs MediatR/Wolverine/MassTransit |

Hot-Path Optimizations

Nine micro-optimizations targeting the dispatch hot path:

| Optimization | Pattern |
| --- | --- |
| Dual-write elimination in RoutingDecisionAccessor | Single-write via CachedRoutingDecision field with Features dictionary fallback |
| RoutingDecision.Local singleton | Cached static property (like Task.CompletedTask) |
| Lock removal on MessageContext.Success | Volatile fields + AggressiveInlining |
| Single-lookup GetOrCreateFeature | TryGetValue + direct store |
| Lightweight context init (Sprint 660) | Skip GetTransportBinding for outbound dispatches when no transport correlation needed |
| Per-profile middleware bypass (Sprint 660) | Pre-computed _hasAnyNonRoutingMiddleware flag skips FrozenDictionary chain lookup |
| Single transport bus pre-resolution (Sprint 660) | Pre-resolve single non-local bus at construction, bypass ConcurrentDictionary lookup |
| Routing decision cache (Sprint 660) | ConcurrentDictionary<Type, RoutingDecision> for deterministic single-route types |
| Combined transport fast path (Sprint 660) | All four Sprint 660 optimizations compose: Wolverine parity improved from 0.59x to 2.3x on SingleCommand |
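The RoutingDecision.Local singleton row follows the same pattern as Task.CompletedTask: one pre-built instance returned from a static property so the local fast path never allocates a fresh decision object. A minimal sketch, with member names that are assumptions rather than the library's actual shape:

```csharp
public sealed class RoutingDecision
{
    // One shared instance built once; hypothetical shape, the real type
    // lives in Excalibur.Dispatch.
    private static readonly RoutingDecision LocalInstance = new(isLocal: true);

    // Cached static property: every local dispatch returns the same object
    // instead of allocating, mirroring Task.CompletedTask.
    public static RoutingDecision Local => LocalInstance;

    public bool IsLocal { get; }

    private RoutingDecision(bool isLocal) => IsLocal = isLocal;
}
```

Because the instance is immutable, handing the same reference to every caller is safe, and reference-equality checks against RoutingDecision.Local stay cheap.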

Memory Allocation Strategy

Dispatch reduces allocations through:

  1. Object pooling for MessageContext
  2. ArrayPool<T> on batch-style paths
  3. Lazy initialization for optional context state
  4. ValueTask-based local fast paths
  5. Hot-path single-write patterns eliminating redundant dictionary allocations
  6. Package extraction reducing dependency graph complexity (64.88 MB at 100K ops)
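As an illustration of item 2, the standard .NET rent/return pattern for ArrayPool<T> looks like this. This is generic BCL usage, not Dispatch's internal code:

```csharp
using System;
using System.Buffers;

// Rent a reusable buffer instead of allocating a new array per batch.
// The returned array may be larger than requested.
byte[] buffer = ArrayPool<byte>.Shared.Rent(4096);
try
{
    // ... fill and process the buffer here ...
}
finally
{
    // Return the buffer so subsequent batches reuse the same memory
    // rather than pressuring the GC with fresh allocations.
    ArrayPool<byte>.Shared.Return(buffer);
}
```

Returning in a finally block matters: a buffer that is never returned is simply collected, silently losing the pooling benefit.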

Running Benchmarks

```powershell
# Full matrix refresh
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild

# In-process parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes MediatRComparisonBenchmarks,WolverineInProcessComparisonBenchmarks,MassTransitMediatorComparisonBenchmarks

# Queued/bus end-to-end parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes TransportQueueParityComparisonBenchmarks
```

Results default to benchmarks/runs/BenchmarkDotNet.Artifacts/results/.

See Also