Performance Overview

Excalibur.Dispatch is designed for low-latency messaging with explicit performance profiles for local and transport paths.

Before You Start

.NET 10.0 (benchmarks validated on .NET 10.0.6, SDK 10.0.202, BenchmarkDotNet 0.15.8)
Familiarity with pipeline profiles and middleware

Key Performance Metrics

Source baseline: benchmarks/baselines/net10.0/dispatch-comparative-20260420/results/ (April 20, 2026 epoch)

Metric	Value	Source
Dispatch single command (standard)	70.87 ns / 240 B	MediatRWarmPathComparisonBenchmarks (April 20, 2026)
Dispatch ultra-local command	34.56 ns / 24 B	MediatRWarmPathComparisonBenchmarks
Dispatch vs MediatR (ultra-local)	34.56 ns vs 44.20 ns (1.28x faster, 6.3x less memory)	MediatRWarmPathComparisonBenchmarks
Handler activation (precreated)	24.4 ns / 0 B	DispatchHotPathBreakdownBenchmarks (not in 20260420 epoch; see performance-report)
Handler invocation	6.0 ns / 0 B	DispatchHotPathBreakdownBenchmarks (not in 20260420 epoch)
Dispatch vs Wolverine InvokeAsync	74.83 ns vs 197.75 ns (2.64x faster)	WolverineInProcessWarmPathComparisonBenchmarks

Epoch boundary

These numbers are from the 20260420 epoch, which ran BenchmarkDotNet 0.15.8 with a literal InvocationCount=1. Prior baselines (20260302 and dispatch-all 20260413) used BDN 0.15.4, which auto-tuned InvocationCount and produced numerically different (but qualitatively equivalent) ns-scale numbers. Do not cross-diff individual Mean values across the epoch boundary; ratios within each report remain apples-to-apples. See benchmarks/RUNBOOK.md for the methodology shift.

Diagnostics Baseline (April 13, 2026)

Component	Value	Allocated
Single command dispatch (full)	58.5 ns	208 B
Query with response	74.8 ns	400 B
Middleware invoker direct	44.2 ns	280 B
FinalDispatchHandler action	58.7 ns	208 B
LocalMessageBus send	38.9 ns	64 B
Handler activator (precreated)	24.4 ns	0 B
Handler invocation	6.0 ns	0 B
Handler registry lookup	6.1 ns	0 B

Breakdown vs Comparison

The diagnostics baseline above comes from DispatchHotPathBreakdownBenchmarks, which isolates each component. It was last refreshed on April 13, 2026 and is not part of the April 20, 2026 epoch. The comparison numbers (70.87 ns for a standard command) come from MediatRWarmPathComparisonBenchmarks in the 20260420 epoch and measure the full end-to-end path, including context factory creation and return, which matches how consumers use the framework.

Comparison Snapshot (April 20, 2026 epoch)

Track	Summary
MediatR WarmPath parity	MediatR ~1.6x faster on standard; Dispatch ultra-local 1.28x faster with 6.3x less memory
Wolverine in-process parity	Dispatch ~2.64x faster on command; ~54x faster on notifications (Dispatch 120 ns vs Wolverine 6,455 ns to 2 handlers)
MassTransit in-memory parity	Dispatch leads on all in-process tiers; see MassTransitComparisonBenchmarks
Pipeline parity (3 middleware)	Dispatch leads on allocation; for latency, see the ratio column in PipelineComparisonBenchmarks
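
The "x faster" figures on this page are ratios of mean latencies. As a sanity check, the arithmetic can be sketched as below. `SpeedupSketch` and its members are hypothetical names introduced only for this illustration; they are not part of Excalibur.Dispatch or the benchmark suite, and the inputs are the means quoted in the tables above.

```csharp
using System;
using System.Globalization;

// Hypothetical helper for illustration only; not part of Excalibur.Dispatch.
public static class SpeedupSketch
{
    // Ratio of the slower mean to the faster mean. Both values must use the same unit.
    public static double Ratio(double slowerNs, double fasterNs) => slowerNs / fasterNs;

    // Formats the ratio with two decimals, using the invariant culture so the
    // decimal separator does not depend on the machine's locale.
    public static string Describe(string label, double slowerNs, double fasterNs)
        => label + ": " + Ratio(slowerNs, fasterNs).ToString("F2", CultureInfo.InvariantCulture) + "x";
}
```

With the quoted means, `Ratio(44.20, 34.56)` is about 1.28 (ultra-local vs MediatR), `Ratio(197.75, 74.83)` is about 2.64 (Wolverine InvokeAsync), `Ratio(70.87, 44.20)` is about 1.60 (MediatR ahead on the standard path), and `Ratio(6455, 120)` is about 53.8 (notifications to 2 handlers). Each pair comes from a single report, so the ratios stay within one epoch.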

See Competitor Comparison for full tables and methodology notes. One finding is under investigation: WarmPath allocation for the "Dispatch: 100 concurrent commands" benchmark versus MediatR. A methodology-matched rerun is queued for a future sprint.

Quick Wins

1. Use Ultra-Local for local hot paths

var result = await dispatcher.DispatchAsync(new CreateOrderAction(...), ct);

For explicit control, see Ultra-Local Dispatch.

2. Keep messages deterministic where possible

public record CreateOrderCommand(Guid OrderId, string CustomerId) : IDispatchAction;
public class CreateOrderHandler : IActionHandler<CreateOrderCommand> { }

3. Keep auto-freeze enabled

var host = builder.Build();
await host.RunAsync();

4. Prefer direct `IMessageContext` properties

context.ProcessingAttempts++;
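
To illustrate why a direct property can be cheaper than an indirect lookup, here is a minimal sketch. It assumes the alternative is a string-keyed item bag, which this page does not state. `SketchMessageContext`, `Items`, and `SketchContextUsage` are hypothetical stand-ins, not the library's `IMessageContext` API.

```csharp
using System.Collections.Generic;

// Hypothetical context type for illustration; not the library's IMessageContext.
public sealed class SketchMessageContext
{
    // Direct property: a typed get and set, with no hashing and no boxing.
    public int ProcessingAttempts { get; set; }

    // Assumed alternative: a string-keyed bag that stores values as object.
    public Dictionary<string, object> Items { get; } = new();
}

public static class SketchContextUsage
{
    public static void IncrementDirect(SketchMessageContext context)
        => context.ProcessingAttempts++;

    // Each update hashes the key twice (read, then write) and boxes the new int.
    public static void IncrementViaItems(SketchMessageContext context)
    {
        const string key = "ProcessingAttempts";
        int current = context.Items.TryGetValue(key, out var boxed) && boxed is int value ? value : 0;
        context.Items[key] = current + 1;
    }
}
```

Both methods count attempts, but only the dictionary version allocates on every increment, which is the kind of cost a hot path tries to avoid.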

Performance Guides

Guide	Description
Ultra-Local Dispatch	Lowest-overhead local command/query path
Auto-Freeze	Automatic cache optimization
MessageContext Best Practices	Hot-path optimization patterns
Competitor Comparison	Multi-track benchmarks vs MediatR/Wolverine/MassTransit

Hot-Path Optimizations

Nine micro-optimizations targeting the dispatch hot path:

Optimization	Pattern
Dual-write elimination in `RoutingDecisionAccessor`	Single-write via `CachedRoutingDecision` field with Features dictionary fallback
`RoutingDecision.Local` singleton	Cached static property (like `Task.CompletedTask`)
Lock removal on `MessageContext.Success`	Volatile fields + `AggressiveInlining`
Single-lookup `GetOrCreateFeature`	`TryGetValue` + direct store
Lightweight context init	Skip `GetTransportBinding` for outbound dispatches when no transport correlation needed
Per-profile middleware bypass	Pre-computed `_hasAnyNonRoutingMiddleware` flag skips FrozenDictionary chain lookup
Single transport bus pre-resolution	Pre-resolve single non-local bus at construction, bypass ConcurrentDictionary lookup
Routing decision cache	`ConcurrentDictionary<Type, RoutingDecision>` for deterministic single-route types
Combined transport fast path	All 4 optimizations compose: Wolverine parity improved from 0.59x to 2.3x on SingleCommand
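
Several of the patterns in the table are general .NET techniques. The sketch below shows four of them in isolation: a cached singleton, a `TryGetValue` plus direct store, a per-type `ConcurrentDictionary` cache, and a volatile field with `AggressiveInlining`. Every type name prefixed with `Sketch` is hypothetical and written for this illustration; the library's actual `RoutingDecision`, `MessageContext`, and `GetOrCreateFeature` implementations may differ.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for a routing decision type.
public sealed class SketchRoutingDecision
{
    // Cached singleton: every caller shares one instance, as with Task.CompletedTask.
    public static SketchRoutingDecision Local { get; } = new SketchRoutingDecision("local");

    public SketchRoutingDecision(string target) => Target = target;

    public string Target { get; }
}

// Hypothetical feature bag (not thread-safe; assumes one context per message).
public sealed class SketchFeatureBag
{
    private readonly Dictionary<Type, object> _features = new();

    // One lookup on the hit path; on a miss, create the value and store it directly.
    public T GetOrCreate<T>() where T : class, new()
    {
        if (_features.TryGetValue(typeof(T), out var existing))
        {
            return (T)existing;
        }

        var created = new T();
        _features[typeof(T)] = created;
        return created;
    }
}

// Hypothetical per-type cache for message types that always resolve to the same route.
public static class SketchRoutingCache
{
    private static readonly ConcurrentDictionary<Type, SketchRoutingDecision> Cache = new();

    // The resolver runs only on a cache miss (it may run more than once under contention).
    public static SketchRoutingDecision Resolve(Type messageType, Func<Type, SketchRoutingDecision> resolver)
        => Cache.GetOrAdd(messageType, resolver);
}

// Hypothetical context showing a lock-free flag.
public sealed class SketchContext
{
    private volatile bool _success;

    public bool Success
    {
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        get => _success;
        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        set => _success = value;
    }
}
```

A volatile field is enough here because a single `bool` read or write is atomic; it would not be a safe replacement for a lock that guards a multi-step update.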

Memory Allocation Strategy

Dispatch reduces allocations through:

Object pooling for MessageContext
ArrayPool<T> on batch-style paths
Lazy initialization for optional context state
ValueTask-based local fast paths
Hot-path single-write patterns eliminating redundant dictionary allocations
Package extraction reducing dependency graph complexity (64.88 MB at 100K ops)
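
Two of the techniques listed above, `ArrayPool<T>` and `ValueTask`-based fast paths, can be shown with standard .NET APIs alone. The following is a minimal sketch under that assumption; `SketchAllocation` and its methods are hypothetical and do not reflect how Dispatch applies these techniques internally.

```csharp
using System;
using System.Buffers;
using System.Threading.Tasks;

// Hypothetical helpers for illustration only.
public static class SketchAllocation
{
    // Rents a scratch buffer from the shared pool instead of allocating a new array per batch.
    public static int SumBatch(ReadOnlySpan<int> items)
    {
        int[] buffer = ArrayPool<int>.Shared.Rent(items.Length);
        try
        {
            // The rented array may be longer than requested, so only the first
            // items.Length elements are used.
            items.CopyTo(buffer);
            int sum = 0;
            for (int i = 0; i < items.Length; i++)
            {
                sum += buffer[i];
            }

            return sum;
        }
        finally
        {
            // Always return the buffer, even if processing throws.
            ArrayPool<int>.Shared.Return(buffer);
        }
    }

    // Fast path: when the result is already known, wrap it in a ValueTask without
    // allocating a Task. Slow path: fall back to the Task-returning delegate.
    public static ValueTask<int> GetValueAsync(int? cached, Func<Task<int>> slowPath)
        => cached.HasValue
            ? new ValueTask<int>(cached.Value)
            : new ValueTask<int>(slowPath());
}
```

A `ValueTask<T>` should be awaited or consumed only once, which is the main trade-off against returning `Task<T>`.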

Running Benchmarks

# Full matrix refresh
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild

# In-process parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes MediatRComparisonBenchmarks,WolverineInProcessComparisonBenchmarks,MassTransitMediatorComparisonBenchmarks

# Queued/bus end-to-end parity track
pwsh ./eng/run-benchmark-matrix.ps1 -NoRestore -NoBuild -Classes TransportQueueParityComparisonBenchmarks

Results default to benchmarks/runs/BenchmarkDotNet.Artifacts/results/.
