Competitor Comparison

This page documents comparative benchmarks for Excalibur.Dispatch using three explicit tracks:

  1. Track A: in-process parity (raw handler dispatch, no middleware)
  2. Track B: pipeline parity (3 passthrough middleware/behaviors per framework)
  3. Track C: queued/bus semantics (publish/send + consumer flow)

Test Environment

BenchmarkDotNet v0.15.4, Windows 11 (10.0.26200.7922)
Intel Core i9-14900K 3.20GHz, 1 CPU, 32 logical and 24 physical cores
.NET SDK 10.0.103
Runtime: .NET 10.0.3 (10.0.326.7603), X64 RyuJIT x86-64-v3

Results: benchmarks/runs/BenchmarkDotNet.Artifacts/results/

Scope

These are microbenchmarks for framework overhead and path cost. They are not end-to-end production latency claims.

Methodology

All comparisons use lean AddDispatch() registration with no middleware enabled, matching each competitor's minimal configuration. A fresh IMessageContext is created and returned per iteration. Handler and pipeline caches are warmed up and frozen before measurement.
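As a concrete illustration, the lean setup might look like the sketch below. Only `AddDispatch()` and `IMessageContext` are named on this page; the `ServiceCollection` bootstrapping is standard Microsoft.Extensions.DependencyInjection, and everything beyond `AddDispatch()` is an illustrative assumption, not the library's confirmed API.

```csharp
using Microsoft.Extensions.DependencyInjection;

// Lean benchmark setup: AddDispatch() with no middleware enabled, matching
// the "minimal configuration" used for each competitor. AddDispatch() is the
// registration entry point named on this page; the rest is illustrative.
var services = new ServiceCollection();
services.AddDispatch();

// Build the provider once; handler and pipeline caches are then warmed and
// frozen before any measured iterations run.
await using var provider = services.BuildServiceProvider();
```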

Dual Benchmark Methodology

This project uses two benchmark configurations for different purposes:

  • WarmPath (WarmPathBenchmarkConfig): BDN defaults with auto-calibrated InvocationCount and UnrollFactor. Measures steady-state throughput with warm JIT and caches. Used for published competitor comparisons (Tracks A, B above).
  • ColdPath (ComparativeBenchmarkConfig): InvocationCount=1, UnrollFactor=1, IterationCount=3. Measures single-invocation correctness including framework setup overhead. Used for CI regression gates (Track C, performance gate checks).

WarmPath numbers reflect what users experience in production; ColdPath numbers catch regressions in framework initialization paths.
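Both configurations map onto stock BenchmarkDotNet APIs. The class below is a sketch of the ColdPath settings quoted above (`InvocationCount=1`, `UnrollFactor=1`, `IterationCount=3`, in-process toolchain); the real `ComparativeBenchmarkConfig` may differ in details such as exporters and diagnosers.

```csharp
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Toolchains.InProcess.Emit;

// ColdPath-style job: one invocation per iteration, no loop unrolling,
// three iterations. Measures single-invocation cost including framework
// setup overhead, as described above.
public class ColdPathConfigSketch : ManualConfig
{
    public ColdPathConfigSketch()
    {
        AddJob(Job.Default
            .WithToolchain(InProcessEmitToolchain.Instance)
            .WithInvocationCount(1)
            .WithUnrollFactor(1)
            .WithIterationCount(3));
    }
}
```

WarmPath, by contrast, simply uses `Job.Default` and lets BenchmarkDotNet auto-calibrate invocation and unroll counts.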

Executive Summary

| Track | Summary |
| --- | --- |
| In-process parity (MediatR) | MediatR ~1.4x faster on standard; Dispatch ultra-local 1.3x faster with 6.3x less memory; Dispatch allocates 2.6x less on notifications |
| In-process parity (Wolverine InvokeAsync) | Dispatch ~2.6x faster on command, ~61x on notifications |
| Pipeline parity (3 middleware each) | MediatR 2.7x faster; Wolverine 1.2x faster; Dispatch 6.8x faster than MassTransit; Dispatch allocates least |

April 2026 Performance Update

Ultra-local dispatch remains the standout path: 33 ns / 24 B -- 1.3x faster than MediatR with 6.3x less memory. LightMode opt-in disables correlation ID generation for workloads that don't need it. Hot-path breakdown: handler activation 24.4 ns / 0 B, handler invocation 6.0 ns / 0 B — all zero-allocation internals. See benchmarks/experiments/ for optimization experiment details.

Track A: In-Process Parity

Dispatch vs MediatR

| Scenario | Dispatch | MediatR | Relative Result |
| --- | --- | --- | --- |
| Single command handler | 62.76 ns / 240 B | 43.96 ns / 152 B | MediatR ~1.4x faster |
| Single command direct-local | 62.39 ns / 240 B | 43.96 ns / 152 B | MediatR ~1.4x faster |
| Single command ultra-local | 33.30 ns / 24 B | 43.96 ns / 152 B | Dispatch ~1.3x faster; Dispatch allocates ~6.3x less |
| Singleton-promoted command | 33.01 ns / 24 B | 43.96 ns / 152 B | Dispatch ~1.3x faster; Dispatch allocates ~6.3x less |
| Notification to 3 handlers | 115.33 ns / 240 B | 95.40 ns / 616 B | MediatR ~1.2x faster; Dispatch allocates ~2.6x less |
| Query with return value | 72.94 ns / 336 B | 52.18 ns / 296 B | MediatR ~1.4x faster |
| Query with return (typed API) | 80.49 ns / 432 B | 52.18 ns / 296 B | MediatR ~1.5x faster |
| Query ultra-local | 56.09 ns / 192 B | 52.18 ns / 296 B | Near parity; Dispatch allocates ~1.5x less |
| Query singleton-promoted | 56.05 ns / 192 B | 52.18 ns / 296 B | Near parity; Dispatch allocates ~1.5x less |
| 10 concurrent commands | 772 ns / 2,080 B | 531 ns / 1,856 B | MediatR ~1.5x faster |
| 100 concurrent commands | 6,696 ns / 19,360 B | 4,994 ns / 17,064 B | MediatR ~1.3x faster |

Dispatch vs Wolverine (InvokeAsync parity)

| Scenario | Dispatch | Wolverine (InvokeAsync) | Relative Result |
| --- | --- | --- | --- |
| Single command (local) | 70.35 ns / 264 B | 183.56 ns / 672 B | Dispatch 2.6x faster |
| Single command (ultra-local) | 39.43 ns / 48 B | 183.56 ns / 672 B | Dispatch 4.7x faster |
| Notification to 2 handlers | 116.38 ns / 288 B | 7,128 ns / 5,640 B | Dispatch 61.3x faster |
| Query with return | 74.15 ns / 456 B | 258.00 ns / 936 B | Dispatch 3.5x faster |
| 10 concurrent commands | 828.62 ns / 2,320 B | 1,994 ns / 6,928 B | Dispatch 2.4x faster |
| 100 concurrent commands | 6,474 ns / 21,760 B | 17,391 ns / 68,128 B | Dispatch 2.7x faster |

Track B: Pipeline Parity (3 Middleware Each)

Each framework configured with 3 passthrough middleware/behaviors that mirror each other:

  • Dispatch: 3 IDispatchMiddleware (logging, validation, timing)
  • MediatR: 3 IPipelineBehavior<T, Unit> (logging, validation, timing)
  • Wolverine: 3 convention-based middleware with BeforeAsync/AfterAsync
  • MassTransit: 3 IFilter<ConsumeContext<T>> (logging, validation, timing)

| Scenario | Dispatch | MediatR | Wolverine | MassTransit | Relative Result |
| --- | --- | --- | --- | --- | --- |
| 3 middleware (single) | 280.3 ns / 392 B | 105.2 ns / 680 B | 228.3 ns / 768 B | 1,892.8 ns / 4,568 B | MediatR 2.7x faster; Wolverine 1.2x faster; Dispatch 6.8x faster than MT; Dispatch allocates 1.7x less than MediatR |
| 10 concurrent + 3 middleware | 2,868 ns / 3,632 B | 1,108 ns / 7,168 B | 2,223 ns / 7,888 B | 19,290 ns / 45,888 B | MediatR 2.6x faster; Wolverine 1.3x faster; Dispatch 6.7x faster than MT; Dispatch allocates 2x less than MediatR |
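For reference, a passthrough behavior of the kind counted in this track looks like the sketch below, shown for MediatR since `IPipelineBehavior` is a public, stable API (signature per MediatR 12.x). The logging/validation/timing bodies are omitted, so this sketch measures only forwarding cost; the class name is illustrative.

```csharp
using MediatR;

// Passthrough pipeline behavior: registers in the pipeline but does no work,
// so it isolates the per-behavior dispatch overhead compared in Track B.
public sealed class PassthroughBehavior<TRequest, TResponse>
    : IPipelineBehavior<TRequest, TResponse>
    where TRequest : notnull
{
    public Task<TResponse> Handle(
        TRequest request,
        RequestHandlerDelegate<TResponse> next,
        CancellationToken cancellationToken)
        => next(); // forward straight to the next behavior/handler
}
```

The Dispatch, Wolverine, and MassTransit counterparts mirror this shape through `IDispatchMiddleware`, convention-based `BeforeAsync`/`AfterAsync` methods, and `IFilter<ConsumeContext<T>>` respectively, as listed above.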

Track C: Queued/Bus End-to-End Parity

Track C methodology

Track C benchmarks use InvocationCount=1, UnrollFactor=1, IterationCount=3 with InProcessEmitToolchain. Error margins are higher with fewer iterations; treat relative ratios as directional rather than precise. Run *TransportQueueParityWarmPathComparisonBenchmarks* to regenerate.

Interpretation Guardrail

Use Track A for closest in-process handler overhead parity. Use Track B when comparing middleware/pipeline cost across frameworks. Use Track C when comparing queued/bus completion semantics.

Allocation Profiles

Excalibur.Dispatch offers multiple dispatch paths with different allocation characteristics.

| Profile | Allocation | Latency | When to Use |
| --- | --- | --- | --- |
| Standard dispatch | 240 B | ~63 ns | Default path for all message types (April 2026 WarmPath) |
| Ultra-local dispatch | 24 B | ~33 ns | Lowest-overhead local path, near-zero allocation |
| Singleton-promoted | 24 B | ~33 ns | Handlers registered as singletons via promotion |
| Query with response | 336 B | ~73 ns | Typed query responses |
| Query ultra-local | 192 B | ~56 ns | Ultra-local query path |
| MessageContext pool rent+return | 0 B | ~9 ns | Pool infrastructure cost only |

Allocation Guidance
  • "Near-zero allocation": Ultra-local and singleton-promoted paths (24 B per dispatch)
  • "Low-allocation": Standard path (240 B -- context + routing metadata + ambient context flow)
  • "Zero-allocation internals": Handler activation (24.4 ns / 0 B), invocation (6.0 ns / 0 B)

Routing-First Local + Hybrid Parity

Note: the routing-first numbers below are from the March 2026 baseline. These paths were not affected by the April 2026 dispatch hot-path optimizations, since routing occurs before the dispatch fast path.

| Scenario | Mean | Allocated | Relative to local command |
| --- | --- | --- | --- |
| Pre-routed local command | 75.42 ns | 232 B | baseline |
| Pre-routed local query | 86.58 ns | 424 B | +14.8% |
| Pre-routed remote event (AWS SQS) | 134.53 ns | 232 B | +78.4% |
| Pre-routed remote event (Azure Service Bus) | 138.17 ns | 232 B | +83.2% |
| Pre-routed remote event (AWS SNS) | 133.72 ns | 232 B | +77.3% |
| Pre-routed remote event (AWS EventBridge) | 139.65 ns | 232 B | +85.2% |
| Pre-routed remote event (Azure Event Hubs) | 136.87 ns | 232 B | +81.5% |
| Pre-routed remote event (gRPC) | 128.99 ns | 232 B | +71.0% |
| Pre-routed remote event (Kafka) | 132.57 ns | 232 B | +75.8% |
| Pre-routed remote event (RabbitMQ) | 131.23 ns | 232 B | +74.0% |

Provider Profile Extensions

| Scenario | Mean | Allocated |
| --- | --- | --- |
| Kafka throughput profile | 190.12 ns | 280 B |
| Kafka retry profile | 186.46 ns | 304 B |
| Kafka poison profile | 175.34 ns | 256 B |
| Kafka observability profile | 272.03 ns | 304 B |
| RabbitMQ throughput profile | 190.15 ns | 280 B |
| RabbitMQ retry profile | 186.29 ns | 304 B |
| RabbitMQ poison profile | 176.35 ns | 256 B |
| RabbitMQ observability profile | 268.35 ns | 304 B |

Running These Comparisons

```shell
# Build once
dotnet build benchmarks/Excalibur.Dispatch.Benchmarks/Excalibur.Dispatch.Benchmarks.csproj -c Release --nologo -v minimal

# All competitor benchmarks
pwsh ./eng/run-comparative-benchmarks.ps1 -RuntimeProfile ci

# Track A (in-process parity)
pwsh ./eng/run-benchmark-matrix.ps1 -NoBuild -NoRestore -Classes MediatRComparisonBenchmarks,WolverineInProcessComparisonBenchmarks,MassTransitMediatorComparisonBenchmarks

# Track B (pipeline parity)
pwsh ./eng/run-benchmark-matrix.ps1 -NoBuild -NoRestore -Classes PipelineComparisonBenchmarks

# WarmPath (published comparisons -- BDN defaults, auto-calibrated iterations)
dotnet run -c Release --project benchmarks/Excalibur.Dispatch.Benchmarks -- --filter *MediatRComparisonBenchmarks* --join --anyCategories WarmPath

# ColdPath / CI gates (single-invocation, used by CI performance gates)
dotnet run -c Release --project benchmarks/Excalibur.Dispatch.Benchmarks -- --filter *ComparisonBenchmarks* --join

# Track C (queued/bus end-to-end parity)
pwsh ./eng/run-benchmark-matrix.ps1 -NoBuild -NoRestore -Classes TransportQueueParityComparisonBenchmarks
```

Results are written to benchmarks/runs/BenchmarkDotNet.Artifacts/results/ unless -ArtifactsPath is provided.