We present a 243,000-trial combinatorial simulation study evaluating six independent variables affecting multi-agent orchestration performance across seven dependent variables. Using a full factorial design (486 cells, 500 trials per cell), we identify empirically optimal configurations for federated autonomous agent fleets.
Key findings include: coordination intensity has a medium effect on task quality, with a counterintuitive ordering where minimal coordination performs worse than no coordination; election strategy has a large effect on leader selection accuracy but a negligible effect on downstream task quality, revealing a quality buffer mechanism; the largest interaction effect is election strategy by fleet size, demonstrating that orchestration configuration must be scale-aware; and consistency review reduces measured consistency scores—a measurement artifact where absence of detection is mistaken for absence of defects.
All analyses use non-parametric methods with Benjamini-Hochberg FDR correction at q = 0.001, achieving statistical power of 1.0 across all tests.
| Hypothesis | Statement | Outcome |
|---|---|---|
| H1 | Coordination intensity affects task quality, with full coordination producing the highest quality | Supported |
| H2 | Election strategy has a larger effect on election accuracy than on task quality | Supported |
| H3 | Election strategy and fleet size interact to affect election accuracy | Supported |
| H4 | Consistency review improves consistency scores | Rejected |
| H5 | Task quality and failure recovery are positively correlated | Supported |
We employed a full factorial combinatorial design with six independent variables (3×3×3×2×3×3 = 486 experimental cells) and seven dependent variables. Each cell was replicated 500 times with deterministic seeding for bit-for-bit reproducibility, yielding 243,000 total trials.
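The cell and trial counts above can be reproduced with a short sketch. This is a minimal illustration, not the study's actual harness. Only the level counts (3, 3, 3, 2, 3, 3) and the 500 replications come from the text. The factor names, most level labels, the assignment of the two-level factor to consistency review, and the hash-based `trial_seed` scheme are all assumptions made for the example.

```python
import hashlib
from itertools import product

# Hypothetical factor names and level labels. Only the level counts
# (3, 3, 3, 2, 3, 3) are taken from the study description.
FACTORS = {
    "coordination": ["none", "minimal", "full"],
    "election_strategy": ["strategy_a", "strategy_b", "strategy_c"],
    "fleet_size": ["small", "medium", "large"],
    "consistency_review": [False, True],  # assumed to be the 2-level factor
    "factor_5": ["low", "mid", "high"],
    "factor_6": ["low", "mid", "high"],
}
TRIALS_PER_CELL = 500


def trial_seed(cell_index: int, trial_index: int) -> int:
    """Derive a stable 32-bit seed from the cell and trial indices.

    A hash is used (rather than Python's built-in hash()) so the seed
    is identical across processes and machines. This scheme is an
    assumption, not the study's documented method.
    """
    key = f"{cell_index}:{trial_index}".encode()
    return int.from_bytes(hashlib.sha256(key).digest()[:4], "big")


# Full factorial: every combination of every level of every factor.
cells = list(product(*FACTORS.values()))
total_trials = len(cells) * TRIALS_PER_CELL

print(len(cells), total_trials)  # 486 243000
```

Deriving each seed from the cell and trial indices, rather than from a shared running generator, means any single trial can be re-run in isolation and still produce the same result.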
Coordination intensity has a medium effect on task quality (ω² = 0.072). The ordering is full > none > minimal: minimal coordination performs worse than no coordination at all. Half-measures introduce overhead without the structured feedback loops that make full coordination effective.
Election strategy has a large effect on leader selection accuracy (ω² = 0.352) but a negligible effect on downstream task quality (ω² < 0.001). The system exhibits a quality buffer—downstream mechanisms compensate for suboptimal leader selection. Accuracy and quality are dissociated.
The largest interaction effect in the study is election strategy × fleet size on election accuracy (ω² = 0.163, large). Competence-based election maintains high accuracy across fleet sizes while simpler strategies degrade sharply. Configuration must be fleet-size-aware—no single strategy generalizes across scales.
Consistency review reduces the consistency metric (ω² = 0.264, large). Without review, no inconsistencies are detected, producing a perfect score. With review, real deviations are measured. Absence of detection is not absence of defects. Naive optimization against this metric leads to the wrong conclusion.
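The artifact can be illustrated with a toy model. This is a sketch under stated assumptions: the `measured_consistency` function, its scoring formula, and the numbers are hypothetical and are not the study's metric definition. It only shows how a score that depends on detected deviations reads as perfect whenever detection is switched off.

```python
def measured_consistency(true_deviations: int, total_items: int,
                         review_enabled: bool) -> float:
    """Toy consistency score: 1 minus the fraction of detected deviations.

    Hypothetical formula for illustration. Deviations are only counted
    when a review step exists to detect them.
    """
    detected = true_deviations if review_enabled else 0
    return 1.0 - detected / total_items


# Same underlying system, same 25 real deviations in 100 items.
without_review = measured_consistency(25, 100, review_enabled=False)
with_review = measured_consistency(25, 100, review_enabled=True)

print(without_review, with_review)  # 1.0 0.75
```

The underlying defect count is identical in both calls; only the ability to observe it differs, which is why the unreviewed configuration appears to score higher.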
Task quality and failure recovery are moderately correlated (r = 0.485, p < .001). Configurations that invest in quality also provide the redundancy needed for failure recovery. Quality and resilience emerge from the same underlying mechanisms—they are not independent investment targets.
| Interaction | Outcome Measure | ω² | Effect |
|---|---|---|---|
| Election strategy × fleet size | Election accuracy | 0.163 | Large |
| Coordination × election strategy | Delegation efficiency | 0.108 | Medium |
| Coordination × fleet size | Delegation efficiency | 0.077 | Medium |
| Consistency review × coordination | Consistency score | 0.014 | Small |
| Election strategy × fleet size | Delegation efficiency | 0.015 | Small |
All p < .001, and all adjusted p < .001 after Benjamini-Hochberg FDR correction. 105 total interaction tests conducted.
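For readers unfamiliar with the correction, the Benjamini-Hochberg step-up adjustment can be sketched in a few lines. This is a generic textbook implementation, not the study's analysis code. The example p-values are illustrative only, and reading the 105 tests as 15 two-way interactions across 7 dependent variables is an assumption that happens to match the stated count.

```python
from math import comb

Q = 0.001  # FDR level stated in the study


def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return BH-adjusted p-values in the original input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is
    # capped by those above it (keeps the adjusted values monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Assumed breakdown: C(6, 2) two-way interactions x 7 outcome measures.
n_tests = comb(6, 2) * 7

# Illustrative p-values only; these are NOT the study's data.
demo = [0.0002, 0.04, 0.0009, 0.00001]
adjusted = benjamini_hochberg(demo)
significant = [p <= Q for p in adjusted]

print(n_tests, significant)  # 105 [True, False, False, True]
```

In the illustrative input, the raw value 0.0009 falls below 0.001 but its adjusted value does not, which is the kind of borderline result the correction is designed to screen out.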
| Variable A | Variable B | r | Effect |
|---|---|---|---|
| Task quality | Failure recovery | 0.485 | Medium |
| Delegation efficiency | Resource cost | 0.303 | Medium |
| Task quality | Delegation efficiency | 0.273 | Small |
| Election accuracy | All other DVs | < 0.02 | Null |
FDR-corrected across 21 pairwise tests. The null correlation between election accuracy and all other DVs further confirms the quality buffer mechanism.
Effect sizes are interpreted using the conventional thresholds from Cohen (1988):
| Measure | Small | Medium | Large |
|---|---|---|---|
| Omega-squared (ω²) | 0.01 | 0.06 | 0.14 |
| Cohen’s d | 0.20 | 0.50 | 0.80 |
| Pearson r | 0.10 | 0.30 | 0.50 |
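
The thresholds in this table can be applied mechanically. The sketch below is a small helper written for illustration; the function name `classify_effect`, the "negligible" label for values under the small threshold, and the choice to treat each threshold as an inclusive lower bound are assumptions rather than anything specified in the study.

```python
# Lower bounds for (small, medium, large), copied from the table above.
THRESHOLDS = {
    "omega_squared": (0.01, 0.06, 0.14),
    "cohens_d": (0.20, 0.50, 0.80),
    "pearson_r": (0.10, 0.30, 0.50),
}


def classify_effect(measure: str, value: float) -> str:
    """Label an effect size using the thresholds for the given measure.

    Thresholds are treated as inclusive lower bounds (an assumption).
    The absolute value is used so negative d or r values are handled.
    """
    small, medium, large = THRESHOLDS[measure]
    magnitude = abs(value)
    if magnitude >= large:
        return "large"
    if magnitude >= medium:
        return "medium"
    if magnitude >= small:
        return "small"
    return "negligible"


print(classify_effect("omega_squared", 0.072))  # medium
print(classify_effect("omega_squared", 0.352))  # large
print(classify_effect("pearson_r", 0.485))      # medium
print(classify_effect("pearson_r", 0.273))      # small
```

Applied to the values reported in the findings and tables, this reproduces the labels given there.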
This study is the first in a series. Each acknowledged constraint above represents a planned follow-up study. All validation work is conducted in-house to maintain data integrity and protect proprietary methodology. Findings are applied directly to the product—subscribers benefit from every study through configuration improvements and capability updates.
Each study follows the same publication-grade methodology: full factorial design, deterministic seeding, non-parametric tests, FDR correction, and raw trial-level data preservation.
Subscribers who opt in to anonymized performance telemetry help accelerate this research—and benefit directly from the results. Aggregated patterns across real-world deployments inform configuration refinements that ship back to every participant. Your brain data stays private and encrypted on your machine. Only anonymized orchestration metrics (timing, accuracy, recovery rates) are shared, and only with explicit consent.
This is how Rebis improves: real-world signal from opted-in users, validated through the same rigorous methodology, applied as product updates for everyone.
Alberts, D. S. (2011). The agility advantage: A survival guide for complex enterprises and endeavors. DoD Command and Control Research Program.
Benjamini, Y., & Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society: Series B, 57(1), 289–300.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum Associates.
Grassé, P.-P. (1959). La reconstruction du nid et les coordinations interindividuelles chez Bellicositermes natalensis et Cubitermes sp. Insectes Sociaux, 6(1), 41–80.
Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. USENIX Annual Technical Conference, 305–319.
Parker, L. E. (1998). ALLIANCE: An architecture for fault tolerant multirobot cooperation. IEEE Transactions on Robotics and Automation, 14(2), 220–240.
Smith, R. G. (1980). The Contract Net Protocol: High-level communication and control in a distributed problem solver. IEEE Transactions on Computers, C-29(12), 1104–1113.
Every orchestration parameter is empirically optimized—not guessed, not copied from a framework default. The full study drives the product.