Technical Research: Architecture & Technology Landscape for Quorum

1. Multi-Agent Orchestration Frameworks

Quorum's core is a named AI team (John/PO, Winston/Dev Lead, Mary/UX Researcher, Kinsley/Product Designer, Luca/Motion Designer, Jaymes/Frontend, Damien/Backend when needed, Quinn/QA, Cipher/Security, Bob/Scrum Master; plus Amelia for BMAD story execution) that collaborates with the human conductor. The orchestration framework is therefore the most critical architectural decision.

Framework Landscape (April 2026)

Three dominant frameworks, each with a fundamentally different philosophy:

| Framework | Philosophy | GitHub Stars | Production Adoption |
|---|---|---|---|
| LangGraph | Workflow-first, graph-based state machines | 48K | ~40% of production deployments |
| CrewAI | Collaboration-first, role-based agent teams | 29K | Growing but fragile at scale |
| AutoGen | Conversation-first, agent-to-agent messaging | 37K (Microsoft) | Strong in code-gen, expensive |

Performance Benchmarks (GPT-4o, identical infrastructure)

| Metric | LangGraph 0.3 | CrewAI 0.80 | AutoGen 0.4 |
|---|---|---|---|
| Research task (median) | 14.1s | 18.4s | 22.7s |
| Research task (p95) | 19.8s | 31.2s | 41.5s |
| Code review (median) | 8.3s | 9.1s | 11.6s |
| Cost per 1,000 research tasks | $41.70 | $48.20 | $67.40 |
| Token overhead vs raw API | +9% | +18% | +31% |

Framework Analysis for Quorum

LangGraph: Best Fit

- Native state persistence and checkpointing, critical for long-running product sessions that span hours or days
- Human-in-the-loop is a first-class primitive, which maps directly to Quorum's conductor model where humans approve decisions
- Cyclic graph support: agents can debate in loops (PO proposes → Dev Lead challenges → PO adjusts → human decides)
- LangSmith observability: can expose agent reasoning to users ("show your work")
- Durable execution: sessions survive server restarts and network interruptions
- Lowest token overhead and cost per task
- Steeper learning curve (~55 min to first setup vs. 25 min for CrewAI), acceptable for a platform team
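
The debate cycle described above (PO proposes → Dev Lead challenges → PO adjusts → human decides) can be sketched without any framework. The following is a minimal plain-Python illustration, not LangGraph code: `DebateState`, `run_debate`, the stub agents, and the `max_rounds` limit are all hypothetical names chosen for this example, and the stub functions stand in for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DebateState:
    """State for one debate; small enough to checkpoint between rounds."""
    topic: str
    proposal: str = ""
    objections: List[str] = field(default_factory=list)
    rounds: int = 0
    decision: str = "pending"  # "pending", "approved", or "rejected"


def run_debate(
    state: DebateState,
    propose: Callable[[DebateState], str],
    challenge: Callable[[DebateState], str],
    human_decide: Callable[[DebateState], str],
    max_rounds: int = 3,
) -> DebateState:
    """Cycle propose -> challenge until the challenger has no objection
    or max_rounds is reached, then hand the result to the human."""
    while state.rounds < max_rounds:
        state.proposal = propose(state)
        objection = challenge(state)
        state.rounds += 1
        if not objection:  # an empty string means the challenger accepts
            break
        state.objections.append(objection)
    state.decision = human_decide(state)
    return state


# Stub agents standing in for LLM-backed roles.
def po_propose(state: DebateState) -> str:
    return f"{state.topic} (revision {len(state.objections) + 1})"


def dev_lead_challenge(state: DebateState) -> str:
    return "Estimate too optimistic" if state.rounds == 0 else ""


def conductor(state: DebateState) -> str:
    return "approved"


final = run_debate(
    DebateState(topic="Ship offline mode"), po_propose, dev_lead_challenge, conductor
)
print(final.proposal, final.rounds, final.decision)
```

The loop always terminates because of the round limit, and the human decision is the only step that sets the final outcome. In a graph-based framework the same shape would be a cycle between two nodes with a conditional edge to a human-approval node.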

CrewAI: Interesting but Risky

- Role-based agent design maps perfectly to Quorum's team model (assign roles, define collaboration patterns)
- Fastest to prototype and most intuitive API
- Breaks down in complex, long-running workflows, which is exactly what Quorum sessions are
- Limited checkpointing: can't reliably pause/resume multi-hour sessions
- Higher token overhead than LangGraph

AutoGen: Poor Fit

- Conversational model accumulates context across multi-turn interactions, which makes it expensive (+31% overhead)
- Dynamic emergent collaboration is interesting but unpredictable; Quorum needs reliable, repeatable agent behavior
- Microsoft-backed but less mature in production deployments

OpenAI Agents SDK — Worth Watching

OpenAI's Agents SDK (upgraded from Swarm) uses five primitives: agents, tools, handoffs, guardrails, and tracing. Two key patterns:

- Agents as tools: a manager agent calls specialists and combines their outputs, which maps to Quorum's John (PO) coordinating the team
- Handoffs: a triage agent routes to specialists who speak directly to the user, which maps to Quorum's routing model

Available in both Python and TypeScript (2026). Simpler than LangGraph but less mature for complex multi-agent choreography. Could be a future migration target if the SDK matures.

Four Production Orchestration Patterns

  1. Triage/Router — One agent classifies and routes to specialists (cheapest, uses small model for routing)
  2. Fan-out/Fan-in — Multiple agents work simultaneously, results merged (good for parallel pillar analysis)
  3. Supervisor/Hierarchical — Central agent delegates and reviews (maps to Quorum's agent hierarchy)
  4. Pipeline/Sequential — Agents execute in fixed order (good for vision → filter → prioritize flow)

Quorum likely needs a hybrid: supervisor pattern for the team (John / PO as team lead), fan-out for three-pillar analysis (Desirability/Feasibility/Viability evaluated in parallel), and triage for feedback loop routing.
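
Two pieces of the hybrid above, fan-out/fan-in for the three pillars and a triage step for feedback, can be sketched with the standard library alone. This is an illustrative sketch under assumptions: `evaluate_pillar` is a stub for a specialist agent, and the keyword rules and destination names in `route_feedback` are hypothetical placeholders for a small routing model.

```python
import asyncio
from typing import Dict

PILLARS = ("desirability", "feasibility", "viability")


async def evaluate_pillar(pillar: str, feature: str) -> Dict[str, str]:
    """Stub for one specialist agent; a real version would call an LLM."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return {"pillar": pillar, "verdict": f"{pillar} assessment of {feature!r}"}


async def three_pillar_analysis(feature: str) -> Dict[str, str]:
    """Fan out one task per pillar, then fan in to a single merged result."""
    results = await asyncio.gather(
        *(evaluate_pillar(pillar, feature) for pillar in PILLARS)
    )
    return {result["pillar"]: result["verdict"] for result in results}


def route_feedback(text: str) -> str:
    """Triage step: keyword rules stand in for a small routing model."""
    lowered = text.lower()
    if "crash" in lowered or "bug" in lowered:
        return "qa"
    if "confusing" in lowered or "hard to find" in lowered:
        return "ux_research"
    return "product_owner"


merged = asyncio.run(three_pillar_analysis("offline mode"))
print(sorted(merged))
print(route_feedback("The app crashes on login"))
```

`asyncio.gather` runs the three evaluations concurrently and returns results in the order the tasks were passed, so the merge step stays deterministic. The supervisor role would sit above both functions, deciding when to call each.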

Recommendation

LangGraph as the primary orchestration framework. Of the three frameworks compared, it is the only one that combines native state persistence, human-in-the-loop, and durable execution, all of which are non-negotiable for Quorum. CrewAI's role-based design is more intuitive, but its production fragility is disqualifying for a platform where sessions last hours and span days.


2. LLM Selection & Cost Modeling

Current Pricing (April 2026)

| Model | Input / 1M tokens | Output / 1M tokens | Context Window | Best For |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K | Balanced performance/cost |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Code gen, nuanced reasoning |
| Gemini 1.5 Pro | $1.25 | $5.00 | 2M | Long context, cost-sensitive |
| GPT-4o-mini | $0.15 | $0.60 | 128K | Routing, classification, simple tasks |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Fast, cheap structured tasks |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M | Cheapest, massive context |

Cost Modeling for Quorum

Key insight: Output tokens cost 4–5x more than input tokens for every model in the pricing table above. Quorum's agents generate substantial output (analysis, recommendations, debate), so output cost dominates.

Tiered model strategy (use the right model for each task):

| Agent Task | Model Tier | Rationale |
|---|---|---|
| Routing/triage (feedback → specialist) | Mini/Flash ($0.075–$0.15 per 1M input tokens) | Simple classification, high volume |
| Bob / SM (ceremony prep, status) | Mini/Flash | Structured, template-driven output |
| Complexity scan, sprint impact | Mid-tier (GPT-4o / Gemini Pro) | Needs reasoning but formulaic |
| John / PO (strategy, prioritization) | Full-tier (GPT-4o / Claude Sonnet) | Nuanced judgment, stakeholder-facing |
| Mary / UX Researcher (synthesis) | Full-tier + long context | Complex synthesis across many signals |
| Kinsley / Product Designer | Full-tier | Creative, needs quality output |
| Luca / Motion Designer | Full-tier | Temporal design, precise specs |
| Jaymes / Frontend, Damien / Backend | Full-tier | Implementation direction for coding tools |
| Winston / Dev Lead (estimates, challenges) | Full-tier | Technical reasoning, code awareness |
| Quinn / QA, Cipher / Security | Mid–Full | Validation and audit depth varies |
| Agent-to-agent debate | Full-tier | Core differentiator, must be quality |

Estimated cost per user session (1-hour active session with full named team — adjust for which agents are active):

Assumptions: ~50K input tokens (context, history, documents), ~20K output tokens (agent responses, debates, recommendations), tiered model usage.

| Scenario | Estimated Cost |
|---|---|
| All GPT-4o | ~$0.33 per session |
| Tiered (mini routing + GPT-4o reasoning) | ~$0.18 per session |
| All Gemini 1.5 Pro | ~$0.16 per session |
| Tiered (Flash routing + Gemini Pro reasoning) | ~$0.09 per session |

At 20 sessions/month per user: $1.80–$6.60/month in LLM costs per user. This is sustainable at a $20–30/mo solo tier price point with healthy margins.
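
The session estimates follow directly from the pricing table and the stated token assumptions (50K input, 20K output). The sketch below reproduces the arithmetic. The single-model figures are exact. The tiered figures depend on how tokens are split between models, which the estimate does not state; the `cheap_share` of 0.475 used here is an inferred assumption that happens to reproduce both tiered numbers, not a documented parameter.

```python
# USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-1.5-pro": (1.25, 5.00),
    "gemini-1.5-flash": (0.075, 0.30),
}

INPUT_TOKENS = 50_000   # context, history, documents
OUTPUT_TOKENS = 20_000  # agent responses, debates, recommendations


def session_cost(model: str,
                 input_tokens: int = INPUT_TOKENS,
                 output_tokens: int = OUTPUT_TOKENS) -> float:
    """Cost in USD of one session served entirely by one model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def tiered_cost(cheap: str, full: str, cheap_share: float) -> float:
    """Blend two models, assuming cheap_share of all tokens go to the cheap one."""
    return cheap_share * session_cost(cheap) + (1 - cheap_share) * session_cost(full)


ASSUMED_CHEAP_SHARE = 0.475  # inferred, not stated in the estimate

print(f"All GPT-4o:        ${session_cost('gpt-4o'):.3f}")
print(f"All Gemini Pro:    ${session_cost('gemini-1.5-pro'):.4f}")
print(f"mini + GPT-4o:     ${tiered_cost('gpt-4o-mini', 'gpt-4o', ASSUMED_CHEAP_SHARE):.2f}")
print(f"Flash + Gemini:    ${tiered_cost('gemini-1.5-flash', 'gemini-1.5-pro', ASSUMED_CHEAP_SHARE):.2f}")
```

For all GPT-4o this gives 50,000 × $2.50/1M + 20,000 × $10.00/1M = $0.125 + $0.20 = $0.325, which rounds to the ~$0.33 in the table. Multiplying the cheapest and most expensive scenarios by 20 sessions gives the $1.80–$6.60 monthly range.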

Cost reduction strategies:

- Aggressive caching of common agent patterns (complexity scans, ceremony templates)
- Prompt compression for repeated context across sessions
- Summary-based context management (don't replay full history, summarize)
- Model routing: cheap models for 60–70% of tasks, expensive models only for judgment-heavy work
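
Two of these strategies, model routing and response caching, are simple enough to sketch together. The task kinds, tier names, and `ResponseCache` class below are hypothetical and only loosely follow the tier table; `call_model` is whatever function actually invokes a provider.

```python
import hashlib
from typing import Callable, Dict

# Illustrative mapping from task kind to model tier.
TIER_BY_TASK = {
    "routing": "mini",
    "ceremony_prep": "mini",
    "complexity_scan": "mid",
    "strategy": "full",
    "debate": "full",
}


def pick_tier(task_kind: str) -> str:
    """Unknown task kinds default to the full tier, trading cost for safety."""
    return TIER_BY_TASK.get(task_kind, "full")


class ResponseCache:
    """In-memory cache keyed by a hash of (tier, prompt)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def get_or_call(self, tier: str, prompt: str,
                    call_model: Callable[[str, str], str]) -> str:
        key = hashlib.sha256(f"{tier}\x00{prompt}".encode("utf-8")).hexdigest()
        if key in self._store:
            self.hits += 1
            return self._store[key]
        self.misses += 1
        result = call_model(tier, prompt)
        self._store[key] = result
        return result


def fake_model(tier: str, prompt: str) -> str:
    """Stand-in for a provider call."""
    return f"[{tier}] answer to: {prompt}"


cache = ResponseCache()
tier = pick_tier("ceremony_prep")
first = cache.get_or_call(tier, "Draft standup agenda", fake_model)
second = cache.get_or_call(tier, "Draft standup agenda", fake_model)
print(cache.hits, cache.misses)
```

Exact-match caching only helps for templated prompts such as ceremony prep. A production cache would also need expiry and per-tenant key scoping, neither of which is shown here.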


3. Web Platform Architecture

Frontend Framework

| Framework | Strengths for Quorum | Risks |
|---|---|---|
| Next.js 16 | Largest ecosystem (132K stars, ~68% production usage), React Server Components for streaming agent responses, Turbopack for fast dev, massive talent pool | Vercel-optimized (deploy flexibility concerns), heavier bundle (~566KB) |
| Remix / React Router 7 | Built on web standards, superior data loading model (loaders/actions), 35% smaller bundle, 10x faster HMR, no vendor lock-in, Shopify-backed | Smaller ecosystem, fewer commercial boilerplates |
| SvelteKit | Lightest bundle, growing enterprise adoption, excellent DX | Smallest talent pool, less mature ecosystem |

For Quorum's team room GUI: The interface is primarily conversational with streaming agent responses, drag-and-drop for findings/boards, and real-time multi-user collaboration (enterprise). This is a highly interactive, real-time-heavy application.

Recommendation: Next.js for ecosystem breadth, streaming support via React Server Components, and hiring pool. Remix is a strong alternative if vendor independence is prioritized.

Real-Time Architecture

The team room requires real-time updates: agent responses streaming, board state syncing, finding status changes.

| Technology | Latency | Use Case in Quorum |
|---|---|---|
| WebSockets | 10–50ms | Agent response streaming, team room state sync, agent-to-agent debate display |
| Server-Sent Events (SSE) | 50–100ms | Notifications, digest delivery, background status updates |
| HTTP/3 | 100–200ms | Standard API calls, document CRUD |

Architecture pattern:

- WebSocket connections for active sessions (agent conversations, real-time board)
- Redis Pub/Sub as message broker for horizontal scaling (multiple server instances)
- SSE for background notifications (feedback digest ready, finding status changes)
- Optimistic UI updates on client, confirmed by server
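
The reason a broker is needed for horizontal scaling is that two clients in the same session may be connected to different server instances, so every instance must see every message. The sketch below uses a hypothetical `InMemoryBroker` as a stand-in for Redis Pub/Sub to show the fan-out behavior; it is not Redis client code, and the channel name `session:42` is made up.

```python
import asyncio
from collections import defaultdict
from typing import DefaultDict, List


class InMemoryBroker:
    """Stands in for Redis Pub/Sub: each subscriber to a channel gets every message."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[asyncio.Queue]] = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        queue: asyncio.Queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    async def publish(self, channel: str, message: str) -> int:
        """Deliver to all current subscribers and return how many received it."""
        subscribers = self._subscribers[channel]
        for queue in subscribers:
            await queue.put(message)
        return len(subscribers)


async def demo() -> List[str]:
    broker = InMemoryBroker()
    # Two server instances, each holding a WebSocket for the same session.
    instance_a = broker.subscribe("session:42")
    instance_b = broker.subscribe("session:42")
    delivered = await broker.publish("session:42", "token: Hello")
    assert delivered == 2
    return [await instance_a.get(), await instance_b.get()]


received = asyncio.run(demo())
print(received)
```

Each instance would forward what it receives to its own WebSocket clients. Like Redis Pub/Sub, this sketch does not persist messages, so a subscriber that joins late sees nothing that was published earlier.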

Conversational UI Frameworks

Several React frameworks support building chat interfaces for AI agents.

For Quorum's team room: The UI is more than chat — it's a spatial team room where agents have presence, boards are visible, and findings can be dragged between agents. This likely requires a custom UI layer built on top of a streaming chat primitive (ChatKit or Tambo for the conversational parts, custom components for boards/grids/spatial elements).


4. Data Architecture

Core Data Model

Quorum manages several interconnected data types:

| Data Type | Characteristics | Storage Need |
|---|---|---|
| Vision & features | Structured, versioned, filtered through pillars | Relational (PostgreSQL) |
| Agent conversations | Append-only, streaming, long-running | Event log + conversation store |
| Findings (feedback loop) | Lifecycle states, transitions, audit trail | Event-sourced with projections |
| Decision records | Immutable, rationale + projected impact | Append-only with references |
| Sprint/board state | Mutable, real-time, multi-user | Relational + real-time sync |
| Agent memory | Cross-session context, user preferences | Vector store + relational |
| Documents (briefs, PRDs) | Rich text, versioned, collaborative | Document store or relational |

Event Sourcing for Findings & Decisions

The finding lifecycle (Detected → Triaged → Batched → Ceremony-Ready → Approved/Deferred/Killed) and decision records are natural fits for event sourcing.

PostgreSQL supports this natively via JSONB payloads, table partitioning, LISTEN/NOTIFY for pub/sub, and partial indexes — no specialized event store needed.
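
As a rough illustration of the event-sourcing idea, the sketch below keeps an append-only list of events and derives a finding's current status from its history. It is an in-memory stand-in for an append-only table, under assumptions: `FindingEvent`, `FindingLog`, and the `actor` field are hypothetical names, and the transition map encodes only the lifecycle listed above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set

# Allowed transitions, taken from the finding lifecycle.
# None represents a finding that has no events yet.
TRANSITIONS: Dict[Optional[str], Set[str]] = {
    None: {"Detected"},
    "Detected": {"Triaged"},
    "Triaged": {"Batched"},
    "Batched": {"Ceremony-Ready"},
    "Ceremony-Ready": {"Approved", "Deferred", "Killed"},
}


@dataclass(frozen=True)
class FindingEvent:
    finding_id: str
    new_status: str
    actor: str  # hypothetical field: the person or agent behind the transition


class FindingLog:
    """Append-only event log; current state is derived from history."""

    def __init__(self) -> None:
        self._events: List[FindingEvent] = []

    def history(self, finding_id: str) -> List[FindingEvent]:
        return [e for e in self._events if e.finding_id == finding_id]

    def current_status(self, finding_id: str) -> Optional[str]:
        events = self.history(finding_id)
        return events[-1].new_status if events else None

    def append(self, event: FindingEvent) -> None:
        """Validate the transition against the lifecycle, then record it."""
        current = self.current_status(event.finding_id)
        if event.new_status not in TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current!r} -> {event.new_status!r}")
        self._events.append(event)


log = FindingLog()
for status in ("Detected", "Triaged", "Batched", "Ceremony-Ready", "Approved"):
    log.append(FindingEvent("F-1", status, actor="mary"))
print(log.current_status("F-1"), len(log.history("F-1")))
```

Events are never updated or deleted, so the full audit trail is simply the history. Terminal statuses have no entry in the transition map, which means any further event for an approved, deferred, or killed finding is rejected.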

Agent Memory & Context Management

Agents need to remember across sessions — Winston (Dev Lead) learns estimation patterns, Mary remembers user segments, John knows stakeholder preferences.

Layered memory architecture:

- Working memory: current session context (conversation history, active findings), held in-memory + Redis
- Episodic memory: timestamped interaction history, stored in PostgreSQL with efficient retrieval
- Semantic memory: learned facts, preferences, patterns, stored in PostgreSQL + pgvector for similarity search
- Procedural memory: learned behaviors (estimation calibration, user proficiency models), kept in structured storage

pgvector enables semantic search within PostgreSQL itself, eliminating the need for a separate vector database. "What did Winston (Dev Lead) say about payment service complexity last sprint?" becomes a vector similarity query.

Token budgeting is critical: agents can't load all memory into context. The system must intelligently select relevant memories, compress/summarize old context, and respect token limits per model.
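
One simple way to combine similarity search with a token budget is to rank memories by cosine similarity to the query and then pack them greedily until the budget is used up. The sketch below does this in plain Python. It is illustrative only: the two-dimensional vectors are toy embeddings, `estimate_tokens` is a crude word count rather than a real tokenizer, and in the proposed stack the ranking step would be a pgvector query.

```python
import math
from typing import List, Sequence, Tuple

Memory = Tuple[str, Sequence[float]]  # (text, embedding)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def select_memories(query_vec: Sequence[float],
                    memories: Sequence[Memory],
                    token_budget: int) -> List[str]:
    """Rank by similarity, then greedily keep memories that still fit the budget."""
    ranked = sorted(memories,
                    key=lambda m: cosine_similarity(query_vec, m[1]),
                    reverse=True)
    chosen: List[str] = []
    used = 0
    for text, _embedding in ranked:
        cost = estimate_tokens(text)
        if used + cost <= token_budget:
            chosen.append(text)
            used += cost
    return chosen


MEMORIES: List[Memory] = [
    ("payment service is complex", [1.0, 0.0]),
    ("user likes dark mode", [0.0, 1.0]),
    ("payment refactor took three sprints last time around", [0.9, 0.1]),
]

print(select_memories([1.0, 0.0], MEMORIES, token_budget=5))
print(select_memories([1.0, 0.0], MEMORIES, token_budget=12))
```

Greedy packing skips a memory that does not fit and keeps trying smaller ones, so a long but relevant memory can lose its place to a short, less relevant one. Summarizing long memories before ranking is one way to reduce that effect.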

Recommendation

PostgreSQL as the primary database with pgvector extension for semantic search. Event sourcing for findings/decisions (append-only tables with JSONB). Standard relational tables for features, sprints, boards, users. Redis for real-time state and pub/sub.


5. External Integration Architecture

JIRA / Linear Bidirectional Sync

Linear already has native Jira Sync (two-way sync of issues, assignees, status, labels, priority, comments). This validates the pattern and provides a reference implementation.

Architecture for Quorum's enterprise integration:

| Component | Approach |
|---|---|
| Inbound (JIRA/Linear → Quorum) | Webhooks: JIRA/Linear push changes via HTTP POST. Quorum processes and updates internal state. |
| Outbound (Quorum → JIRA/Linear) | REST API: Quorum creates/updates tickets via API when findings are approved for sprint. |
| Conflict resolution | Quorum detects conflicts (e.g., "Ticket moved to Done in JIRA but AI shows tests haven't passed") and surfaces to user. |
| Field mapping | Configurable per tenant: map Quorum finding types to JIRA issue types, statuses, priorities. |
| Sync scope | Only feedback-loop-approved items sync out. Sprint board state syncs bidirectionally. |

Key design decisions:

- Webhook processing must be idempotent (JIRA/Linear may retry)
- Queue-based processing (don't process webhooks synchronously; buffer in a job queue)
- Conflict detection before overwrite, so data is never silently lost
- Per-tenant API credentials stored encrypted, scoped to specific JIRA/Linear projects
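
The first three decisions can be sketched in a few lines. The example assumes each webhook delivery carries a unique identifier that can be used for deduplication; `delivery_id`, `WebhookIntake`, the payload fields, and the `tests_passed` flag are all hypothetical names for illustration, not fields of the JIRA or Linear APIs.

```python
from collections import deque
from typing import Deque, Optional, Set


class WebhookIntake:
    """Accepts webhook deliveries, drops retries, and buffers the rest for workers."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self.queue: Deque[dict] = deque()

    def receive(self, delivery_id: str, payload: dict) -> bool:
        """Return True if the delivery was queued, False if it was a duplicate."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        self.queue.append(payload)  # processed later, not in the request handler
        return True


def detect_conflict(local: dict, remote: dict) -> Optional[str]:
    """Flag a ticket marked Done in the tracker while local checks have not passed."""
    if remote.get("status") == "Done" and not local.get("tests_passed", False):
        return f"{remote.get('key')}: Done in tracker but tests have not passed"
    return None


intake = WebhookIntake()
event = {"key": "QRM-7", "status": "Done"}
print(intake.receive("delivery-001", event))  # first delivery is queued
print(intake.receive("delivery-001", event))  # retry is ignored
print(detect_conflict({"tests_passed": False}, intake.queue[0]))
```

In a real deployment the set of seen identifiers would live in shared storage with an expiry, and the conflict would be surfaced to the user rather than resolved automatically.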

MCP (Model Context Protocol) Integration

MCP is emerging as the standard for AI agent tool integration. Quorum's agents could expose tools via MCP for extensibility — allowing third-party agents or tools to interact with Quorum's data.


6. Multi-Tenant SaaS Architecture

Tenant Isolation Patterns

| Pattern | Tenants | Isolation | Cost | Fit for Quorum |
|---|---|---|---|---|
| Shared model, isolated context | <100 | Logical (per-tenant prompts/memory) | Low | Solo tier |
| Agent pool | 100–1,000 | Process-level (pre-configured agents per tenant) | Medium | Growth phase |
| Tenant-specific deployment | Enterprise | Complete infrastructure isolation | High | Enterprise tier |
| Hybrid namespace | All | Shared infra, logical data separation | Medium | Default approach |

Recommended: Hybrid namespace as the default, with tenant-specific deployment available for enterprise customers who require it.

Critical Multi-Tenant Challenges

  1. Data leakage prevention: 68% of organizations have experienced AI-related data leaks. Every RAG index, vector store query, file access, and agent memory must be scoped to a tenant workspace ID. This is non-negotiable.

  2. Noisy neighbor: One tenant's heavy AI workload (e.g., running full three-pillar analysis on 50 features) can saturate LLM API rate limits and degrade others. Requires per-tenant quotas and rate limiting.

  3. Unbounded agent work: Agents can loop through tool calls or accumulate unbounded token counts. Explicit per-session and per-tenant cost caps required.

  4. Access control complexity: AI agents need five identity layers — trigger identity (who started it), execution identity (what credentials it uses), authorization identity (what it's allowed to do), tenant identity (whose data it accesses), and channel identity (where results go).
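
Challenges 1 and 3 both come down to making the tenant identifier impossible to omit. The sketch below shows one hedged way to express that: a store whose every method requires a `tenant_id`, and a budget that refuses any call that would cross a cap. `TenantScopedStore`, `TenantBudget`, and `BudgetExceeded` are hypothetical names, and the substring search is a placeholder for a real vector or SQL query.

```python
from collections import defaultdict
from typing import DefaultDict, List


class BudgetExceeded(Exception):
    """Raised when a charge would push a tenant past its cap."""


class TenantBudget:
    def __init__(self, cap_usd: float) -> None:
        self.cap_usd = cap_usd
        self.spent: DefaultDict[str, float] = defaultdict(float)

    def charge(self, tenant_id: str, cost_usd: float) -> float:
        """Record spend and return the remaining budget, or refuse the charge."""
        if self.spent[tenant_id] + cost_usd > self.cap_usd:
            raise BudgetExceeded(f"{tenant_id} would exceed ${self.cap_usd:.2f}")
        self.spent[tenant_id] += cost_usd
        return self.cap_usd - self.spent[tenant_id]


class TenantScopedStore:
    """Every read and write requires a tenant_id; there is no unscoped query."""

    def __init__(self) -> None:
        self._rows: DefaultDict[str, List[str]] = defaultdict(list)

    def add(self, tenant_id: str, text: str) -> None:
        self._rows[tenant_id].append(text)

    def search(self, tenant_id: str, term: str) -> List[str]:
        rows = self._rows.get(tenant_id, [])
        return [text for text in rows if term.lower() in text.lower()]


store = TenantScopedStore()
store.add("acme", "Payment service estimate: 8 points")
store.add("globex", "Payment provider contract renewal")
print(store.search("acme", "payment"))

budget = TenantBudget(cap_usd=1.00)
print(budget.charge("acme", 0.50))
```

The same check can be applied per session as well as per tenant. Rate limiting for the noisy-neighbor case would sit alongside it at the gateway.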

Microservice Decomposition

Decompose the agent lifecycle into distinct services:

| Service | Responsibility |
|---|---|
| Gateway | API handling, auth, rate limiting, tenant routing |
| Orchestrator | LLM orchestration (LangGraph), agent lifecycle, state management |
| Scheduler | Feedback loop timing, ceremony scheduling, digest generation |
| Memory | Agent memory, conversation history, semantic search |
| Sandbox | Tool execution isolation (JIRA API calls, web research) |
| Board/State | Sprint board, backlog, finding lifecycle, real-time sync |

7. Infrastructure & Deployment

| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 16 + React | Largest ecosystem, streaming RSC, real-time capable |
| Conversational UI | Custom + ChatKit/Tambo primitives | Team room is more than chat; it needs spatial elements |
| Real-time | WebSockets + Redis Pub/Sub | Sub-50ms agent streaming, horizontal scaling |
| Agent orchestration | LangGraph | State persistence, human-in-the-loop, durable execution |
| LLM providers | Multi-provider (OpenAI + Anthropic + Google) | Tiered model strategy, redundancy, cost optimization |
| Primary database | PostgreSQL + pgvector | Relational + vector search + event sourcing in one |
| Cache/pubsub | Redis | Session state, real-time messaging, agent working memory |
| Job queue | Redis-based (BullMQ) or PostgreSQL (Graphile Worker) | Webhook processing, async agent tasks, digest generation |
| Auth | Clerk or Auth.js | Multi-tenant, team management, SSO for enterprise |
| Deployment | Vercel (frontend) + Railway/Fly.io (services) or AWS | Start simple, scale to dedicated infra |
| Observability | LangSmith (agents) + Sentry (app) + PostHog (analytics) | Agent debugging, error tracking, usage analytics |

Scaling Considerations


8. Key Technical Risks & Open Questions

| Risk | Severity | Mitigation |
|---|---|---|
| LLM cost unpredictability | High | Tiered model strategy, aggressive caching, per-tenant cost caps, monitoring |
| Agent quality consistency | High | Extensive prompt engineering, evaluation suites, A/B testing agent behaviors |
| Multi-agent debate reliability | Medium | Structured debate protocols (not free-form), bounded turn limits, fallback to single-agent |
| Context window limits | Medium | Summary-based memory management, token budgeting, selective memory retrieval |
| Data leakage between tenants | Critical | Workspace-scoped everything, automated isolation testing, security audits |
| LangGraph vendor dependency | Medium | Abstract orchestration behind internal interfaces, monitor OpenAI Agents SDK maturity |
| Real-time scaling | Medium | Redis Pub/Sub proven pattern, but WebSocket connection management at scale needs planning |
| JIRA/Linear API rate limits | Low | Queue-based sync, batching, respect rate limits, exponential backoff |
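
The exponential backoff named in the last mitigation can be sketched as follows. This is a generic illustration, not tracker client code: `RateLimited` is a hypothetical exception a client wrapper might raise on a rate-limit response, and the base delay, growth factor, and cap are arbitrary defaults. `sleep` and `jitter` are injected so the behavior can be tested without waiting.

```python
import random
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical error raised by a tracker client on a rate-limit response."""


def backoff_delays(base: float = 1.0, factor: float = 2.0,
                   cap: float = 60.0, attempts: int = 5) -> List[float]:
    """Delay before each retry: base, base*factor, ... limited to cap."""
    return [min(cap, base * factor ** i) for i in range(attempts)]


def call_with_backoff(call: Callable[[], T],
                      sleep: Callable[[float], None],
                      attempts: int = 5,
                      jitter: Callable[[], float] = random.random) -> Optional[T]:
    """Try call up to `attempts` times, sleeping a jittered delay between tries."""
    delays = backoff_delays(attempts=attempts)
    for index, delay in enumerate(delays):
        try:
            return call()
        except RateLimited:
            if index == len(delays) - 1:
                raise  # out of attempts: let the job queue handle the failure
            # Scale the delay into [0.5 * delay, delay] to spread out retries.
            sleep(delay * (0.5 + 0.5 * jitter()))
    return None


# Demo: a call that is rate limited twice, then succeeds.
outcomes = iter([RateLimited(), RateLimited(), "ticket created"])


def flaky_call() -> str:
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


slept: List[float] = []
result = call_with_backoff(flaky_call, sleep=slept.append, jitter=lambda: 1.0)
print(result, slept)
```

In production `sleep` would be `time.sleep` or an async equivalent, and the retry would normally run inside the sync job queue rather than in a request handler.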

Open Technical Questions

  1. Agent personality persistence — How to maintain consistent agent personalities across sessions without consuming excessive context? Personality as system prompt vs. fine-tuned model vs. few-shot examples?
  2. Debate protocol design — How structured should agent-to-agent debates be? Fully scripted turns vs. emergent conversation? How to prevent infinite loops or repetitive arguments?
  3. Estimation learning — How does tracking the accuracy of Winston's (Dev Lead) estimates feed back into future estimates? Online learning vs. batch recalibration?
  4. Feedback loop data sources — What analytics events constitute "passive signals"? Integration with Mixpanel/Amplitude/PostHog, or built-in analytics?
  5. Offline/async agent work — Can agents do background work (research, synthesis) between user sessions? Implications for cost and notification design.
  6. Three-pillar filter execution — What specific data sources and methods power each pillar? Desirability: user interviews + analytics? Feasibility: codebase analysis + complexity heuristics? Viability: market data + financial modeling?