Technical Research: Architecture & Technology Landscape for Quorum

1. Multi-Agent Orchestration Frameworks

Quorum's core is a named AI team (John/PO, Winston/Dev Lead, Mary/UX Researcher, Kinsley/Product Designer, Luca/Motion Designer, Jaymes/Frontend, Damien/Backend when needed, Quinn/QA, Cipher/Security, Bob/Scrum Master; plus Amelia for BMAD story execution) that collaborates with the human conductor. The orchestration framework is the most critical architectural decision.

Framework Landscape (April 2026)

Three dominant frameworks, each with a fundamentally different philosophy:

Framework	Philosophy	GitHub Stars	Production Share
LangGraph	Workflow-first, graph-based state machines	48K	~40% of production deployments
CrewAI	Collaboration-first, role-based agent teams	29K	Growing but fragile at scale
AutoGen	Conversation-first, agent-to-agent messaging	37K (Microsoft)	Strong in code-gen, expensive

Performance Benchmarks (GPT-4o, identical infrastructure)

Metric	LangGraph 0.3	CrewAI 0.80	AutoGen 0.4
Research task (median)	14.1s	18.4s	22.7s
Research task (p95)	19.8s	31.2s	41.5s
Code review (median)	8.3s	9.1s	11.6s
Cost per 1,000 research tasks	$41.70	$48.20	$67.40
Token overhead vs raw API	+9%	+18%	+31%

Framework Analysis for Quorum

LangGraph: Best Fit
- Native state persistence and checkpointing: critical for long-running product sessions that span hours or days
- Human-in-the-loop is a first-class primitive: maps directly to Quorum's conductor model, where humans approve decisions
- Cyclic graph support: agents can debate in loops (PO proposes → Dev Lead challenges → PO adjusts → human decides)
- LangSmith observability: can expose agent reasoning to users ("show your work")
- Durable execution: sessions survive server restarts and network interruptions
- Lowest token overhead and cost per task
- Steeper learning curve (~55 min to first setup vs. 25 min for CrewAI), acceptable for a platform team
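The debate cycle named above (PO proposes → Dev Lead challenges → PO adjusts → human decides) can be sketched in plain Python, independent of any framework. This is a minimal illustration and not LangGraph's API: `DebateState`, `run_debate`, `human_decide`, and the two stub agents are hypothetical names, the stubs stand in for LLM calls, and the turn cap is an assumption added here to keep the cycle bounded.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class DebateState:
    proposal: str
    objections: list[str] = field(default_factory=list)
    turns: int = 0
    status: str = "open"  # open -> awaiting_human -> approved / rejected


def run_debate(
    state: DebateState,
    adjust: Callable[[str, str], str],
    challenge: Callable[[str], Optional[str]],
    max_turns: int = 3,
) -> DebateState:
    """Cycle Dev Lead challenge -> PO adjustment until there is no
    objection or the turn cap is hit, then pause for the human."""
    while state.turns < max_turns:
        state.turns += 1
        objection = challenge(state.proposal)
        if objection is None:
            break
        state.objections.append(objection)
        state.proposal = adjust(state.proposal, objection)
    state.status = "awaiting_human"
    return state


def human_decide(state: DebateState, approved: bool) -> DebateState:
    """The human conductor always makes the final call."""
    state.status = "approved" if approved else "rejected"
    return state


# Stub agents standing in for LLM-backed John (PO) and Winston (Dev Lead).
def po_adjust(proposal: str, objection: str) -> str:
    return f"{proposal} [revised after: {objection}]"


def dev_lead_challenge(proposal: str) -> Optional[str]:
    return None if "revised" in proposal else "scope too large for one sprint"


state = run_debate(DebateState("Ship payments v2"), po_adjust, dev_lead_challenge)
print(state.status, state.turns, state.objections)
```

Because the whole exchange lives in one serializable state object, a checkpointing layer could persist it at the `awaiting_human` pause and resume after the human responds.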

CrewAI: Interesting but Risky
- Role-based agent design maps perfectly to Quorum's team model (assign roles, define collaboration patterns)
- Fastest to prototype and most intuitive API
- Breaks down in complex, long-running workflows, which is exactly what Quorum sessions are
- Limited checkpointing: can't reliably pause/resume multi-hour sessions
- Higher token overhead than LangGraph

AutoGen: Poor Fit
- Conversational model accumulates context across multi-turn interactions → expensive (+31% overhead)
- Dynamic emergent collaboration is interesting but unpredictable; Quorum needs reliable, repeatable agent behavior
- Microsoft-backed but less mature in production deployments

OpenAI Agents SDK — Worth Watching

OpenAI's Agents SDK (upgraded from Swarm) uses five primitives: agents, tools, handoffs, guardrails, and tracing. Two key patterns:
- Agents as tools: Manager agent calls specialists and combines outputs; maps to Quorum's John (PO) coordinating the team
- Handoffs: Triage agent routes to specialists who speak directly to the user; maps to Quorum's routing model

Available in both Python and TypeScript (2026). Simpler than LangGraph but less mature for complex multi-agent choreography. Could be a future migration target if the SDK matures.

Four Production Orchestration Patterns

Triage/Router — One agent classifies and routes to specialists (cheapest, uses small model for routing)
Fan-out/Fan-in — Multiple agents work simultaneously, results merged (good for parallel pillar analysis)
Supervisor/Hierarchical — Central agent delegates and reviews (maps to Quorum's agent hierarchy)
Pipeline/Sequential — Agents execute in fixed order (good for vision → filter → prioritize flow)

Quorum likely needs a hybrid: supervisor pattern for the team (John / PO as team lead), fan-out for three-pillar analysis (Desirability/Feasibility/Viability evaluated in parallel), and triage for feedback loop routing.
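As a rough sketch of the fan-out/fan-in part of that hybrid, the three pillars can be evaluated concurrently and merged by a supervisor step. Everything here is illustrative: `evaluate_pillar` is a hypothetical placeholder for an LLM-backed agent call, the scores are made up, and picking the weakest pillar is just one possible merge rule.

```python
import asyncio

PILLARS = ("desirability", "feasibility", "viability")


async def evaluate_pillar(pillar: str, feature: str) -> tuple:
    """Placeholder for an LLM-backed pillar agent; scores are made up."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    illustrative_scores = {"desirability": 0.8, "feasibility": 0.5, "viability": 0.7}
    return pillar, illustrative_scores[pillar]


async def three_pillar_analysis(feature: str) -> dict:
    """Fan out one task per pillar, then fan in to a single result."""
    results = await asyncio.gather(*(evaluate_pillar(p, feature) for p in PILLARS))
    scores = dict(results)
    # Supervisor step: flag the weakest pillar for the human conductor.
    scores["weakest"] = min(PILLARS, key=lambda p: scores[p])
    return scores


report = asyncio.run(three_pillar_analysis("dark mode"))
print(report)
```

With real agent calls, the wall-clock time of the analysis would approach that of the slowest pillar rather than the sum of all three.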

Recommendation

LangGraph as primary orchestration framework. Of the three frameworks compared, it is the only one that combines native state persistence, human-in-the-loop, and durable execution, all of which are non-negotiable for Quorum. CrewAI's role-based design is more intuitive, but its production fragility is disqualifying for a platform where sessions last hours and span days.

2. LLM Selection & Cost Modeling

Current Pricing (April 2026)

Model	Input/1M tokens	Output/1M tokens	Context Window	Best For
GPT-4o	$2.50	$10.00	128K	Balanced performance/cost
Claude 3.5 Sonnet	$3.00	$15.00	200K	Code gen, nuanced reasoning
Gemini 1.5 Pro	$1.25	$5.00	2M	Long context, cost-sensitive
GPT-4o-mini	$0.15	$0.60	128K	Routing, classification, simple tasks
Claude 3 Haiku	$0.25	$1.25	200K	Fast, cheap structured tasks
Gemini 1.5 Flash	$0.075	$0.30	1M	Cheapest, massive context

Cost Modeling for Quorum

Key insight: Output tokens cost 4–5x more than input tokens for every model in the table above. Quorum's agents generate substantial output (analysis, recommendations, debate), so output cost dominates.

Tiered model strategy (use the right model for each task):

Agent Task	Model Tier	Rationale
Routing/triage (feedback → specialist)	Mini/Flash ($0.075–0.15 per 1M input tokens)	Simple classification, high volume
Bob / SM (ceremony prep, status)	Mini/Flash	Structured, template-driven output
Complexity scan, sprint impact	Mid-tier (GPT-4o / Gemini Pro)	Needs reasoning but formulaic
John / PO (strategy, prioritization)	Full-tier (GPT-4o / Claude Sonnet)	Nuanced judgment, stakeholder-facing
Mary / UX Researcher (synthesis)	Full-tier + long context	Complex synthesis across many signals
Kinsley / Product Designer	Full-tier	Creative, needs quality output
Luca / Motion Designer	Full-tier	Temporal design, precise specs
Jaymes / Frontend, Damien / Backend	Full-tier	Implementation direction for coding tools
Winston / Dev Lead (estimates, challenges)	Full-tier	Technical reasoning, code awareness
Quinn / QA, Cipher / Security	Mid–Full	Validation and audit depth varies
Agent-to-agent debate	Full-tier	Core differentiator, must be quality

Estimated cost per user session (1-hour active session with full named team — adjust for which agents are active):

Assumptions: ~50K input tokens (context, history, documents), ~20K output tokens (agent responses, debates, recommendations), tiered model usage.

Scenario	Estimated Cost
All GPT-4o	~$0.33 per session
Tiered (mini routing + GPT-4o reasoning)	~$0.18 per session
All Gemini 1.5 Pro	~$0.16 per session
Tiered (Flash routing + Gemini Pro reasoning)	~$0.09 per session

At 20 sessions/month per user, LLM costs run from about $1.80 (Flash routing + Gemini Pro reasoning) to about $6.60 (all GPT-4o) per user per month. This is sustainable at a $20–30/mo solo tier price point with healthy margins.
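The single-model scenarios can be reproduced directly from the pricing table and the stated token assumptions. The sketch below uses hypothetical helper names (`session_cost`, `output_share`). It does not attempt the tiered scenarios, because the split of tokens between cheap and full-tier models is not stated.

```python
# USD per 1M tokens (input, output), copied from the pricing table.
PRICES = {
    "gpt-4o": (2.50, 10.00),
    "gemini-1.5-pro": (1.25, 5.00),
}


def session_cost(model: str, input_tokens: int = 50_000, output_tokens: int = 20_000) -> float:
    """Cost in USD of one session run entirely on a single model."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def output_share(model: str, input_tokens: int = 50_000, output_tokens: int = 20_000) -> float:
    """Fraction of the session cost attributable to output tokens."""
    _, price_out = PRICES[model]
    output_cost = output_tokens * price_out / 1_000_000
    return output_cost / session_cost(model, input_tokens, output_tokens)


for model in PRICES:
    per_session = session_cost(model)
    print(f"{model}: ${per_session:.4f}/session, ${20 * per_session:.2f}/month, "
          f"{output_share(model):.0%} of cost from output")
```

Unrounded, the all-GPT-4o session comes to $0.325 and the all-Gemini 1.5 Pro session to $0.1625, matching the table after rounding. At 20 sessions the unrounded GPT-4o figure is $6.50 per month; the $6.60 upper bound comes from multiplying the rounded $0.33.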

Cost reduction strategies:
- Aggressive caching of common agent patterns (complexity scans, ceremony templates)
- Prompt compression for repeated context across sessions
- Summary-based context management (summarize history instead of replaying it in full)
- Model routing: cheap models for 60–70% of tasks, expensive models only for judgment-heavy work

3. Web Platform Architecture

Frontend Framework

Framework	Strengths for Quorum	Risks
Next.js 16	Largest ecosystem (132K stars, ~68% production usage), React Server Components for streaming agent responses, Turbopack for fast dev, massive talent pool	Vercel-optimized (deploy flexibility concerns), heavier bundle (~566KB)
Remix / React Router 7	Built on web standards, superior data loading model (loaders/actions), 35% smaller bundle, 10x faster HMR, no vendor lock-in, Shopify-backed	Smaller ecosystem, fewer commercial boilerplates
SvelteKit	Lightest bundle, growing enterprise adoption, excellent DX	Smallest talent pool, less mature ecosystem

For Quorum's team room GUI: The interface is primarily conversational with streaming agent responses, drag-and-drop for findings/boards, and real-time multi-user collaboration (enterprise). This is a highly interactive, real-time-heavy application.

Recommendation: Next.js for ecosystem breadth, streaming support via React Server Components, and hiring pool. Remix is a strong alternative if vendor independence is prioritized.

Real-Time Architecture

The team room requires real-time updates: agent responses streaming, board state syncing, finding status changes.

Technology	Latency	Use Case in Quorum
WebSockets	10–50ms	Agent response streaming, team room state sync, agent-to-agent debate display
Server-Sent Events (SSE)	50–100ms	Notifications, digest delivery, background status updates
HTTP/3	100–200ms	Standard API calls, document CRUD

Architecture pattern:
- WebSocket connections for active sessions (agent conversations, real-time board)
- Redis Pub/Sub as message broker for horizontal scaling (multiple server instances)
- SSE for background notifications (feedback digest ready, finding status changes)
- Optimistic UI updates on client, confirmed by server
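To illustrate why a broker lets any server instance serve any user, here is a small in-memory model of the pattern. `Broker` is a stand-in for Redis Pub/Sub and deliberately does not mirror the Redis client API; `ServerInstance`, the `session:<id>` channel naming, and the list-based "inboxes" standing in for WebSocket connections are all assumptions made for the sketch.

```python
from collections import defaultdict


class Broker:
    """In-memory stand-in for Redis Pub/Sub (not the redis client API)."""

    def __init__(self) -> None:
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str, callback) -> None:
        self._subscribers[channel].append(callback)

    def publish(self, channel: str, message: str) -> int:
        for callback in self._subscribers[channel]:
            callback(message)
        return len(self._subscribers[channel])


class ServerInstance:
    """One app server holding its own clients' WebSocket connections."""

    def __init__(self, name: str, broker: Broker) -> None:
        self.name = name
        self.broker = broker
        self.inboxes = defaultdict(list)  # session_id -> client inboxes (socket stand-ins)

    def connect(self, session_id: str) -> list:
        if not self.inboxes[session_id]:
            # First local client for this session: subscribe this server once.
            self.broker.subscribe(
                f"session:{session_id}",
                lambda message: self._deliver(session_id, message),
            )
        inbox: list = []
        self.inboxes[session_id].append(inbox)
        return inbox

    def _deliver(self, session_id: str, message: str) -> None:
        for inbox in self.inboxes[session_id]:
            inbox.append(message)


broker = Broker()
server_a, server_b = ServerInstance("a", broker), ServerInstance("b", broker)
alice = server_a.connect("s1")  # two users in the same session,
bob = server_b.connect("s1")    # connected to different servers
delivered_to = broker.publish("session:s1", "Winston: estimate is 5 points")
print(delivered_to, alice, bob)
```

The orchestrator only needs to publish to the session channel; it does not need to know which server holds which connection.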

Conversational UI Frameworks

Several React frameworks support building chat interfaces for AI agents:

OpenAI ChatKit — batteries-included, streaming, tool display, rich widgets, source annotations
Tambo AI — generative UI where agents can dynamically render registered React components (useful for agents presenting data visualizations, boards, grids)
Hexos — pre-built components (ChatWindow, MessageList, StreamingIndicator), multi-agent support, theming
AGNO Agent UI — streaming-enabled, integrates with orchestration libraries

For Quorum's team room: The UI is more than chat — it's a spatial team room where agents have presence, boards are visible, and findings can be dragged between agents. This likely requires a custom UI layer built on top of a streaming chat primitive (ChatKit or Tambo for the conversational parts, custom components for boards/grids/spatial elements).

4. Data Architecture

Core Data Model

Quorum manages several interconnected data types:

Data Type	Characteristics	Storage Need
Vision & features	Structured, versioned, filtered through pillars	Relational (PostgreSQL)
Agent conversations	Append-only, streaming, long-running	Event log + conversation store
Findings (feedback loop)	Lifecycle states, transitions, audit trail	Event-sourced with projections
Decision records	Immutable, rationale + projected impact	Append-only with references
Sprint/board state	Mutable, real-time, multi-user	Relational + real-time sync
Agent memory	Cross-session context, user preferences	Vector store + relational
Documents (briefs, PRDs)	Rich text, versioned, collaborative	Document store or relational

Event Sourcing for Findings & Decisions

The finding lifecycle (Detected → Triaged → Batched → Ceremony-Ready → Approved/Deferred/Killed) and decision records are natural fits for event sourcing:

Append-only event log: every state transition logged with who/why/when — full audit trail
Projections: materialized views for current state (sprint board, backlog, hospital, morgue)
Time-travel: replay decision history ("when you evaluated in Sprint 12, cost was X")
Deterministic replay: debug agent behavior by replaying events

PostgreSQL supports this natively via JSONB payloads, table partitioning, LISTEN/NOTIFY for pub/sub, and partial indexes — no specialized event store needed.
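The finding lifecycle can be sketched as an append-only log with a projection, shown here in memory rather than in PostgreSQL. The class and field names are hypothetical, the transition table assumes the lifecycle is strictly linear up to Ceremony-Ready, and the sequence number stands in for the "when" of each event so that replay stays deterministic.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Allowed lifecycle transitions; None means "no events yet".
TRANSITIONS = {
    None: {"Detected"},
    "Detected": {"Triaged"},
    "Triaged": {"Batched"},
    "Batched": {"Ceremony-Ready"},
    "Ceremony-Ready": {"Approved", "Deferred", "Killed"},
}


@dataclass(frozen=True)
class FindingEvent:
    seq: int         # position in the log (stand-in for a timestamp)
    finding_id: str
    new_state: str
    actor: str       # who
    reason: str      # why


class FindingLog:
    def __init__(self) -> None:
        self._events: list[FindingEvent] = []

    def project(self, upto: Optional[int] = None) -> dict[str, str]:
        """Fold the first `upto` events (all by default) into state per finding."""
        state: dict[str, str] = {}
        for event in self._events[:upto]:
            state[event.finding_id] = event.new_state
        return state

    def append(self, finding_id: str, new_state: str, actor: str, reason: str) -> FindingEvent:
        current = self.project().get(finding_id)
        if new_state not in TRANSITIONS.get(current, set()):
            raise ValueError(f"illegal transition {current!r} -> {new_state!r}")
        event = FindingEvent(len(self._events) + 1, finding_id, new_state, actor, reason)
        self._events.append(event)
        return event


log = FindingLog()
log.append("F-1", "Detected", "Mary", "drop-off spike in onboarding")
log.append("F-1", "Triaged", "John", "affects activation metric")
log.append("F-2", "Detected", "Quinn", "flaky checkout test")
print(log.project())        # current state of every finding
print(log.project(upto=1))  # time-travel: state as of the first event
```

In a PostgreSQL version, `_events` would be an append-only table with a JSONB payload, and `project` would correspond to a materialized view or a query bounded by sequence number.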

Agent Memory & Context Management

Agents need to remember across sessions — Winston (Dev Lead) learns estimation patterns, Mary remembers user segments, John knows stakeholder preferences.

Layered memory architecture:
- Working memory: current session context (conversation history, active findings), held in memory + Redis
- Episodic memory: timestamped interaction history, in PostgreSQL with efficient retrieval
- Semantic memory: learned facts, preferences, patterns, in PostgreSQL + pgvector for similarity search
- Procedural memory: learned behaviors (estimation calibration, user proficiency models), in structured storage

pgvector enables semantic search within PostgreSQL itself, eliminating the need for a separate vector database. "What did Winston (Dev Lead) say about payment service complexity last sprint?" becomes a vector similarity query.

Token budgeting is critical: agents can't load all memory into context. The system must intelligently select relevant memories, compress/summarize old context, and respect token limits per model.
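One simple way to combine similarity search with a token budget is to rank memories by relevance and pack them greedily until the budget is spent. The sketch below is an assumption about how this could work, not a prescribed design: the two-dimensional embeddings and token counts are toy values, `select_memories` is a hypothetical name, and in the proposed stack the ranking step would be a pgvector query rather than Python.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def select_memories(query_vec, memories, token_budget):
    """Greedy selection: most relevant first, skipping anything that would not fit."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["embedding"]), reverse=True)
    chosen, used = [], 0
    for memory in ranked:
        if used + memory["tokens"] <= token_budget:
            chosen.append(memory)
            used += memory["tokens"]
    return chosen


memories = [
    {"text": "Payment service estimate slipped two sprints", "embedding": [1.0, 0.0], "tokens": 40},
    {"text": "User prefers weekly digests", "embedding": [0.0, 1.0], "tokens": 30},
    {"text": "Payment refactor flagged as high complexity", "embedding": [0.9, 0.1], "tokens": 50},
]
query = [1.0, 0.0]  # toy embedding of "payment service complexity"
picked = select_memories(query, memories, token_budget=90)
print([m["text"] for m in picked])
```

Greedy packing is easy to reason about but not optimal; summarizing older memories before selection, as described above, would let more context fit within the same budget.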

Recommendation

PostgreSQL as the primary database with pgvector extension for semantic search. Event sourcing for findings/decisions (append-only tables with JSONB). Standard relational tables for features, sprints, boards, users. Redis for real-time state and pub/sub.

5. External Integration Architecture

JIRA / Linear Bidirectional Sync

Linear already has native Jira Sync (two-way sync of issues, assignees, status, labels, priority, comments). This validates the pattern and provides a reference implementation.

Architecture for Quorum's enterprise integration:

Component	Approach
Inbound (JIRA/Linear → Quorum)	Webhooks: JIRA/Linear push changes via HTTP POST. Quorum processes and updates internal state.
Outbound (Quorum → JIRA/Linear)	REST API: Quorum creates/updates tickets via API when findings are approved for sprint.
Conflict resolution	Quorum detects conflicts (e.g., "Ticket moved to Done in JIRA but AI shows tests haven't passed") and surfaces to user.
Field mapping	Configurable per tenant: map Quorum finding types to JIRA issue types, statuses, priorities.
Sync scope	Only feedback-loop-approved items sync out. Sprint board state syncs bidirectionally.

Key design decisions:
- Webhook processing must be idempotent (JIRA/Linear may retry)
- Queue-based processing (don't process webhooks synchronously; buffer in a job queue)
- Conflict detection before overwrite: never silently lose data
- Per-tenant API credentials stored encrypted, scoped to specific JIRA/Linear projects
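The first three decisions can be sketched together: deduplicate on arrival, queue the work, and check for conflicts before applying a change. This is an in-memory illustration under stated assumptions. `delivery_id` is a hypothetical unique identifier per webhook delivery (the actual field depends on the provider), the payload shape and `tests_passed` flag are invented for the example, and a real deployment would use a durable queue and a persistent dedupe store.

```python
from collections import deque


class WebhookIntake:
    """Acknowledge fast, dedupe by delivery ID, and defer work to a queue."""

    def __init__(self) -> None:
        self.seen_ids = set()  # production: a unique index or Redis set
        self.queue = deque()   # production: a durable job queue

    def receive(self, delivery_id: str, payload: dict) -> str:
        if delivery_id in self.seen_ids:
            return "duplicate"  # a retry of something already accepted
        self.seen_ids.add(delivery_id)
        self.queue.append(payload)
        return "queued"


def apply_update(ticket: dict, payload: dict, conflicts: list) -> None:
    """Apply a remote status change unless it contradicts local evidence."""
    if payload["status"] == "Done" and not ticket["tests_passed"]:
        conflicts.append({"ticket": ticket["key"], "remote": "Done", "local": ticket["status"]})
        return  # surface to the user instead of overwriting
    ticket["status"] = payload["status"]


def drain(intake: WebhookIntake, tickets: dict, conflicts: list) -> None:
    """Worker loop: process queued payloads outside the request path."""
    while intake.queue:
        payload = intake.queue.popleft()
        apply_update(tickets[payload["key"]], payload, conflicts)


tickets = {"QRM-7": {"key": "QRM-7", "status": "In Progress", "tests_passed": False}}
conflicts = []
intake = WebhookIntake()
first = intake.receive("d-1", {"key": "QRM-7", "status": "In Review"})
retry = intake.receive("d-1", {"key": "QRM-7", "status": "In Review"})
third = intake.receive("d-2", {"key": "QRM-7", "status": "Done"})
drain(intake, tickets, conflicts)
print(first, retry, third, tickets["QRM-7"]["status"], conflicts)
```

The retried delivery is dropped at intake, and the "Done" update is recorded as a conflict rather than applied, so the ticket keeps its last consistent status until a human resolves it.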

MCP (Model Context Protocol) Integration

MCP is emerging as the standard for AI agent tool integration. Quorum's agents could expose tools via MCP for extensibility — allowing third-party agents or tools to interact with Quorum's data.

6. Multi-Tenant SaaS Architecture

Tenant Isolation Patterns

Pattern	Tenants	Isolation	Cost	Fit for Quorum
Shared model, isolated context	<100	Logical (per-tenant prompts/memory)	Low	Solo tier
Agent pool	100–1,000	Process-level (pre-configured agents per tenant)	Medium	Growth phase
Tenant-specific deployment	Enterprise	Complete infrastructure isolation	High	Enterprise tier
Hybrid namespace	All	Shared infra, logical data separation	Medium	Default approach

Recommended: Hybrid namespace as the default, with tenant-specific deployment available for enterprise customers who require it.

Critical Multi-Tenant Challenges

Data leakage prevention: 68% of organizations have experienced AI-related data leaks. Every RAG index, vector store query, file access, and agent memory must be scoped to a tenant workspace ID. This is non-negotiable.
Noisy neighbor: One tenant's heavy AI workload (e.g., running full three-pillar analysis on 50 features) can saturate LLM API rate limits and degrade others. Requires per-tenant quotas and rate limiting.
Unbounded agent work: Agents can loop through tool calls or accumulate unbounded token counts. Explicit per-session and per-tenant cost caps required.
Access control complexity: AI agents need five identity layers — trigger identity (who started it), execution identity (what credentials it uses), authorization identity (what it's allowed to do), tenant identity (whose data it accesses), and channel identity (where results go).
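The per-session and per-tenant cost caps mentioned above could be enforced by a small guard that every LLM call passes through before it is made. The sketch is illustrative only: `CostGuard`, `BudgetExceeded`, and the cap values are hypothetical, and a production version would keep its counters in shared storage such as Redis rather than in process memory.

```python
from collections import defaultdict


class BudgetExceeded(Exception):
    pass


class CostGuard:
    """Reject LLM calls that would push a session or tenant past its cap."""

    def __init__(self, session_cap_usd: float, tenant_cap_usd: float) -> None:
        self.session_cap = session_cap_usd
        self.tenant_cap = tenant_cap_usd
        self.session_spend = defaultdict(float)
        self.tenant_spend = defaultdict(float)

    def charge(self, tenant_id: str, session_id: str, cost_usd: float) -> None:
        key = (tenant_id, session_id)  # spend is always keyed by tenant first
        if self.session_spend[key] + cost_usd > self.session_cap:
            raise BudgetExceeded(f"session cap reached for {session_id}")
        if self.tenant_spend[tenant_id] + cost_usd > self.tenant_cap:
            raise BudgetExceeded(f"tenant cap reached for {tenant_id}")
        self.session_spend[key] += cost_usd
        self.tenant_spend[tenant_id] += cost_usd


guard = CostGuard(session_cap_usd=1.00, tenant_cap_usd=1.50)
guard.charge("acme", "s1", 0.50)
guard.charge("acme", "s1", 0.50)      # session s1 is now exactly at its cap
try:
    guard.charge("acme", "s1", 0.25)  # would exceed the session cap
except BudgetExceeded as exc:
    print("blocked:", exc)
```

Checking before spending means a looping agent is stopped at the cap instead of being discovered on the invoice, and one tenant's budget never affects another's.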

Microservice Decomposition

Decompose the agent lifecycle into distinct services:

Service	Responsibility
Gateway	API handling, auth, rate limiting, tenant routing
Orchestrator	LLM orchestration (LangGraph), agent lifecycle, state management
Scheduler	Feedback loop timing, ceremony scheduling, digest generation
Memory	Agent memory, conversation history, semantic search
Sandbox	Tool execution isolation (JIRA API calls, web research)
Board/State	Sprint board, backlog, finding lifecycle, real-time sync

7. Infrastructure & Deployment

Recommended Stack Summary

Layer	Technology	Rationale
Frontend	Next.js 16 + React	Largest ecosystem, streaming RSC, real-time capable
Conversational UI	Custom + ChatKit/Tambo primitives	Team room is more than chat — needs spatial elements
Real-time	WebSockets + Redis Pub/Sub	Sub-50ms agent streaming, horizontal scaling
Agent orchestration	LangGraph	State persistence, human-in-the-loop, durable execution
LLM providers	Multi-provider (OpenAI + Anthropic + Google)	Tiered model strategy, redundancy, cost optimization
Primary database	PostgreSQL + pgvector	Relational + vector search + event sourcing in one
Cache/pubsub	Redis	Session state, real-time messaging, agent working memory
Job queue	Redis-based (BullMQ) or PostgreSQL (Graphile Worker)	Webhook processing, async agent tasks, digest generation
Auth	Clerk or Auth.js	Multi-tenant, team management, SSO for enterprise
Deployment	Vercel (frontend) + Railway/Fly.io (services) or AWS	Start simple, scale to dedicated infra
Observability	LangSmith (agents) + Sentry (app) + PostHog (analytics)	Agent debugging, error tracking, usage analytics

Scaling Considerations

Stateless frontend on edge (Vercel) — scales horizontally automatically
Orchestrator is the bottleneck — scales by tenant partitioning and agent pool sizing
LLM API rate limits are the ceiling — multi-provider strategy with failover
Database scales vertically first (PostgreSQL handles significant load), then read replicas
WebSocket connections scale via Redis Pub/Sub (any server can serve any user)
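The multi-provider failover idea can be sketched as an ordered list of providers that is tried until one succeeds. The provider functions below are stubs, not real SDK calls, and `ProviderError` and `complete_with_failover` are hypothetical names; a real implementation would also need timeouts, backoff, and per-provider prompt adaptation.

```python
class ProviderError(Exception):
    """Raised by a provider stub for rate limits, timeouts, or outages."""


def complete_with_failover(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    failures = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as exc:
            failures.append(f"{name}: {exc}")
    raise ProviderError("all providers failed: " + "; ".join(failures))


def rate_limited(prompt):
    raise ProviderError("429 rate limit")


def healthy(prompt):
    return f"summary of: {prompt}"


used, text = complete_with_failover(
    "sprint 12 findings",
    [("primary", rate_limited), ("secondary", healthy)],
)
print(used, text)
```

Returning the provider name alongside the result makes it possible to log how often failover happens, which feeds directly into the cost and rate-limit monitoring discussed above.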

8. Key Technical Risks & Open Questions

Risk	Severity	Mitigation
LLM cost unpredictability	High	Tiered model strategy, aggressive caching, per-tenant cost caps, monitoring
Agent quality consistency	High	Extensive prompt engineering, evaluation suites, A/B testing agent behaviors
Multi-agent debate reliability	Medium	Structured debate protocols (not free-form), bounded turn limits, fallback to single-agent
Context window limits	Medium	Summary-based memory management, token budgeting, selective memory retrieval
Data leakage between tenants	Critical	Workspace-scoped everything, automated isolation testing, security audits
LangGraph vendor dependency	Medium	Abstract orchestration behind internal interfaces, monitor OpenAI Agents SDK maturity
Real-time scaling	Medium	Redis Pub/Sub proven pattern, but WebSocket connection management at scale needs planning
JIRA/Linear API rate limits	Low	Queue-based sync, batching, respect rate limits, exponential backoff

Open Technical Questions

Agent personality persistence: How to maintain consistent agent personalities across sessions without consuming excessive context? Personality as system prompt vs. fine-tuned model vs. few-shot examples?
Debate protocol design: How structured should agent-to-agent debates be? Fully scripted turns vs. emergent conversation? How to prevent infinite loops or repetitive arguments?
Estimation learning: How does tracking the accuracy of Winston's (Dev Lead) estimates feed back into future estimates? Online learning vs. batch recalibration?
Feedback loop data sources: What analytics events constitute "passive signals"? Integration with Mixpanel/Amplitude/PostHog, or built-in analytics?
Offline/async agent work: Can agents do background work (research, synthesis) between user sessions? Implications for cost and notification design.
Three-pillar filter execution: What specific data sources and methods power each pillar? Desirability: user interviews + analytics? Feasibility: codebase analysis + complexity heuristics? Viability: market data + financial modeling?