Quorum's core is a named AI team (John/PO, Winston/Dev Lead, Mary/UX Researcher, Kinsley/Product Designer, Luca/Motion Designer, Jaymes/Frontend, Damien/Backend when needed, Quinn/QA, Cipher/Security, Bob/Scrum Master; plus Amelia for BMAD story execution) that collaborates with the human conductor. Because every session runs through this team, the orchestration framework is the most critical architectural decision.
Three frameworks dominate, each with a fundamentally different philosophy:
| Framework | Philosophy | GitHub Stars | Production Share |
|---|---|---|---|
| LangGraph | Workflow-first, graph-based state machines | 48K | ~40% of production deployments |
| CrewAI | Collaboration-first, role-based agent teams | 29K | Growing but fragile at scale |
| AutoGen | Conversation-first, agent-to-agent messaging | 37K (Microsoft) | Strong in code-gen, expensive |
Benchmark latency and cost by framework version:

| Metric | LangGraph 0.3 | CrewAI 0.80 | AutoGen 0.4 |
|---|---|---|---|
| Research task (median) | 14.1s | 18.4s | 22.7s |
| Research task (p95) | 19.8s | 31.2s | 41.5s |
| Code review (median) | 8.3s | 9.1s | 11.6s |
| Cost per 1,000 research tasks | $41.70 | $48.20 | $67.40 |
| Token overhead vs raw API | +9% | +18% | +31% |
LangGraph: Best Fit

- Native state persistence and checkpointing: critical for long-running product sessions that span hours or days
- Human-in-the-loop is a first-class primitive, which maps directly to Quorum's conductor model where humans approve decisions
- Cyclic graph support: agents can debate in loops (PO proposes → Dev Lead challenges → PO adjusts → human decides)
- LangSmith observability: can expose agent reasoning to users ("show your work")
- Durable execution: sessions survive server restarts and network interruptions
- Lowest token overhead and cost per task
- Steeper learning curve (~55 min to first setup vs. 25 min for CrewAI), acceptable for a platform team
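To make the checkpointing, cyclic-debate, and human-approval requirements concrete, here is a minimal sketch in plain Python. It deliberately does not use LangGraph's API; `DebateState`, `step`, `checkpoint`, `restore`, `human_decide`, and the `MAX_TURNS` bound are all hypothetical names chosen for illustration, and the agent calls are stand-ins for LLM requests.

```python
import json
from dataclasses import dataclass, field, asdict

MAX_TURNS = 3  # assumed bound so the debate cycle always terminates


@dataclass
class DebateState:
    """Everything needed to resume a debate; all fields are JSON-serializable."""
    proposal: str
    turn: int = 0
    transcript: list = field(default_factory=list)
    status: str = "debating"  # debating -> awaiting_human -> approved | rejected


def step(state: DebateState) -> DebateState:
    """Run one challenge/adjust cycle, then pause for the human at the turn limit."""
    if state.status != "debating":
        return state
    state.turn += 1
    # Stand-ins for LLM calls by the Dev Lead and PO agents.
    state.transcript.append(["Dev Lead", f"challenge {state.turn}"])
    state.proposal = f"{state.proposal} [rev {state.turn}]"
    state.transcript.append(["PO", state.proposal])
    if state.turn >= MAX_TURNS:
        state.status = "awaiting_human"
    return state


def checkpoint(state: DebateState) -> str:
    """Serialize the state so the session can survive a restart."""
    return json.dumps(asdict(state))


def restore(blob: str) -> DebateState:
    """Rebuild the state from a checkpoint produced by checkpoint()."""
    return DebateState(**json.loads(blob))


def human_decide(state: DebateState, approve: bool) -> DebateState:
    """Only the human conductor can move the debate to a final status."""
    if state.status != "awaiting_human":
        raise ValueError("debate is not waiting on the conductor")
    state.status = "approved" if approve else "rejected"
    return state


# Usage: run one cycle, checkpoint, resume from the checkpoint, then decide.
state = step(DebateState(proposal="Ship onboarding v2"))
resumed = restore(checkpoint(state))
while resumed.status == "debating":
    step(resumed)
human_decide(resumed, approve=True)
```

The point of the sketch is that the loop is bounded, the state is fully serializable between cycles, and the final transition is reserved for the human. A framework with native checkpointing would supply these pieces instead of hand-written code.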
CrewAI: Interesting but Risky

- Role-based agent design maps perfectly to Quorum's team model (assign roles, define collaboration patterns)
- Fastest to prototype and most intuitive API
- Breaks down in complex, long-running workflows, which is exactly what Quorum sessions are
- Limited checkpointing: can't reliably pause/resume multi-hour sessions
- Higher token overhead than LangGraph

AutoGen: Poor Fit

- Conversational model accumulates context across multi-turn interactions → expensive (+31% overhead)
- Dynamic emergent collaboration is interesting but unpredictable; Quorum needs reliable, repeatable agent behavior
- Microsoft-backed but less mature in production deployments
OpenAI's Agents SDK (upgraded from Swarm) uses five primitives: agents, tools, handoffs, guardrails, and tracing. Two key patterns:

- Agents as tools: a manager agent calls specialists and combines their outputs, which maps to Quorum's John (PO) coordinating the team
- Handoffs: a triage agent routes to specialists who speak directly to the user, which maps to Quorum's routing model
Available in both Python and TypeScript (2026). Simpler than LangGraph but less mature for complex multi-agent choreography. Could be a future migration target if the SDK matures.
Quorum likely needs a hybrid: supervisor pattern for the team (John / PO as team lead), fan-out for three-pillar analysis (Desirability/Feasibility/Viability evaluated in parallel), and triage for feedback loop routing.
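The fan-out part of that hybrid can be sketched with `asyncio.gather`. This is an illustration only: `evaluate_pillar` and `three_pillar_analysis` are hypothetical names, the `"needs-review"` verdict is a placeholder, and a real specialist would await an LLM call where the sketch merely yields to the event loop.

```python
import asyncio

PILLARS = ("desirability", "feasibility", "viability")


async def evaluate_pillar(pillar: str, feature: str) -> dict:
    """Placeholder for one specialist agent; a real version would await an LLM call."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    return {"pillar": pillar, "feature": feature, "verdict": "needs-review"}


async def three_pillar_analysis(feature: str) -> dict:
    """Supervisor step: fan out to all pillars concurrently, then merge by pillar."""
    results = await asyncio.gather(*(evaluate_pillar(p, feature) for p in PILLARS))
    return {r["pillar"]: r["verdict"] for r in results}


report = asyncio.run(three_pillar_analysis("saved payment methods"))
```

Because `asyncio.gather` returns results in the order the awaitables were passed, the merged report keeps the pillar order even when the underlying calls finish out of order.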
Recommendation: LangGraph as the primary orchestration framework. Of the three frameworks compared, it is the only one with native state persistence, human-in-the-loop support, and durable execution, all of which are non-negotiable for Quorum. CrewAI's role-based design is more intuitive, but its production fragility is disqualifying for a platform where sessions last hours and span days.
| Model | Input/1M tokens | Output/1M tokens | Context Window | Best For |
|---|---|---|---|---|
| GPT-4o | $2.50 | $10.00 | 128K | Balanced performance/cost |
| Claude 3.5 Sonnet | $3.00 | $15.00 | 200K | Code gen, nuanced reasoning |
| Gemini 1.5 Pro | $1.25 | $5.00 | 2M | Long context, cost-sensitive |
| GPT-4o-mini | $0.15 | $0.60 | 128K | Routing, classification, simple tasks |
| Claude 3 Haiku | $0.25 | $1.25 | 200K | Fast, cheap structured tasks |
| Gemini 1.5 Flash | $0.075 | $0.30 | 1M | Cheapest, massive context |
Key insight: Output tokens cost 4–5x as much as input tokens for every model in the table above. Quorum's agents generate substantial output (analysis, recommendations, debate), so output cost dominates.
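The output-to-input price ratio can be checked directly against the pricing table. The short sketch below copies the table's prices into a dictionary (the names `PRICES` and `output_input_ratio` are illustrative) and computes the ratio per model. For the six models listed it comes out at 4x or 5x.

```python
# USD per 1M tokens as (input, output), copied from the pricing table above.
PRICES = {
    "GPT-4o": (2.50, 10.00),
    "Claude 3.5 Sonnet": (3.00, 15.00),
    "Gemini 1.5 Pro": (1.25, 5.00),
    "GPT-4o-mini": (0.15, 0.60),
    "Claude 3 Haiku": (0.25, 1.25),
    "Gemini 1.5 Flash": (0.075, 0.30),
}


def output_input_ratio(model: str) -> float:
    """How many times more an output token costs than an input token."""
    price_in, price_out = PRICES[model]
    return price_out / price_in


# Rounded to two decimals to hide floating-point noise.
ratios = {model: round(output_input_ratio(model), 2) for model in PRICES}
```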
Tiered model strategy (use the right model for each task):
| Agent Task | Model Tier | Rationale |
|---|---|---|
| Routing/triage (feedback → specialist) | Mini/Flash ($0.075–0.15/M input) | Simple classification, high volume |
| Bob / SM (ceremony prep, status) | Mini/Flash | Structured, template-driven output |
| Complexity scan, sprint impact | Mid-tier (GPT-4o / Gemini Pro) | Needs reasoning but formulaic |
| John / PO (strategy, prioritization) | Full-tier (GPT-4o / Claude Sonnet) | Nuanced judgment, stakeholder-facing |
| Mary / UX Researcher (synthesis) | Full-tier + long context | Complex synthesis across many signals |
| Kinsley / Product Designer | Full-tier | Creative, needs quality output |
| Luca / Motion Designer | Full-tier | Temporal design, precise specs |
| Jaymes / Frontend, Damien / Backend | Full-tier | Implementation direction for coding tools |
| Winston / Dev Lead (estimates, challenges) | Full-tier | Technical reasoning, code awareness |
| Quinn / QA, Cipher / Security | Mid–Full | Validation and audit depth varies |
| Agent-to-agent debate | Full-tier | Core differentiator, must be quality |
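One way to encode the tier table is a simple lookup that the orchestrator consults before each call. The sketch below is an assumption about how such a router might look: the task keys, the `deep` flag for the variable-depth QA and security work, and the choice to default unknown tasks to the full tier are all illustrative.

```python
from enum import Enum


class Tier(Enum):
    MINI = "mini"
    MID = "mid"
    FULL = "full"


# Task keys are hypothetical; the tiers follow the table above.
TASK_TIERS = {
    "triage": Tier.MINI,
    "ceremony_prep": Tier.MINI,
    "complexity_scan": Tier.MID,
    "sprint_impact": Tier.MID,
    "strategy": Tier.FULL,
    "ux_synthesis": Tier.FULL,
    "debate": Tier.FULL,
}

# QA and security work is listed as Mid to Full, so depth decides the tier.
VARIABLE_DEPTH = {"qa_validation", "security_audit"}


def tier_for(task: str, deep: bool = False) -> Tier:
    """Pick a model tier; unknown tasks fall back to FULL (an assumed safe default)."""
    if task in VARIABLE_DEPTH:
        return Tier.FULL if deep else Tier.MID
    return TASK_TIERS.get(task, Tier.FULL)
```

Defaulting to the full tier trades cost for quality on unclassified work. A cost-sensitive deployment might prefer the opposite default.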
Estimated cost per user session (1-hour active session with full named team — adjust for which agents are active):
Assumptions: ~50K input tokens (context, history, documents), ~20K output tokens (agent responses, debates, recommendations), tiered model usage.
| Scenario | Estimated Cost |
|---|---|
| All GPT-4o | ~$0.33 per session |
| Tiered (mini routing + GPT-4o reasoning) | ~$0.18 per session |
| All Gemini 1.5 Pro | ~$0.16 per session |
| Tiered (Flash routing + Gemini Pro reasoning) | ~$0.09 per session |
At 20 sessions/month per user, LLM costs come to $1.80–$6.60/month per user. At a $20–30/mo solo tier price point, that is roughly 6–33% of subscription revenue, which is sustainable.
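The single-model figures follow from the stated assumptions (about 50K input and 20K output tokens per session) and the per-million-token prices. The sketch below reproduces that arithmetic. The source does not state what share of tokens the tiered scenarios send to the cheap model, so the `cheap_share=0.5` used here is purely an assumption and will not match the tiered rows exactly.

```python
def session_cost(price_in: float, price_out: float,
                 input_tokens: int = 50_000, output_tokens: int = 20_000) -> float:
    """USD cost of one session; prices are USD per 1M tokens."""
    return input_tokens / 1_000_000 * price_in + output_tokens / 1_000_000 * price_out


def blended_cost(cheap: float, strong: float, cheap_share: float) -> float:
    """Blend two per-session costs, assuming cheap_share of tokens use the cheap model."""
    return cheap_share * cheap + (1 - cheap_share) * strong


gpt4o = session_cost(2.50, 10.00)       # about $0.325 per session
gpt4o_mini = session_cost(0.15, 0.60)
gemini_pro = session_cost(1.25, 5.00)   # about $0.1625 per session

tiered = blended_cost(gpt4o_mini, gpt4o, cheap_share=0.5)  # 0.5 is an assumption
monthly_all_gpt4o = 20 * gpt4o  # 20 sessions per month, most expensive scenario
```

Multiplying the unrounded per-session cost by 20 gives about $6.50 for the all-GPT-4o case; the $6.60 quoted above comes from multiplying the rounded $0.33.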
Cost reduction strategies:

- Aggressive caching of common agent patterns (complexity scans, ceremony templates)
- Prompt compression for repeated context across sessions
- Summary-based context management (don't replay full history, summarize)
- Model routing: cheap models for 60–70% of tasks, expensive models only for judgment-heavy work
| Framework | Strengths for Quorum | Risks |
|---|---|---|
| Next.js 16 | Largest ecosystem (132K stars, ~68% production usage), React Server Components for streaming agent responses, Turbopack for fast dev, massive talent pool | Vercel-optimized (deploy flexibility concerns), heavier bundle (~566KB) |
| Remix / React Router 7 | Built on web standards, superior data loading model (loaders/actions), 35% smaller bundle, 10x faster HMR, no vendor lock-in, Shopify-backed | Smaller ecosystem, fewer commercial boilerplates |
| SvelteKit | Lightest bundle, growing enterprise adoption, excellent DX | Smallest talent pool, less mature ecosystem |
For Quorum's team room GUI: The interface is primarily conversational with streaming agent responses, drag-and-drop for findings/boards, and real-time multi-user collaboration (enterprise). This is a highly interactive, real-time-heavy application.
Recommendation: Next.js for ecosystem breadth, streaming support via React Server Components, and hiring pool. Remix is a strong alternative if vendor independence is prioritized.
The team room requires real-time updates: agent responses streaming, board state syncing, finding status changes.
| Technology | Latency | Use Case in Quorum |
|---|---|---|
| WebSockets | 10–50ms | Agent response streaming, team room state sync, agent-to-agent debate display |
| Server-Sent Events (SSE) | 50–100ms | Notifications, digest delivery, background status updates |
| HTTP/3 | 100–200ms | Standard API calls, document CRUD |
Architecture pattern:

- WebSocket connections for active sessions (agent conversations, real-time board)
- Redis Pub/Sub as message broker for horizontal scaling (multiple server instances)
- SSE for background notifications (feedback digest ready, finding status changes)
- Optimistic UI updates on client, confirmed by server
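The broker's role in this pattern is to fan each message out to every connection subscribed to a room, regardless of which server instance holds that connection. The sketch below imitates that behavior in a single process with `asyncio.Queue`. `InMemoryBroker` is a hypothetical stand-in for Redis Pub/Sub, not a Redis client, and the channel name `room:42` is made up.

```python
import asyncio
from collections import defaultdict


class InMemoryBroker:
    """Stand-in for Redis Pub/Sub: delivers each message to every channel subscriber."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, channel: str) -> asyncio.Queue:
        """Register a subscriber and return the queue it should read from."""
        queue = asyncio.Queue()
        self._subscribers[channel].append(queue)
        return queue

    def publish(self, channel: str, message: dict) -> int:
        """Push the message to all subscribers; return how many received it."""
        for queue in self._subscribers[channel]:
            queue.put_nowait(message)
        return len(self._subscribers[channel])


async def demo():
    broker = InMemoryBroker()
    client_a = broker.subscribe("room:42")  # e.g. one WebSocket connection
    client_b = broker.subscribe("room:42")  # a second connection in the same room
    delivered = broker.publish("room:42", {"agent": "Winston", "chunk": "Estimate:"})
    return delivered, await client_a.get(), await client_b.get()


delivered, seen_by_a, seen_by_b = asyncio.run(demo())
```

In the real architecture each server instance would subscribe to Redis and forward messages to its own WebSocket connections. The queue-per-subscriber shape stays the same.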
Several React frameworks support building chat interfaces for AI agents, including ChatKit and Tambo.
For Quorum's team room: The UI is more than chat — it's a spatial team room where agents have presence, boards are visible, and findings can be dragged between agents. This likely requires a custom UI layer built on top of a streaming chat primitive (ChatKit or Tambo for the conversational parts, custom components for boards/grids/spatial elements).
Quorum manages several interconnected data types:
| Data Type | Characteristics | Storage Need |
|---|---|---|
| Vision & features | Structured, versioned, filtered through pillars | Relational (PostgreSQL) |
| Agent conversations | Append-only, streaming, long-running | Event log + conversation store |
| Findings (feedback loop) | Lifecycle states, transitions, audit trail | Event-sourced with projections |
| Decision records | Immutable, rationale + projected impact | Append-only with references |
| Sprint/board state | Mutable, real-time, multi-user | Relational + real-time sync |
| Agent memory | Cross-session context, user preferences | Vector store + relational |
| Documents (briefs, PRDs) | Rich text, versioned, collaborative | Document store or relational |
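The finding lifecycle (Detected → Triaged → Batched → Ceremony-Ready → Approved/Deferred/Killed) and decision records are natural fits for event sourcing: every state change is stored as an immutable event, and the current state is derived by replaying those events, which yields the audit trail as a by-product.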
PostgreSQL supports this natively via JSONB payloads, table partitioning, LISTEN/NOTIFY for pub/sub, and partial indexes — no specialized event store needed.
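A minimal in-memory sketch of the event-sourced finding lifecycle follows. It keeps an append-only list where a production system would use an append-only PostgreSQL table. `FindingEvent`, `FindingLog`, and the `TRANSITIONS` map are hypothetical names. The sketch also assumes the lifecycle is strictly linear and that Approved, Deferred, and Killed are terminal, which the source does not specify.

```python
from dataclasses import dataclass

# Assumed transition rules: strictly linear, with three terminal outcomes.
TRANSITIONS = {
    "Detected": {"Triaged"},
    "Triaged": {"Batched"},
    "Batched": {"Ceremony-Ready"},
    "Ceremony-Ready": {"Approved", "Deferred", "Killed"},
}


@dataclass(frozen=True)
class FindingEvent:
    """One immutable fact: a finding moved to new_state because of actor."""
    finding_id: str
    new_state: str
    actor: str


class FindingLog:
    def __init__(self):
        self._events = []  # append-only; events are never updated or deleted

    def current_state(self, finding_id: str):
        """Projection: replay the log and keep the last state seen for the finding."""
        state = None
        for event in self._events:
            if event.finding_id == finding_id:
                state = event.new_state
        return state

    def history(self, finding_id: str) -> list:
        """Audit trail: every event for the finding, in the order it was appended."""
        return [e for e in self._events if e.finding_id == finding_id]

    def append(self, event: FindingEvent) -> None:
        """Validate the transition against the projected state, then record it."""
        current = self.current_state(event.finding_id)
        allowed = {"Detected"} if current is None else TRANSITIONS.get(current, set())
        if event.new_state not in allowed:
            raise ValueError(f"illegal transition {current} -> {event.new_state}")
        self._events.append(event)


log = FindingLog()
for new_state in ("Detected", "Triaged", "Batched", "Ceremony-Ready", "Approved"):
    log.append(FindingEvent("F-1", new_state, actor="triage-agent"))
```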
Agents need to remember across sessions — Winston (Dev Lead) learns estimation patterns, Mary remembers user segments, John knows stakeholder preferences.
Layered memory architecture:

- Working memory: current session context (conversation history, active findings), held in memory + Redis
- Episodic memory: timestamped interaction history, stored in PostgreSQL with efficient retrieval
- Semantic memory: learned facts, preferences, patterns, stored in PostgreSQL + pgvector for similarity search
- Procedural memory: learned behaviors (estimation calibration, user proficiency models), kept in structured storage
pgvector enables semantic search within PostgreSQL itself, eliminating the need for a separate vector database. "What did Winston (Dev Lead) say about payment service complexity last sprint?" becomes a vector similarity query.
Token budgeting is critical: agents can't load all memory into context. The system must intelligently select relevant memories, compress/summarize old context, and respect token limits per model.
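Memory selection under a token budget can be sketched as "rank by similarity, then pack greedily". The code below is an illustration under several assumptions: embeddings are plain Python lists rather than pgvector columns, `estimate_tokens` uses a crude four-characters-per-token heuristic, and the greedy packing is one possible policy rather than a prescribed one.

```python
import math


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def estimate_tokens(text: str) -> int:
    """Crude assumption: roughly four characters per token."""
    return max(1, len(text) // 4)


def select_memories(query_vec: list, memories: list, budget: int):
    """Rank memories by similarity to the query, then keep those that fit the budget."""
    ranked = sorted(memories, key=lambda m: cosine(query_vec, m["embedding"]), reverse=True)
    chosen, used = [], 0
    for memory in ranked:
        cost = estimate_tokens(memory["text"])
        if used + cost <= budget:  # skip anything that would overflow the budget
            chosen.append(memory)
            used += cost
    return chosen, used


memories = [
    {"id": "m1", "embedding": [1.0, 0.0], "text": "x" * 40},   # about 10 tokens
    {"id": "m2", "embedding": [0.0, 1.0], "text": "x" * 40},   # about 10 tokens
    {"id": "m3", "embedding": [0.9, 0.1], "text": "x" * 400},  # about 100 tokens
]
chosen, used = select_memories([1.0, 0.0], memories, budget=20)
```

In this run the second-most relevant memory (`m3`) is skipped because it alone exceeds the budget. That is the case where summarizing old context before retrieval would pay off.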
PostgreSQL as the primary database with pgvector extension for semantic search. Event sourcing for findings/decisions (append-only tables with JSONB). Standard relational tables for features, sprints, boards, users. Redis for real-time state and pub/sub.
Linear already has native Jira Sync (two-way sync of issues, assignees, status, labels, priority, comments). This validates the pattern and provides a reference implementation.
Architecture for Quorum's enterprise integration:
| Component | Approach |
|---|---|
| Inbound (JIRA/Linear → Quorum) | Webhooks: JIRA/Linear push changes via HTTP POST. Quorum processes and updates internal state. |
| Outbound (Quorum → JIRA/Linear) | REST API: Quorum creates/updates tickets via API when findings are approved for sprint. |
| Conflict resolution | Quorum detects conflicts (e.g., "Ticket moved to Done in JIRA but AI shows tests haven't passed") and surfaces to user. |
| Field mapping | Configurable per tenant: map Quorum finding types to JIRA issue types, statuses, priorities. |
| Sync scope | Only feedback-loop-approved items sync out. Sprint board state syncs bidirectionally. |
Key design decisions:

- Webhook processing must be idempotent (JIRA/Linear may retry)
- Queue-based processing (don't process webhooks synchronously; buffer them in a job queue)
- Conflict detection before overwrite: never silently lose data
- Per-tenant API credentials stored encrypted, scoped to specific JIRA/Linear projects
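The first three decisions above can be sketched together: deduplicate on arrival, buffer in a queue, and check for conflicts before applying. Everything here is illustrative. `delivery_id` is a hypothetical name for whatever unique identifier accompanies each delivery, the in-memory `deque` stands in for a real job queue, and the single conflict rule mirrors the "Done but tests haven't passed" example from the table.

```python
from collections import deque


class WebhookReceiver:
    """HTTP-facing side: acknowledge fast, deduplicate, and enqueue for later work."""

    def __init__(self):
        self._seen = set()    # delivery IDs already accepted
        self.queue = deque()  # stand-in for a durable job queue

    def receive(self, delivery_id: str, payload: dict) -> bool:
        """Return True if newly enqueued, False if this delivery is a retry."""
        if delivery_id in self._seen:
            return False
        self._seen.add(delivery_id)
        self.queue.append((delivery_id, payload))
        return True


def process(queue: deque, tickets: dict) -> list:
    """Worker side: apply queued updates, holding back any that conflict locally."""
    conflicts = []
    while queue:
        _delivery_id, payload = queue.popleft()
        ticket = tickets.setdefault(
            payload["ticket"], {"status": None, "tests_passed": False}
        )
        if payload["status"] == "Done" and not ticket["tests_passed"]:
            conflicts.append(payload["ticket"])  # surface to the user, do not overwrite
            continue
        ticket["status"] = payload["status"]
    return conflicts


receiver = WebhookReceiver()
first = receiver.receive("d1", {"ticket": "QRM-7", "status": "In Review"})
retry = receiver.receive("d1", {"ticket": "QRM-7", "status": "In Review"})
receiver.receive("d2", {"ticket": "QRM-7", "status": "Done"})

tickets = {"QRM-7": {"status": "In Progress", "tests_passed": False}}
conflicts = process(receiver.queue, tickets)
```

A production version would persist the seen-ID set and the queue so that deduplication survives restarts.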
MCP (Model Context Protocol) is emerging as the standard for AI agent tool integration. Quorum's agents could expose tools via MCP for extensibility, allowing third-party agents or tools to interact with Quorum's data.
| Pattern | Tenants | Isolation | Cost | Fit for Quorum |
|---|---|---|---|---|
| Shared model, isolated context | <100 | Logical (per-tenant prompts/memory) | Low | Solo tier |
| Agent pool | 100–1,000 | Process-level (pre-configured agents per tenant) | Medium | Growth phase |
| Tenant-specific deployment | Enterprise | Complete infrastructure isolation | High | Enterprise tier |
| Hybrid namespace | All | Shared infra, logical data separation | Medium | Default approach |
Recommended: Hybrid namespace as the default, with tenant-specific deployment available for enterprise customers who require it.
Data leakage prevention: 68% of organizations have experienced AI-related data leaks. Every RAG index, vector store query, file access, and agent memory must be scoped to a tenant workspace ID. This is non-negotiable.
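One way to enforce that scoping is to make unscoped access impossible at the API level: callers only ever receive a view that is already bound to a workspace ID. The sketch below illustrates the idea with an in-memory list. `TenantScopedStore` and `TenantView` are hypothetical names, and a real system would apply the same filter in SQL or through row-level security.

```python
class TenantScopedStore:
    """Holds all rows, but exposes them only through tenant-bound views."""

    def __init__(self):
        self._rows = []

    def for_tenant(self, workspace_id: str) -> "TenantView":
        """The only public entry point: every view is bound to one workspace."""
        return TenantView(self, workspace_id)


class TenantView:
    def __init__(self, store: TenantScopedStore, workspace_id: str):
        if not workspace_id:
            raise ValueError("workspace_id is required for every data access")
        self._store = store
        self._workspace_id = workspace_id

    def add(self, record: dict) -> None:
        """Stamp the workspace ID on write so callers cannot forget or spoof it."""
        self._store._rows.append({**record, "workspace_id": self._workspace_id})

    def search(self, predicate) -> list:
        """Filter by workspace first, then by the caller's predicate."""
        return [
            row for row in self._store._rows
            if row["workspace_id"] == self._workspace_id and predicate(row)
        ]


store = TenantScopedStore()
acme = store.for_tenant("ws_acme")
globex = store.for_tenant("ws_globex")
acme.add({"kind": "memory", "text": "Payment service is high complexity"})
globex.add({"kind": "memory", "text": "Checkout redesign approved"})
acme_hits = acme.search(lambda row: row["kind"] == "memory")
```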
Noisy neighbor: One tenant's heavy AI workload (e.g., running full three-pillar analysis on 50 features) can saturate LLM API rate limits and degrade others. Requires per-tenant quotas and rate limiting.
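Per-tenant rate limiting is commonly implemented with a token bucket per tenant. The sketch below is one such implementation under stated assumptions: the capacity and refill values are arbitrary, the class names are illustrative, and the clock is passed in explicitly to keep the example deterministic (a real service would more likely read `time.monotonic()`).

```python
class TokenBucket:
    """Classic token bucket: holds up to `capacity` tokens, refilled at a steady rate."""

    def __init__(self, capacity: float, refill_per_sec: float, now: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = capacity
        self.last = now

    def allow(self, now: float, cost: float = 1.0) -> bool:
        """Refill for the elapsed time, then spend `cost` tokens if available."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = max(self.last, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantRateLimiter:
    """One independent bucket per tenant, so a heavy tenant only throttles itself."""

    def __init__(self, capacity: float, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self._buckets = {}

    def allow(self, tenant_id: str, now: float, cost: float = 1.0) -> bool:
        if tenant_id not in self._buckets:
            self._buckets[tenant_id] = TokenBucket(self.capacity, self.refill_per_sec, now)
        return self._buckets[tenant_id].allow(now, cost)


limiter = TenantRateLimiter(capacity=2, refill_per_sec=1.0)
heavy = [limiter.allow("tenant_a", now=0.0) for _ in range(3)]  # third call is throttled
other = limiter.allow("tenant_b", now=0.0)                       # unaffected by tenant_a
later = limiter.allow("tenant_a", now=1.0)                       # one token refilled
```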
Unbounded agent work: Agents can loop through tool calls or accumulate unbounded token counts. Explicit per-session and per-tenant cost caps required.
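A cost cap can be enforced with a small guard that every LLM call and tool call must pass through. The sketch below shows the per-session case. `SessionBudget`, `BudgetExceeded`, and the limit values are hypothetical, and a per-tenant cap would follow the same shape with a longer-lived counter.

```python
class BudgetExceeded(RuntimeError):
    """Raised when a session would go past one of its hard limits."""


class SessionBudget:
    def __init__(self, max_cost_usd: float, max_tool_calls: int):
        self.max_cost_usd = max_cost_usd
        self.max_tool_calls = max_tool_calls
        self.spent_usd = 0.0
        self.tool_calls = 0

    def charge(self, cost_usd: float) -> None:
        """Record LLM spend, refusing any charge that would exceed the cap."""
        if self.spent_usd + cost_usd > self.max_cost_usd:
            raise BudgetExceeded("session cost cap reached")
        self.spent_usd += cost_usd

    def record_tool_call(self) -> None:
        """Count tool calls so a looping agent is stopped after a fixed number."""
        if self.tool_calls >= self.max_tool_calls:
            raise BudgetExceeded("tool-call limit reached")
        self.tool_calls += 1


budget = SessionBudget(max_cost_usd=0.50, max_tool_calls=2)
budget.charge(0.30)
budget.charge(0.15)
budget.record_tool_call()
budget.record_tool_call()
```

Checking before spending, rather than after, means the session never overshoots the cap. The trade-off is that a final large call may be refused even though some budget remains.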
Access control complexity: AI agents need five identity layers — trigger identity (who started it), execution identity (what credentials it uses), authorization identity (what it's allowed to do), tenant identity (whose data it accesses), and channel identity (where results go).
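The five layers can be carried as a single immutable record that travels with every agent run. The sketch below is one possible shape, not a prescribed schema: the field names follow the five layers above, while `AgentIdentity`, `can_perform`, and the example values are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class AgentIdentity:
    trigger: str              # who started the run
    execution: str            # which credentials the agent uses
    authorization: frozenset  # actions the agent is allowed to perform
    tenant: str               # whose data it may access
    channel: str              # where results are delivered

    def __post_init__(self):
        # Refuse to construct an identity with any layer left empty.
        for f in fields(self):
            if not getattr(self, f.name):
                raise ValueError(f"missing identity layer: {f.name}")


def can_perform(identity: AgentIdentity, action: str, resource_tenant: str) -> bool:
    """Allow an action only if it is authorized and the resource is in the same tenant."""
    return action in identity.authorization and resource_tenant == identity.tenant


identity = AgentIdentity(
    trigger="user:ana",
    execution="svc:quorum-sync",
    authorization=frozenset({"ticket:create"}),
    tenant="ws_acme",
    channel="team-room",
)
```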
Decompose the agent lifecycle into distinct services:
| Service | Responsibility |
|---|---|
| Gateway | API handling, auth, rate limiting, tenant routing |
| Orchestrator | LLM orchestration (LangGraph), agent lifecycle, state management |
| Scheduler | Feedback loop timing, ceremony scheduling, digest generation |
| Memory | Agent memory, conversation history, semantic search |
| Sandbox | Tool execution isolation (JIRA API calls, web research) |
| Board/State | Sprint board, backlog, finding lifecycle, real-time sync |
Recommended technology stack:

| Layer | Technology | Rationale |
|---|---|---|
| Frontend | Next.js 16 + React | Largest ecosystem, streaming RSC, real-time capable |
| Conversational UI | Custom + ChatKit/Tambo primitives | Team room is more than chat — needs spatial elements |
| Real-time | WebSockets + Redis Pub/Sub | Sub-50ms agent streaming, horizontal scaling |
| Agent orchestration | LangGraph | State persistence, human-in-the-loop, durable execution |
| LLM providers | Multi-provider (OpenAI + Anthropic + Google) | Tiered model strategy, redundancy, cost optimization |
| Primary database | PostgreSQL + pgvector | Relational + vector search + event sourcing in one |
| Cache/pubsub | Redis | Session state, real-time messaging, agent working memory |
| Job queue | Redis-based (BullMQ) or PostgreSQL (Graphile Worker) | Webhook processing, async agent tasks, digest generation |
| Auth | Clerk or Auth.js | Multi-tenant, team management, SSO for enterprise |
| Deployment | Vercel (frontend) + Railway/Fly.io (services) or AWS | Start simple, scale to dedicated infra |
| Observability | LangSmith (agents) + Sentry (app) + PostHog (analytics) | Agent debugging, error tracking, usage analytics |
Key risks and mitigations:

| Risk | Severity | Mitigation |
|---|---|---|
| LLM cost unpredictability | High | Tiered model strategy, aggressive caching, per-tenant cost caps, monitoring |
| Agent quality consistency | High | Extensive prompt engineering, evaluation suites, A/B testing agent behaviors |
| Multi-agent debate reliability | Medium | Structured debate protocols (not free-form), bounded turn limits, fallback to single-agent |
| Context window limits | Medium | Summary-based memory management, token budgeting, selective memory retrieval |
| Data leakage between tenants | Critical | Workspace-scoped everything, automated isolation testing, security audits |
| LangGraph vendor dependency | Medium | Abstract orchestration behind internal interfaces, monitor OpenAI Agents SDK maturity |
| Real-time scaling | Medium | Redis Pub/Sub proven pattern, but WebSocket connection management at scale needs planning |
| JIRA/Linear API rate limits | Low | Queue-based sync, batching, respect rate limits, exponential backoff |