CardEx Core
2026


How Capital One BC&P Turns Four Post-Acquisition Data Silos Into One Context Layer for Field Sales AI

My Role
Product Strategy, Platform Architecture & GenAI Systems Design
Project Timeline
June 2026
Pilot Market
Capital One BC&P Field Sales Organization
Project Stack
Claude
Miro
Notion
Cursor

This is a PM portfolio case study targeting the Capital One BC&P Manager, Product Management — GenAI Transformation role (Req ID R240507). It covers a full double-diamond process — market research through a 20-assumption register — built entirely from public information. The interactive simulator below is a functional prototype of the platform mechanics described in the case study.

The thesis: Capital One's field sales AI tools produce inconsistent recommendations not because the models are wrong, but because each application retrieves customer context independently. CardEx Core is horizontal context infrastructure that fixes the data layer, not the model layer.

Live Simulator
CardEx Core Platform Simulator
Four customer scenarios · Platform layer animation · D1–D4 Eval framework · Persistent feedback loop
contextos.capitalone.internal / field-sales-ai / context-platform
Phase 0
Research Brief
Company snapshot · AI in production · Business urgency · RAG vs fine-tuning · HITL chain · SR 26-2 · A-01–A-07
What Capital One Is Today

Capital One is a cloud-native financial institution with $669B in total assets (Dec 31, 2025) and 100M+ customers. It is the only major U.S. bank to have migrated entirely to the public cloud, closing its last data center in 2020. Competitors (JPMorgan, BofA) are still paying to migrate; Capital One is building higher-order capabilities on top of a stack competitors haven't reached yet.

Total Assets
$669B
Dec 31, 2025
Brex Acquisition
$5.15B
Closed April 7, 2026
SMB Card Franchise
#3 in U.S.
CEO Fairbank, Q4 2025 earnings
Event | Date | Scale | Why it matters to this role
Discover Financial acquisition | 2025 | ~$35B | Created proprietary payment network; Capital One now competes directly with Visa/Mastercard
Brex acquisition closed | April 7, 2026 | $5.15B · 35K+ customers | Added Brex corporate cards + spend management + agentic AI workflows. Brex operating independently post-acquisition.
Top-3 corporate card issuer | April 2026 | $100B+ combined card spend | Same field reps now sell across 4 product lines from 4 separate data ecosystems with no unified customer view
The FSO complexity spike: Before Brex, field reps sold Spark Business cards from one ecosystem. Post-Brex, those same reps navigate Spark cards + Brex corporate cards + Brex spend management + SMB banking. A rep pitching a mid-size construction firm today needs to synthesize card history (Capital One CRM), expense management behavior (Brex platform), payment flow patterns (Discover network), and credit profile (Capital One underwriting): four data sources living in four separate systems with no unified view.
Agent Assist Tool · Internal · Customer Service
84% → 93% search relevance

Used 10,000+ times by customer service agents. Built on proprietary Capital One data. Agents use it to search for relevant information in real-time during customer calls. Proves good retrieval drives trust — and bad retrieval destroys it.

Chat Concierge · External · Auto Dealers
55% improvement in leads

Multi-agent system: one agent communicates with customer, one creates action plan based on business rules, one assesses other agents' outputs, one validates and explains the plan. Latency reduced fivefold since launch. Prem Natarajan (EVP, Enterprise AI): “We want to start off at the low end of the risk spectrum, but also find use cases with impact and enough complexity that we can learn from it.”

The gap this role fills: Both Agent Assist and Chat Concierge were built as vertical, domain-specific systems with their own data retrieval layers. The field sales org is the next frontier — requiring synthesis across multiple product lines, not just one vertical. Milind Naphade (SVP, AI Foundations): “We'd like to bring this capability to more of our customer-facing engagements. But we want to do it in a well-managed way.”

The JD is explicitly scoped to building the horizontal foundation that enables multiple downstream AI applications to run on shared, trusted customer context.

MIT NANDA 2025
95%

of enterprise GenAI pilots delivered no measurable P&L impact. Core cause: tools that “do not retain feedback, adapt to context, or improve over time.”

Stanford AI Index 2025
41%

of LLM failures in enterprises caused by upstream data issues — not model problems. The model is often fine. The data infrastructure is broken.

Deloitte Dec 2025
11%

of enterprises with agentic AI use it in production. Only 20% of enterprise AI tools work cross-functionally (McKinsey). The pilot-to-production drop exists because trust fails before scale.

IBM (June 2026): “Many companies operate with fragmented and siloed data environments… Critical business information is often spread across disconnected systems and inconsistent data formats. AI systems struggle in these environments because poor-quality data weakens the performance and reliability of AI models. If enterprise data sources contain gaps or errors, AI agents can make flawed recommendations or run incorrect actions at scale.”

Force 1 · April 7, 2026
Brex tripled product complexity overnight

Without a shared context platform, every new AI application built for the expanded portfolio replicates the same fragmented data retrieval problem. This is not a future risk — it is happening now as Brex integration begins. Each new AI tool adds to the fragmentation rather than solving it.

Force 2 · April 17, 2026
SR 26-2 replaced SR 11-7

New model risk management guidance is live. GenAI/agentic systems are technically “out of scope” but regulators and internal audit are already applying MRM expectations by analogy. A “forthcoming RFI on AI/GenAI/agentic-AI model risk” is signaled. A shared context platform with built-in provenance satisfies governance requirements once for all downstream apps.

Force 3 · Ongoing
AI ROI measurement imperative

JD explicitly states: “deep experience working with Generative AI systems, with a particular emphasis on building measurement for GenAI in production.” Chat Concierge's 55% lead lift exists because they instrumented the measurement. FSO AI needs the same infrastructure. Without it, leadership is flying blind.

Customer data (spend behavior, headcount, payment timing, card utilization) changes weekly or monthly. Fine-tuning would require continuous retraining — operationally unsustainable. More critically, fine-tuned models cannot satisfy SR 26-2 source attribution requirements: if an AI recommendation led a rep to offer a higher credit limit to a business that then defaulted, the bank cannot trace which data drove the recommendation. RAG provides that trace natively.

Dimension | RAG ✓ | Fine-Tuning ✗
Customer data freshness | Retrieves at inference time, so context is as current as the source | Baked into weights at training time; stale within weeks for dynamic customer data
Data update cadence | Update the source; model adjusts automatically | Requires retraining on every material data change
Source attribution (SR 26-2) | Every response traceable to retrieved documents | Black-box: weights don't reveal which training data drove a decision
Cost model | Upfront indexing + modest retrieval costs; scales with query volume | High upfront training compute; recurring retraining as customer data evolves
Operational fit | Index update = configuration operation | Model retraining = ML engineering sprint
51% of enterprise AI deployments in production use RAG (Menlo Ventures, 2024 State of Generative AI in the Enterprise). RAG is “the default starting point for most enterprise AI deployments particularly where data freshness, regulatory compliance, and speed to production are priorities” (SculptSoft, June 2026).
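The source-attribution row in the comparison above can be illustrated with a minimal sketch. It assumes a toy keyword-overlap retriever standing in for real vector search; `SourceDoc`, `retrieve`, `recommend_with_trace`, and the source names are hypothetical and not part of any Capital One system. The point is only that the IDs of the retrieved documents travel with the prompt context, so a recommendation can later be traced to the data that drove it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDoc:
    doc_id: str   # hypothetical identifier of a record in a source system
    source: str   # illustrative source name, e.g. "spark_crm" or "brex_feed"
    text: str


def retrieve(query: str, corpus: list[SourceDoc], k: int = 2) -> list[SourceDoc]:
    """Rank documents by naive keyword overlap (a stand-in for vector search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(doc.text.lower().split())), doc) for doc in corpus]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    # Keep the top k documents that share at least one term with the query.
    return [doc for score, doc in ranked[:k] if score > 0]


def recommend_with_trace(query: str, corpus: list[SourceDoc]) -> dict:
    """Return prompt context together with the IDs of the documents behind it."""
    docs = retrieve(query, corpus)
    return {
        "prompt_context": "\n".join(doc.text for doc in docs),
        "retrieved_context_ids": [doc.doc_id for doc in docs],
    }
```

A fine-tuned model has no equivalent of `retrieved_context_ids`: the training data is folded into the weights, so there is nothing to log per recommendation.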
1. Rep receives AI recommendation. Starting point: generated from whatever context was retrieved by the downstream AI application.
2. Rep acts. [Uses it] [Modifies it] [Ignores it] [Flags it as wrong]: four distinct signal types with different confidence levels.
3. System captures. recommendation_id + retrieved_context_ids + generated_output + rep_action: all four fields required for a complete eval event.
4. Human evaluators (or automated eval) score. Retrieval quality: was the right context retrieved? Generation quality: coherent, accurate, safe? Business outcome (30-day lag): did the pitch succeed? Did the rep follow up?
5. Eval dataset grows. (context, output, score) triples accumulate: the labeled dataset that drives all downstream model improvement.
6. Two improvement levers selected by failure mode. L1: Retrieval tuning (embedding weights, chunk size, metadata filters). L2: Generation improvement (prompt engineering, guardrail adjustment, model swap). PM selects which lever based on which failure mode the eval data reveals.
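The capture and lever-selection steps above can be sketched as follows. The four field names come from the case study; the `RepAction` values, the 0.7 threshold, and the rule of checking retrieval before generation are illustrative assumptions, not a documented policy.

```python
from dataclasses import dataclass
from enum import Enum


class RepAction(Enum):
    USED = "used"
    MODIFIED = "modified"
    IGNORED = "ignored"
    FLAGGED_WRONG = "flagged_wrong"


@dataclass(frozen=True)
class EvalEvent:
    recommendation_id: str
    retrieved_context_ids: tuple[str, ...]
    generated_output: str
    rep_action: RepAction

    def __post_init__(self) -> None:
        # An event with a missing field cannot be scored, so reject it early.
        if not (self.recommendation_id and self.retrieved_context_ids
                and self.generated_output):
            raise ValueError("all four fields are required for a complete eval event")


def select_lever(retrieval_score: float, generation_score: float,
                 threshold: float = 0.7) -> str:
    """Pick an improvement lever from eval scores (threshold is an assumption)."""
    # Assumed ordering: fix retrieval first, because generation quality
    # cannot be judged fairly when the retrieved context was wrong.
    if retrieval_score < threshold:
        return "L1_retrieval_tuning"
    if generation_score < threshold:
        return "L2_generation_improvement"
    return "no_action"
```

Logging retrieval and generation scores separately is what makes the lever choice possible; a single blended quality score would hide which failure mode occurred.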
Why the PM owns this, not the data scientist: The PM defines what counts as a “good” recommendation. Is it good if the rep read it (engagement)? Used it (behavior)? Led to an upgrade (30-day outcome)? Customer stayed 12 months (retention)? Each definition produces a different training signal. The data scientist builds the pipeline; the PM decides what the pipeline measures.
ID | Assumption | Basis | Urgency | Validation · Week
A-01 | FSO lacks unified customer view across Spark, Brex, Discover data sources | Brex operating independently post-acquisition; 12–24 month typical integration timeline | Critical | Architecture review with FSO engineering leads, Week 1
A-02 | No shared context platform exists; each FSO AI app has its own retrieval layer | JD language ('design and build a horizontal foundation') implies platform doesn't exist | Critical | Product landscape review, Week 1
A-03 | Data siloing is causing measurable adoption friction among field reps | Industry-level evidence (MIT NANDA, Stanford AI Index) applied directionally to Capital One FSO | High | Contextual inquiry with 6–8 FSRs across two regional offices, Month 1
A-04 | MRM is applying SR 26-2 principles to field sales GenAI by analogy | Documented pattern at large regulated institutions (Databricks, April 2026) | High | MRM team introductory meeting, Week 2
A-05 | Current tools pass CRM data to LLMs without standardized PII preprocessing | JD 'standardize' language implies current state is non-standard | Medium | Architecture review of existing app prompt construction, Week 1
A-06 | RAG, not fine-tuning, is the correct architecture for CardEx Core MVP | Data freshness, SR 26-2 attribution, 51% enterprise RAG production adoption | High | Data refresh cadence audit per source system, Week 2
A-07 | Rep behavior signals are primary HITL signal at MVP; outcome signals lagged 2–4 weeks via CRM | Standard enterprise AI feedback loop pattern; CRM instrumentation unknown | High | CRM instrumentation review, Week 2
Phase 2
Discovery
Problem statement · Current state · 5 Whys · 6 Affinity clusters · Stakeholder map · Personas · Empathy maps · Journey map · 4 Structural failure modes · A-08–A-11
How Capital One's field sales AI program is currently trying to fix the problem
How do we make each field sales AI tool more accurate?
The question CardEx Core is built to answer
How do we ensure that every field sales AI tool reasons from the same version of the customer?
BC&P's field sales AI tools produce inconsistent recommendations because each application retrieves customer context independently, with no shared platform ensuring that every tool reasons from the same version of the same customer at the same point in time. The problem is not that the models are bad. The problem is that there is no shared version of the customer beneath them.
Current State — Fragmented Retrieval (A-08 · Inferred · Not yet validated)

Field Sales Rep
    │
    ├── Lead Scoring AI ──── Spark CRM (Capital One schema)
    ├── Pitch Recommendation AI ──── Transaction History DW (daily batch)
    ├── Credit Suggestion AI ──── Credit Profile (monthly refresh)
    └── [Post-Brex] Spend Insights AI ──── Brex Platform (company-centric schema · batch feed)

Each application: owns its own retrieval pipeline · applies its own preprocessing (or none)
                  has its own freshness cadence · has no shared entity resolution
                  produces no structured output that feeds back to improve recommendations
                  logs nothing in a consistent schema
Brex Platform is NOT currently connected to any of the above AI applications · Brex customers appear in CRM with no Brex context

Starting observation: Field sales AI recommendations are inconsistent and distrusted by reps.

Why 1
Why are the recommendations inconsistent?
Because each AI application retrieves customer context from a different data source with a different freshness cadence and a different customer entity schema.
Why 2
Why does each application retrieve context independently?
Because Capital One's field sales AI was built vertically — one use case at a time, each team owning its full stack — with no horizontal abstraction layer for shared customer context.
Why 3
Why was it built vertically without a shared layer?
Because Capital One's GenAI deployment was in a proof-of-concept phase for its first production deployments (Chat Concierge for auto, Agent Assist for service). Moving fast required each team to own their full stack. No PM was assigned to own the horizontal layer — because the horizontal problem wasn't visible when each vertical was the only one.
Why 4
Why wasn't the horizontal problem visible earlier?
Because each vertical deployment was in a single domain (auto dealerships, customer service) with a single, well-understood data source. The cross-domain consistency problem only emerges when multiple products and multiple data sources must be reconciled for the same customer — which became the FSO's reality when Brex was acquired.
Why 5
Why did the Brex acquisition make the horizontal problem visible and urgent?
Because Brex operates on a different technology stack, a different customer entity model (company-centric, not person-centric), and a different data freshness pattern than Capital One's existing systems. Integrating Brex context into FSO AI requires entity resolution, schema normalization, and freshness reconciliation — exactly the functions a shared context platform provides. Without that platform, the Brex context either never reaches the FSO AI tools or arrives inconsistently, making the fragmentation problem acute rather than latent.
Root cause — structural, not organizational: An organizational explanation would be: teams didn't communicate well enough, or there was no governance process to enforce shared standards. The structural explanation is: the architecture has no abstraction layer for shared context. Even if every team communicated perfectly, they would still build independent retrieval pipelines because there is no shared API to call. The fix requires a platform, not a process.
Cluster 1
Context Fragmentation
The customer exists differently in every system
  • Spark card, Brex platform, Capital One credit, and CRM contact data use different entity schemas with no common key
  • Brex operating independently post-acquisition — data arrives via batch feed, not real-time API
  • 41% of LLM enterprise failures trace to upstream data issues, not model problems (Stanford AI Index 2025)
  • CRM optimized for rep workflow — sparse free-text, milestone-updated, no behavioral enrichment
  • Unresolved entity duplicates + stale records in a raw multi-system prompt produce worse outputs than a well-preprocessed compact summary
Cluster 2
Trust Collapse and Adoption Failure
Reps who got burned once stop using the tools entirely
  • 95% of enterprise GenAI pilots delivered no measurable P&L impact — tools that 'do not retain feedback, adapt to context, or improve over time' (MIT NANDA 2025)
  • Only 20% of enterprise AI tools work cross-functionally (McKinsey, Dec 2025)
  • 'It doesn't retain knowledge of client preferences or learn from previous edits. It repeats the same mistakes.' (CIO, MIT NANDA)
  • Capital One's own evidence: Agent Assist 84% → 93% — good retrieval drives trust, bad retrieval destroys it
Cluster 3
Feedback Loop Absence
The AI tools do not learn. Same errors in perpetuity.
  • No consistent schema for capturing (retrieved context, generated recommendation, rep action, business outcome) — the four elements required for an eval dataset
  • Rep behavior signals (used/modified/ignored/flagged) available immediately; business outcome signals lagged 2–4 weeks — both required for complete eval loop
  • Eval invisibility: without shared logging, structurally impossible to determine whether a bad recommendation was caused by retrieval failure or generation failure
Cluster 4
Regulatory Pressure
The compliance clock started April 17, 2026
  • SR 11-7 replaced by SR 26-2 on April 17, 2026 — a 'forthcoming RFI on AI/GenAI/agentic-AI model risk' is signaled
  • GenAI systems technically 'out of scope' in SR 26-2 text but supervisors and internal audit 'already applying MRM expectations by analogy' (Databricks, April 2026)
  • SR 26-2 requires: inventory tiered by materiality, controls applied proportionately, lifecycle defensible end-to-end, evidence of governance generated automatically
  • PII exposure: raw CRM records (customer name, account number, EIN) flowing directly into LLM prompts create both regulatory and safety risk
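The PII exposure point above suggests a preprocessing step upstream of every LLM call. The sketch below is a minimal illustration under stated assumptions: the regular expressions are simplified stand-ins (a real EIN or account-number detector would need validation and review by the privacy team), and `redact` is a hypothetical helper, not an existing API.

```python
import re

# Illustrative patterns only; production detection would be far stricter.
EIN_RE = re.compile(r"\b\d{2}-\d{7}\b")      # EIN format: NN-NNNNNNN
ACCOUNT_RE = re.compile(r"\b\d{12,19}\b")    # assumed account/card number length


def redact(text: str, customer_names: list[str]) -> str:
    """Replace EINs, account numbers, and known customer names with placeholders."""
    text = EIN_RE.sub("[EIN]", text)
    text = ACCOUNT_RE.sub("[ACCOUNT]", text)
    for name in customer_names:
        # Names come from the structured CRM record, so match them literally.
        text = re.sub(re.escape(name), "[CUSTOMER]", text, flags=re.IGNORECASE)
    return text
```

Running this once in a shared layer, rather than in each application's prompt builder, is what gives governance a single point of control.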
Cluster 5
Post-Acquisition Complexity Spike
Brex turned a linear problem into an exponential one
  • Before Brex: FSO sells Spark card products from one data ecosystem. After Brex (April 7, 2026): Spark + Brex corporate cards + Brex spend management + SMB banking from four separate ecosystems
  • Brex entity model is company-centric (built for CFO teams) vs. Capital One's person-centric model — entity resolution is non-trivial
  • Ramp explicitly framing acquisition as creating 'uncertainty about product direction, pricing, underwriting, and integration' — competitive pressure from a unified-data competitor is immediate
Cluster 6
Measurement Void
Leadership cannot tell whether the AI investment is working
  • No north star metric for recommendation quality — each team defines 'good' differently: engagement, adoption, conversion, retention
  • Capital One can point to Chat Concierge's 55% lead lift (Fortune, Dec 2025) — but cannot point to an equivalent FSO AI metric
  • Without a shared logging layer, structurally impossible to determine whether a bad recommendation was caused by retrieval failure or generation failure — two distinct problems that get conflated and neither gets fixed
  • 'More than half of generative AI budgets devoted to sales and marketing tools, yet MIT found the biggest ROI in back-office automation' — without measurement, AI budgets are allocated on intuition, not evidence
Ring 1 · Internal Platform Users
Stakeholder | Primary Pain | What They Need from CardEx Core
Field Sales Reps (FSRs) | Contradictory recommendations damage credibility with customers | Recommendations they can trust enough to act on without independent verification
Sales Managers / Regional Directors | No visibility into recommendation quality across team; AI impact on pipeline invisible | Team-level dashboard: adoption rate, accuracy trend, outcome correlation
Field Sales AI Product Team | Builds retrieval infrastructure from scratch for every new AI application | Stable, documented Context API; new apps integrate in days, not months
Data Science / ML Engineers | Eval datasets built ad hoc; no systematic capture of production input-output pairs | Structured (context, output, score) triples automatically generated from every recommendation event
Data Governance / Privacy | Raw CRM records flowing into LLM prompts without sanitization | PII preprocessing layer upstream of all LLM calls; single point of compliance control
Ring 2 · Downstream Users (experience the output)
Stakeholder | Connection to CardEx Core | What Platform Failure Looks Like
Small Business Owners (Spark) | Subject of recommendations; receive pitches shaped by AI output | Rep calls with wrong product offer because AI reasoned from stale or incomplete context
Mid-Market Corporate Customers (Brex) | Newly in-scope post-acquisition; different financial profile than Spark customers | Rep has no understanding of Brex spend patterns; pitch defaults to generic card offer
Startup Founders (Brex) | High-velocity customers; context changes rapidly with funding rounds and headcount spikes | AI recommendation lags 6–8 weeks behind actual company state; rep pitches as if company is still in seed stage
Ring 3 · Platform Stakeholders (constrain or enable)
Stakeholder | Constraint | Enablement
Model Risk Management | SR 26-2 principles: source attribution, documentation, independent validation, ongoing monitoring | If CardEx Core satisfies MRM requirements, it satisfies them for all downstream apps simultaneously
Brex Engineering | Brex data arrives via batch/API, not direct DB access; entity model differs from Capital One's | Brex AI-native architecture (agentic workflows, expense automation) can enrich context if properly integrated
Capital One Cloud / Infrastructure | All data flows must comply with Capital One's cloud security architecture | Cloud-native infrastructure means context platform can be built on existing tech stack without new procurement
Enterprise AI (Natarajan's org) | Field sales AI must align with enterprise AI strategy (open-weight models, proprietary data) | Provides model infrastructure and AI governance patterns already in use for Chat Concierge and Agent Assist
Internal · Primary User
Maya Chen
Senior Field Sales Representative
“The tool would be useful if I could trust it. But I can't trust it, so I use it as a starting point and then verify everything it says. That's not useful. That's just extra steps.”
Context: 6 years at Capital One; 40+ SMB accounts across Pacific Northwest; asked to sell Brex products since April 2026. Was a top performer before AI tools arrived — knew her customers through manual research. Now has 40% more accounts and is expected to use AI to compensate for lost research time.

What she does: Opens the AI tool, skims the recommendation, checks 2–3 things manually, then decides whether to use it. This takes longer than just doing the research manually. She is net negative on the AI tool's time savings.

Pain points: Recommendations reference products the customer already has (stale context) · Pitch suggestions don't reflect business changes · After Brex acquisition: no idea what context the AI has on inherited Brex customers
Needs from CardEx Core: Single customer summary reflecting most recent state across all data sources · Freshness indicator: “Context last updated 3 days ago” · Confidence signal: High confidence (multiple recent signals) vs. Low confidence (old data, sparse signals)
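The freshness indicator and confidence signal Maya asks for could be derived from timestamps the platform already holds. This is a minimal sketch: the 30-day recency window and the two-signal minimum are assumed thresholds chosen for illustration, and both function names are hypothetical.

```python
from datetime import date


def freshness_label(last_updated: date, today: date) -> str:
    """Render the rep-facing freshness indicator."""
    days = (today - last_updated).days
    return f"Context last updated {days} day{'s' if days != 1 else ''} ago"


def confidence_label(signal_dates: list[date], today: date,
                     recent_days: int = 30, min_recent: int = 2) -> str:
    """High confidence needs multiple recent signals; otherwise report low."""
    recent = [d for d in signal_dates if (today - d).days <= recent_days]
    return "High confidence" if len(recent) >= min_recent else "Low confidence"
```

Both labels are deliberately coarse: the rep needs a quick read on whether to verify, not a probability.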
Internal · Adoption Lever
David Torres
Regional Sales Director
“The problem isn't the idea. The idea is right. The problem is the data. My reps know this business. They can tell when the AI is wrong. But they don't have time to verify everything, so they ignore it.”
Context: 12 reps across Texas and Oklahoma; responsible for $180M in annual portfolio value. Pushed team to adopt tools at launch. Two reps had embarrassing customer interactions based on wrong recommendations. Usage dropped sharply. Now tells team: “use the tools as a starting point, but always verify.”

What he does: Reviews AI adoption metrics monthly; doesn't look at recommendation accuracy because it isn't reported. Attributes good quarter performance to rep skill, bad quarter to market conditions — AI impact is invisible to him.
Needs from CardEx Core: Team-level view showing recommendation quality trend, adoption rate by rep, outcome correlation · Leadership-ready evidence the AI investment is working (he is regularly asked this question and cannot answer it)
Downstream · Primary Consumer
James Okafor
Small Business Owner · Atlanta
“They called me last month about a cash rewards card. I already have one. And I've been using Brex for my team expenses for two years. Do they not know that?”
Context: Owns a restaurant equipment supply company; 38 employees. Capital One Spark Cash Plus card since 2021. Grew from 12 to 38 employees 2022–2025; moved to partial Brex spend management in Q3 2024. Been pitched on product upgrades three times — all three referenced his 2022–2023 spend profile, not his current situation. Never offered a Brex product, which is the product he actually uses.

What he does: Takes the call. Politely declines. Continues evaluating Ramp. Mentions to a founder friend that “Capital One doesn't really know what your business needs.”
Platform success: Rep opens with “I see your team has grown significantly — are you finding the current card limits still fit your volume?” · Rep knows he uses Brex and asks whether the combined Capital One + Brex solution would simplify his operations · He doesn't have to explain his business situation from scratch
Edge Case · Adoption Floor
“Marcus”
The Burned Skeptic
“I tried it for three months. Two customer calls went wrong. I tell the new reps: don't rely on the AI.”
Why this persona matters: 8 years at Capital One; known as the “old school” rep in his office. Used the AI recommendation tool for 3 months at launch; had two customer interactions go wrong based on stale recommendations. Now actively discourages adoption — contagious behavior pattern that represents the adoption floor.

His trust cannot be rebuilt by improving the AI slightly. It requires a fundamentally different experience: a recommendation he did not expect that turned out to be correct, demonstrated via an outcome he cares about.
CardEx Core must: Produce ≥1 visibly correct recommendation in a high-stakes situation · Show freshness indicator distinguishing “data from last week” vs. “guessing based on 2023 data” · Not require him to change his workflow
Maya Chen · Field Sales Rep · Internal User
THINKS
Is this recommendation based on current data or old data? I have no way to know. · My manager will ask why my adoption score is low. I don't want to explain the tool is unreliable. · The Brex customers I inherited after the acquisition — I have no idea what the AI knows about them.
FEELS
Frustrated that the tool adds steps instead of removing them. Mildly anxious before customer calls where she didn't have time to verify the AI recommendation. Proud of her reputation for knowing customers — threatened by a tool that might undermine that reputation.
SAYS
'I trust my own research more than the tool.' · 'The tool would be useful if it was accurate.' · 'I use it as a starting point.' (She says this; she actually skips it most of the time.)
DOES
Opens the tool, skims quickly, checks 2–3 facts manually before acting. On busy days, skips the tool entirely. Never flags a wrong recommendation formally — doesn't know how and doesn't think it helps. Doesn't log pitch outcomes consistently.
PAIN POINTS
Stale recommendations · No confidence signal · No way to know what the AI knows about Brex customers · Formal feedback mechanism is unclear or absent
GAINS
If CardEx Core works: she walks into every meeting knowing she has the most current customer picture, without spending 20 minutes on manual research. Recommendation accuracy is high enough that she can act on first read for routine accounts.
James Okafor · Small Business Owner · Downstream Consumer
THINKS
Do they actually know my business, or is this a generic call? · They have all my transaction history. Why are they pitching me something I already have? · Ramp's rep knew things about my business without me telling them.
FEELS
Undervalued as a customer. Mildly annoyed at the wasted call. Not hostile — he likes Capital One as a bank — but increasingly open to switching spend management to a vendor who seems to understand him.
SAYS
Politely declines the offer. 'Thanks, I'll think about it.' Does not give negative feedback directly.
DOES
Takes the call, declines, logs no complaint. Continues evaluating Ramp. Mentions to a founder friend that 'Capital One doesn't really know what your business needs, they just sell cards.'
PAIN POINTS
Irrelevant pitches · Being treated as a generic SMB rather than a specific business at a specific stage · Having to explain his business situation from scratch every time
GAINS
If CardEx Core works: rep opens the call with a question that demonstrates understanding. James doesn't have to explain his growth. The offer is relevant. The conversation is short and productive. He recommends Capital One to other founders.

Scenario: Maya has a pitch meeting in 3 hours with a Brex customer she inherited post-acquisition. She has not met this customer before.

Stage | Rep Action | Current System State | DNF Risk
1. Lead surfaces | Receives notification in CRM | CRM shows meeting; AI tool not yet opened | (none)
2. Context gathering | Opens AI recommendation tool | Tool retrieves from Spark data (complete). Brex data: partial batch, 6 weeks old. | DNF-1: If rep trusts the recommendation without knowing the Brex lag, she walks in with an outdated picture. Customer notices.
3. Recommendation review | Reads AI recommendation | References Q1 spend volume; customer's Q2 volume is 40% higher | DNF-2: No freshness indicator means rep cannot assess confidence. Trusts blindly or verifies everything; both are suboptimal.
4. Manual verification | 20 min manually pulling transaction data | Finds Q2 data showing 40% volume increase | DNF-3: 20 min manual work × 5 meetings/week = 100 min/week of context work that should be automated.
5. Pitch execution | Calls customer with updated pitch based on manual research | AI tool is not tracking the call | DNF-4: AI tool has no record that its recommendation was wrong. Will make the same stale recommendation for the next rep who covers this customer.
6. Outcome logging | Should log pitch outcome in CRM | No structured field for “AI recommendation quality rating” | DNF-5: Without structured outcome logging tied to recommendation IDs, the eval dataset never grows. AI tool never improves.
7. Recommendation improvement | (none) | No feedback flows back to recommendation engine | DNF-6: As portfolios evolve, staleness gap widens over time. Problem gets worse, not better, without active feedback.
Touchpoint | What Happens | James's Experience | Platform Failure Signature
Incoming call from rep | Rep calls to offer a pitch | “Another card pitch”: low expectations based on prior calls | Stale context → wrong product offer
First 60 seconds | Rep opens with product offer | If offer references something he already has: friction. If offer references his actual situation: conversation. | Platform quality is experienced here; James has no visibility into the AI, only the outcome
Product discussion | Rep and James discuss the offer | If rep seems to know his business: trust builds. If generic: James disengages politely. | Context accuracy determines rep's ability to engage authentically
Decision point | James decides to engage further or decline | Declines without explanation if pitch is irrelevant | Churn signal not captured as a platform failure; attributed to “market conditions”
6 months later | James evaluates whether to move spend management to Ramp | Ramp rep called with a pitch that reflected his actual Q2 volume | Competitive loss partially attributable to context accuracy gap at prior Capital One touchpoint
The small business owner never knows CardEx Core exists. His experience of platform quality is entirely mediated through the rep. A platform failure registers as “Capital One doesn't understand my business” — not as a technology problem. Customer satisfaction surveys will not surface CardEx Core as the failure point. Churn will be attributed to pricing or product. The business case for CardEx Core must be built on leading indicators (rep adoption, recommendation accuracy), not lagging ones (customer satisfaction), because the lagging signal is too noisy and too slow.
Failure Mode 1 · Temporal Inconsistency
Different systems update at different cadences: Capital One CRM (rep-driven, updated at deal milestones — weekly at best), card transaction data (daily batch), credit profile (monthly review cycle), Brex platform data (real-time in Brex's system; weekly batch feed to Capital One post-acquisition). A recommendation generated from these four sources simultaneously reflects four different as-of dates for the same customer. The customer who 'exists' in the AI's context is a temporal composite — accurate on some dimensions and wrong on others, with no signal to the model about which dimensions are current.
CardEx Core design implication: Every context object must carry a freshness timestamp. The retrieval layer must surface the staleness distribution before passing context to the model. The model should be prompted with the as-of dates explicitly, not assume all context is current.
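As a minimal sketch of "prompting the model with the as-of dates explicitly", the snippet below renders a freshness preamble that could be prepended to a prompt. The function name `render_as_of_preamble`, the source labels, and the timestamps are illustrative assumptions, not part of any Capital One system.

```python
from datetime import datetime, timezone


def render_as_of_preamble(as_of_by_source: dict[str, str], now: datetime) -> str:
    """List each context source with its as-of date and age in whole days."""
    lines = ["Context freshness (do not assume every field is current):"]
    for source, iso_timestamp in sorted(as_of_by_source.items()):
        as_of = datetime.fromisoformat(iso_timestamp)
        age_days = (now - as_of).days
        lines.append(f"- {source}: as of {as_of.date().isoformat()} ({age_days} days old)")
    return "\n".join(lines)


# Hypothetical as-of timestamps for the four source systems.
now = datetime(2026, 6, 27, 15, 0, tzinfo=timezone.utc)
preamble = render_as_of_preamble(
    {
        "card_transactions": "2026-06-27T00:00:00+00:00",
        "brex_spend": "2026-06-21T00:00:00+00:00",
        "credit_profile": "2026-06-01T00:00:00+00:00",
        "crm": "2026-06-14T00:00:00+00:00",
    },
    now,
)
print(preamble)
```

Putting the ages in the prompt text lets the model weigh a 26-day-old credit profile differently from same-day transaction data, rather than treating the composite as one snapshot.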
Failure Mode 2 · Schema Mismatch and Entity Resolution
Capital One's Spark card data is person-centric: the fundamental entity is a cardholder (individual or sole proprietor). Brex's data model is company-centric: the fundamental entity is a company, under which multiple employees and spend categories exist. When a field rep's AI tool tries to retrieve 'everything about Maria's flooring company,' it must match: Maria (person) in Capital One Spark → card number → CRM contact; Maria's company (entity) in Brex → company ID → spend categories; Maria (person) in Capital One credit → SSN/EIN. Without a shared resolution layer, each application team builds their own matching logic — and each builds it differently, producing different 'Maria' records for different AI tools.
CardEx Core design implication: A canonical customer entity — a golden record that resolves person-level and company-level identities across all source systems — must be the foundation. Entity resolution is a prerequisite to everything else. Without it, freshness improvements don't help because each application is still pulling data for a different entity.
Failure Mode 3 · Context Window Pollution
Without preprocessing, passing raw multi-system data into the context window means: redundant fields (customer's address appears 4 times, once per source system, consuming tokens without adding information), outdated records (a 2022 credit inquiry that is no longer relevant), unresolved entity duplicates ('Maria Lopez' and 'Maria L.' appear as separate records, confusing the model), and noise (free-text rep notes adding tokens without structured signal). A well-preprocessed context (structured customer summary: current products, recent spend trends, Brex category breakdown, credit utilization, rep interaction history from last 90 days) will produce more relevant recommendations. Sanjiv Yajnik, Capital One President of Financial Services: 'You can't just throw generic data into it, nor data that hasn't been properly cleaned.'
CardEx Core design implication: A preprocessing and summarization layer, not just a retrieval layer, is required. Raw records from source systems should never flow directly to the model. CardEx Core is responsible for the full pipeline: entity resolution → schema normalization → PII sanitization → freshness tagging → relevance scoring → summarization into a structured context object.
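One small piece of that preprocessing, collapsing a field that appears once per source system, might look like the sketch below. The record shape, the `source` key, and the function names are assumptions made for illustration; the normalization is deliberately naive (case and whitespace only).

```python
def _normalize(value: object) -> str:
    """Lowercase and collapse whitespace so trivially different copies compare equal."""
    return " ".join(str(value).lower().split())


def collapse_redundant_fields(records: list[dict]) -> dict[str, list[dict]]:
    """Merge per-source records, keeping each distinct value once with its sources."""
    merged: dict[str, list[dict]] = {}
    for record in records:
        source = record["source"]
        for key, value in record.items():
            if key == "source":
                continue
            entries = merged.setdefault(key, [])
            for entry in entries:
                if _normalize(entry["value"]) == _normalize(value):
                    entry["sources"].append(source)
                    break
            else:
                # No existing entry matched, so this is a genuinely new value.
                entries.append({"value": value, "sources": [source]})
    return merged


# Hypothetical records: the same address arrives four times, once per system.
records = [
    {"source": "crm", "address": "12 Main St, Austin TX"},
    {"source": "spark", "address": "12 MAIN ST, AUSTIN TX"},
    {"source": "credit", "address": "12 Main St,  Austin TX"},
    {"source": "brex", "address": "12 Main St, Austin TX"},
]
merged = collapse_redundant_fields(records)
print(merged["address"])
```

This only removes exact duplicates after normalization. Near-duplicates such as 'Maria Lopez' and 'Maria L.' are an entity-resolution problem and would not be merged by this step.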
Failure Mode 4 · Eval Invisibility
Without a shared logging layer, there is no consistent schema for capturing what happened after a recommendation was generated. It is structurally impossible to answer: Was a bad recommendation caused by retrieval failure (wrong context retrieved) or generation failure (right context, wrong synthesis)? Did a rep's modification of a recommendation improve the outcome, or would the original have been equally successful? Which customer segments produce systematically poor recommendations? These questions cannot be answered if input-output pairs are not logged in a consistent, queryable format. Each application team logging independently creates its own schema — no cross-application analysis is possible.
CardEx Core design implication: The shared logging schema is as important as the shared context schema. Every recommendation event must log: customer entity ID, context object IDs retrieved (with freshness), generated recommendation text, rep action (used/modified/ignored/flagged with reason), and business outcome when available.
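The logged fields listed above could be captured in a single record type. The sketch below is one possible shape, not a confirmed schema: the class name, field names, and validation rules are assumptions, and the retrieval IDs reuse the illustrative format from this case study.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional

REP_ACTIONS = {"used", "modified", "ignored", "flagged"}


@dataclass
class RecommendationEvent:
    entity_token: str
    retrieved: dict[str, str]  # retrieval ID -> staleness flag at retrieval time
    recommendation_text: str
    rep_action: str
    flag_reason: Optional[str] = None
    business_outcome: Optional[str] = None  # filled in later, when the CRM has it

    def __post_init__(self) -> None:
        if self.rep_action not in REP_ACTIONS:
            raise ValueError(f"unknown rep_action: {self.rep_action!r}")
        if self.rep_action == "flagged" and not self.flag_reason:
            raise ValueError("flagged events need a flag_reason")

    def to_json(self) -> str:
        """Serialize with sorted keys so every application logs the same layout."""
        return json.dumps(asdict(self), sort_keys=True)


event = RecommendationEvent(
    entity_token="ENT-44821-COF",
    retrieved={"TXN-batch-20260627": "current", "BRX-batch-20260621": "aging"},
    recommendation_text="Discuss a credit limit increase.",
    rep_action="modified",
)
print(event.to_json())
```

Keeping the retrieved context IDs and the rep action in the same record is what makes the retrieval-failure versus generation-failure question answerable after the fact.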
ID | Assumption | Basis | Urgency
A-08 | Current-state architecture (inferred) not validated against Capital One's actual production environment | Constructed from public information about Capital One's AI deployments and acquisition context; not validated against internal systems | Critical
A-09 | Root cause is structural (a missing platform abstraction layer), not organizational (siloed incentives or poor communication) | Inferred from architecture pattern. Organizational explanation not ruled out; may be both. If wrong, Concept F (governance-first, no platform) is the right solution. | Critical
A-10 | Customer churn in BC&P's SMB segment is partially attributable to irrelevant pitch experiences caused by stale or fragmented context | Directional inference from James Okafor persona and standard churn analysis limitations; causal path plausible but not confirmed | Medium
A-11 | No canonical customer entity currently resolves identity across Capital One Spark, Brex company, and Capital One credit schemas | Brex operating independently post-acquisition; entity resolution at this scale (35,000+ Brex companies) is significant engineering work, unlikely to have been completed in the roughly 3 months since April 7, 2026 | Critical
Phase 3
Definition
Wrong/right question · 4-layer architecture · RAFT pattern · Context API schema · Coherence map · MoSCoW P0/P1/P2/Won't Have · A-12–A-15
The question the AI program is currently trying to answer
How do we make each field sales AI tool more accurate?
The question CardEx Core is built to answer
How do we ensure that every field sales AI tool reasons from the same version of the customer?
The reframe: The instinct in any AI program facing quality problems is to improve the model. That instinct is correct when the problem is model quality. It is incorrect — and expensive — when the problem is data infrastructure. The six affinity clusters from Phase 2 make clear that all failure modes (temporal inconsistency, schema mismatch, context window pollution, eval invisibility, regulatory exposure) are data infrastructure failures. No model improvement addresses any of them.

The recommendation is not the product. The context is the product. The recommendations are outputs of that product. When the platform improves (more Brex data integrated, faster freshness cadence), every downstream tool inherits the improvement automatically — without individual model work.

The CardEx Core platform has four components that must be built in a specific order. Each layer is a structural prerequisite for the next. Building out of sequence produces a system that appears to work in demo conditions and fails in production.

Layer 1
Canonical Customer Entity — Entity Resolution
Resolves person-centric Capital One records with company-centric Brex records into a single canonical entity — a golden record — with a confidence score and a human-review queue for low-confidence matches.
Why first: Every other layer processes data about a customer. If identity is not resolved, freshness normalization runs on the wrong entity; the context API serves conflated records; feedback logging attributes outcomes to the wrong profile. A bug in Layer 1 degrades all layers above it. Requires: deterministic matching on SSN/EIN as primary key; probabilistic confidence score for fuzzy fallback; human-review queue for low-confidence matches on high-value accounts.
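The matching order described for Layer 1 (deterministic tax ID first, fuzzy fallback with a confidence score, review queue for low confidence) can be sketched as follows. The record fields, the 0.85 review threshold, and the use of `difflib` name similarity as the fuzzy method are all assumptions for illustration; a production matcher would likely use more signals than the name.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Match:
    spark_id: str
    brex_id: str
    confidence: float
    needs_review: bool


def normalize(name: str) -> str:
    """Lowercase, drop commas and periods, and collapse whitespace."""
    return " ".join(name.lower().replace(",", " ").replace(".", " ").split())


def resolve(spark: dict, brex: dict, review_threshold: float = 0.85) -> Match:
    """Deterministic match on tax ID first; fuzzy name similarity as the fallback."""
    if spark.get("tax_id") and spark.get("tax_id") == brex.get("tax_id"):
        return Match(spark["id"], brex["id"], 1.0, False)
    score = SequenceMatcher(
        None, normalize(spark["business_name"]), normalize(brex["company_name"])
    ).ratio()
    # Low-confidence fuzzy matches go to the human-review queue.
    return Match(spark["id"], brex["id"], round(score, 2), score < review_threshold)


# Hypothetical records with made-up identifiers.
exact = resolve(
    {"id": "SPK-1", "tax_id": "00-0000001", "business_name": "Lopez Flooring"},
    {"id": "BRX-9", "tax_id": "00-0000001", "company_name": "Flooring Co Inc"},
)
fuzzy = resolve(
    {"id": "SPK-2", "tax_id": None, "business_name": "Lopez Flooring"},
    {"id": "BRX-7", "tax_id": None, "company_name": "Acme Robotics"},
)
print(exact, fuzzy)
```

Note that the deterministic branch matches even when the names differ, which is the point of anchoring on the tax ID: the Spark and Brex systems may hold different display names for the same legal entity.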
Layer 2
Freshness Normalization
Assigns a freshness timestamp and staleness classification to every data field in the canonical entity. Establishes which fields update real-time, daily, and monthly — and surfaces this metadata alongside the data itself.
Why second: Freshness normalization is meaningless if applied to unresolved entity records. A freshness timestamp on “Flooring Co Inc (Brex)” is only useful if the consuming layer knows that entity is the same as “Maria Lopez (Spark card 4111-xxxx).” Requires a field-level metadata schema: source system, last-updated timestamp, update cadence classification (real-time / daily / weekly / monthly), and a staleness flag (current / aging / stale) computed against configurable thresholds per field type. Card transaction data goes stale after 48 hours, credit profile data after 30 days, and CRM free-text notes after 7 days.
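A staleness flag computed against per-field thresholds might look like the sketch below. The stale thresholds are the ones stated above; the "aging" boundary is not specified for these fields, so the halfway point used here (`AGING_FRACTION`) is purely an assumption, as are the field-type keys.

```python
from datetime import datetime, timedelta, timezone

# Stale thresholds taken from the Layer 2 description.
STALE_AFTER = {
    "transaction_data": timedelta(hours=48),
    "credit_profile": timedelta(days=30),
    "crm_notes": timedelta(days=7),
}
AGING_FRACTION = 0.5  # assumption: a field is "aging" past half its stale threshold


def staleness_flag(field_type: str, last_updated: datetime, now: datetime) -> str:
    """Classify a field as current, aging, or stale from its age."""
    stale_after = STALE_AFTER[field_type]
    age = now - last_updated
    if age >= stale_after:
        return "stale"
    if age >= stale_after * AGING_FRACTION:
        return "aging"
    return "current"


now = datetime(2026, 6, 27, 12, 0, tzinfo=timezone.utc)
flags = {
    "transaction_data": staleness_flag(
        "transaction_data", datetime(2026, 6, 27, tzinfo=timezone.utc), now
    ),
    "credit_profile": staleness_flag(
        "credit_profile", datetime(2026, 6, 1, tzinfo=timezone.utc), now
    ),
    "crm_notes": staleness_flag(
        "crm_notes", datetime(2026, 6, 14, tzinfo=timezone.utc), now
    ),
}
print(flags)
```

Keeping the thresholds in a lookup table rather than in code paths is what makes them configurable per field type without redeploying the classifier.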
Layer 3
PII Preprocessing + Context API
Exposes the canonical entity + freshness metadata as a standardized, stateless API. Returns a structured context object — entity-resolved, freshness-tagged, PII-sanitized, relevance-summarized — optimized for LLM consumption. Includes retrieval_ids on every response for SR 26-2 audit trail.
Why third: A context API built before freshness normalization serves data without staleness signals — consuming applications have no way to know whether the context is current or aging. PII preprocessing at the API boundary solves SR 26-2 compliance once for all downstream applications simultaneously. API is stateless (not cached) to avoid reintroducing staleness. Critical design: Stateless retrieval on every call; PII tokenized before leaving the API layer; structured summary, not raw records; retrieval_ids field native in every response.
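Tokenizing PII before it leaves the API layer could be done with a keyed hash, as in this sketch. The field list follows the PII categories named in this case study, but the `TOK-` token format, the key handling, and the function names are assumptions; the entity token format shown elsewhere in this document may be produced differently.

```python
import hashlib
import hmac

PII_FIELDS = {"full_name", "ssn", "ein", "account_number"}


def tokenize(value: str, key: bytes) -> str:
    """Return a stable, non-reversible token for a PII value using HMAC-SHA256."""
    digest = hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"TOK-{digest[:16]}"


def sanitize(record: dict, key: bytes) -> dict:
    """Replace PII fields with tokens; pass every other field through unchanged."""
    return {
        name: tokenize(str(value), key) if name in PII_FIELDS else value
        for name, value in record.items()
    }


# Dummy values only; in practice the key would come from a secrets manager.
key = b"example-key-not-for-production"
raw = {"full_name": "Maria Lopez", "ssn": "000-00-0000", "credit_utilization": "84%"}
safe = sanitize(raw, key)
print(safe)
```

Because the same input and key always yield the same token, downstream applications can still join records on the token without ever seeing the underlying value.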
Layer 4
Feedback Logging
Captures every recommendation event as a structured log entry: which context objects were retrieved (from the API's retrieval_ids), what was generated, what the rep did (used/modified/ignored/flagged), and what the business outcome was (when available from CRM).
Why fourth: The logging layer captures retrieval_ids from the API response. Without the API's standardized response structure, there is no consistent schema for logging — each application logs in its own format, and cross-application analysis is impossible. Decoupled from the recommendation engine so model swaps don't disrupt the audit trail, and the data science team can consume eval datasets without touching the context platform.

CardEx Core is shared infrastructure. It provides a retrieval substrate — not domain-specific model behavior. A shared context platform cannot be fine-tuned for specific use cases without specializing the platform and losing its horizontal value. If CardEx Core's retrieval layer is fine-tuned for lead scoring, it becomes the lead scoring platform — and the pitch recommendation application has to build its own retrieval again.

RAG at the platform layer solves this: CardEx Core retrieves context that is domain-agnostic. The lead scoring application takes that context and applies its own domain-specific reasoning. The pitch recommendation application does the same. Each application can be fine-tuned for its specific use case — using the same CardEx Core context as input.

The RAFT pattern: RAG provides freshness and attribution at the retrieval layer; fine-tuning provides behavioral consistency at the application layer. The platform holds the RAG layer centrally. Applications own their fine-tuning.

CardEx Core (RAG · shared · entity-resolved · PII-sanitized · freshness-tagged)
    │
    ├── Lead Scoring App (fine-tuned for scoring logic)
    ├── Pitch Recommendation App (fine-tuned for sales synthesis)
    └── Credit Suggestion App (fine-tuned for underwriting reasoning)
RAFT pattern: retrieval shared horizontally · behavioral fine-tuning scoped to each application domain
Context API Response Object
Every field is SR 26-2 attributable · PII sanitized · freshness-tagged
{
  "request_id": "CTX-REQ-20260627-142301",
  "entity_token": "ENT-44821-COF",          // entity token, not raw PII
  "entity_confidence": 0.94,
  "as_of_summary": {
    "freshest_signal": "2026-06-27T14:30:00Z",
    "stalest_signal":  "2026-06-01T00:00:00Z",
    "staleness_distribution": {
      "current": ["transaction_data", "crm_contact"],
      "aging":   ["brex_spend"],              // 7 days old — Brex weekly batch
      "stale":   []
    }
  },
  "context_summary": {
    "current_products":    ["Spark_Cash_Plus", "Brex_Corporate"],
    "do_not_recommend":    ["Spark_Cash_Plus"],  // PM-owned constraint — prevents re-pitching
    "spend_trend_90d":     "increasing_40pct",
    "brex_monthly_volume_q2": "$240K",
    "credit_utilization":  "84%",
    "headcount_signal":    "growing_plus8_cards_q2",
    "last_pitch_outcome":  "declined_march_2026_upgrade_pitch",
    "upgrade_indicators":  ["volume_increase", "headcount_growth", "high_utilization"],
    "suggested_context_for_pitch": "Customer has outgrown current credit limit; Brex spend growing; headcount expansion signals business growth phase. Brex Premium + credit limit increase to $400K–$500K is the indicated direction."
  },
  "retrieval_ids": [                          // SR 26-2 audit trail
    "TXN-batch-20260627",
    "CRD-20260601",
    "BRX-batch-20260621",
    "CRM-20260614",
    "ENT-resolve-20260601"
  ],
  "prompt_version": "ctx-prompt-v2.3",
  "pii_sanitized": true,
  "compliance_flags": {
    "sr_26_2_attributable": true,
    "pii_in_output":        false,
    "data_minimization_applied": true
  }
}
The do_not_recommend field is prompt engineering at the schema level. The PM defines that the context object must always surface products the customer already holds as a negative constraint, not just positive signals. This is the structural fix for the “pitching a product they already have” failure mode documented in Phase 2.
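A downstream application could also enforce the constraint after generation, as a guard on the model's output. The sketch below assumes the response shape shown in the sample object above; the function name and the `Brex_Premium` product identifier are hypothetical.

```python
def blocked_products(context: dict, recommended: list[str]) -> list[str]:
    """Return recommended products that the context object marks as do-not-recommend."""
    blocked = set(context.get("context_summary", {}).get("do_not_recommend", []))
    return [product for product in recommended if product in blocked]


context = {
    "context_summary": {
        "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
        "do_not_recommend": ["Spark_Cash_Plus"],
    }
}
violations = blocked_products(context, ["Spark_Cash_Plus", "Brex_Premium"])
print(violations)
```

A non-empty result means the recommendation re-pitches a product the customer already holds and should be regenerated or suppressed before it reaches the rep.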
Business Line
Spark Business
SMB Cards — third-largest small business credit card franchise
Core Problem
Field reps pitch the wrong Spark product because AI tools reason from stale transaction data; customers with growing spend are offered starter products; customers who upgraded are re-pitched their existing product
Consumer Need
SMB owner needs a rep who understands their business stage and recommends a product that fits current, not historical, spend volume
Business Need
Capital One needs Spark card upgrade conversion and spend volume growth; a rep pitching the right product at the right time is the primary conversion driver
CardEx Core Component
Entity resolution (Spark card → canonical entity) + Context API (transaction data freshness ≤ 48 hours) + Freshness metadata (staleness flag for spend data)
Business Line
Brex Corporate
Mid-Market and Corporate — 35,000+ business customers post-acquisition
Core Problem
Brex customer data exists in Brex's company-centric schema; Capital One's FSO AI tools cannot currently ingest Brex context because entity resolution and schema normalization have not been completed since April 7, 2026
Consumer Need
Brex corporate customer needs a Capital One rep who knows their Brex spend behavior, headcount trajectory, and expense policy structure — not just their credit profile
Business Need
Capital One needs to cross-sell Brex customers into Capital One banking products and Spark cards; this cross-sell cannot happen without unified customer context
CardEx Core Component
Entity resolution (Brex company ID → canonical entity via EIN matching) + Freshness normalization (Brex batch cadence: weekly → staleness flag: 'aging' after 8 days) + Context API (Brex fields: monthly volume, headcount signal, expense categories)
Business Line
SMB Banking
Deposits + Integrated Financial Services — the combined Capital One + Brex proposition
Core Problem
SMB banking is the highest-value product in the portfolio (stickiness, deposit growth, cross-product engagement) — but the FSO AI has no integrated view showing when a Spark card customer is ready to be pitched a full banking relationship
Consumer Need
Business owner at a growth inflection point (headcount +10, spend volume 40% up) needs a proactive reach-out with a banking relationship offer, not a reactive card pitch
Business Need
Capital One needs deposits and full-relationship SMB customers; the Brex acquisition was explicitly described as 'expanding Capital One's small business bank nationally' — this is the strategic growth thesis
CardEx Core Component
Cross-signal synthesis: spend volume trend (Spark transactions) + headcount growth (Brex card issuance signals) + credit utilization → composite 'banking relationship readiness' score surfaced in context object
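To make the composite score concrete, here is a deliberately simple sketch: an equal-weight average of the three signals, each scaled to the range 0 to 1. The weights and scaling are placeholders, not a validated model. The saturation points (40% spend growth, 10 new cards) are borrowed from the growth inflection example above and are assumptions.

```python
def clamp(value: float) -> float:
    """Restrict a value to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def readiness_score(
    spend_growth_pct: float, new_cards_issued: int, credit_utilization_pct: float
) -> float:
    """Equal-weight average of three scaled signals (placeholder weighting)."""
    signals = [
        clamp(spend_growth_pct / 40.0),        # assumption: 40% growth saturates
        clamp(new_cards_issued / 10.0),        # assumption: 10 new cards saturates
        clamp(credit_utilization_pct / 100.0),
    ]
    return round(sum(signals) / len(signals), 2)


# Values loosely based on the sample context object: +40% spend, +8 cards, 84% utilization.
score = readiness_score(40.0, 8, 84.0)
print(score)
```

Any real weighting would need to be fitted against pitch outcomes once enough history exists, which is why this case study places composite scores in P2.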
P0 — Must Have · Cannot ship without these
P0-1
Canonical Customer Entity
CardEx Core must resolve customer identity across Capital One Spark, Brex, and Capital One credit schemas into a single canonical entity with a confidence score and a human-review queue for low-confidence matches.
Traces to: Cluster 1 (Context Fragmentation) · A-11 (entity resolution gap) · Failure Mode 2 (Schema Mismatch) · JTBD-4 (new product coverage on acquisition)
P0-2
Field-Level Freshness Metadata
Every field in the canonical customer entity must carry: source system, last-updated timestamp, update cadence classification, and a staleness flag computed against configurable thresholds per field type. Staleness flags must be surfaced in the context API response and in the rep-facing UI.
Traces to: Cluster 2 (Trust Collapse) · JTBD-2 (confidence calibration) · DNF Risk 2 (no confidence signal) · Failure Mode 1 (Temporal Inconsistency)
P0-3
PII Preprocessing at API Boundary
All PII (customer full name, SSN/EIN, account number, individual transaction detail) must be redacted or tokenized before the context object is returned by the API. Downstream applications must receive entity tokens, not raw PII.
Traces to: Cluster 4 (Regulatory Pressure) · A-04 (MRM applying SR 26-2) · A-05 (current tools lack standardized preprocessing) · JTBD-6 (compliance-ready audit trail)
P0-4
Context API with Source Attribution
CardEx Core must expose a stateless API returning the structured context object (canonical entity + freshness metadata + PII-sanitized summary) with a retrieval_ids field that identifies every source record contributing to the response. API must support all current FSO AI applications as consuming clients.
Traces to: Cluster 4 (Regulatory Pressure) · JTBD-6 (audit trail) · Layer 3 Context API design · A-06 (RAG architecture decision)
P0-5
Recommendation Event Logging
Every recommendation generated by a downstream FSO AI application that consumed CardEx Core context must log a structured event: entity token, retrieval IDs, recommendation summary, and rep action (used/modified/ignored/flagged). Logging must occur automatically without rep manual action.
Traces to: Cluster 3 (Feedback Loop Absence) · Cluster 6 (Measurement Void) · JTBD-3 (feedback without friction) · DNF Risks 4 and 5
P0-6
Rep-Facing Freshness Indicator
The freshness metadata must surface in the rep-facing UI — within existing CRM or sales tool — as a human-readable signal: 'Context last updated: [X days ago]' with a visual staleness indicator (green / yellow / red) per data source. Rep must be able to see this before acting on a recommendation.
Traces to: Cluster 2 (Trust Collapse) · DNF Risk 2 · JTBD-2 (confidence calibration) · Maya persona ('is this based on current data?')
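The rep-facing indicator in P0-6 could be rendered from the staleness flag and the last-updated timestamp. In this sketch the label wording follows P0-6, while the color mapping, the function name, and the bracketed text format are assumptions about how a UI might display it.

```python
from datetime import datetime, timezone

FLAG_COLOR = {"current": "green", "aging": "yellow", "stale": "red"}


def freshness_label(source: str, last_updated: datetime, flag: str, now: datetime) -> str:
    """Build a human-readable freshness line for one data source."""
    days = (now - last_updated).days
    if days == 0:
        age = "today"
    else:
        age = f"{days} day{'s' if days != 1 else ''} ago"
    return f"[{FLAG_COLOR[flag]}] {source}: Context last updated: {age}"


now = datetime(2026, 6, 27, 12, 0, tzinfo=timezone.utc)
label = freshness_label(
    "Brex spend", datetime(2026, 6, 21, tzinfo=timezone.utc), "aging", now
)
print(label)
```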
P1 — Should Have · High value, blocked by specific dependency
P1-1
One-Tap Rep Feedback Mechanism
Reps must be able to flag a recommendation as 'wrong product,' 'stale data,' 'customer already has this,' or 'used without issue' in a single tap within the existing sales workflow. Flags must append to the recommendation event log automatically.
Traces to: JTBD-3 · Cluster 3 · Marcus persona (burned skeptic requires explicit flagging)
Blocked by: P0-5 (Recommendation Event Logging) — flag must append to an existing log entry
P1-2
Automated Eval Pipeline
CardEx Core must automatically score recommendation quality using: passive signals (modified vs. used without change), explicit flags (wrong product / stale data), and lagged outcome correlation (pitch converted / declined, logged in CRM 2–4 weeks post-pitch). Scores must produce labeled (context, recommendation, score) triples for model team consumption.
Traces to: Cluster 3 · Cluster 6 · JD requirement ('building measurement for GenAI in production')
Blocked by: P1-1 (explicit flags) + sufficient log volume (minimum 500 labeled events for meaningful signal)
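The scoring step in P1-2 could combine the three signal types roughly as follows. The numeric values are hypothetical placeholders chosen to show the mechanics. Only the flag labels, the rep actions, and the 500-event minimum come from the requirements above.

```python
from typing import Optional

NEGATIVE_FLAGS = {"wrong product", "stale data", "customer already has this"}
PASSIVE_SCORE = {"used": 1.0, "modified": 0.5, "ignored": 0.25}  # placeholder values
MIN_LABELED_EVENTS = 500  # minimum volume named in P1-2


def score_event(
    rep_action: str, flag: Optional[str] = None, outcome: Optional[str] = None
) -> float:
    """Score one recommendation from passive, explicit, and lagged signals."""
    if flag in NEGATIVE_FLAGS:
        return 0.0  # an explicit negative flag overrides everything else
    score = PASSIVE_SCORE.get(rep_action, 0.0)
    if outcome == "converted":
        score += 0.25
    elif outcome == "declined":
        score -= 0.25
    return max(0.0, min(1.0, score))


def to_triple(event: dict) -> tuple:
    """Produce a (context, recommendation, score) triple for the model team."""
    score = score_event(event["rep_action"], event.get("flag"), event.get("outcome"))
    return (tuple(event["retrieval_ids"]), event["recommendation"], score)


def has_enough_signal(triples: list) -> bool:
    """Gate downstream use until the labeled set reaches the minimum volume."""
    return len(triples) >= MIN_LABELED_EVENTS


triple = to_triple(
    {
        "retrieval_ids": ["TXN-batch-20260627", "BRX-batch-20260621"],
        "recommendation": "Discuss a credit limit increase.",
        "rep_action": "modified",
        "outcome": "converted",
    }
)
print(triple)
```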
P1-3
Manager Performance Dashboard
A team-level view showing: recommendation adoption rate by rep, staleness flag distribution, accuracy trend over time (flagged as wrong / used without modification), and lagged outcome correlation (AI-assisted vs. non-AI-assisted pitch conversion rates).
Traces to: David Torres persona · JTBD-5 (team performance visibility) · Cluster 6 (Measurement Void)
Blocked by: Sufficient log volume and eval pipeline output — dashboard is meaningless without the data it visualizes
P1-4
Brex Real-Time API Integration
Replace the weekly Brex batch feed with a real-time API connection to Brex's platform, reducing Brex data staleness from 8 days to ≤ 4 hours.
Traces to: A-13 (Brex batch cadence) · Cluster 5 (Post-Acquisition Complexity) · JTBD-4
Blocked by: Brex integration engineering capacity and API design on Brex's side — dependency not fully under Capital One's control
P2 — Could Have · Phase 2 roadmap
P2-1
Composite Signal Scores
Pre-computed scores surfaced in context object: 'Banking Relationship Readiness' (composite of spend trend + headcount growth + credit utilization), 'Upgrade Propensity' (card upgrade likelihood based on spend trajectory), 'Churn Risk' (declining engagement signals). Why P2: requires 12+ months of historical data to compute reliably; premature before the data foundation is stable.
P2-2
Recommendation Confidence Score
A model-calibrated confidence score (High / Medium / Low) per recommendation, surfaced to the rep alongside the freshness indicator. Why P2: requires the eval pipeline (P1-2) to be live and producing reliable accuracy scores before confidence calibration is meaningful. A confidence score based on insufficient data is misleading — worse than no score.
P2-3
Self-Service Data Source Expansion
A configuration interface allowing the Field Sales AI Product Team to add new data sources to the canonical entity schema without requiring CardEx Core engineering involvement for each addition. Why P2: build the first three sources (Spark, Brex, Credit) correctly before making onboarding self-serve. Premature abstraction.
Won't Have — Explicit Scope Boundaries
Won't Have | Why This Boundary Matters
CardEx Core will not generate recommendations | CardEx Core provides context. Downstream AI applications generate recommendations. If CardEx Core generates recommendations, it becomes domain-specific and loses its value as shared infrastructure.
CardEx Core will not write to the CRM | Write access to CRM creates cascading data integrity risk. A bug in CardEx Core could corrupt the rep's contact history for every account. Read-only access contains the failure mode.
CardEx Core will not train models | CardEx Core produces labeled eval datasets. Data science team consumes those datasets and uses them to retrain or prompt-tune downstream models. Conflating the context platform with model training creates an org ownership problem.
CardEx Core will not be customer-facing | A customer-facing interface requires consumer-grade UX, compliance review, and a fundamentally different threat model. CardEx Core is an internal tool only.
CardEx Core will not replace existing AI applications | CardEx Core is additive infrastructure. The lead scoring model, pitch recommendation model, and credit suggestion model continue to exist. CardEx Core improves the data they reason from; it does not replace them.
ID | Assumption | Basis | Urgency
A-12 | Deterministic entity resolution match rate of ~70–85% is achievable using EIN as the primary key for Brex company-to-Capital One business account matching | Standard financial services B2B entity resolution benchmarks; EIN is the most reliable anchor; sole proprietors using personal SSNs complicate the match | High
A-13 | Brex data will be available to CardEx Core via weekly batch feed for first 12 months post-acquisition; real-time API access estimated 12–18 months out | Brex operating independently; real-time integration at this scale requires a purpose-built integration layer that does not exist at acquisition close | Critical
A-14 | Context API P50 latency of ≤ 200ms is achievable without caching, using Capital One's cloud-native infrastructure | Capital One completed full cloud migration in 2021; 200ms is standard for enterprise internal APIs | Medium
A-15 | Downstream FSO AI applications can adopt the Context API without requiring a full application rebuild | Standard enterprise architecture assumption; tightly-coupled monoliths where retrieval is embedded in model code would require partial rebuilds | High
Phase 4
Ideation
User segments S1–S5 · SCAMPER 7 lenses · Concepts A–G evaluations · Pugh matrix (10×7) · Effort × Impact · Prune the Tree · Selected direction + 3 trade-offs

Segments are defined before concepts because segments constrain which concepts make sense. Any concept that does not serve S1 and S2 simultaneously is not viable at MVP — those two represent the majority of FSO headcount and the most acute post-Brex pain.

Segment | Description | Current Friction | What “Platform Working” Looks Like
S1 · High-Volume SMB Reps | 40+ Spark accounts; 5–8 pitch meetings/week; relies on AI for research throughput | Too many accounts to research manually; AI recommendations are stale or wrong; trust has collapsed | Recommendations right 80%+ of the time on first read; manual verification is the exception, not the rule
S2 · Cross-Sell Reps (Brex-Inherited) | Inherited Brex accounts post-acquisition; selling Capital One products into a customer base they've never met | Brex context entirely absent from their tools; pitching Capital One products blind into Brex customers | Brex spend patterns, headcount signals, and expense behavior surfaced alongside Spark card data in a single context view
S3 · Strategic Account Managers | 10–15 high-value accounts; knows customers deeply through personal relationships | Don't need AI for basic research; need AI to surface signals they would miss at scale (spend spikes, utilization changes) | Proactive alerts: “Account X hit 92% credit utilization: timely moment for a limit conversation”
S4 · Sales Managers | Team oversight; 10–15 direct reps; responsible for regional portfolio targets | No visibility into recommendation quality across team; AI impact on pipeline invisible | Team-level dashboard showing adoption, accuracy trend, and outcome correlation
S5 · Field Sales AI Product Team | Builds and maintains downstream AI applications that serve S1–S3 | Builds retrieval infrastructure from scratch per application; no shared API to call | Stable, documented Context API with a semantic versioning commitment; new applications integrate in days, not months
S · Substitute
What if we substituted the centralized context platform with a decentralized schema standard?
Instead of a shared context API, define a shared context schema: a data contract that specifies what fields every FSO AI application must retrieve, in what format, with what freshness metadata. Each application team implements its own retrieval, but all conform to the schema. This separates the governance problem from the infrastructure problem, and the governance problem may be solvable faster: an estimated 60–70% of the benefit at 20% of the cost.
→ Concept B (Schema-First Federated)
C · Combine
What if we combined the context retrieval layer with the recommendation engine itself?
Instead of a context platform feeding separate downstream apps, build one unified FSO AI assistant that handles retrieval, synthesis, and recommendation in a single model. Eliminates the integration problem entirely and creates a cleaner rep experience (one tool, not three). The risk: monolithic architecture is harder to improve in targeted ways — you can't fix retrieval without touching the recommendation.
→ Concept C (Unified FSO AI Agent)
A · Adapt
What if we adapted the existing Agent Assist architecture rather than building new?
Capital One's Agent Assist is already in production (10,000+ uses, 84% → 93% search relevance improvement), has SR 26-2-compliant logging infrastructure, and organizational familiarity. Time-to-value is the most underrated variable in an enterprise platform build. A working system that ships in 90 days and is 80% as good as the ideal is better than an ideal system that ships in 12 months.
→ Concept D (Agent Assist Extension)
M · Modify / Magnify
What if we magnified the freshness problem and made real-time context the core design constraint?
An event streaming architecture (Apache Kafka) where every customer event — card transaction, Brex card swipe, CRM update — triggers an instant context refresh. Sub-minute staleness eliminates the freshness problem structurally. This concept came as a surprise — the instinct was to build an API platform, not a streaming platform — which is exactly the signal that it deserves serious evaluation.
→ Concept E (Real-Time Streaming Platform)
P · Put to Other Uses
What if the context platform was surfaced to customers, not just to reps?
Small business owners could see a 'Capital One Business Profile' — what Capital One knows about their business, how current it is, what products they are eligible for. A two-sided product: reps use it to prepare pitches; customers use it to see how they are understood. Discarded for MVP scope — different compliance regime, different UX requirements, ~2-year build timeline.
→ Discarded for MVP. Noted for Phase 3+ roadmap.
E · Eliminate
What if we eliminated the central context platform entirely and focused on governance of existing retrieval?
No new platform. Audit each FSO AI application's existing retrieval layer, establish a data quality standard (freshness thresholds, PII requirements, entity matching rules), and enforce compliance through quarterly governance reviews. Attractive to anyone skeptical that a platform can be built fast enough. The test is A-09 — if the root cause is organizational, this concept is correct. If structural, this concept fails.
→ Concept F (Governance-First, No Platform)
R · Reverse
What if we reversed the build sequence — starting with the feedback loop rather than the context layer?
Build the feedback logging and eval pipeline first — establishing what a 'good recommendation' looks like and which context signals are actually predictive of pitch success — then build the context platform around the signals that matter, rather than the signals assumed to matter. Directly addresses assumption hygiene. Scores well on feedback loop quality but poorly on rep-facing trust, which is the immediate adoption problem.
→ Concept G (Eval-First, Context-Second)
Concept A · CardEx Core: Horizontal Shared Platform
The full proposal. Central entity resolution + freshness normalization + context API + feedback logging.
Strengths
  • Maximum data consistency across all FSO AI applications — single source of truth
  • SR 26-2 compliance built once for all consuming applications simultaneously
  • Scales to new data sources (Discover network, future acquisitions) by adding a source to the platform, not rebuilding each application
  • Highest feedback loop quality — structured logging at the platform level produces consistent, queryable eval datasets
Weaknesses
  • Slowest to ship — 4-layer architecture, entity resolution for 35K+ Brex accounts — realistically 5–7 months to MVP
  • Highest organizational footprint — every FSO AI application team must migrate from their own retrieval to the shared API
  • Most complex build — a bug in Layer 1 (entity resolution) degrades all layers above it
Concept B · Schema-First Federated
Define a shared context schema and data contract. PM owns the standard, not the infrastructure.
Strengths
  • Faster to first impact — schema definition and governance tooling can be live in 6–8 weeks
  • Lower adoption friction — teams keep control of their retrieval; conformance is incremental
  • Lower organizational risk — no new infrastructure dependency
Weaknesses
  • Consistency depends on implementation quality per team — distributed enforcement is unreliable under shipping pressure
  • Brex entity resolution is still solved independently by each team — the hardest problem is not addressed
  • No shared logging — feedback loop quality is fragmented; cross-application eval is impossible
  • Schema drift is the historical failure mode of federated standards in large organizations
Concept C: Unified FSO AI Agent
One AI system handles retrieval, synthesis, and recommendation. No separate context platform.
Strengths
  • Highest adoption potential — reps have one tool, one interface
  • Eliminates the API adoption problem — no downstream applications to migrate
  • Cleaner feedback loop — one system captures all inputs and outputs in a consistent schema
Weaknesses
  • Monolithic architecture is hardest to improve in targeted ways — retrieval and generation tightly coupled
  • Abandons the 'shared infrastructure' value proposition — FSO AI product team loses ability to build distinct domain-specialized applications
  • Source attribution becomes harder — unified model that retrieves and generates in one pass makes it difficult to isolate which context drove which recommendation
Concept D: Agent Assist Extension
Extend Capital One's existing Agent Assist architecture for FSO use.
Strengths
  • Fastest to first value — proven infrastructure; realistically 60–90 days to FSO pilot
  • SR 26-2 compliance framework established — Agent Assist was built under Capital One's MRM governance
  • Organizational credibility — Agent Assist is already trusted internally; 'FSO edition' inherits that trust
Weaknesses
  • Agent Assist designed for reactive service lookups, not proactive sales synthesis — fundamentally different retrieval patterns
  • Brex entity resolution is not in Agent Assist's design — adding Brex's company-centric schema requires significant architecture extension that approaches building new
  • Scalability ceiling — optimized for one-to-one retrieval; FSO needs one-to-many synthesis (one rep, multiple products, one customer summary)
  • Technical debt imported at launch — carrying design decisions that don't fit the new use case
Concept E: Real-Time Streaming Platform
Apache Kafka event streaming. Every customer event triggers instant context refresh. Sub-minute staleness.
Strengths
  • Highest data freshness — eliminates the staleness problem structurally
  • Best long-term architecture — event sourcing provides complete audit trail and enables time-travel queries
  • Scales horizontally — Kafka's architecture scales with event volume without redesign
Weaknesses
  • Longest time to production — streaming infrastructure + entity resolution + API layer; realistically 9–12 months to MVP
  • Requires Brex's cooperation for streaming — Brex would need to expose a real-time event stream; weekly batch is currently available
  • Engineering complexity is the highest of all concepts — requires ML engineers, data engineers, platform engineers, and Kafka specialists working in parallel
Concept F: Governance-First, No Platform
No new infrastructure. Audit existing retrieval layers. Establish data quality standards. Enforce through quarterly reviews.
Strengths
  • Zero infrastructure risk — no new system to build, operate, or debug
  • Fast to establish standards — a governance framework can be defined in 30 days
  • Lower capital expenditure — PM cost only; no infrastructure spend
Weaknesses
  • Does not address the structural root cause — governance of fragmented retrieval improves quality marginally but does not produce consistency
  • Brex entity resolution cannot be governed into existence — each team would need to solve it independently
  • Historical failure mode of standards-only approaches: team compliance is high in Q1, degrades under shipping pressure — standards without enforcement infrastructure are aspirational
Concept G: Eval-First, Context-Second
Build the feedback loop and eval pipeline first. Let data reveal which context signals are actually predictive before building the platform.
Strengths
  • Highest feedback loop quality of any concept — the eval framework is the entire Phase 1 focus
  • Reduces assumption risk — instead of building entity resolution for all signals, let data confirm which signals are worth the effort
  • Faster to Phase 1 value — logging infrastructure is simpler than a 4-layer platform
Weaknesses
  • Does not address the rep trust problem in Phase 1 — reps are abandoning AI tools; an eval pipeline improves model quality over time but provides no immediate improvement to the rep's experience
  • The Burned Skeptic (Marcus) cannot be recovered with an eval framework — he needs to see a recommendation he trusts, not a dashboard
  • Brex context gap not addressed in Phase 1 — cross-sell reps (S2) get no value until the context platform is built in Phase 2

Ten criteria, scored 1–5 (5 = best). Concept A (CardEx Core) is the reference concept and does not win on every criterion. Five criteria are double-weighted (2×) reflecting the post-acquisition context: data consistency, SR 26-2 compliance, scalability, feedback loop quality, and Brex integration readiness.

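Criterion | Wt | A | B | C | D | E | F | G
C1: Time to first value | 1× | 2 | 4 | 3 | 5 | 1 | 4 | 3
C2: Data consistency across FSO AI | 2× | 5 | 3 | 4 | 3 | 5 | 2 | 2
C3: Freshness SLA at MVP | 1× | 4 | 3 | 3 | 3 | 5 | 2 | 2
C4: SR 26-2 compliance coverage | 2× | 5 | 3 | 3 | 4 | 4 | 2 | 3
C5: Organizational adoption friction | 1× | 2 | 4 | 5 | 4 | 2 | 5 | 4
C6: Scalability to new data sources | 2× | 5 | 3 | 2 | 2 | 5 | 2 | 3
C7: Feedback loop quality | 2× | 5 | 2 | 3 | 3 | 4 | 2 | 5
C8: Rep-facing trust signal | 1× | 5 | 3 | 4 | 3 | 5 | 1 | 1
C9: Engineering complexity (inverse) | 1× | 2 | 3 | 3 | 4 | 1 | 5 | 4
C10: Brex integration readiness | 2× | 4 | 3 | 3 | 2 | 3 | 2 | 2
Raw total | - | 39 | 31 | 33 | 33 | 35 | 27 | 29
Weighted total | - | 62 | 46 | 51 | 51 | 56 | 38 | 42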
Where Concept A loses: C1 (Time to first value): Concept D wins decisively (5 vs. 2); Agent Assist Extension ships in 60–90 days, CardEx Core in 5–7 months. C5 (Organizational adoption friction): Concepts C and F score highest (5 vs. 2), and Concepts B, D, and G also outscore A (4 vs. 2). C9 (Engineering complexity): Concept F wins (5 vs. 2).
Why Concept E (56) scores second: Most technically correct long-term architecture. Loses on C1 (time to value, 1×) and C9 (engineering complexity, 1×). The Brex complexity spike is happening now — a 9–12 month streaming build means FSO operates with fragmented context for the entirety of the Brex integration's most critical window.
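The scoring arithmetic behind the matrix can be sketched as follows. This is a minimal illustration, not the author's scoring tool: the per-criterion scores are copied from the matrix, and the set of double-weighted criteria is read from the criteria paragraph and treated as an assumption. The raw totals reproduce the published row exactly; the weighted totals depend on the assumed weights and may differ by a few points from the published weighted row, so treat them as illustrative.

```python
# Scores copied from the Pugh matrix, in concept order A..G.
CONCEPTS = ["A", "B", "C", "D", "E", "F", "G"]
SCORES = {
    "C1": [2, 4, 3, 5, 1, 4, 3],
    "C2": [5, 3, 4, 3, 5, 2, 2],
    "C3": [4, 3, 3, 3, 5, 2, 2],
    "C4": [5, 3, 3, 4, 4, 2, 3],
    "C5": [2, 4, 5, 4, 2, 5, 4],
    "C6": [5, 3, 2, 2, 5, 2, 3],
    "C7": [5, 2, 3, 3, 4, 2, 5],
    "C8": [5, 3, 4, 3, 5, 1, 1],
    "C9": [2, 3, 3, 4, 1, 5, 4],
    "C10": [4, 3, 3, 2, 3, 2, 2],
}
# Assumption: exactly these five criteria carry a 2x weight; all others 1x.
DOUBLE_WEIGHTED = {"C2", "C4", "C6", "C7", "C10"}


def totals(weighted: bool) -> dict:
    """Sum scores per concept, doubling the assumed 2x criteria when weighted."""
    result = {concept: 0 for concept in CONCEPTS}
    for criterion, row in SCORES.items():
        weight = 2 if weighted and criterion in DOUBLE_WEIGHTED else 1
        for concept, score in zip(CONCEPTS, row):
            result[concept] += weight * score
    return result


raw = totals(weighted=False)
weighted = totals(weighted=True)
# Concepts ordered from highest to lowest weighted total.
ranking = sorted(CONCEPTS, key=lambda c: weighted[c], reverse=True)
print(raw, weighted, ranking)
```

Under these assumed weights the ordering of the top two concepts matches the narrative: Concept A first, Concept E second.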
HIGH IMPACT
    │
    │   [P0-6: Rep freshness indicator]    [P0-1: Canonical entity resolution]
    │   [P0-5: Recommendation logging]     [P0-4: Context API]
    │   [P0-3: PII preprocessing]
    │                                      [P1-2: Automated eval pipeline]
    │                                      [P1-3: Manager dashboard]
    │   [P1-1: One-tap feedback]
    │
    │───────────────────────────────────────────────────────────────
    │                                      [P1-4: Brex real-time API]
    │                                      [P2-1: Composite scores]
    │   [P2-3: Self-service expansion]
    │   [P2-2: Confidence score]
    │
LOW IMPACT
         LOW EFFORT                        HIGH EFFORT
Build sequence: PII preprocessing → Event logging → Freshness indicator (quick wins) → Entity resolution + Context API (core platform) → Feedback mechanism → Eval pipeline → Manager dashboard
🌳Trunk — Foundational
Remove any of these and the platform cannot function.
  • Canonical customer entity with entity resolution (Spark + Brex + Credit schemas → single golden record)
  • Context API with source attribution and stateless retrieval
  • PII preprocessing and tokenization at the API boundary
  • Freshness metadata embedded in every context response
🌿Primary Branches — Core Value Delivery
What reps and managers will actually experience. Without these, the platform is technically correct but invisible.
  • Rep-facing freshness indicator (green / yellow / red staleness by data source)
  • Recommendation event logging (structured, queryable, consistent schema)
  • Brex batch integration (weekly feed at MVP — minimum viable Brex coverage)
  • Downstream app API adoption support (documentation, migration guides, integration support)
🌱Secondary Branches — Improvement Engine
Require healthy primary branches to produce value. These are the platform's flywheel.
  • One-tap rep feedback mechanism (wrong product / stale data / used without issue)
  • Automated eval pipeline (passive signals + explicit flags + lagged outcomes → labeled triples)
  • Manager performance dashboard (adoption rate, accuracy trend, outcome correlation)
  • Confidence score in rep-facing UI (calibrated against eval pipeline output)
✂️Pruned — Cut from CardEx Core Scope
Explicit scope exclusions with rationale.
Pruned Item | Reason
Real-time Kafka streaming architecture | 9–12 month build; Brex API dependency; weekly batch achieves sufficient freshness for MVP at lower risk
CardEx Core-generated recommendations | Scope violation: CardEx Core is a context platform, not a recommendation engine
Customer-facing context transparency | Different compliance regime (consumer-facing AI); materially different product; Phase 3+ vision
Composite signal scores (Upgrade Propensity, Churn Risk) | Requires 12+ months of historical data to calibrate reliably; premature before data foundation is stable
Self-service data source expansion | Build the first three sources correctly before making onboarding self-serve; premature abstraction
CRM write access | Write access introduces data integrity risk; CardEx Core reads from CRM and logs to its own store
Selected concept: Concept A, CardEx Core Horizontal Shared Platform (Weighted score: 62 · Second place: Concept E at 56 · Third place tie: Concepts C and D at 51)
Why Concept A over Concept D (the strongest alternative on speed)

Concept D (Agent Assist Extension) wins on time-to-value (5 vs. 2) and engineering complexity (4 vs. 2). The reason Concept A is selected despite this loss is that two of the highest-weighted criteria — C6: Scalability to new data sources (2×) and C10: Brex integration readiness (2×) — represent the post-acquisition context that makes this problem urgent in 2026 specifically.

Agent Assist was built to answer a question (“what's the answer to this service inquiry?”). CardEx Core is built to assemble a picture (“who is this customer across all their Capital One relationships?”). These are fundamentally different retrieval patterns. Adapting the former for the latter is not adaptation; it is replacement with legacy architectural debt attached. Within 12–18 months, the Brex extension would likely require a partial rebuild approaching the scope of building correctly the first time.

Why Concept A over Concept E (the most technically correct alternative)

Concept E (Real-Time Streaming) is the right long-term architecture. This is acknowledged directly. If Capital One's field sales AI program had 12 months to build before the Brex complexity problem needed solving, Concept E would be the correct choice.

The Brex acquisition closed April 7, 2026. Cross-sell reps are inheriting accounts they have no context for today. A 9–12 month streaming build means those reps operate blind for the entirety of the Brex integration's most critical window. The adoption damage from this period — reps who lose trust in AI tools during Brex onboarding will be hard to recover even after the platform ships — is a real cost the Pugh Matrix weights do not fully capture.

Design hedge: CardEx Core is architected for future streaming adoption. The Context API is stateless; the retrieval layer is abstracted; entity resolution does not assume batch inputs. When the Brex real-time API is available (Phase 2 roadmap), upgrading from weekly batch to real-time streaming is a source configuration change, not a platform rebuild. Concept E's architecture is embedded in Concept A's design as a Phase 2 path.

What Concept A gives up — stated explicitly

1. Speed to first rep-visible improvement. Concept D could produce a working FSO AI context improvement in 90 days. Concept A's MVP is 5–7 months. During that window, the trust collapse continues and the Burned Skeptic population grows. Mitigation: ship the rep-facing freshness indicator (P0-6) as the earliest possible visible change — which can be done before the full platform is live, using existing retrieval layers with freshness metadata added as a preprocessing step.

2. Organizational independence for application teams. Concept A requires every FSO AI application team to migrate from their own retrieval to a shared API — a real change management cost. Mitigation: API adoption program with migration guides, dedicated integration support, and a compatibility layer that allows teams to call CardEx Core alongside their existing retrieval during a parallel-run period.

3. Simplicity of the failure mode. Concept G (Eval-First) has a simple, isolated failure mode: the eval pipeline doesn't produce good data yet. Concept A's failure mode is more complex: a bug in entity resolution degrades all four layers simultaneously. Mitigation: aggressive testing and a staged rollout — entity resolution for Spark-only customers first (lower complexity, no Brex schema), then expanding to Brex customers once the resolution layer is stable.

Phase 5
Delivery
Two MVPs · System architecture · 3 schemas · Eval rubrics D1–D4 · System prompt · PM prompt decisions · HITL loop · 3 levers · 3 drift types · 6-tier metrics · RAID · A-17–A-20

The JD defines two co-equal deliverables: the context platform and the feedback loop strategy. These are specified as two sequential MVPs. They are not simultaneous — MVP-B depends on MVP-A being stable — but both are PM-owned with equal specificity.

MVP-A · The Context Platform
Every FSO AI application reasons from the same version of the customer.
Scope: Canonical customer entity (Spark + Brex + Credit schema resolution), freshness normalization, PII preprocessing, Context API with source attribution, rep-facing freshness indicator.
Wedge rationale
  • Entity resolution is the structural prerequisite. A recommendation logging system that logs against an unresolved entity produces an eval dataset that cannot be trusted.
  • The rep-facing freshness indicator is the earliest visible improvement — the minimum intervention required to begin rebuilding trust. It can ship before the full Context API is live.
  • PII preprocessing at the API boundary solves SR 26-2 compliance once for all downstream applications simultaneously. Every day it doesn't exist is accumulated regulatory exposure.
Target: Month 5–6 · Gate: Entity resolution ≥70% match rate for Spark-Brex pairs · API P50 ≤200ms · Zero raw PII in sample audit
MVP-B · The Feedback Loop System
The system learns. Quality improves measurably. Leadership has measurement.
Scope: Recommendation event logging, one-tap rep feedback, HITL review queue, automated eval pipeline, drift detection monitoring, prompt engineering governance.
Why MVP-B follows MVP-A
  • The eval pipeline scores recommendation quality by comparing retrieved context against the recommendation generated. If context is inconsistent (pre-MVP-A), eval scores are noisy.
  • The HITL feedback loop produces labeled (context, recommendation, score) triples. If context is fragmented, you cannot determine whether a bad recommendation was caused by retrieval failure or generation failure. MVP-A makes this distinction possible.
Target: Month 8–9 · Gate: Logging ≥95% of events · ≥500 labeled events accumulated · Eval pipeline running weekly without manual intervention · Drift baseline established
Current State — Fragmented Retrieval
┌─────────────────────────────────────────────────────────┐
│                   FIELD SALES REP                       │
└──────────────────┬──────────────┬──────────────┬────────┘
                   │              │              │
          ┌────────▼───┐  ┌───────▼────┐  ┌─────▼──────────┐
          │ Lead       │  │ Pitch      │  │ Credit         │
          │ Scoring AI │  │ Recomm. AI │  │ Suggestion AI  │
          └────────┬───┘  └───────┬────┘  └─────┬──────────┘
                   │              │              │
          ┌────────▼───┐  ┌───────▼────┐  ┌─────▼──────────┐
          │ Spark CRM  │  │ Transaction│  │ Credit Profile │
          │ (COF       │  │ History DW │  │ (COF           │
          │  schema)   │  │ (daily     │  │  monthly       │
          │            │  │  batch)    │  │  refresh)      │
          └────────────┘  └────────────┘  └────────────────┘

          ┌─────────────────────────────────────────────────┐
          │ Brex Platform (company-centric schema)          │
          │ → NOT CONNECTED to any of the above AI apps     │
          │ → Brex customers appear in CRM with no context  │
          └─────────────────────────────────────────────────┘

FAILURE MODES:
  ✗ Three retrieval layers → three versions of same customer
  ✗ Brex data entirely absent from all three AI applications
  ✗ No shared entity resolution → schema mismatch across sources
  ✗ No feedback capture → no logging of what reps did
  ✗ No PII preprocessing → raw customer data in LLM prompts
  ✗ No freshness visibility → rep cannot tell which data is current
Target State — CardEx Core + Feedback Loop
┌──────────────────────────────────────────────────────────────┐
│                        FIELD SALES REP                       │
│              [Sees: freshness indicator · one-tap feedback]  │
└────────────┬──────────────────────┬──────────────┬───────────┘
             │                      │              │
    ┌────────▼───┐         ┌────────▼────┐    ┌────▼──────────┐
    │ Lead       │         │ Pitch       │    │ Credit        │
    │ Scoring AI │         │ Recomm. AI  │    │ Suggestion AI │
    └────────┬───┘         └────────┬────┘    └────┬──────────┘
             └──────────────────────┼───────────────┘
                                    │  All apps call shared API
                    ┌───────────────▼──────────────────┐
                    │         CONTEXT API               │
                    │  (stateless · source attribution  │
                    │   · PII-sanitized · ≤200ms P50)   │
                    └───────────────┬──────────────────┘
          ┌─────────────────────────▼──────────────────────────┐
          │              CONTEXTUAL INTELLIGENCE PLATFORM       │
          │  Layer 1: Canonical Customer Entity                 │
          │  Layer 2: Freshness Normalization                   │
          │  Layer 3: PII Preprocessing (SR 26-2 compliant)    │
          └─────────────────────────┬──────────────────────────┘
          ┌─────────────────────────▼──────────────────────────┐
          │   SOURCE SYSTEMS                                    │
          │   Spark CRM (daily) · Transaction DW (daily)       │
          │   Brex Platform (weekly batch → real-time P2)      │
          │   Credit Profile (monthly) · CRM Notes (event)     │
          └────────────────────────────────────────────────────┘
          ┌────────────────────────────────────────────────────┐
          │              FEEDBACK LOOP SYSTEM (MVP-B)          │
          │  Recommendation Event Log                          │
          │  → HITL Review Queue (flagged events · 48hr SLA)  │
          │  → Eval Pipeline (weekly · 4 dimensions)          │
          │  → Three Levers: Retrieval · Prompt · Retraining  │
          │  → Drift Detection (Data · Concept · Output)       │
          └────────────────────────────────────────────────────┘
Schema 1 — Canonical Customer Entity
Golden record · entity-resolved · freshness-tagged per field
{
  "entity_token": "ENT-44821-COF",
  "entity_confidence": 0.94,
  "resolution_method": "deterministic_ein",
  "source_ids": {
    "spark_account_id": "SPK-7821-XXXX",
    "brex_company_id": "BRX-44821",
    "capital_one_credit_id": "CRD-992-XXXX",
    "crm_contact_id": "CRM-LOC-4821"
  },
  "business_profile": {
    "legal_name_token": "ENTITY_44821",
    "industry_code": "5087",
    "employee_count_signal": {
      "value": 38,
      "source": "brex_card_issuance",
      "as_of": "2026-06-21T00:00:00Z",
      "cadence": "weekly_batch",
      "staleness_flag": "current"
    }
  },
  "capital_one_relationship": {
    "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
    "card_utilization_pct": {
      "value": 84,
      "as_of": "2026-06-27T14:30:00Z",
      "cadence": "daily_batch",
      "staleness_flag": "current"
    },
    "spend_trend_90d": "increasing",
    "brex_monthly_volume_q2_2026": {
      "value": 240000,
      "currency": "USD",
      "as_of": "2026-06-21T00:00:00Z",
      "cadence": "weekly_batch",
      "staleness_flag": "current"
    }
  },
  "sales_history": {
    "last_pitch_outcome": "declined",
    "last_pitch_date": "2026-03-12",
    "last_pitch_product": "Spark_Cash_Plus_upgrade",
    "upgrade_indicators": [
      "volume_increase_40pct_q2",
      "headcount_growth_8_cards_q2",
      "credit_utilization_84pct"
    ]
  }
}
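One way the per-field staleness_flag in the schema above might be derived from as_of and cadence is sketched below. The thresholds (current within one refresh interval, aging within two, stale beyond that) and the monthly_refresh cadence name are assumptions made for illustration; the case study does not specify them.

```python
from datetime import datetime, timedelta, timezone

# Assumed refresh interval per cadence; "monthly_refresh" is a hypothetical name.
CADENCE_INTERVAL = {
    "daily_batch": timedelta(days=1),
    "weekly_batch": timedelta(days=7),
    "monthly_refresh": timedelta(days=30),
}


def staleness_flag(as_of: str, cadence: str, now: datetime) -> str:
    """Classify a field as current, aging, or stale relative to its cadence."""
    observed = datetime.fromisoformat(as_of.replace("Z", "+00:00"))
    age = now - observed
    interval = CADENCE_INTERVAL[cadence]
    if age <= interval:        # refreshed within one expected cycle
        return "current"
    if age <= 2 * interval:    # missed one cycle (assumed "aging" band)
        return "aging"
    return "stale"


now = datetime(2026, 6, 27, 18, 0, tzinfo=timezone.utc)
print(staleness_flag("2026-06-21T00:00:00Z", "weekly_batch", now))
```

Tying the threshold to each source's cadence, rather than one global age limit, keeps a six-day-old weekly Brex field "current" while a six-day-old daily field would be flagged.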
Schema 2 — Recommendation Event Log
SR 26-2 attributable · links to CRM outcome via entity token
// At recommendation time:
{
  "event_id": "REC-20260627-FSR-4821",
  "entity_token": "ENT-44821-COF",
  "rep_id_token": "REP-TOKEN-9821",
  "retrieval_ids": ["TXN-batch-20260627", "CRD-20260601", "BRX-batch-20260621"],
  "prompt_version": "ctx-prompt-v2.3",
  "downstream_model": "pitch-recommender-v1.4",
  "recommendation_summary": "Brex Premium upgrade + credit limit increase to $450K",
  "rep_action": { "type": "modified", "modification": "removed_credit_limit_suggestion" },
  "eval_scores": { "human_reviewed": false, "auto_scored": false }
}

// After CRM logs outcome (30-day lag):
"outcome": {
  "outcome_type": "converted",
  "converted_product": "Brex_Premium",
  "outcome_source": "crm_opportunity_closed"
}

// After HITL eval runs:
"eval_scores": {
  "retrieval_precision": 0.89,
  "recommendation_accuracy": "accurate",
  "human_reviewed": true,
  "quality_dimension_scores": {
    "retrieval_relevance": 4, "factual_accuracy": 5,
    "business_appropriateness": 4, "specificity_to_customer": 4
  }
}
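The three stages above describe one event record that is enriched over time. A minimal sketch of that lifecycle follows, assuming updates are merged onto the stored event by key with nested objects merged one level deep; apply_update is a hypothetical helper, not part of any described API.

```python
import copy


def apply_update(event: dict, update: dict) -> dict:
    """Return a new event with the update merged in; the stored event is not mutated."""
    merged = copy.deepcopy(event)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested objects (e.g. eval_scores) instead of replacing them.
            merged[key] = {**merged[key], **copy.deepcopy(value)}
        else:
            merged[key] = copy.deepcopy(value)
    return merged


event = {
    "event_id": "REC-20260627-FSR-4821",
    "entity_token": "ENT-44821-COF",
    "eval_scores": {"human_reviewed": False, "auto_scored": False},
}
outcome_update = {"outcome": {"outcome_type": "converted",
                              "converted_product": "Brex_Premium",
                              "outcome_source": "crm_opportunity_closed"}}
eval_update = {"eval_scores": {"retrieval_precision": 0.89,
                               "recommendation_accuracy": "accurate",
                               "human_reviewed": True}}

final = apply_update(apply_update(event, outcome_update), eval_update)
print(final["eval_scores"])
```

Returning a new record on each update leaves the original event intact, which suits an append-style audit trail.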
This section defines what “good” looks like before the product ships — as the JD explicitly requires. The rubric is the PM's contribution. The scoring infrastructure is Data Science's. The gate criteria for retraining are co-owned.
D1: Retrieval Precision
Did CardEx Core surface the right context for this recommendation?
Score | Definition
5 | All retrieved context directly relevant; no irrelevant context included
4 | Majority relevant; one peripheral object included
3 | Mixed relevance; key signal retrieved but alongside significant noise
2 | Critical signal missing; recommendation generated without most relevant data
1 | Wrong context retrieved entirely; data for wrong entity or wrong time period
Method: HITL team reviews retrieval_ids against canonical entity. Automated checks for do_not_recommend exclusions. Baseline: ~60% of recommendations scoring ≥4 (A-16). Target: ≥85% by Month 6.
D2: Recommendation Accuracy
Does the recommendation contain factual errors about the customer's situation?
Binary: Accurate / Inaccurate. Inaccuracy types logged separately:
wrong_product — recommends a product the customer already holds
wrong_limit — references a credit limit that doesn't match canonical entity
stale_signal_used — driven by a staleness-flagged field despite the warning
entity_mismatch — references signals from a different customer

Method: Automated for wrong_product and wrong_limit; human review for the others. Target: ≥95% accuracy rate within 3 months of MVP-A.
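The two automated D2 checks could look roughly like the sketch below. It assumes the recommendation has already been parsed into a product and, optionally, the current credit limit it cites; the credit_limit field on the canonical entity is hypothetical, since Schema 1 as shown does not include one.

```python
def d2_automated_checks(recommended_product, cited_current_limit, canonical):
    """Return the D2 inaccuracy types detectable without human review."""
    errors = []
    # wrong_product: the customer already holds the recommended product.
    if recommended_product in canonical["current_products"]:
        errors.append("wrong_product")
    # wrong_limit: the cited current limit disagrees with the canonical entity.
    limit = canonical.get("credit_limit")  # hypothetical field
    if cited_current_limit is not None and limit is not None \
            and cited_current_limit != limit:
        errors.append("wrong_limit")
    return errors


canonical = {
    "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
    "credit_limit": 150000,
}
print(d2_automated_checks("Spark_Cash_Plus", 200000, canonical))
```

The remaining types, stale_signal_used and entity_mismatch, are left to human review as the method note states.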
D3: Business Appropriateness
Is the recommendation suitable for this customer's business stage and Capital One's product strategy?
Score | Definition
5 | Matches growth stage, addresses confirmed pain point, aligns with Capital One's cross-sell priority
4 | Directionally correct; one element could be better tailored
3 | Plausible but generic; not tailored to this customer's specific signals
2 | Technically available but misaligned with likely need
1 | Clearly wrong for this business (e.g., starter card for a $240K/month spender)
Human eval only. What scores a 5 reflects the PM's understanding of Capital One's FSO strategy for each customer segment. Must be reviewed quarterly as strategy evolves. Monthly sample of 50 recommendations. Target: mean ≥4.0 by Month 12.
D4: Business Outcome Correlation (Lagged)
Did AI-assisted recommendations convert at higher rates than non-AI-assisted pitches?
Measurement: 30-day rolling conversion rate for pitches where the rep used the AI recommendation vs. pitches where the rep ignored it. Tracked at segment level (S1 High-Volume SMB, S2 Cross-Sell Brex) and product level.

Why secondary, not north star: Outcome correlation is the ground truth but takes 30 days per pitch and is confounded by rep skill, market conditions, and product mix. It is the most important metric for demonstrating business value to leadership. It is not the fastest feedback signal for improving the model. Target: AI-assisted pitches convert ≥15% higher by Month 12. (A-17)
Prompt engineering at the platform level is distinct from prompt engineering at the query level. The PM owns the schema and instruction layer that every downstream application inherits. This is the highest-leverage PM-owned technical tool in the CardEx Core stack.
1. Field ordering within context summary (Owner: PM)
LLMs are sensitive to position within a context window. The PM orders the context summary to minimize the most damaging failure modes:
(1) do_not_recommend — first, always; catches pitching an existing product before any other signal is processed
(2) current_products — second, same reason
(3) upgrade_indicators — third; the positive signal the model should build toward
(4) spend_trend_90d + brex_monthly_volume — fourth; quantitative grounding for upgrade indicators
(5) last_pitch_outcome — fifth; recent context, not the primary frame
(6) credit_utilization — sixth; supporting signal
(7) suggested_context_for_pitch — the PM-authored narrative summary that synthesizes signals into a direction
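The ordering above can be enforced mechanically when the context summary is serialized into the prompt. A minimal sketch, assuming the summary arrives as a flat dictionary keyed by the field names listed; render_context is a hypothetical helper.

```python
# Field order taken from the list above; earlier fields appear first in the prompt.
FIELD_ORDER = [
    "do_not_recommend",
    "current_products",
    "upgrade_indicators",
    "spend_trend_90d",
    "brex_monthly_volume",
    "last_pitch_outcome",
    "credit_utilization",
    "suggested_context_for_pitch",
]


def render_context(context: dict) -> str:
    """Serialize known fields in PM-defined order; unknown fields are dropped."""
    lines = [f"{name}: {context[name]}" for name in FIELD_ORDER if name in context]
    return "\n".join(lines)


context = {
    "credit_utilization": 84,
    "internal_debug": "not for the prompt",
    "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
    "do_not_recommend": ["Spark_Cash_Plus"],
}
print(render_context(context))
```

Keeping the order in one list means a reordering experiment is a one-line change that can ship as a new prompt version.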
2. Staleness framing in the prompt (Owner: PM)
Raw staleness metadata is not directly useful to a language model. Three versions to test:

A (flag-only): '$240K [DATA AGED: 7 DAYS — VERIFY BEFORE CITING]'
B (weight reduction): '$240K (less certain — last updated 7 days ago)'
C (instruction injection): System prompt instructs: 'When a field is marked aging or stale, express lower confidence; do not cite specific figures from stale fields.'

The PM tests which framing produces appropriate hedging on stale signals without over-hedging on current signals. A/B testable within the eval framework.
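To compare the three framings inside the eval framework, each recommendation event needs a stable variant assignment. The sketch below is one possible approach, assuming assignment is keyed on the event ID and hashed so that the same event always maps to the same variant; the variant labels and the assign_variant helper are illustrative names, not part of the described platform.

```python
import hashlib

# Hypothetical labels for the three framings described above.
VARIANTS = ("A_flag_only", "B_weight_reduction", "C_instruction_injection")


def assign_variant(event_id: str) -> str:
    """Deterministically map an event ID to one staleness-framing variant."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(VARIANTS)
    return VARIANTS[bucket]


variant = assign_variant("REC-20260627-FSR-4821")
print(variant)
```

Logging the assigned variant alongside prompt_version would let the weekly eval pipeline split hedging behavior by framing.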
3. System prompt: the PM-owned instruction layer (Owner: PM)
You are a field sales assistant for Capital One's Business Cards & Payments division.
You will receive a structured customer context object.

Your task: Recommend ONE specific Capital One product action for this customer.

Rules:
- NEVER recommend a product listed in do_not_recommend
- If a field is marked 'stale', express uncertainty about that signal
- Base your recommendation on upgrade_indicators, not on historical behavior alone
- Your recommendation must be actionable in a single sales call
- Do not recommend more than one product action; specificity > coverage
- Output format: [Product action] | [Primary signal used] | [Confidence: High/Medium/Low]

This prompt is versioned (prompt_version in the API response). When the PM updates it, the version increments — and the eval pipeline can measure whether the new version produces better scores than the prior version.
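Because the prompt fixes an output format, downstream logging can validate each response before it enters the eval dataset. A minimal sketch, assuming the model emits three pipe-separated fields with or without the literal square brackets; parse_recommendation is a hypothetical helper.

```python
import re

CONFIDENCE_LEVELS = {"High", "Medium", "Low"}


def parse_recommendation(output: str) -> dict:
    """Parse '[Product action] | [Primary signal used] | [Confidence: X]'."""
    parts = [part.strip().strip("[]").strip() for part in output.split("|")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 pipe-separated fields, got {len(parts)}")
    action, signal, confidence_field = parts
    match = re.fullmatch(r"Confidence:\s*(\w+)", confidence_field)
    if not match or match.group(1) not in CONFIDENCE_LEVELS:
        raise ValueError(f"unrecognized confidence field: {confidence_field!r}")
    if not action or not signal:
        raise ValueError("product action and primary signal must be non-empty")
    return {
        "product_action": action,
        "primary_signal": signal,
        "confidence": match.group(1),
    }


parsed = parse_recommendation(
    "[Brex Premium upgrade] | [volume_increase_40pct_q2] | [Confidence: High]"
)
print(parsed)
```

Responses that fail to parse are themselves a useful signal: a rising parse-failure rate after a prompt change suggests the new version weakened format adherence.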
4. Negative example injection (Owner: PM)
The PM periodically injects negative examples into the system prompt — explicit descriptions of failure modes drawn from the HITL review queue's highest-frequency failures:

BAD: 'Offer Spark Cash Plus' when customer already holds Spark Cash Plus
BAD: 'Suggest increasing credit limit to $150K' when current limit is already $150K
BAD: 'Discuss Brex features' without knowing the customer's specific Brex use case

Negative examples updated monthly based on the HITL queue's failure mode distribution. Zero-retraining-cost improvement the PM ships independently.
5. Prompt version governance (Owner: PM)
Gate criteria for shipping a new prompt version:
• Tested on last 200 recommendation events from eval dataset
• Quality dimension scores ≥ previous version on at least 3 of 4 dimensions
• No regression on D2 (Recommendation Accuracy — factual errors cannot increase)
• Logged in version registry with change rationale and before/after eval scores

The PM who reaches for retraining first, before exhausting prompt adjustments, burns engineering cycles unnecessarily. Prompt changes are faster than model retraining by 4–6 weeks.
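The gate criteria for a new prompt version can be expressed as a single check. The sketch below assumes the four dimensions are the quality_dimension_scores keys from the event log and that D2 is tracked separately as an accuracy rate; both mappings are assumptions, as is the function name.

```python
# Assumed to be the four dimensions compared between prompt versions.
QUALITY_DIMENSIONS = (
    "retrieval_relevance",
    "factual_accuracy",
    "business_appropriateness",
    "specificity_to_customer",
)


def prompt_version_passes(candidate: dict, previous: dict,
                          events_tested: int, registry_entry_logged: bool) -> bool:
    """Apply the four shipping gates to a candidate prompt version."""
    if events_tested < 200 or not registry_entry_logged:
        return False
    # Hard gate: the D2 accuracy rate may not regress.
    if candidate["d2_accuracy_rate"] < previous["d2_accuracy_rate"]:
        return False
    held_or_improved = sum(
        candidate[dim] >= previous[dim] for dim in QUALITY_DIMENSIONS
    )
    return held_or_improved >= 3


previous = {"retrieval_relevance": 4.0, "factual_accuracy": 4.5,
            "business_appropriateness": 3.8, "specificity_to_customer": 3.9,
            "d2_accuracy_rate": 0.95}
candidate = {"retrieval_relevance": 4.1, "factual_accuracy": 4.5,
             "business_appropriateness": 4.0, "specificity_to_customer": 3.7,
             "d2_accuracy_rate": 0.96}
print(prompt_version_passes(candidate, previous, 200, True))
```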
Step 1: Recommendation Generated
Downstream AI app calls Context API. Receives context object (prompt_version tagged). Generates recommendation. Recommendation event logged automatically.
Step 2: Rep Signal Captured
Rep acts:
[used without change] → passive positive signal (low confidence)
[modified] → passive mixed signal (negative on removed element)
[ignored] → passive negative signal (low confidence)
[flagged] → explicit signal (high confidence, routes to HITL queue)
Step 3: HITL Review Queue
Flagged events reviewed by eval team within 48 hours. Reviewer scores all four quality dimensions. Failure mode categorized: retrieval / accuracy / appropriateness / other. Labeled event appended to eval dataset.
Step 4: Auto-Labeling (Passive Signals)
Used without change → D2 accuracy assumed positive (low confidence; passive signal).
Modified → D3 appropriateness scored based on what was removed.
Ignored → weak negative signal; not used for retraining without HITL confirmation.
Conversion outcome (30-day lag) → strongest ground truth label; overrides passive signals.
Step 5: Eval Pipeline (Runs Weekly)
Aggregates all labeled events from prior week. Computes quality scores by: customer segment, data source, prompt version, downstream model. Outputs: quality score trends, failure mode distribution, staleness correlation. Flags: any score declining week-over-week for 3+ consecutive weeks.
Step 6: Three Improvement Levers (PM Selects Based on Failure Mode Diagnosis)
L1: Retrieval Tuning (PM + Data Engineering) — for retrieval precision failures. 2–3 week engineering cycle.
L2: Prompt Adjustment (PM owns fully) — for appropriateness and instruction clarity failures. Same-day to 1 week.
L3: Model Retraining (PM sets gate criteria; Data Science executes) — only after L2 exhausted; 4–8 weeks.
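The signal-handling rules in the loop above, in particular the precedence of the lagged conversion outcome over passive signals, can be sketched as a small labeling function. The label strings, the treatment of any non-converted outcome as negative, and the needs_hitl_confirmation field are assumptions made for illustration.

```python
from typing import Optional


def auto_label(rep_action: str, outcome: Optional[str] = None,
               hitl_confirmed: bool = False) -> dict:
    """Derive a label for one recommendation event from the available signals."""
    # A conversion outcome, once available, overrides any passive signal.
    if outcome is not None:
        label = "positive" if outcome == "converted" else "negative"
        return {"label": label, "source": "conversion_outcome",
                "needs_hitl_confirmation": False}
    if rep_action == "used_without_change":
        return {"label": "positive", "dimension": "D2", "source": "passive",
                "needs_hitl_confirmation": False}
    if rep_action == "modified":
        return {"label": "mixed", "dimension": "D3", "source": "passive",
                "needs_hitl_confirmation": False}
    if rep_action == "ignored":
        # Weak negative: held back from retraining until a reviewer confirms it.
        return {"label": "weak_negative", "source": "passive",
                "needs_hitl_confirmation": not hitl_confirmed}
    if rep_action == "flagged":
        # Explicit flags are scored by reviewers, not auto-labeled.
        return {"label": "pending_hitl_review", "source": "explicit_flag",
                "needs_hitl_confirmation": True}
    raise ValueError(f"unknown rep action: {rep_action!r}")


print(auto_label("ignored", outcome="converted"))
```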
Lever 3 (Model Retraining) — PM-Owned Gate Criteria
Gate Criterion | Threshold
Minimum labeled dataset size | ≥1,000 labeled events with outcome data
Quality decline sustained | ≥3 consecutive weekly declines in overall quality score
Prompt adjustment exhausted | ≥2 prompt versions tested without improvement
D2 (Accuracy) floor during retraining eval | Must not drop below 90%
Champion-challenger evaluation | New model must beat current on all 4 dimensions on holdout eval set before promotion
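The gate criteria above translate directly into a checklist function. This is a sketch under the assumption that all five criteria are evaluated together and that "beat" means strictly greater on every dimension; the function and key names are hypothetical.

```python
def unmet_retraining_gates(labeled_events_with_outcome: int,
                           consecutive_weekly_declines: int,
                           prompt_versions_without_improvement: int,
                           d2_accuracy: float,
                           challenger: dict, champion: dict) -> list:
    """Return the names of gate criteria not yet satisfied (empty list = gate open)."""
    unmet = []
    if labeled_events_with_outcome < 1000:
        unmet.append("dataset_size")
    if consecutive_weekly_declines < 3:
        unmet.append("sustained_decline")
    if prompt_versions_without_improvement < 2:
        unmet.append("prompt_exhausted")
    if d2_accuracy < 0.90:
        unmet.append("d2_floor")
    # Challenger must be strictly better on every dimension of the holdout set.
    if not all(challenger[dim] > champion[dim] for dim in champion):
        unmet.append("champion_challenger")
    return unmet


champion = {"D1": 4.0, "D2": 0.93, "D3": 3.8, "D4": 0.12}
challenger = {"D1": 4.2, "D2": 0.95, "D3": 3.8, "D4": 0.13}
print(unmet_retraining_gates(1200, 3, 2, 0.95, challenger, champion))
```

Returning the list of unmet criteria, rather than a bare boolean, gives the PM a record of why a retraining request was declined.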
Drift is not “the model is wrong.” Drift is “the model was right and is becoming less right, in a direction no one noticed because the change is gradual.”
Type 1: Data Drift
What it is
Post-Brex, the canonical entity store receives a new category of customer — high-growth tech startups with $500K+/month Brex spend, no Spark card history. Entity resolution, freshness thresholds, and context summarization were calibrated on SMB customers with 2–5 year card histories. The new customer distribution breaks these calibrations.
Detection
Weekly monitoring of canonical entity field distributions. Kolmogorov-Smirnov test on key numeric fields (monthly spend, credit utilization, headcount). Alert threshold: distribution shift >2 standard deviations from 90-day rolling baseline.
Response
Lever 1 (Retrieval Tuning) — recalibrate entity resolution for new customer type; adjust freshness thresholds. Lever 2 — update prompt to handle new segment.
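The detection step for data drift can be sketched with the standard library alone. The text pairs a Kolmogorov-Smirnov test with a threshold expressed in standard deviations; the sketch below interprets that as reporting the two-sample KS statistic for distribution shape and triggering the alert on the mean shift measured in baseline standard deviations. That pairing is an assumption, not the author's specification.

```python
from bisect import bisect_right
from statistics import mean, stdev


def ks_statistic(baseline: list, current: list) -> float:
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )


def mean_shift_in_sd(baseline: list, current: list) -> float:
    """Absolute shift of the current mean, in units of baseline standard deviation."""
    spread = stdev(baseline)
    shift = abs(mean(current) - mean(baseline))
    if spread == 0:
        return 0.0 if shift == 0 else float("inf")
    return shift / spread


def drift_alert(baseline: list, current: list, sd_threshold: float = 2.0) -> bool:
    """Alert when the mean shift exceeds the assumed 2-SD threshold."""
    return mean_shift_in_sd(baseline, current) > sd_threshold


# Illustrative monthly spend values (thousands of USD), not real data.
smb_baseline = [10, 20, 30, 40, 50]
startup_wave = [510, 520, 530, 540, 550]
print(ks_statistic(smb_baseline, startup_wave), drift_alert(smb_baseline, startup_wave))
```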
Type 2: Concept Drift
What it is
In Q1 2026, a Brex Premium recommendation for a $200K/month customer scores 5/5 on D3 Business Appropriateness. In Q4 2026, Capital One launches a new SMB Banking product clearly superior for this profile. The Brex Premium recommendation is now a 3/5 — but the model still recommends it because it was rewarded in Q1's training data.
Detection
Quarterly eval rubric review. PM compares current Business Appropriateness rubric against Capital One's current product strategy. D4 (Business Outcome) monitoring: if conversion rate declines for a recommendation type that previously converted well, concept drift is a candidate explanation.
Response
Eval rubric update (PM-owned) → re-score historical eval data → if model needs to learn the new concept, Lever 3 (retraining with updated rubric as ground truth).
Type 3 · Output Drift (Recommendation Convergence)
What it is
In Month 3, the model produces varied, customer-specific outputs. In Month 12, after 9 months of HITL feedback training, 60% of all recommendations suggest one of three templates. The model learned these three are the 'safest bets' — flagged less frequently — and regresses to them regardless of customer context. This is the HITL paradox: the feedback mechanism that improves quality can also narrow output diversity over time.
Detection
Output diversity index — tracked weekly. Measures vocabulary diversity of recommendation text and distribution of product types recommended. Alert: if top-3 recommendation types account for >60% of all recommendations for 4 consecutive weeks.
Response
(1) Diversity injection in system prompt: 'Consider the full range of Capital One products; do not default to the most common recommendation.' (2) Mark 'generic recommendation despite specific signals' as D3 failure in HITL rubric. (3) Lever 3 (retraining with diversity penalty) if Lever 2 doesn't recover diversity within 4 weeks.
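The product-type half of the diversity alert can be sketched as follows. The sketch covers only the top-3 share rule (more than 60% for 4 consecutive weeks); the vocabulary-diversity measure is not shown. Function names and the example product labels are hypothetical.

```python
from collections import Counter

def top3_share(recommendation_types):
    """Share of all recommendations taken by the three most common types."""
    if not recommendation_types:
        return 0.0
    counts = Counter(recommendation_types)
    top3 = sum(n for _, n in counts.most_common(3))
    return top3 / len(recommendation_types)

def convergence_alert(weekly_shares, threshold=0.60, weeks=4):
    """True when the top-3 share exceeded the threshold in each of the
    last `weeks` weeks (oldest first)."""
    recent = weekly_shares[-weeks:]
    return len(recent) == weeks and all(share > threshold for share in recent)

# Illustrative week: five product types, three of them dominant.
week = ["spark_cash"] * 5 + ["brex_premium"] * 5 + ["smb_banking"] * 5 \
     + ["travel_card"] * 3 + ["line_of_credit"] * 2
share = top3_share(week)  # 15 of 20 recommendations
```

Because the alert requires a full run of weeks above the threshold, a single noisy week neither triggers nor hides the signal.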
Month 1 is dedicated baseline measurement. No MVP-A components touch a field rep until baselines for Trust Rate, Retrieval Precision, Accuracy Rate, Adoption Rate, and Conversion Rate are established from current-state behavior. Every assumption-labeled baseline is replaced by a confirmed measurement by end of Month 1.
Tier 1 — North Star
Metric | Baseline | Target | Method
Recommendation Trust Rate (% of recommendations reps act on without modification) | ~20% (A-19) | ≥60% by Month 12 | Event log: (used without modification) ÷ (total acted on)
Trust Rate integrates all three quality dimensions simultaneously — freshness (reps modify when data is stale), accuracy (reps modify when facts are wrong), and appropriateness (reps modify when pitch direction is off). A rising Trust Rate is the compressed signal that the platform is working.
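As a rough sketch, the Trust Rate formula can be computed from the recommendation event log like this. The event shape (a dict with an `action` field) is hypothetical, and counting only `used` and `modified` events as "acted on" is an assumption; ignored and flagged events are excluded from the denominator.

```python
def trust_rate(events):
    """(used without modification) / (total acted on), or None if nothing was acted on."""
    used = sum(1 for e in events if e["action"] == "used")
    modified = sum(1 for e in events if e["action"] == "modified")
    acted_on = used + modified
    return used / acted_on if acted_on else None

# Illustrative log: 2 used as-is, 6 modified, 2 ignored.
log = (
    [{"action": "used"}] * 2
    + [{"action": "modified"}] * 6
    + [{"action": "ignored"}] * 2
)
rate = trust_rate(log)  # 2 / 8
```

Returning `None` for an empty denominator avoids reporting a misleading 0% in weeks where no rep acted on any recommendation.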
Tier 2 — Platform Health
Metric | Target | When
Context API Availability | ≥99.5% | Continuously from MVP-A
Context API P50 Latency | ≤200ms | Continuously from MVP-A
Freshness SLA Compliance (% of responses with all fields within staleness thresholds) | ≥90% | Weekly from MVP-A
Brex Ingestion Success Rate | ≥98% of weekly batch jobs complete without data loss | Weekly from Month 3
Entity Resolution Match Rate (Spark-Brex pairs) | ≥70% deterministic match by Month 6 | Weekly during MVP-A build
Tier 3 — Feedback Loop Health + Eval Quality (D1–D4)
Metric | Target | When
Feedback Capture Rate | ≥95% of events with ≥1 signal by Month 2 | Weekly from Phase 1
Eval Dataset Growth Rate | ≥200 labeled events/week by Month 9 | Weekly from MVP-B
HITL Review Clearance Rate | ≥90% reviewed within 48 hours | Weekly from MVP-B
Inter-rater Reliability (D3) | Cohen's Kappa ≥0.75 | Monthly, ≥50 dual-reviewed events
Retrieval Precision@3 (D1) | ≥85% by Month 6 | Weekly HITL scoring on 25-event sample
Recommendation Accuracy Rate (D2) | ≥95% by Month 3 post-MVP-A | Automated vs. canonical entity
Business Appropriateness Score mean (D3) | ≥4.0 / 5 by Month 12 | Monthly 50-recommendation HITL sample
Outcome Conversion Correlation (D4) | AI-assisted pitches ≥15% higher by Month 12 | Monthly from Month 9 (30-day lag)
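The inter-rater reliability target in the table above relies on Cohen's Kappa, which corrects raw agreement for the agreement two reviewers would reach by chance. A minimal sketch of the unweighted form follows; the function name is hypothetical, and a weighted variant may suit the ordinal 1–5 D3 scale better.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's Kappa for two raters scoring the same events."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of events")
    n = len(rater_a)
    # Observed agreement: share of events where both raters gave the same score.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement by chance, from each rater's own score distribution.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical score throughout
    return (observed - expected) / (1 - expected)

# Illustrative D3 scores from two reviewers on four dual-reviewed events.
kappa = cohens_kappa([5, 5, 4, 4], [5, 5, 4, 3])
```

In this example the reviewers agree on 3 of 4 events (75%), but chance agreement is 37.5%, so Kappa is lower than raw agreement suggests.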
Tier 4 — Business Impact
Metric | Target | When
Rep Adoption Rate | ≥70% of eligible S1+S2 reps using recommendations ≥1×/week by Month 9 | Monthly
AI-Assisted Pitch Conversion Rate | ≥15% higher than non-AI baseline by Month 12 | Monthly (30-day lag)
Rep Time Savings per Pitch Prep | ≥15 min reduction by Month 9 | Bi-monthly survey
New App Integration Time | ≤5 business days from API access to first production event | Per integration
Tier 5 — Improvement Velocity
Metric | Target
Quality Score Trend (mean across 4 eval dimensions) | Improving ≥2 points per quarter from Month 9 baseline
Time-to-Improvement (Lever 2: Prompt) | ≤10 business days from issue detected to deployed prompt update validated against eval holdout
Time-to-Improvement (Lever 3: Retraining) | ≤8 weeks from retraining trigger to new model version promoted to production
Prompt Version Win Rate | ≥75% of new prompt versions improve on ≥3 of 4 quality dimensions
Tier 6 — Guardrails (Non-Negotiable Floors)
Guardrail | Threshold | Response
PII in recommendation output | Zero | Pause all downstream model deployments; audit PII preprocessing layer
HITL-flagged recommendation rate | >15% of weekly recommendations flagged as wrong | Emergency eval review; PM + Data Science + MRM convene within 48 hours
SR 26-2 audit trail completeness | <100% with complete retrieval_ids | Block new recommendation events until gap resolved
Output Diversity Index | <50% of Month 3 baseline for 3 consecutive weeks | Lever 2 (diversity injection); escalate to Lever 3 if not recovered in 4 weeks
Entity Resolution Confidence | Average confidence <0.75 | Pause Brex entity resolution expansion; review matching algorithm
Feedback Capture Rate | <80% for 2 consecutive weeks | Engineering review of logging pipeline; confirm no silent failures
Phase | Gate Metric | Pass Threshold
Phase 1 → Phase 2 | Feedback Capture Rate | ≥95% of events captured
Phase 1 → Phase 2 | PII guardrail | 0 PII in 100-event audit
Phase 1 → Phase 2 | Freshness indicator | Live in ≥1 rep-facing tool
Phase 2 → Phase 3 (MVP-A complete) | Context API handling Spark traffic | 100% of Spark app requests via CardEx Core
Phase 2 → Phase 3 (MVP-A complete) | Brex Entity Resolution Coverage | ≥70% of Brex accounts resolved
Phase 2 → Phase 3 (MVP-A complete) | Trust Rate (Spark customers) | Trending above 30% vs. ~20% baseline
Phase 2 → Phase 3 (MVP-A complete) | Context API P50 Latency | ≤200ms confirmed in production load test
Phase 3 → Phase 4 (MVP-B complete) | Eval Pipeline Coverage | ≥80% of recommendation events scored weekly
Phase 3 → Phase 4 (MVP-B complete) | HITL Clearance Rate | ≥90% reviewed within 48 hours
Phase 3 → Phase 4 (MVP-B complete) | Eval Dataset Size | ≥500 labeled events accumulated
Phase 3 → Phase 4 (MVP-B complete) | Drift Detection | All three drift type monitors active with baselines established
Risks
ID | Risk | P | I | Mitigation
R-01 | Brex entity resolution match rate <50% (A-12 fails) | M | H | Sample matching exercise Week 1; if <50%, extend human-review queue capacity and adjust MVP-A gate criteria
R-02 | Downstream FSO AI application teams resist Context API migration | M | H | Parallel-run period; requires executive sponsor mandate from BC&P leadership, since the PM cannot force migration without org authority
R-03 | SR 26-2 RFI on GenAI drops before MVP-A ships, requiring architectural changes | L-M | H | MRM team in design review from Week 2; CardEx Core's source attribution already satisfies likely RFI requirements
R-04 | HITL review queue becomes backlogged | M | M | Automated pre-scoring to triage severity; high-severity flags require 48-hour SLA; low-severity batch reviewed weekly
R-05 | Brex batch feed cadence is bi-weekly or monthly (longer than A-13 assumes) | M | H | Negotiate freshness SLA with Brex engineering Week 1; written commitment, not verbal estimate
R-06 | Prompt version update degrades quality for a segment not in eval holdout | L | M | Staged rollout: new prompt version served to 10% of traffic before full cutover; monitor 48 hours
R-07 | Output diversity drift occurs faster than A-18 assumes (within 3 months of MVP-B) | L | M | Monthly diversity index reporting from MVP-B launch; early detection protocol if >10% drop within first 90 days
Issues (Known at Time of Writing)
ID | Issue | Owner
I-01 | Brex real-time API availability is a dependency not under Capital One's full control | PM + Brex Engineering Lead
I-02 | HITL review team staffing not yet defined; the eval pipeline requires human reviewers | PM + BC&P People Lead
I-03 | Prompt version governance requires alignment with Data Science on champion-challenger evaluation criteria before MVP-B | PM + Data Science Lead
Dependencies
ID | Dependency | Owned by | Required by
D-01 | BC&P executive sponsor mandate for downstream app API migration | BC&P Head of Product | Phase 2 start (Month 2)
D-02 | Brex engineering commitment to weekly batch feed cadence and real-time API roadmap | Brex Engineering | Phase 1 end (Month 2)
D-03 | MRM sign-off on CardEx Core SR 26-2 compliance design | Capital One MRM | Phase 2 launch (Month 6)
D-04 | HITL reviewer team staffing (2 FTE minimum for Phase 3 launch) | BC&P Ops / People | Phase 3 start (Month 6)
D-05 | CRM outcome linkage capability (pitch outcome → recommendation event log) | CRM Engineering | Phase 3 (for D4 scoring)
ID | Assumption | Basis | Urgency
A-17 | AI-assisted pitch conversion rate will improve ≥15% over non-AI-assisted baseline by Month 12 | Capital One's Chat Concierge demonstrated 55% lead conversion improvement; field sales context is rep-mediated so a more conservative target is appropriate | High
A-18 | Output diversity drift will become detectable within 6–9 months of HITL feedback training beginning | Documented pattern in enterprise models trained with HITL feedback mechanisms: models converge on low-variance outputs over time | Medium
A-19 | Recommendation Trust Rate baseline is approximately 20% | Inferred from trust collapse described in Phase 2; Agent Assist improvement from 84% to 93% suggests current field sales context is meaningfully below a reachable good state | High
A-20 | Recommendation Accuracy Rate baseline is unknown but likely low; wrong-product errors are probable given absence of a do_not_recommend constraint in current tools | Rep complaint patterns documented in Phase 2; absence of structured product exclusion field in current retrieval layers | High
Phase 6
Learning
Full assumption register (A-01–A-20) · Top 5 validation priorities · Over/underestimate analysis · First 8 actions · Vision · Note on this project
Critical — Solution Direction Changes If Wrong
ID | Ph | Assumption | Basis | Validation
A-01 | 0 | FSO lacks a unified customer view across Spark, Brex, and Discover data sources | Brex operating independently post-acquisition; 12–24 month typical integration timeline at this scale | Architecture review with FSO engineering leads, Week 1
A-02 | 0 | No shared context platform exists for field sales AI; each application has its own retrieval layer | JD language 'design and build a horizontal foundation for shared, trusted context' implies the platform does not exist | Same architecture review, Week 1
A-08 | 2 | The inferred current-state architecture (multiple independent retrieval layers, no entity resolution, no shared logging) reflects Capital One's actual production environment | Constructed from public information about Capital One's AI deployments and acquisition context | Architecture review with FSO engineering and Data Science leads, Week 1
A-09 | 2 | The root cause is structural (a missing platform abstraction layer), not organizational (siloed teams and poor communication) | Inferred from how Capital One built AI vertically. The organizational explanation is not ruled out; it may be both | Stakeholder interviews with FSO AI team leads from at least two application teams, Week 2. Key probe: 'If you wanted to share customer context with another team today, what would it take?'
A-11 | 2 | No canonical customer entity currently resolves identity across Capital One Spark card, Brex company, and Capital One credit schemas | Brex operating independently; entity resolution at this scale (35,000+ Brex companies) unlikely completed in 3 months since April 7, 2026 | Data architecture review with Brex integration team, Week 1
A-13 | 3 | Brex data will be available via weekly batch feed for the first 12 months; real-time API access estimated 12–18 months out | Brex operating independently; real-time integration requires purpose-built integration layer | Brex engineering meeting, Week 1; secure written SLA commitment for initial batch cadence
High Urgency — Scope or Timeline Changes If Wrong
ID | Ph | Assumption | Validation
A-03 | 0 | Data siloing is causing measurable adoption friction: reps are aware of inconsistent recommendations and avoiding AI tools | Contextual inquiry with 6–8 FSRs across two regional offices, Month 1
A-04 | 0 | Capital One's MRM is already applying SR 26-2 principles to GenAI systems in field sales by analogy | MRM team introductory meeting, Week 2; ask which governance framework applies to FSO AI today
A-06 | 0 | RAG is the correct retrieval architecture for CardEx Core MVP; fine-tuning is not viable due to weekly data changes and SR 26-2 source attribution requirements | Data refresh cadence audit for each source system, Week 2
A-07 | 0 | Rep behavior signals (used/modified/ignored/flagged) are the primary available HITL feedback signal at MVP-B launch | CRM instrumentation review, Week 2
A-12 | 3 | Deterministic entity resolution match rate of ~70–85% achievable using EIN as primary key | Sample matching exercise on 500 Brex accounts, Week 2
A-15 | 3 | Downstream FSO AI applications can adopt the Context API without requiring a full application rebuild | Architecture review of each existing FSO AI application, Week 1
A-17 | 5 | AI-assisted pitch conversion rate will improve ≥15% over non-AI baseline by Month 12 | Baseline conversion rate audit in Month 1 before MVP-A ships
A-19 | 5 | Recommendation Trust Rate baseline is approximately 20% | Pre-launch rep behavior audit in Month 1
A-20 | 5 | Recommendation Accuracy Rate baseline is unknown but likely low; wrong-product errors probable | Retrospective accuracy audit on 100 recent recommendations, Month 1
Medium Urgency — Refinable In-Flight
ID | Ph | Assumption | Validation
A-05 | 0 | Current FSO AI tools pass CRM data to LLM prompts without standardized PII preprocessing | Architecture review of existing app prompt construction, Week 1
A-10 | 2 | Customer churn in BC&P's SMB segment is partially attributable to irrelevant pitch experiences caused by stale or fragmented context | Churn analysis segmented by pitch relevance score and AI tool usage rate, post-MVP-B
A-14 | 3 | Context API P50 latency of ≤200ms achievable without caching using Capital One's cloud-native infrastructure | Load testing in development environment before production launch
A-16 | 5 | Retrieval Precision@3 baseline for current FSO AI tools is approximately 60% | Retrospective eval team scoring on 100 pre-CardEx Core recommendations, Month 1
A-18 | 5 | Output diversity drift will become detectable within 6–9 months of HITL feedback training beginning | First diversity index report at Month 3 post-MVP-B to establish baseline before drift begins
Priority 1 · A-09 · Root Cause: Structural vs. Organizational
Why this one, this order: This is the only assumption whose failure invalidates the platform build entirely. If the root cause is organizational siloes and insufficient governance rather than a missing abstraction layer, Concept F (governance-first, no platform) is the right solution. A wrong answer here means building expensive infrastructure to solve a people problem. The question cannot be answered by architecture reviews alone — it requires conversations with the people who built the existing applications.
How to validate: Interview leads from at least two FSO AI application teams — separately, not in a group. Key question: 'If you wanted your application to use the same customer data as the lead scoring tool, what would it take technically, and what would it take organizationally?' If the answer is 'we'd just agree to share the data schema' → organizational. If 'we'd need to build a shared API layer' → structural.
Priority 2 · A-11 · No Canonical Entity Exists
Why this one, this order: Entity resolution is the structural prerequisite for the entire platform. If a partial canonical entity already exists (Capital One has begun a Brex data integration that resolves some customer identities), CardEx Core builds on it rather than from scratch. The answer changes engineering scope significantly.
How to validate: Request entity model documentation from both Capital One's data platform team and the Brex integration engineering team in Week 1. Specifically ask: 'Do we have a customer record that links a Brex company ID to a Capital One Spark card account ID?'
Priority 3 · A-13 · Brex Batch Cadence
Why this one, this order: The freshness SLA CardEx Core can promise for Brex data is entirely determined by the batch cadence. Weekly → Brex data labeled 'aging' after 8 days, acceptable for MVP. Monthly → Brex data is structurally stale for most of the month, significantly weakening CardEx Core's value for S2 (Cross-Sell Brex) reps — the segment with the most acute pain.
How to validate: Week 1 meeting with Brex engineering and Capital One cloud team. Secure a written SLA commitment — not a verbal estimate — for the initial batch job cadence before designing the freshness normalization layer.
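The cadence argument can be made concrete with a small calculation: given a batch cadence and the 8-day "aging" threshold, what fraction of each cycle does Brex data spend past the threshold? This is a simplified sketch that assumes data is 0 days old when a batch lands and ages linearly until the next one, ignoring processing lag. The function name is hypothetical.

```python
def fraction_of_cycle_aging(cadence_days, aging_after_days=8):
    """Fraction of each batch cycle in which data is older than the aging threshold."""
    if cadence_days <= 0:
        raise ValueError("cadence_days must be positive")
    return max(0, cadence_days - aging_after_days) / cadence_days

weekly = fraction_of_cycle_aging(7)      # never crosses the threshold
biweekly = fraction_of_cycle_aging(14)   # 6 of 14 days past the threshold
monthly = fraction_of_cycle_aging(30)    # 22 of 30 days past the threshold
```

Under these assumptions a weekly feed never crosses the threshold, a bi-weekly feed is aging for roughly 43% of the cycle, and a monthly feed for roughly 73%, which is what "structurally stale for most of the month" means in practice.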
Priority 4 · A-15 · Downstream App Rebuild Requirement
Why this one, this order: The MVP-A parallel-run migration strategy depends on existing FSO AI applications being able to call the CardEx Core API alongside their current retrieval without a full rebuild. If any application has retrieval embedded directly in model code (a tightly-coupled monolith), migrating it requires a partial rebuild — adding one engineering cycle per affected application to the MVP-A timeline.
How to validate: Week 1 architecture review. Ask each application team lead to show the code path from 'rep triggers recommendation' to 'data is retrieved.' Decouplability is visible in the architecture; it does not require running the code.
Priority 5 · A-12 · Entity Resolution Match Rate
Why this one, this order: The 70–85% deterministic match rate estimate determines the human-review queue volume, MVP-A launch quality, and staffing requirement for entity resolution review. A 50% match rate doubles the human review queue and changes the staffing plan before the first line of entity resolution code is written.
How to validate: Sample matching exercise in Week 2. Run EIN-based deterministic matching on 500 randomly selected Brex company accounts against Capital One Spark and credit records. Measure actual match rate. This exercise can be done with raw data access — no platform needed.
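The sample matching exercise could be run with something as simple as the sketch below. It normalizes EINs to nine digits and counts how many Brex accounts have an exact EIN match in Capital One records. The record shape and field names (`company_id`, `ein`) are hypothetical, and the sketch deliberately ignores the harder cases (DBA names, SSN-based sole proprietors) that fall into the human-review queue.

```python
import re

def normalize_ein(raw):
    """Strip punctuation and spaces; return 9 digits or None if malformed or missing."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None

def deterministic_match_rate(brex_accounts, capital_one_records):
    """Return (match rate, list of unresolved Brex company IDs)."""
    known = {normalize_ein(r.get("ein")) for r in capital_one_records}
    known.discard(None)
    unresolved = [
        acct["company_id"]
        for acct in brex_accounts
        if normalize_ein(acct.get("ein")) not in known
    ]
    if not brex_accounts:
        return 0.0, unresolved
    matched = len(brex_accounts) - len(unresolved)
    return matched / len(brex_accounts), unresolved

# Illustrative records only.
brex = [
    {"company_id": "b1", "ein": "12-3456789"},
    {"company_id": "b2", "ein": "98-7654321"},
    {"company_id": "b3", "ein": None},
    {"company_id": "b4", "ein": "11-1111111"},
]
capital_one = [{"ein": "123456789"}, {"ein": "11 1111111"}]
rate, review_queue = deterministic_match_rate(brex, capital_one)
```

The unresolved list is as useful as the rate: its size is a direct estimate of the human-review queue volume that A-12 drives.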
This section identifies the structural biases in the proposal — not to undermine it, but because a hiring manager reading critically will find them, and stating them first is more credible than having them surface in an interview.
What This Proposal Overestimates
1. Entity resolution feasibility at MVP quality. The proposal assumes 70–85% deterministic match rate. In practice, B2B entity resolution for financial services is complicated by: business names that don't match legal entity names (DBA vs. registered name); sole proprietors using personal SSNs for both business and personal accounts (the Spark card may be under the owner's SSN; Brex's account is under a company EIN); and businesses that changed legal structure between their Spark card opening and their Brex onboarding. At 50% match, the canonical entity store at MVP-A launch has 50% of Brex accounts unresolved — which materially limits CardEx Core's value for S2 (Cross-Sell Brex) reps, the most acute pain point.
2. Rep trust recovery timeline. The Adoption Rate target of ≥70% of eligible reps by Month 9 assumes improving recommendation quality is sufficient to recover trust from reps who have already been burned. In enterprise AI rollouts, trust recovery from negative experiences requires more than quality improvement — it requires visible organizational signaling, peer success stories that reach the Burned Skeptic cohort, and in some cases direct manager intervention. The Month 9 adoption target may need to be disaggregated: early adopters (40%) at Month 6, broader adoption (70%) at Month 15.
3. Feedback loop signal quality in the first 9 months. Passive signals (used/modified/ignored) are noisy: a rep who modifies a recommendation may be improving it or may be wrong about their modification. Without a high enough proportion of explicit flags and lagged outcome labels in the dataset, the quality trend will be statistically noisy for 6–9 months. The ≥2 points/quarter improvement target may need to be deferred to Month 15 with a smaller signal of directional improvement (≥1 point) being the Month 9 gate.
What This Proposal Underestimates
1. Organizational change management as the primary risk. The RAID log lists downstream app team resistance as R-02 (Medium probability, High impact). This is likely underrated. Every FSO AI application team has its own roadmap, its own architecture philosophy, and — critically — its own answer to “why would I trust a shared platform I didn't build?” Migrating to a shared API affects their deployment independence, their debugging process, and their on-call responsibility. Without a strong, sustained executive mandate and dedicated migration engineering support, the parallel-run period could extend from 3 months to 12+ months. The mitigation that is probably missing: a formal adoption milestone tied to FSO engineering team performance reviews, championed by the BC&P Head of Product. Without it, migration is optional and will be deprioritized whenever teams face shipping pressure.
2. HITL review team staffing. D-04 assumes 2 FTE minimum for Phase 3 launch. That is likely too few. At the guardrail ceiling of a 15% flag rate, 1,000 weekly recommendations produce 150 flagged events requiring 48-hour turnaround, on top of the ≥200 labeled events per week target and 10% dual-review for inter-rater reliability. Two reviewers working at high quality can together process approximately 80–100 flagged events per week before quality degrades. Phase 3 launch staffing should be 3–4 FTE with a clear plan to reduce as automated scoring matures. Understaffing the eval team at launch is the fastest way to degrade eval dataset quality and lose the feedback loop before it has demonstrated value.
3. Brex integration timeline. A-13 assumes Brex batch data integration is live by Month 3. Three months is aggressive. The more realistic timeline based on standard acquisition integration patterns at this scale: Month 5–6 for initial batch, Month 12–18 for real-time API. The roadmap should show a Spark-only Context API at MVP-A launch (Month 6), with Brex data arriving in MVP-A Phase 2 (Month 9). Leadership expectations must be set accordingly.
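The staffing estimate in item 2 can be checked with a short calculation. It reads the 80–100 events per week figure as the combined throughput of two reviewers, so 40–50 per reviewer; that reading is an assumption, as is the function name.

```python
import math

def reviewers_needed(weekly_recommendations, flag_rate, events_per_reviewer_per_week):
    """Return (flagged events per week, reviewers required to clear them)."""
    flagged = round(weekly_recommendations * flag_rate)
    return flagged, math.ceil(flagged / events_per_reviewer_per_week)

# 1,000 weekly recommendations at a 15% flag rate.
optimistic = reviewers_needed(1000, 0.15, 50)    # 50 events per reviewer per week
conservative = reviewers_needed(1000, 0.15, 40)  # 40 events per reviewer per week
```

At 150 flagged events per week, the two throughput bounds give 3 and 4 reviewers, which is where the 3–4 FTE recommendation comes from.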
The organizational assumption this proposal cannot validate from the outside: Whether Capital One's BC&P organization has a PM who can credibly own both the context platform (a data infrastructure product) and the feedback loop system (an ML operations product) simultaneously. This is a broad scope for a Manager-level PM. In practice, the context platform may require a data platform PM skill set and the feedback loop may require an ML platform PM skill set. The risk is that this PM role is designed for a unicorn — and what actually ships is whichever half the PM is stronger in, while the other half is deprioritized under shipping pressure.
Day 1–2 · Schedule five architecture reviews; do not cancel them for anything.
Reviews needed: FSO engineering leads (existing AI applications), Brex integration engineering team, Capital One data platform team, MRM team, and the Field Sales AI Product Team (S5).

Rationale: A-01, A-02, A-08, and A-11 are Critical assumptions, and A-15 is High Urgency. Every design decision made before these reviews is built on inference, not fact. The first instinct of any PM inheriting a problem is to start designing. The correct instinct is to first confirm the problem is what you think it is. These five reviews take one week and test five of the highest-risk assumptions in the register.
Day 3–5 · Conduct stakeholder interviews with FSO AI application team leads, separately rather than in a group.
Target: leads from at least two different FSO AI application teams, interviewed independently.

Rationale: Validates A-09 (structural vs. organizational root cause) and A-15 (rebuild requirement). The interviews must be separate because team leads in a group setting will align to the most politically safe answer. Separate conversations surface whether fragmentation is a technical architecture problem or a coordination problem.

Key question to every interviewee: 'If you needed to use the same customer data as another team's AI application, what would it take — technically and organizationally?'
Week 2 · Run the entity resolution sample exercise on 500 Brex accounts.
Request access to 500 randomly selected Brex company records and match them against Capital One Spark and credit records using EIN as the primary key.

Rationale: Validates A-12 (match rate). This is a data exercise, not a design exercise. It requires no architecture decisions and no new infrastructure. The result either confirms MVP-A's entity resolution plan (≥70% match) or changes the scope and timeline before engineering begins.
Week 2 · Meet with Brex integration engineering and get a written batch cadence SLA.
Not a verbal estimate. A written commitment to initial batch frequency, data schema documentation, and the roadmap for real-time API availability.

Rationale: Validates A-13 (Brex batch cadence). The Freshness SLA Compliance target, staleness thresholds, S2 segment value story, and Phase 4 roadmap all depend on this number. A verbal 'probably weekly' will shift under engineering pressure.
Month 1 · Establish all pre-launch baselines before a single rep sees CardEx Core.
Specific tasks:
• Retrospective eval scoring on 100 recent FSO AI recommendations (confirms A-16, A-20: Retrieval Precision and Accuracy Rate baselines)
• Rep behavior audit in existing sales tools (confirms A-19: Trust Rate baseline)
• CRM conversion rate pull segmented by AI-assisted vs. non-AI-assisted pitches (confirms A-17: conversion rate baseline)
• Rep time-in-tool measurement for pitch preparation

Rationale: Without baselines, every metric at Month 12 is a claim without a denominator. With baselines, every metric at Month 12 is evidence.
Month 1 · Author the eval rubric and circulate it for MRM and Data Science sign-off before any model work begins.
Deliver: the four-dimension eval framework (Retrieval Precision, Recommendation Accuracy, Business Appropriateness, Business Outcome Correlation) with scoring definitions, measurement methods, and HITL reviewer training protocol.

Rationale: The champion-challenger evaluation criteria, model retraining gate criteria, and prompt version governance all depend on an agreed rubric. A rubric established after the model is evaluated is a rationalization. Data Science cannot build the eval pipeline without knowing what it is scoring. This document is the PM's first real deliverable — and it is entirely PM-owned.
Month 1, Week 3 · Activate recommendation event logging on all existing FSO AI applications.
Deploy structured logging schema to existing applications — passive signals only (used/modified/ignored). No rep-facing change required.

Rationale: Every day without logging is a day of eval data lost. If logging starts in Month 1, roughly 8 months of passive signal data will exist by the time MVP-B launches (Month 9); if logging starts at MVP-B, there is none. This action has no rep-facing risk and no architectural dependency, so it should be the first thing that ships.
Month 2 · Pilot the rep-facing freshness indicator with a cohort of 10–15 early-adopter reps before full rollout.
Identify 10–15 reps from S1 or S2 who are known to be tool-positive (not the Burned Skeptic cohort). Ship the freshness indicator to them as a pilot. Run structured debrief after 2 weeks.

Rationale: The freshness indicator is the first rep-visible change and the first trust restoration signal. Piloting before full rollout answers two questions that cannot be answered in design: (1) Do reps understand what 'aging: 7 days' means, or is it confusing? (2) Does seeing the freshness signal change how reps use the recommendation? The pilot either confirms the design or surfaces a UX problem before it reaches all reps.
Every field sales AI tool at Capital One runs on the same customer context.
Not because it is required. Because the recommendations are better.
Why this sentence: It names the platform's specific promise (same customer context), not the technology. It is specific to field sales, not generic enterprise AI. “Not because it is required. Because the recommendations are better.” is the adoption thesis: reps and application teams should choose CardEx Core because it works, not because a mandate forced them. A platform adopted under mandate is abandoned when leadership attention moves on. A platform adopted because it produces better recommendations is defended by the people who use it. The sentence stands without the document behind it.

This is a portfolio case study constructed from publicly available information. It was built to demonstrate how I approach a GenAI PM role requiring platform thinking, ML measurement fluency, and domain depth in financial services AI — not to claim insider knowledge of Capital One's internal architecture.

What is executed: The research, analytical frameworks, domain architecture reasoning, eval framework design, assumption register, and strategic recommendations in this document.

What is directional and not validated: All assumptions labeled A-01 through A-20. The current-state architecture is inferred, not confirmed. The baseline metrics are estimates grounded in public analogs, not internal measurements. The entity resolution match rate (A-12) and Brex batch cadence (A-13) are the two assumptions most likely to change the build on contact with reality.

What I would do differently with internal access: The First 8 Actions in Section 46 are precisely what I would do. The document is designed so that the six Critical assumptions can be validated or refuted in the first two weeks, before any engineering commitment is made. The platform concept is correct if A-09 is confirmed (structural root cause). It is the wrong concept if A-09 is wrong. That is a testable claim.