2026

CardEx Core

How Capital One BC&P Turns Four Post-Acquisition Data Silos Into One Context Layer for Field Sales AI

My Role

Product Strategy, Platform Architecture & GenAI Systems Design

Project Timeline

June 2026

Pilot Market

Capital One BC&P Field Sales Organization

Project Stack

This is a PM portfolio case study targeting the Capital One BC&P Manager, Product Management — GenAI Transformation role (Req ID R240507). It covers a full double-diamond process — market research through a 20-assumption register — built entirely from public information. The interactive simulator below is a functional prototype of the platform mechanics described in the case study.

The thesis: Capital One's field sales AI tools produce inconsistent recommendations not because the models are wrong, but because each application retrieves customer context independently. CardEx Core is horizontal context infrastructure that solves the data layer, not the model layer.

Live Simulator

CardEx Core Platform Simulator

Four customer scenarios · Platform layer animation · D1–D4 Eval framework · Persistent feedback loop

contextos.capitalone.internal / field-sales-ai / context-platform

Phase 0

Research Brief

Company snapshot · AI in production · Business urgency · RAG vs fine-tuning · HITL chain · SR 26-2 · A-01–A-07

01 Company Snapshot

What Capital One Is Today

Capital One is a cloud-native financial institution with $669B in total assets (Dec 31, 2025) and 100M+ customers — the only major U.S. bank to have migrated entirely to public cloud, closing its last data center in 2021. Competitors (JPMorgan, BofA) are still paying to migrate; Capital One is building higher-order capabilities on top of a stack competitors haven't reached yet.

Total Assets

$669B

Dec 31, 2025

Brex Acquisition

$5.15B

Closed April 7, 2026

SMB Card Franchise

#3 in U.S.

CEO Fairbank, Q4 2025 earnings

Event	Date	Scale	Why it matters to this role
Discover Financial acquisition	2025	~$35B	Created proprietary payment network; Capital One now competes directly with Visa/Mastercard
Brex acquisition closed	April 7, 2026	$5.15B · 35K+ customers	Added Brex corporate cards + spend management + agentic AI workflows. Brex operating independently post-acquisition.
Top-3 corporate card issuer	April 2026	$100B+ combined card spend	Same field reps now sell across 4 product lines from 4 separate data ecosystems with no unified customer view

The FSO complexity spike: Before Brex, field reps sold Spark Business cards from one ecosystem. Post-Brex, those same reps navigate Spark cards + Brex corporate cards + Brex spend management + SMB banking — four data sources in four separate systems. A rep pitching a mid-size construction firm today needs to synthesize card history (Capital One CRM), expense management behavior (Brex platform), payment flow patterns (Discover network), and credit profile (Capital One underwriting) — currently living in four systems with no unified view.

02 What Capital One Has Actually Shipped

Agent Assist Tool · Internal · Customer Service

84% → 93% search relevance

Used 10,000+ times by customer service agents. Built on proprietary Capital One data. Agents use it to search for relevant information in real time during customer calls. It shows that good retrieval drives trust and bad retrieval destroys it.

Chat Concierge · External · Auto Dealers

55% improvement in leads

Multi-agent system: one agent communicates with customer, one creates action plan based on business rules, one assesses other agents' outputs, one validates and explains the plan. Latency reduced fivefold since launch. Prem Natarajan (EVP, Enterprise AI): “We want to start off at the low end of the risk spectrum, but also find use cases with impact and enough complexity that we can learn from it.”

The gap this role fills: Both Agent Assist and Chat Concierge were built as vertical, domain-specific systems with their own data retrieval layers. The field sales org is the next frontier — requiring synthesis across multiple product lines, not just one vertical. Milind Naphade (SVP, AI Foundations): “We'd like to bring this capability to more of our customer-facing engagements. But we want to do it in a well-managed way.”

The JD is explicitly scoped to building the horizontal foundation that enables multiple downstream AI applications to run on shared, trusted customer context.

03 Enterprise AI Failure Rates: The Industry Evidence

MIT NANDA 2025

95%

of enterprise GenAI pilots delivered no measurable P&L impact. Core cause: tools that “do not retain feedback, adapt to context, or improve over time.”

Stanford AI Index 2025

41%

of LLM failures in enterprises caused by upstream data issues — not model problems. The model is often fine. The data infrastructure is broken.

Deloitte Dec 2025

11%

of enterprises with agentic AI use it in production. Only 20% of enterprise AI tools work cross-functionally (McKinsey). The pilot-to-production drop exists because trust fails before scale.

IBM (June 2026): “Many companies operate with fragmented and siloed data environments… Critical business information is often spread across disconnected systems and inconsistent data formats. AI systems struggle in these environments because poor-quality data weakens the performance and reliability of AI models. If enterprise data sources contain gaps or errors, AI agents can make flawed recommendations or run incorrect actions at scale.”

04 Three Forces Making CardEx Core Urgent in 2026 Specifically

Force 1 · April 7, 2026

Brex tripled product complexity overnight

Without a shared context platform, every new AI application built for the expanded portfolio replicates the same fragmented data retrieval problem. This is not a future risk — it is happening now as Brex integration begins. Each new AI tool adds to the fragmentation rather than solving it.

Force 2 · April 17, 2026

SR 26-2 replaced SR 11-7

New model risk management guidance is live. GenAI/agentic systems are technically “out of scope” but regulators and internal audit are already applying MRM expectations by analogy. A “forthcoming RFI on AI/GenAI/agentic-AI model risk” is signaled. A shared context platform with built-in provenance satisfies governance requirements once for all downstream apps.

Force 3 · Ongoing

AI ROI measurement imperative

The JD explicitly states: “deep experience working with Generative AI systems, with a particular emphasis on building measurement for GenAI in production.” Chat Concierge's 55% lead lift is reportable because the team instrumented its measurement. FSO AI needs the same infrastructure. Without it, leadership is flying blind.

05 Why RAG, Not Fine-Tuning: Core Architecture Decision

Customer data (spend behavior, headcount, payment timing, card utilization) changes weekly or monthly. Fine-tuning would require continuous retraining — operationally unsustainable. More critically, fine-tuned models cannot satisfy SR 26-2 source attribution requirements: if an AI recommendation led a rep to offer a higher credit limit to a business that then defaulted, the bank cannot trace which data drove the recommendation. RAG provides that trace natively.

Dimension	RAG ✓	Fine-Tuning ✗
Customer data freshness	Retrieves at inference time — always current	Baked into weights at training time — stale within weeks for dynamic customer data
Data update cadence	Update the source; model adjusts automatically	Requires retraining on every material data change
Source attribution (SR 26-2)	Every response traceable to retrieved documents	Black-box — weights don't reveal which training data drove a decision
Cost model	Upfront indexing + modest retrieval costs; scales with query volume	High upfront training compute; recurring retraining as customer data evolves
Operational fit	Index update = configuration operation	Model retraining = ML engineering sprint

51% of enterprise AI deployments in production use RAG (Menlo Ventures, 2024 State of Generative AI in the Enterprise). RAG is “the default starting point for most enterprise AI deployments particularly where data freshness, regulatory compliance, and speed to production are priorities” (SculptSoft, June 2026).
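
To make the source-attribution row of the table concrete, here is a minimal sketch of retrieval at inference time in which every prompt carries the IDs of the records it was built from. The `ContextDoc` type, the keyword-overlap scorer, and the sample records are illustrative assumptions, not Capital One systems; a production retriever would more likely use embeddings and a vector index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextDoc:
    doc_id: str   # hypothetical record identifier
    source: str   # hypothetical source-system label
    text: str


def retrieve(query: str, docs: list[ContextDoc], k: int = 2) -> list[ContextDoc]:
    """Rank docs by naive keyword overlap with the query (stand-in for vector search)."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(d.text.lower().split())), d) for d in docs]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [d for score, d in ranked[:k] if score > 0]


def build_prompt(query: str, retrieved: list[ContextDoc]) -> dict:
    """Return the prompt plus the doc IDs that fed it, so the output stays traceable."""
    context = "\n".join(f"[{d.doc_id}] {d.text}" for d in retrieved)
    return {
        "prompt": f"Context:\n{context}\n\nQuestion: {query}",
        "retrieved_context_ids": [d.doc_id for d in retrieved],
    }


# Made-up sample records for illustration only.
docs = [
    ContextDoc("crm-001", "crm", "customer holds spark cash plus card"),
    ContextDoc("txn-042", "transactions", "monthly card spend rose in q2"),
    ContextDoc("cred-007", "credit", "credit profile reviewed last month"),
]
result = build_prompt("card spend trend", retrieve("card spend trend", docs))
print(result["retrieved_context_ids"])
```

Updating a source record changes what the next query retrieves with no retraining step, and the `retrieved_context_ids` list is the audit trail that a fine-tuned model cannot provide.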

06 HITL Feedback Signal Chain: Mechanical Walkthrough

Rep receives AI recommendation

Starting point — generated from whatever context was retrieved by the downstream AI application

Rep acts

[Uses it] [Modifies it] [Ignores it] [Flags it as wrong] — four distinct signal types with different confidence levels

System captures

recommendation_id + retrieved_context_ids + generated_output + rep_action — all four fields required for a complete eval event

Human evaluators (or automated eval) score

Retrieval quality: was the right context retrieved? Generation quality: coherent, accurate, safe? Business outcome (30-day lag): did the pitch succeed? Did the rep follow up?

Eval dataset grows

(context, output, score) triples accumulate — the labeled dataset that drives all downstream model improvement

Two improvement levers selected by failure mode

L1: Retrieval tuning (embedding weights, chunk size, metadata filters). L2: Generation improvement (prompt engineering, guardrail adjustment, model swap). PM selects which lever based on which failure mode the eval data reveals.
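
The chain above can be sketched as a single event record plus a lever-selection rule. The four captured fields come from the walkthrough; the score fields, the 0.7 threshold, and the rule that retrieval is checked first are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RepAction(Enum):
    USED = "used"
    MODIFIED = "modified"
    IGNORED = "ignored"
    FLAGGED = "flagged"


@dataclass
class EvalEvent:
    recommendation_id: str
    retrieved_context_ids: list[str]
    generated_output: str
    rep_action: Optional[RepAction] = None
    retrieval_score: Optional[float] = None    # hypothetical 0-1 evaluator score
    generation_score: Optional[float] = None   # hypothetical 0-1 evaluator score

    def is_complete(self) -> bool:
        """All four captured fields must be present for a usable eval event."""
        return bool(
            self.recommendation_id
            and self.retrieved_context_ids
            and self.generated_output
            and self.rep_action is not None
        )


def select_lever(event: EvalEvent, threshold: float = 0.7) -> str:
    """Pick L1 (retrieval) or L2 (generation) from whichever score fell short."""
    if (
        not event.is_complete()
        or event.retrieval_score is None
        or event.generation_score is None
    ):
        return "insufficient data"
    if event.retrieval_score < threshold:
        # Checked first: generation scores are hard to interpret on bad context.
        return "L1: retrieval tuning"
    if event.generation_score < threshold:
        return "L2: generation improvement"
    return "no action"


flagged = EvalEvent("rec-1", ["txn-042"], "Offer a higher limit",
                    RepAction.FLAGGED, retrieval_score=0.4, generation_score=0.9)
print(select_lever(flagged))
```

Keeping the retrieval score and the generation score as separate fields on the same event is what lets the two failure modes be told apart.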

Why the PM owns this, not the data scientist: The PM defines what counts as a “good” recommendation. Is it good if the rep read it (engagement)? Used it (behavior)? Led to an upgrade (30-day outcome)? Customer stayed 12 months (retention)? Each definition produces a different training signal. The data scientist builds the pipeline; the PM decides what the pipeline measures.
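
A small sketch of why the definition matters: the same recommendation event yields different labels depending on which definition of "good" is chosen. The field names and the four definition keys are hypothetical, chosen to mirror the four questions above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RecommendationOutcome:
    rep_read: bool
    rep_used: bool
    upgraded_within_30_days: bool
    retained_12_months: bool


# Each entry is one candidate definition of a "good" recommendation.
GOOD_DEFINITIONS: dict[str, Callable[[RecommendationOutcome], bool]] = {
    "engagement": lambda o: o.rep_read,
    "behavior": lambda o: o.rep_used,
    "outcome_30d": lambda o: o.upgraded_within_30_days,
    "retention": lambda o: o.retained_12_months,
}


def label_under_each_definition(outcome: RecommendationOutcome) -> dict[str, int]:
    """Return the 0/1 training label this event would get under each definition."""
    return {name: int(rule(outcome)) for name, rule in GOOD_DEFINITIONS.items()}


# The rep read and used the recommendation, but no upgrade or retention followed.
outcome = RecommendationOutcome(True, True, False, False)
labels = label_under_each_definition(outcome)
print(labels)
```

The same event is a positive example under two definitions and a negative example under the other two, so the choice of definition decides what the pipeline learns.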

07 Assumption Register · Phase 0 (A-01–A-07)

ID	Assumption	Basis	Urgency	Validation · Week
A-01	FSO lacks unified customer view across Spark, Brex, Discover data sources	Brex operating independently post-acquisition; 12–24 month typical integration timeline	Critical	Architecture review with FSO engineering leads, Week 1
A-02	No shared context platform exists; each FSO AI app has its own retrieval layer	JD language — 'design and build a horizontal foundation' — implies platform doesn't exist	Critical	Product landscape review, Week 1
A-03	Data siloing is causing measurable adoption friction among field reps	Industry-level evidence (MIT NANDA, Stanford AI Index) applied directionally to Capital One FSO	High	Contextual inquiry with 6–8 FSRs across two regional offices, Month 1
A-04	MRM is applying SR 26-2 principles to field sales GenAI by analogy	Documented pattern at large regulated institutions (Databricks, April 2026)	High	MRM team introductory meeting, Week 2
A-05	Current tools pass CRM data to LLMs without standardized PII preprocessing	JD 'standardize' language implies current state is non-standard	Medium	Architecture review of existing app prompt construction, Week 1
A-06	RAG is correct architecture for CardEx Core MVP — not fine-tuning	Data freshness, SR 26-2 attribution, 51% enterprise RAG production adoption	High	Data refresh cadence audit per source system, Week 2
A-07	Rep behavior signals are primary HITL signal at MVP; outcome signals lagged 2–4 weeks via CRM	Standard enterprise AI feedback loop pattern; CRM instrumentation unknown	High	CRM instrumentation review, Week 2

Phase 2

Discovery

Problem statement · Current state · 5 Whys · 6 Affinity clusters · Stakeholder map · Personas · Empathy maps · Journey map · 4 Structural failure modes · A-08–A-11

08 Problem Statement + Structural Reframe

How Capital One's field sales AI program is currently trying to fix the problem

How do we make each field sales AI tool more accurate?

The question CardEx Core is built to answer

How do we ensure that every field sales AI tool reasons from the same version of the customer?

BC&P's field sales AI tools produce inconsistent recommendations because each application retrieves customer context independently, with no shared platform ensuring that every tool reasons from the same version of the same customer at the same point in time. The problem is not that the models are bad. The problem is that there is no shared version of the customer beneath them.

Current State — Fragmented Retrieval (A-08 · Inferred · Not yet validated)

Field Sales Rep
    │
    ├── Lead Scoring AI ──── Spark CRM (Capital One schema)
    ├── Pitch Recommendation AI ──── Transaction History DW (daily batch)
    ├── Credit Suggestion AI ──── Credit Profile (monthly refresh)
    └── [Post-Brex] Spend Insights AI ──── Brex Platform (company-centric schema · batch feed)

Each application: owns its own retrieval pipeline · applies its own preprocessing (or none)
                  has its own freshness cadence · has no shared entity resolution
                  produces no structured output that feeds back to improve recommendations
                  logs nothing in a consistent schema

Brex Platform is NOT currently connected to any of the above AI applications · Brex customers appear in CRM with no Brex context

09 5 Whys: Root Cause Diagnosis

Starting observation: Field sales AI recommendations are inconsistent and distrusted by reps.

Why 1

Why are the recommendations inconsistent?

Because each AI application retrieves customer context from a different data source with a different freshness cadence and a different customer entity schema.

Why 2

Why does each application retrieve context independently?

Because Capital One's field sales AI was built vertically — one use case at a time, each team owning its full stack — with no horizontal abstraction layer for shared customer context.

Why 3

Why was it built vertically without a shared layer?

Because Capital One's GenAI program was still proving itself through its first production deployments (Chat Concierge for auto, Agent Assist for service). Moving fast required each team to own its full stack. No PM was assigned to own the horizontal layer, because the horizontal problem wasn't visible while each vertical stood alone.

Why 4

Why wasn't the horizontal problem visible earlier?

Because each vertical deployment was in a single domain (auto dealerships, customer service) with a single, well-understood data source. The cross-domain consistency problem only emerges when multiple products and multiple data sources must be reconciled for the same customer — which became the FSO's reality when Brex was acquired.

Why 5

Why did the Brex acquisition make the horizontal problem visible and urgent?

Because Brex operates on a different technology stack, a different customer entity model (company-centric, not person-centric), and a different data freshness pattern than Capital One's existing systems. Integrating Brex context into FSO AI requires entity resolution, schema normalization, and freshness reconciliation — exactly the functions a shared context platform provides. Without that platform, the Brex context either never reaches the FSO AI tools or arrives inconsistently, making the fragmentation problem acute rather than latent.

Root cause — structural, not organizational: An organizational explanation would be: teams didn't communicate well enough, or there was no governance process to enforce shared standards. The structural explanation is: the architecture has no abstraction layer for shared context. Even if every team communicated perfectly, they would still build independent retrieval pipelines because there is no shared API to call. The fix requires a platform, not a process.
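
As a rough illustration of "a shared API to call", the sketch below has two downstream applications read the same customer snapshot through one interface. `ContextProvider`, `InMemoryContextStore`, the `version` field, and both app functions are hypothetical names, not an existing Capital One API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class CustomerContext:
    customer_id: str
    version: int    # hypothetical snapshot version
    summary: str


class ContextProvider(Protocol):
    def get_context(self, customer_id: str) -> CustomerContext: ...


class InMemoryContextStore:
    """Toy provider standing in for the shared context platform."""

    def __init__(self, snapshots: dict[str, CustomerContext]) -> None:
        self._snapshots = snapshots

    def get_context(self, customer_id: str) -> CustomerContext:
        return self._snapshots[customer_id]


def lead_scoring_app(provider: ContextProvider, customer_id: str) -> tuple[str, int]:
    """Downstream app 1: reports which snapshot version it reasoned from."""
    return ("lead_scoring", provider.get_context(customer_id).version)


def pitch_app(provider: ContextProvider, customer_id: str) -> tuple[str, int]:
    """Downstream app 2: same call, same snapshot."""
    return ("pitch", provider.get_context(customer_id).version)


store = InMemoryContextStore(
    {"cust-1": CustomerContext("cust-1", version=7, summary="38 employees, Spark + Brex")}
)
print(lead_scoring_app(store, "cust-1"), pitch_app(store, "cust-1"))
```

Because both apps depend only on the `get_context` call, neither needs its own retrieval pipeline, and both reason from version 7 of the same customer.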

10 Affinity Diagram: 6 Clusters

Cluster 1

Context Fragmentation

The customer exists differently in every system

Spark card, Brex platform, Capital One credit, and CRM contact data use different entity schemas with no common key
Brex operating independently post-acquisition — data arrives via batch feed, not real-time API
41% of LLM enterprise failures trace to upstream data issues, not model problems (Stanford AI Index 2025)
CRM optimized for rep workflow — sparse free-text, milestone-updated, no behavioral enrichment
Unresolved entity duplicates + stale records in a raw multi-system prompt produce worse outputs than a well-preprocessed compact summary

Cluster 2

Trust Collapse and Adoption Failure

Reps who got burned once stop using the tools entirely

95% of enterprise GenAI pilots delivered no measurable P&L impact — tools that 'do not retain feedback, adapt to context, or improve over time' (MIT NANDA 2025)
Only 20% of enterprise AI tools work cross-functionally (McKinsey, Dec 2025)
'It doesn't retain knowledge of client preferences or learn from previous edits. It repeats the same mistakes.' (CIO, MIT NANDA)
Capital One's own evidence: Agent Assist 84% → 93% — good retrieval drives trust, bad retrieval destroys it

Cluster 3

Feedback Loop Absence

The AI tools do not learn. Same errors in perpetuity.

No consistent schema for capturing (retrieved context, generated recommendation, rep action, business outcome) — the four elements required for an eval dataset
Rep behavior signals (used/modified/ignored/flagged) available immediately; business outcome signals lagged 2–4 weeks — both required for complete eval loop
Eval invisibility: without shared logging, structurally impossible to determine whether a bad recommendation was caused by retrieval failure or generation failure

Cluster 4

Regulatory Pressure

The compliance clock started April 17, 2026

SR 11-7 replaced by SR 26-2 on April 17, 2026 — a 'forthcoming RFI on AI/GenAI/agentic-AI model risk' is signaled
GenAI systems technically 'out of scope' in SR 26-2 text but supervisors and internal audit 'already applying MRM expectations by analogy' (Databricks, April 2026)
SR 26-2 requires: inventory tiered by materiality, controls applied proportionately, lifecycle defensible end-to-end, evidence of governance generated automatically
PII exposure: raw CRM records (customer name, account number, EIN) flowing directly into LLM prompts create both regulatory and safety risk
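
A minimal sketch of a preprocessing step that masks the identifiers named above before a record reaches a prompt. The field names, the 12-16 digit account-number pattern, and the placeholder tokens are assumptions; names that appear inside free text are not caught by this sketch and would need a dedicated detection step.

```python
import re

EIN_PATTERN = re.compile(r"\b\d{2}-\d{7}\b")
ACCOUNT_PATTERN = re.compile(r"\b\d{12,16}\b")   # assumed account-number length
SENSITIVE_FIELDS = {"customer_name", "account_number", "ein"}   # hypothetical field names


def sanitize_record(record: dict[str, str]) -> dict[str, str]:
    """Mask known sensitive fields and scrub EIN/account patterns from free text."""
    clean: dict[str, str] = {}
    for key, value in record.items():
        if key in SENSITIVE_FIELDS:
            clean[key] = f"<{key.upper()}>"
            continue
        value = EIN_PATTERN.sub("<EIN>", value)
        value = ACCOUNT_PATTERN.sub("<ACCOUNT_NUMBER>", value)
        clean[key] = value
    return clean


# Made-up record for illustration only.
raw = {
    "customer_name": "Jane Doe",
    "ein": "12-3456789",
    "notes": "EIN 12-3456789 on file; card 4111111111111111 active",
}
safe = sanitize_record(raw)
print(safe["notes"])
```

Running this once, upstream of every LLM call, is what makes it a single point of compliance control instead of a per-application responsibility.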

Cluster 5

Post-Acquisition Complexity Spike

Brex turned a linear problem into an exponential one

Before Brex: FSO sells Spark card products from one data ecosystem. After Brex (April 7, 2026): Spark + Brex corporate cards + Brex spend management + SMB banking from four separate ecosystems
Brex entity model is company-centric (built for CFO teams) vs. Capital One's person-centric model — entity resolution is non-trivial
Ramp is explicitly framing the acquisition as creating 'uncertainty about product direction, pricing, underwriting, and integration'; competitive pressure from a unified-data competitor is immediate

Cluster 6

Measurement Void

Leadership cannot tell whether the AI investment is working

No north star metric for recommendation quality — each team defines 'good' differently: engagement, adoption, conversion, retention
Capital One can point to Chat Concierge's 55% lead lift (Fortune, Dec 2025) — but cannot point to an equivalent FSO AI metric
Without a shared logging layer, structurally impossible to determine whether a bad recommendation was caused by retrieval failure or generation failure — two distinct problems that get conflated and neither gets fixed
'More than half of generative AI budgets devoted to sales and marketing tools, yet MIT found the biggest ROI in back-office automation' — without measurement, AI budgets are allocated on intuition, not evidence

11 Stakeholder Map: 3 Rings

Ring 1 · Internal Platform Users

Stakeholder	Primary Pain	What They Need from CardEx Core
Field Sales Reps (FSRs)	Contradictory recommendations damage credibility with customers	Recommendations they can trust enough to act on without independent verification
Sales Managers / Regional Directors	No visibility into recommendation quality across team; AI impact on pipeline invisible	Team-level dashboard: adoption rate, accuracy trend, outcome correlation
Field Sales AI Product Team	Builds retrieval infrastructure from scratch for every new AI application	Stable, documented Context API; new apps integrate in days, not months
Data Science / ML Engineers	Eval datasets built ad hoc; no systematic capture of production input-output pairs	Structured (context, output, score) triples automatically generated from every recommendation event
Data Governance / Privacy	Raw CRM records flowing into LLM prompts without sanitization	PII preprocessing layer upstream of all LLM calls; single point of compliance control

Ring 2 · Downstream Users (experience the output)

Stakeholder	Connection to CardEx Core	What Platform Failure Looks Like
Small Business Owners (Spark)	Subject of recommendations; receive pitches shaped by AI output	Rep calls with wrong product offer because AI reasoned from stale or incomplete context
Mid-Market Corporate Customers (Brex)	Newly in-scope post-acquisition; different financial profile than Spark customers	Rep has no understanding of Brex spend patterns; pitch defaults to generic card offer
Startup Founders (Brex)	High-velocity customers; context changes rapidly with funding rounds and headcount spikes	AI recommendation lags 6–8 weeks behind actual company state; rep pitches as if company is still in seed stage

Ring 3 · Platform Stakeholders (constrain or enable)

Stakeholder	Constraint	Enablement
Model Risk Management	SR 26-2 principles: source attribution, documentation, independent validation, ongoing monitoring	If CardEx Core satisfies MRM requirements, it satisfies them for all downstream apps simultaneously
Brex Engineering	Brex data arrives via batch/API, not direct DB access; entity model differs from Capital One's	Brex AI-native architecture (agentic workflows, expense automation) can enrich context if properly integrated
Capital One Cloud / Infrastructure	All data flows must comply with Capital One's cloud security architecture	Cloud-native infrastructure means context platform can be built on existing tech stack without new procurement
Enterprise AI (Natarajan's org)	Field sales AI must align with enterprise AI strategy (open-weight models, proprietary data)	Provides model infrastructure and AI governance patterns already in use for Chat Concierge and Agent Assist

12 Personas: 4 Profiles

Internal · Primary User

Maya Chen

Senior Field Sales Representative

“The tool would be useful if I could trust it. But I can't trust it, so I use it as a starting point and then verify everything it says. That's not useful. That's just extra steps.”

Context: 6 years at Capital One; 40+ SMB accounts across Pacific Northwest; asked to sell Brex products since April 2026. Was a top performer before AI tools arrived — knew her customers through manual research. Now has 40% more accounts and is expected to use AI to compensate for lost research time.

What she does: Opens the AI tool, skims the recommendation, checks 2–3 things manually, then decides whether to use it. This takes longer than doing the research manually, so the tool costs her time on net instead of saving it.

Pain points: Recommendations reference products the customer already has (stale context) · Pitch suggestions don't reflect business changes · After Brex acquisition: no idea what context the AI has on inherited Brex customers

Needs from CardEx Core: Single customer summary reflecting most recent state across all data sources · Freshness indicator: “Context last updated 3 days ago” · Confidence signal: High confidence (multiple recent signals) vs. Low confidence (old data, sparse signals)
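
The freshness indicator and confidence signal Maya asks for could be derived roughly as follows. The 30-day recency window and the two-signal minimum are assumed thresholds for illustration, not validated values.

```python
from datetime import date


def freshness_label(last_updated: date, today: date) -> str:
    """Render the rep-facing freshness indicator."""
    days = (today - last_updated).days
    unit = "day" if days == 1 else "days"
    return f"Context last updated {days} {unit} ago"


def confidence_label(
    signal_dates: list[date],
    today: date,
    recent_days: int = 30,   # assumed recency window
    min_recent: int = 2,     # assumed minimum number of recent signals
) -> str:
    """High confidence needs multiple recent signals; otherwise low."""
    recent = [d for d in signal_dates if (today - d).days <= recent_days]
    return "High confidence" if len(recent) >= min_recent else "Low confidence"


today = date(2026, 6, 15)
print(freshness_label(date(2026, 6, 12), today))
print(confidence_label([date(2026, 6, 12), date(2026, 6, 1), date(2025, 1, 1)], today))
print(confidence_label([date(2023, 5, 1)], today))
```

With both labels on screen, a rep can decide in seconds whether to act on first read or to verify.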

Internal · Adoption Lever

David Torres

Regional Sales Director

“The problem isn't the idea. The idea is right. The problem is the data. My reps know this business. They can tell when the AI is wrong. But they don't have time to verify everything, so they ignore it.”

Context: 12 reps across Texas and Oklahoma; responsible for $180M in annual portfolio value. Pushed team to adopt tools at launch. Two reps had embarrassing customer interactions based on wrong recommendations. Usage dropped sharply. Now tells team: “use the tools as a starting point, but always verify.”

What he does: Reviews AI adoption metrics monthly; doesn't look at recommendation accuracy because it isn't reported. Attributes good quarter performance to rep skill, bad quarter to market conditions — AI impact is invisible to him.

Needs from CardEx Core: Team-level view showing recommendation quality trend, adoption rate by rep, outcome correlation · Leadership-ready evidence the AI investment is working (he is regularly asked this question and cannot answer it)

Downstream · Primary Consumer

James Okafor

Small Business Owner · Atlanta

“They called me last month about a cash rewards card. I already have one. And I've been using Brex for my team expenses for two years. Do they not know that?”

Context: Owns a restaurant equipment supply company; 38 employees. Capital One Spark Cash Plus card since 2021. Grew from 12 to 38 employees 2022–2025; moved to partial Brex spend management in Q3 2024. Been pitched on product upgrades three times — all three referenced his 2022–2023 spend profile, not his current situation. Never offered a Brex product, which is the product he actually uses.

What he does: Takes the call. Politely declines. Continues evaluating Ramp. Mentions to a founder friend that “Capital One doesn't really know what your business needs.”

Platform success: Rep opens with “I see your team has grown significantly — are you finding the current card limits still fit your volume?” · Rep knows he uses Brex and asks whether the combined Capital One + Brex solution would simplify his operations · He doesn't have to explain his business situation from scratch

Edge Case · Adoption Floor

“Marcus”

The Burned Skeptic

“I tried it for three months. Two customer calls went wrong. I tell the new reps: don't rely on the AI.”

Why this persona matters: 8 years at Capital One; known as the “old school” rep in his office. Used the AI recommendation tool for 3 months at launch; had two customer interactions go wrong based on stale recommendations. Now actively discourages adoption — contagious behavior pattern that represents the adoption floor.

His trust cannot be rebuilt by improving the AI slightly. It requires a fundamentally different experience: a recommendation he did not expect that turned out to be correct, demonstrated via an outcome he cares about.

CardEx Core must: Produce ≥1 visibly correct recommendation in a high-stakes situation · Show freshness indicator distinguishing “data from last week” vs. “guessing based on 2023 data” · Not require him to change his workflow

13 Empathy Maps

Maya Chen · Field Sales Rep · Internal User

THINKS

Is this recommendation based on current data or old data? I have no way to know. · My manager will ask why my adoption score is low. I don't want to explain the tool is unreliable. · The Brex customers I inherited after the acquisition — I have no idea what the AI knows about them.

FEELS

Frustrated that the tool adds steps instead of removing them. Mildly anxious before customer calls where she didn't have time to verify the AI recommendation. Proud of her reputation for knowing customers — threatened by a tool that might undermine that reputation.

SAYS

'I trust my own research more than the tool.' · 'The tool would be useful if it was accurate.' · 'I use it as a starting point.' (She says this; she actually skips it most of the time.)

DOES

Opens the tool, skims quickly, checks 2–3 facts manually before acting. On busy days, skips the tool entirely. Never flags a wrong recommendation formally — doesn't know how and doesn't think it helps. Doesn't log pitch outcomes consistently.

PAIN POINTS

Stale recommendations · No confidence signal · No way to know what the AI knows about Brex customers · Formal feedback mechanism is unclear or absent

GAINS

If CardEx Core works: she walks into every meeting with the most current customer picture, without spending 20 minutes on manual research. Recommendation accuracy is high enough that she can act on first read for routine accounts.

James Okafor · Small Business Owner · Downstream Consumer

THINKS

Do they actually know my business, or is this a generic call? · They have all my transaction history. Why are they pitching me something I already have? · Ramp's rep knew things about my business without me telling them.

FEELS

Undervalued as a customer. Mildly annoyed at the wasted call. Not hostile — he likes Capital One as a bank — but increasingly open to switching spend management to a vendor who seems to understand him.

SAYS

Politely declines the offer. 'Thanks, I'll think about it.' Does not give negative feedback directly.

DOES

Takes the call, declines, logs no complaint. Continues evaluating Ramp. Mentions to a founder friend that 'Capital One doesn't really know what your business needs, they just sell cards.'

PAIN POINTS

Irrelevant pitches · Being treated as a generic SMB rather than a specific business at a specific stage · Having to explain his business situation from scratch every time

GAINS

If CardEx Core works: rep opens the call with a question that demonstrates understanding. James doesn't have to explain his growth. The offer is relevant. The conversation is short and productive. He recommends Capital One to other founders.

14 Customer Journey Map · Maya Preparing for a Pitch

Scenario: Maya has a pitch meeting in 3 hours with a Brex customer she inherited post-acquisition. She has not met this customer before.

Stage	Rep Action	Current System State	DNF Risk
1. Lead surfaces	Receives notification in CRM	CRM shows meeting; AI tool not yet opened	—
2. Context gathering	Opens AI recommendation tool	Tool retrieves from Spark data (complete). Brex data: partial batch, 6 weeks old.	DNF-1: If rep trusts the recommendation without knowing the Brex lag, she walks in with an outdated picture. Customer notices.
3. Recommendation review	Reads AI recommendation	References Q1 spend volume; customer's Q2 volume is 40% higher	DNF-2: No freshness indicator means rep cannot assess confidence. Trusts blindly or verifies everything — both suboptimal.
4. Manual verification	20 min manually pulling transaction data	Finds Q2 data showing 40% volume increase	DNF-3: 20 min manual work × 5 meetings/week = 100 min/week of context work that should be automated.
5. Pitch execution	Calls customer with updated pitch based on manual research	AI tool is not tracking the call	DNF-4: AI tool has no record that its recommendation was wrong. Will make the same stale recommendation for the next rep who covers this customer.
6. Outcome logging	Should log pitch outcome in CRM	No structured field for “AI recommendation quality rating”	DNF-5: Without structured outcome logging tied to recommendation IDs, the eval dataset never grows. AI tool never improves.
7. Recommendation improvement	—	No feedback flows back to recommendation engine	DNF-6: As portfolios evolve, staleness gap widens over time. Problem gets worse, not better, without active feedback.

15 Touchpoint Map · James Okafor (Downstream)

Touchpoint	What Happens	James's Experience	Platform Failure Signature
Incoming call from rep	Rep calls to offer a pitch	“Another card pitch” — low expectations based on prior calls	Stale context → wrong product offer
First 60 seconds	Rep opens with product offer	If offer references something he already has: friction. If offer references his actual situation: conversation.	Platform quality is experienced here — James has no visibility into the AI, only the outcome
Product discussion	Rep and James discuss the offer	If rep seems to know his business: trust builds. If generic: James disengages politely.	Context accuracy determines rep's ability to engage authentically
Decision point	James decides to engage further or decline	Declines without explanation if pitch is irrelevant	Churn signal not captured as a platform failure — attributed to “market conditions”
6 months later	James evaluates whether to move spend management to Ramp	Ramp rep called with a pitch that reflected his actual Q2 volume	Competitive loss partially attributable to context accuracy gap at prior Capital One touchpoint

The small business owner never knows CardEx Core exists. His experience of platform quality is entirely mediated through the rep. A platform failure registers as “Capital One doesn't understand my business” — not as a technology problem. Customer satisfaction surveys will not surface CardEx Core as the failure point. Churn will be attributed to pricing or product. The business case for CardEx Core must be built on leading indicators (rep adoption, recommendation accuracy), not lagging ones (customer satisfaction), because the lagging signal is too noisy and too slow.

16 Four Structural Failure Modes of Siloed Retrieval

Temporal Inconsistency

Different systems update at different cadences: Capital One CRM (rep-driven, updated at deal milestones — weekly at best), card transaction data (daily batch), credit profile (monthly review cycle), Brex platform data (real-time in Brex's system; weekly batch feed to Capital One post-acquisition). A recommendation generated from these four sources simultaneously reflects four different as-of dates for the same customer. The customer who 'exists' in the AI's context is a temporal composite — accurate on some dimensions and wrong on others, with no signal to the model about which dimensions are current.

CardEx Core design implication: Every context object must carry a freshness timestamp. The retrieval layer must surface the staleness distribution before passing context to the model. The model should be prompted with the as-of dates explicitly, not assume all context is current.
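One way to prompt the model with explicit as-of dates is to render a short preamble from the per-source timestamps. The sketch below is illustrative only: the function name `as_of_preamble`, the source names, and the sample timestamps are assumptions, not Capital One or Brex field names.

```python
from datetime import datetime, timezone


def as_of_preamble(sources: dict[str, datetime], now: datetime) -> str:
    """Render one line per source with its as-of date and age in whole days."""
    lines = ["Context as-of dates (do not assume all data is current):"]
    # Oldest source first, so the stalest signal is the most visible.
    for name, as_of in sorted(sources.items(), key=lambda kv: kv[1]):
        age_days = (now - as_of).days
        lines.append(f"- {name}: as of {as_of.date().isoformat()} ({age_days} days old)")
    return "\n".join(lines)


# Hypothetical timestamps for the four source systems.
now = datetime(2026, 6, 27, 15, 0, tzinfo=timezone.utc)
sources = {
    "card_transactions": datetime(2026, 6, 27, 2, 0, tzinfo=timezone.utc),
    "credit_profile": datetime(2026, 6, 1, tzinfo=timezone.utc),
    "brex_spend": datetime(2026, 6, 21, tzinfo=timezone.utc),
    "crm": datetime(2026, 6, 14, tzinfo=timezone.utc),
}
preamble = as_of_preamble(sources, now)
print(preamble)
```

Sorting oldest-first is a design choice rather than a requirement; it simply puts the weakest evidence at the top of what the model reads.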

Schema Mismatch and Entity Resolution

Capital One's Spark card data is person-centric: the fundamental entity is a cardholder (individual or sole proprietor). Brex's data model is company-centric: the fundamental entity is a company, under which multiple employees and spend categories exist. When a field rep's AI tool tries to retrieve 'everything about Maria's flooring company,' it must match: Maria (person) in Capital One Spark → card number → CRM contact; Maria's company (entity) in Brex → company ID → spend categories; Maria (person) in Capital One credit → SSN/EIN. Without a shared resolution layer, each application team builds their own matching logic — and each builds it differently, producing different 'Maria' records for different AI tools.

CardEx Core design implication: A canonical customer entity — a golden record that resolves person-level and company-level identities across all source systems — must be the foundation. Entity resolution is a prerequisite to everything else. Without it, freshness improvements don't help because each application is still pulling data for a different entity.

Context Window Pollution

Without preprocessing, passing raw multi-system data into the context window means: redundant fields (customer's address appears 4 times, once per source system, consuming tokens without adding information), outdated records (a 2022 credit inquiry that is no longer relevant), unresolved entity duplicates ('Maria Lopez' and 'Maria L.' appear as separate records, confusing the model), and noise (free-text rep notes adding tokens without structured signal). A well-preprocessed context (structured customer summary: current products, recent spend trends, Brex category breakdown, credit utilization, rep interaction history from last 90 days) will produce more relevant recommendations. Sanjiv Yajnik, Capital One President of Financial Services: 'You can't just throw generic data into it, nor data that hasn't been properly cleaned.'

CardEx Core design implication: A preprocessing and summarization layer — not just a retrieval layer — is required. Raw records from source systems should never flow directly to the model. CardEx Core is responsible for: entity-resolved → normalized schema → PII-sanitized → freshness-tagged → relevance-scored → summarized to a structured context object.
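Two of the pollution sources named above, redundant fields and outdated records, can be sketched as small preprocessing steps. This is a minimal illustration under assumptions: the record shapes, the helper names, and the 90-day window applied to events are hypothetical, and a real pipeline would also cover entity resolution, PII handling, and summarization.

```python
from datetime import date


def collapse_redundant(records: list[dict], field: str) -> list[str]:
    """Return distinct values of `field` across sources, ignoring case and spacing."""
    seen: dict[str, str] = {}
    for rec in records:
        value = rec.get(field)
        if value:
            key = " ".join(value.lower().split())  # normalize case and whitespace
            seen.setdefault(key, value)            # keep the first spelling seen
    return list(seen.values())


def drop_outdated(events: list[dict], today: date, max_age_days: int) -> list[dict]:
    """Keep only events no older than `max_age_days`."""
    return [e for e in events if (today - e["date"]).days <= max_age_days]


source_records = [
    {"source": "spark", "address": "12 Main St, Austin TX"},
    {"source": "crm", "address": "12 main st,  austin tx"},
    {"source": "credit", "address": "12 Main St, Austin TX "},
    {"source": "brex", "address": "12 MAIN ST, AUSTIN TX"},
]
events = [
    {"type": "credit_inquiry", "date": date(2022, 3, 1)},
    {"type": "rep_call", "date": date(2026, 6, 10)},
]

addresses = collapse_redundant(source_records, "address")
recent = drop_outdated(events, today=date(2026, 6, 27), max_age_days=90)
print(addresses, [e["type"] for e in recent])
```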

Eval Invisibility

Without a shared logging layer, there is no consistent schema for capturing what happened after a recommendation was generated. It is structurally impossible to answer: Was a bad recommendation caused by retrieval failure (wrong context retrieved) or generation failure (right context, wrong synthesis)? Did a rep's modification of a recommendation improve the outcome, or would the original have been equally successful? Which customer segments produce systematically poor recommendations? These questions cannot be answered if input-output pairs are not logged in a consistent, queryable format. Each application team logging independently creates its own schema — no cross-application analysis is possible.

CardEx Core design implication: The shared logging schema is as important as the shared context schema. Every recommendation event must log: customer entity ID, context object IDs retrieved (with freshness), generated recommendation text, rep action (used/modified/ignored/flagged with reason), and business outcome when available.
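The logging fields listed above could be captured in a single typed record. The following is a sketch, not a confirmed schema: the class and attribute names are assumptions chosen to mirror the list, and the rule that a flagged event must carry a reason is inferred from "flagged with reason".

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Optional


class RepAction(Enum):
    USED = "used"
    MODIFIED = "modified"
    IGNORED = "ignored"
    FLAGGED = "flagged"


@dataclass
class RetrievedContext:
    retrieval_id: str
    as_of: str  # ISO 8601 timestamp carried with each context object


@dataclass
class RecommendationEvent:
    entity_id: str
    retrieved: list[RetrievedContext]
    recommendation_text: str
    rep_action: RepAction
    flag_reason: Optional[str] = None
    business_outcome: Optional[str] = None  # filled in later, when available

    def __post_init__(self) -> None:
        if self.rep_action is RepAction.FLAGGED and not self.flag_reason:
            raise ValueError("a flagged event must carry a reason")

    def to_json(self) -> str:
        payload = asdict(self)
        payload["rep_action"] = self.rep_action.value  # serialize the enum as text
        return json.dumps(payload, sort_keys=True)


event = RecommendationEvent(
    entity_id="ENT-44821-COF",
    retrieved=[RetrievedContext("TXN-batch-20260627", "2026-06-27T02:00:00Z")],
    recommendation_text="Discuss a credit limit increase.",
    rep_action=RepAction.MODIFIED,
)
print(event.to_json())
```

Because every application would emit the same record shape, cross-application queries reduce to filtering on `entity_id` and `retrieval_id`.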

17 Assumption Register · Phase 2 (A-08–A-11)

ID	Assumption	Basis	Urgency
A-08	Current-state architecture (inferred) not validated against Capital One's actual production environment	Constructed from public information about Capital One's AI deployments and acquisition context; not validated against internal systems	Critical
A-09	Root cause is structural — missing platform abstraction layer — not organizational (siloed incentives or poor communication)	Inferred from architecture pattern. Organizational explanation not ruled out — may be both. If wrong, Concept F (governance-first, no platform) is the right solution.	Critical
A-10	Customer churn in BC&P's SMB segment is partially attributable to irrelevant pitch experiences caused by stale or fragmented context	Directional inference from James Okafor persona and standard churn analysis limitations; causal path plausible but not confirmed	Medium
A-11	No canonical customer entity currently resolves identity across Capital One Spark, Brex company, and Capital One credit schemas	Brex has operated independently post-acquisition; entity resolution at this scale (35,000+ Brex companies) is a significant engineering effort, unlikely to have been completed in the roughly 3 months since the April 7, 2026 close	Critical

Phase 3

Definition

Wrong/right question · 4-layer architecture · RAFT pattern · Context API schema · Coherence map · MoSCoW P0/P1/P2/Won't Have · A-12–A-15

18 Core Insight: The Recommendation Is Not the Product

The question the AI program is currently trying to answer

How do we make each field sales AI tool more accurate?

The question CardEx Core is built to answer

How do we ensure that every field sales AI tool reasons from the same version of the customer?

The reframe: The instinct in any AI program facing quality problems is to improve the model. That instinct is correct when the problem is model quality. It is incorrect — and expensive — when the problem is data infrastructure. The six affinity clusters from Phase 2 make clear that all failure modes (temporal inconsistency, schema mismatch, context window pollution, eval invisibility, regulatory exposure) are data infrastructure failures. No model improvement addresses any of them.

The recommendation is not the product. The context is the product. The recommendations are outputs of that product. When the platform improves (more Brex data integrated, faster freshness cadence), every downstream tool inherits the improvement automatically — without individual model work.

19 Four-Layer Platform Architecture: Build Order Is Non-Arbitrary

The CardEx Core platform has four components that must be built in a specific order. Each layer is a structural prerequisite for the next. Building out of sequence produces a system that appears to work in demo conditions and fails in production.

Layer 1

Canonical Customer Entity — Entity Resolution

Resolves person-centric Capital One records with company-centric Brex records into a single canonical entity — a golden record — with a confidence score and a human-review queue for low-confidence matches.

Why first: Every other layer processes data about a customer. If identity is not resolved, freshness normalization runs on the wrong entity; the context API serves conflated records; feedback logging attributes outcomes to the wrong profile. A bug in Layer 1 degrades all layers above it. Requires: deterministic matching on SSN/EIN as primary key; probabilistic confidence score for fuzzy fallback; human-review queue for low-confidence matches on high-value accounts.
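The three requirements above (deterministic key match, probabilistic fallback, review queue) can be sketched in a few lines. Everything here is an assumption made for illustration: the record fields, the `REVIEW_THRESHOLD` value, and the use of name similarity from Python's standard `difflib` as the fuzzy signal. A production matcher would use more signals than a business name.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

REVIEW_THRESHOLD = 0.85  # hypothetical cut-off, not taken from the case study


@dataclass
class Match:
    spark_id: str
    brex_id: str
    confidence: float
    method: str        # "deterministic" or "fuzzy"
    needs_review: bool


def resolve(spark: dict, brex: dict) -> Match:
    # Deterministic path: the same non-empty tax ID (SSN/EIN) on both records.
    if spark.get("tax_id") and spark["tax_id"] == brex.get("tax_id"):
        return Match(spark["id"], brex["id"], 1.0, "deterministic", False)
    # Fuzzy fallback: business-name similarity in [0, 1].
    score = SequenceMatcher(
        None, spark["business_name"].lower(), brex["company_name"].lower()
    ).ratio()
    # Low-confidence candidates go to the human-review queue instead of merging.
    return Match(spark["id"], brex["id"], round(score, 2), "fuzzy", score < REVIEW_THRESHOLD)


exact = resolve(
    {"id": "SPK-1", "tax_id": "12-3456789", "business_name": "Lopez Flooring LLC"},
    {"id": "BRX-9", "tax_id": "12-3456789", "company_name": "Lopez Flooring"},
)
weak = resolve(
    {"id": "SPK-2", "tax_id": None, "business_name": "Lopez Flooring"},
    {"id": "BRX-7", "tax_id": "98-7654321", "company_name": "Acme Robotics"},
)
review_queue = [m for m in (exact, weak) if m.needs_review]
print(exact.method, weak.method, len(review_queue))
```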

Layer 2

Freshness Normalization

Assigns a freshness timestamp and staleness classification to every data field in the canonical entity. Establishes which fields update in real time, daily, weekly, or monthly, and surfaces this metadata alongside the data itself.

Why second: Freshness normalization is meaningless if applied to unresolved entity records. A freshness timestamp on “Flooring Co Inc (Brex)” is only useful if the consuming layer knows that entity is the same as “Maria Lopez (Spark card 4111-xxxx).” Requires field-level metadata schema: source system, last-updated timestamp, update cadence classification (real-time / daily / weekly / monthly), and a staleness flag (current / aging / stale) computed against configurable thresholds per field type. Card transaction data goes stale after 48 hours, credit profile data after 30 days, and CRM free-text notes after 7 days.
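A staleness flag computed against per-field thresholds might look like the sketch below. The three stale thresholds come from the text; the point at which a field becomes "aging" is not specified there, so the half-threshold rule (`AGING_FRACTION`) is purely an assumption, as are the field-type names.

```python
from datetime import datetime, timedelta, timezone

# Stale thresholds per field type, as stated in the text.
STALE_AFTER = {
    "card_transactions": timedelta(hours=48),
    "credit_profile": timedelta(days=30),
    "crm_notes": timedelta(days=7),
}
AGING_FRACTION = 0.5  # assumption: "aging" starts at half the stale threshold


def staleness_flag(field_type: str, last_updated: datetime, now: datetime) -> str:
    """Classify a field as current, aging, or stale from its age."""
    age = now - last_updated
    stale_after = STALE_AFTER[field_type]
    if age >= stale_after:
        return "stale"
    if age >= stale_after * AGING_FRACTION:
        return "aging"
    return "current"


now = datetime(2026, 6, 27, 15, 0, tzinfo=timezone.utc)
print(staleness_flag("card_transactions", now - timedelta(hours=12), now))
print(staleness_flag("credit_profile", now - timedelta(days=26), now))
print(staleness_flag("crm_notes", now - timedelta(days=9), now))
```

Keeping the thresholds in a table rather than in code paths is what makes them configurable per field type.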

Layer 3

PII Preprocessing + Context API

Exposes the canonical entity + freshness metadata as a standardized, stateless API. Returns a structured context object — entity-resolved, freshness-tagged, PII-sanitized, relevance-summarized — optimized for LLM consumption. Includes retrieval_ids on every response for SR 26-2 audit trail.

Why third: A context API built before freshness normalization serves data without staleness signals — consuming applications have no way to know whether the context is current or aging. PII preprocessing at the API boundary solves SR 26-2 compliance once for all downstream applications simultaneously. API is stateless (not cached) to avoid reintroducing staleness. Critical design: Stateless retrieval on every call; PII tokenized before leaving the API layer; structured summary, not raw records; retrieval_ids field native in every response.
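Tokenizing PII before it leaves the API layer can be illustrated with a keyed hash. This is a sketch under assumptions: the case study does not say how entity tokens are derived, so the HMAC-SHA256 scheme, the `ENT-` prefix with a hex digest, the `PII_FIELDS` set, and the record fields are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical set of fields that must never leave the API boundary.
PII_FIELDS = {"full_name", "tax_id", "account_number"}


def entity_token(raw_identifier: str, secret: bytes) -> str:
    """Derive a stable, non-reversible token from a raw identifier."""
    digest = hmac.new(secret, raw_identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"ENT-{digest[:16]}"


def sanitize(record: dict, secret: bytes) -> dict:
    """Drop PII fields and attach an entity token in their place."""
    clean = {k: v for k, v in record.items() if k not in PII_FIELDS}
    clean["entity_token"] = entity_token(record["tax_id"], secret)
    return clean


secret = b"demo-only-secret"  # in practice this would come from a secrets manager
record = {
    "full_name": "Maria Lopez",
    "tax_id": "12-3456789",
    "account_number": "4111000011112222",
    "spend_trend_90d": "increasing_40pct",
}
safe = sanitize(record, secret)
print(safe)
```

The same input and key always yield the same token, so downstream logs can join on it without ever seeing the raw identifier.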

Layer 4

Feedback Logging

Captures every recommendation event as a structured log entry: which context objects were retrieved (from the API's retrieval_ids), what was generated, what the rep did (used/modified/ignored/flagged), and what the business outcome was (when available from CRM).

Why fourth: The logging layer captures retrieval_ids from the API response. Without the API's standardized response structure, there is no consistent schema for logging — each application logs in its own format, and cross-application analysis is impossible. Decoupled from the recommendation engine so model swaps don't disrupt the audit trail, and the data science team can consume eval datasets without touching the context platform.

20 RAFT Architecture: RAG at the Platform Layer, Fine-Tuning at the App Layer

CardEx Core is shared infrastructure. It provides a retrieval substrate — not domain-specific model behavior. A shared context platform cannot be fine-tuned for specific use cases without specializing the platform and losing its horizontal value. If CardEx Core's retrieval layer is fine-tuned for lead scoring, it becomes the lead scoring platform — and the pitch recommendation application has to build its own retrieval again.

RAG at the platform layer solves this: CardEx Core retrieves context that is domain-agnostic. The lead scoring application takes that context and applies its own domain-specific reasoning. The pitch recommendation application does the same. Each application can be fine-tuned for its specific use case — using the same CardEx Core context as input.

The RAFT pattern: RAG provides freshness and attribution at the retrieval layer; fine-tuning provides behavioral consistency at the application layer. The platform holds the RAG layer centrally. Applications own their fine-tuning.

CardEx Core (RAG · shared · entity-resolved · PII-sanitized · freshness-tagged)
    │
    ├── Lead Scoring App (fine-tuned for scoring logic)
    ├── Pitch Recommendation App (fine-tuned for sales synthesis)
    └── Credit Suggestion App (fine-tuned for underwriting reasoning)

RAFT pattern: retrieval shared horizontally · behavioral fine-tuning scoped to each application domain

21 Context API Response Schema

Context API Response Object · Every field is SR 26-2 attributable · PII sanitized · freshness-tagged

{
  "request_id": "CTX-REQ-20260627-142301",
  "entity_token": "ENT-44821-COF",          // entity token, not raw PII
  "entity_confidence": 0.94,
  "as_of_summary": {
    "freshest_signal": "2026-06-27T14:30:00Z",
    "stalest_signal":  "2026-06-01T00:00:00Z",
    "staleness_distribution": {
      "current": ["transaction_data", "crm_contact"],
      "aging":   ["brex_spend"],              // 7 days old — Brex weekly batch
      "stale":   []
    }
  },
  "context_summary": {
    "current_products":    ["Spark_Cash_Plus", "Brex_Corporate"],
    "do_not_recommend":    ["Spark_Cash_Plus"],  // PM-owned constraint — prevents re-pitching
    "spend_trend_90d":     "increasing_40pct",
    "brex_monthly_volume_q2": "$240K",
    "credit_utilization":  "84%",
    "headcount_signal":    "growing_plus8_cards_q2",
    "last_pitch_outcome":  "declined_march_2026_upgrade_pitch",
    "upgrade_indicators":  ["volume_increase", "headcount_growth", "high_utilization"],
    "suggested_context_for_pitch": "Customer has outgrown current credit limit; Brex spend growing; headcount expansion signals business growth phase. Brex Premium + credit limit increase to $400K–$500K is the indicated direction."
  },
  "retrieval_ids": [                          // SR 26-2 audit trail
    "TXN-batch-20260627",
    "CRD-20260601",
    "BRX-batch-20260621",
    "CRM-20260614",
    "ENT-resolve-20260601"
  ],
  "prompt_version": "ctx-prompt-v2.3",
  "pii_sanitized": true,
  "compliance_flags": {
    "sr_26_2_attributable": true,
    "pii_in_output":        false,
    "data_minimization_applied": true
  }
}

The do_not_recommend field is prompt engineering at the schema level. The PM defines that the context object must always surface products the customer already holds as a negative constraint, not just positive signals. This is the structural fix for the “pitching a product they already have” failure mode documented in Phase 2.
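A downstream application could enforce the negative constraint with a simple filter before any candidate reaches the rep. The sketch below assumes a hypothetical `eligible_products` helper and a hypothetical candidate list; only the `do_not_recommend` key and the sample values are taken from the response object.

```python
def eligible_products(candidates: list[str], context_summary: dict) -> list[str]:
    """Remove any candidate that the context object marks as do-not-recommend."""
    blocked = set(context_summary.get("do_not_recommend", []))
    return [product for product in candidates if product not in blocked]


context_summary = {
    "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
    "do_not_recommend": ["Spark_Cash_Plus"],
}
candidates = ["Spark_Cash_Plus", "Brex_Premium"]  # hypothetical model output
print(eligible_products(candidates, context_summary))
```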

22 Coherence Map: 3 Business Lines

Business Line

Spark Business

SMB Cards — third-largest small business credit card franchise

Core Problem

Field reps pitch the wrong Spark product because AI tools reason from stale transaction data; customers with growing spend are offered starter products; customers who upgraded are re-pitched their existing product

Consumer Need

SMB owner needs a rep who understands their business stage and recommends a product that fits current, not historical, spend volume

Business Need

Capital One needs Spark card upgrade conversion and spend volume growth; a rep pitching the right product at the right time is the primary conversion driver

CardEx Core Component

Entity resolution (Spark card → canonical entity) + Context API (transaction data freshness ≤ 48 hours) + Freshness metadata (staleness flag for spend data)

Business Line

Brex Corporate

Mid-Market and Corporate — 35,000+ business customers post-acquisition

Core Problem

Brex customer data exists in Brex's company-centric schema; Capital One's FSO AI tools cannot currently ingest Brex context because entity resolution and schema normalization have not been completed since April 7, 2026

Consumer Need

Brex corporate customer needs a Capital One rep who knows their Brex spend behavior, headcount trajectory, and expense policy structure — not just their credit profile

Business Need

Capital One needs to cross-sell Brex customers into Capital One banking products and Spark cards; this cross-sell cannot happen without unified customer context

CardEx Core Component

Entity resolution (Brex company ID → canonical entity via EIN matching) + Freshness normalization (Brex batch cadence: weekly → staleness flag: 'aging' after 8 days) + Context API (Brex fields: monthly volume, headcount signal, expense categories)

Business Line

SMB Banking

Deposits + Integrated Financial Services — the combined Capital One + Brex proposition

Core Problem

SMB banking is the highest-value product in the portfolio (stickiness, deposit growth, cross-product engagement) — but the FSO AI has no integrated view showing when a Spark card customer is ready to be pitched a full banking relationship

Consumer Need

Business owner at a growth inflection point (headcount +10, spend volume 40% up) needs a proactive reach-out with a banking relationship offer, not a reactive card pitch

Business Need

Capital One needs deposits and full-relationship SMB customers; the Brex acquisition was explicitly described as 'expanding Capital One's small business bank nationally' — this is the strategic growth thesis

CardEx Core Component

Cross-signal synthesis: spend volume trend (Spark transactions) + headcount growth (Brex card issuance signals) + credit utilization → composite 'banking relationship readiness' score surfaced in context object
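A composite readiness score of this kind could be a weighted blend of normalized signals. The sketch below is illustrative only: the equal weights, the saturation points (+50% spend growth, +10 cards), and the function name are assumptions, and a real score would need to be calibrated against historical outcomes.

```python
def clamp01(x: float) -> float:
    """Clamp a value into the [0, 1] range."""
    return max(0.0, min(1.0, x))


def banking_readiness(
    spend_growth_90d: float,      # 0.40 means spend is up 40%
    cards_added_qtr: int,         # headcount proxy from card issuance
    credit_utilization: float,    # 0.84 means 84% utilized
    weights: tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    signals = (
        clamp01(spend_growth_90d / 0.5),  # hypothetical: +50% growth saturates
        clamp01(cards_added_qtr / 10),    # hypothetical: +10 cards saturates
        clamp01(credit_utilization),
    )
    return round(sum(w * s for w, s in zip(weights, signals)), 2)


score = banking_readiness(spend_growth_90d=0.40, cards_added_qtr=8, credit_utilization=0.84)
print(score)
```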

23 MoSCoW Requirements

P0 — Must Have · Cannot ship without these

P0-1

Canonical Customer Entity

CardEx Core must resolve customer identity across Capital One Spark, Brex, and Capital One credit schemas into a single canonical entity with a confidence score and a human-review queue for low-confidence matches.

Traces to: Cluster 1 (Context Fragmentation) · A-11 (entity resolution gap) · Failure Mode 2 (Schema Mismatch) · JTBD-4 (new product coverage on acquisition)

P0-2

Field-Level Freshness Metadata

Every field in the canonical customer entity must carry: source system, last-updated timestamp, update cadence classification, and a staleness flag computed against configurable thresholds per field type. Staleness flags must be surfaced in the context API response and in the rep-facing UI.

Traces to: Cluster 2 (Trust Collapse) · JTBD-2 (confidence calibration) · DNF Risk 2 (no confidence signal) · Failure Mode 1 (Temporal Inconsistency)

P0-3

PII Preprocessing at API Boundary

All PII (customer full name, SSN/EIN, account number, individual transaction detail) must be redacted or tokenized before the context object is returned by the API. Downstream applications must receive entity tokens, not raw PII.

Traces to: Cluster 4 (Regulatory Pressure) · A-04 (MRM applying SR 26-2) · A-05 (current tools lack standardized preprocessing) · JTBD-6 (compliance-ready audit trail)

P0-4

Context API with Source Attribution

CardEx Core must expose a stateless API returning the structured context object (canonical entity + freshness metadata + PII-sanitized summary) with a retrieval_ids field that identifies every source record contributing to the response. API must support all current FSO AI applications as consuming clients.

Traces to: Cluster 4 (Regulatory Pressure) · JTBD-6 (audit trail) · Layer 3 Context API design · A-06 (RAG architecture decision)

P0-5

Recommendation Event Logging

Every recommendation generated by a downstream FSO AI application that consumed CardEx Core context must log a structured event: entity token, retrieval IDs, recommendation summary, and rep action (used/modified/ignored/flagged). Logging must occur automatically without rep manual action.

Traces to: Cluster 3 (Feedback Loop Absence) · Cluster 6 (Measurement Void) · JTBD-3 (feedback without friction) · DNF Risks 4 and 5

P0-6

Rep-Facing Freshness Indicator

The freshness metadata must surface in the rep-facing UI — within existing CRM or sales tool — as a human-readable signal: 'Context last updated: [X days ago]' with a visual staleness indicator (green / yellow / red) per data source. Rep must be able to see this before acting on a recommendation.

Traces to: Cluster 2 (Trust Collapse) · DNF Risk 2 · JTBD-2 (confidence calibration) · Maya persona ('is this based on current data?')
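The rep-facing signal could be rendered from the age and staleness flag of each source. This is a sketch: the mapping of current, aging, and stale to green, yellow, and red is an assumption (the requirement names the colors but not the mapping), and the label layout is hypothetical.

```python
# Assumed mapping from staleness flag to indicator color.
COLOR = {"current": "green", "aging": "yellow", "stale": "red"}


def freshness_label(source: str, age_days: int, flag: str) -> str:
    """Build the human-readable freshness line shown next to a recommendation."""
    if age_days == 0:
        when = "today"
    elif age_days == 1:
        when = "1 day ago"
    else:
        when = f"{age_days} days ago"
    return f"{source} · Context last updated: {when} [{COLOR[flag]}]"


for row in [("Card transactions", 0, "current"), ("Brex spend", 6, "aging")]:
    print(freshness_label(*row))
```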

P1 — Should Have · High value, blocked by specific dependency

P1-1

One-Tap Rep Feedback Mechanism

Reps must be able to flag a recommendation as 'wrong product,' 'stale data,' 'customer already has this,' or 'used without issue' in a single tap within the existing sales workflow. Flags must append to the recommendation event log automatically.

Traces to: JTBD-3 · Cluster 3 · Marcus persona (burned skeptic requires explicit flagging)

Blocked by: P0-5 (Recommendation Event Logging) — flag must append to an existing log entry

P1-2

Automated Eval Pipeline

CardEx Core must automatically score recommendation quality using: passive signals (modified vs. used without change), explicit flags (wrong product / stale data), and lagged outcome correlation (pitch converted / declined, logged in CRM 2–4 weeks post-pitch). Scores must produce labeled (context, recommendation, score) triples for model team consumption.

Traces to: Cluster 3 · Cluster 6 · JD requirement ('building measurement for GenAI in production')

Blocked by: P1-1 (explicit flags) + sufficient log volume (minimum 500 labeled events for meaningful signal)
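The three signal types in P1-2 can be combined into a single score per logged event and emitted as (context, recommendation, score) triples. The scoring rule below is a hypothetical placeholder, not the author's method: the numeric values, the flag names, and the simple averaging with the lagged outcome are all assumptions.

```python
# Hypothetical scoring tables; real values would be tuned on labeled data.
PASSIVE = {"used": 1.0, "modified": 0.5, "ignored": 0.0}
NEGATIVE_FLAGS = {"wrong_product", "stale_data", "customer_already_has_this"}
OUTCOME = {"converted": 1.0, "declined": 0.0}


def score_event(event: dict) -> float:
    """Score one event; `rep_action` must be used, modified, or ignored."""
    if event.get("flag") in NEGATIVE_FLAGS:
        return 0.0  # an explicit negative flag overrides every other signal
    score = PASSIVE[event["rep_action"]]
    outcome = event.get("outcome")  # lagged CRM outcome, may be missing
    if outcome in OUTCOME:
        score = (score + OUTCOME[outcome]) / 2
    return score


def to_triples(events: list[dict]) -> list[tuple]:
    """Produce (context, recommendation, score) triples for the model team."""
    return [(e["context_ids"], e["recommendation"], score_event(e)) for e in events]


events = [
    {"context_ids": ["TXN-1"], "recommendation": "limit increase",
     "rep_action": "used", "outcome": "converted"},
    {"context_ids": ["TXN-2"], "recommendation": "card upgrade",
     "rep_action": "modified", "outcome": "declined"},
    {"context_ids": ["TXN-3"], "recommendation": "starter card",
     "rep_action": "ignored", "flag": "stale_data"},
]
triples = to_triples(events)
print(triples)
```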

P1-3

Manager Performance Dashboard

A team-level view showing: recommendation adoption rate by rep, staleness flag distribution, accuracy trend over time (flagged as wrong / used without modification), and lagged outcome correlation (AI-assisted vs. non-AI-assisted pitch conversion rates).

Traces to: David Torres persona · JTBD-5 (team performance visibility) · Cluster 6 (Measurement Void)

Blocked by: Sufficient log volume and eval pipeline output — dashboard is meaningless without the data it visualizes

P1-4

Brex Real-Time API Integration

Replace the weekly Brex batch feed with a real-time API connection to Brex's platform, reducing Brex data staleness from 8 days to ≤ 4 hours.

Traces to: A-13 (Brex batch cadence) · Cluster 5 (Post-Acquisition Complexity) · JTBD-4

Blocked by: Brex integration engineering capacity and API design on Brex's side — dependency not fully under Capital One's control

P2 — Could Have · Phase 2 roadmap

P2-1

Composite Signal Scores

Pre-computed scores surfaced in context object: 'Banking Relationship Readiness' (composite of spend trend + headcount growth + credit utilization), 'Upgrade Propensity' (card upgrade likelihood based on spend trajectory), 'Churn Risk' (declining engagement signals). Why P2: requires 12+ months of historical data to compute reliably; premature before the data foundation is stable.

P2-2

Recommendation Confidence Score

A model-calibrated confidence score (High / Medium / Low) per recommendation, surfaced to the rep alongside the freshness indicator. Why P2: requires the eval pipeline (P1-2) to be live and producing reliable accuracy scores before confidence calibration is meaningful. A confidence score based on insufficient data is misleading — worse than no score.

P2-3

Self-Service Data Source Expansion

A configuration interface allowing the Field Sales AI Product Team to add new data sources to the canonical entity schema without requiring CardEx Core engineering involvement for each addition. Why P2: build the first three sources (Spark, Brex, Credit) correctly before making onboarding self-serve. Premature abstraction.

Won't Have — Explicit Scope Boundaries

Won't Have	Why This Boundary Matters
CardEx Core will not generate recommendations	CardEx Core provides context. Downstream AI applications generate recommendations. If CardEx Core generates recommendations, it becomes domain-specific and loses its value as shared infrastructure.
CardEx Core will not write to the CRM	Write access to CRM creates cascading data integrity risk. A bug in CardEx Core could corrupt the rep's contact history for every account. Read-only access contains the failure mode.
CardEx Core will not train models	CardEx Core produces labeled eval datasets. The data science team consumes those datasets and uses them to retrain or prompt-tune downstream models. Conflating the context platform with model training creates an org ownership problem.
CardEx Core will not be customer-facing	A customer-facing interface requires consumer-grade UX, compliance review, and a fundamentally different threat model. CardEx Core is an internal tool only.
CardEx Core will not replace existing AI applications	CardEx Core is additive infrastructure. The lead scoring model, pitch recommendation model, and credit suggestion model continue to exist. CardEx Core improves the data they reason from — it does not replace them.

24 Assumption Register · Phase 3 (A-12–A-15)

ID	Assumption	Basis	Urgency
A-12	Deterministic entity resolution match rate of ~70–85% is achievable using EIN as the primary key for Brex company-to-Capital One business account matching	Standard financial services B2B entity resolution benchmarks; EIN is the most reliable anchor; sole proprietors using personal SSNs complicate the match	High
A-13	Brex data will be available to CardEx Core via weekly batch feed for the first 12 months post-acquisition; real-time API access estimated 12–18 months out	Brex operating independently; real-time integration at this scale requires a purpose-built integration layer that does not exist at acquisition close	Critical
A-14	Context API P50 latency of ≤ 200ms is achievable without caching, using Capital One's cloud-native infrastructure	Capital One completed full cloud migration in 2021; 200ms is standard for enterprise internal APIs	Medium
A-15	Downstream FSO AI applications can adopt the Context API without requiring a full application rebuild	Standard enterprise architecture assumption; tightly-coupled monoliths where retrieval is embedded in model code would require partial rebuilds	High

Phase 4

Ideation

User segments S1–S5 · SCAMPER 7 lenses · Concepts A–G evaluations · Pugh matrix (10×7) · Effort × Impact · Prune the Tree · Selected direction + 3 trade-offs

25 User Segments: S1 through S5

Segments are defined before concepts because segments constrain which concepts make sense. Any concept that does not serve S1 and S2 simultaneously is not viable at MVP — those two represent the majority of FSO headcount and the most acute post-Brex pain.

Segment	Description	Current Friction	What “Platform Working” Looks Like
S1 · High-Volume SMB Reps	40+ Spark accounts; 5–8 pitch meetings/week; relies on AI for research throughput	Too many accounts to research manually; AI recommendations are stale or wrong; trust has collapsed	Recommendations right 80%+ of the time on first read; manual verification is the exception, not the rule
S2 · Cross-Sell Reps (Brex-Inherited)	Inherited Brex accounts post-acquisition; selling Capital One products into a customer base they've never met	Brex context entirely absent from their tools; pitching Capital One products blind into Brex customers	Brex spend patterns, headcount signals, and expense behavior surfaced alongside Spark card data in a single context view
S3 · Strategic Account Managers	10–15 high-value accounts; knows customers deeply through personal relationships	Don't need AI for basic research; need AI to surface signals they would miss at scale (spend spikes, utilization changes)	Proactive alerts: “Account X hit 92% credit utilization — timely moment for a limit conversation”
S4 · Sales Managers	Team oversight; 10–15 direct reps; responsible for regional portfolio targets	No visibility into recommendation quality across team; AI impact on pipeline invisible	Team-level dashboard showing adoption, accuracy trend, and outcome correlation
S5 · Field Sales AI Product Team	Builds and maintains downstream AI applications that serve S1–S3	Builds retrieval infrastructure from scratch per application; no shared API to call	Stable, documented Context API with a semantic versioning commitment; new applications integrate in days, not months

26 SCAMPER: 7 Lenses, 7 Concepts

S · Substitute

What if we substituted the centralized context platform with a decentralized schema standard?

Instead of a shared context API, define a shared context schema: a data contract that specifies what fields every FSO AI application must retrieve, in what format, with what freshness metadata. Each application team implements its own retrieval, but all conform to the schema. This separates the governance problem from the infrastructure problem. It may be deliverable faster, at 60–70% of the benefit for 20% of the cost.

→ Concept B (Schema-First Federated)
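Under a schema-first approach, each team's context object would be checked against the shared contract rather than served by a shared API. The conformance check below is a sketch: the metadata keys mirror the field-level freshness metadata described for Layer 2, but the exact key names, allowed values, and function name are assumptions.

```python
REQUIRED_METADATA = ("source_system", "last_updated", "update_cadence", "staleness_flag")
ALLOWED_CADENCES = {"real-time", "daily", "weekly", "monthly"}
ALLOWED_FLAGS = {"current", "aging", "stale"}


def contract_violations(context_fields: dict[str, dict]) -> list[str]:
    """Return human-readable violations; an empty list means the object conforms."""
    problems = []
    for name, meta in context_fields.items():
        for key in REQUIRED_METADATA:
            if key not in meta:
                problems.append(f"{name}: missing {key}")
        cadence = meta.get("update_cadence")
        if cadence is not None and cadence not in ALLOWED_CADENCES:
            problems.append(f"{name}: unknown update_cadence {cadence!r}")
        flag = meta.get("staleness_flag")
        if flag is not None and flag not in ALLOWED_FLAGS:
            problems.append(f"{name}: unknown staleness_flag {flag!r}")
    return problems


team_payload = {
    "brex_spend": {
        "source_system": "brex",
        "last_updated": "2026-06-21T00:00:00Z",
        "update_cadence": "weekly",
        "staleness_flag": "aging",
    },
    "credit_profile": {"source_system": "credit", "update_cadence": "quarterly"},
}
violations = contract_violations(team_payload)
print(violations)
```

A check like this can run in each team's CI, which is where the distributed-enforcement risk noted for this concept would show up in practice.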

C · Combine

What if we combined the context retrieval layer with the recommendation engine itself?

Instead of a context platform feeding separate downstream apps, build one unified FSO AI assistant that handles retrieval, synthesis, and recommendation in a single model. Eliminates the integration problem entirely and creates a cleaner rep experience (one tool, not three). The risk: monolithic architecture is harder to improve in targeted ways — you can't fix retrieval without touching the recommendation.

→ Concept C (Unified FSO AI Agent)

A · Adapt

What if we adapted the existing Agent Assist architecture rather than building new?

Capital One's Agent Assist is already in production (10,000+ uses, 84% → 93% search relevance improvement), has SR 26-2-compliant logging infrastructure, and organizational familiarity. Time-to-value is the most underrated variable in an enterprise platform build. A working system that ships in 90 days and is 80% as good as the ideal is better than an ideal system that ships in 12 months.

→ Concept D (Agent Assist Extension)

M · Modify / Magnify

What if we magnified the freshness problem and made real-time context the core design constraint?

An event streaming architecture (Apache Kafka) where every customer event — card transaction, Brex card swipe, CRM update — triggers an instant context refresh. Sub-minute staleness eliminates the freshness problem structurally. This concept came as a surprise — the instinct was to build an API platform, not a streaming platform — which is exactly the signal that it deserves serious evaluation.

→ Concept E (Real-Time Streaming Platform)

P · Put to Other Uses

What if the context platform were surfaced to customers, not just to reps?

Small business owners could see a 'Capital One Business Profile' — what Capital One knows about their business, how current it is, what products they are eligible for. A two-sided product: reps use it to prepare pitches; customers use it to see how they are understood. Discarded for MVP scope — different compliance regime, different UX requirements, ~2-year build timeline.

→ Discarded for MVP. Noted for Phase 3+ roadmap.

E · Eliminate

What if we eliminated the central context platform entirely and focused on governance of existing retrieval?

No new platform. Audit each FSO AI application's existing retrieval layer, establish a data quality standard (freshness thresholds, PII requirements, entity matching rules), and enforce compliance through quarterly governance reviews. Attractive to anyone skeptical that a platform can be built fast enough. The test is A-09 — if the root cause is organizational, this concept is correct. If structural, this concept fails.

→ Concept F (Governance-First, No Platform)

R · Reverse

What if we reversed the build sequence — starting with the feedback loop rather than the context layer?

Build the feedback logging and eval pipeline first — establishing what a 'good recommendation' looks like and which context signals are actually predictive of pitch success — then build the context platform around the signals that matter, rather than the signals assumed to matter. Directly addresses assumption hygiene. Scores well on feedback loop quality but poorly on rep-facing trust, which is the immediate adoption problem.

→ Concept G (Eval-First, Context-Second)

27 · Concept Evaluation: A through G

CardEx Core — Horizontal Shared Platform

The full proposal. Central entity resolution + freshness normalization + context API + feedback logging.

Strengths

Maximum data consistency across all FSO AI applications — single source of truth
SR 26-2 compliance built once for all consuming applications simultaneously
Scales to new data sources (Discover network, future acquisitions) by adding a source to the platform, not rebuilding each application
Highest feedback loop quality — structured logging at the platform level produces consistent, queryable eval datasets

Weaknesses

Slow to ship (only Concept E is slower): 4-layer architecture plus entity resolution for 35K+ Brex accounts; realistically 5–7 months to MVP
Highest organizational footprint: every FSO AI application team must migrate from their own retrieval to the shared API
Complex build with a single point of failure: a bug in Layer 1 (entity resolution) degrades all layers above it

Schema-First Federated

Define a shared context schema and data contract. PM owns the standard, not the infrastructure.

Strengths

Faster to first impact — schema definition and governance tooling can be live in 6–8 weeks
Lower adoption friction — teams keep control of their retrieval; conformance is incremental
Lower organizational risk — no new infrastructure dependency

Weaknesses

Consistency depends on implementation quality per team — distributed enforcement is unreliable under shipping pressure
Brex entity resolution is still solved independently by each team — the hardest problem is not addressed
No shared logging — feedback loop quality is fragmented; cross-application eval is impossible
Schema drift is the historical failure mode of federated standards in large organizations

Unified FSO AI Agent

One AI system handles retrieval, synthesis, and recommendation. No separate context platform.

Strengths

Highest adoption potential — reps have one tool, one interface
Eliminates the API adoption problem — no downstream applications to migrate
Cleaner feedback loop — one system captures all inputs and outputs in a consistent schema

Weaknesses

Monolithic architecture is hardest to improve in targeted ways — retrieval and generation tightly coupled
Abandons the 'shared infrastructure' value proposition — FSO AI product team loses ability to build distinct domain-specialized applications
Source attribution becomes harder — unified model that retrieves and generates in one pass makes it difficult to isolate which context drove which recommendation

Agent Assist Extension

Extend Capital One's existing Agent Assist architecture for FSO use.

Strengths

Fastest to first value — proven infrastructure; realistically 60–90 days to FSO pilot
SR 26-2 compliance framework established — Agent Assist was built under Capital One's MRM governance
Organizational credibility — Agent Assist is already trusted internally; 'FSO edition' inherits that trust

Weaknesses

Agent Assist designed for reactive service lookups, not proactive sales synthesis — fundamentally different retrieval patterns
Brex entity resolution is not in Agent Assist's design — adding Brex's company-centric schema requires significant architecture extension that approaches building new
Scalability ceiling — optimized for one-to-one retrieval; FSO needs one-to-many synthesis (one rep, multiple products, one customer summary)
Technical debt imported at launch — carrying design decisions that don't fit the new use case

Real-Time Streaming Platform

Apache Kafka event streaming. Every customer event triggers instant context refresh. Sub-minute staleness.

Strengths

Highest data freshness — eliminates the staleness problem structurally
Best long-term architecture — event sourcing provides complete audit trail and enables time-travel queries
Scales horizontally — Kafka's architecture scales with event volume without redesign

Weaknesses

Longest time to production — streaming infrastructure + entity resolution + API layer; realistically 9–12 months to MVP
Requires Brex's cooperation for streaming: Brex would need to expose a real-time event stream, and only a weekly batch is currently available
Engineering complexity is the highest of all concepts — requires ML engineers, data engineers, platform engineers, and Kafka specialists working in parallel

Governance-First, No Platform

No new infrastructure. Audit existing retrieval layers. Establish data quality standards. Enforce through quarterly reviews.

Strengths

Zero infrastructure risk — no new system to build, operate, or debug
Fast to establish standards — a governance framework can be defined in 30 days
Lower capital expenditure — PM cost only; no infrastructure spend

Weaknesses

Does not address the structural root cause — governance of fragmented retrieval improves quality marginally but does not produce consistency
Brex entity resolution cannot be governed into existence — each team would need to solve it independently
Historical failure mode of standards-only approaches: team compliance is high in Q1, degrades under shipping pressure — standards without enforcement infrastructure are aspirational

Eval-First, Context-Second

Build the feedback loop and eval pipeline first. Let data reveal which context signals are actually predictive before building the platform.

Strengths

Highest feedback loop quality of any concept — the eval framework is the entire Phase 1 focus
Reduces assumption risk — instead of building entity resolution for all signals, let data confirm which signals are worth the effort
Faster to Phase 1 value — logging infrastructure is simpler than a 4-layer platform

Weaknesses

Does not address the rep trust problem in Phase 1 — reps are abandoning AI tools; an eval pipeline improves model quality over time but provides no immediate improvement to the rep's experience
The Burned Skeptic (Marcus) cannot be recovered with an eval framework — he needs to see a recommendation he trusts, not a dashboard
Brex context gap not addressed in Phase 1 — cross-sell reps (S2) get no value until the context platform is built in Phase 2

28 · Pugh Matrix: 10 Criteria × 7 Concepts

Ten criteria, scored 1–5 (5 = best). Concept A (CardEx Core) is the reference concept and does not win on every criterion. Five criteria are double-weighted (2×) reflecting the post-acquisition context: data consistency, SR 26-2 compliance, scalability, feedback loop quality, and Brex integration readiness.

Criterion	Wt	A	B	C	D	E	F	G
C1: Time to first value	1×	2	4	3	5	1	4	3
C2: Data consistency across FSO AI	2×	5	3	4	3	5	2	2
C3: Freshness SLA at MVP	1×	4	3	3	3	5	2	2
C4: SR 26-2 compliance coverage	2×	5	3	3	4	4	2	3
C5: Organizational adoption friction	1×	2	4	5	4	2	5	4
C6: Scalability to new data sources	2×	5	3	2	2	5	2	3
C7: Feedback loop quality	2×	5	2	3	3	4	2	5
C8: Rep-facing trust signal	1×	5	3	4	3	5	1	1
C9: Engineering complexity (inverse)	1×	2	3	3	4	1	5	4
C10: Brex integration readiness	2×	4	3	3	2	3	2	2
Raw total	—	39	31	33	33	35	27	29
Weighted total	—	62	46	51	51	56	38	42

Where Concept A loses: C1 (Time to first value): Concept D wins decisively (5 vs. 2); Agent Assist Extension ships in 60–90 days, CardEx Core in 5–7 months. C5 (Organizational adoption friction): Concepts C and F score highest (5 vs. 2). C9 (Engineering complexity): Concept F wins (5 vs. 2).

Why Concept E (56) scores second: Most technically correct long-term architecture. Loses on C1 (time to value, 1×) and C9 (engineering complexity, 1×). The Brex complexity spike is happening now — a 9–12 month streaming build means FSO operates with fragmented context for the entirety of the Brex integration's most critical window.
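The weighting rule can be checked mechanically. The sketch below recomputes raw and weighted totals from the cell scores printed in the matrix, assuming a 2× weight simply counts that cell twice. The variable and function names are illustrative. Recomputed this way, A (63) and E (56) stay first and second; some columns differ from the printed totals row by one to four points, which does not change the top two.

```python
# Scores copied from the Pugh matrix above; columns are concepts A..G.
CONCEPTS = ["A", "B", "C", "D", "E", "F", "G"]

# (criterion, weight, scores per concept in A..G order)
MATRIX = [
    ("C1", 1, [2, 4, 3, 5, 1, 4, 3]),
    ("C2", 2, [5, 3, 4, 3, 5, 2, 2]),
    ("C3", 1, [4, 3, 3, 3, 5, 2, 2]),
    ("C4", 2, [5, 3, 3, 4, 4, 2, 3]),
    ("C5", 1, [2, 4, 5, 4, 2, 5, 4]),
    ("C6", 2, [5, 3, 2, 2, 5, 2, 3]),
    ("C7", 2, [5, 2, 3, 3, 4, 2, 5]),
    ("C8", 1, [5, 3, 4, 3, 5, 1, 1]),
    ("C9", 1, [2, 3, 3, 4, 1, 5, 4]),
    ("C10", 2, [4, 3, 3, 2, 3, 2, 2]),
]


def totals(matrix, weighted):
    """Sum each concept's scores, applying the criterion weight if requested."""
    out = dict.fromkeys(CONCEPTS, 0)
    for _criterion, weight, scores in matrix:
        for concept, score in zip(CONCEPTS, scores):
            out[concept] += score * (weight if weighted else 1)
    return out


raw = totals(MATRIX, weighted=False)
weighted = totals(MATRIX, weighted=True)
ranking = sorted(CONCEPTS, key=lambda c: weighted[c], reverse=True)
print(raw, weighted, ranking, sep="\n")
```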

29 · Effort × Impact · Build Sequence

HIGH IMPACT
    │
    │   [P0-6: Rep freshness indicator]    [P0-1: Canonical entity resolution]
    │   [P0-5: Recommendation logging]     [P0-4: Context API]
    │   [P0-3: PII preprocessing]
    │                                      [P1-2: Automated eval pipeline]
    │                                      [P1-3: Manager dashboard]
    │   [P1-1: One-tap feedback]
    │
    │───────────────────────────────────────────────────────────────
    │                                      [P1-4: Brex real-time API]
    │                                      [P2-1: Composite scores]
    │   [P2-3: Self-service expansion]
    │   [P2-2: Confidence score]
    │
LOW IMPACT
         LOW EFFORT                        HIGH EFFORT

Build sequence: PII preprocessing → Event logging → Freshness indicator (quick wins) → Entity resolution + Context API (core platform) → Feedback mechanism → Eval pipeline → Manager dashboard

30 · Prune the Product Tree

🌳Trunk — Foundational

Remove any of these and the platform cannot function.

Canonical customer entity with entity resolution (Spark + Brex + Credit schemas → single golden record)
Context API with source attribution and stateless retrieval
PII preprocessing and tokenization at the API boundary
Freshness metadata embedded in every context response

🌿Primary Branches — Core Value Delivery

What reps and managers will actually experience. Without these, the platform is technically correct but invisible.

Rep-facing freshness indicator (green / yellow / red staleness by data source)
Recommendation event logging (structured, queryable, consistent schema)
Brex batch integration (weekly feed at MVP — minimum viable Brex coverage)
Downstream app API adoption support (documentation, migration guides, integration support)

🌱Secondary Branches — Improvement Engine

Require healthy primary branches to produce value. These are the platform's flywheel.

One-tap rep feedback mechanism (wrong product / stale data / used without issue)
Automated eval pipeline (passive signals + explicit flags + lagged outcomes → labeled triples)
Manager performance dashboard (adoption rate, accuracy trend, outcome correlation)
Confidence score in rep-facing UI (calibrated against eval pipeline output)

✂️Pruned — Cut from CardEx Core Scope

Explicit scope exclusions with rationale.

Pruned Item	Reason
Real-time Kafka streaming architecture	9–12 month build; Brex API dependency; weekly batch achieves sufficient freshness for MVP at lower risk
CardEx Core-generated recommendations	Scope violation — CardEx Core is a context platform, not a recommendation engine
Customer-facing context transparency	Different compliance regime (consumer-facing AI); materially different product; Phase 3+ vision
Composite signal scores (Upgrade Propensity, Churn Risk)	Requires 12+ months of historical data to calibrate reliably; premature before data foundation is stable
Self-service data source expansion	Build the first three sources correctly before making onboarding self-serve; premature abstraction
CRM write access	Write access introduces data integrity risk; CardEx Core reads from CRM and logs to its own store

31 · Selected Direction + Trade-Off Statement

Selected concept: Concept A, the CardEx Core Horizontal Shared Platform (Weighted score: 62 · Second place: Concept E at 56 · Third place tie: Concepts C and D at 51)

Why Concept A over Concept D (the strongest alternative on speed)

Concept D (Agent Assist Extension) wins on time-to-value (5 vs. 2) and engineering complexity (4 vs. 2). The reason Concept A is selected despite this loss is that two of the highest-weighted criteria — C6: Scalability to new data sources (2×) and C10: Brex integration readiness (2×) — represent the post-acquisition context that makes this problem urgent in 2026 specifically.

Agent Assist was built to answer a question (“what's the answer to this service inquiry?”). CardEx Core is designed to assemble a picture (“who is this customer across all their Capital One relationships?”). These are fundamentally different retrieval patterns. Adapting the former for the latter is not adaptation; it is replacement with legacy architectural debt attached. Within 12–18 months, the Brex extension would require a partial rebuild approaching the scope of building correctly the first time.

Why Concept A over Concept E (the most technically correct alternative)

Concept E (Real-Time Streaming) is the right long-term architecture. This is acknowledged directly. If Capital One's field sales AI program had 12 months to build before the Brex complexity problem needed solving, Concept E would be the correct choice.

The Brex acquisition closed April 7, 2026. Cross-sell reps are inheriting accounts they have no context for today. A 9–12 month streaming build means those reps operate blind for the entirety of the Brex integration's most critical window. The adoption damage from this period — reps who lose trust in AI tools during Brex onboarding will be hard to recover even after the platform ships — is a real cost the Pugh Matrix weights do not fully capture.

Design hedge: CardEx Core is architected for future streaming adoption. The Context API is stateless; the retrieval layer is abstracted; entity resolution does not assume batch inputs. When the Brex real-time API is available (Phase 2 roadmap), upgrading from weekly batch to real-time streaming is a source configuration change, not a platform rebuild. Concept E's architecture is embedded in Concept A's design as a Phase 2 path.

What Concept A gives up — stated explicitly

1. Speed to first rep-visible improvement. Concept D could produce a working FSO AI context improvement in 90 days. Concept A's MVP is 5–7 months. During that window, the trust collapse continues and the Burned Skeptic population grows. Mitigation: ship the rep-facing freshness indicator (P0-6) as the earliest possible visible change — which can be done before the full platform is live, using existing retrieval layers with freshness metadata added as a preprocessing step.

2. Organizational independence for application teams. Concept A requires every FSO AI application team to migrate from their own retrieval to a shared API — a real change management cost. Mitigation: API adoption program with migration guides, dedicated integration support, and a compatibility layer that allows teams to call CardEx Core alongside their existing retrieval during a parallel-run period.

3. Simplicity of the failure mode. Concept G (Eval-First) has a simple, isolated failure mode: the eval pipeline doesn't produce good data yet. Concept A's failure mode is more complex: a bug in entity resolution degrades all four layers simultaneously. Mitigation: aggressive testing and a staged rollout — entity resolution for Spark-only customers first (lower complexity, no Brex schema), then expanding to Brex customers once the resolution layer is stable.

Phase 5

Delivery

Two MVPs · System architecture · 3 schemas · Eval rubrics D1–D4 · System prompt · PM prompt decisions · HITL loop · 3 levers · 3 drift types · 6-tier metrics · RAID · A-17–A-20

32 · Two MVP Proposals

The JD defines two co-equal deliverables: the context platform and the feedback loop strategy. These are specified as two sequential MVPs. They are not simultaneous — MVP-B depends on MVP-A being stable — but both are PM-owned with equal specificity.

MVP-A · The Context Platform

Every FSO AI application reasons from the same version of the customer.

Scope: Canonical customer entity (Spark + Brex + Credit schema resolution), freshness normalization, PII preprocessing, Context API with source attribution, rep-facing freshness indicator.

Wedge rationale

Entity resolution is the structural prerequisite. A recommendation logging system that logs against an unresolved entity produces an eval dataset that cannot be trusted.
The rep-facing freshness indicator is the earliest visible improvement — the minimum intervention required to begin rebuilding trust. It can ship before the full Context API is live.
PII preprocessing at the API boundary solves SR 26-2 compliance once for all downstream applications simultaneously. Every day it doesn't exist is accumulated regulatory exposure.

Target: Month 5–6 · Gate: Entity resolution ≥70% match rate for Spark-Brex pairs · API P50 ≤200ms · Zero raw PII in sample audit

MVP-B · The Feedback Loop System

The system learns. Quality improves measurably. Leadership has measurement.

Scope: Recommendation event logging, one-tap rep feedback, HITL review queue, automated eval pipeline, drift detection monitoring, prompt engineering governance.

Why MVP-B follows MVP-A

The eval pipeline scores recommendation quality by comparing retrieved context against the recommendation generated. If context is inconsistent (pre-MVP-A), eval scores are noisy.
The HITL feedback loop produces labeled (context, recommendation, score) triples. If context is fragmented, you cannot determine whether a bad recommendation was caused by retrieval failure or generation failure. MVP-A makes this distinction possible.

Target: Month 8–9 · Gate: Logging ≥95% of events · ≥500 labeled events accumulated · Eval pipeline running weekly without manual intervention · Drift baseline established

33 · System Architecture: Current State vs. Target State

Current State — Fragmented Retrieval

┌─────────────────────────────────────────────────────────┐
│                   FIELD SALES REP                       │
└──────────────────┬──────────────┬──────────────┬────────┘
                   │              │              │
          ┌────────▼───┐  ┌───────▼────┐  ┌─────▼──────────┐
          │ Lead       │  │ Pitch      │  │ Credit         │
          │ Scoring AI │  │ Recomm. AI │  │ Suggestion AI  │
          └────────┬───┘  └───────┬────┘  └─────┬──────────┘
                   │              │              │
          ┌────────▼───┐  ┌───────▼────┐  ┌─────▼──────────┐
          │ Spark CRM  │  │ Transaction│  │ Credit Profile │
          │ (COF       │  │ History DW │  │ (COF           │
          │  schema)   │  │ (daily     │  │  monthly       │
          │            │  │  batch)    │  │  refresh)      │
          └────────────┘  └────────────┘  └────────────────┘

          ┌─────────────────────────────────────────────────┐
          │ Brex Platform (company-centric schema)          │
          │ → NOT CONNECTED to any of the above AI apps     │
          │ → Brex customers appear in CRM with no context  │
          └─────────────────────────────────────────────────┘

FAILURE MODES:
  ✗ Three retrieval layers → three versions of same customer
  ✗ Brex data entirely absent from all three AI applications
  ✗ No shared entity resolution → schema mismatch across sources
  ✗ No feedback capture → no logging of what reps did
  ✗ No PII preprocessing → raw customer data in LLM prompts
  ✗ No freshness visibility → rep cannot tell which data is current

Target State — CardEx Core + Feedback Loop

┌──────────────────────────────────────────────────────────────┐
│                        FIELD SALES REP                       │
│              [Sees: freshness indicator · one-tap feedback]  │
└────────────┬──────────────────────┬──────────────┬───────────┘
             │                      │              │
    ┌────────▼───┐         ┌────────▼────┐    ┌────▼──────────┐
    │ Lead       │         │ Pitch       │    │ Credit        │
    │ Scoring AI │         │ Recomm. AI  │    │ Suggestion AI │
    └────────┬───┘         └────────┬────┘    └────┬──────────┘
             └──────────────────────┼───────────────┘
                                    │  All apps call shared API
                    ┌───────────────▼──────────────────┐
                    │         CONTEXT API               │
                    │  (stateless · source attribution  │
                    │   · PII-sanitized · ≤200ms P50)   │
                    └───────────────┬──────────────────┘
          ┌─────────────────────────▼──────────────────────────┐
          │              CONTEXTUAL INTELLIGENCE PLATFORM       │
          │  Layer 1: Canonical Customer Entity                 │
          │  Layer 2: Freshness Normalization                   │
          │  Layer 3: PII Preprocessing (SR 26-2 compliant)    │
          └─────────────────────────┬──────────────────────────┘
          ┌─────────────────────────▼──────────────────────────┐
          │   SOURCE SYSTEMS                                    │
          │   Spark CRM (daily) · Transaction DW (daily)       │
          │   Brex Platform (weekly batch → real-time P2)      │
          │   Credit Profile (monthly) · CRM Notes (event)     │
          └────────────────────────────────────────────────────┘
          ┌────────────────────────────────────────────────────┐
          │              FEEDBACK LOOP SYSTEM (MVP-B)          │
          │  Recommendation Event Log                          │
          │  → HITL Review Queue (flagged events · 48hr SLA)  │
          │  → Eval Pipeline (weekly · 4 dimensions)          │
          │  → Three Levers: Retrieval · Prompt · Retraining  │
          │  → Drift Detection (Data · Concept · Output)       │
          └────────────────────────────────────────────────────┘
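Layer 3 in the target state can be illustrated with a small tokenization pass. This is a minimal sketch, assuming PII fields are replaced with keyed-hash tokens (HMAC-SHA256) before a context object crosses the API boundary. The field list, the `TOK-` prefix, and the key handling are hypothetical; the case study does not specify the actual scheme.

```python
import hashlib
import hmac

# Hypothetical list of fields treated as PII; the real list would come from
# the platform's data classification, which the case study does not specify.
PII_FIELDS = {"legal_name", "ein", "contact_email"}


def tokenize(value: str, secret: bytes) -> str:
    """Return a stable, non-reversible token for a PII value."""
    digest = hmac.new(secret, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return "TOK-" + digest[:16]


def sanitize(record: dict, secret: bytes) -> dict:
    """Replace PII fields with tokens, recursing into nested objects.

    Lists of objects are not handled in this sketch.
    """
    clean = {}
    for key, value in record.items():
        if isinstance(value, dict):
            clean[key] = sanitize(value, secret)
        elif key in PII_FIELDS and value is not None:
            clean[key + "_token"] = tokenize(str(value), secret)
        else:
            clean[key] = value
    return clean
```

Because the token is deterministic for a given key, the same customer resolves to the same token across applications, which is what lets downstream logs join on a token rather than on raw identifiers.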

34 · Data Model Schemas

Schema 1 — Canonical Customer Entity

Canonical Customer Entity · Golden record · entity-resolved · freshness-tagged per field

{
  "entity_token": "ENT-44821-COF",
  "entity_confidence": 0.94,
  "resolution_method": "deterministic_ein",
  "source_ids": {
    "spark_account_id": "SPK-7821-XXXX",
    "brex_company_id": "BRX-44821",
    "capital_one_credit_id": "CRD-992-XXXX",
    "crm_contact_id": "CRM-LOC-4821"
  },
  "business_profile": {
    "legal_name_token": "ENTITY_44821",
    "industry_code": "5087",
    "employee_count_signal": {
      "value": 38,
      "source": "brex_card_issuance",
      "as_of": "2026-06-21T00:00:00Z",
      "cadence": "weekly_batch",
      "staleness_flag": "current"
    }
  },
  "capital_one_relationship": {
    "current_products": ["Spark_Cash_Plus", "Brex_Corporate"],
    "card_utilization_pct": {
      "value": 84,
      "as_of": "2026-06-27T14:30:00Z",
      "cadence": "daily_batch",
      "staleness_flag": "current"
    },
    "spend_trend_90d": "increasing",
    "brex_monthly_volume_q2_2026": {
      "value": 240000,
      "currency": "USD",
      "as_of": "2026-06-21T00:00:00Z",
      "cadence": "weekly_batch",
      "staleness_flag": "current"
    }
  },
  "sales_history": {
    "last_pitch_outcome": "declined",
    "last_pitch_date": "2026-03-12",
    "last_pitch_product": "Spark_Cash_Plus_upgrade",
    "upgrade_indicators": [
      "volume_increase_40pct_q2",
      "headcount_growth_8_cards_q2",
      "credit_utilization_84pct"
    ]
  }
}
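Each freshness-tagged field above carries `as_of`, `cadence`, and `staleness_flag`. One way the flag could be derived is sketched below. The thresholds (within one refresh interval is "current", within two is "aging", beyond that is "stale") and the `monthly_refresh` label are assumptions for illustration, not values defined by the platform.

```python
from datetime import datetime, timedelta, timezone

# Expected refresh interval per cadence label. "daily_batch" and "weekly_batch"
# appear in the schema; "monthly_refresh" is a hypothetical label.
CADENCE_INTERVAL = {
    "daily_batch": timedelta(days=1),
    "weekly_batch": timedelta(days=7),
    "monthly_refresh": timedelta(days=31),
}


def staleness_flag(as_of: str, cadence: str, now: datetime) -> str:
    """Classify a field as current / aging / stale using assumed thresholds."""
    observed = datetime.fromisoformat(as_of.replace("Z", "+00:00"))
    age = now - observed
    interval = CADENCE_INTERVAL[cadence]
    if age <= interval:
        return "current"
    if age <= 2 * interval:
        return "aging"
    return "stale"


# Usage with the Brex volume field from the sample record.
now = datetime(2026, 6, 27, 15, 0, tzinfo=timezone.utc)
print(staleness_flag("2026-06-21T00:00:00Z", "weekly_batch", now))
```

The same three values map naturally onto the green / yellow / red indicator shown to reps.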

Schema 2 — Recommendation Event Log

Recommendation Event Log · SR 26-2 attributable · links to CRM outcome via entity token

// At recommendation time:
{
  "event_id": "REC-20260627-FSR-4821",
  "entity_token": "ENT-44821-COF",
  "rep_id_token": "REP-TOKEN-9821",
  "retrieval_ids": ["TXN-batch-20260627", "CRD-20260601", "BRX-batch-20260621"],
  "prompt_version": "ctx-prompt-v2.3",
  "downstream_model": "pitch-recommender-v1.4",
  "recommendation_summary": "Brex Premium upgrade + credit limit increase to $450K",
  "rep_action": { "type": "modified", "modification": "removed_credit_limit_suggestion" },
  "eval_scores": { "human_reviewed": false, "auto_scored": false }
}

// After CRM logs outcome (30-day lag):
"outcome": {
  "outcome_type": "converted",
  "converted_product": "Brex_Premium",
  "outcome_source": "crm_opportunity_closed"
}

// After HITL eval runs:
"eval_scores": {
  "retrieval_precision": 0.89,
  "recommendation_accuracy": "accurate",
  "human_reviewed": true,
  "quality_dimension_scores": {
    "retrieval_relevance": 4, "factual_accuracy": 5,
    "business_appropriateness": 4, "specificity_to_customer": 4
  }
}
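The three fragments above are successive states of one record keyed by `event_id`. A minimal sketch of how the later fragments could be folded into the original event follows. `apply_update` is a hypothetical helper, and plain dictionaries stand in for whatever event store is actually used.

```python
import copy


def apply_update(event: dict, update: dict) -> dict:
    """Return a new event with the update's top-level keys merged in.

    Nested objects such as eval_scores are merged key by key, so a later
    fragment extends what was logged earlier instead of erasing it.
    """
    merged = copy.deepcopy(event)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = copy.deepcopy(value)
    return merged


event = {
    "event_id": "REC-20260627-FSR-4821",
    "entity_token": "ENT-44821-COF",
    "eval_scores": {"human_reviewed": False, "auto_scored": False},
}
outcome = {"outcome": {"outcome_type": "converted", "converted_product": "Brex_Premium"}}
review = {"eval_scores": {"retrieval_precision": 0.89, "human_reviewed": True}}

final = apply_update(apply_update(event, outcome), review)
```

Returning a new object rather than mutating the logged event keeps the at-recommendation-time state available for audit.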

35 · Eval Framework: Four Quality Dimensions

This section defines what “good” looks like before the product ships — as the JD explicitly requires. The rubric is the PM's contribution. The scoring infrastructure is Data Science's. The gate criteria for retraining are co-owned.

D1 · Retrieval Precision

Did CardEx Core surface the right context for this recommendation?

Score	Definition
5	All retrieved context directly relevant; no irrelevant context included
4	Majority relevant; one peripheral object included
3	Mixed relevance; key signal retrieved but alongside significant noise
2	Critical signal missing; recommendation generated without most relevant data
1	Wrong context retrieved entirely; data for wrong entity or wrong time period

Method: HITL team reviews retrieval_ids against the canonical entity. Automated checks cover do_not_recommend exclusions. Baseline: ~60% of reviewed events score ≥4 (A-16). Target: ≥85% by Month 6.

D2 · Recommendation Accuracy

Does the recommendation contain factual errors about the customer's situation?

Binary: Accurate / Inaccurate. Inaccuracy types logged separately:
wrong_product — recommends a product the customer already holds
wrong_limit — references a credit limit that doesn't match canonical entity
stale_signal_used — driven by a staleness-flagged field despite the warning
entity_mismatch — references signals from a different customer

Method: Automated for wrong_product and wrong_limit; human review for the others. Target: ≥95% accuracy rate within 3 months of MVP-A.
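The two automated checks reduce to comparisons against the canonical entity. The sketch below assumes the recommendation has already been parsed into structured fields; `recommended_product`, `cited_credit_limit`, and `credit_limit` are hypothetical field names, since the event log shown earlier stores only a free-text summary.

```python
def automated_d2_checks(recommendation: dict, entity: dict) -> list:
    """Return the automated D2 inaccuracy types triggered by a recommendation."""
    failures = []

    # wrong_product: the customer already holds the recommended product.
    product = recommendation.get("recommended_product")
    if product is not None and product in entity.get("current_products", []):
        failures.append("wrong_product")

    # wrong_limit: a cited credit limit disagrees with the canonical entity.
    cited = recommendation.get("cited_credit_limit")
    if cited is not None and cited != entity.get("credit_limit"):
        failures.append("wrong_limit")

    return failures
```

An empty list means only that the automated checks passed; stale_signal_used and entity_mismatch still need human review.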

D3 · Business Appropriateness

Is the recommendation suitable for this customer's business stage and Capital One's product strategy?

Score	Definition
5	Matches growth stage, addresses confirmed pain point, aligns with Capital One's cross-sell priority
4	Directionally correct; one element could be better tailored
3	Plausible but generic; not tailored to this customer's specific signals
2	Technically available but misaligned with likely need
1	Clearly wrong for this business (e.g., starter card for a $240K/month spender)

Human eval only. What scores a 5 reflects the PM's understanding of Capital One's FSO strategy for each customer segment. Must be reviewed quarterly as strategy evolves. Monthly sample of 50 recommendations. Target: mean ≥4.0 by Month 12.

D4 · Business Outcome Correlation (Lagged)

Did AI-assisted recommendations convert at higher rates than non-AI-assisted pitches?

Measurement: 30-day rolling conversion rate for pitches where the rep used the AI recommendation vs. pitches where the rep ignored it. Tracked at segment level (S1 High-Volume SMB, S2 Cross-Sell Brex) and product level.

Why secondary, not north star: Outcome correlation is the ground truth but takes 30 days per pitch and is confounded by rep skill, market conditions, and product mix. It is the most important metric for demonstrating business value to leadership. It is not the fastest feedback signal for improving the model. Target: AI-assisted pitches convert ≥15% higher by Month 12. (A-17)
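As a worked version of the D4 target, the sketch below reads "≥15% higher" as relative lift over the non-assisted conversion rate. That reading is an assumption (the target could also be meant in percentage points). The `ai_assisted` and `outcome_type` keys on each pitch are illustrative.

```python
def conversion_rate(pitches: list) -> float:
    """Share of pitches whose outcome is 'converted'."""
    if not pitches:
        raise ValueError("no pitches in this group")
    return sum(p["outcome_type"] == "converted" for p in pitches) / len(pitches)


def relative_lift(pitches: list) -> float:
    """Relative conversion lift of AI-assisted pitches over the rest."""
    assisted = [p for p in pitches if p["ai_assisted"]]
    baseline_group = [p for p in pitches if not p["ai_assisted"]]
    baseline = conversion_rate(baseline_group)
    if baseline == 0:
        raise ValueError("baseline conversion rate is zero; lift is undefined")
    return (conversion_rate(assisted) - baseline) / baseline


def meets_d4_target(lift: float, target: float = 0.15) -> bool:
    return lift >= target
```

For example, 30% conversion on assisted pitches against a 25% baseline is a relative lift of about 20%, which clears the target. The confounders named above (rep skill, market conditions, product mix) are not controlled for here.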

36 · Prompt Engineering: PM-Owned Layer

Prompt engineering at the platform level is distinct from prompt engineering at the query level. The PM owns the schema and instruction layer that every downstream application inherits. This is the highest-leverage PM-owned technical tool in the CardEx Core stack.

1 · Field ordering within context summary · PM

LLMs are sensitive to position within a context window. The PM orders the context summary to minimize the most damaging failure modes:
(1) do_not_recommend — first, always; catches pitching an existing product before any other signal is processed
(2) current_products — second, same reason
(3) upgrade_indicators — third; the positive signal the model should build toward
(4) spend_trend_90d + brex_monthly_volume — fourth; quantitative grounding for upgrade indicators
(5) last_pitch_outcome — fifth; recent context, not the primary frame
(6) credit_utilization — sixth; supporting signal
(7) suggested_context_for_pitch — the PM-authored narrative summary that synthesizes signals into a direction
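The ordering above can be enforced in code rather than left to each application. A minimal sketch, assuming the context arrives as a flat dictionary keyed by the field names listed; dropping unlisted fields is a design assumption, not something the platform specifies.

```python
# PM-defined order, copied from the list above.
FIELD_ORDER = [
    "do_not_recommend",
    "current_products",
    "upgrade_indicators",
    "spend_trend_90d",
    "brex_monthly_volume",
    "last_pitch_outcome",
    "credit_utilization",
    "suggested_context_for_pitch",
]


def ordered_summary(context: dict) -> list:
    """Return (field, value) pairs in the PM-defined order.

    Fields missing from the context are skipped; fields not named in
    FIELD_ORDER are dropped rather than appended in arbitrary order.
    """
    return [(name, context[name]) for name in FIELD_ORDER if name in context]
```

Keeping the order in one list means a reordering experiment is a one-line change that can be tied to a prompt version.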

2 · Staleness framing in the prompt · PM

Raw staleness metadata is not directly useful to a language model. Three versions to test:

A (flag-only): '$240K [DATA AGED: 7 DAYS — VERIFY BEFORE CITING]'
B (weight reduction): '$240K (less certain — last updated 7 days ago)'
C (instruction injection): System prompt instructs: 'When a field is marked aging or stale, express lower confidence; do not cite specific figures from stale fields.'

The PM tests which framing produces appropriate hedging on stale signals without over-hedging on current signals. A/B testable within the eval framework.
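The three framings can be rendered from the same input so the only difference between test arms is the wording. The sketch below reuses the variant A and B wording from above; the hash-based assignment of events to arms is an illustrative assumption, not a described part of the eval framework.

```python
import hashlib


def frame_value(value: str, age_days: int, variant: str) -> str:
    """Render an aged figure using framing variant A, B, or C."""
    if variant == "A":
        # \u2014 is the dash used in the variant A wording above.
        return f"{value} [DATA AGED: {age_days} DAYS \u2014 VERIFY BEFORE CITING]"
    if variant == "B":
        return f"{value} (less certain \u2014 last updated {age_days} days ago)"
    if variant == "C":
        # Variant C leaves the value untouched; the hedging instruction
        # lives in the system prompt instead.
        return value
    raise ValueError(f"unknown variant: {variant}")


def assign_variant(event_id: str) -> str:
    """Deterministically assign an event to one of the three test arms."""
    digest = hashlib.sha256(event_id.encode("utf-8")).digest()
    return "ABC"[digest[0] % 3]
```

Deterministic assignment means a replayed event always lands in the same arm, which keeps before/after comparisons clean.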

3 · System prompt: the PM-owned instruction layer · PM

You are a field sales assistant for Capital One's Business Cards & Payments division.
You will receive a structured customer context object.

Your task: Recommend ONE specific Capital One product action for this customer.

Rules:
- NEVER recommend a product listed in do_not_recommend
- If a field is marked 'stale', express uncertainty about that signal
- Base your recommendation on upgrade_indicators, not on historical behavior alone
- Your recommendation must be actionable in a single sales call
- Do not recommend more than one product action; specificity > coverage
- Output format: [Product action] | [Primary signal used] | [Confidence: High/Medium/Low]

This prompt is versioned (prompt_version in the API response). When the PM updates it, the version increments — and the eval pipeline can measure whether the new version produces better scores than the prior version.
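A fixed output format is only useful if downstream code rejects responses that break it. The parser below is a minimal sketch, assuming the model returns a single line with three pipe-separated parts and that the square brackets in the format are optional. The function name and returned keys are illustrative.

```python
ALLOWED_CONFIDENCE = {"High", "Medium", "Low"}


def parse_recommendation(line: str) -> dict:
    """Parse '[action] | [signal] | [Confidence: level]' into a dict."""
    parts = [part.strip().strip("[]").strip() for part in line.split("|")]
    if len(parts) != 3:
        raise ValueError("expected exactly three '|'-separated fields")
    action, signal, confidence_part = parts

    label, _, level = confidence_part.partition(":")
    level = level.strip()
    if label.strip().lower() != "confidence" or level not in ALLOWED_CONFIDENCE:
        raise ValueError("third field must be 'Confidence: High/Medium/Low'")
    if not action or not signal:
        raise ValueError("action and signal must be non-empty")

    return {"product_action": action, "primary_signal": signal, "confidence": level}
```

A response that recommends two actions or omits the confidence level fails here rather than reaching a rep, which makes format violations countable per prompt version.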

4 · Negative example injection · PM

The PM periodically injects negative examples into the system prompt — explicit descriptions of failure modes drawn from the HITL review queue's highest-frequency failures:

BAD: 'Offer Spark Cash Plus' when customer already holds Spark Cash Plus
BAD: 'Suggest increasing credit limit to $150K' when current limit is already $150K
BAD: 'Discuss Brex features' without knowing the customer's specific Brex use case

Negative examples updated monthly based on the HITL queue's failure mode distribution. Zero-retraining-cost improvement the PM ships independently.

5 · Prompt version governance · PM

Gate criteria for shipping a new prompt version:
• Tested on last 200 recommendation events from eval dataset
• Quality dimension scores ≥ previous version on at least 3 of 4 dimensions
• No regression on D2 (Recommendation Accuracy — factual errors cannot increase)
• Logged in version registry with change rationale and before/after eval scores

The PM who reaches for retraining first, before exhausting prompt adjustments, burns engineering cycles unnecessarily. A prompt change ships in a week or less; a retraining cycle takes 4–8 weeks.
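The gate criteria can be expressed as a single check. In this sketch the four dimensions are taken to be the `quality_dimension_scores` keys from the event log, with `factual_accuracy` standing in for D2; that mapping, and treating "last 200 events" as a minimum sample size, are assumptions.

```python
DIMENSIONS = (
    "retrieval_relevance",
    "factual_accuracy",
    "business_appropriateness",
    "specificity_to_customer",
)


def prompt_gate(previous: dict, candidate: dict, events_tested: int,
                logged_in_registry: bool) -> bool:
    """Return True if a candidate prompt version may ship."""
    if events_tested < 200 or not logged_in_registry:
        return False
    # No regression allowed on accuracy, regardless of other gains.
    if candidate["factual_accuracy"] < previous["factual_accuracy"]:
        return False
    # At least 3 of 4 dimensions must match or beat the previous version.
    held = sum(candidate[d] >= previous[d] for d in DIMENSIONS)
    return held >= 3
```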

37 · HITL Loop: 6 Steps from Rep Signal to Model Improvement

Recommendation Generated

Downstream AI app calls Context API. Receives context object (prompt_version tagged). Generates recommendation. Recommendation event logged automatically.

Rep Signal Captured

Rep acts:
[used without change] → passive positive signal (low confidence)
[modified] → passive mixed signal (negative on removed element)
[ignored] → passive negative signal (low confidence)
[flagged] → explicit signal (high confidence, routes to HITL queue)

HITL Review Queue

Flagged events reviewed by eval team within 48 hours. Reviewer scores all four quality dimensions. Failure mode categorized: retrieval / accuracy / appropriateness / other. Labeled event appended to eval dataset.

Auto-Labeling (Passive Signals)

Used without change → D2 accuracy assumed positive (high confidence).
Modified → D3 appropriateness scored based on what was removed.
Ignored → weak negative signal; not used for retraining without HITL confirmation.
Conversion outcome (30-day lag) → strongest ground truth label; overrides passive signals.
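The signal-handling rules in the two steps above can be sketched as one labeling function. The action names and return shape are illustrative. The sketch assumes flagged events always route to the HITL queue, that a CRM outcome replaces any passive label, and that passive labels other than "ignored" are usable for retraining by default; the last point is not stated in the rules and is an assumption.

```python
# Passive polarity per rep action, as listed under "Rep Signal Captured".
PASSIVE = {
    "used_without_change": "positive",
    "modified": "mixed",
    "ignored": "negative",
}


def label_event(rep_action: str, outcome_type=None, hitl_confirmed=False) -> dict:
    """Decide how a logged recommendation event is labeled."""
    if rep_action == "flagged":
        # Explicit, high-confidence signal: a human reviewer scores it.
        return {"source": "explicit_flag", "route": "hitl_queue"}
    if rep_action not in PASSIVE:
        raise ValueError(f"unknown rep action: {rep_action}")
    if outcome_type is not None:
        # The lagged CRM outcome overrides any passive signal.
        return {"source": "crm_outcome", "label": outcome_type,
                "usable_for_retraining": True}
    usable = hitl_confirmed if rep_action == "ignored" else True
    return {"source": "passive", "label": PASSIVE[rep_action],
            "usable_for_retraining": usable}
```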

Eval Pipeline (Runs Weekly)

Aggregates all labeled events from prior week. Computes quality scores by: customer segment, data source, prompt version, downstream model. Outputs: quality score trends, failure mode distribution, staleness correlation. Flags: any score declining week-over-week for 3+ consecutive weeks.
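The flag rule (a score declining week-over-week for 3+ consecutive weeks) is simple to compute from a series of weekly scores. A minimal sketch, assuming scores are ordered oldest to newest and that a flat week breaks the run; function names are illustrative.

```python
def consecutive_declines(weekly_scores: list) -> int:
    """Length of the most recent run of week-over-week declines."""
    run = 0
    # Walk backwards through (previous week, current week) pairs.
    pairs = zip(reversed(weekly_scores[:-1]), reversed(weekly_scores[1:]))
    for previous, current in pairs:
        if current < previous:
            run += 1
        else:
            break
    return run


def should_flag(weekly_scores: list, threshold: int = 3) -> bool:
    """True when the latest decline run meets the flag threshold."""
    return consecutive_declines(weekly_scores) >= threshold
```

The same count feeds the "quality decline sustained" criterion for retraining.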

Three Improvement Levers — PM Selects Based on Failure Mode Diagnosis

L1: Retrieval Tuning (PM + Data Engineering) — for retrieval precision failures. 2–3 week engineering cycle.
L2: Prompt Adjustment (PM owns fully) — for appropriateness and instruction clarity failures. Same-day to 1 week.
L3: Model Retraining (PM sets gate criteria; Data Science executes) — only after L2 exhausted; 4–8 weeks.

Lever 3 (Model Retraining) — PM-Owned Gate Criteria

Gate Criterion	Threshold
Minimum labeled dataset size	≥1,000 labeled events with outcome data
Quality decline sustained	≥3 consecutive weekly declines in overall quality score
Prompt adjustment exhausted	≥2 prompt versions tested without improvement
D2 (Accuracy) floor during retraining eval	Must not drop below 90%
Champion-challenger evaluation	New model must beat current on all 4 dimensions on holdout eval set before promotion
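
The gate criteria in the table can be expressed as two checks: whether retraining may start, and whether a retrained model may be promoted. The sketch below is illustrative; the class, field, and dimension key names are hypothetical, and D1–D4 scores are assumed to be comparable numbers where higher is better.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RetrainingStatus:
    labeled_events_with_outcome: int
    consecutive_weekly_declines: int
    prompt_versions_tested_without_improvement: int


def unmet_trigger_criteria(status: RetrainingStatus) -> List[str]:
    """Return the names of trigger criteria that are not yet satisfied."""
    unmet = []
    if status.labeled_events_with_outcome < 1000:
        unmet.append("dataset_size")
    if status.consecutive_weekly_declines < 3:
        unmet.append("sustained_decline")
    if status.prompt_versions_tested_without_improvement < 2:
        unmet.append("prompt_adjustment_exhausted")
    return unmet


def can_promote(current: Dict[str, float], challenger: Dict[str, float],
                d2_floor: float = 0.90) -> bool:
    """Champion-challenger check on the holdout eval set."""
    dimensions = ("D1", "D2", "D3", "D4")
    beats_all = all(challenger[d] > current[d] for d in dimensions)
    return beats_all and challenger["D2"] >= d2_floor
```

Returning the list of unmet criteria, rather than a bare boolean, gives the PM a record of why retraining was or was not approved.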

38Drift Detection — 3 Types

Drift is not “the model is wrong.” Drift is “the model was right and is becoming less right, in a direction no one noticed because the change is gradual.”

Type 1Data Drift

What it is

Post-Brex, the canonical entity store receives a new category of customer — high-growth tech startups with $500K+/month Brex spend, no Spark card history. Entity resolution, freshness thresholds, and context summarization were calibrated on SMB customers with 2–5 year card histories. The new customer distribution breaks these calibrations.

Detection

Weekly monitoring of canonical entity field distributions. Kolmogorov-Smirnov test on key numeric fields (monthly spend, credit utilization, headcount). Alert threshold: distribution shift >2 standard deviations from 90-day rolling baseline.
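
One way to read the detection rule, sketched below under stated assumptions: each week, compute the two-sample Kolmogorov-Smirnov statistic between the baseline window and the current week for one numeric field, then alert when that statistic sits more than 2 standard deviations above its own trailing history. That interpretation of the ">2 standard deviations" threshold is an assumption, as are the function names. The KS statistic is implemented directly here so the example has no dependencies.

```python
import statistics
from bisect import bisect_right
from typing import Sequence


def ks_statistic(baseline: Sequence[float], current: Sequence[float]) -> float:
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(baseline), sorted(current)
    largest_gap = 0.0
    # The gap between two step functions can only change at observed values.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


def drift_alert(ks_history: Sequence[float], current_ks: float,
                n_sigma: float = 2.0) -> bool:
    """Alert when this week's KS statistic exceeds mean + n_sigma * stdev of history."""
    mean = statistics.fmean(ks_history)
    spread = statistics.stdev(ks_history)  # needs at least two historical weeks
    return current_ks > mean + n_sigma * spread
```

In the Brex scenario, a week dominated by $500K+/month accounts would push the monthly-spend KS statistic far above its history and trip the alert.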

Response

Lever 1 (Retrieval Tuning) — recalibrate entity resolution for new customer type; adjust freshness thresholds. Lever 2 — update prompt to handle new segment.

Type 2Concept Drift

What it is

In Q2 2026, a Brex Premium recommendation for a $200K/month customer scores 5/5 on D3 Business Appropriateness. In Q4 2026, Capital One launches a new SMB Banking product clearly superior for this profile. The Brex Premium recommendation is now a 3/5, but the model still recommends it because it was rewarded in Q2's training data.

Detection

Quarterly eval rubric review. PM compares current Business Appropriateness rubric against Capital One's current product strategy. D4 (Business Outcome) monitoring: if conversion rate declines for a recommendation type that previously converted well, concept drift is a candidate explanation.

Response

Eval rubric update (PM-owned) → re-score historical eval data → if model needs to learn the new concept, Lever 3 (retraining with updated rubric as ground truth).

Type 3Output Drift (Recommendation Convergence)

What it is

In Month 3, the model produces varied, customer-specific outputs. In Month 12, after 9 months of HITL feedback training, 60% of all recommendations suggest one of three templates. The model learned these three are the 'safest bets' — flagged less frequently — and regresses to them regardless of customer context. This is the HITL paradox: the feedback mechanism that improves quality can also narrow output diversity over time.

Detection

Output diversity index — tracked weekly. Measures vocabulary diversity of recommendation text and distribution of product types recommended. Alert: if top-3 recommendation types account for >60% of all recommendations for 4 consecutive weeks.
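
The product-distribution half of this alert can be sketched as follows. The vocabulary-diversity half is not covered, and the function names and input shape (one product-type string per recommendation) are assumptions.

```python
from collections import Counter
from typing import Iterable, Sequence


def top_k_share(product_types: Iterable[str], k: int = 3) -> float:
    """Share of all recommendations taken by the k most common product types."""
    counts = Counter(product_types)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


def convergence_alert(weekly_shares: Sequence[float],
                      threshold: float = 0.60, weeks: int = 4) -> bool:
    """True when the top-k share exceeded the threshold in each of the last `weeks` weeks."""
    recent = weekly_shares[-weeks:]
    return len(recent) == weeks and all(share > threshold for share in recent)
```

Requiring a full run of consecutive weeks keeps a single unusual week, such as a product launch campaign, from triggering the alert.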

Response

(1) Diversity injection in system prompt: 'Consider the full range of Capital One products; do not default to the most common recommendation.' (2) Mark 'generic recommendation despite specific signals' as D3 failure in HITL rubric. (3) Lever 3 (retraining with diversity penalty) if Lever 2 doesn't recover diversity within 4 weeks.

39Success Metrics — 6-Tier Framework

Month 1 is dedicated baseline measurement. No MVP-A components touch a field rep until baselines for Trust Rate, Retrieval Precision, Accuracy Rate, Adoption Rate, and Conversion Rate are established from current-state behavior. Every assumption-labeled baseline is replaced by a confirmed measurement by end of Month 1.

Tier 1 — North Star

Metric	Baseline	Target	Method
Recommendation Trust Rate — % of recommendations reps act on without modification	~20% (A-19)	≥60% by Month 12	Event log: (used without modification) ÷ (total acted on)

Trust Rate integrates all three quality dimensions simultaneously — freshness (reps modify when data is stale), accuracy (reps modify when facts are wrong), and appropriateness (reps modify when pitch direction is off). A rising Trust Rate is the compressed signal that the platform is working.
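
A minimal sketch of the Trust Rate calculation from the event log. The source formula divides by "total acted on" without defining it; the default below assumes that means used-without-change plus modified, which is an assumption, so the denominator is left configurable. Action strings are illustrative.

```python
from collections import Counter
from typing import Iterable, Sequence


def trust_rate(actions: Iterable[str],
               acted_on: Sequence[str] = ("used_without_change", "modified")) -> float:
    """(used without modification) / (total acted on) over one reporting period."""
    counts = Counter(actions)
    denominator = sum(counts[action] for action in acted_on)
    if denominator == 0:
        return 0.0
    return counts["used_without_change"] / denominator
```

The choice of denominator matters for the baseline: with 2 used, 6 modified, and 5 ignored events, the rate is 25% under the default but about 15% if ignored recommendations also count as acted on.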

Tier 2 — Platform Health

Metric	Target	When
Context API Availability	≥99.5%	Continuously from MVP-A
Context API P50 Latency	≤200ms	Continuously from MVP-A
Freshness SLA Compliance — % of responses with all fields within staleness thresholds	≥90%	Weekly from MVP-A
Brex Ingestion Success Rate	≥98% of weekly batch jobs complete without data loss	Weekly from Month 3
Entity Resolution Match Rate (Spark-Brex pairs)	≥70% deterministic match by Month 6	Weekly during MVP-A build

Tier 3 — Feedback Loop Health + Eval Quality (D1–D4)

Metric	Target	When
Feedback Capture Rate	≥95% of events with ≥1 signal by Month 2	Weekly from Phase 1
Eval Dataset Growth Rate	≥200 labeled events/week by Month 9	Weekly from MVP-B
HITL Review Clearance Rate	≥90% reviewed within 48 hours	Weekly from MVP-B
Inter-rater Reliability (D3)	Cohen's Kappa ≥0.75	Monthly, ≥50 dual-reviewed events
Retrieval Precision@3 (D1)	≥85% by Month 6	Weekly HITL scoring on 25-event sample
Recommendation Accuracy Rate (D2)	≥95% by Month 3 post-MVP-A	Automated vs. canonical entity
Business Appropriateness Score mean (D3)	≥4.0 / 5 by Month 12	Monthly 50-recommendation HITL sample
Outcome Conversion Correlation (D4)	AI-assisted pitches ≥15% higher by Month 12	Monthly from Month 9 (30-day lag)
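
For the inter-rater reliability row, Cohen's Kappa on dual-reviewed D3 scores can be computed as below. This is the unweighted form, which treats a 4-vs-5 disagreement the same as a 1-vs-5 disagreement; whether a weighted variant suits the 1–5 scale better is left open here.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Unweighted Cohen's Kappa for two raters scoring the same events."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of events")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's own score distribution.
    expected = sum(counts_a[score] * counts_b[score] for score in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters gave one identical score to every event
    return (observed - expected) / (1 - expected)
```

Two reviewers who agree on 3 of 4 events can still land at a Kappa of 0.5, well under the 0.75 target, because Kappa discounts the agreement expected by chance.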

Tier 4 — Business Impact

Metric	Target	When
Rep Adoption Rate	≥70% of eligible S1+S2 reps using recommendations ≥1×/week by Month 9	Monthly
AI-Assisted Pitch Conversion Rate	≥15% higher than non-AI baseline by Month 12	Monthly (30-day lag)
Rep Time Savings per Pitch Prep	≥15 min reduction by Month 9	Bi-monthly survey
New App Integration Time	≤5 business days from API access to first production event	Per integration

Tier 5 — Improvement Velocity

Metric	Target
Quality Score Trend (mean across 4 eval dimensions)	Improving ≥2 points per quarter from Month 9 baseline
Time-to-Improvement (Lever 2: Prompt)	≤10 business days from issue detected to deployed prompt update validated against eval holdout
Time-to-Improvement (Lever 3: Retraining)	≤8 weeks from retraining trigger to new model version promoted to production
Prompt Version Win Rate	≥75% of new prompt versions improve on ≥3 of 4 quality dimensions

Tier 6 — Guardrails (Non-Negotiable Floors)

Guardrail	Threshold	Response
PII in recommendation output	Zero	Pause all downstream model deployments; audit PII preprocessing layer
HITL-flagged recommendation rate	>15% of weekly recommendations flagged as wrong	Emergency eval review; PM + Data Science + MRM convene within 48 hours
SR 26-2 audit trail completeness	<100% with complete retrieval_ids	Block new recommendation events until gap resolved
Output Diversity Index	<50% of Month 3 baseline for 3 consecutive weeks	Lever 2 (diversity injection); escalate to Lever 3 if not recovered in 4 weeks
Entity Resolution Confidence	Average confidence <0.75	Pause Brex entity resolution expansion; review matching algorithm
Feedback Capture Rate	<80% for 2 consecutive weeks	Engineering review of logging pipeline; confirm no silent failures
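
The guardrail floors above lend themselves to a single weekly check. In this sketch the metric key names are hypothetical, and the consecutive-week counters are assumed to be maintained by the eval pipeline.

```python
from typing import Dict, List


def breached_guardrails(metrics: Dict[str, float]) -> List[str]:
    """Return the Tier 6 guardrails breached by this week's metrics."""
    checks = {
        "pii_in_output": metrics["pii_incidents"] > 0,
        "hitl_flag_rate": metrics["weekly_flag_rate"] > 0.15,
        "audit_trail_completeness": metrics["audit_trail_completeness"] < 1.0,
        "output_diversity": metrics["weeks_diversity_below_half_baseline"] >= 3,
        "entity_resolution_confidence": metrics["avg_entity_confidence"] < 0.75,
        "feedback_capture": metrics["weeks_capture_below_80pct"] >= 2,
    }
    return [name for name, breached in checks.items() if breached]
```

Each breached name maps to the response in the table; the function only reports the breach and leaves the response to the owning team.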

40Phase Gates — Explicit Progression Criteria

Phase	Gate Metric	Pass Threshold
Phase 1 → Phase 2	Feedback Capture Rate	≥95% of events captured
	PII guardrail	0 PII in 100-event audit
	Freshness indicator	Live in ≥1 rep-facing tool
Phase 2 → Phase 3 (MVP-A complete)	Context API handling Spark traffic	100% of Spark app requests via CardEx Core
	Brex Entity Resolution Coverage	≥70% of Brex accounts resolved
	Trust Rate (Spark customers)	Trending above 30% vs. ~20% baseline
	Context API P50 Latency	≤200ms confirmed in production load test
Phase 3 → Phase 4 (MVP-B complete)	Eval Pipeline Coverage	≥80% of recommendation events scored weekly
	HITL Clearance Rate	≥90% reviewed within 48 hours
	Eval Dataset Size	≥500 labeled events accumulated
	Drift Detection	All three drift type monitors active with baselines established

41RAID Log

Risks

ID	Risk	P	I	Mitigation
R-01	Brex entity resolution match rate <50% (A-12 fails)	M	H	Sample matching exercise Week 2; if <50%, extend human-review queue capacity and adjust MVP-A gate criteria
R-02	Downstream FSO AI application teams resist Context API migration	M	H	Parallel-run period; requires executive sponsor mandate from BC&P leadership — PM cannot force migration without org authority
R-03	SR 26-2 RFI on GenAI drops before MVP-A ships, requiring architectural changes	L-M	H	MRM team in design review from Week 2; CardEx Core's source attribution already satisfies likely RFI requirements
R-04	HITL review queue becomes backlogged	M	M	Automated pre-scoring to triage severity; high-severity flags require 48-hour SLA; low-severity batch reviewed weekly
R-05	Brex batch feed cadence is bi-weekly or monthly (longer than A-13 assumes)	M	H	Negotiate freshness SLA with Brex engineering Week 1; written commitment, not verbal estimate
R-06	Prompt version update degrades quality for a segment not in eval holdout	L	M	Staged rollout: new prompt version served to 10% of traffic before full cutover; monitor 48 hours
R-07	Output diversity drift occurs faster than A-18 assumes (within 3 months of MVP-B)	L	M	Monthly diversity index reporting from MVP-B launch; early detection protocol if >10% drop within first 90 days

Issues (Known at Time of Writing)

ID	Issue	Owner
I-01	Brex real-time API availability is a dependency not under Capital One's full control	PM + Brex Engineering Lead
I-02	HITL review team staffing not yet defined — eval pipeline requires human reviewers	PM + BC&P People Lead
I-03	Prompt version governance requires alignment with Data Science on champion-challenger evaluation criteria before MVP-B	PM + Data Science Lead

Dependencies

ID	Dependency	Owned by	Required by
D-01	BC&P executive sponsor mandate for downstream app API migration	BC&P Head of Product	Phase 2 start (Month 2)
D-02	Brex engineering commitment to weekly batch feed cadence and real-time API roadmap	Brex Engineering	Phase 1 end (Month 2)
D-03	MRM sign-off on CardEx Core SR 26-2 compliance design	Capital One MRM	Phase 2 launch (Month 6)
D-04	HITL reviewer team staffing (2 FTE minimum for Phase 3 launch)	BC&P Ops / People	Phase 3 start (Month 6)
D-05	CRM outcome linkage capability (pitch outcome → recommendation event log)	CRM Engineering	Phase 3 (for D4 scoring)

42Assumption Register · Phase 5 (A-17–A-20)

ID	Assumption	Basis	Urgency
A-17	AI-assisted pitch conversion rate will improve ≥15% over non-AI-assisted baseline by Month 12	Capital One's Chat Concierge demonstrated 55% lead conversion improvement; field sales context is rep-mediated so a more conservative target is appropriate	High
A-18	Output diversity drift will become detectable within 6–9 months of HITL feedback training beginning	Documented pattern in enterprise models trained with HITL feedback mechanisms — models converge on low-variance outputs over time	Medium
A-19	Recommendation Trust Rate baseline is approximately 20%	Inferred from trust collapse described in Phase 2; Agent Assist improvement from 84% to 93% suggests current field sales context is meaningfully below a reachable good state	High
A-20	Recommendation Accuracy Rate baseline is unknown but likely low; wrong-product errors are probable given absence of a do_not_recommend constraint in current tools	Rep complaint patterns documented in Phase 2; absence of structured product exclusion field in current retrieval layers	High

Phase 6

Learning

Full assumption register (A-01–A-20) · Top 5 validation priorities · Over/underestimate analysis · First 8 actions · Vision · Note on this project

43Consolidated Assumption Register — All 20 Assumptions

Critical — Solution Direction Changes If Wrong

ID	Ph	Assumption	Basis	Validation
A-01	0	FSO lacks a unified customer view across Spark, Brex, and Discover data sources	Brex operating independently post-acquisition; 12–24 month typical integration timeline at this scale	Architecture review with FSO engineering leads, Week 1
A-02	0	No shared context platform exists for field sales AI; each application has its own retrieval layer	JD language 'design and build a horizontal foundation for shared, trusted context' implies the platform does not exist	Same architecture review, Week 1
A-08	2	The inferred current-state architecture (multiple independent retrieval layers, no entity resolution, no shared logging) reflects Capital One's actual production environment	Constructed from public information about Capital One's AI deployments and acquisition context	Architecture review with FSO engineering and Data Science leads, Week 1
A-09	2	The root cause is structural — a missing platform abstraction layer — not organizational (siloed teams and poor communication)	Inferred from how Capital One built AI vertically. The organizational explanation is not ruled out — it may be both	Stakeholder interviews with FSO AI team leads from at least two application teams, Week 2. Key probe: 'If you wanted to share customer context with another team today, what would it take?'
A-11	2	No canonical customer entity currently resolves identity across Capital One Spark card, Brex company, and Capital One credit schemas	Brex operating independently; entity resolution at this scale (35,000+ Brex companies) unlikely completed in 3 months since April 7, 2026	Data architecture review with Brex integration team, Week 1
A-13	3	Brex data will be available via weekly batch feed for the first 12 months; real-time API access estimated 12–18 months out	Brex operating independently; real-time integration requires purpose-built integration layer	Brex engineering meeting, Week 1; secure written SLA commitment for initial batch cadence

High Urgency — Scope or Timeline Changes If Wrong

ID	Ph	Assumption	Validation
A-03	0	Data siloing is causing measurable adoption friction — reps are aware of inconsistent recommendations and avoiding AI tools	Contextual inquiry with 6–8 FSRs across two regional offices, Month 1
A-04	0	Capital One's MRM is already applying SR 26-2 principles to GenAI systems in field sales by analogy	MRM team introductory meeting, Week 2; ask which governance framework applies to FSO AI today
A-06	0	RAG is the correct retrieval architecture for CardEx Core MVP — fine-tuning not viable due to weekly data changes and SR 26-2 source attribution requirements	Data refresh cadence audit for each source system, Week 2
A-07	0	Rep behavior signals (used/modified/ignored/flagged) are the primary available HITL feedback signal at MVP-B launch	CRM instrumentation review, Week 2
A-12	3	Deterministic entity resolution match rate of ~70–85% achievable using EIN as primary key	Sample matching exercise on 500 Brex accounts, Week 2
A-15	3	Downstream FSO AI applications can adopt the Context API without requiring a full application rebuild	Architecture review of each existing FSO AI application, Week 1
A-17	5	AI-assisted pitch conversion rate will improve ≥15% over non-AI baseline by Month 12	Baseline conversion rate audit in Month 1 before MVP-A ships
A-19	5	Recommendation Trust Rate baseline is approximately 20%	Pre-launch rep behavior audit in Month 1
A-20	5	Recommendation Accuracy Rate baseline is unknown but likely low; wrong-product errors probable	Retrospective accuracy audit on 100 recent recommendations, Month 1

Medium Urgency — Refinable In-Flight

ID	Ph	Assumption	Validation
A-05	0	Current FSO AI tools pass CRM data to LLM prompts without standardized PII preprocessing	Architecture review of existing app prompt construction, Week 1
A-10	2	Customer churn in BC&P's SMB segment is partially attributable to irrelevant pitch experiences caused by stale or fragmented context	Churn analysis segmented by pitch relevance score and AI tool usage rate, post-MVP-B
A-14	3	Context API P50 latency of ≤200ms achievable without caching using Capital One's cloud-native infrastructure	Load testing in development environment before production launch
A-16	5	Retrieval Precision@3 baseline for current FSO AI tools is approximately 60%	Retrospective eval team scoring on 100 pre-CardEx Core recommendations, Month 1
A-18	5	Output diversity drift will become detectable within 6–9 months of HITL feedback training beginning	First diversity index report at Month 3 post-MVP-B to establish baseline before drift begins

44Top 5 Assumptions to Validate First — In This Order

Priority 1A-09 — Root Cause — Structural vs. Organizational

Why this one, this order: This is the only assumption whose failure invalidates the platform build entirely. If the root cause is organizational silos and insufficient governance rather than a missing abstraction layer, Concept F (governance-first, no platform) is the right solution. A wrong answer here means building expensive infrastructure to solve a people problem. The question cannot be answered by architecture reviews alone; it requires conversations with the people who built the existing applications.

How to validate: Interview leads from at least two FSO AI application teams — separately, not in a group. Key question: 'If you wanted your application to use the same customer data as the lead scoring tool, what would it take technically, and what would it take organizationally?' If the answer is 'we'd just agree to share the data schema' → organizational. If 'we'd need to build a shared API layer' → structural.

Priority 2A-11 — No Canonical Entity Exists

Why this one, this order: Entity resolution is the structural prerequisite for the entire platform. If a partial canonical entity already exists (Capital One has begun a Brex data integration that resolves some customer identities), CardEx Core builds on it rather than from scratch. The answer changes engineering scope significantly.

How to validate: Request entity model documentation from both Capital One's data platform team and the Brex integration engineering team in Week 1. Specifically ask: 'Do we have a customer record that links a Brex company ID to a Capital One Spark card account ID?'

Priority 3A-13 — Brex Batch Cadence

Why this one, this order: The freshness SLA CardEx Core can promise for Brex data is entirely determined by the batch cadence. Weekly → Brex data labeled 'aging' after 8 days, acceptable for MVP. Monthly → Brex data is structurally stale for most of the month, significantly weakening CardEx Core's value for S2 (Cross-Sell Brex) reps — the segment with the most acute pain.

How to validate: Week 1 meeting with Brex engineering and Capital One cloud team. Secure a written SLA commitment — not a verbal estimate — for the initial batch job cadence before designing the freshness normalization layer.

Priority 4A-15 — Downstream App Rebuild Requirement

Why this one, this order: The MVP-A parallel-run migration strategy depends on existing FSO AI applications being able to call the CardEx Core API alongside their current retrieval without a full rebuild. If any application has retrieval embedded directly in model code (a tightly-coupled monolith), migrating it requires a partial rebuild — adding one engineering cycle per affected application to the MVP-A timeline.

How to validate: Week 1 architecture review. Ask each application team lead to show the code path from 'rep triggers recommendation' to 'data is retrieved.' Decouplability is visible in the architecture; it does not require running the code.

Priority 5A-12 — Entity Resolution Match Rate

Why this one, this order: The 70–85% deterministic match rate estimate determines the human-review queue volume, MVP-A launch quality, and staffing requirement for entity resolution review. A 50% match rate leaves 50% of accounts unresolved, double the 25% implied by a 75% match rate, so it roughly doubles the human-review queue and changes the staffing plan before the first line of entity resolution code is written.

How to validate: Sample matching exercise in Week 2. Run EIN-based deterministic matching on 500 randomly selected Brex company accounts against Capital One Spark and credit records. Measure actual match rate. This exercise can be done with raw data access — no platform needed.
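
The sample exercise could be as small as the sketch below. The record shape (a dict with an `ein` key) and the function names are assumptions, not a known Brex or Capital One schema. An EIN is nine digits, usually written as XX-XXXXXXX, so normalization strips formatting before comparing.

```python
import re
from typing import Iterable, Optional, Sequence


def normalize_ein(raw: Optional[str]) -> Optional[str]:
    """Strip punctuation and return the 9-digit EIN, or None if it is malformed."""
    digits = re.sub(r"\D", "", raw or "")
    return digits if len(digits) == 9 else None


def deterministic_match_rate(brex_accounts: Sequence[dict],
                             capital_one_eins: Iterable[Optional[str]]) -> float:
    """Fraction of sampled Brex accounts whose EIN appears in Capital One records."""
    if not brex_accounts:
        raise ValueError("sample is empty")
    known = {ein for ein in map(normalize_ein, capital_one_eins) if ein}
    matched = sum(
        1 for account in brex_accounts
        if normalize_ein(account.get("ein")) in known
    )
    return matched / len(brex_accounts)
```

Accounts with a missing or malformed EIN count as unmatched, which is the conservative reading: those are the records that would land in the human-review queue.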

45Over/Underestimate Analysis — Structural Biases in This Proposal

This section identifies the structural biases in the proposal — not to undermine it, but because a hiring manager reading critically will find them, and stating them first is more credible than having them surface in an interview.

What This Proposal Overestimates

1. Entity resolution feasibility at MVP quality. The proposal assumes 70–85% deterministic match rate. In practice, B2B entity resolution for financial services is complicated by: business names that don't match legal entity names (DBA vs. registered name); sole proprietors using personal SSNs for both business and personal accounts (the Spark card may be under the owner's SSN; Brex's account is under a company EIN); and businesses that changed legal structure between their Spark card opening and their Brex onboarding. At 50% match, the canonical entity store at MVP-A launch has 50% of Brex accounts unresolved — which materially limits CardEx Core's value for S2 (Cross-Sell Brex) reps, the most acute pain point.

2. Rep trust recovery timeline. The Adoption Rate target of ≥70% of eligible reps by Month 9 assumes improving recommendation quality is sufficient to recover trust from reps who have already been burned. In enterprise AI rollouts, trust recovery from negative experiences requires more than quality improvement — it requires visible organizational signaling, peer success stories that reach the Burned Skeptic cohort, and in some cases direct manager intervention. The Month 9 adoption target may need to be disaggregated: early adopters (40%) at Month 6, broader adoption (70%) at Month 15.

3. Feedback loop signal quality in the first 9 months. Passive signals (used/modified/ignored) are noisy: a rep who modifies a recommendation may be improving it or may be wrong about their modification. Without a high enough proportion of explicit flags and lagged outcome labels in the dataset, the quality trend will be statistically noisy for 6–9 months. The ≥2 points/quarter improvement target may need to be deferred to Month 15 with a smaller signal of directional improvement (≥1 point) being the Month 9 gate.

What This Proposal Underestimates

1. Organizational change management as the primary risk. The RAID log lists downstream app team resistance as R-02 (Medium probability, High impact). This is likely underrated. Every FSO AI application team has its own roadmap, its own architecture philosophy, and — critically — its own answer to “why would I trust a shared platform I didn't build?” Migrating to a shared API affects their deployment independence, their debugging process, and their on-call responsibility. Without a strong, sustained executive mandate and dedicated migration engineering support, the parallel-run period could extend from 3 months to 12+ months. The mitigation that is probably missing: a formal adoption milestone tied to FSO engineering team performance reviews, championed by the BC&P Head of Product. Without it, migration is optional and will be deprioritized whenever teams face shipping pressure.

2. HITL review team staffing. D-04 assumes 2 FTE minimum for Phase 3 launch. The target is ≥200 labeled events per week with 10% dual-review for inter-rater reliability. A 15% flag rate on 1,000 weekly recommendations alone produces 150 flagged events requiring 48-hour turnaround. Two reviewers working at high quality can process approximately 80–100 flagged events per week between them before quality degrades, which leaves 50–70 events unreviewed each week. Phase 3 launch staffing should be 3–4 FTE with a clear plan to reduce as automated scoring matures. Understaffing the eval team at launch is the fastest way to degrade eval dataset quality and lose the feedback loop before it has demonstrated value.
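
The staffing arithmetic can be checked with a short calculation. It assumes the 80–100 events per week is the combined throughput of two reviewers, so 40–50 per reviewer; that reading, and the function and parameter names, are assumptions.

```python
import math


def reviewers_needed(weekly_recommendations: int = 1000,
                     flag_rate_pct: int = 15,
                     dual_review_pct: int = 0,
                     per_reviewer_capacity: int = 50) -> int:
    """Reviewers required to clear one week's flagged events."""
    flagged = weekly_recommendations * flag_rate_pct / 100
    # Dual-reviewed events are scored twice, so they add to the review count.
    reviews = flagged * (100 + dual_review_pct) / 100
    return math.ceil(reviews / per_reviewer_capacity)
```

With 150 flagged events, the result is 3 reviewers at 50 events each and 4 at 40 events each, matching the 3–4 FTE recommendation. Adding a 10% dual-review sample raises the load to 165 reviews, which needs 4 reviewers at 50 each and 5 at 40 each.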

3. Brex integration timeline. The metrics plan measures Brex ingestion from Month 3, which, combined with A-13's weekly batch feed, assumes Brex batch data integration is live by Month 3. Three months is aggressive. The more realistic timeline based on standard acquisition integration patterns at this scale: Month 5–6 for initial batch, Month 12–18 for real-time API. The roadmap should show a Spark-only Context API at MVP-A launch (Month 6), with Brex data arriving in MVP-A Phase 2 (Month 9). Leadership expectations must be set accordingly.

The organizational assumption this proposal cannot validate from the outside: Whether Capital One's BC&P organization has a PM who can credibly own both the context platform (a data infrastructure product) and the feedback loop system (an ML operations product) simultaneously. This is a broad scope for a Manager-level PM. In practice, the context platform may require a data platform PM skill set and the feedback loop may require an ML platform PM skill set. The risk is that this PM role is designed for a unicorn — and what actually ships is whichever half the PM is stronger in, while the other half is deprioritized under shipping pressure.

46First 8 Actions — If Hired Into This Role Tomorrow

Day 1–2Schedule five architecture reviews — do not cancel them for anything.

Reviews needed: FSO engineering leads (existing AI applications), Brex integration engineering team, Capital One data platform team, MRM team, and the Field Sales AI Product Team (S5).

Rationale: A-01, A-02, A-08, and A-11 are Critical assumptions, and A-15 is High Urgency. Every design decision made before these reviews is built on inference, not fact. The first instinct of any PM inheriting a problem is to start designing. The correct instinct is to first confirm the problem is what you think it is. These five reviews take one week and replace five of the most dangerous assumptions in the register with confirmed facts.

Day 3–5Conduct stakeholder interviews with FSO AI application team leads — separately, not in a group.

Target: leads from at least two different FSO AI application teams, interviewed independently.

Rationale: Validates A-09 (structural vs. organizational root cause) and A-15 (rebuild requirement). The interviews must be separate because team leads in a group setting will align to the most politically safe answer. Separate conversations surface whether fragmentation is a technical architecture problem or a coordination problem.

Key question to every interviewee: 'If you needed to use the same customer data as another team's AI application, what would it take — technically and organizationally?'

Week 2Run the entity resolution sample exercise — 500 Brex accounts.

Request access to 500 randomly selected Brex company records and match them against Capital One Spark and credit records using EIN as the primary key.

Rationale: Validates A-12 (match rate). This is a data exercise, not a design exercise. It requires no architecture decisions and no new infrastructure. The result either confirms MVP-A's entity resolution plan (≥70% match) or changes the scope and timeline before engineering begins.

Week 2Meet with Brex integration engineering — get a written batch cadence SLA.

Not a verbal estimate. A written commitment to initial batch frequency, data schema documentation, and the roadmap for real-time API availability.

Rationale: Validates A-13 (Brex batch cadence). The Freshness SLA Compliance target, staleness thresholds, S2 segment value story, and Phase 4 roadmap all depend on this number. A verbal 'probably weekly' will shift under engineering pressure.

Month 1Establish all pre-launch baselines — before a single rep sees CardEx Core.

Specific tasks:
• Retrospective eval scoring on 100 recent FSO AI recommendations (confirms A-16, A-20: Retrieval Precision and Accuracy Rate baselines)
• Rep behavior audit in existing sales tools (confirms A-19: Trust Rate baseline)
• CRM conversion rate pull segmented by AI-assisted vs. non-AI-assisted pitches (confirms A-17: conversion rate baseline)
• Rep time-in-tool measurement for pitch preparation

Rationale: Without baselines, every metric at Month 12 is a claim without a denominator. With baselines, every metric at Month 12 is evidence.

Month 1Author the eval rubric — circulate for MRM and Data Science sign-off before any model work begins.

Deliver: the four-dimension eval framework (Retrieval Precision, Recommendation Accuracy, Business Appropriateness, Business Outcome Correlation) with scoring definitions, measurement methods, and HITL reviewer training protocol.

Rationale: The champion-challenger evaluation criteria, model retraining gate criteria, and prompt version governance all depend on an agreed rubric. A rubric established after the model is evaluated is a rationalization. Data Science cannot build the eval pipeline without knowing what it is scoring. This document is the PM's first real deliverable — and it is entirely PM-owned.

Month 1, Week 3Activate recommendation event logging on all existing FSO AI applications.

Deploy structured logging schema to existing applications — passive signals only (used/modified/ignored). No rep-facing change required.

Rationale: Every day without logging is a day of eval data lost. If logging starts in Month 1, roughly 8 months of passive signal data will exist by the time MVP-B launches (Month 9). If logging starts at MVP-B, the eval pipeline launches with no history at all. This action has no rep-facing risk and no architectural dependency, so it should be the first thing that ships.
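
A sketch of what the structured logging schema might contain at this stage. Only `prompt_version`, `retrieval_ids`, and the used/modified/ignored signals come from this case study; every other field name, and the class itself, is hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import List

PASSIVE_SIGNALS = {"pending", "used", "modified", "ignored"}


@dataclass
class RecommendationEvent:
    event_id: str
    app_id: str
    customer_id: str
    prompt_version: str
    retrieval_ids: List[str]
    rep_signal: str = "pending"  # passive signals only at this stage
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.rep_signal not in PASSIVE_SIGNALS:
            raise ValueError(f"unsupported rep signal: {self.rep_signal!r}")

    def to_json(self) -> str:
        """Serialize to one JSON line for an append-only event log."""
        return json.dumps(asdict(self), sort_keys=True)
```

Capturing `retrieval_ids` from the first logged event means the audit trail is already complete when the eval pipeline arrives, rather than being backfilled.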

Month 2Pilot the rep-facing freshness indicator with a cohort of 10–15 early-adopter reps before full rollout.

Identify 10–15 reps from S1 or S2 who are known to be tool-positive (not the Burned Skeptic cohort). Ship the freshness indicator to them as a pilot. Run structured debrief after 2 weeks.

Rationale: The freshness indicator is the first rep-visible change and the first trust restoration signal. Piloting before full rollout answers two questions that cannot be answered in design: (1) Do reps understand what 'aging: 7 days' means, or is it confusing? (2) Does seeing the freshness signal change how reps use the recommendation? The pilot either confirms the design or surfaces a UX problem before it reaches all reps.

47Vision

Every field sales AI tool at Capital One runs on the same customer context.
Not because it is required. Because the recommendations are better.

Why this sentence: It names the platform's specific promise (same customer context), not the technology. It is specific to field sales, not generic enterprise AI. “Not because it is required. Because the recommendations are better.” is the adoption thesis: reps and application teams should choose CardEx Core because it works, not because a mandate forced them. A platform adopted under mandate is abandoned when leadership attention moves on. A platform adopted because it produces better recommendations is defended by the people who use it. The sentence stands without the document behind it.

48A Note on This Project

This is a portfolio case study constructed from publicly available information. It was built to demonstrate how I approach a GenAI PM role requiring platform thinking, ML measurement fluency, and domain depth in financial services AI — not to claim insider knowledge of Capital One's internal architecture.

What is executed: The research, analytical frameworks, domain architecture reasoning, eval framework design, assumption register, and strategic recommendations in this document.

What is directional and not validated: All assumptions labeled A-01 through A-20. The current-state architecture is inferred, not confirmed. The baseline metrics are estimates grounded in public analogs, not internal measurements. The entity resolution match rate (A-12) and Brex batch cadence (A-13) are the two assumptions most likely to change the build on contact with reality.

What I would do differently with internal access: The First 8 Actions in Section 46 are precisely what I would do. The document is designed so that the six Critical assumptions can be validated or refuted in the first two weeks, before any engineering commitment is made. The platform concept is correct if A-09 is confirmed (structural root cause). It is the wrong concept if A-09 is wrong. That is a testable claim.