The Setup
A person fills out a quote form on your auto insurance landing page. Two weeks later, the same person fills out a life insurance form from a different device. A month after that, they click an affiliate link and land on your annuities page. Your system now has three separate records for one person -- with three different source attributions, three different timestamps, and potentially three different email addresses or phone numbers.
This is the duplicate problem, and it costs lead generation operators real money. When you sell the same person to a buyer three times, the buyer gets three leads that all resolve to the same customer, whether or not that customer converts. The buyer pays three times for one opportunity. When they discover the duplication -- and they always do -- they request refunds, reduce their volume commitment, or leave entirely. Experian's Global Data Management report found that organizations estimate 26% of their customer data is inaccurate, and Gartner research indicates that poor data quality costs organizations an average of $12.9 million per year.
The conventional solution is deduplication: match records by email address and merge duplicates. This works for the simplest cases but fails when the same person uses different email addresses across submissions, or when they provide a phone number on one form and an email on another. Single-field matching catches some duplicates but misses the ones that actually cause the most damage -- because those are the records that look different on the surface but represent the same person.
What the Data Shows
According to a Forrester Consulting study commissioned by Neustar, companies that implement identity resolution across their customer data see a 15-35% improvement in marketing efficiency and a 10-20% increase in customer lifetime value. The reason is straightforward: when you know that three records are one person, you can make one high-quality placement instead of three low-confidence deliveries.
One production system resolved 958,937 contact points across 616,543 leads (as of January 2026) using a three-tier matching process. That is an average of 1.56 contact points per lead -- meaning the system regularly consolidates multiple touchpoints into a single identity profile. The same system maintains 306,676 blacklist entries and processed 76,836 outbound deliveries, with deduplication running at every stage of the pipeline (portal_stealth_locked_values).
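The 1.56 average can be checked directly from the two totals quoted above. The snippet below is only a quick arithmetic check; the variable names are illustrative, not taken from the production system.

```python
# Totals quoted in the text (as of January 2026).
contact_points = 958_937
leads = 616_543

# Average number of contact points consolidated into each lead profile.
average_per_lead = contact_points / leads

print(f"{average_per_lead:.2f}")  # prints 1.56
```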
The three tiers operate in strict priority order -- each tier is tried only if the previous one fails to find a match. This is not fuzzy matching or probabilistic scoring. It is deterministic: either a match is found at a given tier, or the system moves to the next tier.
Before this system existed, the same operation ran on six separate SaaS vendors, each with its own customer database. Identity resolution across those vendors was manual -- if it happened at all. The consolidation into a single platform with unified matching eliminated the fragmented data problem entirely. Every lead, regardless of which of the 12 inbound sources it arrived through, passes through the same identity resolution pipeline before entering the database (portal_stealth_locked_values).
How It Works
The three-tier matching process operates on every inbound record before it touches the lead database. Here is what each tier does and why the order matters.
Tier 1: Unique identifier match. Every record that enters the system is assigned a unique identifier (UUID) at the point of creation. If a subsequent record arrives with the same UUID -- which happens when the same browser session submits multiple forms, or when a known tracking identifier is passed via URL parameter -- the system matches immediately. This is the highest-confidence match because UUIDs are system-generated, not user-provided. No human types a UUID incorrectly. When a Tier 1 match is found, the new data merges into the existing record: new fields are added, existing fields are updated if the new data is more recent, and the contact point count increments.
Tier 2: Email match. If no UUID match is found, the system normalizes the email address (lowercase, whitespace trimmed) and checks for an existing record with the same email. Email is the second-highest confidence identifier because most people have a primary email address they use consistently. However, email is not perfect -- people have multiple email addresses, they make typos, and they sometimes use disposable addresses. When a Tier 2 match is found, the same merge process runs: data consolidation, field updates, and contact point tracking.
Tier 3: Phone match. If neither UUID nor email produces a match, the system normalizes the phone number (strip formatting, standardize country code) and checks for an existing record. Phone matching is the third tier because phone numbers are less stable than email addresses -- people change numbers, share numbers, and enter numbers with formatting variations. But phone matching catches records that email matching misses: the person who used a work email on form A and a personal email on form B, but the same mobile number on both. When a Tier 3 match is found, the merge process runs identically.
No match across any tier. If no match is found at any tier, the system creates a new lead record. This is a genuinely new person entering the system for the first time.
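The tier order described above can be sketched in a few lines of Python. This is a minimal illustration, not the production implementation: the `LeadIndex` class, the in-memory dictionaries, the default country code of "1", and the rule that the first lead to claim an identifier keeps it are all assumptions made for the example.

```python
import re
import uuid
from typing import Dict, Optional, Tuple


def normalize_email(raw: Optional[str]) -> Optional[str]:
    """Lowercase and trim whitespace, as described for Tier 2."""
    cleaned = (raw or "").strip().lower()
    return cleaned or None


def normalize_phone(raw: Optional[str], country_code: str = "1") -> Optional[str]:
    """Strip formatting and add a country code (assumed default: 1)."""
    digits = re.sub(r"\D", "", raw or "")
    if not digits:
        return None
    if len(digits) == 10:  # assumption: a 10-digit number lacks its country code
        digits = country_code + digits
    return "+" + digits


class LeadIndex:
    """Hypothetical in-memory stand-in for the lead database lookups."""

    def __init__(self) -> None:
        self.by_uuid: Dict[str, str] = {}
        self.by_email: Dict[str, str] = {}
        self.by_phone: Dict[str, str] = {}

    def resolve(self, record: dict) -> Tuple[str, str]:
        """Return (lead_id, tier); tier is 'uuid', 'email', 'phone', or 'new'."""
        rec_uuid = record.get("uuid")
        email = normalize_email(record.get("email"))
        phone = normalize_phone(record.get("phone"))

        # Strict priority order: a lower tier is tried only if the one above misses.
        if rec_uuid and rec_uuid in self.by_uuid:
            lead_id, tier = self.by_uuid[rec_uuid], "uuid"
        elif email and email in self.by_email:
            lead_id, tier = self.by_email[email], "email"
        elif phone and phone in self.by_phone:
            lead_id, tier = self.by_phone[phone], "phone"
        else:
            lead_id, tier = (rec_uuid or str(uuid.uuid4())), "new"

        # Index every identifier on this record so later submissions can match it.
        # Assumption: the first lead to claim an identifier keeps it.
        self.by_uuid.setdefault(lead_id, lead_id)
        if rec_uuid:
            self.by_uuid.setdefault(rec_uuid, lead_id)
        if email:
            self.by_email.setdefault(email, lead_id)
        if phone:
            self.by_phone.setdefault(phone, lead_id)
        return lead_id, tier


index = LeadIndex()
first = index.resolve({"email": "Pat@Work.example ", "phone": "(555) 010-1234"})
second = index.resolve({"email": "pat@home.example", "phone": "555.010.1234"})
third = index.resolve({"email": " PAT@home.example"})
print(first[1], second[1], third[1])  # prints: new phone email
```

The second submission uses a different email address, so an email-only check would create a duplicate; the phone tier catches it. Because that submission's email is then indexed against the same lead, the third submission matches at the email tier.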
The merge process at each tier is not a simple overwrite. It is a recursive merge that preserves data from both the existing record and the new submission. If the existing record has a mailing address but no phone number, and the new submission has a phone number but no mailing address, the merged record has both. This payload merge approach means every interaction with a lead adds information to their profile rather than replacing it.
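A recursive payload merge of this kind might look like the sketch below. The function name `merge_payload`, the field names, and the rule that empty incoming values never overwrite existing data are assumptions for illustration; the sketch also assumes the incoming submission is the more recent one, so its non-empty values win on conflict.

```python
from typing import Any, Dict

# Values treated as "no data" (an assumption for this sketch).
EMPTY_VALUES = (None, "", [], {})


def merge_payload(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Dict[str, Any]:
    """Return a new dict combining both payloads without mutating either."""
    merged = dict(existing)
    for key, new_value in incoming.items():
        old_value = merged.get(key)
        if isinstance(old_value, dict) and isinstance(new_value, dict):
            # Recurse so nested fields are combined rather than replaced wholesale.
            merged[key] = merge_payload(old_value, new_value)
        elif new_value not in EMPTY_VALUES:
            # Assumed: the incoming submission is newer, so its value wins.
            merged[key] = new_value
    return merged


existing = {"name": "Pat", "address": {"street": "1 Main St", "zip": None}, "phone": None}
incoming = {"name": "", "address": {"zip": "00000"}, "phone": "+15550101234"}

merged = merge_payload(existing, incoming)
print(merged["address"])  # prints {'street': '1 Main St', 'zip': '00000'}
print(merged["phone"])    # prints +15550101234
```

The merged record keeps the street from the existing record, gains the phone number and ZIP code from the new submission, and ignores the blank name.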
The system also links orphaned events: if a lead activity or event was captured before the identity was resolved (for example, a page view tracked by cookie before the person submitted a form), the system retroactively links that event to the resolved identity. This backfill process means the lead profile includes behavioral data from before the person identified themselves.
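The backfill step can be illustrated with a small sketch. It assumes, hypothetically, that each tracked event carries a `cookie_id` and a `lead_id` that stays `None` until the identity is resolved; the real event schema is not described in the text.

```python
from typing import List


def link_orphaned_events(events: List[dict], cookie_id: str, lead_id: str) -> int:
    """Attach unlinked events from one cookie to a resolved lead.

    Returns the number of events that were linked.
    """
    linked = 0
    for event in events:
        # Only touch events that have no owner yet and came from this cookie.
        if event.get("lead_id") is None and event.get("cookie_id") == cookie_id:
            event["lead_id"] = lead_id
            linked += 1
    return linked


events = [
    {"type": "page_view", "cookie_id": "c-123", "lead_id": None},
    {"type": "page_view", "cookie_id": "c-999", "lead_id": None},
    {"type": "form_submit", "cookie_id": "c-123", "lead_id": "lead-1"},
]

count = link_orphaned_events(events, cookie_id="c-123", lead_id="lead-1")
print(count)  # prints 1
```

Only the earlier page view from the same cookie is backfilled; the event from another cookie stays unlinked, and the already-linked form submission is left alone.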
After identity resolution completes, the system recalculates the lead score (0-100) based on the consolidated profile, including enrichment data from demographic and behavioral sources and the full contact history. A lead with three consolidated touchpoints, a verified email, and demographic enrichment scores higher than a lead with a single form submission and no verification -- and that score difference determines routing priority and buyer pricing.
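The text does not give the scoring formula, so the sketch below uses made-up weights purely to show how a consolidated profile could translate into a higher score on a 0-100 scale. The function name, inputs, and every weight are hypothetical.

```python
def score_lead(touchpoints: int, email_verified: bool, has_enrichment: bool) -> int:
    """Toy lead score on a 0-100 scale. All weights are hypothetical."""
    score = 20                               # assumed base score for any lead
    score += min(max(touchpoints, 0), 5) * 10  # assumed: 10 points per touchpoint, capped at 5
    score += 20 if email_verified else 0     # assumed bonus for a verified email
    score += 10 if has_enrichment else 0     # assumed bonus for demographic enrichment
    return max(0, min(100, score))           # clamp to the 0-100 range


consolidated = score_lead(touchpoints=3, email_verified=True, has_enrichment=True)
single = score_lead(touchpoints=1, email_verified=False, has_enrichment=False)
print(consolidated, single)  # prints 80 30
```

Under these assumed weights, the lead with three consolidated touchpoints, a verified email, and enrichment outranks the single unverified submission, which is the ordering the paragraph above describes.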
What This Means for Business Operators
Duplicate leads are not just an operational annoyance. They are a direct cost: refund requests from buyers who received the same person twice, reduced trust leading to lower volume commitments, and wasted delivery capacity on records that should have been consolidated. When organizations estimate that 26% of their customer data is inaccurate, and your lead generation operation processes tens of thousands of records per month, the financial impact of unresolved identities is significant.
Three-tier matching -- UUID, then email, then phone -- catches duplicates that single-field deduplication misses. The 958,937 contact points resolved across 616,543 leads demonstrate that the approach works at production scale, consolidating an average of 1.56 touchpoints per identity. For lead generation operators selling to quality-conscious buyers, identity resolution is not a technical feature. It is the mechanism that determines whether your leads are worth the price you charge.
Related: Spoke #84: 616,543 Leads Through One Platform | Spoke #89: Lead Enrichment and Persona Segmentation | CS19: The PRJ-01 Product Story
References
- Experian (2025). "Global Data Management Report." Customer data accuracy and duplication rates.
- Gartner (2025). "Data Quality Cost Impact." Financial cost of poor data quality.
- Forrester/Neustar (2025). "Identity Resolution Marketing Efficiency Study." Marketing efficiency and LTV improvements from identity resolution.
- Keating, M.G. (2026). "The Compounding Execution Method: Complete Technical Documentation." Stealth Labz.