The Unique Customer Problem

Why unique customers are so challenging to track + 7 ways you can do it

Bryan Curley
March 20, 2024

Did You Know? An estimated 10-30% of an organization’s customer and prospect records can be affected by duplication. (Source)

Yesterday, we looked at Customer Lifetime Value (CLV, or sometimes LTV or CLTV), a crucial metric for virtually every business that only 42% of businesses can measure accurately.

(And don’t worry, today’s email will be much shorter than yesterday’s.)

For the uninitiated, CLV is a simple-on-the-surface metric represented by the following formula:

CLV = (Average Order Value x Purchase Frequency) x Customer Lifespan

The part in parentheses—(Average Order Value x Purchase Frequency)—is called Customer Value and represents how much revenue a customer generates over a period of time. That seems kind of important, no?

Average Order Value (AOV) is easy to calculate: Total Revenue / Total Orders over your chosen time period, such as 1 year, 3 years, 5 years, etc.—whatever makes sense for your business given how frequently customers purchase your product.

Purchase Frequency (PF) looks easy to calculate, but it’s surprisingly hard to calculate accurately, which you probably want to do. Here’s PF’s formula:

PF = Total Orders / Number of Unique Customers

No sweat, right?
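On paper, sure. Here's a minimal Python sketch of the formulas above. Every number in it is made up for illustration, and the function names are my own, not from any particular analytics tool.

```python
# Hypothetical totals for a one-year window. All numbers are invented.
total_revenue = 50_000.00
total_orders = 1_000
unique_customers = 400
customer_lifespan_years = 3


def average_order_value(revenue, orders):
    """AOV = Total Revenue / Total Orders."""
    return revenue / orders


def purchase_frequency(orders, customers):
    """PF = Total Orders / Number of Unique Customers."""
    return orders / customers


def customer_lifetime_value(aov, pf, lifespan):
    """CLV = (AOV x PF) x Customer Lifespan."""
    return aov * pf * lifespan


aov = average_order_value(total_revenue, total_orders)
pf = purchase_frequency(total_orders, unique_customers)
clv = customer_lifetime_value(aov, pf, customer_lifespan_years)

print(f"AOV: ${aov:.2f}")  # AOV: $50.00
print(f"PF:  {pf:.2f}")    # PF:  2.50
print(f"CLV: ${clv:.2f}")  # CLV: $375.00
```

Notice that unique_customers is just a number I typed in. Everything downstream depends on that number being right.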


What makes PF so hard to calculate?

The difficulty lies in the word “unique.”

Have you ever gone to make a purchase online and realized you couldn’t remember your password, so you just made a new account to speed things up?
Have you ever purchased multiple times from the same site but clicked the “Check out as guest” option to bypass the account creation process entirely?
Have you ever purchased from a company’s website but also purchased from their brick-and-mortar store?

In each of those situations, you’re a repeat customer, but the store’s account and order data records you as a brand-new one.
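To see what that does to the math, here's a small sketch with invented data. The "person" field is an assumption: it stands for the ground truth that a real store usually doesn't have, which is the whole problem.

```python
# Four orders placed by two real people. The "person" column is the
# ground truth a store normally cannot see; "account" is what it records.
orders = [
    {"account": "acct-101", "person": "Pat"},   # Pat's original account
    {"account": "acct-245", "person": "Pat"},   # Pat forgot the password
    {"account": "guest-778", "person": "Pat"},  # Pat checked out as guest
    {"account": "acct-102", "person": "Lee"},
]


def purchase_frequency(order_rows, id_field):
    """Total orders divided by the number of distinct values of id_field."""
    unique_ids = {row[id_field] for row in order_rows}
    return len(order_rows) / len(unique_ids)


pf_by_account = purchase_frequency(orders, "account")
pf_by_person = purchase_frequency(orders, "person")

print(pf_by_account)  # 1.0 (four "customers", one order each)
print(pf_by_person)   # 2.0 (two customers, two orders each on average)
```

Same orders, same revenue, but counting accounts instead of people cuts PF in half in this toy example, and CLV drops right along with it.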

And for companies with a lot of brick-and-mortar locations and volume, the “unique customer” nut can be a lot harder to crack.

Today’s edition of Data-Driven Marketing offers several ways to demystify your data and get a better understanding of how many customers you really have.

7 ways to identify unique customers

Here are some techniques you can use.

1. Collect customer data

One of the simplest ways is to ask your customers for more data. Unfortunately, there’s a trade-off:

⬆️ More data = ⬆️ More checkout complexity = ⬇️ Fewer conversions

Baymard Institute, a company that performs usability research, quantified the impact of adding more form fields to the checkout process, showing a steep decline in “UX performance” as more fields are added.

[Chart: checkout complexity decreases conversions]

Source: This complex analysis from Baymard Institute

But even if you collect a ton of customer data, you probably have those customers who create new accounts whenever they lose their login info, so this doesn’t solve the problem entirely.

2. Manual data review

Smaller companies can export and analyze their customer data by hand, looking for obvious ways to spot duplicate customers, such as the following:

Different email address but same name and phone number
Shipping to the same address (could be different family members, so determine whether you care more about unique customers or unique households)

While simpler than other techniques described here, there’s no doubt this method doesn’t scale well.
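If your export is too big to eyeball but too small to justify new software, a few lines of Python can do the same check. This is a rough sketch: the records and field names are invented, and it only catches exact matches.

```python
from collections import defaultdict

# A made-up customer export. Field names are assumptions.
customers = [
    {"id": 1, "name": "Dana Reyes", "email": "dana@example.com",
     "phone": "555-0100", "address": "12 Oak St"},
    {"id": 2, "name": "Dana Reyes", "email": "dreyes@example.com",
     "phone": "555-0100", "address": "12 Oak St"},
    {"id": 3, "name": "Sam Reyes", "email": "sam@example.com",
     "phone": "555-0111", "address": "12 Oak St"},
    {"id": 4, "name": "Kim Park", "email": "kim@example.com",
     "phone": "555-0122", "address": "9 Elm Ave"},
]


def group_ids(records, key_fields):
    """Return lists of record ids that share the same values for key_fields."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[field] for field in key_fields)
        groups[key].append(record["id"])
    # Only groups with more than one record are worth reviewing.
    return [ids for ids in groups.values() if len(ids) > 1]


likely_same_person = group_ids(customers, ["name", "phone"])
same_household = group_ids(customers, ["address"])

print(likely_same_person)  # [[1, 2]]
print(same_household)      # [[1, 2, 3]]
```

Records 1 and 2 look like one person with two email addresses. Record 3 shares the address but not the name or phone, which is the "unique customer or unique household?" question from above.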

3. Use matching rules to identify duplicates

Many companies use CRMs (software that handles customer relationship management), which include their own deduplication features to ensure accurate customer data.

Salesforce is the largest CRM with a 27% market share, and they include the ability to set basic matching rules to identify and remove duplicate accounts. Other CRMs offer similar functionality.

Services like Syncari and Dedupely sync with many of the most popular CRMs and offer extended data quality management functionality. Many of them still rely on the typical “matching fields” approach, though. That’s a great first step and sufficient for many use cases, but not all of them.
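To give a feel for what a "matching fields" rule does, here's a generic sketch in Python. It is not how Salesforce, Syncari, or Dedupely implement it. The rules, field names, and records are all assumptions: normalize a field, then merge any records that share the normalized value.

```python
import re


def email_key(record):
    """Rule 1: same email, ignoring case and stray spaces."""
    email = record.get("email", "").strip().lower()
    return ("email", email) if email else None


def phone_key(record):
    """Rule 2: same phone number, ignoring punctuation."""
    digits = re.sub(r"\D", "", record.get("phone", ""))
    return ("phone", digits) if digits else None


RULES = [email_key, phone_key]


def cluster(records):
    """Group record ids that are linked by at least one matching rule."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Follow parent links to the representative id of x's group.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # rule key -> id of the first record that produced it
    for r in records:
        for rule in RULES:
            key = rule(r)
            if key is None:
                continue
            if key in first_seen:
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    groups = {}
    for r in records:
        groups.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(groups.values())


records = [
    {"id": 1, "email": "Pat@Example.com", "phone": "(555) 010-0199"},
    {"id": 2, "email": "pat@example.com ", "phone": ""},
    {"id": 3, "email": "p.lee@example.com", "phone": "555-010-0199"},
    {"id": 4, "email": "kim@example.com", "phone": "555-010-0142"},
]

clusters = cluster(records)
print(clusters)  # [[1, 2, 3], [4]]
```

Record 2 matches record 1 on email and record 3 matches record 1 on phone, so all three collapse into one customer. Exact rules like these are cheap and predictable, but they only work when at least one field lines up.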

4. Use machine learning to tackle messier data

Sometimes simple matching rules won’t cut it. Consider this simplified example for a company that has tens of thousands of customers with the following three records in their database:

Field	Customer 1	Customer 2	Customer 3
First Name	John	(blank)	(blank)
Last Name	Smith	(blank)	Smith
Email	(blank)	[email protected]	[email protected]
DOB	12/20/1975	12/20/1975	(blank)
City	Bismarck	(blank)	Bismarck
State	ND	(blank)	ND

Customers 2 and 3 have the same email address, so they’re the same person. Easy peasy. But what about Customer 1? Matching rules aren’t sufficient to combine Customer 1 with either Customer 2 or Customer 3.

Customer 1 and Customer 2 share only the same DOB.
Customer 1 and Customer 3 share the same Last Name, City, and State, but “Smith” is a popular last name, and Bismarck, ND has a population of about 75,000. WhitePages.com has 5 pages of listings for Smiths in Bismarck, ND alone!

Machine learning models can analyze more complex systems like this one where basic matching rules might miss duplicate accounts. DataRobot is an example of a service that offers this more advanced functionality.
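To make that concrete, here's a toy scoring sketch for the three records above. To be clear, this is not machine learning and not how DataRobot works. The weights are hand-picked assumptions standing in for what a trained model would learn from labeled examples, and the email address is a placeholder since the real one isn't shown.

```python
from itertools import combinations

# The three records from the example. Blank fields are simply left out.
# The email value is a placeholder, not the real address.
customers = {
    "Customer 1": {"first": "John", "last": "Smith", "dob": "12/20/1975",
                   "city": "Bismarck", "state": "ND"},
    "Customer 2": {"email": "placeholder@example.com", "dob": "12/20/1975"},
    "Customer 3": {"last": "Smith", "email": "placeholder@example.com",
                   "city": "Bismarck", "state": "ND"},
}

# Hand-picked weights (an assumption). Rare, specific fields count for more
# than common ones like a popular last name or a state.
WEIGHTS = {"email": 5.0, "dob": 3.0, "first": 1.0,
           "last": 1.0, "city": 0.5, "state": 0.25}


def match_score(a, b):
    """Add up the weights of fields that both records have and agree on."""
    return sum(
        weight
        for field, weight in WEIGHTS.items()
        if field in a and field in b and a[field] == b[field]
    )


scores = {
    (x, y): match_score(customers[x], customers[y])
    for x, y in combinations(customers, 2)
}

for pair, score in scores.items():
    print(pair, score)
# ('Customer 1', 'Customer 2') 3.0
# ('Customer 1', 'Customer 3') 1.75
# ('Customer 2', 'Customer 3') 5.0
```

Neither score involving Customer 1 is conclusive by itself, which matches the intuition above. The appeal of a learned model is that it can decide how much each partial match should count, and at what total score two records are probably the same person, instead of me guessing at the weights.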

5. Use cookies to track customer behavior

Cookies are small pieces of data that a website stores in a visitor’s browser and reads back on later visits, which allows for user identification and tracking. A customer could purchase from your website today with the “Check out as guest” option, then come back tomorrow and make another purchase using the “Check out as guest” option again. If your website put a cookie in the user’s browser, you can identify them as the same visitor, even if they didn’t provide any personally identifying information.

However, cookies alone aren’t enough to identify duplicate customers. They expire after a period of time, users can browse incognito to begin a new, cookie-less session (Cookie Monster in shambles), and users also can decline to use cookies when you provide them with the required cookie notice.
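For the curious, here's roughly what the server side of that looks like, sketched with Python's standard library. The cookie name, the one-year lifetime, and the function name are all assumptions, and a real site would also need to honor the visitor's consent choice before setting anything.

```python
import uuid
from http.cookies import SimpleCookie

COOKIE_NAME = "visitor_id"        # hypothetical cookie name
ONE_YEAR = 60 * 60 * 24 * 365     # cookie lifetime in seconds (assumption)


def identify_visitor(cookie_header):
    """Return (visitor_id, set_cookie_value).

    cookie_header is the raw Cookie header sent by the browser (or None).
    set_cookie_value is None for returning visitors, or the value to send
    back in a Set-Cookie header for new ones.
    """
    jar = SimpleCookie()
    jar.load(cookie_header or "")
    if COOKIE_NAME in jar:
        # Returning visitor: the browser sent our id back to us.
        return jar[COOKIE_NAME].value, None

    # New visitor (or cookies were cleared or declined): mint a fresh id.
    visitor_id = uuid.uuid4().hex
    out = SimpleCookie()
    out[COOKIE_NAME] = visitor_id
    out[COOKIE_NAME]["max-age"] = ONE_YEAR
    out[COOKIE_NAME]["path"] = "/"
    out[COOKIE_NAME]["httponly"] = True
    return visitor_id, out[COOKIE_NAME].OutputString()


# First guest checkout: no cookie yet, so the site issues one.
first_id, set_cookie = identify_visitor(None)

# Second guest checkout tomorrow: the browser sends the cookie back.
second_id, nothing_to_set = identify_visitor(f"{COOKIE_NAME}={first_id}")

print(first_id == second_id)  # True
```

If the visitor shows up in an incognito window, the Cookie header is empty, the function mints a new id, and your "unique customer" count goes up by one.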

Me whenever one of those “opt-in to being tracked!” notices pops up:

[GIF: “I Don’t Think So, No Way” by FTX_Official on Giphy]

6. IP tracking

Most people don’t realize this, but even if you browse the web incognito or clear your cookies before a new session, websites can often still recognize you.

IP tracking allows websites to see the IP address where your Internet connection originates, logging a specific address and that address’s approximate location. VPNs are a popular way of circumventing IP tracking, but most people don’t use VPNs. Keep in mind that an IP address identifies a connection rather than a person: everyone on the same home or office network usually shares one, and addresses can change over time. That makes IP tracking a useful supporting signal for identifying users, not proof of identity by itself.

7. Browser profiles (user agents)

While some people are aware of IP tracking and use VPNs to prevent being tracked, even fewer are aware of browser profiles, which are built from details like the user agent.

Every time you load a web page, your browser sends a bunch of information about the technology you’re using for that session. Here’s my browser profile for the session I’m currently using to write this email:

[Screenshot: my browser profile for this session]

This is the user agent for my current browsing session:

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) Gecko/20100101 Firefox/123.0

Any site I visit will have the following information about the browser I’m using (Firefox) and the operating system of the device I’m using (my Windows laptop):

Mozilla/5.0: A compatibility token left over from the early days of the web. Nearly every modern browser sends it, including browsers that have nothing to do with Mozilla. (Firefox, the browser I use, happens to be developed by Mozilla.)
Windows NT 10.0: Indicates that the operating system is Windows 10 or Windows 11, which reports the same value. "NT" was the original product line name for Windows operating systems that were designed for workstations and server computers.
Win64; x64: Specifies that the Windows version is running on a 64-bit architecture.
rv:123.0: Refers to the release version of Gecko, which in current Firefox matches the browser version. In this case, it indicates version 123.0.
Gecko/20100101: Gecko is the layout engine developed by Mozilla that Firefox uses. The date (20100101) is not a real release date of the Gecko engine; on desktop Firefox it's a fixed placeholder value.
Firefox/123.0: Specifies that the browser making the request is Firefox, and the version of the browser is 123.0.
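If you want to pull those pieces apart in code, here's a minimal sketch. The regular expression is my own and only understands desktop Firefox strings shaped like the one above. Real-world user agents vary a lot, so a production system would lean on a maintained parsing library instead.

```python
import re

UA = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:123.0) "
      "Gecko/20100101 Firefox/123.0")

# Only matches the desktop Firefox layout shown above (an assumption).
FIREFOX_UA = re.compile(
    r"^(?P<compat>Mozilla/[\d.]+) "
    r"\((?P<platform>[^)]*)\) "
    r"Gecko/(?P<gecko>\d+) "
    r"Firefox/(?P<firefox>[\d.]+)$"
)


def parse_firefox_ua(ua):
    """Split a desktop Firefox user agent into labeled parts, or return None."""
    match = FIREFOX_UA.match(ua)
    if match is None:
        return None
    tokens = [t.strip() for t in match.group("platform").split(";")]
    rv = next((t[3:] for t in tokens if t.startswith("rv:")), None)
    return {
        "compat_token": match.group("compat"),
        "os": tokens[0],
        "platform_tokens": [t for t in tokens[1:] if not t.startswith("rv:")],
        "gecko_revision": rv,
        "gecko_trail": match.group("gecko"),
        "firefox_version": match.group("firefox"),
    }


parsed = parse_firefox_ua(UA)
for name, value in parsed.items():
    print(f"{name}: {value}")
# compat_token: Mozilla/5.0
# os: Windows NT 10.0
# platform_tokens: ['Win64', 'x64']
# gecko_revision: 123.0
# gecko_trail: 20100101
# firefox_version: 123.0
```

Keep in mind that millions of people send this exact same string, so a user agent narrows down who a visitor might be far more than it pins down a single customer.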

We’re getting into technical stuff now, so I’ll stop there. Check out this guide about user agents if you want to learn more.

Everyone say, “Hi!” to Jessica C 👋

Question: What’s the most random fact you know?

Jessica C’s Answer: “There are more fake flamingos in the world than real ones.”

Editor’s Note (me, Bryan): OK, this is an insane fact that actually isn’t so insane once you look into it. (Though it’s absolutely weird and random, so thank you, Jessica.)

Apparently there are 6 species of flamingos across the Caribbean, South America, Africa, the Middle East, and Europe with estimates of their total population ranging from several hundred thousand to a few million living in the wild.

The fake flamingo was invented in 1957 by a dude named Don Featherstone (lol at that last name, c’mon) right here in my home state, Massachusetts. While I couldn’t find an estimate for the exact number made over the last 70ish years, it’s easily more than the number of real ones alive today. I’m imagining entire landfills full of just broken pink lawn ornaments.

ChatGPT-Generated Joke of the Day 🤣

Why don't scientists trust stairs?

Because they're always up to something.

Suggest a topic for a future edition 🤔

Got an idea for a topic I can cover? Or maybe you’re struggling with a specific marketing-related problem that you’d like me to address?

Just reply to this email and describe the topic.

There's no guarantee I'll use your suggestion, but I read and reply to everyone, so have at it!