How to Choose a CRO Agency: 7 Questions to Ask

WRITTEN BY

Irakli B.

Why Hiring a CRO Agency Is a Procurement Decision, Not a Vibe Check

Figuring out how to choose a CRO agency is less about who has the slickest deck and more about who has a method that holds up under scrutiny. Every agency shows you the winning case studies. Very few will tell you how many tests lost, how they sized the sample, or what they did when the "redesign" tanked revenue. This guide is for founders and growth leads evaluating three to five agencies who need to make the hire defensible internally - not just exciting.

In this guide

Why Most CRO Agency Pitches Look Identical
Question 1: What's Your Win Rate Over the Last 12 Months?
Question 2: How Do You Size Samples and Call Significance?
Question 3: What Happens When a Test Loses?
Question 4: How Does Research Feed the Testing Roadmap?
Question 5: Who Owns the Testing Tool Contract?
Question 6: How Do You Score and Prioritise Hypotheses?
Question 7: How Do You Define Success - Uplift or Revenue?
CRO Agency Red Flags and How to Spot Them Early

Why Most CRO Agency Pitches Look Identical

If you've taken three discovery calls this week, you've already noticed the pattern. Case studies with big percentages. A testing roadmap slide. A retainer quote somewhere between $6,000 and $15,000 a month. The pitches blur together because they're selling the same story.
‍
The differences show up about 90 days in. One agency is running statistically sound tests tied to revenue on your P&L. The other is redesigning your hero section every two weeks and calling it optimisation. By the time you notice, you've spent $30,000 and your conversion rate is exactly where it started.
‍
Think of it like hiring a contractor to renovate your kitchen. Every contractor shows you photos of finished kitchens. The question that matters is how they handle the plumbing when the wall comes down and there's a leak nobody planned for. CRO is the same. The work happens when tests lose, research surprises you, or a "winning" variant doesn't replicate on the P&L.
‍
The seven questions below are designed to surface the method, not the marketing. Ask them in the order given. You'll learn more in 20 minutes than most founders learn in three months of working with the wrong agency.

Pro Tip:

Run the same questions across every agency you interview. Consistency is how you compare answers. If you ask Agency A about win rate and Agency B about pricing, you can't actually compare them. Use one scorecard, seven questions, and rate each response 1-5.

Question 1 - What's Your Win Rate Over the Last 12 Months?

The first question to ask a CRO agency is what their overall win rate was across every test they ran in the last 12 months - not just the ones that made it into the deck. A healthy win rate sits somewhere between 20% and 35%. Anything above 50% points to either a fresh agency with five tests under its belt or someone cherry-picking.
‍
Here's why the range matters. CRO is a science, and real science includes losing hypotheses. An agency with a 70% win rate is probably testing safe, low-impact changes (button colour, copy tweaks) or calling flat tests "wins" because the variant didn't lose. Neither moves revenue.
‍
A good answer sounds like this: "Over the last 12 months we ran 47 tests across 11 clients. 14 were winners, 9 were losers, and 24 were flat or inconclusive. Our winners averaged a 6.8% uplift in revenue per visitor."
‍
That answer includes volume, breakdown, and the metric. A weak answer sounds like: "We have a 92% success rate." Ask what they mean by success and watch the room go quiet.
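If you want to sanity-check an answer like that on the call, the arithmetic is short. The sketch below uses the counts from the sample answer above; the 20-35% band is this guide's rule of thumb, and the variable names are only illustrative.

```python
# Outcome counts taken from the sample answer in the text.
outcomes = {"winners": 14, "losers": 9, "flat_or_inconclusive": 24}

total_tests = sum(outcomes.values())          # should match the stated volume
win_rate = outcomes["winners"] / total_tests  # winners as a share of ALL tests

# Rule-of-thumb band from this guide, not an industry standard.
HEALTHY_RANGE = (0.20, 0.35)
in_healthy_range = HEALTHY_RANGE[0] <= win_rate <= HEALTHY_RANGE[1]

print(f"{total_tests} tests, win rate {win_rate:.1%}, in range: {in_healthy_range}")
```

The point of dividing by all tests, including flats and losers, is that it is the only denominator an agency cannot quietly shrink.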

Quick Note:

"Flat" is not a dirty word. Flat tests are still learning. They tell you the lever you pulled doesn't move the needle, which saves you from building a whole strategy around a dead end. Agencies that hide flats are hiding the map of what didn't work.

Question 2 - How Do You Size Samples and Call Significance?

This is the question that separates testing programmes from theatre. Ask how they calculate sample size before a test starts and when they decide a test is "done."
‍
A proper answer includes three numbers: baseline conversion rate, minimum detectable effect (MDE), and statistical power (usually 80%). A fourth input, the significance threshold (commonly 95% confidence), should be fixed up front as well. The agency should plug these into a sample size calculator and commit to a test duration before the test goes live - typically two to four full business cycles, and never less than two weeks.
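To make those inputs concrete, here is a minimal sketch of what a sample size calculator does under the hood. It assumes a two-sided two-proportion z-test with the usual normal approximation; real tools may use a different method. The 3% baseline, 10% relative MDE, and 20,000 weekly visitors are hypothetical figures for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed in EACH variant (two-sided two-proportion z-test)."""
    p1 = baseline
    p2 = baseline * (1 + relative_mde)  # conversion rate we want to detect
    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # significance threshold
    z_beta = NormalDist().inv_cdf(power)           # statistical power
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return ceil(numerator / (p2 - p1) ** 2)


# Hypothetical store: 3% baseline, aiming to detect a 10% relative lift.
n = sample_size_per_variant(baseline=0.03, relative_mde=0.10)

weekly_visitors = 20_000  # hypothetical traffic, split across two variants
weeks_needed = ceil(2 * n / weekly_visitors)

print(f"{n:,} visitors per variant, about {weeks_needed} weeks")
```

With these assumptions the requirement lands in the tens of thousands of visitors per variant, which is why a small MDE on a low baseline can push a test well past the two-week minimum.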
‍
If the answer is "we let it run until we see a winner," run. That's called peeking, and it inflates false positives. It's the CRO equivalent of flipping a coin 100 times and stopping the count the moment heads is ahead. You'll always find a "winner" that doesn't hold up.
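A small simulation shows why peeking is a problem. The sketch below runs A/A tests, where both arms share the same true conversion rate, so every "winner" is a false positive. It compares checking at every interim look against checking once at the planned end. The number of looks, traffic per look, and the 10% rate are arbitrary assumptions, and the z-test is a simplified stand-in for whatever a given testing tool uses.

```python
import random
from math import sqrt


def z_score(conv_a, conv_b, n):
    """Two-proportion z statistic for two arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    if pooled in (0.0, 1.0):
        return 0.0  # no variance yet, so nothing to test
    se = sqrt(pooled * (1 - pooled) * 2 / n)
    return (conv_b / n - conv_a / n) / se


def simulate(runs=500, looks=10, visitors_per_look=200, rate=0.10, seed=7):
    """Return (false positive rate with peeking, rate with one final check)."""
    rng = random.Random(seed)
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = n = 0
        crossed_any_look = False
        for _ in range(looks):
            # Both arms convert at the same true rate: an A/A test.
            conv_a += sum(rng.random() < rate for _ in range(visitors_per_look))
            conv_b += sum(rng.random() < rate for _ in range(visitors_per_look))
            n += visitors_per_look
            if abs(z_score(conv_a, conv_b, n)) > 1.96:
                crossed_any_look = True  # a peeker would stop and call it here
        peeking_hits += crossed_any_look
        fixed_hits += abs(z_score(conv_a, conv_b, n)) > 1.96
    return peeking_hits / runs, fixed_hits / runs


peeking_rate, fixed_rate = simulate()
print(f"peeking: {peeking_rate:.1%}  fixed horizon: {fixed_rate:.1%}")
```

The fixed-horizon rate should sit near the nominal 5%, while the peeking rate typically comes out several times higher, because every extra look is another chance for noise to cross the line.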
‍
Imagine testing whether a new menu at your café sells more espressos. You don't declare victory on Tuesday lunch because three extra people ordered one. You run the menu for a full month across weekday mornings, weekend rushes, and rainy Sundays. Sample size is just the ecommerce version of giving the menu enough shifts to prove itself.

Question 3 - What Happens When a Test Loses?

Ask the agency to walk you through the last test they ran that lost. What was the hypothesis, what happened, and what did they do next? The answer tells you everything about their process.
‍
A mature agency treats losers as data. They'll tell you exactly which assumption in the hypothesis was wrong, what the post-test analysis revealed, and how it changed the next test. A less mature agency will say "we'll run another variant" or change the subject to a winner.
‍
Losing tests are more valuable than winning ones over a 12-month programme. They eliminate bad ideas, refine customer understanding, and prevent you from scaling a change that would have quietly cost you money. Any agency that treats losers as failures rather than inputs is running optimisation theatre, not a programme.
‍
Think of it like an emergency room. A good doctor doesn't just celebrate the patients who walked out fine - they do a post-mortem on the complications, because that's where the lessons live. Same with tests. The wins feel good. The losses teach you how your customers actually behave.

Important Update:

Losing tests should have a written post-mortem. Ask to see the template. If every test ends with "we'll iterate," there's no learning loop. A proper loser analysis documents the original hypothesis, what the data showed, what the agency thinks went wrong, and what it means for future tests on similar pages.

Question 4 - How Does Research Feed the Testing Roadmap?

A testing roadmap without research is a guessing roadmap. Ask what research methods the agency uses and how those inputs become hypotheses on the testing calendar.
‍
The answer should cover a mix: quantitative data (GA4 funnel analysis, Shopify reports, heatmaps, session recordings) and qualitative data (customer surveys, review mining, user testing, support ticket analysis). The best agencies combine at least three methods before writing a single hypothesis.
‍
Here's what a weak answer looks like: "We follow best practices and look at your analytics." Best practices are the average of what worked for other stores. Your store is not average, and your customers aren't average, so best practices are a starting bias, not a strategy.
‍
A strong answer sounds like: "We start with a two-week research sprint - review mining on your top 200 reviews, five user tests, a funnel audit in GA4, and a heatmap on your top three landing pages. That produces roughly 25 to 40 friction points, which we turn into prioritised hypotheses." Concrete, time-bound, and evidence-led.

Question 5 - Who Owns the Testing Tool Contract?

Small question, big implications. CRO testing tools like VWO, Convert, or AB Tasty aren't cheap - paid plans run $500 to $3,000+ per month depending on traffic. Ask whether the contract sits in your name or the agency's.
‍
If the agency owns the contract, you have two problems. When you leave, you lose access to every test result, every audience segment, every heatmap and recording you've built up. And while you stay, the agency keeps the leverage - they can quietly bundle the tool cost into the retainer and mark it up 30-50%.
‍
The right setup is simple. The tool contract is in your name, on your billing. The agency has admin access, they install it, they run it, they build everything inside it. If the relationship ends, you keep 18 months of test history, segments, and recordings. The only thing that leaves is the agency.
‍
This applies to analytics tools too. GA4 property, Hotjar, Microsoft Clarity, survey tools - all in your name. A good agency will tell you this before you ask. It's a signal they're used to working with sophisticated clients who know how the industry works.

Question 6 - How Do You Score and Prioritise Hypotheses?

Every CRO agency should be using a scoring framework - ICE, PIE, or RICE are the common ones. Ask which they use and how it works in practice.
‍
ICE scores each hypothesis on Impact, Confidence, and Ease (1-10 each). PIE scores on Potential, Importance, and Ease. RICE adds Reach - how many users the test will affect. The framework itself matters less than whether they use one consistently and whether you'll see the scores.
‍
Ask for a sample prioritised backlog. A good agency will show you 15-30 scored hypotheses with notes on the research input behind each one. A weaker agency will send you a three-item roadmap based on "what we usually test first."
‍
The scoring isn't gospel - it's a conversation starter. A hypothesis scoring 24 out of 30 on ICE should be tested before one scoring 16, but the founder's gut check matters too. If the top-scoring test conflicts with brand guidelines or a product launch, it gets deprioritised. What you want is a system that forces the conversation to be explicit instead of "trust us."
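For a sense of what a scored backlog looks like in practice, here is a minimal ICE sketch. It assumes the three 1-10 scores are summed, which gives a 3-30 range consistent with the 24 and 16 in the example above; some teams multiply or average instead. The hypotheses and their scores are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    name: str
    impact: int      # 1-10: how much could this move revenue?
    confidence: int  # 1-10: how strong is the research behind it?
    ease: int        # 1-10: how cheap is it to build and test?

    @property
    def ice(self) -> int:
        # Assumption: scores are summed (max 30), not multiplied.
        return self.impact + self.confidence + self.ease


# Hypothetical backlog entries, not real test ideas from any client.
backlog = [
    Hypothesis("Rewrite hero headline", impact=5, confidence=5, ease=6),
    Hypothesis("Show delivery dates on product page", impact=9, confidence=8, ease=7),
    Hypothesis("Simplify checkout form", impact=8, confidence=6, ease=5),
]

ranked = sorted(backlog, key=lambda h: h.ice, reverse=True)
for h in ranked:
    print(f"{h.ice:>2}  {h.name}")
```

Whatever the formula, the useful part is that every score is written down next to the research that justifies it, so you can challenge a number instead of a hunch.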
‍
If you want to pressure-test your own hypothesis backlog or compare how agencies score ideas, the Weblics CRO framework breakdown walks through ICE, PIE, and RICE side by side.

Reminder:

Prioritisation scores should be visible to you. If the agency holds the scoring spreadsheet and only sends you the top three tests each quarter, you can't challenge the logic. Ask for read access to the full backlog. Real agencies share it. Performance theatres don't.

Question 7 - How Do You Define Success - Uplift or Revenue?

This is the question that catches most agencies off guard. Ask how they define a successful test: uplift on a metric, or revenue on the profit and loss statement.
‍
The honest answer is that uplift and revenue can disagree. A variant might increase conversion rate by 8% but reduce average order value by 12%, which nets out to roughly 5% less revenue per visitor despite the "win" on conversion. Another variant might lift add-to-cart by 20% but tank checkout completion. If the agency only measures the metric closest to their intervention, they'll "win" while your P&L stays flat or goes backwards.
‍
A strong answer ties test results to revenue per visitor (RPV) - the metric that captures both conversion rate and AOV in one number. Even better if they look at it alongside new customer revenue versus returning, because a lift driven by existing buyers doesn't grow the business the same way.
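Because RPV is just conversion rate multiplied by AOV, you can check how any pair of movements combines. The sketch below applies an 8% conversion lift and a 12% AOV drop to a hypothetical baseline of 3% conversion and an $80 order value. The baseline figures are assumptions and do not affect the percentage result.

```python
def rpv(conversion_rate, average_order_value):
    """Revenue per visitor: conversion rate times average order value."""
    return conversion_rate * average_order_value


# Hypothetical control: 3% conversion rate, $80 AOV.
control = rpv(0.030, 80.00)

# Variant: conversion up 8%, AOV down 12%.
variant = rpv(0.030 * 1.08, 80.00 * 0.88)

change = variant / control - 1
print(f"Control RPV ${control:.2f}, variant RPV ${variant:.2f}, change {change:+.1%}")
```

A conversion "win" of this shape leaves the variant a little below break-even on revenue, which is exactly the kind of result a conversion-only report would hide.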
‍
Here's the GPS analogy. A turn-by-turn app that tells you you're making great time while you're headed to the wrong city is useless. Uplift on a single metric without revenue context is the same thing - technically accurate, strategically meaningless.

CRO Agency Red Flags and How to Spot Them Early

A few answers should end the conversation before the second call. These are the CRO agency red flags worth memorising.
‍
Guaranteed results. Anyone promising a specific uplift percentage before seeing your data is either lying or doesn't understand how testing works. CRO outcomes depend on traffic volume, baseline, seasonality, and a dozen variables the agency can't see on a sales call. The only legitimate guarantee is a process guarantee - "we'll run X tests in Y days" or "we'll refund if we don't hit a process milestone."
‍
"We'll redesign your site." Redesigns are not CRO. They're creative projects dressed up as optimisation. A full redesign removes the baseline you'd need to measure whether anything actually worked. Good agencies test inside the existing site and only recommend a redesign after the data says the current structure is fundamentally broken.

No research phase. If the agency wants to start testing in week one, they're guessing. A proper CRO programme has a two to four week discovery phase before the first test goes live. Anyone skipping it is selling activity, not outcomes.
‍
Flat retainers with no deliverable cadence. A $10,000 monthly retainer should come with a specific cadence: X tests per quarter, Y research outputs, Z reporting rhythms. If the contract just says "ongoing CRO services," you'll spend six months wondering what you're paying for.
‍
Case studies without context. Percentages are useless without traffic, duration, and test design. A "23% uplift" on a site doing 500 visitors a month is noise. Ask for the denominators. Any agency unwilling to share them is hiding something.
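To see why the denominators matter, here is a rough check of that kind of claim. It assumes one month of traffic split evenly (250 visitors per arm) and conversion counts of 13 versus 16, which is about a 23% relative uplift. Those counts are invented for illustration, and with numbers this small the normal approximation used here is itself only approximate.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test p-value using a pooled standard error."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical month: 500 visitors split evenly, 13 vs 16 orders.
control_orders, variant_orders, visitors_per_arm = 13, 16, 250

uplift = variant_orders / control_orders - 1
p_value = two_sided_p_value(
    control_orders, visitors_per_arm, variant_orders, visitors_per_arm
)

print(f"uplift {uplift:+.0%}, p-value {p_value:.2f}")
```

Under these assumptions the p-value is nowhere near the usual 0.05 threshold, so the headline percentage is a three-order difference and cannot be told apart from chance.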

Pro Tip:

The best agencies will disqualify themselves. If an agency tells you your traffic is too low for rigorous testing, or that you need a research phase before a test plan, or that they don't do redesigns - that's a trust signal. Agencies saying yes to everything are telling you what you want to hear, which is rarely what you need.