High returns are easy to find. Here's how to tell whether they mean anything.
You've run a backtest. The result looks good. CAGR of 34%. Win rate of 68%. Total return of 280%.
Should you trust it?
Not yet. Not without asking four questions first.
Most backtesting results — including the ones people post online, sell as services, and build entire newsletters around — fail at least one of these. Understanding why separates useful research from noise.
1. Question 1: What's the CAGR — Not the Total Return?
Total return is the most commonly cited number in trading content. It's also the least meaningful one without context.
"This strategy returned 450% total." Over how long?
If that's 20 years, it's roughly 8.9% per year, slightly below the S&P 500's long-term average. Unimpressive. If it's 4 years, it's about 53% per year. Exceptional.
Same number. Completely different meaning.
CAGR (Compound Annual Growth Rate) solves this. It tells you the equivalent annual return, accounting for compounding, regardless of the time period tested. It makes every backtest comparable on equal terms.
The formula: (End Value / Start Value)^(1 / Years) − 1
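If you want to check the arithmetic yourself, here is a minimal Python sketch of the formula. The function name `cagr` is our own and not part of any specific tool. A 450% total return is treated as growth to 5.5 times the starting value.

```python
def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound Annual Growth Rate: (End / Start)^(1 / Years) - 1."""
    if start_value <= 0 or years <= 0:
        raise ValueError("start_value and years must be positive")
    return (end_value / start_value) ** (1.0 / years) - 1.0


# "450% total return" means the account grew to 5.5x its starting value.
start, end = 10_000.0, 55_000.0
print(f"20 years: {cagr(start, end, 20):.1%}")  # roughly 8.9% per year
print(f" 4 years: {cagr(start, end, 4):.1%}")   # roughly 53.1% per year
```

Same total return in both cases. Only the number of years changes, and the annualized figure moves by a factor of about six.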
Reference points worth knowing:
- S&P 500 long-term: ~10% CAGR
- Bitcoin long-term (from 2015): 60%+ CAGR (with extreme volatility)
- A "good" active strategy: consistently above the relevant Buy & Hold benchmark
When someone shows you a backtest result, always ask for the CAGR. If they only show total return, they're either hiding something or don't know what they're doing.
2. Question 2: What's the Maximum Drawdown?
A strategy with 34% CAGR sounds excellent. But what if it dropped 72% at its worst point before recovering?
Would you have held through that?
Most people wouldn't. They'd sell at the bottom — locking in the loss, missing the recovery, and ending up worse than if they'd done nothing.
Maximum drawdown is the largest peak-to-trough decline a strategy experienced. It answers the question: "What's the worst stretch I'd have had to survive — and could I actually do it?"
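Here is a minimal sketch of how max drawdown can be computed from an equity curve. The idea is to track the running peak and record the deepest decline below it. The function name and the sample curve are hypothetical. The curve was chosen only so that it reproduces a 72% drawdown.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline as a positive fraction (0.72 means -72%)."""
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)          # highest value seen so far
        drawdown = 1.0 - value / peak    # decline from that running peak
        worst = max(worst, drawdown)
    return worst


# Hypothetical equity curve: rises to 200, falls to 56, then recovers to a new high.
curve = [100, 150, 200, 120, 56, 90, 210]
print(f"Max drawdown: {max_drawdown(curve):.0%}")  # 72%
```

The curve ends at a new high, so total return and CAGR look fine. The 72% drop along the way is still there, and only this metric exposes it.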
The math gets brutal at high drawdown levels:
- 50% drawdown requires 100% gain to recover
- 60% drawdown requires 150% gain
- 70% drawdown requires 233% gain
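These recovery figures come from one line of algebra. After losing a fraction d, you are left with (1 - d) of the peak, so you need a gain of 1 / (1 - d) - 1 to get back. Here is a small illustrative helper (the name is our own):

```python
def required_recovery_gain(drawdown: float) -> float:
    """Gain needed to return to the previous peak after a drawdown.

    After losing a fraction d, the account sits at (1 - d) of its peak,
    so it must grow by 1 / (1 - d) - 1 to recover.
    """
    if not 0.0 <= drawdown < 1.0:
        raise ValueError("drawdown must be in [0, 1)")
    return 1.0 / (1.0 - drawdown) - 1.0


for dd in (0.50, 0.60, 0.70, 0.72):
    print(f"{dd:.0%} drawdown -> {required_recovery_gain(dd):.0%} gain to recover")
```

For the 72% drawdown in the example above, you would need a gain of about 257% just to get back to the peak.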
High CAGR with high drawdown isn't a great strategy. It's a strategy that works on paper but destroys discipline in practice.
The question to ask before trusting any backtest: "Could I realistically hold through the worst drawdown this strategy has ever produced?"
If the answer is no — the CAGR doesn't matter. You'd never capture it.
3. Question 3: Does It Beat the Right Benchmark?
This is where most "winning" strategies quietly fall apart.
The standard benchmark is Buy & Hold — buy on day one, hold forever. Simple, no decisions, no fees.
But standard Buy & Hold has a hidden problem: it assumes you invested everything on the exact start date of the backtest. Few real investors do that.
If your backtest starts in January 2019, Buy & Hold bought Bitcoin at around $3,500, near a cycle low. From that entry, Buy & Hold is very hard to beat, so almost any active strategy looks weak by comparison.
If your backtest starts in November 2021, Buy & Hold bought at $65,000. Suddenly every strategy looks like a genius by comparison.
Same strategy. Same parameters. Completely different conclusion — depending on the start date.
A more honest benchmark is what we call Average B&H — the average CAGR across all possible entry points that had enough time remaining. Not the lucky entry. Not the worst entry. The realistic entry that most investors would actually experience.
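Here is one possible way to compute an Average B&H figure. It is a sketch under explicit assumptions:

- Each entry is held until the last price in the series.
- "Enough time remaining" is modeled as a hypothetical `min_years` cut-off.
- All eligible entries are weighted equally.

Backtesting Arena's exact method may differ. The function and parameter names are ours.

```python
def average_buy_and_hold_cagr(
    prices: list[float],
    periods_per_year: float = 365.0,
    min_years: float = 1.0,
) -> float:
    """Average CAGR of buying at each eligible entry and holding to the last price.

    An entry is eligible only if at least `min_years` remain until the end of
    the series. This cut-off rule is an assumption made for illustration.
    """
    last = len(prices) - 1
    min_periods = max(1, int(min_years * periods_per_year))  # at least one period held
    cagrs = []
    for entry in range(last - min_periods + 1):  # entries with enough time left
        years = (last - entry) / periods_per_year
        cagrs.append((prices[last] / prices[entry]) ** (1.0 / years) - 1.0)
    if not cagrs:
        raise ValueError("series too short for the chosen min_years")
    return sum(cagrs) / len(cagrs)


# Three yearly prices: a day-one buyer earns ~41% CAGR, a year-one buyer loses 50%.
print(f"{average_buy_and_hold_cagr([100, 400, 200], periods_per_year=1, min_years=1):.1%}")
```

Take three yearly prices of 100, 400 and 200. Buying on day one gives about 41% CAGR. Averaging over both eligible entries (day one and year one) gives roughly -4%. The data is the same, but the benchmark is very different.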
Your strategy should beat Average B&H, not just a single Buy & Hold run whose start date happens to flatter the comparison.
And then there's DCA — Dollar-Cost Averaging, investing a fixed amount at regular intervals. This is what most investors actually do. If your active strategy doesn't beat steady, boring DCA over a meaningful period, the complexity isn't worth it.
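Here is a minimal DCA sketch. It assumes a fixed amount is invested at every price in the series, fractional units are allowed, and there are no fees. It reports the total invested and the final value. Turning DCA into a CAGR-style figure takes a money-weighted return (such as an internal rate of return), because contributions arrive at different times. This sketch deliberately stops short of that. The function name is our own.

```python
def dca_result(prices: list[float], amount_per_buy: float) -> tuple[float, float]:
    """Invest `amount_per_buy` at every price; return (total invested, final value).

    Assumes each price is one scheduled buy date, fractional units, and no fees.
    """
    if not prices:
        raise ValueError("prices must not be empty")
    units = sum(amount_per_buy / p for p in prices)  # more units bought when cheap
    invested = amount_per_buy * len(prices)
    return invested, units * prices[-1]


invested, final_value = dca_result([100, 50, 100], amount_per_buy=100)
print(invested, final_value)
```

Here 100 is invested at prices of 100, 50 and 100, which accumulates 4 units. The 300 invested becomes 400, even though the price ended exactly where it started.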
Three benchmarks. Three questions:
- Does it beat Buy & Hold (day one)?
- Does it beat Average B&H (realistic entry)?
- Does it beat DCA (what most people actually do)?
A strategy that passes all three is genuinely interesting.
4. Question 4: How Many Trades Is This Based On?
This is the one most people skip — and it's often the most important.
Statistical significance requires enough data points to distinguish edge from luck.
A rough guide:
- Under 10 trades: essentially meaningless
- 10–30 trades: weak signal, treat with serious caution
- 30–50 trades: starting to be meaningful
- 50+ trades: solid foundation for evaluation
Here's why it matters: with 5 trades, a coin-flip strategy shows a win rate of 80% or higher almost 19% of the time by pure chance. With 50 trades, luck has far less room to fake an edge. With 100+ trades, what you see is much more likely to be actual edge, or the absence of it.
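The binomial distribution puts numbers on this. It gives the probability that a pure coin flip reaches at least a given number of wins, assuming a 50% win chance per trade and independent trades. This sketch uses only the standard library. The function name is our own.

```python
from math import comb


def prob_at_least_k_wins(n_trades: int, min_wins: int, p: float = 0.5) -> float:
    """Binomial tail: P(at least `min_wins` wins in `n_trades` independent trades).

    With p = 0.5 this models a strategy with no edge at all (a coin flip).
    """
    return sum(
        comb(n_trades, k) * p**k * (1 - p) ** (n_trades - k)
        for k in range(min_wins, n_trades + 1)
    )


# Chance that a coin flip shows an 80%+ win rate purely by luck.
for n in (5, 50, 100):
    wins_needed = n * 4 // 5  # 80% of n, exact for these values of n
    print(f"{n:>3} trades: {prob_at_least_k_wins(n, wins_needed):.2e}")
```

For 5 trades and at least 4 wins (an 80% win rate), this gives 0.1875. Even at 50 trades, a coin flip still reaches a 60% win rate (30 wins) roughly 10% of the time. That is why more trades keep helping.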
The trap with complex strategies: more conditions = fewer signals = fewer trades = less validity.
A strategy that requires RSI below 30 AND price above the 200 SMA AND Fear & Greed below 25 AND a green weekly close might only generate 4 signals in 5 years. That's not a backtest. That's four data points with a very compelling story attached.
Before trusting any backtest result: how many trades is this based on?
Under 30? It's an anecdote, not evidence.
5. Putting It Together
The four questions, in order:
- CAGR — not total return. Makes timeframes comparable.
- Max drawdown — could you realistically survive the worst stretch?
- Benchmark comparison — does it beat Buy & Hold, Average B&H, and DCA?
- Trade count — is there enough data to trust the result?
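The checklist can also be written as a small function. This is a sketch with hypothetical names and illustrative numbers:

- The drawdown threshold is whatever you personally could sit through.
- The 30-trade minimum follows the rough guide above.
- All benchmark figures are assumed to be CAGRs over the same period as the strategy.

```python
from dataclasses import dataclass


@dataclass
class BacktestSummary:
    """Hypothetical container; all returns are annualized fractions (CAGR)."""
    cagr: float               # strategy CAGR (Question 1: CAGR, not total return)
    max_drawdown: float       # worst peak-to-trough decline, e.g. 0.72
    buy_hold_cagr: float      # Buy & Hold from day one
    avg_buy_hold_cagr: float  # Average B&H across eligible entry points
    dca_cagr: float           # annualized (money-weighted) DCA return
    trade_count: int


def four_question_check(
    s: BacktestSummary,
    max_tolerable_drawdown: float,
    min_trades: int = 30,
) -> dict[str, bool]:
    """Return a pass/fail flag for each check; a convincing result passes all."""
    return {
        "drawdown survivable": s.max_drawdown <= max_tolerable_drawdown,
        "beats Buy & Hold": s.cagr > s.buy_hold_cagr,
        "beats Average B&H": s.cagr > s.avg_buy_hold_cagr,
        "beats DCA": s.cagr > s.dca_cagr,
        "enough trades": s.trade_count >= min_trades,
    }


# Illustrative numbers only: a 34% CAGR strategy with a 72% drawdown and 12 trades.
example = BacktestSummary(
    cagr=0.34, max_drawdown=0.72, buy_hold_cagr=0.41,
    avg_buy_hold_cagr=0.25, dca_cagr=0.22, trade_count=12,
)
for check, passed in four_question_check(example, max_tolerable_drawdown=0.40).items():
    print(f"{check:20} {'pass' if passed else 'FAIL'}")
```

In this made-up example the strategy beats Average B&H and DCA. It still fails on drawdown, on day-one Buy & Hold and on trade count, so the headline CAGR alone would have been misleading.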
A backtest that passes all four isn't guaranteed to work in the future. No backtest is. But it's a result worth taking seriously — and it's a dramatically higher bar than most trading content ever clears.
Every backtest on Backtesting Arena shows all four metrics automatically — including Average B&H and trade count. No credit card required. → tradingstrategies.work
Study the past, improve your future.
In the next post: we apply these metrics to real strategies on real data — starting with the Golden Cross, one of the most cited and least tested signals in crypto trading.
FAQ:
Question: What's the difference between CAGR and total return?
Answer: Total return is the simple cumulative gain over the entire period (e.g., "the strategy returned 450%"). It tells you nothing about how long that took. CAGR (Compound Annual Growth Rate) is the constant yearly rate that, compounded over the same number of years, produces the same end result. It makes backtests over different time ranges directly comparable. Anyone showing only total return is either hiding something or doesn't know what they're doing.
Question: Why isn't a simple Buy & Hold comparison enough?
Answer: Because the benchmark depends massively on the start date. A backtest starting in January 2019 compares against buying Bitcoin at around $3,500, a near-bottom entry that makes Buy & Hold very hard to beat, so almost any strategy looks weak against it. A backtest starting in November 2021 compares against buying at $65,000, which makes almost any strategy look like a genius. Average B&H averages across many realistic entry points and is therefore a much more meaningful benchmark.
Question: When is the trade count statistically sufficient?
Answer: Under 10 trades, a backtest is essentially meaningless; even a coin-flip strategy can show a high win rate by chance. Between 30 and 50 trades, the result starts becoming meaningful. At 50+ trades, you have a solid foundation. Watch out for complex strategies with many conditions: they often produce only a handful of signals over multiple years. That is too few to evaluate statistically, no matter how compelling the story around it sounds.
Question: What is Average B&H and where do I find it?
Answer: Average B&H is the average CAGR across all possible entry points within the backtest period that had enough remaining time for a meaningful evaluation. The metric removes the distortion caused by a randomly favorable or unfavorable start date. Backtesting Arena shows this value by default for every backtest, alongside regular Buy & Hold and DCA. A strategy should ideally beat all three benchmarks.