We audited 40,000 of our own backtests — and benched several of our own strategies in the process. Here is what we killed, what we reactivated, and the uncomfortable core lesson: whether a strategy is "good" depends less on the strategy than on the asset class it runs in.

Most trading platforms show you which strategies work. We also show you which of our own do not — and we shut them off. After roughly 40,000 backtests on the platform, that is exactly what we did.

This is not a marketing post. It is an autopsy of our own strategy lineup.

The data

We analyzed roughly 26,000 comparable backtest runs (at least 5 trades, so it is evidence and not anecdote), across all asset classes, measured against buy-and-hold. Every claim below comes from those runs, not from gut feeling.

And the first finding was the most uncomfortable.

The core lesson: quality depends on the asset class, not the strategy

Break our strategies down by asset class and the picture splits cleanly into two worlds:

Asset class	Share of runs beating single B&H	Interpretation
Crypto	~65%	Almost everything beats holding
Stocks / ETFs / Forex / Commodities	~16%	Almost nothing beats holding
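To make the breakdown concrete, here is a minimal sketch of how such a share could be computed. The `Run` record and its field names are hypothetical, not the platform's actual schema, and the sample runs are made up for illustration. Only the minimum of 5 trades comes from the description above.

```python
from collections import defaultdict
from dataclasses import dataclass

MIN_TRADES = 5  # comparability threshold stated in the article


@dataclass
class Run:
    # Hypothetical record layout, for illustration only.
    asset_class: str        # e.g. "crypto", "stocks"
    trades: int             # number of trades in the run
    strategy_return: float  # total strategy return, in percent
    bh_return: float        # buy-and-hold return over the same window, in percent


def beat_share_by_asset_class(runs):
    """Return {asset_class: share of comparable runs that beat buy-and-hold}."""
    wins = defaultdict(int)
    totals = defaultdict(int)
    for run in runs:
        if run.trades < MIN_TRADES:
            continue  # too few trades: anecdote, not evidence
        totals[run.asset_class] += 1
        if run.strategy_return > run.bh_return:
            wins[run.asset_class] += 1
    return {ac: wins[ac] / totals[ac] for ac in totals}


# Made-up sample runs, not platform data.
runs = [
    Run("crypto", 12, 80.0, 50.0),
    Run("crypto", 30, 20.0, 45.0),
    Run("crypto", 3, 500.0, 50.0),   # dropped: fewer than 5 trades
    Run("stocks", 8, 60.0, 110.0),
    Run("stocks", 15, 130.0, 110.0),
]
shares = beat_share_by_asset_class(runs)
```

Note that the spectacular 3-trade run never reaches the denominator. That is the point of the comparability filter.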

This is not about the strategies. It is about the window: 2015–2026 was an almost uninterrupted bull market for stocks and ETFs. Against a market like that, any strategy that sits in cash at times loses — no matter how clever its signal. In crypto, with its brutal cycles, stepping aside pays off.

That has a direct consequence for curation: we cannot call a strategy "bad" just because it fails to beat buy-and-hold on stocks. We have to look closer.

What we shut off

For us, shutting off means: out of the backtester and the default selection, but the data stays — the entry remains visible in the Strategy Library, with the verdict and the numbers that led to the shutdown.

bb_squeeze — fully shut off. The weakest strategy on the platform. It beats buy-and-hold on only 11 of 32 pairs, and even on crypto (home turf) only 11 of 28. We did not just condemn it — we prototyped a modern TTM squeeze rebuild. It reached +8.6% vs B&H, so the concept works. But it stayed clearly below our existing trend strategies (ema_cross +24%, supertrend +23%, both with fewer trades). A rebuild would have been a mid-pack duplicate — so deliberately no v2.

fibonacci — shut off as curation. Tricky one: per pair it nominally beats buy-and-hold by a hair (+1.9 percentage points). But that edge is propped up by dead alt-coins — there, B&H averages −9.7%, and the strategy only "wins" by sitting in cash, not through a real edge. On liquid majors it loses. Here too we tried to fix it: the fixed 1.272 take-profit was a bottleneck; a trailing exit lifted the result by about 20 percentage points (from −29.5 to −9.6). But even repaired it does not clear buy-and-hold on majors. Pre-cost breakeven at many trades means a loss after costs. Ballast.
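The difference between a fixed take-profit and a trailing exit can be sketched as follows. This is an illustration under assumptions, not the fibonacci strategy's actual exit code: the target is passed in as a plain price level, the 10% trail width is arbitrary, and the price path is invented.

```python
def exit_at_fixed_target(prices, target):
    """Exit as soon as price reaches the target; assume the fill is at the target."""
    for price in prices:
        if price >= target:
            return target
    return prices[-1]  # target never hit: exit at the last price


def exit_with_trailing_stop(prices, trail_fraction):
    """Exit once price falls trail_fraction below the highest price seen so far."""
    peak = prices[0]
    for price in prices:
        peak = max(peak, price)
        if price <= peak * (1.0 - trail_fraction):
            return price
    return prices[-1]  # stop never hit: exit at the last price


# Invented price path after an entry at 100.
prices = [100.0, 112.0, 128.0, 145.0, 160.0, 150.0, 140.0]
fixed_exit = exit_at_fixed_target(prices, target=127.2)
trailing_exit = exit_with_trailing_stop(prices, trail_fraction=0.10)
```

On this path the fixed target caps the trade early, while the trailing stop rides the move and gives back only the final pullback. On a choppy path the ranking can reverse, which is why the repair had to be measured rather than assumed.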

obv_macd (v1) — shut off, replaced by v2. Not a performance verdict, but versioning. v2 corrects the flip timing (fills at the flip-bar close instead of one bar later, matching the live signal and the traffic light). If you want OBV-MACD, you take v2.
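The timing difference can be illustrated with a small sketch. The function name, the 0/1 signal encoding, and the sample bars are assumptions made for this example, not the obv_macd implementation.

```python
def fills_on_flips(closes, signals, delay_bars):
    """Return (bar_index, fill_price) for every signal flip.

    delay_bars=0 fills at the close of the flip bar (the v2 behaviour described
    above); delay_bars=1 fills one bar later (the v1 behaviour).
    """
    fills = []
    for i in range(1, len(signals)):
        if signals[i] != signals[i - 1]:
            j = i + delay_bars
            if j < len(closes):  # skip flips whose delayed fill has no bar yet
                fills.append((j, closes[j]))
    return fills


# Invented bars: the signal flips long at bar 2 and flat at bar 4.
closes = [100.0, 101.0, 105.0, 104.0, 98.0, 97.0]
signals = [0, 0, 1, 1, 0, 0]

v2_fills = fills_on_flips(closes, signals, delay_bars=0)
v1_fills = fills_on_flips(closes, signals, delay_bars=1)
```

Same signal, different fill bars. A backtest that fills one bar late reports prices the live signal never traded at, which is the mismatch v2 removes.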

stoch_rsi_sma — shut off after costs. This was the most uncomfortable one, because by usage it is our second-most-popular strategy. Before fees it looks viable: an average +9.2% CAGR. But it trades an average of 558 times per run — and every trade costs. With a realistic per-side fee (0.10% crypto / 0.05% stocks) it collapses to an average +1.1% (median −2.1%), 16 of 28 tested cells turn net-negative, and in 0 of 28 does it beat buy-and-hold after costs. The code is correct — it is pure turnover erosion. We built a dedicated net-of-cost layer for this, one that re-prices every fill with a fee.
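The arithmetic behind turnover erosion is simple enough to sketch. This is not the platform's net-of-cost layer, just a minimal model that assumes every trade is a full round trip (one entry fill, one exit fill) deploying the whole equity. The fee levels are the ones quoted above; the function names are made up.

```python
CRYPTO_FEE_PER_SIDE = 0.0010  # 0.10% per side, as quoted in the article
STOCK_FEE_PER_SIDE = 0.0005   # 0.05% per side, as quoted in the article


def net_of_cost_return(gross_trade_returns, fee_per_side):
    """Compound per-trade gross returns, charging the fee on entry and on exit.

    Assumes each trade deploys the full equity (an illustrative simplification).
    """
    equity = 1.0
    for gross in gross_trade_returns:
        equity *= 1.0 - fee_per_side  # fee on the entry fill
        equity *= 1.0 + gross         # gross price move of the trade
        equity *= 1.0 - fee_per_side  # fee on the exit fill
    return equity - 1.0


def fee_drag(round_trips, fee_per_side):
    """Fraction of equity left after fees alone, with zero gross edge."""
    return (1.0 - fee_per_side) ** (2 * round_trips)


# 558 round trips, the average trade count reported for stoch_rsi_sma.
drag_crypto = fee_drag(558, CRYPTO_FEE_PER_SIDE)
drag_stocks = fee_drag(558, STOCK_FEE_PER_SIDE)
```

Under these assumptions, fees alone leave roughly a third of the equity on crypto and a bit under 60% on stocks over 558 round trips. The gross edge has to earn all of that back before the strategy even breaks even.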

What we did NOT shut off — and why that matters just as much

The obvious conclusion from "non-crypto almost never beats B&H" would be: kill half the non-crypto lineup. We checked — and decided against it. That is the second half of the honesty.

When we added drawdown to the analysis, the picture flipped: almost all non-crypto strategies are drawdown protectors. They deliver 15 to 45 percentage points shallower drawdown than buy-and-hold — despite lower returns. They are not return machines, they are risk reducers. A blanket mass-shutdown would have been a mistake.
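For readers who want to reproduce the comparison, here is a minimal sketch of maximum drawdown and of the gap in percentage points. The equity curves are invented, and the function name is not taken from the platform.

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a positive fraction of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


# Invented equity curves: buy-and-hold ends higher but falls much further on the way.
bh_curve = [100.0, 180.0, 90.0, 220.0]
strategy_curve = [100.0, 140.0, 112.0, 160.0]

bh_dd = max_drawdown(bh_curve)              # 180 -> 90
strategy_dd = max_drawdown(strategy_curve)  # 140 -> 112
shallower_by_pp = (bh_dd - strategy_dd) * 100.0
```

In this toy case the strategy returns less than buy-and-hold but its worst decline is about 30 percentage points shallower, the same shape as the pattern described above.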

The same logic rescued one reactivated strategy and one filter:

capitulation_v1 reactivated on crypto. We had judged it too harshly before (on an average rather than a median basis). The 40k data show, on crypto: 60% of pairs beat B&H, a +8.7% median edge, and about 40 percentage points shallower drawdown. It stays off only on non-crypto.
The ATR volatility filter stays. In aggregate it beats B&H only 42% of the time — that looked like a shutdown candidate. But it is not a return booster, it is a drawdown tool: it raises drawdown protection on crypto from +34 to +52 percentage points. Judge it by hit rate and you measure the wrong thing.
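Why the choice between average and median matters for a verdict like the one on capitulation_v1 can be shown with a toy example. The per-pair edges below are invented, not the platform's numbers, and `summarize_edges` is a hypothetical helper.

```python
from statistics import mean, median


def summarize_edges(edges_vs_bh):
    """Summarize per-pair edges over buy-and-hold, in percentage points."""
    return {
        "mean": mean(edges_vs_bh),
        "median": median(edges_vs_bh),
        "share_beating_bh": sum(e > 0 for e in edges_vs_bh) / len(edges_vs_bh),
    }


# Invented edges for five pairs: three modest wins, one small loss, one blowout.
summary = summarize_edges([12.0, 9.0, 8.0, -3.0, -120.0])
```

Here a single outlier pulls the mean deep into negative territory while the median and the share of winning pairs stay positive. Which number you read first decides whether the strategy looks dead or alive.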

The tools this produced

A curation is only as honest as its yardstick. We built two things for that:

A net-of-cost layer that evaluates every strategy after realistic trading fees. It was what tipped stoch_rsi_sma over, and more generally it shows that "beats buy-and-hold" is only half the truth before fees: buy-and-hold buys in once and pays fees once, while an active strategy pays on every trade.
A risk-adjusted scorecard, now the default lens for non-crypto assets: instead of just return vs buy-and-hold, it shows how much of buy-and-hold's upside a strategy captures and how much drawdown it saves. On a bull-market asset, that is the more honest question.
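One way to read the two scorecard questions as numbers is sketched below. The definitions are assumptions for illustration (upside capture as the ratio of strategy return to buy-and-hold return, drawdown saved as a difference in percentage points); the platform's exact formulas are not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scorecard:
    upside_capture: Optional[float]  # strategy return / B&H return; None if B&H <= 0
    drawdown_saved_pp: float         # B&H max drawdown minus strategy max drawdown


def risk_adjusted_scorecard(strategy_return_pct, bh_return_pct,
                            strategy_max_dd_pct, bh_max_dd_pct):
    """Build a scorecard from summary stats; drawdowns are positive magnitudes in percent."""
    if bh_return_pct > 0:
        capture = strategy_return_pct / bh_return_pct
    else:
        capture = None  # no upside to capture, so the ratio is not meaningful
    return Scorecard(capture, bh_max_dd_pct - strategy_max_dd_pct)


# Invented numbers: the strategy keeps 75% of the upside with a far shallower drawdown.
card = risk_adjusted_scorecard(
    strategy_return_pct=90.0, bh_return_pct=120.0,
    strategy_max_dd_pct=20.0, bh_max_dd_pct=55.0,
)
```

A strategy like this one "loses" to buy-and-hold on return alone, yet the scorecard shows what the lost return bought.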

What this means for you

Three practical takeaways:

Always ask about the asset class. A strategy that shines on crypto can lose structurally on stocks — not because it is bad, but because the market breathes differently.
Always ask about the fees. A high hit rate at high trade frequency can be a losing business after costs.
Return is not the only axis. In strongly trending markets, drawdown protection is often the real value of an active strategy — not the extra return.

What Backtesting Arena contributes here

We treat shutting off our own strategies as part of the product, not an embarrassing slip. Every killed strategy stays visible with its verdict and its numbers. The lesson of why something does not work is often worth more to a user than one more strategy that looks good gross.

This is not an "only honest platform" claim. It is simply the way of working we believe is right: measure, judge honestly, and show the result — even when it makes our own lineup smaller.

FAQ

Aren't you breaking your own product by doing this? The opposite. A roster of 24 strategies, several of which are ballast after costs, is weaker than a curated set you can trust. We narrow the lineup down to the robust ones, and the data for the shut-off ones stays visible.

Does "non-crypto almost never beats B&H" mean active trading on stocks is pointless? No. It means the yardstick "does it beat buy-and-hold's return" is unfair in a persistent bull market. The right yardstick is risk-adjusted: how much upside do you capture, and how much drawdown do you save? That is exactly why our scorecard is now the default view on non-crypto.

Why was stoch_rsi_sma so popular if it loses after costs? Because the pre-cost backtest rewarded it. Like most backtest engines, ours computed results without fees for a long time. That is exactly the gap the net-of-cost layer closes. Popularity here was an artifact of the measurement, not proof of performance.

Do shut-off strategies ever come back? If someone presents a repaired, validated version that sits above buy-and-hold after costs — yes. capitulation_v1 is proof that we correct in the other direction too: it was reactivated when better data showed the first verdict was too harsh. Data first, then decide.

Aren't your fee assumptions arbitrary? They are conservative but realistic: 0.10% per side matches the Binance spot taker fee, and 0.05% is a realistic proxy for the spread on stocks. They are assumptions, not constants of nature, but they hit the right order of magnitude, and slippage (which would only enlarge the effect) is deliberately not yet included.

Backtesting Arena

40,000 Backtests Later: Which of Our Own Strategies We're Killing