A 0.70-AUC Crypto Model That Lost Money on Every Coin (AUC Is Not P&L)
I trained machine-learning models that genuinely predict which way a crypto candle goes — ~0.70 AUC, out-of-sample, walk-forward, across 16 coins. Then I traded the signal and lost money on every single one.
The gap between those two sentences is the most important lesson in quant trading, and it cost me a week of GPU time to relearn it.
The setup
Per-coin XGBoost. 5-minute bars. ~3 years of history per symbol. Proper walk-forward validation: train on a window, test on the next one, roll forward, never peek at the future. Trained on a P100. 16 coins. The label: does the next move clear a volatility-scaled threshold?
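For readers who want the mechanics, here is a minimal sketch of the two pieces described above: rolling walk-forward windows and a volatility-scaled label. It is not my training code. The function names, the window sizes, the multiplier `k`, and the choice to label only upward moves are all assumptions made for illustration.

```python
from typing import Iterator, Sequence, Tuple


def walk_forward_windows(
    n_bars: int, train_size: int, test_size: int
) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges; each test window starts right after its train window."""
    start = 0
    while start + train_size + test_size <= n_bars:
        train = range(start, start + train_size)
        test = range(start + train_size, start + train_size + test_size)
        yield train, test
        start += test_size  # roll forward by one test window


def vol_scaled_label(
    closes: Sequence[float], i: int, horizon: int, vol_lookback: int, k: float
) -> int:
    """Return 1 if the forward return over `horizon` bars exceeds k * trailing volatility.

    Assumes vol_lookback <= i and i + horizon < len(closes).
    Only data up to bar i is used for the volatility estimate.
    """
    rets = [closes[j] / closes[j - 1] - 1.0 for j in range(i - vol_lookback + 1, i + 1)]
    mean = sum(rets) / len(rets)
    vol = (sum((r - mean) ** 2 for r in rets) / len(rets)) ** 0.5
    fwd = closes[i + horizon] / closes[i] - 1.0
    return 1 if fwd > k * vol else 0
```

The property that matters is that every test index is strictly later than every train index in the same window, and that the label's threshold is computed only from bars at or before the decision bar.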
The part that looked great
By every machine-learning metric, this was a win:
- ETH: mean out-of-sample AUC 0.70 across 29 walk-forward windows. 29 of 29 windows scored above 0.55.
- Across the basket: 16 of 16 coins held AUC ≥ 0.55, and zero collapsed below 0.52.
- I vol-normalized the labels to check the model wasn't just secretly predicting volatility regimes instead of direction — the AUC barely moved (0.697 → 0.703). There is real directional signal in there.
0.70 AUC out-of-sample, surviving walk-forward and a vol-normalization sanity check, on 16 coins. If you stopped at the model card, you'd ship it. I almost did.
The part where it lost money on everything
Then I ran a fee-aware replay — actually trading the signal with realistic costs and slippage. Best config per coin:
| Coin | Best-config net P&L |
|---|---|
| BTC | −$236.67 |
| ETH | −$212.64 |
| XRP | −$111.04 |
| SHIB | −$96.04 |
| SUSHI | −$89.00 |
(Backtest replay, summed across thousands of simulated trades on ~$1 risk units — the signal is that every config is red, not the absolute figure. Other configs were far worse; ETH at higher frequency hit −$6,100.)
Every coin. Every configuration. And the one summary number that ended the project: pct_win_windows = 0.0000. Across all the walk-forward windows, zero were net profitable. Not "mostly losing." Zero for five coins, zero for twenty-nine windows.
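A summary number like this takes a few lines to compute. The sketch below assumes `pct_win_windows` means "share of walk-forward windows whose net P&L is strictly positive"; that definition and the function signature are my shorthand here, not a library API.

```python
from typing import Sequence


def pct_win_windows(window_net_pnl: Sequence[float]) -> float:
    """Fraction of walk-forward windows with strictly positive net-of-fees P&L."""
    if not window_net_pnl:
        raise ValueError("need at least one window")
    wins = sum(1 for pnl in window_net_pnl if pnl > 0)
    return wins / len(window_net_pnl)
```

Feed it one net P&L figure per test window, after fees and slippage. A value of 0.0 means no window finished in the green, which is a much harsher statement than a negative total.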
Why a 0.70-AUC model loses money
Two numbers explain the whole thing.
1. AUC measures direction. It says nothing about magnitude. The model was right about which way ~70% of the time. But the moves it was right about averaged ~0.23% gross per trade. Round-trip fees plus slippage ran 0.2%–0.5%, so at the cheap end a correct call netted about 0.03%, and anywhere above that it lost money even when it was right. You can be reliably correct about the direction of a move that is too small to capture profitably. At 10 bps fees the post-cost hit rate was ~24%; at 25 bps it fell to ~10%. A 2.5x fee increase cut the post-cost hit rate by more than half. The entire "edge" was living inside the bid-ask spread.
2. 93–99.7% of trades exited on a timeout, not a target. Almost nothing ever reached the take-profit. The model said "up," the price drifted a hair, and the position aged out at the time stop for a small post-fee loss. A signal that's directionally right but magnitude-tiny produces exactly this: thousands of trades that each bleed a few basis points to friction.
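The arithmetic behind those two points fits in a short script. This is a back-of-the-envelope sketch, not the replay itself. The hit rate, the 0.23% average win, and the 0.2%–0.5% cost range come from the numbers above. The assumption that losing trades are the same size as winning ones is mine, and the function names are hypothetical.

```python
def expected_net_pct(
    hit_rate: float, avg_win_pct: float, avg_loss_pct: float, round_trip_cost_pct: float
) -> float:
    """Expected net return per trade, in percent of notional."""
    gross = hit_rate * avg_win_pct - (1.0 - hit_rate) * avg_loss_pct
    return gross - round_trip_cost_pct


def breakeven_cost_pct(hit_rate: float, avg_win_pct: float, avg_loss_pct: float) -> float:
    """Round-trip cost at which the expected net return per trade is exactly zero."""
    return hit_rate * avg_win_pct - (1.0 - hit_rate) * avg_loss_pct


HIT_RATE = 0.70  # the "~70%" directional hit rate quoted above
AVG_WIN = 0.23   # the ~0.23% average gross move on correct calls
AVG_LOSS = 0.23  # ASSUMPTION: losing trades are the same size as winning ones

for cost in (0.20, 0.35, 0.50):  # the 0.2%-0.5% round-trip cost range
    net = expected_net_pct(HIT_RATE, AVG_WIN, AVG_LOSS, cost)
    print(f"cost {cost:.2f}% -> expected net {net:+.3f}% per trade")

print(f"break-even cost: {breakeven_cost_pct(HIT_RATE, AVG_WIN, AVG_LOSS):.3f}%")
```

Under that symmetric-loss assumption the break-even round-trip cost comes out near 0.09%, which is below even the low end of the cost range. Every cost level in the loop produces a negative expectation per trade.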
The lesson that actually matters
AUC is not P&L. Predictive power and profitability are different axes. A model can be genuinely, robustly predictive and still be worthless to trade, because:
- It predicts the sign of a move, not whether that move clears your costs.
- Out-of-sample AUC validates the model. It does not validate the strategy. The only thing that validates a strategy is a fee-aware replay with realistic slippage — and that's the test almost no "I built an ML trading bot 🤖🚀" post ever shows you.
If your backtest reports AUC, accuracy, or a confusion matrix but not a net-of-fees equity curve, you don't yet know whether you have a strategy. You have a science project.
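To make the difference concrete, here is a toy sketch that scores the same predictions two ways: a rank-based AUC and a net-of-fees equity curve. The data is invented for illustration, the entry rule (go long when the score exceeds a threshold) is an assumption, and both function names are hypothetical. The AUC is computed by brute-force pair counting so the block needs no external libraries.

```python
from typing import List, Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def net_equity_curve(
    scores: Sequence[float],
    fwd_returns_pct: Sequence[float],
    threshold: float,
    cost_pct: float,
) -> List[float]:
    """Cumulative net return (percent) from going long whenever score > threshold."""
    equity, curve = 0.0, []
    for s, r in zip(scores, fwd_returns_pct):
        if s > threshold:
            equity += r - cost_pct  # pay the round-trip cost on every trade taken
            curve.append(equity)
    return curve


# Invented toy data: small moves, mostly ranked correctly by the score.
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
fwd_returns_pct = [0.20, 0.25, -0.20, 0.20, -0.25, -0.20]
labels = [1 if r > 0 else 0 for r in fwd_returns_pct]

print("AUC:", round(auc(scores, labels), 3))
print("net equity, 0.30% cost:", net_equity_curve(scores, fwd_returns_pct, 0.5, 0.30))
print("net equity, zero cost: ", net_equity_curve(scores, fwd_returns_pct, 0.5, 0.0))
```

On this toy set the AUC is high and the zero-cost curve ends positive, yet the curve with a 0.30% round-trip cost ends negative. The AUC never changes when you change the fee, which is exactly why it cannot validate a strategy.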
What I did with it
I didn't deploy it. Zero for five is not a close call.
And then — because lessons are cheap until they're expensive — I bet that better features (live bonding-curve data on brand-new memecoins) would clear the cost bar where 5-minute bars couldn't. They produced a real AUC lift in backtest, too. On real money, the memecoin sniper hit the exact same wall.
So that's three bots and three different metrics that looked like edge — win rate, AUC, and backtest signal — and the same answer every time: a number that looks like an edge right up until it meets the cost of capturing it. The only strategy of mine that actually makes money is the boring uniform grid with no clever model in it at all. There's a lesson in that, too.
You can watch all of it run — the winners and the losers — on real money, live.