31 Strategies Tested, 4 Survived

The complete results of testing 31 systematic trading strategies across forex and crypto. What worked, what failed, and the patterns that separate robust strategies from curve-fitted illusions.

Tags: backtesting, strategy, results

The Graveyard Is the Real Dataset

Every quant team publishes their winners. Nobody publishes the 87% that didn’t make it.

We think the failures are more valuable than the successes. If you only study strategies that work, you’re training on survivorship bias. You’ll learn what correlates with success in a biased sample, not what actually causes it.

So here’s the full list. Thirty-one strategies. Four survivors. And the patterns that separate them.

Figure: Strategy graveyard, all results, tick-verified where possible. Every strategy we tested with its verdict. Two survived (green). Seven died on real data (red). The breakeven line (PF=1.0) is where hope goes to die.

Figure: Risk:Reward sweep across all signal types. Win rate vs expectancy across R:R ratios. Higher R:R improves the payoff math but increases variance. For prop firm use, a strategy needs WR > 50% and low drawdown.

The Testing Framework

Before the results, the methodology. Every strategy was tested with the same framework to ensure comparability:

import pandas as pd
import numpy as np
from dataclasses import dataclass

@dataclass
class BacktestConfig:
    """Standardized backtest configuration for all strategies."""
    initial_capital: float = 100_000
    risk_per_trade: float = 0.01           # 1% risk per trade
    max_positions: int = 1                  # Single position at a time
    slippage_pips: float = 1.0             # 1 pip slippage per side
    commission_per_lot: float = 7.0         # $7 round trip per standard lot
    in_sample_end: str = '2023-12-31'       # IS/OOS split
    min_trades: int = 100                   # Minimum trades for validity
    max_param_sensitivity: float = 0.30     # Max PF drop in sensitivity test

def survival_criteria(results: dict, config: BacktestConfig) -> dict:
    """
    A strategy survives if it passes ALL criteria.
    """
    checks = {
        'enough_trades': results['n_trades'] >= config.min_trades,
        'profitable_is': results['profit_factor_is'] > 1.2,
        'profitable_oos': results['profit_factor_oos'] > 1.1,
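        # OOS/IS profit factor ratio; > 0.7 means less than 30% degradation
        'oos_degradation': (
            results['profit_factor_oos'] / results['profit_factor_is'] > 0.7
        ),
        'param_robust': results['param_sensitivity'] < config.max_param_sensitivity,
        # max_drawdown is a negative fraction of equity (e.g. -0.15 for a 15% drawdown)
        'reasonable_drawdown': results['max_drawdown'] > -0.20,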
    }
    checks['survived'] = all(checks.values())
    return checks

The criteria are intentionally strict:

  1. Minimum 100 trades — enough for statistical significance
  2. In-sample profit factor > 1.2 — clearly profitable, not marginal
  3. Out-of-sample profit factor > 1.1 — survives unseen data
  4. OOS degradation < 30% — performance doesn’t collapse on new data
  5. Parameter sensitivity < 30% — not dependent on exact parameter values
  6. Max drawdown < 20% — risk is manageable at our sizing
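
The metrics that feed these checks can be computed from a per-trade P&L series. The sketch below is a minimal illustration under stated assumptions, not necessarily the code behind the results in this article: the function names `profit_factor` and `max_drawdown` and the sample P&L values are hypothetical, profit factor is taken as gross profit divided by gross loss, and drawdown is measured as a negative fraction of peak equity.

```python
import numpy as np

def profit_factor(pnl):
    """Gross profit divided by gross loss (standard definition)."""
    pnl = np.asarray(pnl, dtype=float)
    gross_profit = pnl[pnl > 0].sum()
    gross_loss = -pnl[pnl < 0].sum()
    # No losing trades: profit factor is unbounded
    return float('inf') if gross_loss == 0 else float(gross_profit / gross_loss)

def max_drawdown(pnl, initial_capital=100_000):
    """Worst peak-to-trough equity decline, as a negative fraction."""
    equity = initial_capital + np.cumsum(np.asarray(pnl, dtype=float))
    equity = np.concatenate(([initial_capital], equity))
    running_peak = np.maximum.accumulate(equity)
    return float(((equity - running_peak) / running_peak).min())

# Hypothetical per-trade P&L in dollars, for illustration only
sample_pnl = [1000, -500, 1500, -1000, -500, 2000]
pf = profit_factor(sample_pnl)   # 4500 / 2000 = 2.25
dd = max_drawdown(sample_pnl)    # negative fraction of peak equity

# OOS retention for strategy 2 (ATR channel breakout), using the table values
oos_retention = 1.08 / 1.31
```

Note that the degradation check and the OOS profit factor check are independent: a strategy can retain more than 70% of its in-sample profit factor and still fail because the OOS value itself is below 1.1.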

The Full Results

Category 1: Volatility Strategies (7 tested, 1 survived)

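| # | Strategy | IS PF | OOS PF | Trades | Survived | Failure Reason |
|---|----------|-------|--------|--------|----------|----------------|
| 1 | Bollinger Band squeeze breakout | 1.15 | 0.92 | 423 | No | OOS reversal |
| 2 | ATR channel breakout | 1.31 | 1.08 | 289 | No | OOS degradation (OOS PF < 1.1) |
| 3 | Entropy collapse timing | 1.51 | 1.33 | 347 | Yes | |
| 4 | Keltner channel mean reversion | 1.18 | 0.97 | 512 | No | OOS unprofitable |
| 5 | VIX regime filter (equities) | 1.43 | 1.21 | 156 | No | Parameter sensitive |
| 6 | Realized vs implied vol spread | 1.22 | 1.04 | 89 | No | Too few trades |
| 7 | GARCH volatility forecast | 1.09 | 0.88 | 234 | No | IS marginal |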

The entropy collapse strategy is our best performer. The full writeup is in Entropy Collapse as a Volatility Timing Signal.

Category 2: Statistical / Regime (6 tested, 1 survived)

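| # | Strategy | IS PF | OOS PF | Trades | Survived | Failure Reason |
|---|----------|-------|--------|--------|----------|----------------|
| 8 | Hurst exponent regime adaptive | 1.42 | 1.29 | 267 | Yes | |
| 9 | Cointegration pairs (EURUSD/GBPUSD) | 1.34 | 1.15 | 143 | No | Parameter sensitive |
| 10 | Kalman filter trend extraction | 1.27 | 0.94 | 198 | No | OOS degradation |
| 11 | Hidden Markov Model regime | 1.48 | 1.01 | 112 | No | OOS degradation |
| 12 | Variance ratio test timing | 1.11 | 0.93 | 267 | No | IS marginal |
| 13 | Autocorrelation momentum | 1.19 | 1.09 | 312 | No | IS marginal |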

HMM regimes showed the strongest in-sample results in this category (1.48 PF) but collapsed out-of-sample to 1.01. This is a classic case of a flexible model fitting noise. The Hurst approach likely survived because it is simpler and has less capacity to overfit. Details in Hurst Exponents for Mean Reversion.

Category 3: Price Action / Market Structure (8 tested, 1 survived)

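| # | Strategy | IS PF | OOS PF | Trades | Survived | Failure Reason |
|---|----------|-------|--------|--------|----------|----------------|
| 14 | FVG Wall clustering | 1.31 | 1.18 | 203 | Yes | |
| 15 | Order block bounce | 1.14 | 0.89 | 387 | No | OOS unprofitable |
| 16 | Liquidity sweep reversal | 1.22 | 1.03 | 245 | No | OOS degradation |
| 17 | Session open range breakout | 1.19 | 1.12 | 489 | No | IS marginal |
| 18 | Previous day high/low reaction | 1.08 | 0.95 | 534 | No | IS marginal |
| 19 | Fibonacci cluster confluence | 0.97 | 0.91 | 312 | No | IS unprofitable |
| 20 | Swing failure pattern | 1.23 | 1.01 | 178 | No | OOS degradation |
| 21 | Displacement + FVG standard | 1.16 | 0.94 | 298 | No | OOS unprofitable |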

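The Fibonacci cluster strategy was the only one in this category that failed to reach even 1.0 PF in-sample (the lunar cycle strategy in Category 5 is the only other case). In our testing, Fibonacci levels showed no edge, and we tested this exhaustively. See FVG Magnetism for the FVG Wall research.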

Category 4: Momentum / Trend (5 tested, 1 survived)

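| # | Strategy | IS PF | OOS PF | Trades | Survived | Failure Reason |
|---|----------|-------|--------|--------|----------|----------------|
| 22 | Adaptive momentum (variable lookback) | 1.38 | 1.22 | 312 | Yes | |
| 23 | Dual moving average crossover | 1.12 | 1.05 | 187 | No | IS marginal |
| 24 | Donchian channel breakout | 1.25 | 1.11 | 156 | No | Parameter sensitive |
| 25 | RSI divergence | 1.09 | 0.87 | 423 | No | OOS unprofitable |
| 26 | MACD histogram reversal | 1.04 | 0.91 | 567 | No | IS marginal |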

The adaptive momentum strategy dynamically adjusts its lookback period based on recent volatility — shorter lookback in high-vol, longer in low-vol. Simple modification, significant impact. The fixed-parameter versions (strategies 23-26) all failed.
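
The lookback adjustment can be sketched roughly as follows. This is an illustration of the idea only, not the strategy's actual implementation: the function name `adaptive_lookback`, the lookback bounds, the volatility window, and the percentile-rank mapping are all assumptions.

```python
import numpy as np

def adaptive_lookback(returns, vol_window=20, history=250,
                      min_lookback=10, max_lookback=60):
    """Map the latest volatility's percentile rank to a lookback length.

    All defaults are hypothetical values chosen for illustration.
    """
    returns = np.asarray(returns, dtype=float)
    if len(returns) < vol_window + 1:
        raise ValueError("need more than vol_window returns")
    # Rolling standard deviation of returns over vol_window bars
    windows = np.lib.stride_tricks.sliding_window_view(returns, vol_window)
    vols = windows.std(axis=1)[-history:]
    # Fraction of recent volatility readings at or below the current one
    rank = float((vols <= vols[-1]).mean())
    # High rank (high vol) -> short lookback; low rank (low vol) -> long lookback
    lookback = max_lookback - rank * (max_lookback - min_lookback)
    return int(round(lookback))
```

Using a percentile rank rather than the raw volatility level keeps the mapping scale-free, so the same bounds could in principle be applied across instruments with very different volatility.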

Category 5: Alternative / Exotic (5 tested, 0 survived)

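| # | Strategy | IS PF | OOS PF | Trades | Survived | Failure Reason |
|---|----------|-------|--------|--------|----------|----------------|
| 27 | Lunar cycle correlation | 0.98 | 0.95 | 312 | No | IS unprofitable |
| 28 | Sentiment (Twitter NLP) | 1.17 | 0.82 | 89 | No | OOS collapse, few trades |
| 29 | Commitment of Traders | 1.21 | 1.08 | 67 | No | Too few trades |
| 30 | Intermarket correlation shifts | 1.14 | 0.97 | 145 | No | OOS unprofitable |
| 31 | Seasonality patterns | 1.11 | 1.04 | 48 | No | Too few trades |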

Yes, we tested lunar cycles. No, it doesn’t work. Science requires testing hypotheses you expect to fail.

Patterns in the Failures

After 31 strategies, the failure modes cluster into five categories:

failure_analysis = {
    'oos_degradation': {
        'count': 8,
        'pattern': 'Strong IS, weak OOS. The model fits noise, not signal.',
        'common_in': ['Complex models (HMM, Kalman)', 'Many parameters'],
        'lesson': 'Simpler models generalize better.',
    },
    'is_marginal': {
        'count': 7,
        'pattern': 'IS profit factor 1.0-1.2. Not enough edge to survive costs.',
        'common_in': ['Classic indicators (RSI, MACD, MA crossover)'],
        'lesson': 'If it barely works in sample, it won\'t work live.',
    },
    'parameter_sensitive': {
        'count': 4,
        'pattern': 'Works at exact parameters, dies with small changes.',
        'common_in': ['Strategies with >3 optimizable parameters'],
        'lesson': 'Robust strategies work across a parameter plateau.',
    },
    'too_few_trades': {
        'count': 4,
        'pattern': 'Promising results but <100 trades. Insufficient evidence.',
        'common_in': ['Higher timeframe strategies', 'Exotic signals'],
        'lesson': 'You need statistical significance, not just profit.',
    },
    'oos_unprofitable': {
        'count': 4,
        'pattern': 'OOS profit factor below 1.0. Edge is imaginary.',
        'common_in': ['Price action "common knowledge" strategies'],
        'lesson': 'Popular doesn\'t mean profitable.',
    },
}
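
The parameter-plateau idea can be made concrete by measuring the worst relative drop in profit factor when each parameter is nudged away from its chosen value. The sketch below is one possible definition, offered as an assumption: the framework above only states that the maximum PF drop is capped at 30%. The perturbation size of 20% and the names `param_sensitivity`, `backtest_pf`, and `toy_pf` are hypothetical.

```python
def param_sensitivity(backtest_pf, params, step=0.20):
    """Worst relative PF drop when each parameter moves +/- step (assumed definition)."""
    base_pf = backtest_pf(params)
    if base_pf <= 0:
        raise ValueError("baseline profit factor must be positive")
    worst_drop = 0.0
    for name, value in params.items():
        for factor in (1 - step, 1 + step):
            # Perturb one parameter at a time, holding the others fixed
            perturbed = dict(params, **{name: value * factor})
            drop = (base_pf - backtest_pf(perturbed)) / base_pf
            worst_drop = max(worst_drop, drop)
    return worst_drop

# Hypothetical stand-in for a real backtest: PF peaks at lookback=20, threshold=2.0
def toy_pf(p):
    return 1.5 - 0.002 * (p["lookback"] - 20) ** 2 - 0.1 * (p["threshold"] - 2.0) ** 2

sensitivity = param_sensitivity(toy_pf, {"lookback": 20, "threshold": 2.0})
passes = sensitivity < 0.30   # same threshold as max_param_sensitivity
```

A strategy sitting on a plateau shows a small worst-case drop, as the toy surface does here. A strategy sitting on a spike would lose most of its profit factor from a single perturbation.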

What the Survivors Have in Common

The four strategies that passed all criteria share three properties:

1. Few free parameters. The entropy strategy has 3 key parameters. The Hurst strategy has 3. The FVG Wall has 3. The adaptive momentum has 2. Compare this to the HMM strategy (7 parameters) or the cointegration pairs (5 parameters). Fewer parameters means less capacity to overfit.

2. Adaptive mechanisms. Every survivor adapts to market conditions: entropy z-scores adapt to the local entropy distribution, Hurst switches between strategy families, FVG Wall strength adapts to clustering, adaptive momentum adjusts its lookback. The failures almost all used fixed parameters.

3. Grounded in a causal story. Entropy collapse reflects order accumulation before volatility. Hurst exponent measures actual serial dependence. FVG clustering reflects genuine supply/demand imbalance. Adaptive momentum captures volatility-dependent trend persistence. Compare this to “RSI is below 30” which has no microstructure explanation.
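
To make the serial-dependence claim concrete, here is one common way to estimate a Hurst exponent: fit how the spread of lagged differences scales with the lag. This article does not specify which estimator the Hurst strategy uses, so treat the method, the function name `hurst_exponent`, and the `max_lag` default as assumptions for illustration.

```python
import numpy as np

def hurst_exponent(series, max_lag=50):
    """Estimate H from the scaling std(x[t+lag] - x[t]) ~ lag**H."""
    series = np.asarray(series, dtype=float)
    if len(series) <= max_lag:
        raise ValueError("series must be longer than max_lag")
    lags = np.arange(2, max_lag + 1)
    # Dispersion of lagged differences at each lag
    spread = [np.std(series[lag:] - series[:-lag]) for lag in lags]
    # Slope of the log-log fit is the Hurst estimate
    slope, _intercept = np.polyfit(np.log(lags), np.log(spread), 1)
    return float(slope)

# Synthetic data for illustration: a random walk and pure noise
rng = np.random.default_rng(42)
random_walk = np.cumsum(rng.normal(size=5000))
white_noise = rng.normal(size=5000)
h_walk = hurst_exponent(random_walk)    # expected to land roughly around 0.5
h_noise = hurst_exponent(white_noise)   # expected to land near 0 (strong mean reversion)
```

Values well above 0.5 indicate trending behavior and values well below 0.5 indicate mean reversion, which is the distinction a regime-switching approach would rely on.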

The 12.9% Survival Rate

Four out of 31 is a 12.9% survival rate. Is that good?

It’s probably typical for honest quant research. The literature suggests that:

  • Academic factor research has a replication rate of about 15-20%
  • Hedge fund strategy development typically sees 10-15% survival to production
  • Retail trading systems have an estimated survival rate below 5% (though this is hard to measure)

Our 12.9% is in line with institutional benchmarks, which gives us some confidence that our testing framework is neither too strict nor too lenient.

Conclusion: What We’ll Test Next

The research continues. Our current pipeline includes:

  • Orderflow imbalance strategies on crypto (tick-level data from Hyperliquid)
  • Cross-asset entropy correlation (does entropy collapse in bonds predict FX volatility?)
  • LLM-generated trading hypotheses with automated backtesting (see Building a $1 LLM Trading Agent)

The key question is always the same: does this idea survive honest testing?

Most won’t. That’s the point.


Dive into the individual survivors: Entropy Collapse | Hurst Exponents | FVG Magnetism. For the bigger picture, read Why We Open-Source Our Quant Research.