The Complete Automated Trading Playbook

From zero knowledge
to a validated, automated
trading system.

Not signals. Not copy-trading. Not “trust me bro.”
The methodology behind a system that actually runs.

This playbook walks you through every step of building an automated crypto trading system — from understanding what an edge is, through strategy development, rigorous backtesting, falsification, position sizing, infrastructure, and live deployment. It was written by someone who built and operates the system described. Every lesson comes from real experience — including the expensive mistakes.
READ THE PLAYBOOK
14 Modules · 71+ Sections · 7 Months Live · 2.7M Candles Tested

What You’ll Build

14 modules. One complete trading system.

Module 0

Foundations

  • 0.1 What Is an Edge (and Why Most People Don’t Have One)
  • 0.2 How Markets Actually Work
  • 0.3 The Systems Thinking Framework
  • 0.4 Why 95% of Retail Traders Lose
~45 min read
Module 1

Mindset & Risk Philosophy

  • 1.1 Discretionary vs Systematic
  • 1.2 The Canonical Specification
  • 1.3 Empirical Over Theoretical
  • 1.4 The Adversarial Mindset
~30 min read
Module 2

Exchange Setup & Account Architecture

  • 2.1 Choosing an Exchange
  • 2.2 Spot vs Isolated vs Cross-Margin
  • 2.3 The Leverage Misconception
  • 2.4 API Keys & Security
  • 2.5 Fiat On/Off Ramps
  • 2.6 Stablecoin & Counterparty Risk
  • 2.7 Wrong-Rail Deposit Policy
~2.5 hours
Module 3

Data Infrastructure

  • 3.1 What Data You Need
  • 3.2 Where to Get It
  • 3.3 Storage & Databases
  • 3.4 Data Quality & Cleaning
  • 3.5 Historical Backfill
  • 3.6 Real-Time Data Layer
~3.5 hours
Module 4

Strategy Development

  • 4.1 Hypothesis to Testable Signal
  • 4.2 Types of Strategies
  • 4.3 Conviction Gates & Filters
  • 4.4 The Investigation Template
  • 4.5 Using LLMs as Research Assistants
~4 hours
Module 5

Backtesting Done Right

  • 5.1 Building a Backtesting Engine
  • 5.2 The Cardinal Rules
  • 5.3 Monte Carlo Simulation
  • 5.4 Walk-Forward & Out-of-Sample
  • 5.5 Parameter Sensitivity
  • 5.6 Multiple-Testing & FDR Control
~4 hours
Module 6

Trying to Kill Your Strategy

  • 6.1 Why Most Backtests Lie
  • 6.2 The Six Falsification Tests
  • 6.3 When to Kill vs When to Tune
  • 6.4 The Adversarial Review Process
~3 hours
Module 7

Position Sizing & Risk Management

  • 7.1 Leverage as Capital Efficiency
  • 7.2 Stop-Loss Philosophy
  • 7.3 Circuit Breakers & Drawdown Limits
  • 7.4 Portfolio-Level Risk
~2 hours
Module 8

Building the System

  • 8.1 Architecture Decisions
  • 8.2 Essential Components
  • 8.3 State Management & Reconciliation
  • 8.4 Configuration-Driven Strategies
  • 8.5 Using AI to Build
  • 8.6 Order Rounding & Contract Math
  • 8.7 Research vs Production Separation
~7 hours
Module 9

Deployment & Operations

  • 9.1 Server Setup
  • 9.2 Docker & Containerisation
  • 9.3 Shadow Mode (Paper Trading)
  • 9.4 Going Live
  • 9.5 Monitoring & Alerts
  • 9.6 Disaster Recovery
~4.5 hours
Module 10

Regime Detection & Macro Overlay

  • 10.1 Why the Same Strategy Fails in Different Markets
  • 10.2 Building a Regime Detector
  • 10.3 Macro Overlays
  • 10.4 The Volatility Filter
~3 hours
Module 11

Continuous Improvement

  • 11.1 Live Performance vs Backtest
  • 11.2 Strategy Degradation Detection
  • 11.3 Automated Research Loop
  • 11.4 Cross-Pollination
~2 hours
Module 12

Tax & Accounting

  • 12.1 Why Tax Matters from Day One
  • 12.2 Cost Basis Methods
  • 12.3 Spot vs Derivatives Tax Treatment
  • 12.4 Records You Need to Keep
  • 12.5 Practical Setup
  • 12.6 Structures
  • 12.7 Module Competency Checklist
~1.5 hours
Module 13

Operator Psychology & Discipline

  • 13.1 The Drawdown Test
  • 13.2 The Three Override Modes
  • 13.3 The Drawdown Protocol
  • 13.4 Information Hygiene During Live Operation
  • 13.5 Incident Response When Emotional
  • 13.6 Burnout
  • 13.7 Operator Competency Markers
~1.5 hours
~37 hours
Estimated reading & comprehension time

Module 0

Foundations
4 sections · ~45 min read

What this is NOT

This is not a signal service. We do not publish trades you can copy.

This is not financial advice. Nothing in this playbook is a recommendation to buy, sell, or hold any asset, or to adopt any specific strategy.

This is not a guarantee. Past performance does not predict future results. Most systematic strategies fail. Most edges decay. Building a system that makes money is hard, and building one that survives multiple regimes is much harder.

This is not safe. Crypto trading carries substantial risk of total loss of capital. Leverage amplifies that risk. Decide what you can afford to lose before you start, then never deploy more than that.

This is the methodology. The process for building, testing, and operating your own automated trading system. The strategies described are examples of the process; the process is what matters.

0.1 What Is an Edge

Before you write a single line of code or open a single exchange account, you need to understand the only thing that matters in trading: whether you have an edge. Everything else — the servers, the algorithms, the dashboards — is infrastructure for exploiting an edge. Without one, you are building a very expensive random number generator.

The House Always Wins — Unless You’re the House

A casino doesn’t win every hand. It wins 51% of them. Over thousands of hands, that 1% compounds into a fortune. That’s an edge: a small, repeatable, statistically verified advantage that manifests over many occurrences.

In trading, an edge is the same thing. It is not a hot tip. It is not a pattern you saw once on a chart. It is a measurable, repeatable tendency in price behaviour that persists across hundreds of trades, survives transaction costs, and holds up when you try to destroy it with statistical testing.

Most retail traders do not have an edge. They have opinions. Opinions do not compound.

What an Edge Looks Like in Practice

Here is the shape of a real edge, drawn from a category of system we operate in production. The numbers are deliberately qualitative — what matters is the profile, not any single point estimate:

Metric | Value | What It Means
Strategy | A long-only weekly trend-following system | Goes long when a slow moving-average derivative turns positive, gated by a close-position filter and an efficiency-ratio gate
Win Rate | Below 50% | Loses more trades than it wins
Profit Factor | High single digits | Winners are several times larger than losers
CAGR | Strongly positive over a multi-year window | Annualised return over the lookback
Max Drawdown | Contained well under 25% | Worst peak-to-trough decline
Trades per Year | Single-digit annual frequency | Extremely low frequency

Notice: it loses more than half its trades. A beginner would look at that and say the strategy is broken. But the winners are so much larger than the losers that the overall expectancy is strongly positive. This is typical of trend-following systems.

Key Insight

An edge is not about being right most of the time. It’s about the ratio of what you make when you’re right versus what you lose when you’re wrong, multiplied across hundreds of occurrences. A 40% win rate with 3:1 reward-to-risk is more profitable than an 80% win rate with 1:4 reward-to-risk.
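To make that arithmetic concrete, here is the expectancy calculation as a few lines of Python (a minimal sketch; returns are expressed in units of risk, R):

```python
def expectancy(win_rate: float, reward_risk: float) -> float:
    """Expected profit per trade in units of risk (R):
    win_rate wins of reward_risk R each; the rest lose 1 R each."""
    return win_rate * reward_risk - (1 - win_rate) * 1.0

# 40% win rate at 3:1 reward-to-risk: +0.6R per trade on average
print(expectancy(0.40, 3.00))  # 0.6
# 80% win rate at 1:4 reward-to-risk (win 0.25R, lose 1R): breakeven before costs
print(expectancy(0.80, 0.25))  # ~0.0
```

The low-win-rate system earns 0.6R per trade; the high-win-rate system earns nothing before fees.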

How Do You Know If You Have One?

You don’t guess. You test. Rigorously. This playbook will teach you how to:

  • Formulate a hypothesis — “I think BTC rallies when the weekly slope turns positive”
  • Backtest it against years of historical data with realistic costs
  • Stress-test it with Monte Carlo simulation (10,000 randomised permutations)
  • Try to destroy it with six different falsification tests
  • Verify it still works on data the strategy has never seen (out-of-sample)

If it survives all of that, you might have an edge. If it fails any test, you don’t — and that just saved you real money.

You Understand This When…

  • You can explain why a 40% win rate can be highly profitable
  • You understand that an edge is statistical, not predictive
  • You know that without rigorous testing, you have an opinion, not an edge

0.2 How Markets Actually Work

Most people think of a market as a chart going up and down. That’s like thinking of the ocean as a line on a depth gauge. The chart is the output. Understanding the machinery underneath it is what separates people who build profitable systems from people who draw lines on screens.

The Order Book

Every trade happens because two people disagree. One thinks the price is going up and buys. The other thinks it’s going down and sells. The mechanism that matches them is called the order book.

Think of it as two queues facing each other:

Sell Orders (Asks)

  • $95,120 · 0.5 BTC
  • $95,115 · 0.3 BTC
  • $95,110 · 2.1 BTC
  • $95,105 · 0.7 BTC
  • $95,100 · 1.4 BTC ← best ask

Buy Orders (Bids)

  • $95,080 · 1.2 BTC ← best bid
  • $95,075 · 0.8 BTC
  • $95,070 · 3.5 BTC
  • $95,065 · 0.4 BTC
  • $95,060 · 2.0 BTC
SPREAD: $95,100 − $95,080 = $20 (≈0.02%)

When you hit Market Buy, your order eats through the ask side from the best price upward. The more size you push, the higher the average fill price climbs — this is slippage.

Simplified order book. The spread is the gap between the best bid and ask. Slippage is the price impact of your order eating through multiple levels.
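To see slippage mechanically, here is a minimal sketch that walks a market buy through the ask side of the illustration above:

```python
# Ask side from the illustration: (price, size in BTC), best ask first.
ASKS = [(95_100, 1.4), (95_105, 0.7), (95_110, 2.1), (95_115, 0.3), (95_120, 0.5)]

def market_buy(asks, qty_btc):
    """Walk the book; return (average fill price, slippage vs best ask)."""
    remaining, cost = qty_btc, 0.0
    for price, size in asks:
        take = min(remaining, size)
        cost += take * price
        remaining -= take
        if remaining <= 0:
            break
    if remaining > 0:
        raise ValueError("order exceeds displayed depth")
    avg = cost / qty_btc
    return avg, avg - asks[0][0]

avg, slip = market_buy(ASKS, 3.0)  # 3 BTC eats through three levels
print(f"avg fill ${avg:,.2f}, slippage ${slip:,.2f} vs best ask")
```

Pushing 3 BTC through this book fills at roughly $95,104 on average, about $4 worse than the quoted best ask. Scale the size up and the effective spread grows with it.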

Why This Matters for Your System

When you backtest a strategy, you see clean numbers: “buy at $95,080, sell at $95,500.” In reality:

  • Spread — You pay (at least) the top-of-book spread on every taker entry and exit. On tier-1 BTC perpetual venues during liquid hours the quoted top-of-book spread is typically sub-$2 (often well under $1). It can spike to $50+ during stress events — sharp moves, exchange outages, illiquid windows like Sunday opens. Distinguish quoted spread (top-of-book) from effective spread, which includes the price impact your order has when it eats through depth beyond the top level.
  • Slippage — If you’re trading size, your market order pushes through multiple levels. Your average fill is worse than the price you saw.
  • Fees — Exchanges charge 0.01%–0.1% per trade. Maker orders (resting liquidity) typically pay zero or earn a small rebate; taker orders (crossing the book) pay the full taker rate. Cost modelling must distinguish the two — a strategy that always crosses the book has very different economics than one that posts and waits. At high frequency, taker fees destroy most edges.
  • Funding rates — On perpetual futures, longs and shorts exchange a periodic payment derived from the premium/discount of the perpetual price relative to the spot index. When the perp trades above spot (positive premium) longs pay shorts; when it trades below, shorts pay longs. The premium correlates with sentiment but is not directly “caused” by trend — it’s just where the perp is marked vs spot. Funding intervals vary by venue: most are 8h, some are 1h (e.g. Deribit), some variable. Magnitudes are typically 0.01–0.05% per 8h, with extremes of 0.3%+ in stressed conditions. Funding is path-dependent: backtests must accrue it per-interval over the actual hold window using each venue’s mark/index convention, not as a flat round-trip cost.

A strategy that shows +2% per trade in backtesting might show +0.5% after costs — or negative. Always model costs. In this playbook, every backtest uses 25 basis points (0.25%) round-trip as a baseline.

War Story

One of our strategies looked brilliant in backtesting: strong positive returns, great Sharpe ratio. When we modelled funding rates properly, the edge vanished. The strategy was holding long positions during periods of elevated perpetual premium — exactly when longs are paying. We were paying roughly 0.05% every 8 hours on the wrong side of funding, which compounds: over a multi-day hold that’s 0.15–0.3%+ in funding alone, on top of fees and slippage. The backtest without funding showed +15% per trade. With funding accrued per-interval: roughly −2%. The strategy was killed before it ever touched real money. The lesson: funding is path-dependent, not a constant — you must accrue it interval-by-interval over each actual hold.
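Here is the per-interval accrual idea as a sketch. It assumes one rate per funding interval pulled from your venue's history; a real backtest must align these to the venue's actual funding timestamps and mark/index convention:

```python
def accrue_funding(funding_rates, position_side):
    """Sum funding paid/received over a hold, one entry per funding interval.

    funding_rates: per-interval rates as fractions (0.0005 = 0.05% per interval),
                   positive when longs pay shorts.
    position_side: +1 long, -1 short.
    Returns total funding as a fraction of notional (negative = you paid).
    """
    return sum(-position_side * r for r in funding_rates)

# A 3-day hold through nine 8h intervals at +0.05%: the long pays 0.45% of notional.
print(accrue_funding([0.0005] * 9, position_side=+1))  # -0.0045
```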

Liquidity, Volatility, and Regime

Markets are not static. They behave differently under different conditions:

Regime | Behaviour | What Works | What Fails
Bull / Trending Up | Strong directional moves, shallow pullbacks | Trend-following, momentum | Mean reversion, shorting
Bear / Trending Down | Sharp drops, relief rallies, low confidence | Short-selling, defensive positions | Dip-buying, averaging down
Chop / Range | Sideways, no clear direction, fake breakouts | Mean reversion, range strategies | Trend-following (gets whipsawed)
High Volatility | Large candles, wide spreads, fast moves | Wider stops, smaller positions | Tight stops (get stopped out by noise)
Low Volatility | Small candles, tight ranges, compression | Breakout anticipation, patience | Most active strategies (not enough movement)

The same strategy can have a Sharpe ratio of 3.0 in one regime and -1.0 in another. Module 10 covers how to detect regimes and adjust — or sit out entirely.

Key Insight

The single most common mistake in strategy development is building a system that works in one regime and deploying it into all regimes. A trend-following strategy built on 2020–2021 bull market data will be destroyed in a choppy sideways market. You must either build regime-aware strategies or accept that some periods will be flat or negative. Module 10 covers this in depth.

You Understand This When…

  • You can explain what an order book is and why spreads/slippage matter
  • You know that backtests must include realistic costs (fees, slippage, funding)
  • You understand that markets operate in different regimes and the same strategy doesn’t work in all of them

0.3 The Systems Thinking Framework

A trading system is not a strategy. A strategy is one component. The system is everything — data collection, signal generation, risk management, execution, monitoring, and continuous improvement. All of it connected, all of it feeding back into itself.

The Trading System Flywheel

Just like a business has a flywheel (traffic → leads → customers → reviews → more traffic), a trading system has one too. Every component feeds the next, and the system gets better over time.

Market Data (candles, funding, open interest) → Data Pipeline (clean, store, validate) → Strategy Engine (signal generation, gates, filters) → Risk Management (position sizing, stops, circuit breakers) → Execution (exchange API, order handling, reconciliation) → Live Results (trades, PnL, fills) → back into research

The trading system flywheel. Each component feeds the next. Live results feed back into research, creating a system that improves over time.

Why Most People Build It Wrong

Most aspiring system builders jump straight to the strategy: “I want to build a MACD crossover bot.” They write the signal logic, backtest it, see positive numbers, connect it to an exchange, and deploy. Then it fails.

It fails because they built the engine without the chassis, the brakes, or the dashboard:

  • No data quality checks — Garbage data in, garbage signals out. A corrupted candle can trigger a false trade.
  • No risk management layer — The strategy says “buy,” so it buys. No position sizing. No maximum exposure. No “the market just dropped 20% in an hour, maybe stop.”
  • No reconciliation — The bot thinks it placed an order. Did the exchange actually fill it? At what price? Is the position what the bot thinks it is?
  • No monitoring — The bot crashes at 3am. Nobody knows until the next morning. By then, there’s an unmanaged position sitting on the exchange.

This playbook builds the entire system, in the right order. Strategy comes after data infrastructure, risk management, and architecture — not before.

War Story

Our bot once thought it was holding a short position for 6 days. It wasn’t. The entry order had failed and returned a permanent error, but the state management code didn’t roll back properly. The bot was trailing a stop on a phantom position — managing nothing. It would have continued indefinitely if we hadn’t built a reconciliation layer that checks the exchange’s actual state against the bot’s internal state every hour. Module 8 covers exactly how to build this.
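Here is the core of that reconciliation check as a sketch. The Position shape and the hourly scheduling are illustrative; the point is that the exchange's reported state, not the bot's memory, is the source of truth:

```python
from dataclasses import dataclass

@dataclass
class Position:
    symbol: str
    side: str      # "long", "short", or "flat"
    qty: float     # contracts or base units

def reconcile(internal: Position, exchange: Position, qty_tol: float = 1e-9) -> list[str]:
    """Return a list of discrepancies; an empty list means the states agree."""
    issues = []
    if internal.side != exchange.side:
        issues.append(f"side mismatch: bot={internal.side} exchange={exchange.side}")
    if abs(internal.qty - exchange.qty) > qty_tol:
        issues.append(f"qty mismatch: bot={internal.qty} exchange={exchange.qty}")
    return issues

# The phantom-position case from the war story: bot thinks short, exchange is flat.
problems = reconcile(Position("BTCUSDT", "short", 0.25), Position("BTCUSDT", "flat", 0.0))
if problems:
    # In production: halt trading, alert the operator, resync state from the venue.
    print("RECONCILIATION FAILED:", problems)
```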

You Understand This When…

  • You see a trading system as a connected machine, not just a strategy
  • You can name the six major components (data, strategy, risk, execution, monitoring, research)
  • You understand why building in the right order matters

0.4 Why 95% of Retail Traders Lose

This is not a motivational scare tactic. It’s a diagnostic. If you understand precisely why most traders lose, you can build a system that avoids every failure mode. This section maps the failure modes so the rest of the playbook can address each one.

The Failure Modes

1

No Edge (Trading on Vibes)

“BTC looks like it’s going up.” That is not a strategy. That is a feeling. Without a statistically validated edge, every trade is a coin flip minus fees. Over hundreds of trades, fees guarantee you lose. Most retail traders have never backtested a single idea. They trade on pattern recognition, gut feeling, or someone else’s signal. Module 4 solves this.

2

Overfitting (The Backtest Looked Great)

You test 200 parameter combinations and pick the one with the best returns. Congratulations — you’ve curve-fitted to historical noise. The strategy worked perfectly on data it was designed to fit and will fail on everything else. This is the single most common technical mistake. Modules 5 and 6 solve this.

3

Position Sizing (50x YOLO)

A $500 account with 50x leverage and no risk management means a 2% adverse move wipes you out. The strategy could be excellent, but if one bad trade takes 100% of your equity, the strategy never gets to prove itself over hundreds of trades. Module 7 solves this.

4

No Regime Awareness

The strategy was developed during a bull market. It worked because everything went up. Now the market is ranging sideways and the strategy is getting chopped to pieces on false signals. The trader thinks the strategy broke. It didn’t — the market changed. Module 10 solves this.

5

Emotional Override

The bot says sell. You think “but it’ll come back.” You override the system. It doesn’t come back. This is why systematic beats discretionary for most people: the system doesn’t have emotions, FOMO, or ego. Module 1 covers the philosophical foundation for trusting the system.

6

Ignoring Costs

Fees, funding rates, slippage, and spread. A strategy that trades 50 times a day and makes 0.05% per trade sounds profitable until you realise fees are 0.06% per trade. You are literally paying the exchange to lose money. Module 5 covers cost modelling.

7

No Infrastructure

The bot runs on a laptop. The laptop goes to sleep. The bot crashes. There’s no monitoring, no health checks, no alerts. An unmanaged position sits on the exchange for 12 hours. Module 9 covers deployment and operations.

The Good News

Every single failure mode listed above is solvable. That’s what this playbook is — a systematic solution to each one, in order. If you follow the modules sequentially and do the work, you will avoid the mistakes that destroy 95% of retail traders.

You Understand This When…

  • You can list at least 5 reasons most retail traders lose
  • You know which module in this playbook addresses each failure mode
  • You understand that the purpose of this playbook is to systematically eliminate every failure mode before you risk real money

Module 1

Mindset & Risk Philosophy
4 sections · ~30 min read

1.1 Discretionary vs Systematic

There are two ways to trade: make decisions yourself, or build a system that makes decisions for you. This section explains why this playbook is entirely about the second approach — and why the first approach fails for most people.

Discretionary Trading

A discretionary trader looks at charts, reads news, considers context, and makes a decision: buy, sell, or do nothing. Every trade is a judgment call. The best discretionary traders in the world — the ones running hedge fund desks — are genuinely talented. They have spent 10,000+ hours reading order flow, developing intuition, and making real-time decisions under pressure.

You are probably not one of them. Neither am I. And that’s fine, because:

Systematic Trading

A systematic trader builds a set of rules. The rules are explicit, deterministic, and testable. “When X happens, do Y. When Z happens, do W.” The system executes those rules without deviation. The human’s job is to design and validate the rules, not to execute them.

Dimension | Discretionary | Systematic
Decision maker | You, in real time | Code, based on tested rules
Testable | No (every decision is unique) | Yes (backtest across years)
Emotional influence | High (fear, greed, FOMO) | Zero (code has no feelings)
Scalable | No (limited by your attention) | Yes (runs 24/7 on a server)
Repeatable | No (you’ll make different decisions on different days) | Yes (same input = same output, every time)
Improvable | Slowly (requires experience) | Measurably (change a rule, re-test)
Skill required | Deep market intuition (rare) | Systems design + data analysis (learnable)

Key Insight

Systematic trading converts the problem from “be a great trader” (which requires rare talent) into “be a great engineer” (which requires discipline and methodology). If you are someone who thinks in systems, processes, and rules — this is your domain.

The Hybrid Trap

Many people attempt a hybrid: build a system, but override it when they “feel” like the market is going to do something different. This is the worst of both worlds. You get the complexity of a system with the unreliability of discretion. The system was validated on data. Your override was validated on nothing. If you build a system, trust the system. If you don’t trust it, fix it — don’t override it.

You Understand This When…

  • You can articulate why systematic trading is more appropriate for most people
  • You understand that overriding a validated system is not “adding judgment” — it’s introducing untested randomness
  • You’re committed to building rules-based systems, not vibes-based trades

1.2 The Canonical Specification

Every system you build should have a single document that defines every constraint, every behaviour, and every authority. If it’s not in the spec, it doesn’t exist. This is the foundation of building systems that are deterministic, auditable, and reproducible.

What It Is

A canonical specification is a single, authoritative document that describes your entire trading system. It contains:

  • Strategy rules — exact entry conditions, exit conditions, and what to do in every possible state
  • Indicator math — the precise formulas, not “use a 20-period EMA” but the actual computation with edge cases defined
  • Risk parameters — maximum position size, stop-loss rules, circuit breaker conditions
  • Data definitions — what a “weekly candle” means (which day does the week start? what timezone?)
  • Operational procedures — what happens when the bot restarts, what happens during exchange maintenance

The system we built has a canonical specification of approximately 4,000 lines. Every line of code traces back to a line in the spec. Nothing is assumed. Nothing is inferred.

Why It Matters

Without a spec, you have “tribal knowledge” — the system works because the person who built it remembers what it’s supposed to do. That’s fine until:

  • You come back to the code 3 months later and can’t remember why a particular threshold is 0.75
  • You change one parameter and something unrelated breaks, because the dependencies were never documented
  • You try to explain the system to someone else and realise you can’t, because half the rules are in your head
  • You run a backtest and get different results from last time, because you used a different week definition without realising it

War Story

Our backtest results shifted materially — double-digit-percent swing in headline metrics — when we accidentally switched from Monday-start weeks to Sunday-start weeks. Same strategy, same data, same indicators — but the moving-average values were different because the weekly close fell on a different day. This is the kind of thing that a canonical specification prevents: it locks down “weeks start on Monday, UTC, and here is the exact pandas resampling code.” No ambiguity. No drift.
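Here is what that lock-down looks like as a minimal pandas sketch (weeks start Monday 00:00 UTC, and each candle is labelled by its opening Monday):

```python
import pandas as pd

def weekly_candles(daily: pd.DataFrame) -> pd.DataFrame:
    """Daily OHLCV (UTC DatetimeIndex) -> Monday-start weekly candles.

    'W-MON' puts bin edges on Mondays; closed='left' makes each bin
    [Monday, next Monday); label='left' stamps the candle with its opening Monday.
    """
    return daily.resample("W-MON", label="left", closed="left").agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
```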

The Closed-World Assumption

The spec operates on a closed-world assumption: only things explicitly declared in the spec exist. If a behaviour is not specified, it is forbidden. If a parameter is not defined, it does not have a default — it is an error.

This sounds rigid. It is. That’s the point. Trading systems that “kind of work most of the time” will “kind of fail” at the worst possible moment. Rigidity in specification produces reliability in execution.

Practical Advice

You don’t need to write 4,000 lines on day one. Start with a one-page document for your first strategy: entry rule, exit rule, position size, stop-loss, data source, and timeframe. Then expand it as you discover edge cases. The spec grows with the system. The important thing is that it exists and is the single source of truth.

You Understand This When…

  • You understand that a canonical specification is the single source of truth for the system
  • You know what the closed-world assumption means and why it prevents bugs
  • You’re prepared to write down every rule, not just the obvious ones

1.3 Empirical Over Theoretical

In this framework, data wins every argument. No matter how elegant the theory, if the backtest shows negative returns, the theory is wrong. This section establishes the epistemological foundation: we believe what the data shows, not what we think should be true.

The Graveyard of Beautiful Theories

Someone on Twitter says: “BTC tends to rally on Tuesdays.” Sounds suspect. But instead of dismissing it or believing it, you test it:

  • Pull 5 years of BTC daily candles
  • Tag each day with its day of week
  • Measure the average forward return on Tuesdays vs all other weekdays
  • Compare to baseline (random day-of-week assignment)
  • Apply filters (regime, volatility) to see if a sub-condition matters

The typical outcome of an investigation like this is: any apparent effect is small, often disappears once you condition on volatility regime, and is not tradeable standalone. The theory isn’t crazy — it just usually isn’t strong enough. You only know by testing.
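For reference, the whole investigation is a few lines of pandas. This sketch assumes a daily OHLCV DataFrame with a UTC DatetimeIndex:

```python
import pandas as pd

def day_of_week_effect(daily: pd.DataFrame) -> pd.Series:
    """Mean next-day return grouped by day of week (0=Monday ... 6=Sunday)."""
    fwd_ret = daily["close"].pct_change().shift(-1)  # each day's forward return
    return fwd_ret.groupby(daily.index.dayofweek).mean()

# stats = day_of_week_effect(btc_daily)
# print(stats)  # compare Tuesday (index 1) against the other days before believing anything
```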

Key Insight

The correct response to any trading claim is not “that’s stupid” or “that makes sense.” The correct response is: “show me the backtest.” Every claim is a hypothesis. Every hypothesis can be tested. This playbook teaches you how to test any claim against real data in hours, not weeks.

The Investigation Template

Every time you hear a trading idea — from Twitter, a podcast, a friend, or your own intuition — run it through this loop:

  1. Hear the claim — “BTC rallies after extreme fear”
  2. Formulate the hypothesis — “When the Fear & Greed Index drops below 15, BTC returns over the next 7 days are positive on average”
  3. Pull the data — 5 years of BTC daily candles + Fear & Greed Index history
  4. Test it — measure average 7-day returns after extreme fear events
  5. Find filters that matter — does it work better in bull markets? Low volatility?
  6. Deliver a verdict — “weak signal, not tradeable standalone” or “strong signal, worth developing”

Module 4 walks through this in full detail with real examples.

You Understand This When…

  • Your instinct when hearing a trading idea is “test it” not “believe it” or “dismiss it”
  • You understand the investigation template: claim → hypothesis → data → test → filter → verdict
  • You accept that data trumps theory, every time

1.4 The Adversarial Mindset

The most dangerous moment in system development is when the backtest shows positive results. That is when most people stop testing and start deploying. That is also when the real work begins: trying to destroy your own strategy.

Why You Must Attack Your Own Work

Confirmation bias is the tendency to find evidence that supports what you already believe. In trading system development, it manifests like this:

  • You build a strategy. The backtest shows +40% CAGR. You feel excited.
  • You notice the max drawdown is -35%. “That’s manageable,” you tell yourself.
  • You don’t check whether the performance came from 3 lucky trades or was distributed evenly.
  • You don’t test on data the strategy hasn’t seen.
  • You deploy. It immediately loses money. The edge was an artefact of overfitting.

The adversarial mindset flips this: your default assumption is that the strategy does NOT work. The burden of proof is on the strategy, and if you can’t break it after six different types of attack, then — tentatively — it might be real.

The Six Attacks (Preview)

Module 6 covers these in full detail, but here is what you are going to throw at every strategy:

# | Attack | What It Tests
1 | Parameter Robustness | Move every parameter by ±10–20%. Does the edge survive?
2 | Out-of-Sample Holdout | Test on data the strategy has never seen
3 | Regime Stability | Does it work in bull, bear, AND chop markets?
4 | Cross-Venue Transfer | Does it work on a different exchange’s data?
5 | Placebo / Random Baseline | Is it better than random entry at the same frequency?
6 | Time Stability | Does it work in the first half AND the second half of the data?

If a strategy fails any of these, it goes back to the lab or gets killed. There is no “well, it mostly works.”
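As a preview, attack #1 is mechanical enough to sketch now. run_backtest is a placeholder for your own engine returning a single summary metric:

```python
import itertools

def robustness_sweep(run_backtest, base_params: dict, shifts=(-0.2, -0.1, 0.1, 0.2)):
    """Re-run the backtest with each parameter perturbed; returns {(param, shift): metric}."""
    results = {}
    for name, shift in itertools.product(base_params, shifts):
        params = dict(base_params)
        params[name] = base_params[name] * (1 + shift)
        results[(name, shift)] = run_backtest(**params)
    return results

# If the metric collapses under small perturbations, the "edge" was a curve-fit artefact.
```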

War Story

We spent two weeks investigating whether positioning footprints could produce a tradeable edge. After multiple phases of analysis, hundreds of trades across roughly a dozen candidate signals, and six falsification tests on each: one signal survived — a derivatives-driven contrarian setup, hit rate above 60% and profit factor above 1.5. But its p-value sat above the 0.05 line — not statistically significant. The honest verdict: “direction of evidence is positive but not strong enough to trade standalone.” We could have ignored the stats and deployed it. Instead, we shelved it as an overlay filter and moved on. That discipline is what keeps you alive.

You Understand This When…

  • You see a positive backtest as the beginning of validation, not the end
  • You can name six types of attacks to throw at a strategy
  • You’re prepared to kill a strategy you spent weeks developing if it doesn’t survive testing

Module 2

Exchange Setup & Account Architecture
7 sections · ~2.5 hours

2.1 Choosing an Exchange

Your exchange is the foundation of your entire operation. It holds your money, executes your trades, and provides your data. Choosing the wrong one can cost you everything — not from bad trades, but from the exchange itself. This section covers how to evaluate exchanges and what to watch out for.

Evaluation Criteria

Criterion | Why It Matters | How to Check
Security track record | Has the exchange been hacked? How did they respond? | Search “[exchange name] hack” — look for hot wallet breaches, user fund freezes
API quality | Your bot communicates via API. A bad API means bad execution. | Read the API docs. Check rate limits, WebSocket support, error handling.
Liquidity | Low liquidity = high slippage = worse fills | Check 24h volume on your trading pair. Compare bid/ask spread.
Fee structure | Maker vs taker fees directly impact your edge | Maker 0.01–0.02%, taker 0.04–0.06% is competitive for crypto perps
Deposit/withdrawal handling | Can you get your money out? What happens with wrong-network deposits? | Test with a small amount first. Always.
Regulatory status | Some exchanges aren’t available in your jurisdiction | Check ToS for your country. Verify KYC requirements.

War Story

We ran two trading containers on BingX. The September 2024 security incident — widely reported losses in the $43–52M range [Yahoo Finance] [Bloomberg Law] [CoinDesk] [DL News] — was already on the public record when we onboarded the venue, and we accepted the residual custody risk knowing the history.

We later abandoned the venue entirely. One driver was BingX’s own published deposit-recovery policy: BingX states that it “generally does not provide token or coin recovery service” for wrong deposits to its addresses, with assistance offered only “at its sole discretion” for significant losses [BingX policy]. Combined with the post-hack handling already on the public record, that policy stance — custody-controlled funds, exchange-controlled discretion to return them — falls below the bar we set for venues holding our capital.

Compare that to venues like Binance, Bybit, and Bitget, which publish self-service or structured recovery workflows for incorrect deposits [Binance] [Bybit] [Bitget]. The fact that some venues build recovery as a default and others build non-recovery as a default is a real, observable, due-diligence dimension. We migrated to venues whose published commitments matched our standard.

The September 2024 incident itself, and BingX’s framing of a $43M+ loss as a “minor asset loss,” is illustrative. Read the public record before you onboard a venue; trust your operational experience when you’re already there.

Your exchange due diligence is part of your trading risk, not separate from it.

Recommended Setup for Beginners

  • Start with one exchange. Pick a tier-1 perpetual-futures venue with a clean security track record (multi-year operating history, transparent reserves, no recent custody incidents). For spot, a major established exchange with deep liquidity.
  • Use a fiat on-ramp exchange separately. Kraken for AUD (if Australian), Coinbase for USD. Convert to USDT via limit orders (not “Convert” features which have hidden markup), then send USDT to your trading exchange.
  • Once profitable and stable, add a second exchange for redundancy and to reduce single-exchange risk.

Critical: Test Deposits First

Before sending any meaningful amount to a new exchange, send a small test deposit ($10–50) and immediately withdraw it. Verify the round trip works. Check that deposits credit correctly and withdrawals arrive. We lost $183 USDT by sending to the wrong token deposit address on an exchange that charged $200 to recover it. Yes — we paid more to recover the funds than the funds were worth. Always verify you are on the correct TOKEN deposit page, not just the correct network.

You’re Done When…

  • You have selected an exchange based on the evaluation criteria
  • You have completed KYC verification
  • You have made a small test deposit AND withdrawal successfully
  • You have bookmarked the API documentation

2.2 Spot vs Isolated vs Cross-Margin

This section explains the three account modes available on crypto exchanges, what each one means for your risk, and when to use which. We teach them in order of complexity: spot is simplest, isolated is the next step (capped loss = posted margin only), and cross-margin is the most advanced and most dangerous (uses your entire account as collateral). Getting this wrong is how people wake up to a liquidated account.

The Three Modes

1

Spot Trading

You buy BTC with USDT. You now own BTC. If BTC goes to zero, you lose what you paid. You cannot lose more than you invested. There is no leverage, no liquidation, no margin calls. This is the simplest and safest mode. Use this for: long-term positions, conservative strategies, SMSF/retirement accounts, set-and-forget systems.

2

Isolated Margin

Each position has its own separate collateral. You decide how much margin to assign to each trade. If that position gets liquidated, only the assigned margin is lost — the rest of your account is untouched.

When to use: When running multiple strategies simultaneously, or when you want to contain the blast radius of any single trade. You might allocate $50 of margin to one position and $100 to another. If the first gets liquidated, you lose $50. The second position and the remaining account balance are unaffected. This is the natural next step beyond spot — you opt into leverage, but the maximum loss per trade is capped at the margin you posted.

3

Cross-Margin

Your entire account balance is collateral for every position. If you have $10,000 in the account and open a leveraged position, the exchange can use all $10,000 to keep your position alive. If the position moves against you far enough, your entire account gets liquidated — not just the margin allocated to that trade.

When to use: When you are running a single strategy on a dedicated account with robust risk management (stops, circuit breakers). The advantage is that your position can survive larger adverse moves without liquidation, because it has more collateral. The disadvantage is that a catastrophic failure liquidates everything — this is the most advanced mode and the most dangerous.

Comparison

Spot

What you own
The actual asset (e.g. 0.1 BTC bought at $95,000)
Leverage
None (1×)
Capital efficiency
Low — full notional locked up
Liquidation
None — you cannot be liquidated
Max loss on $10,000 account
Up to $9,500 if BTC drops 95% (only the amount actually invested in the asset)
Risk profile
Asset can go to zero (rare for majors); no funding cost; no margin call
Best for
Long-term holds, conservative strategies, SMSF / retirement, learning

No leverage. No liquidation. Simplest. Safest.

Isolated Margin

What you own
A leveraged position; collateral is the margin you posted to that position only
Leverage
Selectable per position (e.g. 5×, 10×, 50×)
Capital efficiency
High — only posted margin is locked; rest of account is free
Liquidation
Per-position. Liquidating one position does not touch the rest of the account.
Max loss per position
The margin you assigned to that position (e.g. assign $50 to Pos A and $100 to Pos B → max loss $50 and $100 respectively)
Risk profile
Blast radius is contained per trade; funding cost applies; mark-price liquidation
Best for
Running multiple strategies side-by-side, capping per-trade loss explicitly, the natural next step beyond spot

Each position carries its own collateral. Blast radius is contained.

Cross-Margin

What you own
A leveraged position backed by the entire account balance as collateral
Leverage
Effective leverage = total notional / account equity (no per-position cap)
Capital efficiency
Highest — positions can survive larger adverse moves before liquidation
Liquidation
Account-wide. A catastrophic move on a single position can liquidate the entire account.
Max loss on $10,000 account
$10,000 (the entire account); a single bad trade can wipe everything
Risk profile
Maximum collateral, maximum risk; requires exchange-side stops, circuit breakers, and reconciliation
Best for
A single strategy on a dedicated account with battle-tested risk management. Most advanced. Most dangerous.

One bad trade can wipe the entire account.

The three account modes, in order of complexity. Spot is safest but offers no leverage. Isolated margin lets you control the blast radius of each position independently — max loss per trade is the margin you assigned. Cross-margin gives maximum collateral but maximum risk: a single liquidation can wipe the entire account.

Critical Decision

If you are starting out: use isolated margin. It forces you to think about how much you are willing to lose on each trade, and it prevents a single catastrophic trade from destroying your account. Move to cross-margin only when you have a battle-tested risk management layer with exchange-side stop-losses, circuit breakers, and a reconciliation system that verifies your actual exposure every hour.

You’re Done When…

  • You can explain the difference between spot, isolated margin, and cross-margin (in that order of complexity)
  • You know which mode is appropriate for your situation
  • You understand that cross-margin means your ENTIRE account is at risk on every position

2.3 The Leverage Misconception

This is possibly the most misunderstood concept in crypto trading. Most people hear “50x leverage” and think “50x risk.” That is one way to use it — the way that blows up accounts. There is another way, and it’s the foundation of how professional systems use leverage.

How Most People Use Leverage (The Gambler)

Account balance: $500. Leverage: 50x. The gambler thinks: “I can now control $25,000 worth of BTC.” They open a $25,000 position. BTC moves 2% against them. That’s $500. Their entire account is gone.

This is leverage used as amplification. It amplifies gains and losses equally. A 2% market move becomes a 100% account move. The gambler is one bad candle from zero.

How Systems Use Leverage (The Engineer)

Same account: $500. Same leverage setting: 50x. But the engineer uses it differently:

  • The strategy says: “enter long with a position size of $350 notional.”
  • At 50x leverage, that requires $7 of margin.
  • The strategy has a stop-loss at 2% from entry. Maximum loss per trade: $7.
  • The remaining $493 sits untouched in the account.

The leverage setting enables the system to take precisely-sized positions with minimal margin consumption. The strategy’s built-in risk management (stop-losses, circuit breakers) ensures you are never exposed to the full notional.

Critical: Liquidation vs Stop-Loss

The naive view — “at 50x with isolated margin, a 5% adverse move only loses $17.50” — is dangerously wrong. At 50x leverage with isolated margin and a typical maintenance margin of ~0.5%, your liquidation price is roughly 1–2% away from entry. A 5% adverse move would liquidate the position long before any 5% calculation matters.

The rule: choose leverage low enough that the liquidation price sits well beyond your intended stop-loss, with a buffer for worst-case slippage and funding accrual. A 2% stop demands a liquidation price meaningfully further away — for that, you want low leverage on a small notional, not high leverage on the same notional.

Liquidation price (long, isolated, simplified):

liq_price ≈ entry × (1 - 1/L + MMR)

where L is leverage and MMR is the maintenance margin rate for your tier. At L=50, MMR=0.5% → liquidation roughly 1.5% below entry. At L=5, MMR=0.5% → roughly 19.5% below entry — which gives a 2% stop ample room.

Reframe: leverage is capital efficiency only if you have explicitly engineered liquidation safety. Otherwise it is amplification with extra steps.
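The formula and the margin arithmetic, as a checkable sketch (flat MMR for illustration; real venues use tiered schedules, covered below):

```python
def long_liq_price(entry: float, leverage: float, mmr: float) -> float:
    """Approximate isolated-margin liquidation price for a long position."""
    return entry * (1 - 1 / leverage + mmr)

def stop_is_safe(entry, leverage, mmr, stop_pct, buffer_pct=0.01):
    """True if the stop (plus a slippage/funding buffer) fires before liquidation."""
    liq = long_liq_price(entry, leverage, mmr)
    stop_price = entry * (1 - stop_pct)
    return stop_price - entry * buffer_pct > liq

entry = 95_000
print(long_liq_price(entry, 50, 0.005))                # ~1.5% below entry
print(stop_is_safe(entry, 50, 0.005, stop_pct=0.02))   # False: liq hits before the 2% stop
print(stop_is_safe(entry, 5, 0.005, stop_pct=0.02))    # True: liq sits ~19.5% away
print(f"margin for $350 notional at 50x: ${350 / 50:.2f}")  # $7.00
```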

Mark Price vs Last Price

Liquidations on most major venues are computed against the mark price, not the last traded price. Mark price is derived from a basket of spot prices (and/or a fair-value formula) specifically to prevent single-exchange wicks from triggering cascade liquidations. This is a feature, not a bug — it protects you from getting liquidated by a 0.5-second spike on one venue.

  • Stop-loss orders can typically be configured to trigger on either mark or last price. Mark-triggered stops are safer (harder to wick out); last-triggered stops are gameable by a thin order book or a coordinated wick.
  • Always check your venue’s specific liquidation mechanics before deploying: partial liquidation tiers, ADL (auto-deleverage) rules, insurance fund behaviour, and the exact maintenance margin schedule.
  • Liquidation must sit safely beyond your stop, including expected slippage. In cross-margin you can be liquidated before your stop fires if collateral from other positions is depleted — isolated margin is generally safer for engineering predictable liquidation distance.
  • Maintenance margin tiers vary by position size. Almost every venue uses a tiered schedule — larger notional positions face higher MMR (e.g. 0.5% at small size, 1.0% at $1M notional, 2.5%+ at $10M+). Liquidation calculations must use the venue’s published tier schedule for your actual notional, not a single global number.
Dimension | The Gambler | The Engineer
Account | $500 | $500
Leverage | 50x | 50x
Position size | $25,000 (max) | $350 (strategy-determined)
Margin used | $500 (100% of account) | $7 (1.4% of account)
Stop-loss | “I’ll watch it” | 2% from entry, exchange-side
Max loss per trade (stop honoured) | $500 (entire account) | $7
2% adverse move | Liquidated | Stopped out at −$7 (liquidation engineered to sit well beyond the stop)
Drawdown after 10 consecutive losses (1% risk) | N/A — already wiped | ~9.6% drawdown (equity multiplies by 0.99^10 ≈ 0.904)
Strategy validation | “Worked last time” | 10,000 Monte Carlo simulations

Key Insight

Leverage is a tool for capital efficiency, not risk amplification. A 50x leverage setting does not mean you take 50x more risk. It means you can take the same position with 50x less capital locked up as margin. The risk is determined by your position size and your stop-loss, not by the leverage multiple. The leverage just determines how much collateral the exchange requires.

Why This Works

When a strategy has been validated with:

  • 5 years of backtesting with realistic costs
  • 10,000-run Monte Carlo simulation
  • Walk-forward testing on unseen data
  • Six falsification tests (parameter robustness, cross-venue, regime stability, etc.)
  • Exchange-side stop-losses (not just bot-side)
  • Circuit breakers that halt trading during extreme drawdowns

…the probability of a single trade wiping out the account is effectively zero. The strategy has been designed to survive adverse moves. The leverage just means you don’t need to lock up $350 of your $500 account as margin for a $350 position. You lock up $7 instead, keeping $493 available for other strategies or as a safety buffer.

Warning

This approach ONLY works when the strategy has been rigorously validated AND has exchange-side stop-losses. “My bot has a stop-loss” is not enough. Bots crash. Servers go offline. Network connections drop. The stop-loss must be placed as an exchange-side order so that even if your bot is completely dead, the exchange will close the position at your predetermined loss level. Module 9 covers how to implement this.

You’re Done When…

  • You can explain why 50x leverage does NOT mean 50x risk
  • You understand the difference between using leverage for amplification (gambling) vs capital efficiency (engineering)
  • You know that exchange-side stop-losses are non-negotiable for leveraged positions
  • You can calculate the margin required for a given position size at a given leverage

2.4 API Keys & Security

Your API key is the connection between your trading bot and your money. Set it up wrong and someone else controls your account. This section covers how to create, secure, and manage API keys properly.

API Key Basics

An API key is a pair of strings — a key (public identifier) and a secret (private password) — that allows your bot to interact with the exchange on your behalf. Some exchanges also require a passphrase as a third component.

Step-by-Step: Creating a Secure API Key

1

Enable 2FA on your exchange account

Before creating any API key, ensure your exchange account has two-factor authentication enabled. Use an authenticator app (Google Authenticator, Authy), not SMS — SIM swapping attacks can bypass SMS 2FA.

2

Create a new API key with minimal permissions

Only enable the permissions your bot needs. For a trading bot: Read (to check positions and balances) and Trade (to place and cancel orders). Never enable Withdrawal permission on a trading API key. If the key is compromised, the attacker can trade but cannot steal your funds.

3

IP whitelist your server

Most exchanges allow you to restrict an API key to specific IP addresses. Always do this. Set it to the IP address of the server your bot runs on. If the key leaks, it can only be used from your server’s IP.

4

Store the secret securely

Store your API secret in an environment file (.env) on your server, not in your code. Never commit API keys to git. Add .env to your .gitignore. If you accidentally commit a key, rotate it immediately — git history retains deleted files forever.
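A minimal sketch of loading those secrets at runtime with python-dotenv; failing fast on a missing variable beats a bot that starts with empty credentials:

```python
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the working directory; the file itself stays gitignored

API_KEY = os.environ["EXCHANGE_API_KEY"]        # KeyError = fail fast at startup
API_SECRET = os.environ["EXCHANGE_API_SECRET"]  # never hard-code or log these
```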

War Story

We IP-whitelisted our server’s IPv4 address on the exchange. Orders kept failing with a cryptic “IP not whitelisted” error. After 3 days of debugging, we discovered the exchange’s API endpoint had inconsistent IPv6 routing — our server was sometimes resolving to and connecting over IPv6, a completely different address the exchange had never seen. The correct fix is to pin outbound traffic to IPv4 for that specific destination only — either via a route in the routing table, an /etc/hosts entry forcing the v4 record, an outbound firewall rule, or an HTTP client option. Do not globally disable IPv6 system-wide as a reflex — that degrades every other service on the box (DNS, package mirrors, monitoring) for an exchange-specific quirk. The lesson: verify which address family your server actually uses for outbound connections to each destination, and scope the fix to that destination.

Security Checklist

  • 2FA enabled on exchange account (authenticator app, not SMS)
  • API key has Read + Trade permissions only (NO withdrawal)
  • IP whitelist set to your server’s outbound IP address
  • API secret stored in .env file, not in code
  • .env is in .gitignore
  • API key has never been committed to a git repository
  • Separate API keys for each exchange (never reuse)
  • Separate API keys for live vs testnet (never mix)

Operational Security Depth

Permissions, IP whitelist, and a hidden .env are the floor, not the ceiling. The following are the recurring failure modes we’ve seen across venues — the ones that fail at 03:00 with cryptic error codes and lose you a fill.

Time and clock discipline

Every signed request you send is timestamped. The venue rejects requests whose timestamp is too far from its own clock — typically a window of a few hundred milliseconds to a few seconds. If your server clock drifts, you will see “invalid timestamp,” “recv window” or “request expired” errors. The fix is non-negotiable: install chrony or ntpd, point it at multiple stratum-2 sources, and verify drift stays under 500ms against your venue’s server time. Many venues expose a server-time endpoint — sample it on startup and alert if local-vs-venue offset exceeds your tolerance. Clock drift is silent: nothing tells you the box is drifting until orders start failing.
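A sketch of the startup offset check. The endpoint shown is Binance-style (/api/v3/time returning serverTime in milliseconds); substitute your venue's equivalent:

```python
import time
import requests

def clock_offset_ms(url: str = "https://api.binance.com/api/v3/time") -> float:
    """Return local-minus-venue clock offset in milliseconds."""
    t0 = time.time()
    server_ms = requests.get(url, timeout=5).json()["serverTime"]
    t1 = time.time()
    local_mid_ms = (t0 + t1) / 2 * 1000  # crude one-way latency correction
    return local_mid_ms - server_ms

offset = clock_offset_ms()
if abs(offset) > 500:
    # chrony/ntpd is misconfigured or drifting; signed requests will start failing.
    print(f"WARNING: clock offset {offset:.0f}ms exceeds tolerance")
```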

Nonce handling

If your venue requires a nonce, it must be monotonically increasing across the lifetime of the API key. Most implementations derive the nonce from time.time_ns() or millisecond-epoch. This works until the clock steps backwards (NTP correction, VM migration, daylight-savings on a misconfigured box) and your next nonce is smaller than the last one the venue saw. The venue rejects every subsequent request from that key until you rotate it. Mitigation: persist the last-used nonce to disk and always emit max(persisted_nonce + 1, current_time_ns).
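A sketch of that mitigation; the state-file path is illustrative, and a multi-process deployment would need locking around it:

```python
import time
from pathlib import Path

NONCE_FILE = Path("/var/lib/tradebot/last_nonce")  # illustrative path; directory must exist

def next_nonce() -> int:
    """Monotonic nonce that survives restarts and backwards clock steps.
    Single-process only: one key per process, or add file locking."""
    last = int(NONCE_FILE.read_text()) if NONCE_FILE.exists() else 0
    nonce = max(last + 1, time.time_ns())
    NONCE_FILE.write_text(str(nonce))
    return nonce
```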

HMAC signing pitfalls

The signed string is a recipe and the recipe is fussy. Common ways to silently produce an invalid signature:

  • Body bytes: sign the exact bytes you put on the wire. JSON re-serialisation can re-order keys or change whitespace; sign the post-serialisation string, not a freshly-serialised one.
  • Path normalisation: some venues require the path with the trailing slash, some without; some include the query string in the signed string, some don’t. Read the venue’s signing recipe and follow it byte-for-byte.
  • Header casing and ordering: some venues require specific headers in the signed string in a specific order.
  • Body inclusion on GET vs POST: on GET requests the “body” is usually empty string, but for some venues the query string substitutes; on POST you sign the body, not the query.

When debugging signature failures, log the exact pre-signature string (with secrets redacted) and compare against the venue’s reference example character-by-character.
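A sketch of the serialise-once, sign-those-bytes rule. The prehash recipe shown (timestamp + method + path + body) is illustrative; follow your venue's documented recipe byte-for-byte:

```python
import hashlib
import hmac
import json

def sign_request(secret: str, timestamp_ms: int, method: str, path: str,
                 body: dict | None) -> tuple[bytes, str]:
    """Serialise once, sign those exact bytes, send the same bytes on the wire."""
    body_bytes = b"" if body is None else json.dumps(body, separators=(",", ":")).encode()
    prehash = str(timestamp_ms).encode() + method.encode() + path.encode() + body_bytes
    signature = hmac.new(secret.encode(), prehash, hashlib.sha256).hexdigest()
    return body_bytes, signature  # never re-serialise the body after signing
```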

Idempotency on retries

Every order-submission request must carry a clientOrderId that is deterministic from the underlying intent (see Module 8.3). On a network timeout you do not know whether the venue received and processed your order, so you retry. The deterministic ID lets the venue dedupe: if it already saw that ID, it returns the existing order rather than creating a second one. Without this, a retry on a 504 can give you a doubled position.
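A sketch of deriving the ID from intent. Two retries of the same signal hash to the same ID, so the venue returns the existing order instead of filling you twice:

```python
import hashlib

def client_order_id(strategy: str, symbol: str, side: str, signal_time_iso: str) -> str:
    """Deterministic ID from the intent, not the attempt: retries reuse the same ID."""
    intent = f"{strategy}|{symbol}|{side}|{signal_time_iso}"
    return hashlib.sha256(intent.encode()).hexdigest()[:32]  # respect venue length limits

# A timeout-then-retry of the same signal resubmits with an identical ID:
print(client_order_id("weekly_trend", "BTCUSDT", "buy", "2025-01-06T00:00:00Z"))
```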

Rate-limit backoff

HTTP 429 (or the venue-specific equivalent) is the venue telling you to slow down. Treat it with care:

  • Exponential backoff with jitter: sleep min(cap, base * 2^attempt) + uniform(0, jitter). Jitter prevents a fleet of containers all retrying at the same instant after a transient outage (the “thundering herd”).
  • Respect Retry-After: if the response includes a Retry-After header, use that as the floor; never retry sooner.
  • Per-endpoint budgets: most venues meter rate limits per endpoint family (orders / market-data / account). Don’t share a single token-bucket across all calls — a burst of position queries will starve your order-placement budget.
  • Surface 429 to your alerting: a sustained 429 rate is a sign you’re mis-architected, not a normal operating condition.
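Combined into one retry loop (do_request is a placeholder returning a response with .status_code and .headers):

```python
import random
import time

def with_backoff(do_request, max_attempts=6, base=0.5, cap=30.0, jitter=0.5):
    """Retry on 429 with exponential backoff + jitter, honouring Retry-After as a floor."""
    for attempt in range(max_attempts):
        resp = do_request()
        if resp.status_code != 429:
            return resp
        delay = min(cap, base * 2 ** attempt) + random.uniform(0, jitter)
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = max(delay, float(retry_after))  # never retry sooner than the venue asks
        time.sleep(delay)
    raise RuntimeError("rate-limited on every attempt; alert the operator")
```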

Key rotation and signed-request hygiene

  • Sign request bodies, not just URLs. URL-only signing is forgeable on POST endpoints.
  • Rotate keys on a schedule (quarterly at minimum) and immediately on any whiff of compromise. Keep the old key alive for a brief overlap window so in-flight requests don’t fail mid-rotation.
  • Never log signed requests with secrets in the clear. Redact the secret, signature, and any auth header before any log line touches disk or a log shipper.
  • Keep separate keys per environment (live / testnet) and per machine where practical — on key compromise you can revoke a single blast radius rather than the whole estate.

You’re Done When…

  • You have created an API key with Read + Trade permissions only
  • The key is IP-whitelisted to your server
  • The secret is stored in a .env file that is gitignored
  • Withdrawal permission is NOT enabled
  • NTP/chrony is running and clock drift vs venue time is under 500ms
  • Your retry path uses exponential backoff with jitter and respects Retry-After
  • Order submissions carry a deterministic clientOrderId for idempotent retries

2.5 Fiat On/Off Ramps

Getting money into and out of crypto exchanges is surprisingly frustrating. Banks block transfers, exchanges have hidden fees, and sending to the wrong address can lose your funds permanently. This section covers how to set up reliable fiat rails.

The General Pattern

For most countries, the best approach is:

  1. Deposit fiat to a regulated exchange with good fiat support (Kraken, Coinbase, etc.)
  2. Buy USDT via limit order on the trading pair (e.g., USDT/AUD). Never use “Convert” or “Quick Buy” features — they have hidden spreads of 1–3%.
  3. Withdraw USDT to your trading exchange via a cheap network. The cheapest practical networks for USDT in real exchange UIs are TRC20 (TRON) and Polygon — withdraw fees are typically $1 or less and confirmations are fast. Other supported networks include BEP20 (BSC), Arbitrum, Optimism, Base, and Solana. Avoid ERC20 (Ethereum mainnet) unless the destination explicitly requires it — gas fees can run $5–30+. Always send a small test transfer first and confirm the destination supports the network you choose.
  4. Trade on your primary exchange (your chosen tier-1 perpetual or spot venue)

Critical: Deposit Address Traps

Exchanges have separate deposit addresses per token. If you go to “Deposit BNB” and get a BSC address, then send USDT to that address on the same BSC network — the exchange may consider those funds “lost” even though they arrived at an address the exchange controls. Policies vary: some venues auto-credit, some charge a flat recovery fee (often well into the hundreds of dollars), and some declare such transfers unrecoverable entirely. Read your specific venue’s policy before sending anything material. Always verify: correct TOKEN page, correct NETWORK, correct ADDRESS.

Off-Ramp (Crypto to Fiat)

Reverse the process: consolidate USDT to your fiat exchange, sell via limit order on the trading pair, withdraw fiat to your bank. Key points:

  • Use limit orders when selling to AUD/USD/EUR — market orders and “Convert” features have worse pricing
  • Check withdrawal limits and processing times for your fiat exchange
  • Some banks flag large crypto-related deposits — keep records of all transactions for tax purposes

You’re Done When…

  • You have a working fiat on-ramp: bank → fiat exchange → USDT → trading exchange
  • You have tested the full round trip with a small amount
  • You know which network to use for USDT transfers between your exchanges
  • You have verified the correct deposit address and token page

2.6 Stablecoin & Counterparty Risk

Your trading P&L is denominated in stablecoins, your collateral is parked at venues, and both can fail. Most operators ignore this until the day it matters — at which point ignoring it costs them a quarter of their account or more. Treat the stablecoin you hold and the venue that holds it as risk exposures, not as cash.

Stablecoins Are Not Cash

A stablecoin is a promise. Different stablecoins make that promise via different mechanisms, and each mechanism has its own failure mode. The major fiat-pegged stables fall roughly into three buckets:

  • Fiat-collateralised, custodied: the issuer claims to hold $1 of bank deposits, treasuries, or commercial paper for every unit issued. Failure modes: the reserves aren’t what the issuer claims, the custodian fails, regulatory action freezes redemptions.
  • Crypto-collateralised, over-collateralised: the unit is backed by a basket of crypto held in smart contracts at >100% collateral ratio. Failure modes: the underlying collateral crashes faster than liquidations can keep up; the smart contract has a bug; governance changes the collateral mix overnight.
  • Algorithmic / partially-collateralised: the peg is maintained by mint-and-burn arbitrage incentives rather than full backing. Failure modes: the incentives stop working in stress, and a death spiral becomes self-reinforcing. Several have gone to zero. Treat as a high-risk asset, not a stablecoin, regardless of how it markets itself.

Every major stablecoin you can name has depegged at some point in its history. A 2% depeg is annoying. A 5% depeg on a portfolio that’s 70% in that stable, with positions sized off the assumption that collateral is dollar-stable, is a quarter of your year’s P&L gone in a weekend.

Diversify, but Diversify Across Mechanisms

Holding three stablecoins that all share the same failure mode (fiat-custodied, all backed by the same banks) is not diversification. If the failure is regulatory or banking-system, all three move together. Real diversification means picking stables with different reserve mechanisms: e.g. one fiat-custodied at a top issuer plus one crypto-over-collateralised. The point is that the same shock cannot take both out simultaneously.

Counterparty Risk: The Venue Itself

Even if your stablecoin is sound, the venue holding it is a separate exposure. Exchange failures of the past decade share a common pattern: the failure is preceded by withdrawal slowdowns, then withdrawal pauses, then declarations that funds are “safe” while internally the venue scrambles. By the time the failure is public, withdrawing has been impossible for days.

The defensive posture, in priority order:

  1. Cap exposure per venue. No single venue should hold more than a defined fraction of your trading capital — the exact fraction depends on your trust in the venue, but a working ceiling for any one exchange is “the amount you can afford to lose entirely without it changing your life.” Success at one venue tempts you to concentrate there; resist it. Concentration is what turns a venue failure from a setback into a wipeout.
  2. Run a withdrawal drill on a fixed cadence. Every six months, withdraw your full balance from each venue back to self-custody, wait for the funds to arrive, then redeposit what you need to trade. The drill verifies that withdrawals work today — you do not get to find out they don’t when you actually need to leave. The drill is also the cheapest possible smoke-test for “has anything quietly changed about this venue?” Quietly tightened withdrawal limits or quietly extended approval times are early warnings.
  3. Hot vs cold storage. Only the capital you are actively trading belongs at a venue. Working-capital reserves — the money you intend to trade with next month, the buffer between trade size and account size — should sit in self-custody (hardware wallet, cold multisig) and be deployed to a venue just-in-time. The principle is simple: a venue can fail; a hardware wallet you hold the keys to cannot.
  4. Read the venue’s insurance-fund policy. Most major perpetual venues run an insurance fund that covers socialised losses when liquidations cascade past margin. Read what your venue’s policy actually says: how is the fund sized? What gets paid out? What is “clawback” or “auto-deleveraging” behaviour in your specific account class? You are exposed to the answer in a black-swan event regardless; you may as well know what the answer is in advance.

A Simple Per-Venue Risk Score

Don’t over-engineer this. A back-of-envelope risk-score per venue, refreshed quarterly:

Formula

venue_score = w1 × regulation_quality      (licensed in a credible jurisdiction?)
            + w2 × insurance_fund_size     (absolute, and as % of open interest)
            + w3 × withdrawal_track_record (any pauses or freezes in the last 24 months?)
            + w4 × transparency            (do they publish proof-of-reserves?)
            + w5 × business_health         (any signs of layoffs, exec churn?)
            + w6 × my_drill_result         (did my last withdrawal drill pass cleanly?)

Each term is a discipline you can score honestly; the weighted sum is the artefact you compare against past versions of itself when concentration starts to creep up.

You will not get this exactly right for any venue — that’s fine. The exercise of writing the score down forces you to be specific about why you trust a venue, instead of trusting it because you’ve been there a while. When the score drops, reduce exposure before you have to.
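As a sketch, here is the same score as code. The weights and the 0–1 scoring of each term are illustrative assumptions, not recommendations:

CODE · PYTHON

# Illustrative weights: pick your own and keep them stable between reviews.
WEIGHTS = {
    "regulation_quality":      0.25,
    "insurance_fund_size":     0.15,
    "withdrawal_track_record": 0.25,
    "transparency":            0.15,
    "business_health":         0.10,
    "my_drill_result":         0.10,
}

def venue_score(scores: dict) -> float:
    # Each input is scored 0.0 (worst) to 1.0 (best), refreshed quarterly.
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# Example: clean withdrawal drill, but no proof-of-reserves published.
print(venue_score({
    "regulation_quality": 0.8, "insurance_fund_size": 0.6,
    "withdrawal_track_record": 1.0, "transparency": 0.2,
    "business_health": 0.7, "my_drill_result": 1.0,
}))  # -> 0.74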

Key Insight

The stablecoin and the venue are two separate counterparty risks layered on top of each other. A 5% stablecoin depeg while you’re fully collateralised in that stable, on a venue that’s simultaneously slowing withdrawals, is the worst-case scenario — and it has happened. Treat each layer independently: pick stables that survive different shocks, pick venues that hold a bounded fraction of your capital, and do not let success at one venue tempt you to concentrate there.

You’re Done When…

  • You hold at least two stablecoins backed by genuinely different reserve mechanisms
  • You have a defined per-venue exposure cap and your current allocation is within it
  • You have run a withdrawal drill in the last six months and it succeeded
  • Working-capital reserves sit in self-custody, not at a venue
  • You have read your primary venue’s insurance-fund and ADL/clawback policy and understand what would happen to you in a tail event
  • You can name the failure mode for each stablecoin and venue you hold material balances at

2.7 Wrong-Rail Deposit Policy: A Due-Diligence Dimension Most Traders Miss

Wrong-rail deposits — sending USDT on TRC20 to an ERC20-only address, depositing an unlisted token, picking the wrong network in a withdrawal flow, forgetting a memo on an XRP send — happen constantly to active traders. The funds usually arrive at exchange-issued infrastructure and remain visible on-chain. The question is what the exchange does next. The answer is observable in their published policy before you ever onboard.

The Setup

Active traders move money between venues, between networks, and between account types continuously. The mistakes are predictable: an address copied from an ERC20 deposit screen pasted into a TRC20 withdrawal flow; a token sent to a venue that doesn’t list it; a deposit fired into the wrong sub-account at the same exchange; a memo-required chain (XRP, XLM, ATOM) sent without the memo. In nearly all of these cases, the funds do not vanish. They arrive at deposit infrastructure that the exchange or its custody provider controls. They are visible on-chain. They are credited to some address inside the venue’s wallet topology, even if not to your account. What happens next is a policy decision, not a blockchain inevitability.

Why This Is a Real Difference Between Venues

A venue that publishes a self-service recovery flow is treating your asset as your asset that landed in the wrong slot. A venue that publishes “generally not recoverable” is treating the same situation as your loss to absorb. Same asset, same on-chain reality, same custody footprint — two completely different ethical commitments. This is not a theoretical distinction. It is written down, in each venue’s own help-centre articles, in language that you can read in five minutes before you ever fund an account.

Most retail traders only discover their venue’s policy on this after making a wrong deposit. By then it is too late. The remedy is to read the policy first and let the answer inform where you concentrate capital.

What to Check Before Onboarding

For each venue you are considering, search their support centre for explicit policy on each of the following:

  • Wrong-network deposits — you sent USDT on TRC20 to an address that only credits ERC20. Does the venue have a documented retrieval workflow, or do they treat this as user error and decline?
  • Unsupported-token deposits — you sent a token the venue does not list, to one of their generic addresses. Self-service refund? Manual ticket with stated SLA? Or non-recoverable?
  • Wrong-address-within-the-venue deposits — you sent funds to a deposit address belonging to a different account at the same exchange (your sub-account, your friend’s account, an old account of yours). Does the venue have an internal-transfer-correction process?
  • Memo/tag-missing deposits — you sent XRP, XLM, or ATOM without the required memo. The funds land in the venue’s shared deposit address. Does the venue have a self-service memo-rebinding flow, or do they require a manual ticket?

The exact answers will differ. The point is that the answers exist, are written down, and are searchable. A venue’s answers to these four questions form a tier-of-platform criterion you can apply before depositing a single dollar.

What Strong Policy Looks Like

A strong-posture venue publishes some combination of:

  • A self-service retrieval flow that takes a TxID and routes it through automated recovery for known-good cases
  • A documented manual recovery procedure with a stated process, fee structure, and rough turnaround
  • Explicit language committing to attempt recovery, even if outcome is not guaranteed

Examples (verified from the venues’ own published policy):

  • Binance — publishes a wrong-deposit FAQ [Binance] and a self-service “Retrieve Now” flow for supported cases [Binance self-service]
  • Bybit — publishes a deposit FAQ [Bybit] and a structured Unsupported-Deposit Recovery Procedure [Bybit recovery]
  • Bitget — publishes a self-service refund route for unlisted-coin and wrong-blockchain deposits [Bitget]

OKX sits a step behind these three but still publishes a recovery route — a support-ticket process plus an “untradable assets” withdrawal path for some cases [OKX].

What Weak Policy Looks Like

A weak-posture venue publishes some combination of:

  • Disclaimers that wrong-network or unsupported deposits are “likely non-recoverable” or “may result in permanent loss”
  • Statements that recovery is “not generally provided” or offered only “at the venue’s sole discretion”
  • No documented self-service flow and no published procedure with a stated turnaround

Examples (verified from the venues’ own published policy):

  • Kraken — funding FAQ describes unsupported-network deposits as “likely non-recoverable” [Kraken]
  • BingX — states it “generally does not provide token or coin recovery service” for wrong deposits, with assistance offered only “at its sole discretion” [BingX]
  • Swyftx — missing-crypto guidance warns wrong-network deposits can result in “permanent loss” [Swyftx]

Comparative Snapshot

Venue   | Posture          | Published Stance (paraphrased from each venue’s own help-centre)
Binance | Strong           | Self-service retrieval flow plus wrong-deposit FAQ
Bybit   | Strong           | Documented Unsupported-Deposit Recovery Procedure
Bitget  | Strong           | Self-service refund for unlisted-coin / wrong-blockchain
OKX     | Moderate         | Support-ticket recovery plus untradable-assets withdrawal route
Kraken  | Weak-to-moderate | “Likely non-recoverable” for unsupported networks, plus a recovery guide
BingX   | Weak             | “Generally does not provide” recovery; assistance “at sole discretion”
Swyftx  | Weak             | Wrong-network deposits may result in “permanent loss”

Read each venue’s own page yourself before you trust this table — policies change. The discipline is the read, not the snapshot.

The Ethical Frame

When funds land at exchange-issued infrastructure and are visible on-chain under custody the exchange controls, refusing to return them is not a blockchain inevitability — it is a policy decision. Some venues choose to design for the customer (build a recovery flow as the default); some venues choose to design for retention (build non-recovery as the default and offer discretionary exceptions). Both choices are observable in the published policy. Both choices tell you something durable about how the venue treats your funds when an edge case fires.

The fact that Binance, Bybit, and Bitget publish recovery workflows proves that wrong-rail deposits are operationally recoverable in many cases. A venue that declines to operate that workflow is making a deliberate trade-off — lower operational cost to them, higher loss-absorption by you. That is information you are entitled to before you fund an account.

Practical Checklist

  • Before opening a venue account, read their wrong-deposit and unsupported-network policies in their own help centre
  • Look for the words self-service, retrieval, refund process, recovery procedure — strong signal
  • Look for the words likely non-recoverable, generally not provided, permanent loss, sole discretion — weak signal
  • Note which exchanges publish workflows and which publish disclaimers; treat the two groups as different tiers
  • Do not concentrate capital at a venue whose published policy stance you would not accept on a wrong deposit you actually made

You’re Done When…

  • You can name the published wrong-deposit policy stance for each venue you hold material balances at
  • You have read the actual help-centre page on at least your primary venue (not a third-party summary)
  • Your concentration of capital across venues reflects the policy posture you observed, not just liquidity or fee considerations
  • You understand the difference between “blockchain loss” (unrecoverable by anyone) and “policy loss” (recoverable in principle, declined by the venue)


Module 3

Data Infrastructure
6 sections · ~3.5 hours

3.1 What Data You Need

Your trading system is only as good as the data it runs on. This section covers the types of market data available, what each is used for, and what you need at minimum to build a working system.

The Data Hierarchy

Not all data is equal. Here’s the hierarchy from essential to advanced:

Level     | Data Type        | What It Is                                                      | What It Enables
Essential | OHLCV Candles    | Open, High, Low, Close, Volume for each time period             | Indicators, backtesting, basic strategies
Important | Funding Rates    | Periodic payments between longs and shorts on perpetual futures | Cost modelling, sentiment indicators, crowding signals
Important | Open Interest    | Total number of outstanding futures contracts                   | Positioning analysis, crowding detection
Advanced  | Bid/Ask Spread   | Difference between best buy and sell price at any moment        | Accurate slippage modelling, market quality assessment
Advanced  | Liquidation Data | Forced closures of leveraged positions                          | Cascade detection, extreme event signals
Expert    | Order Book Depth | Full list of resting orders at each price level                 | Liquidity analysis, support/resistance identification
Expert    | Tick/Trade Data  | Every individual trade that occurs                              | Tape reading simulation, market microstructure analysis

Start with OHLCV candles. You can build your first strategy, backtest it, and deploy it with nothing else. Add funding rates and open interest when you want to model costs properly and explore positioning-based signals. The rest comes later.

Timeframes

Candles come in different timeframes. Each serves a different purpose:

  • Weekly — for slow, trend-following strategies (3–5 trades per year)
  • Daily — the standard for most swing strategies and indicator calculations
  • 4-Hour / 1-Hour — for intraday swing strategies
  • 15-Minute / 5-Minute — for active intraday strategies
  • 1-Minute — for precise entry timing and tape-reading simulation

More timeframes = more data = more storage = more maintenance. Start with daily and weekly. Add smaller timeframes only when your strategy requires them.

Key Insight

The data sophistication ladder is: mid-price candles → bid/ask candles → tick data → full order book. Each level adds cost and complexity. Most profitable systematic strategies we’ve tested work on simple OHLCV candles. Don’t over-engineer your data infrastructure before you’ve proven a strategy works on basic data.

You’re Done When…

  • You know what OHLCV candles are and why they are the foundation
  • You understand funding rates and why they matter for cost modelling on perpetuals
  • You’ve decided which timeframes your first strategy will use

3.2 Where to Get It

Every crypto exchange provides free historical data through their API. The challenge is not finding data — it’s fetching it reliably, handling pagination, and managing rate limits.

Free Sources (Exchange APIs)

Source Type                          | Data Available                      | Typical History Depth            | Rate Limits
Tier-1 perpetual venues (CEX)        | OHLCV, funding, OI                  | ~2017–present, varies by venue   | Hundreds to ~1000+ req/min
Tier-1 spot venues                   | OHLCV, trades, order book snapshots | Often the deepest crypto history | Generally generous
Newer DEX-style perpetual venues     | OHLCV, funding                      | Inception (recent) onward        | Generous
FX / commodities / index broker APIs | OHLCV (FX, commodities, indices)    | ~2005–present                    | Moderate

For crypto: One tier-1 perpetual venue with a clean API and good depth, paired with a tier-1 spot venue with the deepest history, is a solid default. Use the deeper-history source for historical backfill and your primary trading venue for live data.

For non-crypto: A retail FX/CFD broker with a free practice account is the easiest free source for FX, commodity, and index data going back many years. Useful for cross-asset correlation testing.

Building a Data Fetcher

You need a script that:

  1. Connects to the exchange API
  2. Fetches candle data in pages (most APIs return 200–1000 candles per request)
  3. Handles pagination correctly (use the last candle’s timestamp as the start for the next request)
  4. Respects rate limits (add a small delay between requests)
  5. Stores results in a database or CSV file
  6. Runs on a schedule (cron job) to keep data up to date

This is a straightforward Python script. An LLM like Claude or ChatGPT can help you write it in under an hour if you describe exactly what you need.
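As a sketch of what that script’s core loop looks like, here is one way to write it using the open-source ccxt library; the venue, symbol, and start date are placeholders:

CODE · PYTHON

import time
import ccxt

exchange = ccxt.binance()        # any ccxt-supported venue exposes the same call
symbol, timeframe = "BTC/USDT", "1d"
since = exchange.parse8601("2020-01-01T00:00:00Z")

all_candles = []
while True:
    batch = exchange.fetch_ohlcv(symbol, timeframe, since=since, limit=1000)
    if not batch:
        break                                  # reached the present
    all_candles.extend(batch)
    since = batch[-1][0] + 1                   # paginate from the last candle's timestamp
    time.sleep(exchange.rateLimit / 1000)      # stay inside the venue's rate limit

# Each row: [timestamp_ms, open, high, low, close, volume] -- ready to store.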

Pro Tip

When using an LLM to help write your data fetcher, give it the exchange’s API documentation URL and say: “Write a Python script that fetches all BTCUSDT 1-day candles from [exchange] starting from 2020-01-01, handles pagination, respects rate limits, and saves to a lightweight embedded SQL database.” Review the output, test it, and iterate. This is exactly how production data pipelines start.

You’re Done When…

  • You have identified your primary data source (exchange API)
  • You have a working script that fetches historical candle data
  • The script handles pagination and rate limits correctly
  • Data is stored in a queryable format (a lightweight embedded SQL database or CSV)

3.3 Storage & Databases

Your candle data needs to live somewhere reliable, queryable, and fast. This section covers the options from simple to production-grade.

The Options

1

CSV Files (Simplest)

One CSV per symbol per timeframe. Easy to inspect, easy to load into pandas, no database knowledge required. Fine for a single strategy on a single symbol. Falls apart when you have 25 symbols across 5 timeframes and need to join data efficiently. Start here.

2

Lightweight Embedded SQL Database (Simple + Queryable)

A single-file database that requires no server. Supports SQL queries. Fast for read-heavy workloads. Our production trading bots use a lightweight embedded SQL database for their local market data. Perfect for one strategy, one exchange, up to a few million candles. Graduate to this when CSV gets messy.

3

Server-Class Time-Series Database (Production-Grade)

A full relational database with time-series optimisation. Supports concurrent access, complex queries, and can handle hundreds of millions of rows. Our research environment uses a time-series database with vector-similarity capabilities for embedding storage. Use this when you have multiple strategies, multiple exchanges, and want an analytics layer.

Practical Advice

Do not start with a server-class database. Start with CSVs. Then move to a lightweight embedded SQL database when you need queries. Then move to a server-class time-series database when you need scale. Premature infrastructure optimisation is how people spend 3 weeks setting up a database and 0 weeks testing a strategy.
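A minimal sketch of option 2 using Python’s built-in sqlite3 module, as one example of a lightweight embedded SQL database; the schema and sample row are illustrative:

CODE · PYTHON

import sqlite3

conn = sqlite3.connect("candles.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS candles (
        symbol TEXT, timeframe TEXT, ts INTEGER,
        open REAL, high REAL, low REAL, close REAL, volume REAL,
        PRIMARY KEY (symbol, timeframe, ts)
    )
""")
# INSERT OR REPLACE makes re-ingestion idempotent: a candle re-fetched via an
# overlap window (section 3.5) simply overwrites its earlier, incomplete version.
rows = [("BTCUSDT", "1d", 1577836800000, 7195.2, 7255.0, 7175.5, 7200.8, 12345.6)]  # illustrative
conn.executemany("INSERT OR REPLACE INTO candles VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
conn.commit()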

You’re Done When…

  • Your historical data is stored in a format you can query (CSV, an embedded SQL database, or a server-class time-series database)
  • You can load a symbol’s candle history into a pandas DataFrame in under 5 seconds
  • You have a data update process (manual or cron) that keeps data current

3.4 Data Quality & Cleaning

Bad data is worse than no data. A corrupted candle can trigger a false signal, open a real trade, and lose real money. This section covers the data quality checks that must exist before any data touches your strategy.

Common Data Quality Issues

Issue                  | What Happens                                                   | How to Detect
Missing candles (gaps) | Indicators calculate wrong values, signals fire at wrong times | Check for expected number of rows per day/week
Incomplete candles     | A candle fetched before the period closed has partial data     | Compare volume vs typical; check fetch timestamp vs period end
Duplicate candles      | Same timestamp appears twice, inflates averages                | Check for duplicate timestamps after ingestion
Extreme outliers       | A candle shows volume of 5 when the average is 5,000           | Flag candles where volume or range is >5 standard deviations from mean
Wrong timezone         | Weekly candles calculated from the wrong start day             | Verify first candle timestamp matches expected timezone

War Story

Our candle update script had a checkpoint bug: it fetched weekly candles at 00:30 UTC on Monday, which meant the Sunday close candle was still incomplete. The script saved it, advanced the checkpoint, and never went back to correct it. The result: one week’s candle had volume of 5 instead of 5,000 and a close that was off by $2,000. The strategy calculated the wrong SMA slope and would have made a trade based on garbage data. We only caught it because we built a data quality validator that flags candles with volume below the 1st percentile.

Minimum Data Quality Checks

  1. No gaps: Every expected time period has a candle
  2. No duplicates: No timestamp appears more than once
  3. Reasonable volume: No candle has volume that is zero or more than 5x the rolling median
  4. OHLC integrity: High ≥ Open, Close, Low. Low ≤ Open, Close, High.
  5. Monotonic timestamps: Each candle’s timestamp is exactly one period after the previous
  6. Complete candles: The most recent candle is for a period that has actually closed

Run these checks every time new data is ingested. Fail loudly if any check fails. Never let unvalidated data reach your strategy engine.
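A pandas sketch of these checks. It assumes a DataFrame indexed by candle timestamp with open/high/low/close/volume columns, and it fails loudly on the first violation:

CODE · PYTHON

import pandas as pd

def validate_candles(df: pd.DataFrame, period: pd.Timedelta) -> None:
    ts = df.index
    gaps = ts.to_series().diff().dropna()
    # One check covers gaps, duplicates, and non-monotonic timestamps:
    assert (gaps == period).all(), "gap, duplicate, or non-monotonic timestamp"
    med = df["volume"].rolling(30, min_periods=5).median()
    bad_vol = (df["volume"] <= 0) | (df["volume"] > 5 * med)
    assert not bad_vol.fillna(False).any(), "implausible volume"
    assert (df["high"] >= df[["open", "close", "low"]].max(axis=1)).all(), "high below O/C/L"
    assert (df["low"] <= df[["open", "close", "high"]].min(axis=1)).all(), "low above O/C/H"
    assert ts[-1] + period <= pd.Timestamp.now(tz=ts.tz), "most recent candle not yet closed"

# validate_candles(candles, pd.Timedelta("1D"))  # a raised error = no trade, by design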

You’re Done When…

  • Your data pipeline includes quality checks that run on every ingestion
  • You can detect missing candles, duplicates, and outliers automatically
  • Bad data causes an error, not a trade

3.5 Historical Backfill

Before you can backtest anything, you need years of clean historical data. This section covers how to do a proper historical backfill and verify it’s correct.

How Much History Do You Need?

  • Minimum: 3 years. Covers at least one bull market, one bear market, and some chop.
  • Recommended: 5+ years. Enough to split into training and test sets while still having meaningful data in each.
  • For weekly strategies: 5 years gives you ~260 candles. 3 years gives you ~156. You need enough data for statistical significance.

The Backfill Process

1

Determine start date

For BTC: January 2020 gives you 5+ years including the COVID crash, 2021 bull run, 2022 bear market, and 2023–2026 recovery. This is a rich dataset.

2

Fetch in pages

Most APIs return 200–1000 candles per request. Paginate from your start date to present, using the last candle’s timestamp as the start of the next request. Add a 1–2 second delay between requests to stay within rate limits.

3

Validate the result

Run all data quality checks from section 3.4. Verify the total number of candles matches what you expect (365 daily candles per year, 52 weekly candles per year, etc.).

4

Set up ongoing updates

Create a cron job or scheduled task that fetches new candles daily. Use an overlap window: always re-fetch the last few candles in case the previous fetch caught an incomplete candle.

Pro Tip: Overlap Window

When resuming a data fetch from a checkpoint, always re-fetch the last 3–5 candles from the previous run. This corrects any candles that were incomplete when first fetched (e.g., fetched at 00:30 UTC before the daily candle closed at 00:00 UTC the next day). This one pattern eliminates the most common source of data corruption in automated pipelines.
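In code, the overlap is one line of arithmetic on your checkpoint. A sketch, assuming millisecond timestamps and daily candles:

CODE · PYTHON

OVERLAP = 5                                 # re-fetch the last 5 candles each run
PERIOD_MS = 24 * 60 * 60 * 1000             # daily candles

last_saved_ts = 1_700_000_000_000           # illustrative checkpoint from your database
since = last_saved_ts - (OVERLAP - 1) * PERIOD_MS
# Fetch from `since` and upsert (INSERT OR REPLACE): any candle that was
# incomplete when first saved is overwritten by its corrected version.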

You’re Done When…

  • You have 3–5 years of clean, validated candle history for your target instrument
  • Data quality checks pass on the complete dataset
  • You have a scheduled process to keep data current
  • The update process uses an overlap window to correct incomplete candles

3.6 Real-Time Data Layer

Backfill gives you history. The live system needs now. The cheap option — polling REST in a loop — works at backtest pace and falls apart at production pace. WebSocket discipline is what separates a system that catches fills in real time from one that’s always ten seconds behind reality.

Why WebSocket Beats Polling

For anything that changes faster than your cron interval — ticks, your own fills, your own order updates, position state — polling is the wrong tool. Three reasons:

  • Latency. A WebSocket push arrives in tens of milliseconds. A 1-second poll loop has, on average, 500ms of dead air between “something happened” and “your bot heard about it”. For an order that filled and is now drifting against you with no stop in place yet, that’s a long time.
  • Rate limits. REST endpoints are rate-limited; WebSockets generally are not (per-message, anyway). A polling loop tight enough to be useful is a polling loop that will eventually 429 and back off — usually exactly when the market is moving and the venue is busy.
  • Completeness. Between two REST polls you can miss intermediate state entirely — a fill that arrived and was reduced by another fill, an order that was placed and cancelled inside one polling window. Streams emit every transition.

Division of Labour: WebSocket vs REST

The two are complements, not substitutes. A reasonable default split:

  • WebSocket (streaming): ticks, trades, order-book updates, your account’s order updates, your account’s fills, position state changes, funding-rate updates.
  • REST (snapshots and control): initial account state on startup, periodic full reconciliation snapshots, historical backfill, placing and cancelling orders. Many operators place orders over REST even when the venue offers WS order entry, because REST is simpler to reason about and the latency penalty on placement is small relative to a position’s holding period. Opinions vary.

The WebSocket is the source of truth for change. The REST snapshot is the source of truth for state. Both, together, defend against the failure modes of either alone.

Reconnection Discipline

Connections drop. Networks blip. Venues restart. Your reconnect logic decides whether you trade through it or sit blind for an hour.

  • Exponential backoff with jitter. First reconnect attempt: 1 second. Then 2, 4, 8, 16, capped at 60. Add ±25% jitter to each delay so your bot and a thousand others don’t all hammer the venue at the same instant when it comes back up. Without jitter, the herd takes the venue down again on its first breath.
  • Circuit-breaker on flap. If you’ve reconnected more than N times in the last minute, stop reconnecting and alert. A flapping connection is almost always a sign of a deeper problem (your network, the venue’s, or a key error) that more reconnects will not fix. Better to halt and page than to log-spam a runbook into oblivion.
  • Resubscribe explicitly on reconnect. A new connection has no subscriptions. Re-send every subscribe message; do not assume the venue remembered.
  • Reconcile on every reconnect. The window of disconnection is exactly when fills you missed live. The first thing to do after “CONNECTED” is a REST snapshot of orders, positions, and recent fills, diff against local state, and emit any reconciliation actions before resuming normal trading.
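The whole discipline fits in one loop. A sketch in which every callable is a placeholder for your own code, not a library API:

CODE · PYTHON

import collections
import random
import time

def stream_loop(connect, resubscribe, reconcile, listen, alert):
    # connect/resubscribe/reconcile/listen/alert are your own callables (placeholders).
    attempt = 0
    reconnects = collections.deque(maxlen=5)
    while True:
        if len(reconnects) == 5 and time.time() - reconnects[0] < 60:
            alert("flapping: 5 reconnects in under a minute, halting")
            return                        # circuit-breaker: halt and page
        try:
            conn = connect()
            reconnects.append(time.time())
            resubscribe(conn)             # a new connection has no subscriptions
            reconcile()                   # REST snapshot + diff before trading resumes
            attempt = 0
            listen(conn)                  # blocks until the connection drops
        except Exception:
            pass
        delay = min(60, 2 ** attempt)     # 1, 2, 4, 8, 16 ... capped at 60s
        time.sleep(delay * random.uniform(0.75, 1.25))  # ±25% jitter
        attempt += 1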

Sequence-Gap Handling

Most venues stamp every WebSocket message with a monotonically-increasing sequence number per channel. The discipline is small and absolute: if you receive seq N+2 after seq N, you missed N+1. There is no “probably nothing important happened.”

  • On gap detection, fetch a REST snapshot for that channel (full order-book snapshot, full positions snapshot, etc.) and treat it as authoritative.
  • Drop any buffered out-of-order messages older than the snapshot timestamp.
  • Log the gap with both seq numbers and the time delta so you can diagnose whether it’s your network or the venue’s feed.

An order book that’s silently missing one update is worse than no order book at all — it makes confident decisions on stale state. Force-resnap is cheap insurance.
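Gap detection is a few lines per channel. A sketch; log_gap, force_resnap, and process are placeholders for your own handlers:

CODE · PYTHON

last_seq = {}

def on_message(channel: str, seq: int, payload: dict) -> None:
    prev = last_seq.get(channel)
    if prev is not None and seq != prev + 1:
        log_gap(channel, prev, seq)     # record both seq numbers + time delta
        force_resnap(channel)           # REST snapshot is now authoritative;
                                        # drop buffered messages older than it
    last_seq[channel] = seq
    process(channel, payload)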

Checksum Verification on Order-Book Streams

Most venues that stream incremental order-book updates also publish a periodic checksum — typically a hash of the top-N levels of bids and asks. The contract is: if your locally-reconstructed book’s checksum doesn’t match the venue’s on the same tick, your local book is wrong. Causes range from a dropped delta to a mis-handled price-level deletion. The remedy is the same regardless: force-resnap. Discard the local book, fetch a full snapshot, replay any deltas that have arrived since.

If you trade off the order book at all — even just for sizing slippage estimates — verify checksums. A book you don’t verify is a book you can’t trust.

Idempotency and Deduplication

The same fill might land in your handler twice: once from the WebSocket fill stream, once from the REST snapshot you took on reconnect. Or three times, if the WebSocket re-delivers a buffered message after reconnect. The defence is a single line: dedupe by the venue’s fill_id (or order_id, or whatever immutable identifier the venue assigns). Before you process any fill, check whether you’ve already recorded it; if yes, drop. The cost is one indexed lookup per event; the benefit is that double-counting fills cannot corrupt your state.

The same applies to order updates: dedupe by (order_id, status, update_ts). Many venues will re-emit the “FILLED” status under various edge cases; your state machine should be a no-op the second time.
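The dedupe itself is tiny. Sketched here with in-memory sets; production would back these with an indexed table so restarts don’t forget history:

CODE · PYTHON

seen_fills = set()
seen_updates = set()

def on_fill(fill: dict) -> None:
    if fill["fill_id"] in seen_fills:   # venue's immutable identifier
        return                          # duplicate delivery: a no-op by design
    seen_fills.add(fill["fill_id"])
    apply_fill(fill)                    # update position/PnL exactly once (placeholder)

def on_order_update(u: dict) -> None:
    key = (u["order_id"], u["status"], u["update_ts"])
    if key in seen_updates:
        return
    seen_updates.add(key)
    apply_order_update(u)               # state-machine transition (placeholder)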

Key Insight

The WebSocket is not “the same thing as REST, but faster.” It is a different reliability model. REST gives you snapshot consistency at the cost of latency; streaming gives you change notifications at the cost of having to manage gaps, reconnects, and dedupe. Build for the streaming reliability model from day one — gap detection, checksum verification, dedupe, reconcile-on-reconnect — or you will eat the cost in silent state corruption when it matters most.

You’re Done When…

  • Live ticks, fills, and order updates arrive over WebSocket; you only poll REST for snapshots and control
  • Reconnect uses exponential backoff with jitter and a flap circuit-breaker
  • Every reconnect triggers a REST snapshot and a reconcile pass before trading resumes
  • You detect sequence gaps and force-resnap on detection, not at the next periodic check
  • If you consume order-book streams, you verify the venue’s checksum on every tick
  • Every event handler dedupes by the venue’s immutable id

Module 4

Strategy Development
5 sections · ~4 hours

4.1 Hypothesis to Testable Signal

Every strategy starts as an idea. The skill is converting that idea into a precise, testable, falsifiable hypothesis with exact entry and exit conditions. Vague ideas cannot be backtested. Precise hypotheses can.

The Conversion Process

Most trading ideas sound like this: “BTC tends to rally after big dips.” That is not a strategy. It is an observation. To make it testable, you need to answer five questions:

  1. What defines the condition? — What counts as a “big dip”? A 10% drop in 7 days? A daily candle closing below the 200-day moving average?
  2. What is the entry signal? — Do you buy the moment the condition is met? The next candle open? After a confirmation candle?
  3. What is the exit signal? — Time-based (hold for 14 days)? Target-based (sell at +5%)? Indicator-based (sell when RSI > 70)?
  4. What is the position? — Long only? Short only? Both directions?
  5. What timeframe? — Daily candles? Weekly? 4-hour?

Example: From Vague to Precise

Vague Idea | Precise Hypothesis
“BTC rallies after big dips” | When BTC daily close drops >10% from its 30-day high, go long at the next daily open. Exit after 14 days or at +8%, whichever comes first. Stop-loss at -5%.
“Trend following works” | When a slow weekly moving-average derivative turns positive AND a close-position filter confirms strong-conviction candles AND an efficiency-ratio gate confirms a trending (not choppy) tape, go long. Exit when the moving-average derivative turns negative.
“Funding rate extremes revert” | When 8h funding rate exceeds the 95th percentile of its 90-day distribution, go short. Exit when funding rate returns to median. Stop-loss at -3%.

The vague versions are untestable. The precise versions can be coded and backtested in an afternoon.

Key Insight

If you cannot write the entry and exit conditions as an if statement in code, the hypothesis is not precise enough. “Buy when it looks like a reversal” cannot be coded. “Buy when RSI(14) crosses below 30 and then crosses back above 30 on the next candle” can be coded in 3 lines of Python.
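Literally three lines. A sketch, assuming rsi is a pandas Series of RSI(14) values aligned to your closed candles (computed upstream):

CODE · PYTHON

# rsi: pd.Series of RSI(14), one value per closed candle (computed upstream)
crossed_below = (rsi < 30) & (rsi.shift(1) >= 30)                     # dipped under 30
crossed_back = (rsi >= 30) & (rsi.shift(1) < 30)                      # recovered above 30
buy_signal = crossed_below.shift(1, fill_value=False) & crossed_back  # on the next candle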

You Understand This When…

  • You can convert a vague trading idea into a precise hypothesis with exact entry, exit, and stop-loss rules
  • Every condition in your hypothesis is measurable and codeable
  • You’ve written at least one hypothesis in the precise format

4.2 Types of Strategies

Not all strategies are the same. They trade on different timeframes, exploit different market behaviours, and suit different personality types. This section maps the landscape so you can choose where to start.

The Strategy Spectrum

Type                      | Holds For       | Trades/Year | Edge Source                                                             | Complexity
Weekly Trend Following    | Weeks to months | 3–8         | Riding large trends, cutting losers early                               | Low
Daily Swing               | Days to weeks   | 15–40       | Multi-day momentum or mean reversion                                    | Medium
Intraday                  | Hours           | 100–500     | Session patterns, liquidity sweeps, time-of-day effects                 | High
Derivatives / Positioning | Hours to days   | 5–20        | Funding rate extremes, OI crowding, liquidation cascades                | Medium
Statistical Anomaly       | Varies          | Varies      | Day-of-week effects, cross-asset correlation, regime-conditional patterns | Medium-High

Where to Start

Start with weekly trend following. Here is why:

  • Fewest trades per year means lowest fees and least execution complexity
  • Simplest logic: one indicator, one or two filters, binary position (long or flat)
  • Long observation window: 5 years of weekly candles is ~260 data points, allowing the strategy to be exposed to multiple regimes
  • Forces parsimony: a weekly system with only 3–8 trades per year cannot support many parameters — you are constrained to 1–2 indicators and a simple rule, which is the real defence against overfitting
  • Set-and-forget: the system checks once per week, not every second

Important nuance: low trade count does not automatically mean “hard to overfit.” The opposite is true — low N means high variance, wider confidence intervals on every metric, and an easier path to fitting noise. The defence against overfitting at low N is parsimony (very few parameters), long observation windows (multiple regimes), simple rules (mechanism you can articulate), and cross-instrument validation (does the same rule work on ETH, SOL, etc. without re-tuning?). A weekly system is “safer” only because the format forces these constraints — not because few trades is inherently more honest.

Our simplest live strategy — a weekly trend-following system — trades a handful of times per year and shows a strong profit factor over a multi-year window. Simple does not mean weak — but a multi-year, low-double-digit-trade sample is still a wide confidence interval, which is why we lean on cross-instrument and walk-forward checks rather than the single point estimate.

Real Examples Across the Spectrum

1

Weekly Trend Following (Low Frequency)

A long-only weekly system that goes long when a slow moving-average derivative turns positive, gated by a close-position filter (the candle closed strongly into its range) and an efficiency-ratio gate (the tape has been trending, not chopping). Goes flat when the derivative turns negative. No shorting — the underlying has strong positive long-term drift, so shorting is structurally expensive. Single-digit annual trade count. Profit factor in the high single digits over a multi-year window. Max drawdown contained well under 25%.

2

Regime-Conditional Short (Mid-Frequency Swing + Regime Gate)

A short-side system that fires only when a confirmed bear regime is in force AND the daily efficiency-ratio is low (choppy / ranging). Exits when efficiency-ratio recovers or the bear regime closes. The edge comes from the underlying bleeding slowly during low-efficiency bear windows — slow drift, not sharp drops. Low double-digit annual trade count. Compound returns roughly proportional to the regime it operates in, with drawdown profile in line with that regime.

3

Derivatives-Driven Contrarian (Positioning Overlay)

A system that fires when derivatives positioning data signals one side of the book is crowded and price action confirms the squeeze. Mechanical logic: crowded positioning creates forced unwind cascades when price moves against it. The edge is extracted from positioning, not from technicals. Hit rate above 60%, profit factor above 1.5, single-digit annual trade count. Used as an overlay rather than a standalone — statistical confidence is moderate, not strong.

4

Session-Based Intraday (Multi-Timeframe)

During specific intraday session windows, identify a liquidity sweep on a higher intraday timeframe, wait for a market-structure shift on a lower one, enter on the retracement into a price-imbalance zone. Stop below the sweep wick. Target the next opposing liquidity pool, typically with an asymmetric reward-to-risk profile. This is the most complex strategy type — requires fine-grained data and multi-timeframe analysis. We treat it as a research direction, not a primary capital allocation.

Practical Advice

Build your first strategy at the simple end of the spectrum. Get it live, profitable, and boring. Then explore complexity. The temptation is to start with intraday multi-timeframe systems because they feel sophisticated. Resist. Complexity is earned, not chosen.

You Understand This When…

  • You can describe the five strategy types and their trade-offs
  • You understand why weekly trend following is the best starting point
  • You’ve chosen a strategy type for your first build

4.3 Conviction Gates & Filters

A raw signal tells you when to trade. Gates and filters tell you when not to trade. The difference between a mediocre strategy and a great one is often not the entry signal — it’s the trades the system refuses to take.

What Is a Gate?

A gate is a condition that must be true in addition to the entry signal. If the signal fires but the gate is closed, the system does nothing. Gates filter out low-conviction setups.

Gate | What It Measures | Why It Helps
Close Position (CP) | Where the candle closed within its range (0 = low, 1 = high) | A high-CP threshold means price closed strongly into the top of its range — strong conviction. Filters out indecisive candles.
Efficiency Ratio (ER) | Direction vs noise ratio over N periods (0 = pure chop, 1 = pure trend) | An ER threshold above pure-noise levels means the market has been trending rather than chopping. Filters out choppy periods where trend-following gets whipsawed.
Regime Gate | Current market regime (bull, bear, chop) | Only trade in regimes where the strategy has proven edge. A long-only strategy gated by “bull regime” avoids bear markets entirely.
Volatility Filter | Current volatility relative to historical (e.g., ATR percentile) | Some strategies only work in low-vol (mean-reversion, day-of-week effects) or high-vol (breakout, momentum). The filter is the differentiator.
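For concreteness, here is how the first two gates can be computed in pandas. The efficiency ratio below is Kaufman’s formulation, and the thresholds are illustrative, not our production values:

CODE · PYTHON

# Close position: where the close sits within the candle's range (0 = low, 1 = high).
# Guard against high == low in production (zero-range candles produce NaN here).
cp = (df["close"] - df["low"]) / (df["high"] - df["low"])

# Efficiency ratio over N candles: net move divided by the sum of absolute moves.
N = 10
er = (df["close"] - df["close"].shift(N)).abs() / df["close"].diff().abs().rolling(N).sum()

gate_open = (cp > 0.8) & (er > 0.3)    # illustrative thresholds
entries = raw_signal & gate_open       # raw_signal: your boolean entry condition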

Real Impact: Before and After Gates

Here is the directional impact gates had on one of our live strategies. The numbers are deliberately qualitative — the point is the shape of the improvement, not specific values:

Metric        | Without Gates       | With Close-Position + Efficiency-Ratio Gates
CAGR          | Positive but modest | Substantially higher
Profit Factor | Low single digits   | High single digits
Max Drawdown  | Above 30%           | Well under 20%
Trades        | Higher count        | Lower count (gates filter many out)

Fewer trades, higher returns, lower drawdown. The gates eliminated trades that would have been losers — low-conviction entries during choppy or indecisive markets. The signal was the same. The gates made it profitable.

Key Insight

The best strategies are defined as much by what they refuse to trade as by what they trade. Most profitable traders sit out 60–80% of sessions. Your system should do the same. Gates are how you codify patience.

You Understand This When…

  • You can explain what a conviction gate is and why it improves strategy performance
  • You know the difference between an entry signal and a gate
  • You’ve identified at least one gate to add to your strategy hypothesis

4.4 The Investigation Template

This is the repeatable process for testing any trading idea. Hear a claim, formulate the hypothesis, pull data, test it, find the filters that matter, deliver a verdict. You will use this template dozens of times.

The Six Steps

1

Hear the claim

From Twitter, a podcast, a friend, a paper, or your own observation. “Funding rate extremes revert.” “Tuesday and Wednesday are the best trading days.” “Returns are higher right after a sharp VIX spike.” Don’t judge it — just write it down.

2

Formulate the hypothesis

Convert the claim into a precise, testable statement with exact conditions: “When BTC 8h funding rate exceeds the 95th percentile of its 90-day distribution, the 24-hour forward return is negative on average.” If you cannot make it precise, it is not testable.

3

Pull the data

Fetch the specific data you need. For the funding rate hypothesis: 5 years of 8-hour funding rate history + daily OHLCV candles. You already have this from Module 3.

4

Test it

Write a simple backtest. Identify every occurrence of the condition. Measure the forward return at your target horizon (4h, 24h, 7d). Calculate hit rate, average return, profit factor. Chart the results. (This step is sketched in code after the list.)

5

Find the filters that matter

Split the results by regime (bull/bear/chop), volatility (high/low), day of week, and any other relevant dimension. Often a signal that is flat overall becomes strong in one specific condition. In one investigation we ran, a flat headline result turned strongly positive once we filtered to low-volatility windows only — the filter was the entire signal.

6

Deliver a verdict

Be honest. The verdicts are: “Strong signal, worth developing further” (rare), “Weak signal, useful as overlay/filter only” (common), or “No signal, kill it” (most common). Do not force a positive result.
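Here is what step 4 looks like in code for the funding-rate hypothesis from step 2. A sketch, assuming df holds 8-hour candles with a funding column alongside close:

CODE · PYTHON

# Condition: funding above the 95th percentile of its 90-day (270-observation) window
threshold = df["funding"].rolling(270).quantile(0.95)
events = df.index[df["funding"] > threshold]

horizon = 3                                   # 3 x 8h candles = 24-hour forward return
fwd = df["close"].shift(-horizon) / df["close"] - 1
returns = fwd.loc[events].dropna()

hit_rate = (returns < 0).mean()               # short thesis: a negative forward return is a win
avg_return = returns.mean()
profit_factor = -returns[returns < 0].sum() / returns[returns > 0].sum()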

Worked Example (Illustrative): A Day-of-Week Effect

This is a textbook illustrative example — not a proprietary investigation — chosen because it’s familiar and shows the template clearly. Imagine you’ve heard the claim that a major equity index has a “Tuesday effect” — that Tuesdays are systematically stronger than other weekdays. Here is the shape of how that investigation would play out under this template:

  • Claim: The underlying has a positive Tuesday bias
  • Hypothesis: Average daily return on Tuesdays exceeds the average daily return across all other weekdays by a statistically meaningful margin
  • Data: Multi-year daily history of the index
  • Raw result: Marginally positive Tuesday average return — not enough to act on standalone.
  • Filters applied: Volatility (low/medium/high), regime (bull/bear/chop), month of year, prior-day direction, cross-asset
  • Key finding: In low-volatility windows the Tuesday edge flipped from flat to clearly positive; in medium-volatility windows it went negative. The volatility filter was the entire differentiator.
  • Cross-asset: The effect appeared on related indices, with a similar conditional pattern — suggesting a real (if narrow) market dynamic rather than single-asset noise.
  • Verdict: Weak signal, not tradeable standalone. Interesting as a conditional relationship, insufficient for production capital.

Total time on a real investigation of this shape: about half a day. The investigation produces real knowledge: a conditional relationship exists, it’s not strong enough to trade, and the volatility filter is the key variable. That last insight is useful for other research.

Pro Tip

Keep a research log. Every investigation — even the ones that produce nothing — generates knowledge about what doesn’t work and which filters matter. Over time, patterns emerge across investigations: “volatility regime matters for almost everything” is a finding that improves all future research.

You Understand This When…

  • You can run the 6-step investigation template on any trading claim
  • You understand that most investigations produce negative results — and that’s fine
  • You know how to apply dimensional splits (regime, volatility, day-of-week) to find conditional signals

4.5 Using LLMs as Research Assistants

Large language models (Claude, ChatGPT, Gemini, Grok) are extraordinary tools for accelerating every step of the investigation process. They cannot tell you whether a strategy works — only data can do that — but they can write the code to test it, generate hypotheses you hadn’t considered, and help you interpret results.

What LLMs Are Good At

  • Writing backtesting code: Describe your hypothesis precisely and an LLM can produce a working backtest script in minutes. Review it carefully — check for look-ahead bias, correct fee modelling, and proper data handling.
  • Generating hypotheses: “Given this dataset of BTC daily candles with funding rate and open interest, what are 10 testable hypotheses about predictable price behaviour?” LLMs produce creative starting points.
  • Interpreting results: Paste a backtest output and ask: “Is this statistically significant? What are the red flags? What would you test next?”
  • Adversarial review: Use one LLM to build the strategy, then give the backtest results to a different LLM and ask it to attack. “Find every reason this strategy might be overfitted or misleading.”

What LLMs Are NOT Good At

  • Predicting markets: An LLM has no access to current market data and no ability to forecast price movements. Never ask “will BTC go up tomorrow?”
  • Replacing testing: An LLM might say “that strategy should work because…” That’s a theory, not evidence. Always test.
  • Writing production-grade trading code without review: LLM-generated code can have subtle bugs — especially around look-ahead bias, timezone handling, and edge cases. Always review and test.

Critical: Look-Ahead Bias in LLM-Generated Code

The most common bug in LLM-generated backtesting code is look-ahead bias — using information that would not have been available at the time of the decision. Example: calculating an indicator using today’s close to make a decision that should have been made at today’s open. Always review generated code line by line and ask: “at the moment this decision is made, has this data point been observed yet?”
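A stylised pandas example of the bug and its fix:

CODE · PYTHON

# BUG: the signal is computed from today's close, yet earns today's return.
# The position "knows" the close before the candle has finished.
signal = df["close"] > df["close"].rolling(50).mean()
pnl_wrong = signal * df["close"].pct_change()

# FIX: shift the signal one candle, so a decision made at today's close
# only earns tomorrow's return.
pnl_right = signal.shift(1) * df["close"].pct_change()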

The Multi-LLM Workflow

For maximum rigour, use multiple LLMs in an adversarial workflow:

  1. LLM #1 (Builder): Write the strategy code and run the backtest
  2. LLM #2 (Attacker): Review the code for bugs, review the results for overfitting signals, suggest falsification tests
  3. LLM #3 (Synthesiser): Given both perspectives, produce a final assessment

This mirrors how professional quant teams work: the researcher proposes, the risk team attacks, and the portfolio manager decides. You can simulate all three roles with different LLMs or different conversations.

You Understand This When…

  • You know how to use LLMs to accelerate strategy research without replacing testing
  • You can spot look-ahead bias in LLM-generated code
  • You understand the multi-LLM adversarial workflow

Module 5

Backtesting Done Right
6 sections · ~4 hours

5.1 Building a Backtesting Engine

A backtesting engine simulates your strategy against historical data as if you were trading in real time. The key word is “as if.” Every decision the engine makes must only use information that would have been available at that moment. Violate this and your results are fantasy.

The Core Loop

Every backtester, no matter how sophisticated, runs the same basic loop:

Flow

FOR each candle in historical data (chronological order):
  1. UPDATE indicators with this candle’s data. Recompute on the new closed candle, never on the next one.
  2. CHECK strategy rules against current indicator values. If the ENTRY signal fires AND gates are open → OPEN position. If the EXIT signal fires → CLOSE position. If the STOP-LOSS is hit → CLOSE position at the stop price.
  3. CALCULATE PnL if a position was closed this candle. Subtract fees (maker/taker) and a slippage estimate.
  4. LOG the trade (entry price, exit price, PnL, reason). Append to the trade log on close, with the exit reason for later attribution.

AFTER all candles are processed: CALCULATE summary statistics (CAGR, PF, MaxDD, Sharpe, WR).

The backtesting core loop. Every decision uses only data available up to and including the current candle. Never the next candle.

Evaluation Metrics: Beyond Sharpe and PF

Sharpe ratio, profit factor, win rate, and max drawdown are the starting set, but no single metric captures a strategy. Pros look at a panel. Add at least these three:

  • Sortino ratio — like Sharpe but uses downside deviation (volatility of negative returns) in the denominator instead of total volatility. Preferred for asymmetric strategies (trend-following, long-tail-capture systems) where upside volatility is desirable and Sharpe unfairly penalises a strategy for having big winners. Sortino = (mean return − risk-free) / downside deviation.
  • Calmar ratio — annualised return divided by max drawdown. The metric of choice for capital-preservation-conscious investors and for sizing decisions: it directly answers “how much return per unit of worst-case pain?” A Calmar of 1.0 means you earned, on a yearly basis, what your worst drawdown cost you. Above 2 is strong; above 3 is exceptional.
  • Information Coefficient (IC) — for predictive signals (anything that produces a continuous score, not just binary entries), IC is the Spearman or Pearson correlation between the signal value and the forward return. It is the cross-section quant standard for “does my signal carry information?” Useful even before you have a fully formed entry/exit rule. Even a small but stable IC (0.03–0.05) can be tradeable; instability matters more than absolute level.

When each is most informative: Sharpe for symmetric / mean-reverting systems and broad portfolio comparisons. Sortino for trend-following or any strategy where upside vol is the point. Calmar when you’re sizing capital or comparing strategies for retirement-style accounts. IC for signal research before strategy construction. Profit factor for trade-by-trade asymmetry. Win rate only in combination with average-win/average-loss ratio — high win rate with poor PF is a red flag for “picking up pennies in front of a steamroller.”

Look at all of them. A strategy that looks good on Sharpe but ugly on Calmar has hidden tail risk. A strategy with a high IC but low Sharpe likely has an execution problem, not a signal problem. The panel tells you which.
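Sketches of the three additions in pandas. Here returns is a per-period return series and equity a compounded equity curve; the Sortino shown is one common per-period formulation (deviation of below-target returns only), not annualised:

CODE · PYTHON

import pandas as pd

def sortino(returns: pd.Series, rf: float = 0.0) -> float:
    downside = returns[returns < rf] - rf            # only negative deviations count
    return (returns.mean() - rf) / (downside.pow(2).mean() ** 0.5)

def calmar(equity: pd.Series, periods_per_year: int = 365) -> float:
    years = len(equity) / periods_per_year
    cagr = (equity.iloc[-1] / equity.iloc[0]) ** (1 / years) - 1
    max_dd = (equity / equity.cummax() - 1).min()
    return cagr / abs(max_dd)

def information_coefficient(signal: pd.Series, fwd_return: pd.Series) -> float:
    return signal.corr(fwd_return, method="spearman")  # rank IC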

Building Your Own vs Using a Framework

You have two options:

  • Build your own: A simple backtester in Python/pandas is 100–300 lines. You control everything. You understand everything. Recommended for your first strategy.
  • Use a framework: Libraries like backtrader, vectorbt, or zipline provide pre-built infrastructure. Faster to start but harder to customise and easier to misuse.

We recommend building your own for your first strategy, using an LLM to help with the code. This forces you to understand every line. Once you understand the mechanics, frameworks become useful for speed.

Prompt for Your LLM

“Write a Python backtester using pandas that: (1) loads daily OHLCV from a CSV, (2) calculates SMA(20) and SMA(50), (3) goes long when SMA(20) crosses above SMA(50), exits when it crosses below, (4) models 0.1% fees per trade, (5) tracks each trade with entry/exit prices and PnL, (6) outputs CAGR, profit factor, max drawdown, and win rate. The decision to trade must be made on the candle’s close, with the entry/exit at the next candle’s open.”

You Understand This When…

  • You have a working backtester that processes candles chronologically
  • All trading decisions use only past and current data (no future data)
  • Entries and exits happen at the next candle’s open, not the signal candle’s close
  • Fees are subtracted from every trade

5.2 The Cardinal Rules

These rules are non-negotiable. Violate any one of them and your backtest results are meaningless. Memorise them.

1

No Look-Ahead Bias

Every decision must use only data available at the time of the decision. If your strategy decides to buy based on today’s close, the earliest you can execute is tomorrow’s open. Using today’s close to enter at today’s open is impossible in real life. This is the #1 bug in backtesting code.

2

Model Realistic Costs

Every trade incurs: exchange fees (0.02–0.06% per side), slippage (0.01–0.05% per side), and for perpetuals, funding rates (variable, every 8 hours). Use 25 basis points (0.25%) round-trip as a baseline for crypto perpetuals. If your edge doesn’t survive 25bps of costs, it’s not an edge — it’s noise.

3

No Survivorship Bias

If you only test on assets that exist today, you miss the ones that went to zero. In crypto this matters less (BTC and ETH have survived), but for altcoins it is critical. A strategy that “worked on the top 20 coins” might have been tested on the 20 that survived — not the 200 that didn’t.

4

Use Enough Data

A strategy with 8 trades over 2 years is not statistically meaningful. You need enough trades and enough time to cover different market regimes. Minimum: 3 years of data covering at least one full bull-bear cycle. No fixed trade count is sufficient on its own. What matters is: (a) the confidence interval on the chosen metric (Sharpe, profit factor) is tight enough to act on, (b) the rule is robust across multiple instruments without re-tuning, (c) walk-forward out-of-sample performance holds up. Rule of thumb: under ~30 trades, confidence intervals on Sharpe and PF are too wide to trust as a standalone claim — you must lean on cross-instrument and walk-forward evidence. 100+ trades is preferred when basing a decision on a single instrument’s point estimate. Low-N strategies (e.g. ~3 weekly trades/year) are not invalid — they just have to earn trust through parsimony, multi-regime exposure, and cross-instrument robustness rather than through a tight confidence band on the trade sample alone.

5

Do Not Optimise Then Test on the Same Data

If you sweep 200 parameter combinations and pick the best one, your backtest is measuring how well you fit to historical noise, not how well the strategy works. Split your data: optimise on the first 70%, test on the remaining 30%. If it works on both, it might be real.

War Story

We ran a comprehensive investigation into retail and institutional trading footprints. Across 10 engineered signals and 1,260 total trades: 7 of 10 signals were at or below coin-flip after costs. The overall portfolio produced +0.039% mean return — effectively zero. Only one signal survived all falsification checks, and even that had a p-value of 0.22. The investigation produced more “no” answers than “yes” answers. That is normal. That is the process working correctly.

You Understand This When…

  • You can list the 5 cardinal rules from memory
  • You know why testing on the same data you optimised on is invalid
  • Your backtester includes realistic fee modelling

5.3 Monte Carlo Simulation

A single backtest gives you one realised path. Monte Carlo gives you a distribution of plausible alternative paths so you can size risk against the worst-case end of that distribution rather than against the lucky single sample you actually lived through. The catch: how you generate those alternative paths matters enormously, because trading P&L is not i.i.d.

Why One Backtest Is Not Enough

Your backtest shows a max drawdown of -15%. Great. But what if the three worst trades had happened consecutively instead of spread out? The drawdown might have been -35%. The specific order of trades in history is one random sample from a much larger distribution of plausible orderings. Monte Carlo’s job is to estimate that distribution.

First-Pass Illustration: The Naive Shuffle (and Why It’s Wrong)

The textbook introduction to Monte Carlo on backtest output is “take your N trades, randomly shuffle the order, simulate the equity curve, repeat 10,000 times, look at the distribution.” This is useful as a first-pass illustration of variance — it shows that the historical equity curve is one of many possible paths — but it is the wrong production method for trading data.

The reason: shuffling assumes trades are independent and identically distributed (i.i.d.). They are not. Real trading P&L exhibits:

  • Serial correlation: Trend-following systems win streaks during trending regimes and lose streaks during chop. Mean-reversion systems do the opposite.
  • Regime clustering: A bear market produces clustered losses for long-only systems. Shuffling sprays those losses across the timeline and erases the cluster.
  • Volatility clustering: High-vol periods produce both bigger wins and bigger losses, bunched together. Shuffling smears them flat.

This matters because real drawdowns come precisely from clustered losses — runs of correlated bad trades during the wrong regime. A naive shuffle systematically underestimates tail drawdown risk by destroying exactly the dependence structure that produces it. If you size your account on naive-shuffle 95th-percentile drawdown, you are sizing on an optimistic distribution.

Block Bootstrap (the Correct First Production Method)

Block bootstrap preserves local autocorrelation by resampling contiguous blocks of trades instead of individual trades.

In Python (a direct translation of the recipe; numpy is the only dependency):

CODE · PYTHON
import numpy as np

def block_bootstrap(trades, n_sims=10_000, block_size=None, seed=0):
    """Resample contiguous blocks of trade returns, preserving local autocorrelation."""
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)    # N trade returns, chronological order
    n = len(trades)
    if block_size is None:
        block_size = max(1, round(n ** 0.5))    # rule of thumb; see below
    n_blocks = -(-n // block_size)              # ceil(n / block_size)
    results = []
    for _ in range(n_sims):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)
        sampled = np.concatenate([trades[s:s + block_size] for s in starts])[:n]
        equity = np.cumprod(1 + sampled)        # simulate the equity curve
        peak = np.maximum.accumulate(equity)
        results.append((equity[-1] - 1,                    # final return
                        ((equity - peak) / peak).min()))   # max drawdown; add Sharpe etc. as needed
    return results                              # the 10,000-run distribution

Block-length rule of thumb: block ≈ √N is a reasonable default. For higher-frequency strategies (intraday, hundreds of trades), 5–10 trades per block is usually enough to preserve short-horizon dependence. For lower-frequency strategies with strong regime structure, push toward 15–20 trades per block so each block spans a meaningful slice of regime time. Sensitivity-test the result across two or three block sizes — if your tail estimates wobble dramatically, the dependence structure is doing real work and the answer is uncertain.

Stationary Bootstrap (Politis–Romano)

Fixed block lengths have an arbitrariness problem: why 5? why 20? The stationary bootstrap (Politis & Romano, 1994) draws a random block length each time from a geometric distribution with mean 1/p. This produces a resampled series that is itself stationary (the fixed-block version is not), and is generally more robust to block-size mis-specification.

Recipe: at each step, with probability p start a new block at a random position; otherwise continue the current block. Choose p so 1/p matches your target average block length (e.g. p = 0.1 for an average block length of 10 trades). For most retail-scale playbook work, stationary bootstrap is the default to reach for.
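
A minimal sketch of the recipe above; the circular wrap-around and the function shape are implementation choices, not prescribed by the text:

CODE · PYTHON
import numpy as np

def stationary_bootstrap(trades, n_sims=10_000, p=0.1, seed=0):
    """Politis-Romano: random geometric block lengths with mean 1/p."""
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades, dtype=float)
    n = len(trades)
    sims = np.empty((n_sims, n))
    for s in range(n_sims):
        idx = rng.integers(n)                 # start of the first block
        for t in range(n):
            sims[s, t] = trades[idx]
            if rng.random() < p:              # with probability p: start a new block
                idx = rng.integers(n)
            else:                             # otherwise continue, wrapping around
                idx = (idx + 1) % n
    return sims                               # resampled paths; feed to your metrics

# p = 0.1 gives an average block length of 10 trades, as in the text.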

Regime-Stratified Resample

If you have already labelled each trade with the regime it occurred in (bull / bear / chop, or high-vol / low-vol), you can resample within each regime separately and recombine in proportion to the regime exposure you expect going forward. This preserves regime-conditional behaviour rather than averaging it out.

This is useful when the strategy clearly behaves differently across regimes and you care about scenarios like “what happens if the next 12 months are 70% chop and 30% bear?” You can also use it to stress-test against a regime mix that is more hostile than the historical mix.

Diagram
Regime-Stratified Resample
Bull: 120 trades
Bear: 80 trades
Chop: 100 trades
↓ resample WITHIN each regime, not across
Bull: 120 resampled (in-regime only)
Bear: 80 resampled (in-regime only)
Chop: 100 resampled (in-regime only)
Result: each bootstrap run preserves regime exposure. A bull-regime run can’t accidentally include bear-regime drawdowns it would never have seen.

Regime-stratified resample preserves the regime composition of the original sample so bootstrap distributions don’t blend incompatible market environments. Recombine in the proportion you expect going forward, or stress-test against a more hostile mix than the historical one.
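
A sketch of the recombination step. It assumes trades are already labelled by regime, and it samples i.i.d. within each pool for brevity; in practice, block-resample within each pool to keep loss clustering:

CODE · PYTHON
import numpy as np

def regime_stratified_resample(trades_by_regime, target_mix, n_total, rng):
    """trades_by_regime: dict regime -> array of that regime's trade returns.
    target_mix: dict regime -> fraction of expected future exposure (sums to 1)."""
    parts = []
    for regime, frac in target_mix.items():
        pool = np.asarray(trades_by_regime[regime], dtype=float)
        k = round(frac * n_total)             # trades this regime contributes
        if k > 0:
            parts.append(rng.choice(pool, size=k, replace=True))
    path = np.concatenate(parts)
    rng.shuffle(path)                         # interleave order within the run
    return path

# "What if the next 12 months are 70% chop and 30% bear?"
# regime_stratified_resample({"bull": bull, "bear": bear, "chop": chop},
#                            {"chop": 0.7, "bear": 0.3}, n_total=300,
#                            rng=np.random.default_rng(0))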

Distribution
Block-Bootstrap Drawdown Distribution (10,000 runs)
  • −5% to −10%: 6%
  • −10% to −15%: 18%
  • −15% to −20%: 28%
  • −20% to −25%: 30%
  • −25% to −30%: 14%
  • −30% to −40%: 4%
◀ Historical result sits in the −10% to −15% bucket
◀ 95th percentile: −38%
Tail is heavier than naive-shuffle would show, because clustered losses are preserved. Size on −38%, not −30%.

Block bootstrap typically produces a heavier left tail than naive shuffle on the same trade list, because it preserves the loss-clustering that drives real drawdowns. The honest distribution is wider and uglier than the i.i.d. one.

Key Insight

Always plan for the 95th-percentile drawdown from a dependence-preserving bootstrap (block or stationary), not the historical value and not the naive-shuffle value. If the strategy survives the bootstrap-95th in 95% of simulations, it can survive realistic clustered losses. If you size on the historical or i.i.d. result, you are one regime-cluster away from ruin.

Sample-Size Honesty

If you only have 50–100 trades, even block bootstrap is unreliable — the resulting confidence intervals will be very wide and the tail estimates noisy. Don’t let Monte Carlo give you false comfort. A pretty distribution chart computed from 60 trades is still a chart computed from 60 trades. In low-N regimes, lean on cross-instrument robustness and walk-forward consistency rather than on bootstrap percentiles, and state explicitly in your strategy report that the MC tails are estimated from a small sample.

You Understand This When…

  • You can explain why naive trade-shuffling underestimates tail drawdown for trading data
  • You can implement block bootstrap and choose a sensible block length
  • You know when to reach for stationary bootstrap or regime-stratified resampling
  • You plan risk around a dependence-preserving 95th percentile, not the historical or i.i.d. result
  • You acknowledge the limits of MC at low trade counts

5.4 Walk-Forward & Out-of-Sample Testing

The most powerful test of a strategy is whether it works on data it has never seen. Walk-forward testing simulates this by training on one period and testing on the next, rolling forward through time.

Out-of-Sample (OOS) Testing

The simplest version: split your data into two parts.

  • In-sample (70%): 2020–2023. Use this to develop and optimise your strategy.
  • Out-of-sample (30%): 2024–2026. Test the finalised strategy here. Do not look at this data during development.

If the strategy performs similarly on both sets, the edge might be real. If it performs well in-sample but fails out-of-sample, you overfitted.

Walk-Forward Testing

A more rigorous version: roll through time in windows.

  1. Train on 2020–2022, test on 2023
  2. Train on 2021–2023, test on 2024
  3. Train on 2022–2024, test on 2025

Each test period is truly unseen. If the strategy is consistently profitable across all test windows, the edge is robust. If it works in some windows and fails in others, investigate which market conditions caused the failures — that tells you about regime sensitivity.

Rolling vs Anchored Walk-Forward

Two variants, with a real trade-off:

  • Rolling: the training window has a fixed length (e.g. 3 years) and slides forward. Each retraining drops the oldest data. Adapts faster to regime change because old, no-longer-representative data falls off. Use when you believe the data-generating process drifts — market microstructure changes, new venues, post-halving regime shifts.
  • Anchored: the training window grows — the start point stays fixed, more data is appended each cycle. Uses all available history. More statistical power, less adaptive. Use when the underlying mechanism is stationary (a structural carry trade, a well-grounded behavioural pattern) and old data is still informative.

Rule of thumb: rolling for crypto and fast-moving microstructure-driven edges; anchored for macro-structural or cross-asset patterns. If unsure, run both and compare — if rolling is materially better than anchored, that itself tells you the edge is regime-sensitive and you have a degradation-risk dimension to monitor in production.
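
A small helper makes the two variants concrete, assuming yearly windows as in the example above:

CODE · PYTHON
def walk_forward_splits(years, train_len=3, anchored=False):
    """Yield (train_years, test_year) pairs over a list of years."""
    for i in range(train_len, len(years)):
        start = 0 if anchored else i - train_len
        yield years[start:i], years[i]

# years = [2020, 2021, 2022, 2023, 2024, 2025]
# Rolling:  ([2020, 2021, 2022], 2023), ([2021, 2022, 2023], 2024), ...
# Anchored: ([2020, 2021, 2022], 2023), ([2020, 2021, 2022, 2023], 2024), ...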

Time-Cut Hygiene: Data Freeze, Commit Hash, Single-Look

Walk-forward is only as honest as the discipline around it. Three rules:

  • Data freeze date. Record, in the strategy report, the exact date and time at which the OOS data was carved off and the in-sample window was sealed. Any data that arrives after that timestamp is OOS. If you re-run the analysis later with newer data, that is a new walk-forward cycle, not a continuation.
  • Commit hash discipline. Every backtest result is associated with a code commit hash. Results from uncommitted code are unverifiable — you can’t reproduce them, and a future you can’t audit them. Strategy reports without an attached commit hash get treated as informal notes, not evidence.
  • Single-look OOS. Once you have looked at the OOS performance, you have used that data to inform your decisions, even if all you did was “think about it.” Your subsequent choices are now informed by the OOS sample — the data is contaminated. You have two honest options: (a) accept the result as the final, single-look verdict on the strategy in its current form; or (b) freeze a new OOS window going forward (true walk-forward into the future) and treat the previous OOS result as part of the in-sample. There is no third option that preserves OOS purity.

Warning: The Peeking Problem

The temptation is enormous: your strategy fails on the out-of-sample data, so you “adjust” it and re-test. You have now contaminated the OOS data — it is no longer unseen. The only honest approach: develop on in-sample, test on OOS once, and accept the result. If it fails, fold the OOS into in-sample, develop a new hypothesis, and freeze a fresh forward window. Do not pretend the same data is still untouched.

You Understand This When…

  • You can split data into in-sample and out-of-sample sets
  • You understand walk-forward testing and the rolling-vs-anchored trade-off
  • Every backtest report carries a data freeze date and a commit hash
  • You apply the single-look OOS rule: once seen, the data is contaminated
  • You will not peek at out-of-sample data during development

5.5 Parameter Sensitivity

If changing a parameter by 10% destroys your edge, you don’t have an edge. You have a coincidence. Parameter sensitivity analysis tests whether your strategy is robust or fragile.

How to Run a Sensitivity Sweep

  1. Identify every tuneable parameter in your strategy (e.g., a moving-average lookback, a close-position gate threshold, an efficiency-ratio gate threshold)
  2. For each parameter, test a range of values around your chosen setting (e.g., for a lookback of N, test N−1, N, N+1, N+2)
  3. Run the full backtest for each combination
  4. Plot the results as a heatmap or table
  5. Check: is your chosen parameter in a “plateau” of good performance? Or is it a lone spike?
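
A sketch of steps 1–3 of the sweep, assuming a hypothetical run_backtest(params) entry point that returns the metrics you care about:

CODE · PYTHON
import itertools

def sensitivity_sweep(run_backtest, grid):
    """grid: dict param -> list of values to test. One result row per combination."""
    rows = []
    for combo in itertools.product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        metrics = run_backtest(params)          # hypothetical backtest entry point
        rows.append({**params, **metrics})
    return rows

# Sweep a lookback and a gate threshold around the chosen operating point:
# sensitivity_sweep(run_backtest, {"lookback": [19, 20, 21, 22],
#                                  "gate_threshold": [0.25, 0.30, 0.35]})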

What Good vs Bad Sensitivity Looks Like

Here is the shape of a healthy parameter sweep. Imagine sweeping a gate threshold across its plausible range and recording CAGR and profit factor at each value:

Gate Threshold (relative) | CAGR | Profit Factor | Verdict
Loose (low end) | Strong | Strong | Strong
Slightly tighter | Strong | Strong | Strong
Mid-range | Strong | Strong | Strong
Selected operating point | Best | Best | Selected
Slightly tighter still | Strong | Strong | Strong
Tighter | Acceptable | Acceptable | Acceptable
Tightest (sample-starved) | Lower | Lower | Degrading

This is a robust parameter: performance is strong across a wide range, and the selected value sits in a broad plateau. Moving the threshold by ±10% barely changes the result. This is what you want.

A fragile parameter would show a sharp spike at exactly one value, with sudden collapse one tick in either direction. If one tick destroys the strategy, the “edge” is an artefact of the specific data, not a real market phenomenon.

Key Insight

Robust strategies have flat plateaus in parameter space. Overfitted strategies have sharp spikes. If your strategy only works at one specific lookback and fails one tick in either direction, it is almost certainly curve-fitted. Real edges are broad. Coincidences are narrow.

You Understand This When…

  • You have swept all tuneable parameters in your strategy
  • Your selected parameters sit in a broad plateau of good performance
  • Moving any parameter by ±10–20% does not destroy the edge

5.6 Multiple-Testing & FDR Control

The moment you start sweeping — across parameters, instruments, hypotheses, or anomaly-scanner output — the naive p-value framing breaks down. You are running many simultaneous hypothesis tests, and the standard 5% threshold guarantees a fixed rate of false positives by chance alone. Without correction, your “winners” are mostly luck.

The Multiple-Testing Problem

Run 100 strategy variants and apply a p < 0.05 cutoff. If none of them have any real edge, you still expect ~5 of them to look statistically significant by pure chance. The standard p-value is calibrated to a single test. Run many tests and the probability that at least one looks significant climbs fast: with 20 independent tests and no real signal, there is a 64% chance at least one comes in under p < 0.05. Run 100, it is essentially certain.

This affects almost everything in trading research:

  • Parameter sweeps within a strategy (“the best of 200 combinations”)
  • Hypothesis scans across instruments (“works on 3 of 23 coins”)
  • Anomaly-scanner output filtering (“80 anomalies, 40 plausible”)
  • Cross-venue tests, time-of-day effects, day-of-week effects, calendar effects

If you don’t correct for the number of tests, you are guaranteeing a steady stream of fake winners.

Bonferroni Correction (Conservative)

The simplest fix: divide your significance threshold α by the number of tests N. To claim a finding is significant at α = 0.05 across 100 tests, that finding must individually satisfy p < 0.0005. Bonferroni controls the family-wise error rate (FWER) — the probability of any false positive across the whole family of tests — at α.

Bonferroni is the right call for confirmatory tests: when a false positive is genuinely costly (you’re about to deploy capital), and you would rather miss real edges than ship a fake one.

The downside: it is brutally conservative. Real but moderate edges will fail Bonferroni in any large sweep, and you’ll under-discover.

Benjamini–Hochberg / FDR Control (Better for Exploratory Research)

The Benjamini–Hochberg (BH) procedure controls the False Discovery Rate (FDR) — the expected fraction of false positives among the tests you call significant, rather than the probability of any false positive at all. This is usually what you actually want: “of the 12 strategies I’m flagging as winners, I’m willing to tolerate ~10% being noise, in exchange for finding more real edges.”

Recipe:

  1. Run all N tests, collect their p-values.
  2. Sort the p-values ascending: p(1) ≤ p(2) ≤ … ≤ p(N).
  3. Choose your desired FDR level q (e.g. 0.10 = tolerate 10% false positives among discoveries).
  4. Find the largest rank i for which p(i) ≤ (i/N) × q.
  5. Reject (call significant) all tests with rank ≤ i.
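
The recipe in code, as a direct transcription of the five steps:

CODE · PYTHON
def benjamini_hochberg(p_values, q=0.10):
    """Return the indices of tests called significant at FDR level q."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])   # ranks 1..n, ascending p
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= (rank / n) * q:
            cutoff = rank                       # largest rank i with p(i) <= (i/N) * q
    return sorted(order[:cutoff])               # reject everything at or below that rank

# benjamini_hochberg([0.001, 0.012, 0.030, 0.20, 0.84], q=0.10)  ->  [0, 1, 2]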

BH is the right call for exploratory research: anomaly scans, parameter sweeps, multi-instrument hypothesis testing — situations where you can tolerate some false positives downstream because they get filtered by paper trading and the falsification suite, but you want to bound the noise rate so the candidate list is meaningful.

Choosing & Applying

Situation | Use | Why
About to deploy live capital on the “winner” | Bonferroni | Fake winner is expensive; conservative is correct
Filtering 80 scanner anomalies down to a candidate set | BH / FDR | Want a meaningful shortlist, not zero
Parameter sweep within one strategy | BH / FDR + plateau check | Combine FDR with sensitivity (Module 5.5) — isolated p-value spikes are suspect even if they survive correction
Cross-instrument hypothesis test (“does this work on the other 22 coins?”) | BH / FDR | You want a calibrated set of survivors to investigate further

Practical note: track the total number of tests run on a research idea across the whole project lifetime, not just inside one notebook. If you tested 50 variants last week, killed them, and are now testing 50 more, the relevant N is 100. This is uncomfortable but honest. Selection bias compounds across sessions if you don’t.

Key Insight

The naive “p < 0.05” cutoff is a single-test concept. The moment you sweep, scan, or compare alternatives, you owe the data a correction — Bonferroni for confirmation, BH/FDR for exploration. Without it, your research pipeline is a noise factory that produces a steady stream of plausible-looking strategies that don’t survive deployment.

You Understand This When…

  • You can explain why a 5% cutoff applied to 100 tests guarantees ~5 fake winners
  • You know when to use Bonferroni (confirmatory) vs Benjamini–Hochberg (exploratory)
  • You can run the BH procedure manually given a list of p-values
  • You account for multiple testing in parameter sweeps, instrument scans, and anomaly-filter output

Module 6

Trying to Kill Your Strategy
4 sections · ~3 hours

6.1 Why Most Backtests Lie

A positive backtest is the most dangerous moment in strategy development. It feels like validation. It is usually an illusion. This section explains the specific mechanisms by which backtests mislead, so you can defend against each one.

The Three Ways Backtests Lie

1

Overfitting (the most common)

You tested 200 parameter combinations and picked the best. The strategy is not exploiting a market phenomenon — it is exploiting the specific random sequence of your historical data. It will fail on new data because it was built to fit the noise in old data. The more parameters you tune, the more opportunities for overfitting.

2

Selection Bias

You tested 50 different strategy ideas and the one that worked is the one you are presenting. But if you test 50 random strategies, some will show positive results by chance alone. At a 5% significance level, you expect 2–3 false positives out of 50 tests. The strategy that “worked” might just be the lucky random one.

3

Implementation Leakage

Subtle bugs that make the backtest easier than reality: using the close price to enter a trade on the same candle, not modelling slippage on large orders, ignoring funding rates that erode 0.1% every 8 hours, or assuming fills at the mid-price when you would actually cross the spread. Each leak adds a few basis points of phantom edge.

War Story — Framework Working as Intended

One of our derivatives-based reversal candidates looked promising in early single-window testing. The early stage of stress-testing — rerunning on clean, full-period data with the same rules — surfaced the problem immediately: the apparent edge was concentrated inside one short window and reversed sign outside it. The candidate was killed at the falsification gate, before any capital was committed. The lesson is not that a bad strategy slipped through; it is the opposite. Stress-testing exists precisely so candidates like this one die in the lab. The framework worked because we tested before deploying.

You Understand This When…

  • You can name three specific mechanisms by which backtests mislead
  • You are sceptical of positive results, not excited by them
  • You understand that the purpose of this module is to protect you from your own confirmation bias

6.2 The Six Falsification Tests

Every strategy that passes backtesting must survive all six of these tests before it is considered for live deployment. Fail any one, and the strategy goes back to the lab or gets killed. There is no “well, it mostly passed.”

1

Parameter Robustness

Move every tuneable parameter by ±10–20%. Does the edge survive? If the strategy only works at exactly the chosen parameters and collapses at nearby values, it is curve-fitted. Pass condition: Performance remains positive across the parameter neighbourhood. (Covered in Module 5.5)

2

Out-of-Sample Holdout

Test the strategy on data it has never seen. Develop on 2020–2023, test on 2024–2026. Pass condition: OOS performance is in the same ballpark as in-sample. It doesn’t need to be identical, but it must be positive and directionally consistent.

3

Regime Stability

Split your backtest by market regime: bull, bear, and chop. A strategy does not need to be profitable in all three, but you must know which regimes it works in and which it doesn’t. Pass condition: Profitable in at least two of three regimes, or clearly designated as a single-regime strategy with a regime gate (Module 10).

4

Cross-Venue Transfer

Run the strategy on data from a different exchange. The primary reasons performance can diverge across venues are deeper than “different volume profiles”: each venue has its own index price constituents (the basket of spot exchanges feeding the mark price — this directly drives liquidation prices, funding payments, and stop fills), its own liquidation engine mechanics (partial liquidation tiers, ADL queues, maintenance-margin schedules), and its own fee schedule (maker rebates, taker tiers, VIP discounts). On top of that sit venue-specific quirks — funding-rate caps, tick-size differences, COIN-M conventions, USDT-M vs COIN-M margining. Surface-level differences (close times, volume profiles) matter, but they are secondary. If the edge survives on one venue but dies on another, the dominant cause is usually one of the deep mechanics, not the cosmetics. Pass condition: Profit factor and direction are consistent across at least two data sources after applying each venue’s actual fee schedule, funding accrual, and liquidation rules.

5

Placebo / Random Baseline

Generate random entry signals as a baseline and compare your strategy against the distribution of random outcomes. The naive version (“random entries at the same frequency, beat the 95th percentile”) is only valid if the random baseline matches the strategy on every dimension that drives P&L. Otherwise the p-value is meaningless. The baseline must match: (a) average entry frequency (entries per year); (b) holding-period distribution (same mean and variance of trade duration, not just the mean); (c) time-in-market (% of bars in position); and (d) regime exposure — baseline trades must be drawn from the same regime mix the strategy actually traded in. If the strategy only enters in trending regimes, the baseline must be stratified to do the same; otherwise you are comparing strategy-in-trend vs random-in-everything, and the p-value is comparing two different distributions. If any of these don’t match, the placebo test is invalid and the p-value is misleading. Pass condition: Strategy performance exceeds the 95th percentile of matched random baselines (p < 0.05) and the matching dimensions are documented.

CODE · PSEUDOCODE
# Stratified-randomisation baseline construction
strategy_trades = list of (entry_time, hold_bars, regime_at_entry)

FOR each of 1000 baseline runs:
    baseline_trades = []
    FOR each strategy_trade in strategy_trades:
        # Match regime exposure: only sample entry times from
        # bars in the SAME regime as the original entry
        candidate_bars = bars where regime == strategy_trade.regime
        random_entry  = random choice from candidate_bars
        # Match holding period exactly (or sample from
        # the strategy's hold-period distribution)
        baseline_trades.append((random_entry, strategy_trade.hold_bars))
    simulate baseline P&L using the SAME execution model
    record metric (Sharpe / PF / total return)

p_value = fraction of baselines whose metric ≥ strategy's metric
6

Time Stability

Split your data in half chronologically. Does the strategy work in the first half AND the second half? If it only works in one period, the edge may have been regime-specific or the market microstructure may have changed. Pass condition: Positive performance in both halves.

The Falsification Funnel

Funnel
Many strategy ideas
↓ Backtest
Survivors: most fail to show positive headline results
↓ Parameter Robustness
Survivors: curve-fitted candidates die here
↓ Out-of-Sample
Survivors: overfitted candidates die here
↓ Regime Stability
Survivors: regime-fragile candidates die here
↓ Cross-Venue Transfer
Survivors: venue-specific quirks die here
↓ Placebo Baseline
Survivors: anything not better than random dies here
↓ Time Stability
Candidate for paper trading: anything that worked only in one era died on the way here

The falsification funnel. Each successive stage kills the vast majority of remaining candidates. By the time a hypothesis reaches paper trading, it has survived statistical, walk-forward, regime, cross-venue, placebo, and time-stability pressure. This is normal. This is the process working correctly. If every idea survived, your tests are not rigorous enough.

You Understand This When…

  • You can name all six falsification tests and what each one detects
  • You know the pass condition for each test
  • You accept that most strategies will fail and that is the expected outcome

6.3 When to Kill vs When to Tune

Not every failure means the strategy is worthless. Some failures point to fixable problems. Others point to fundamental issues. This section helps you distinguish between the two.

Kill It

  • Fails the placebo test (not better than random)
  • Performance collapses completely out-of-sample (zero or negative)
  • Only works at one exact parameter setting (narrow spike, no plateau)
  • The mechanical reason for why it should work doesn’t hold up logically
  • Sample size is too small for meaningful conclusions (<20 trades)

Tune It

  • Works well in two regimes but fails in a third → add a regime gate (Module 10)
  • OOS performance is positive but weaker than in-sample → likely some overfitting, but the core signal may be real. Simplify the strategy (fewer parameters) and re-test.
  • Fails on one exchange but works on two others → investigate the exchange-specific issue (different trading hours, different fee structure)
  • Time stability shows degradation in the recent period → the market may have changed. Investigate what changed and whether the signal can adapt.

Key Insight

The default is kill. Tuning should be the exception, not the rule. The temptation to “fix” a failing strategy by adding parameters, filters, and exceptions is how overfitting happens. Every filter you add to rescue a strategy is an opportunity to fit to noise. Be honest with yourself: if the core signal is weak, no amount of filtering will make it strong.

You Understand This When…

  • You can distinguish between fixable failures (add a regime gate) and fundamental failures (not better than random)
  • Your default response to failure is “kill it” with tuning as a justified exception

6.4 The Adversarial Review Process

Before any strategy goes to paper trading, it gets attacked by an independent reviewer — someone (or something) whose job is to find flaws. In our system, this means giving the strategy and its results to a different LLM with explicit instructions to destroy it.

The Adversarial Review Prompt

Give a fresh LLM (one that did not help build the strategy) the following:

  • The strategy rules (entry, exit, gates, parameters)
  • The backtest results (all metrics, trade list)
  • The parameter sensitivity sweep results
  • The out-of-sample results

Then ask: “Your job is to find every reason this strategy might fail in live trading. Attack the methodology, the statistics, the assumptions, and the implementation. Assume the builder has confirmation bias. What are they not seeing?”

What Good Adversarial Review Looks Like

A good adversarial review will surface things like:

  • “The headline CAGR is driven primarily by a small handful of trades clustered in one favourable year. Remove those and the CAGR drops by more than half.”
  • “You used Monday-start weeks but didn’t verify this against the exchange’s actual weekly candle definition.”
  • “The backtest assumes entry at the open, but the weekly open on Monday at 00:00 UTC is often a low-liquidity period with wider spreads.”
  • “Your stop-loss is calculated from the entry price, but in a gap-down scenario, the actual fill could be significantly worse.”

Each of these is either a fixable issue (update the spec, add gap-down handling) or a genuine threat (if 3 trades drive all the returns, the sample is too concentrated). The review process surfaces these before real money is at risk.

You Understand This When…

  • You have subjected your strategy to an independent adversarial review
  • Every issue raised has been addressed (fixed, explained, or accepted as a known risk)
  • The strategy has survived the complete pipeline: backtest → falsification → adversarial review

Module 7

Position Sizing & Risk Management
4 sections · ~2 hours

7.1 Risk-Per-Trade Calculation

The question is not “how much should I buy?” The question is “how much am I willing to lose on this trade?” Position size is derived from risk tolerance, not from conviction or account size.

The Formula

Every position size calculation follows this structure:

Formula
RISK PER TRADE = Account Balance × Risk Percentage
Example: $10,000 × 1% = $100 maximum loss on this trade
POSITION SIZE = Risk Per Trade ÷ Stop-Loss Distance
Example: $100 ÷ 2% = $5,000 notional position (if BTC drops 2%, you lose $100)
MARGIN REQUIRED = Position Size ÷ Leverage
Example: $5,000 ÷ 50x = $100 margin locked up ($9,900 remains available)
Risk tolerance → Risk percentage (e.g. 1%)
Risk percentage × balance → $ at risk per trade
$ at risk ÷ stop distance → Position size (notional)
Position size ÷ leverage → Margin locked

The position sizing chain: risk tolerance determines risk per trade, stop-loss distance determines position size, leverage determines margin required. You control the risk. The leverage is just plumbing.
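
The same chain as a function; the numbers in the comment match the worked example above:

CODE · PYTHON
def position_size(balance, risk_pct, stop_distance_pct, leverage):
    """Mirrors the chain above: risk tolerance -> notional -> margin."""
    risk_per_trade = balance * risk_pct             # $ you accept losing on this trade
    notional = risk_per_trade / stop_distance_pct   # position size
    margin = notional / leverage                    # capital actually locked
    return risk_per_trade, notional, margin

# position_size(10_000, 0.01, 0.02, 50)  ->  (100.0, 5000.0, 100.0)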

Risk Percentage Guidelines

Under fixed-fraction sizing (always risk a fixed % of current equity), the account is never literally “depleted” by a fixed number of consecutive losses — each loss is smaller than the last in absolute terms. The correct compounding formula is:

equity_remaining = (1 - r)^N

where r is risk per trade and N is the number of consecutive losses.

Risk Per Trade (r) | Equity remaining after 50 losses | Equity remaining after 100 losses | Appropriate For
0.5% | ~77.8% | ~60.6% | Conservative, high-frequency strategies
1.0% | ~60.5% | ~36.6% | Standard for most systematic strategies
2.0% | ~36.4% | ~13.3% | Aggressive, high-conviction strategies
5.0% | ~7.7% | ~0.6% | Dangerous — deep drawdowns are likely
10%+ | ~0.5% | ~0.003% | Effectively gambling

Start at 1%. 100 consecutive losses at 1% leaves you with roughly 36.6% of starting equity — a brutal 63% drawdown, but not zero. The real metric to focus on is probability of ruin (or probability of hitting a chosen drawdown threshold), which depends jointly on win rate, payoff ratio (avg win / avg loss), risk per trade, and the drawdown level you treat as ruin. Naive “consecutive-losses-to-zero” math fails in both directions: it overstates the danger of literal ruin (under fixed-fraction sizing you never actually reach zero) and understates the damage along the way (you can hit a 50% drawdown long before any “ruin” threshold). Model probability of ruin explicitly using your validated strategy’s edge stats.
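
A minimal probability-of-ruin sketch under fixed-fraction sizing. It draws i.i.d. trades from the edge stats, which Module 5.3 warns is optimistic; for a harsher estimate, feed it dependence-preserving bootstrap paths instead. All parameter values shown are illustrative:

CODE · PYTHON
import numpy as np

def prob_of_drawdown(win_rate, payoff, r, n_trades=500, dd_limit=0.50,
                     n_sims=20_000, seed=0):
    """Fraction of simulated equity paths that ever breach dd_limit from peak.
    Fixed-fraction sizing: each trade wins +r*payoff or loses -r of equity.
    i.i.d. draws understate clustered losses (see Module 5.3)."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(n_sims):
        wins = rng.random(n_trades) < win_rate
        rets = np.where(wins, r * payoff, -r)
        equity = np.cumprod(1 + rets)
        peak = np.maximum.accumulate(equity)
        if ((equity - peak) / peak).min() <= -dd_limit:
            hits += 1
    return hits / n_sims

# prob_of_drawdown(win_rate=0.45, payoff=1.8, r=0.01, dd_limit=0.50)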

Key Insight

Position sizing is the only lever you have that affects risk without changing the strategy. The same strategy at 1% risk per trade and 5% risk per trade has identical signals, identical win rate, and identical profit factor. The only difference is that the 5% version can blow up 5x faster during a drawdown. Size conservatively. You can always add leverage later. You cannot un-lose money.

You Understand This When…

  • You can calculate position size from risk per trade and stop-loss distance
  • You have chosen a risk percentage (start with 1%)
  • You understand that leverage determines margin, not risk

7.2 Stop-Loss Philosophy

A stop-loss is your contract with reality: “if I am wrong by this much, I accept I am wrong and exit.” Every position must have one. No exceptions.

Types of Stop-Losses

1

Fixed Percentage

Exit if the position moves X% against you. Simple, predictable, easy to calculate position size from. Example: 2% stop on a $5,000 position = $100 max loss. Best for: strategies where the entry logic is precise and you know exactly how much adverse movement is acceptable.

2

Trailing Stop

The stop moves in your favour as the trade progresses but never moves against you. Example: 20% trailing stop on a long trade — if BTC hits $100,000 from an entry at $80,000, the stop moves to $80,000 (20% below the peak). If BTC then drops to $80,000, you exit. Locks in profits during extended moves.

3

Indicator-Based

Exit when an indicator signals the trade thesis is invalidated. Example: exit a trend-following long when the SMA slope turns negative. This is the approach our weekly strategy uses. The stop is logical, not arbitrary.

4

Time-Based

Exit if the trade hasn’t reached its target within N candles. Prevents capital being tied up in dead trades. Example: if the trade hasn’t moved +2% in 14 days, exit at market.

Critical: Exchange-Side Stop-Losses

Your bot’s internal stop-loss is not enough. Bots crash. Servers go offline. Network connections drop. Every leveraged position must have an exchange-side stop-loss order placed at the time of entry. This means even if your bot is completely dead, the exchange will close the position at your predetermined price. This is non-negotiable for any leveraged system.

Exchange-Side Stop Order Mechanics

“Place a stop” sounds like one button. It isn’t. The flags you set on that order determine whether it does what you actually wanted in adverse conditions. The following are the parameters every operator should consciously choose, not accept by default.

Trigger price source: mark vs last

Most perpetual venues let you trigger a stop on either the mark price (an index-derived fair value, often a moving average of multiple spot venues) or the last traded price. The trade-off:

  • Mark price — smoother, harder to manipulate, less prone to wick-outs. A single bad-print trade on your venue won’t take you out.
  • Last price — faster to react, but vulnerable to a wick on a thin book or a one-tick spoof. Your stop fires on a price that may not represent fair value.

Default to mark price for protective stops. Last price is acceptable only when you specifically need wick-speed reaction and your liquidity is deep enough that wicks reflect real flow.

Reduce-only flag

A stop should only ever close exposure, never open new exposure. Set reduceOnly = true on every stop order. Without this flag, an edge case can flip you into a doubled position: the entry order is still partially filling when the stop fires, the stop sells the full intended size, and you end up short the unfilled portion. Reduce-only tells the venue “this order can only reduce or close my position; if there’s nothing to close, do nothing.” Belt-and-braces against the partial-fill race.

Time-in-force (TIF)

Every order has a TIF that governs how long it lives:

  • GTC (good-till-cancel): rests on the book until filled or cancelled. Default for stops — you want the stop to stay until it triggers, not expire silently.
  • IOC (immediate-or-cancel): any portion that can’t fill immediately is cancelled. Useful for market entries where you want the fill or nothing — rather than have a partial sit on the book at a price that’s now stale.
  • FOK (fill-or-kill): entire order fills immediately or the whole thing cancels. Rarely useful at retail size; mostly relevant for block executions where partial fills break the strategy.
  • GTD / Day: good-till-date or session. Avoid for stops — you do not want your protective stop to expire at midnight UTC.

OCO (one-cancels-the-other)

If your strategy has both a stop-loss and a take-profit on the same position, you want them linked: when one fills, the other cancels automatically. Otherwise the surviving order remains live with no position behind it — and on next move it opens a fresh position in the wrong direction.

  • If your venue supports OCO natively: use it. One submission, one atomic cancel-on-fill.
  • If not: emulate. Submit two separate orders, then subscribe to the venue’s order-update stream; when one fires, immediately cancel the other. Build the cancellation into the same handler so there is no window where both can fill.
  • Failure mode to test: what happens if the stream disconnects between fill and cancel? Your reconciliation loop (Module 8.3) should catch the orphaned opposing order on its next pass.
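
A sketch of the emulation handler. The venue client, the event shape, and the OrderNotFound exception are placeholders, not a real library API:

CODE · PYTHON
class OrderNotFound(Exception):
    """Placeholder for the venue client's 'order already gone' error."""

def on_order_update(event, oco_pairs, venue):
    """OCO emulation: when one leg of a linked pair fills, cancel the other.
    oco_pairs maps order_id -> sibling order_id, in both directions."""
    if event.status != "FILLED":
        return
    sibling = oco_pairs.pop(event.order_id, None)
    if sibling is None:
        return                                  # not part of an OCO pair
    oco_pairs.pop(sibling, None)                # drop the reverse mapping too
    try:
        venue.cancel_order(sibling)             # same handler: no gap for both to fill
    except OrderNotFound:
        pass                                    # sibling already cancelled or filled
    # If the stream drops between the fill and this cancel, the
    # reconciliation loop (Module 8.3) must catch the orphaned order.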

Position mode: hedge vs one-way

Most venues offer two position modes:

  • One-way mode: a symbol has a single net position. A buy in a short position reduces or flips the position. Simpler accounting; one stop per symbol.
  • Hedge mode: a symbol can hold a long and a short position simultaneously, each with its own margin and PnL. Lets independent strategies trade the same symbol without interfering. Each side gets its own stop.

Stops behave differently across modes — a reduce-only sell stop in hedge mode reduces your long position; the same order in one-way mode could open a short if your long has already closed. Choose a mode explicitly per venue and document it in your config. Mismatch between local-state assumptions and venue-side mode is a classic source of phantom positions.

Re-sizing stops after partial fills

Your entry order is for 1.0 BTC; the venue fills 0.6 BTC and you decide to cancel the remainder. Your initial stop was sized for 1.0 BTC. Now it’s wrong — if it fires, it tries to sell 1.0 BTC against a 0.6 BTC position (and with reduce-only off, the excess flips you 0.4 BTC short). The pattern:

CODE · PSEUDOCODE
on partial_fill(order_id, filled_qty):
    current_position = filled_qty   # what you actually hold
    if existing_stop_order:
        cancel(existing_stop_order)            # remove the wrongly-sized stop
        wait_for_cancel_ack()                  # confirm before placing new
    new_stop = place_stop(
        symbol      = order.symbol,
        side        = opposite(order.side),
        qty         = current_position,        # match the actual fill
        trigger     = stop_price,
        trigger_src = "mark",
        reduceOnly  = True,
        tif         = "GTC",
    )
    persist(new_stop.id)

Note the cancel-then-replace pattern is not free of race conditions (see Module 8.4 on amend-vs-replace) — if your venue supports atomic amendment of stop quantity, prefer that. The window between cancel and new-place is your exposure window: keep it short, alert if the cancel-ack is slow, and the reconciliation loop is your safety net.

You Understand This When…

  • Every position in your system has a defined stop-loss
  • You have chosen a stop-loss type appropriate for your strategy
  • Exchange-side stop-loss orders are placed at entry time, not managed by the bot alone
  • Your stops use mark-price triggers, reduce-only, and GTC by default
  • You have a documented OCO behaviour (native or emulated) and a position-mode choice
  • Your code re-sizes the stop after any partial fill

7.3 Circuit Breakers & Drawdown Limits

Stop-losses protect individual trades. Circuit breakers protect the entire account. They are the emergency brake that stops everything when conditions become extreme.

Circuit Breaker Rules

Trigger | Action | Resume Condition
Account drawdown exceeds your “soft” threshold (calibrated to your Monte Carlo distribution) | Close all positions, halt new entries | Manual review + a defined cooling-off pause
Consecutive-loss streak exceeds your threshold (calibrated to your hit rate and signal frequency) | Pause new entries for a defined window | Automatic resume after the window expires
Exchange API errors exceed threshold | Halt all trading, alert operator | Manual verification that API is working
Position reconciliation fails | Halt new entries, alert operator | Manual reconciliation of actual vs expected positions

The Hard Stop

At the account level, set an explicit drawdown threshold calibrated to your strategy’s expected drawdown profile — an absolute circuit breaker. If the account drops past that threshold from its peak, everything stops. All positions are closed. The system enters a mandatory cooling-off pause.

The reason to fix this number in advance is asymmetry: recovering from a 40% drawdown requires a 67% gain (achievable); recovering from a 70% drawdown requires a 233% gain (functionally starting over). The circuit breaker exists to prevent the drawdown from ever reaching the point of no return — and to take the decision out of your hands when you’re emotional.

Practical Advice

Calibrate your circuit breakers using your Monte Carlo results (Module 5.3). If the 95th-percentile drawdown from Monte Carlo is X, set your circuit breaker a few points beyond X. This gives the strategy room to operate within its expected range while protecting against genuine failure. The same principle applies to consecutive-loss thresholds and reconciliation cadence: pick numbers calibrated to your strategy’s actual loss-streak distribution and the latency you can tolerate between an exchange-side change and your system noticing it.
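
A sketch of calibrated breakers in config and code. The threshold numbers are placeholders; yours come from your own Monte Carlo and loss-streak distributions:

CODE · PYTHON
# Placeholder thresholds: calibrate each one to YOUR Monte Carlo results (5.3)
BREAKERS = {
    "max_drawdown_from_peak": 0.40,    # a few points past your MC 95th percentile
    "max_consecutive_losses": 8,       # from your loss-streak distribution
    "max_api_errors_per_hour": 20,
}

def check_breakers(equity, peak_equity, loss_streak, api_errors_last_hour):
    drawdown = (equity - peak_equity) / peak_equity
    if drawdown <= -BREAKERS["max_drawdown_from_peak"]:
        return "HALT_AND_CLOSE_ALL"    # the hard stop: everything closes
    if loss_streak >= BREAKERS["max_consecutive_losses"]:
        return "PAUSE_NEW_ENTRIES"     # automatic resume after the window
    if api_errors_last_hour >= BREAKERS["max_api_errors_per_hour"]:
        return "HALT_AND_ALERT"        # manual verification before resume
    return "OK"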

You Understand This When…

  • Your system has account-level circuit breakers defined
  • You know the difference between a stop-loss (trade level) and a circuit breaker (account level)
  • Circuit breaker thresholds are calibrated to your Monte Carlo results

7.4 Portfolio-Level Risk

Running multiple strategies introduces a new dimension of risk: correlation. Two strategies that are independently profitable can blow up together if they are correlated — meaning they both lose at the same time.

Correlation Risk

If you run a trend-following long strategy and a momentum long strategy on BTC, both will lose during a sudden market crash. Your portfolio drawdown is not the average of the two strategies’ drawdowns — correlated losses add. If each strategy draws down 15% of account equity at the same time, the portfolio is down 30%.

Mitigation strategies:

  • Trade different assets: BTC and ETH are correlated (~0.85). BTC and gold are weakly correlated (~0.15). A BTC trend strategy + a gold mean-reversion strategy provides genuine diversification.
  • Trade different directions: A long-only strategy paired with a short-only strategy (on different assets or regime-gated) provides natural hedging.
  • Trade different timeframes: A weekly strategy and an intraday strategy have low trade overlap even on the same asset.
  • Capital allocation: Don’t put 100% of capital into correlated strategies. Allocate based on correlation: high correlation = lower combined allocation.

Key Insight

The most underrated risk in crypto is that everything is correlated during a crash. BTC, ETH, SOL, altcoins — they all drop together during a market-wide deleveraging event. Cross-asset diversification within crypto alone is limited. True diversification requires non-crypto assets (FX, commodities, indices) or strategies that profit from crashes (shorts, volatility strategies).

You Understand This When…

  • You understand that portfolio risk is not the average of individual strategy risks
  • You can identify correlated strategies and know how to mitigate the overlap
  • You have a capital allocation plan that accounts for correlation

Module 8

Building the System
7 sections · ~7 hours

8.1 Architecture Decisions

Before writing code, you need to decide how the system is structured. This decision affects everything: how easy it is to add strategies, how failures propagate, how you monitor and debug.

The Architecture Spectrum

Architecture | Description | When to Use
Single Script | One Python file does everything: fetch data, calculate signals, place orders | First prototype, one strategy, one exchange
Modular Monolith | One application with separate modules for data, strategy, execution, and monitoring | 1–3 strategies, one exchange, serious but not complex
Per-Exchange Containers | Each exchange gets its own Docker container with the full strategy stack. Shared data layer. | Multiple exchanges, multiple strategies, production deployment

Our production system uses per-exchange containers. Each exchange runs in its own Docker container with its own strategy engine, order executor, and state management. They share a data layer (candle database) and a regime detection service. If one container crashes, the others keep running.

Architecture
Shared Services
  • Candle Database
  • Regime Detector
  • Health Watchdog (monitors all containers)
↓ each container reads shared services, owns its own state ↓
BTC @ Venue A — Container
  • Strategy Engine
  • Order Executor
  • State Management
  • Dashboard
BTC @ Venue B — Container
  • Strategy Engine
  • Order Executor
  • State Management
  • Dashboard
ETH @ Venue B — Container
  • Strategy Engine
  • Order Executor
  • State Management
  • Dashboard
Each container is independent: a crash in one venue or strategy never blocks the others. The shared-services layer is small and recoverable.

Per-exchange container architecture. Each container is independent and can crash without affecting others. Shared services provide data and regime detection.

Start Simple

Do not start with per-exchange containers. Start with a single script. Get it working. Then refactor into modules. Then containerise. Premature architecture is as dangerous as premature optimisation.

You Understand This When…

  • You’ve chosen an architecture appropriate for your current stage
  • You understand the trade-offs between simplicity and robustness
  • You have a plan for how to evolve the architecture as complexity grows

8.2 Essential Components

Every trading system, regardless of architecture, needs these six components. Miss any one and the system has a critical gap.

1

Data Fetcher

Pulls candle data from the exchange API, validates it (Module 3.4), and stores it. Runs on a cron schedule (e.g., daily at 00:30 UTC). Must handle: API rate limits, pagination, incomplete candle correction (overlap window), and network failures.

2

Strategy Engine

Loads candle data, calculates indicators, evaluates entry/exit conditions and gates, and produces a signal: BUY, SELL, or HOLD. Must be deterministic: same input always produces same output. All parameters come from a config file, not hardcoded values.

3

Order Executor

Translates signals into exchange API calls. Handles: order placement, order status checking, partial fills, order cancellation, retry on transient errors, and permanent error classification. Must know the difference between “try again in 5 seconds” and “stop, this will never work” (e.g., insufficient balance, invalid symbol).

4

Position Reconciliation

Periodically checks: what does the bot think its position is vs what the exchange actually shows? If they differ, something went wrong. This catches: phantom positions (bot thinks it’s in a trade but isn’t), untracked external closes, and failed order acknowledgements.

5

State Management

Persists the bot’s state to disk (an embedded SQL database, JSON, or another local store) so it can resume correctly after a restart. State includes: current position, entry price, stop-loss level, strategy-specific variables, and last processed candle timestamp. Without this, a restart means the bot doesn’t know if it’s in a trade.

6

Dashboard & Alerts

A way to see what the bot is doing and get notified of important events. Minimum: instant-messaging alerts (a chat-based alert bot) for trade entries, exits, and errors. Better: a web dashboard showing current position, recent trades, and system health. Our production systems use a Python web framework for internal dashboards and a chat-based instant-messaging channel for real-time alerts.

War Story

Our order executor initially treated whole ranges of exchange errors as “transient” (retryable). This meant permanent errors — “IP not whitelisted,” “bad authentication,” “insufficient balance,” “parameter error,” “position-size violation” — were retried hundreds of times over hours before giving up. The fix: a small allowlist of genuinely transient error codes (rate-limit, network-timeout, temporary-server-error) documented per exchange. Everything else is classified as permanent and fails immediately. Error classification is not glamorous work, but it’s the difference between a system that recovers gracefully and one that hammers a dead API for hours.

Order Executor Patterns

The executor sits between intent (“buy 1 BTC at market”) and reality (a possibly-partial fill on a possibly-flaky API). The patterns below are what separate a toy executor from one you can leave running unattended.

Idempotent submission

Every order has a deterministic clientOrderId derived from the underlying intent — not a fresh UUID per call. If you crash mid-submit and retry, the venue dedupes on the ID and gives you back the existing order rather than creating a duplicate. Pattern:

CODE · PYTHON
import hashlib

def make_client_order_id(strategy_id, symbol, intent_ts, nonce):
    # Deterministic from intent. Same inputs --> same ID.
    raw = f"{strategy_id}|{symbol}|{intent_ts}|{nonce}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

def submit_idempotent(intent):
    coid = make_client_order_id(
        intent.strategy_id, intent.symbol,
        intent.intent_ts,   intent.nonce,
    )
    try:
        return venue.place_order(client_order_id=coid, **intent.params)
    except VenueError as e:
        if e.code in {"DUPLICATE_CLIENT_ORDER_ID", "ORDER_ALREADY_EXISTS"}:
            return venue.get_order_by_client_id(coid)   # already accepted
        raise

Most tier-1 venues honour clientOrderId for deduplication for at least a few hours. Read your venue’s docs for the dedup window and design your retry policy to fit inside it.

Retryable error classification

Maintain a tight allowlist of retryable error codes per venue. Everything not on the list fails fast.

CODE · PYTHON
RETRYABLE = {
    "RATE_LIMIT",          # HTTP 429 or venue-specific
    "NETWORK_TIMEOUT",     # transport-level
    "TEMP_SERVER_ERROR",   # 5xx
    "VENUE_OVERLOAD",      # documented transient
}

PERMANENT = {
    "INVALID_SIGNATURE", "INVALID_TIMESTAMP",   # config error
    "INSUFFICIENT_BALANCE", "POSITION_LIMIT",   # state error
    "INVALID_SYMBOL", "INVALID_PARAMETER",      # logic error
    "IP_NOT_WHITELISTED", "PERMISSION_DENIED", # auth error
}
# Anything not in either set: log, alert, treat as permanent until classified.

Never retry on permanent errors. Hammering a dead API doesn’t fix it; it just buries the real problem under noise and burns your rate-limit budget.

Cancel-replace vs amend

To move a stop or change a price, you have two options:

  • Cancel + replace: cancel the existing order, place a new one. Two round trips. Race condition: if the cancel succeeds but the replace fails (rate limit, transient error, validation), you have no protective order on the position until you notice and recover.
  • Amend: a single atomic call that modifies the existing order in place. One round trip, no exposure window.

Prefer amend wherever the venue supports it. Especially for stop-loss adjustments, where the exposure window between cancel and replace is exactly the window during which you might need the stop.

Partial fill handling

Track filled_qty separately from order_qty in local state. Decide a policy per intent:

  • Leave remainder open (GTC): default for resting limit entries; you wanted the price, accept slow fills.
  • Cancel remainder: for time-sensitive entries; better to take the partial than carry stale exposure on a moving market.
  • Replace at new price: for execution-bot patterns; cancel residual, re-quote at current best.

The choice is strategy-dependent; the requirement is that you make it explicitly, encode it in config, and re-size every dependent order (stop, take-profit) to match the actual filled quantity (Module 7.2).

Order status: websocket vs polling

Two ways to learn what happened to your order: poll the REST endpoint or subscribe to the venue’s private order-update websocket. Differences:

  • Polling: simple, but high-latency (you only learn about a fill on the next poll) and rate-limit-expensive at any reasonable cadence.
  • Websocket: push-based, sub-second latency, doesn’t consume your REST rate budget, lets you react to fills the moment they happen (e.g., for OCO emulation).

Use websocket for order updates wherever the venue supports it. Keep polling as a fallback for reconnection scenarios and for the reconciliation pass — the websocket is for “tell me what changed,” the REST poll is for “tell me ground truth.”

You Understand This When…

  • You can name all six essential components
  • You have at least a basic version of each one in your system
  • Your order executor distinguishes between transient and permanent errors
  • Submissions carry a deterministic clientOrderId for dedup on retry
  • You prefer amend over cancel-replace where supported
  • You react to fills via websocket where available, with REST polling as fallback

8.3 State Management & Reconciliation

Your bot will crash. Your server will restart. The exchange will go down for maintenance. The question is not whether this happens, but whether your system recovers correctly when it does.

What State Must Be Persisted

  • Position state: in_trade (yes/no), direction (long/short), entry_price, current_stop_loss
  • Strategy state: any variables the strategy needs across candles (e.g., “last exit was a slope exit” flag for trend resumption logic)
  • Pending orders: order IDs, expected fills, timeout timestamps
  • Last processed candle: so the system knows where to resume

Store this in an embedded SQL database or a JSON file. Update it after every state change. Read it on startup.

Concrete State Schema

Three tables are the irreducible core: orders, positions, fills. Use any embedded SQL store you like — the shape is what matters. Schemas below are vendor-agnostic.

CODE · SQL
-- orders: every order ever submitted, current and historical
CREATE TABLE orders (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,   -- monotonic local ID
    client_order_id    TEXT    NOT NULL UNIQUE,             -- deterministic; idempotency key
    strategy_id        TEXT    NOT NULL,
    symbol             TEXT    NOT NULL,
    side               TEXT    NOT NULL CHECK (side IN ('buy','sell')),
    order_type         TEXT    NOT NULL CHECK (order_type IN ('market','limit','stop','stop_limit')),
    qty                REAL    NOT NULL,
    price              REAL,                                -- NULL for market
    status             TEXT    NOT NULL CHECK (status IN
                          ('PENDING','SUBMITTED','ACK','PARTIAL_FILL',
                           'FILLED','CANCELLED','REJECTED','UNKNOWN')),
    submitted_at       INTEGER NOT NULL,                    -- epoch ms
    last_updated_at    INTEGER NOT NULL,
    exchange_order_id  TEXT,                                -- assigned by venue; NULL until ACK
    error_code         TEXT,
    error_message      TEXT
);
CREATE INDEX idx_orders_strategy_status ON orders (strategy_id, status);
CREATE INDEX idx_orders_symbol_status   ON orders (symbol, status);

-- positions: net exposure per (strategy, symbol)
CREATE TABLE positions (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    strategy_id        TEXT    NOT NULL,
    symbol             TEXT    NOT NULL,
    side               TEXT    NOT NULL CHECK (side IN ('long','short','flat')),
    qty                REAL    NOT NULL,
    avg_entry_price    REAL    NOT NULL,
    unrealised_pnl     REAL    NOT NULL DEFAULT 0,
    realised_pnl       REAL    NOT NULL DEFAULT 0,
    opened_at          INTEGER NOT NULL,
    closed_at          INTEGER                              -- NULL while open
);
CREATE INDEX idx_positions_strategy_symbol ON positions (strategy_id, symbol);

-- fills: every execution event the venue reports
CREATE TABLE fills (
    id                 INTEGER PRIMARY KEY AUTOINCREMENT,
    order_id           INTEGER NOT NULL REFERENCES orders(id),
    exchange_fill_id   TEXT    NOT NULL UNIQUE,             -- dedup on replays
    qty                REAL    NOT NULL,
    price              REAL    NOT NULL,
    fee                REAL    NOT NULL,
    fee_currency       TEXT    NOT NULL,
    ts                 INTEGER NOT NULL                     -- epoch ms
);
CREATE INDEX idx_fills_order ON fills (order_id);

Two design notes:

  • client_order_id is UNIQUE — the database itself enforces idempotent submission. A retry that re-inserts the same intent gets a constraint violation, not a duplicate order.
  • exchange_fill_id is UNIQUE in fills — a websocket replay or a polling-overlap won’t double-count a fill into your PnL.

Order Lifecycle State Machine

An order moves through a finite set of states. Every transition has a trigger; every terminal state is reached intentionally or by timeout.

State Machine
  PENDING: intent created, not yet sent
    ↓ submit() called
  SUBMITTED: sent on the wire, awaiting ACK
    ↓ on ACK received  |  on REJECTED returned  |  on no ACK in N sec
  ACK: venue accepted the order; live in their book. Capture exchange_order_id.
  REJECTED: venue refused (signature, balance, parameter). Terminal.
  UNKNOWN → reconcile: no ACK within timeout. Query the venue by client_order_id and transition the row to its real state.
    ↓ from ACK: fills arrive  |  cancel() at ACK or PARTIAL_FILL
  PARTIAL_FILL ⇄ FILLED: driven by fill events (websocket or polling). Oscillates between PARTIAL_FILL and FILLED until complete.
    ↓ terminal
  FILLED: order is complete. Write-once. No further transitions.
  CANCELLED: operator (or system) cancelled the remainder. Write-once.
  REJECTED: reached from SUBMITTED. Write-once.

Timeout rule: if no ACK within N seconds (e.g. 5s) → mark as UNKNOWN and trigger reconciliation by client_order_id. Terminal states (FILLED, CANCELLED, REJECTED) are write-once.

Order lifecycle state machine. Every transition has a trigger; every terminal state is reached intentionally or by timeout. The UNKNOWN state is the recovery hatch — no order ever stays lost.

Trigger and timeout rules:

  • PENDING → SUBMITTED: on submit() call. Persist the row before the network call so a crash after send still leaves a recoverable record.
  • SUBMITTED → ACK: on the venue’s order-accepted response. Capture exchange_order_id.
  • SUBMITTED → REJECTED: permanent error from venue (signature, balance, parameter). Terminal.
  • SUBMITTED → UNKNOWN: no ACK within timeout (e.g. 5 seconds). Trigger reconciliation by client_order_id: query the venue, find the order’s real state, transition the row.
  • ACK → PARTIAL_FILL / FILLED / CANCELLED: driven by fill events (websocket or polling).
  • PARTIAL_FILL → FILLED / CANCELLED: further fills or operator cancel of remainder.
  • Terminal states (FILLED, CANCELLED, REJECTED) are write-once. No further transitions.

Idempotent clientOrderId Pattern

Generate clientOrderId = hash(strategy_id + symbol + intent_timestamp + nonce) — deterministic from intent. On the wire, your submit code is just if not exists(clientOrderId): submit_order(...). The venue dedupes on its side; your DB’s UNIQUE constraint dedupes on yours. A network retry, a process restart mid-submit, a re-sent message from a flaky pipeline — none of them can produce a doubled position. This single pattern eliminates an entire class of phantom-position bugs.
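One way to cut the ID, as a sketch: deterministic from intent, with a readable strategy prefix so attribution survives into venue history (the Module 9.6 rebuild depends on it). The prefix convention and the length trim are assumptions; check your venue's clientOrderId cap.

CODE · PYTHON
import hashlib

def make_client_order_id(strategy_id, symbol, intent_ts_ms, nonce=0):
    # Deterministic from intent: a retry after a crash regenerates the SAME id,
    # so the venue's dedup and your DB's UNIQUE constraint both catch it.
    digest = hashlib.sha256(
        f"{strategy_id}:{symbol}:{intent_ts_ms}:{nonce}".encode()
    ).hexdigest()
    # Readable prefix keeps strategy attribution recoverable from venue history.
    return f"{strategy_id}-{digest[:16]}"

The nonce only changes when you genuinely intend a second order for the same intent; otherwise the same signal always maps to the same ID.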

Reconciliation Algorithm

Treat the exchange as the source of truth for state (positions, order statuses, fills). Treat your local DB as the source of truth for intent (strategy_id, signal_id, the why behind each order). Reconciliation is the periodic alignment of these two.

CODE · PYTHON
def find_by_client_id(orders, coid):
    # first local order whose client_order_id matches, else None
    return next((o for o in orders if o.client_order_id == coid), None)

def reconcile(strategy_id, now):
    # 1. Pull ground truth from venue
    venue_orders    = venue.get_open_orders(strategy_filter=strategy_id)
    venue_positions = venue.get_positions(strategy_filter=strategy_id)

    # 2. Pull local view
    local_orders    = db.select_open_orders(strategy_id)
    local_positions = db.select_positions(strategy_id)

    # 3. Diff and classify each discrepancy
    diffs = []
    for vo in venue_orders:
        lo = find_by_client_id(local_orders, vo.client_order_id)
        if lo is None:
            diffs.append(("MISSING_LOCALLY", vo))         # venue has it, we don't
        elif lo.status != vo.status or lo.qty != vo.qty:
            diffs.append(("STATE_MISMATCH", lo, vo))      # statuses differ

    for lo in local_orders:
        if not any(vo.client_order_id == lo.client_order_id for vo in venue_orders):
            diffs.append(("MISSING_ON_EXCHANGE", lo))     # we have it, venue doesn't

    # Same diff for positions, comparing (symbol, side, qty).

    # 4. Apply resolution rules atomically
    with db.transaction():                                # all-or-nothing
        for d in diffs:
            kind = d[0]
            if   kind == "MISSING_LOCALLY":
                # Venue wins on existence + state; we annotate with intent if recoverable
                db.insert_order_from_venue(d[1], strategy_id=strategy_id)
                log.warning("reconcile.insert", coid=d[1].client_order_id)
            elif kind == "STATE_MISMATCH":
                # Venue wins on qty/status; local keeps strategy_id, signal_id
                db.update_order_state(d[1].id, status=d[2].status, qty=d[2].qty)
                log.warning("reconcile.update", coid=d[1].client_order_id)
            elif kind == "MISSING_ON_EXCHANGE":
                # Order is gone (filled, cancelled, expired). Mark terminal, fetch final state.
                final = venue.get_order_history(d[1].client_order_id)
                db.update_order_state(d[1].id, status=final.status)
                log.warning("reconcile.terminal", coid=d[1].client_order_id)

    return diffs

Resolution rules in one line: exchange wins on state (qty, status, fills); local wins on intent metadata (strategy_id, signal_id, the reason this order exists).

Cadence

  • Slow strategies (weekly, daily): every N minutes — cheap insurance.
  • Fast strategies (hourly or below): every M seconds — tighter loop, but never on every cycle (rate-limit cost; reconciliation can saturate your budget).
  • On startup, always: a full reconciliation before resuming any signal evaluation. Never let the strategy take a decision against a stale local view.
  • On UNKNOWN-state timeout: targeted reconciliation by client_order_id, not the full sweep.

Atomicity

All updates from a single reconciliation pass run inside one DB transaction. Either every diff is applied or none is — you never want a torn state where half the positions match and half don’t after a crash mid-loop.

Two Generals’ Problem

You cannot guarantee that exchange and local agree at any single instant. Between “I sent the cancel” and “I learned the cancel was processed,” reality and your view of reality are different. This is fundamental, not a bug to fix — the same impossibility result that prevents two generals from coordinating an attack over an unreliable channel applies here. Design for eventual consistency with a bounded delay: after at most one reconciliation cycle, local and remote should agree. Document the bound. Alert when it’s breached. Don’t pretend you’ve eliminated the gap — you haven’t; you’ve only narrowed it.

War Story

A pending entry order expired as “failed_permanent” but the state management code didn’t roll back properly. The strategy retained the entry_price and peak_price from the signal time, putting it in “holding mode” with no actual position. It was managing a phantom position for 6 days — trailing a stop on nothing. The fix: when any entry order fails, explicitly clear entry_price and peak_price back to null. The reconciliation loop would have caught this within an hour, but the state rollback prevented it from happening in the first place.

You Understand This When…

  • Your bot persists its state to disk and can resume after a restart
  • A reconciliation loop compares bot state to exchange state regularly
  • Failed orders roll back state correctly
  • Your DB has orders, positions, fills with appropriate uniqueness constraints
  • Your order rows traverse a defined state machine with timeout handling
  • Reconciliation runs in atomic transactions and on startup before any signal eval

8.4 Configuration-Driven Strategies

Strategy parameters should live in config files, not in code. This lets you change thresholds, add strategies, and adjust risk without modifying source code or redeploying.

YAML Configuration Example

CODE · YAML
# config/strategies/sma4_weekly.yaml
strategy:
  name: "SMA4 Weekly Slope"
  enabled: true
  direction: long_only
  timeframe: weekly

entry:
  sma_period: 4
  slope_threshold: 0          # slope > 0 to enter
  close_position_min: 0.75    # CP gate
  efficiency_ratio_min: 0.20  # ER gate

exit:
  slope_exit: true            # exit when slope turns negative
  crash_exit: -0.15           # exit on 15% weekly drop

risk:
  position_pct: 1.0           # 100% of capital (spot, no leverage)
  stop_loss_pct: 0.20         # 20% trailing stop
  stop_type: exchange_side    # placed as exchange order

trend_resumption:
  enabled: true
  momentum_lookback: 2        # re-enter if close > close[2 weeks ago]

YAML configuration for a strategy. Every parameter is explicit. Changing a threshold is a config edit, not a code change.
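Loading it is deliberately boring. A minimal loader sketch, assuming PyYAML; the fail-fast check means a typo'd section stops the bot at startup instead of surfacing as a silent default mid-trade:

CODE · PYTHON
import yaml  # assumed dependency: PyYAML

REQUIRED_SECTIONS = ("strategy", "entry", "exit", "risk")

def load_strategy_config(path):
    with open(path) as f:
        cfg = yaml.safe_load(f)
    for section in REQUIRED_SECTIONS:
        if section not in cfg:
            raise ValueError(f"{path}: missing required section '{section}'")
    return cfg

cfg = load_strategy_config("config/strategies/sma4_weekly.yaml")
sma_period = cfg["entry"]["sma_period"]  # 4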

Why This Matters

  • Auditability: You can see exactly what parameters the system is running by reading one file
  • Safety: Changing a threshold doesn’t risk introducing code bugs
  • Multi-strategy: Each strategy gets its own config file. Adding a new strategy means adding a new YAML file.
  • Version control: Config changes are tracked in git alongside code changes

You Understand This When…

  • All strategy parameters live in config files, not hardcoded in source code
  • You can change a threshold without modifying Python code
  • Config files are version-controlled in git

8.5 Using AI to Build

You do not need to be a professional software engineer to build a trading system. Modern AI coding assistants can write, debug, and refactor code at a level that would have required years of experience five years ago. Here is how to use them effectively.

The Right Way to Use AI for Code

  1. Describe what you want precisely. “Write a data fetcher” is too vague. “Write a Python script that fetches BTCUSDT daily candles from a tier-1 perpetual venue’s public API starting from 2020-01-01, handles pagination with 200 candles per page, saves to a lightweight embedded SQL database file called market_data.db with columns (timestamp, open, high, low, close, volume), and runs data quality checks after each batch” is precise.
  2. Review every line. AI-generated code can have subtle bugs. Read the code. Understand what it does. Ask the AI to explain any part you don’t understand.
  3. Test before trusting. Run the code on a small sample. Verify the output makes sense. Check edge cases.
  4. Iterate. “This works but the timestamps are in milliseconds and I need seconds.” “Add error handling for HTTP 429 rate limit responses.” Small, precise iterations produce better code than trying to get everything right in one prompt.

Recommended Tools

Tool              | Best For                                                            | Access
Claude Code (CLI) | Full system builds — reads your codebase, writes files, runs tests | Terminal / IDE extension
ChatGPT           | Exploration, hypothesis generation, explaining concepts            | Browser / app
Replit Agent      | Quick prototypes if you don’t have a server yet                    | Browser

For building the actual production system, a CLI-based AI tool that can read your files, run your tests, and edit your code directly is dramatically more productive than copy-pasting between a chat interface and a text editor.

You Understand This When…

  • You can write precise prompts that produce working code
  • You review and understand AI-generated code before running it
  • You have a development workflow: prompt → review → test → iterate

8.6 Order Rounding & Contract Math

Your sizing function says “buy 0.13427 BTC at $94,517.83”. The venue rejects it. Then it rejects the next one, and the next, while you watch your bot fire and miss for an hour straight. Welcome to contract math — the unglamorous layer between “intended order” and “order the venue will actually accept.”

The Four Constraints Every Order Must Satisfy

Every symbol on every venue advertises four numerical constraints. An order that violates any of them is rejected; there is no partial credit for getting three out of four right.

  • Minimum quantity (minQty). The smallest size the venue will accept. Below this, your order is rejected with a “below minimum” error. Different per symbol; sometimes different across the same symbol on linear vs inverse contracts.
  • Step size / lot size (stepSize). Quantity must be a multiple of this increment. If stepSize = 0.001, then 0.13427 is invalid; 0.134 is valid. Round down (never up — rounding up can push you over your risk budget).
  • Tick size (tickSize). Limit-order price must be a multiple of this increment. If tickSize = 0.5, then $94,517.83 is invalid; $94,517.50 is valid. For a buy, round down (better price for you, more fillable); for a sell, round up. The exact convention varies; pick one and document it.
  • Minimum notional (minNotional). The order’s value — quantity × price — must clear a venue-wide floor (often $5, $10, or similar). This is independent of minQty; an order can pass minQty and still fail minNotional if the price is low enough. Especially relevant for low-priced altcoins and for sizing-down trades during drawdowns.

Fetch these constraints from the venue’s symbol-info / instrument endpoint at startup, cache them, and refresh on a schedule. They do change — venues adjust tick size after sustained price moves and shift step size for new contract series.
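A sketch of that cache; venue.get_symbol_info is a hypothetical wrapper over the symbol-info endpoint, and the refresh interval is an assumption to tune:

CODE · PYTHON
import time

class SymbolInfoCache:
    def __init__(self, venue, refresh_s=3600):
        self.venue = venue
        self.refresh_s = refresh_s      # tolerated staleness, seconds
        self._cache = {}                # symbol -> (fetched_at, info)

    def get(self, symbol):
        fetched_at, info = self._cache.get(symbol, (0.0, None))
        if info is None or time.time() - fetched_at > self.refresh_s:
            info = self.venue.get_symbol_info(symbol)   # hypothetical endpoint wrapper
            self._cache[symbol] = (time.time(), info)
        return info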

Linear vs Inverse Perpetuals: The P&L Math Differs

Two perpetual contract families dominate. They look superficially similar in a venue UI but the P&L math is fundamentally different, and getting them confused will cause your sizing to be off by a factor that depends on price.

  • USDT-margined (linear). Quantity is denominated in the underlying coin. Margin and P&L settle in stablecoin. P&L is linear in price: P&L = qty × (exit_price - entry_price) for a long. Long BTC at $90,000, exit at $100,000, qty 0.1 → P&L = 0.1 × $10,000 = $1,000. Simple.
  • COIN-margined (inverse). Quantity is denominated in USD notional. Margin and P&L settle in the underlying coin. P&L is inverse in price: P&L (in coin) = qty_usd × (1/entry_price - 1/exit_price) for a long. Same trade, expressed as “long $9,000 of BTC at $90,000, exit at $100,000” → P&L = 9000 × (1/90000 - 1/100000) ≈ 0.01 BTC.
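The two formulas as code, reproducing the worked numbers above; a sketch of the P&L math only, not a full margin model:

CODE · PYTHON
def pnl_linear(qty_coin, entry, exit_price, is_long=True):
    # USDT-margined: settles in stablecoin, linear in price
    sign = 1 if is_long else -1
    return sign * qty_coin * (exit_price - entry)

def pnl_inverse(qty_usd, entry, exit_price, is_long=True):
    # COIN-margined: settles in the coin, inverse in price
    sign = 1 if is_long else -1
    return sign * qty_usd * (1 / entry - 1 / exit_price)

pnl_linear(0.1, 90_000, 100_000)     # 1000.0 (USDT)
pnl_inverse(9_000, 90_000, 100_000)  # 0.01 (BTC)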

Two consequences operators repeatedly miss:

  1. Position sizing math is different. A linear-contract sizing function feeding an inverse-contract executor will produce wrong sizes. They are not interchangeable.
  2. Inverse contracts have non-linear sensitivity to price. Your COIN-margined long’s effective leverage increases as price falls (because your collateral is also falling). Liquidation behaviour and margin-call behaviour are correspondingly different. Stop placement that’s reasonable on a USDT-M contract may be too tight on COIN-M during a violent move.

The Rounding Helper

Rather than scatter rounding logic across every place that places an order, isolate it in one helper that takes intended values and returns venue-compliant ones (or raises). The contract is small and explicit:

CODE · PYTHON
from math import ceil, floor

class RoundingHelper:
    def __init__(self, symbol_info):
        self.min_qty       = symbol_info.min_qty
        self.step_size     = symbol_info.step_size
        self.tick_size     = symbol_info.tick_size
        self.min_notional  = symbol_info.min_notional
        self.contract_type = symbol_info.contract_type   # "linear" | "inverse"

    def prepare(self, intended_qty, intended_price, side):
        # 1. Round qty DOWN to step (up could exceed the risk budget).
        #    Note: float floor can mis-round at step boundaries; prefer Decimal
        #    in production sizing paths.
        qty = floor(intended_qty / self.step_size) * self.step_size

        # 2. Round price to tick (down for buy, up for sell)
        if side == "buy":
            price = floor(intended_price / self.tick_size) * self.tick_size
        else:
            price = ceil(intended_price / self.tick_size) * self.tick_size

        # 3. Validate against floors
        if qty < self.min_qty:
            raise OrderTooSmall(f"qty {qty} below min {self.min_qty}")
        if qty * price < self.min_notional:
            raise OrderTooSmall(f"notional {qty*price} below min {self.min_notional}")

        return qty, price

Two non-obvious choices baked in: rounding qty down (so we never accidentally exceed our risk budget by rounding up to the next step), and raising on impossible orders rather than silently shrinking. A silent shrink-to-zero is worse than a loud rejection — the loud rejection bubbles up and your strategy can decide whether to skip the trade or alert.
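Usage against the numbers from the top of this section, assuming hypothetical constraints of step_size 0.001 and tick_size 0.5:

CODE · PYTHON
helper = RoundingHelper(symbol_info)           # min_qty=0.001, step=0.001, tick=0.5
qty, price = helper.prepare(0.13427, 94_517.83, side="buy")
# qty   -> 0.134       (rounded DOWN to the step)
# price -> 94_517.50   (buy: rounded DOWN to the tick)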

War Story: The First Fifty Orders Were All Rejections

A bot went live on a new symbol. Sizing function emitted clean fractional quantities. Venue’s step size was 0.001; the bot was emitting 0.0014213 with five digits of precision. Every order: rejected. The bot didn’t crash — it logged the rejection, moved on, and waited for the next signal. Forty-seven signals over the next eight hours, all rejected, none caught because the alert threshold for “high reject rate” was set at “5 in 5 minutes” and the signal frequency was lower than that. The fix was twelve lines of code. The miss was an entire day’s opportunity. Lesson: validate the rounding helper end-to-end on a one-tick test order on every new symbol you add, not just symbols you’ve traded before.

You Understand This When…

  • Every order in your system passes through a single RoundingHelper that enforces minQty, stepSize, tickSize, and minNotional
  • You can articulate the difference between linear and inverse contracts and which your strategy uses
  • Your symbol-info cache is refreshed on a schedule, not assumed static
  • Your alerts catch a sustained reject rate within minutes, not at end-of-day
  • You smoke-test the rounding path on every new symbol with a one-tick test order before turning a strategy on against it

8.7 Research vs Production Environment Separation

The environment you use to discover a strategy and the environment you use to run a strategy have different requirements that conflict at every turn. Trying to satisfy both inside one environment gives you neither: a research environment too lean to explore in, or a production environment too fat to trust.

Why They Have to Be Separate

Research is exploratory. You want notebooks, large historical datasets sitting on disk, a half-dozen plotting libraries, the ability to reach for a GPU when you decide to fit something heavier, and tolerance for mutable state — you re-run cells, you keep variables around, you experiment. Dependencies sprawl naturally because you don’t know in advance what you’ll need.

Production is the opposite. You want a small, deterministic, immutable container that does exactly one thing. Every dependency in production is a security and reliability surface; every megabyte of image is something that has to download and start cleanly when a host fails over. The environment must be reproducible byte-for-byte from version control. Mutable state is your enemy.

The conflict is total. A single environment that satisfies research also drags Jupyter, plotting libraries, two ML frameworks, and a CUDA stack into your live trading container — multiplying the surface area of what can break and what can be exploited, while making the container slow to start and impossible to audit.

The Shared Layer Is the Strategy Itself

The asymmetry doesn’t mean two parallel implementations — that’s the worst of both worlds. The split that works:

  • Strategy logic lives in a shared package. A pure Python module with no notebook-specific dependencies, no heavy ML libraries, no I/O wired in. Inputs are arrays/DataFrames; outputs are signals/orders. Same code, byte-for-byte identical, called from research notebooks and from the production runner.
  • Research environment imports the package. Notebooks call from lib.strategies.my_strategy import generate_signals exactly like the production runner does. The research environment provides the data, the exploration tools, the plotting — the strategy logic is shared.
  • Production environment imports the same package. A small headless service that calls the same generate_signals function with live data and routes the output to the order executor.

If the research notebook and the production runner are calling the same function with the same arguments, their behaviour is identical by construction. Bugs you find in one are fixed in the other for free. Backtest-vs-live divergence becomes a data problem, never a code problem.

The Anti-Pattern

The single most common environment-separation failure looks like this:

CODE · PYTHON
# In production_runner.py, deep in the strategy module
from research.notebooks.helpers import compute_indicator

Now your production code path imports a notebook helper. Three things have just gone wrong: production now requires Jupyter to be installed; production behaviour depends on a file that lives in the “mutable, exploratory” part of your repo; and a researcher refactoring their notebook has just changed live trading behaviour without realising it. The file path is the bug.

The remedy is mechanical: the production environment cannot import anything outside lib/. Enforce it with import path discipline; if you have a build pipeline, fail the build when production code imports from research/.
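A minimal version of that build check, as a sketch using only the standard library; run it in CI before the image build:

CODE · PYTHON
import pathlib
import re
import sys

# Fail the build if anything under production/ imports from research/.
BAD_IMPORT = re.compile(r"^\s*(?:from|import)\s+research[.\s]", re.MULTILINE)

violations = [
    str(path)
    for path in pathlib.Path("production").rglob("*.py")
    if BAD_IMPORT.search(path.read_text())
]
if violations:
    print("production code imports from research/:", *violations, sep="\n  ")
    sys.exit(1)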

Directory Structure

The simplest layout that enforces the split:

CODE · STRUCTURE
repo/
├── lib/                           # SHARED (production must only import from here)
│   ├── strategies/
│   │   └── my_strategy.py        # pure functions: data in, signals out
│   ├── indicators/
│   ├── execution/
│   └── risk/
├── research/                      # NEVER imported by production
│   ├── notebooks/
│   ├── adhoc_scripts/
│   └── data/                     # large local datasets
├── production/                    # the live runner
│   ├── runner.py                 # imports from lib/ only
│   ├── Dockerfile                # small, lean, deterministic
│   └── requirements.txt          # minimal
└── tests/
    └── strategies/                # tests run on lib/ — same code as production

The Promotion Path

The path from idea to live is the same every time, with explicit gates:

  1. Notebook proves the idea. Quick-and-dirty exploration in research/. The output is a clear yes/no on whether to invest the engineering effort to formalise it.
  2. Logic moves into lib/strategies/. Refactored into a pure function with explicit inputs and outputs. The notebook now imports the function and uses it; nothing strategy-relevant lives in the notebook anymore.
  3. Backtest harness validates. The same generate_signals the notebook uses, the production runner will use, and the backtester uses — one code path, three call-sites.
  4. Falsification & adversarial review (Module 6). Same code, attacked.
  5. Paper-trade in production runner (Module 9.3). First time the strategy runs against real-time data through the production code path. Catches integration issues that backtest can’t.
  6. Live, with size ramp. Module 9.4.

At every stage the strategy logic is the same shared code. The only thing changing is the data (historical vs live) and the side-effects (none in research; orders submitted in production).

Key Insight

“Research code” and “production code” is the wrong frame. There is strategy code, which is the same in both, and there is scaffolding — notebooks, plotters, datasets in research; runner, executor, watchdog in production — which is different by necessity. Get the shared/scaffold split right and the “backtest worked but live doesn’t” class of bug largely disappears.

You Understand This When…

  • Strategy logic lives in lib/ and is imported, byte-identical, from both research notebooks and the production runner
  • The production environment has no Jupyter, no plotting libraries, no notebook helpers
  • You can run any production strategy from a notebook by importing the same function with historical data
  • Your build / test pipeline rejects production imports from research/
  • You have a documented promotion path: notebook → lib → backtest harness → falsification → paper → live

Module 9

Deployment & Operations
6 sections · ~4.5 hours

9.1 Server Setup

Your trading bot runs 24/7. It cannot run on your laptop. You need a server — a virtual private server (VPS) in the cloud that is always on, always connected, and accessible from anywhere.

Recommended: A Tier-1 European Dedicated-Server Provider

We run all production trading infrastructure on modest dedicated hardware (a few cores, ~64GB RAM) from a tier-1 European dedicated-server provider. Why this class of provider, rather than a hyperscaler:

  • Cost: Dedicated servers at a small fraction of AWS/GCP/Azure pricing for equivalent specs.
  • Reliability: 99.9%+ uptime in well-run European data centres with excellent connectivity.
  • Performance: Dedicated CPU and RAM, not shared with other tenants.
  • No surprise bills: Fixed monthly pricing, not usage-based.

For a first system, a small shared VPS is sufficient (a few vCPU, 8GB RAM, a recent Ubuntu LTS). Upgrade to dedicated hardware when you have multiple strategies running.

Initial Setup Checklist

1

Provision the server

Choose Ubuntu 24.04 LTS. Set up SSH key authentication (no password login). Configure the firewall (UFW) to allow only SSH (port 22) and any ports your dashboards need.

2

Install Python 3.12+

Most Ubuntu 24.04 installs come with Python 3.12. Verify with python3 --version. Install pip and venv.

3

Install Docker

Docker containerises your trading bot so it runs in an isolated environment with all dependencies. Install Docker Engine and Docker Compose. This is covered in section 9.2.

4

Set up git

Clone your trading system repo. Set up deploy keys so the server can pull code from GitHub without your password.

5

Create the .env file

Store all API keys, secrets, and configuration in a .env file on the server. Never commit this to git.

You’re Done When…

  • You have a VPS running Ubuntu with SSH access
  • Python, Docker, and git are installed
  • Your API keys are in a .env file on the server (not in git)
  • You can SSH into the server from your laptop

9.2 Docker & Containerisation

Docker wraps your trading bot and all its dependencies into a container that runs identically everywhere. No more “works on my laptop but not on the server” problems.

Why Docker

  • Isolation: Each container has its own Python environment. No dependency conflicts between strategies.
  • Reproducibility: The container runs the same way on your laptop, on the server, and a year from now.
  • Auto-restart: Docker can automatically restart your container if it crashes (restart: unless-stopped).
  • Easy deployment: docker compose up -d starts everything. docker compose down stops everything.
CODE · YAML
# docker-compose.yml (minimal example)
version: "3.8"
services:
  btc-strategy:
    build: .
    container_name: btc-strategy
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./data:/app/data          # persist database
      - ./config:/app/config      # strategy configs
    ports:
      - "8080:8080"               # dashboard

Minimal Docker Compose file for a trading bot. The bot auto-restarts on crash, loads secrets from .env, and persists data to a mounted volume.

Healthchecks, Restart Policy, and Volumes (Operational Depth)

The minimal compose above is a starting point. The version below adds the three pieces that separate a toy deployment from one you can leave running unattended: a healthcheck, a finite restart policy, and named volumes for stateful data.

CODE · YAML
# docker-compose.yml (operator-grade)
version: "3.8"
services:
  btc-strategy:
    build: .
    container_name: btc-strategy
    env_file: .env

    # Restart policy: bounded retries, not an infinite crash loop.
    # (Compose-spec "on-failure[:max-retries]" syntax; a deploy.restart_policy
    # block only applies under Swarm.)
    restart: on-failure:5

    # Healthcheck: container is "healthy" only when /health responds 200
    healthcheck:
      test: ["CMD", "curl", "--fail", "--max-time", "5", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s   # grace window on first boot

    volumes:
      - btc_strategy_data:/app/data    # named volume; survives container recreate
      - ./config:/app/config:ro        # configs read-only into container
    ports:
      - "8080:8080"

volumes:
  btc_strategy_data:                   # declared once, persists independent of container lifecycle

Why each change matters:

  • restart: on-failure:5: unless-stopped retries forever, which means a bug that crashes on boot becomes an infinite crash-loop that fills logs and masks the real issue. on-failure with a finite retry count surfaces persistent failures to your alerting instead of hiding them under restart spam.
  • healthcheck:: Docker now knows the difference between “process is running” and “process is operational.” Your watchdog (Module 9.5) and any orchestrator can read the health status; without it, a hung-but-not-crashed process looks healthy from the outside.
  • Named volumes: a bind-mount (./data:/app/data) ties your state to a path on the host, which is fine until you move servers or recreate the container with a different working directory. A named volume (btc_strategy_data) is owned by Docker, lives independently of the container, and survives docker compose down. For stateful containers (your trading bot’s local SQL DB lives here) this is the safer default.
  • Read-only configs: mounting ./config as :ro means a misbehaving container cannot write to your strategy YAMLs.

The application must expose the /health endpoint the healthcheck calls (covered in Module 9.5). Without it, the healthcheck can’t do its job.

Practical Advice

Docker is overkill for your first prototype. Run the bot directly with python3 run.py first. Containerise once you have a working system and want it to survive server reboots and crashes automatically. Docker adds a layer of complexity that is not worth it during the “does this even work?” phase.

You’re Done When…

  • Your trading bot runs in a Docker container with auto-restart
  • Data is persisted via a named volume (not inside the container, not a bind mount that ties you to a host path)
  • You can start and stop the system with docker compose up/down
  • The compose file declares a healthcheck and a finite retry policy

9.3 Shadow Mode (Paper Trading)

Before risking real money, run the system in shadow mode: it processes real market data, generates real signals, but does not place real orders. It simulates what would have happened. This is the final validation step before going live.

Why Paper Trading Matters

  • Catches bugs that backtesting misses: Timezone issues, API edge cases, state management bugs, order handling errors
  • Verifies infrastructure: Does the cron job fire? Does the bot restart after a crash? Do your instant-messaging alerts work?
  • Builds confidence: Watching the system make correct decisions in real time, with real data, for weeks, before committing capital
  • Validates live performance vs backtest: Are the signals matching what the backtest predicted? If live paper results diverge significantly from backtest expectations, something is wrong.

How Long to Paper Trade

Minimum: 2–4 weeks. Longer for low-frequency strategies. You need enough time to observe:

  • At least 2–3 signal events (entries and exits)
  • At least one server restart or maintenance window
  • At least one period of high volatility (to test error handling)

For a weekly strategy that trades 3 times per year, you might need 2–3 months to see a full signal cycle. For a daily strategy, 2 weeks may suffice.

Do Not Skip This

The temptation to skip paper trading and “just go live with a small amount” is strong. Resist. A bug that mismanages state or miscalculates position size will cost you real money. Paper trading costs nothing and catches errors that no amount of backtesting reveals. Every professional trading desk paper-trades new strategies before deploying capital.

Fill-Simulation Spec (Why “Just Run It” Isn’t Enough)

Without an explicit fill-simulation spec, paper P&L cannot be compared to live P&L — you don’t know whether divergence is “the strategy decayed” or “the simulator was optimistic.” The spec below is the minimum operator-grade contract for what paper fills mean.

Entry fills

  • Market entry: fill at the next candle’s open price, plus a slippage adjustment in the adverse direction (long entries fill above open, short entries below). Open-of-next-candle is the conservative baseline; never fill at the signal-bar close (look-ahead).
  • Limit entry: fill only if the next-bar low (for buy limits) reaches the limit price, or the next-bar high (for sell limits) reaches it. Fill at the limit price exactly (no improvement).
  • Alternative entry models (declare which is in use, do not mix):
    • VWAP within candle: fill at the candle’s typical price (HLC/3 or volume-weighted) — less conservative; only valid when the strategy claims to use TWAP/VWAP execution in live.
    • Mid + slippage model: fill at mid + slippage_bps in the adverse direction, where the slippage model is parametric or empirical (see below).

Stop exits

  • Trigger condition: for a long stop, the candle’s low reaches or breaches the stop price; for a short stop, the candle’s high reaches or breaches it.
  • Fill price (intra-bar): fill at the stop price, plus a worst-case slippage of K basis points beyond the stop in the adverse direction. Default K conservatively (e.g. 5–10bp on liquid majors, more on alts).
  • Gap-through-stop: if the candle’s open already breaches the stop (i.e. the bar gapped through the stop level), simulate as filled at the next-bar open, not at the stop price. This models genuine gap risk — on a major news gap, you do not get the stop price; you get whatever’s available when the market re-prices.

Target / take-profit exits

  • Trigger: for a long target, the candle’s high reaches the target; for a short, the candle’s low.
  • Fill price: fill at the target price exactly (no positive slippage on resting limits in conservative simulation).
  • Both stop and target hit in the same bar: ambiguous — the candle gives high and low but not order. Pick a deterministic rule (we recommend: assume the stop hit first, the conservative outcome) and document it. Never assume target-first; that’s a known overfit lever.
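A sketch of the long-side exit rules above; the candle fields, next_open, and the slippage parameter are assumptions:

CODE · PYTHON
def simulate_long_exit(candle, next_open, stop, target, slippage_bp=10):
    # Returns (reason, fill_price), or None if neither level was touched.
    if candle.open <= stop:
        # Gapped through the stop: you get the re-priced market, not the stop.
        return ("stop_gap", next_open)
    stop_hit = candle.low <= stop
    target_hit = candle.high >= target
    if stop_hit:
        # Both-hit ambiguity resolves stop-first (the conservative outcome).
        return ("stop", stop * (1 - slippage_bp / 10_000))
    if target_hit:
        return ("target", target)   # no positive slippage on resting limits
    return None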

Slippage model

State which model is in use and why:

  • Parametric (constant bp): simplest; one number per asset. Fine for liquid majors at retail size. Calibrate from observed live fills if available.
  • Depth-based: use L2 order-book snapshots to compute the impact of your size against the resting depth. More accurate at scale; requires storing book data.
  • Empirical: fit a slippage distribution from your own historical live fills (price you got vs price you intended) and sample from it. Only available once you have live fills to calibrate against.

Funding accrual (perpetuals)

For each funding tick (typically every 8h, sometimes 1h or 4h depending on venue) that falls while the position is open, accrue funding using the actual historical funding rate for that interval:

CODE · PSEUDOCODE
funding_payment = position_notional * funding_rate * sign
# sign = +1 for a long, -1 for a short:
# longs pay when funding is positive; shorts pay when funding is negative
position.realised_pnl -= funding_payment

Carry funding through the trade’s PnL accounting, not as a separate ledger — otherwise paper PnL looks rosier than live PnL on funding-heavy markets.

Fee model

  • Default to taker fee for both entry and exit in conservative paper simulation. Most market entries and stop-triggered exits are taker.
  • If your strategy explicitly rests as maker, model maker fee (or rebate) on those legs — but require live evidence that your maker fills are actually filling at maker rates before you trust paper PnL that assumes them.
  • Apply fees to both legs (open and close) and include them in the same PnL ledger as funding.

Byte-for-Byte Parity Rule

Paper signals must use the same signal-evaluation code as live. Identical inputs, identical indicator implementations, identical gates, identical config. The only difference is the order-submission step: paper writes a synthetic fill to a paper_fills table; live submits to the venue. Anything else — a separate “backtest engine” that re-implements the strategy, a config that flips off a gate “just for paper,” a different time source — produces drift between paper and live evaluation, which makes the paper test useless. If the paper system says enter and the live system would not have, paper P&L tells you nothing about live performance. This is a hard rule, not a guideline.
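In code, the parity rule reduces to a single seam. A sketch; simulate_fill is a hypothetical function implementing the fill-simulation spec above:

CODE · PYTHON
def submit(order, mode, venue, db):
    # The ONLY branch between paper and live. Signal evaluation, sizing,
    # rounding, and state management upstream of this call are identical.
    if mode == "paper":
        fill = simulate_fill(order)          # hypothetical: applies the fill-sim spec
        db.insert("paper_fills", fill)
    else:
        venue.submit_order(order)            # same payload, real venue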

You’re Done When…

  • The system has run in shadow mode for at least 2 weeks
  • Paper trading results are consistent with backtest expectations
  • All infrastructure (crons, restarts, alerts) has been verified
  • At least 2–3 signal events have been observed and validated
  • The paper fill-simulation spec is documented (entry, stop, target, gap, slippage, funding, fees)
  • Paper signal evaluation is byte-for-byte identical to the live signal-evaluation path

9.4 Going Live

The moment of truth. Your strategy is validated, your infrastructure is tested, your paper trading is consistent. Here is how to transition to live trading safely.

The Go-Live Checklist

1

Verify exchange balance

Ensure USDT (or your collateral currency) is in the correct sub-account on the exchange. For isolated margin: ensure the margin is allocated to the correct position type.

2

Switch from paper to live mode

In your config: mode: live and testnet: false. Double-check. This is the single most important config change you will ever make.

3

Start with minimum position size

Even though your strategy is validated at 1% risk, start live with 0.25% or 0.5% for the first week. This limits damage if there is a bug that paper trading didn’t catch. Scale up after the first few live trades confirm everything works.

4

Monitor the first trade closely

Watch the first live entry in real time. Verify: the order was placed, the fill price is reasonable, the exchange-side stop-loss was set, the state was persisted correctly, and the dashboard shows the correct position.

5

Set up your watchdog

A separate process that monitors the bot and alerts you if anything goes wrong: container crash, API errors above threshold, reconciliation mismatch, or cron job failure.

Key Insight

The transition from paper to live is harder psychologically than it is technically. You will feel the urge to override the system, to take profits early, to increase position size after a win. Trust the system. If you validated it properly through Modules 5–6, the system knows better than your emotions do. Your job now is to monitor, not to intervene.

You’re Done When…

  • The system is running in live mode with real capital
  • The first trade has been executed and verified
  • Exchange-side stop-loss is confirmed active
  • Watchdog and alerts are operational
  • You are monitoring, not intervening

9.5 Monitoring & Alerts

A live trading system that you don’t monitor is a time bomb. This section covers the minimum monitoring setup to keep your system healthy and catch problems before they cost money.

Essential Alerts

Event                             | Alert Method                                          | Priority
Trade entry/exit                  | Instant-messaging alert with details                  | Informational
Stop-loss triggered               | Instant-messaging alert                               | Important
API error rate spike              | Instant-messaging alert                               | Urgent
Container crash                   | Instant-messaging alert from watchdog                 | Critical
Reconciliation mismatch           | Instant-messaging alert                               | Critical
Cron job missed                   | Staleness check in watchdog                           | Important
Account drawdown beyond threshold | Instant-messaging alert + circuit breaker activation  | Critical

Health Watchdog

A separate script (not part of the trading bot) that runs on a cron schedule and checks:

  • Are all trading containers running? (docker ps)
  • Has the candle update cron fired recently? (check file modification time)
  • Are there any stale positions (open for longer than expected)?
  • Is disk space running low?
  • Are there error patterns in container logs?

If any check fails, send an instant-messaging alert. This is your insurance against the 3am crash you sleep through.

Health and Readiness Endpoints

Every container exposes two HTTP endpoints. The distinction matters: a container can be alive without being operational.

  • /health — process is alive. Returns 200 if the HTTP server is responding. Used by Docker healthcheck and the watchdog’s liveness probe.
  • /ready — process is operational. Returns 200 only if: data is fresh (latest candle younger than 2 candle intervals), DB is reachable, last reconciliation succeeded, no fatal-error flag is set. Used by the watchdog’s readiness probe.
CODE · PYTHON
@app.get("/health")
def health():
    return {"status": "alive", "ts": now_ms()}, 200

@app.get("/ready")
def ready():
    checks = {
        "data_fresh":   latest_candle_age_seconds() < 2 * candle_interval_seconds(),
        "db_reachable": db_ping(),
        "reconcile_ok": last_reconcile_age_seconds() < max_reconcile_age,
        "no_fatal":     not fatal_flag.is_set(),
    }
    if all(checks.values()):
        return {"status": "ready", "checks": checks}, 200
    return {"status": "not_ready", "checks": checks}, 503

An always-200 /health is a lie if the bot is hung, blocked on a deadlock, or has lost its data feed. /ready is what tells you whether to trust the system right now.

Watchdog Checklist (Concrete Commands)

The watchdog is a separate process — usually a cron-driven shell or Python script — that exercises the system from the outside. The minimum check set:

CODE · BASH
NOW=$(date +%s)   # epoch seconds; alert, query_db, TRADING_CONTAINERS, PORT are assumed helpers/env

# 1. Liveness: each container responds to /health
for container in $TRADING_CONTAINERS; do
  curl --fail --max-time 5 "http://${container}:${PORT}/health" \
    || alert "container ${container} not alive"
done

# 2. Readiness: each container reports operational
for container in $TRADING_CONTAINERS; do
  curl --fail --max-time 5 "http://${container}:${PORT}/ready" \
    || alert "container ${container} not ready"
done

# 3. Data freshness (sanity check, even if /ready already covers it)
latest_ts=$(query_db "SELECT MAX(ts) FROM candles WHERE symbol=$SYMBOL")
age=$((NOW - latest_ts))
[ "$age" -gt "$((2 * INTERVAL))" ] && alert "candles stale: ${age}s old"

# 4. Position reconciliation: exchange == local for every (strategy, symbol)
diffs=$(reconcile_dry_run --all-strategies)
[ -n "$diffs" ] && alert "reconcile diffs: ${diffs}"

# 5. Heartbeat-to-monitor: container writes last_loop_at to a status file
for container in $TRADING_CONTAINERS; do
  last=$(stat -c %Y "/var/run/${container}.heartbeat")
  age=$((NOW - last))
  [ "$age" -gt "$HEARTBEAT_THRESHOLD" ] && alert "${container} heartbeat stale: ${age}s"
done

# 6. Disk and log volume
df -h | awk '$5 ~ /9[0-9]%|100%/ { print }' | grep . && alert "disk pressure"

Run the watchdog from a process that cannot share a failure mode with the trading containers: a different host where practical, or at minimum a separate systemd unit on the same host. If your watchdog dies with your bot, you have no watchdog.

Log Shipping and Retention

  • Structured JSON logs — every log line is a parseable object with fields (ts, level, strategy, event, coid, etc.). Free-text logs are unsearchable at the volume a live system produces.
  • Ship to a central collector — not because you need a fancy stack, but because logs on the same host as the failing container are often unreachable when you most need them. A simple syslog forward, a journald forward, or a lightweight log-shipper agent is enough.
  • Retention: hot retention (instantly searchable) at least 7–14 days covers most incident windows; cold retention (compressed archive) 3–12 months for post-mortem and regulatory needs.
  • Redact secrets at the source, never at the collector — if a key ends up in your central log store you have to rotate the key.

SLOs and Metric Thresholds

Pick a few measurable targets and alert when reality breaches them. Examples calibrated for a daily/intraday system at retail size:

Metric                              | Target                                                          | Alert when
Signal-evaluation completeness      | 99% of expected evaluations completed within 1 candle interval  | Two consecutive candles missed
Order submission latency            | 99% of orders submitted within 500ms of signal                  | P99 above 1s for 5+ minutes
Reconciliation discrepancy rate     | <1% of orders show diff at reconciliation                       | Sustained >5% over an hour
Alert delivery latency              | <60s from breach detection to alert delivered                   | End-to-end test fails
Heartbeat staleness                 | Last loop write within N × cycle                                | Stale beyond N=3 cycles
Error budget (per-endpoint 4xx/5xx) | <0.5% of calls in steady state                                  | Spike above baseline by 5×

Numbers are illustrative — calibrate to your strategy’s timescale. The point is to have numbers, not to invent them ad hoc when something breaks.

Alert Routing

Not every alert deserves to interrupt you. Tier alerts by required action:

  • Chat / push (high-urgency, action-required): container crash, reconciliation hard-fail, account drawdown breach, exchange API blackout, fatal-error flag set. Wakes you up. Has a human-readable runbook link.
  • Email / dashboard (informational, review-during-business-hours): trade entries/exits, daily PnL roll-up, weekly performance vs backtest, normal cron completion.
  • Suppress / aggregate: repeated identical alerts within a window collapse to one. The 47th “rate-limit hit” tells you nothing the first did not.

Alert fatigue is a real failure mode. If every notification is “urgent,” the genuinely urgent ones get muted with the rest. Be ruthless about what gets to interrupt sleep.

Runbook Entries (Per-Alert)

For every alert your system can fire, write a runbook entry. Four lines, not a novel:

  1. What it means. “Reconciliation found a position on exchange that local DB doesn’t know about.”
  2. How to verify. “SSH to host; docker exec ... reconcile --dry-run; check the diff output.”
  3. How to fix. “If the venue position is intentional (manual hedge), insert a row tagged manual in positions. If unintentional, close it via venue UI and re-run reconcile.”
  4. How to prevent recurrence. “If this fires more than once a week: investigate the venue order-update stream for dropped messages.”

Runbooks live in version control next to the code, not in someone’s head. The point of writing them is the next 3am page is handled by reading, not thinking.

You’re Done When…

  • Instant-messaging alerts are configured for all critical events
  • A health watchdog runs independently and checks system health
  • You receive alerts within minutes of any critical issue
  • Every container exposes /health and /ready
  • Watchdog checks include liveness, readiness, data freshness, reconciliation, heartbeat, and disk
  • SLOs are documented and breach-alerts wired
  • Each alert has a four-line runbook entry in version control

9.6 Disaster Recovery

Your monitoring catches problems. Your watchdog restarts crashed containers. Both fail when the host itself dies, the disk corrupts, the provider has an outage, or your DB silently rots. Disaster recovery is the layer below the watchdog — the plan for when the layer above the disaster has stopped working too.

State DB Backups: Continuous and Periodic

The trading system’s state DB — orders, positions, fills, strategy state — is the only piece of local data that, if lost, cannot be reconstructed in minutes. Candle history can be re-fetched. The DB cannot. Two layers of backup, both required:

  • Continuous WAL streaming. Most production-grade SQL databases support write-ahead-log streaming to a remote target. Every committed transaction is shipped, asynchronously, to a backup endpoint. Recovery point objective approaches zero — in the worst case you lose only the transactions in flight when the primary died.
  • Periodic full snapshots. Daily or weekly, take a consistent snapshot of the entire DB and ship it offsite, encrypted at rest. Snapshots cover failure modes WAL streaming cannot — logical corruption, accidental schema changes, an attacker silently overwriting WAL too — and they give you a known-good restore point with a known timestamp.

Encryption is non-negotiable: the backup contains your full trading history and any secrets your DB persisted. Treat it as you would the live DB.

The Only Backup That Works Is the One You’ve Restored

Backups that you have never restored are not backups; they are hopes. Several of the most expensive failures in production systems share the same plot: backups were running for years, and on the day they were needed, the restore process failed — corrupted file, missing dependency, version mismatch, key lost. Schedule a monthly restore drill. Spin up a fresh container from your backup against a clean disk; verify the DB starts; verify the data is intact (row counts, recent timestamps, a known query). Treat a failed drill as a Sev-1 incident.

Infrastructure as Code: Rebuild From Git

If the box dies right now, how long until you have a replacement running? The honest answer for most retail operators is “hours, while I remember how I set it up.” The target answer is “under 30 minutes, from a script.”

Every server in the system should be rebuildable from a git repository. The minimum content of that repo:

  • A provisioning script (Ansible, Terraform, a shell script — the form matters less than the existence) that installs the OS-level dependencies, configures firewalls, sets up the user, and pulls the application repo.
  • A docker-compose.yml (or equivalent) that brings up the trading containers, the watchdog, the log shipper, the DB.
  • A documented secrets-injection process — the secrets themselves do not live in git, but the procedure to fetch them from your secret store does.
  • A RECOVERY.md at the repo root with the literal commands, in order, to bring up a new host from cold metal to live trading.

If you can’t hand the repo to a competent engineer who has never seen it and have them bring up a working replica in under an hour, the IaC isn’t complete.

Blue/Green Deployment

Deploying a new version of the trading bot directly over the running version is a needless risk. The blue/green pattern:

  • Run the new version (green) alongside the old (blue). Both connect to the same DB and the same venue, but only one of them is the “active” instance permitted to place orders. The other runs in shadow mode, evaluating signals against the same data and writing them to a parallel log.
  • Switch traffic by flipping a single config flag. The active flag is read by both containers; whichever sees active = true places orders. Switch from blue to green; observe; if anything looks wrong, flip back.
  • Keep blue around for 24–72 hours. An instant rollback is just a config flag flip. After the watch window expires, decommission blue.

Many failure modes — a subtle indicator change, a new bug introduced by a refactor, a venue API behaviour you didn’t notice — only manifest under live data. Blue/green lets you catch them with a five-second rollback rather than a forty-minute redeploy.
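The flag check itself is tiny. A sketch, assuming each container knows its colour via an INSTANCE_COLOR environment variable and both read the same shared config:

CODE · PYTHON
import os

def may_place_orders(shared_config):
    # blue and green evaluate signals identically; only the instance whose
    # colour matches the shared "active" flag is allowed to submit orders.
    return shared_config["active"] == os.environ.get("INSTANCE_COLOR")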

Region and Provider Failover

Your provider can have an outage. Whole datacentres lose power. Networks partition. The defence isn’t complex multi-region orchestration on day one — it’s a documented runbook plus a cold standby:

  • Cold standby on a different provider. A second host, on a different provider in a different region, with the IaC repo cloned and the DB backups synced. It is not running — you pay only for the disk and the small instance fee. Activation is a script: pull latest backup, restore DB, bring up containers, point DNS / API alerts at the new host.
  • Activation runbook in version control. Step-by-step. Time-it under non-emergency conditions: from “decide to fail over” to “new host taking trades,” the target is under 30 minutes. The first time you measure it, it will be longer; iterate the runbook until it isn’t.
  • Test the failover quarterly. Same logic as backup restore drills: an untested failover is a hope. The drill is “bring up the standby on cold backup, run paper trading on it for an hour, decommission.”

The Catastrophic-Loss Runbook

The worst case: local DB is corrupted, latest backup is also corrupted, infra is intact but state is gone. You don’t know what you own.

The recovery is structural: the venue is the source of truth for fills, orders, and positions. Every fill that ever happened was recorded by the venue; every open position has a record there. The reconcile-from-venue procedure:

  1. Halt all strategies. No new orders until reconciliation completes.
  2. Pull full history from the venue. All fills, all closed orders, all open orders, all positions, going back as far as the venue’s API exposes (usually 90 days for fills; longer with paginated requests; full account-statement export for older history).
  3. Rebuild local state from venue history. Reconstruct positions by replaying fills; reconstruct strategy attribution from your clientOrderId tags — this is why the clientOrderId discipline in Module 8 is non-negotiable. Without strategy tags in the order id, you cannot tell which strategy owns which position.
  4. Reconcile against current open state. The replayed positions should match what the venue currently reports. If they don’t, there’s a fill you missed — investigate before resuming.
  5. Resume strategies one at a time, with size reduced, watching for any abnormal behaviour.

This procedure is slow and tedious. The point isn’t that it’s elegant; the point is that it exists, it’s documented, and you have rehearsed it once. The day you need it is not the day to discover that fill history beyond 30 days isn’t available on the venue’s standard API.
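Step 3 in sketch form, assuming the clientOrderId prefix convention from Module 8.3 and venue fill objects carrying ts, side, qty, symbol:

CODE · PYTHON
from collections import defaultdict

def rebuild_positions(venue_fills):
    # Replay the venue's full fill history, oldest first, into net positions
    # keyed by (strategy_id, symbol). Strategy comes from the clientOrderId tag.
    net = defaultdict(float)
    for fill in sorted(venue_fills, key=lambda f: f.ts):
        strategy_id = fill.client_order_id.split("-")[0]   # assumed tag convention
        sign = 1.0 if fill.side == "buy" else -1.0
        net[(strategy_id, fill.symbol)] += sign * fill.qty
    return {key: qty for key, qty in net.items() if abs(qty) > 1e-12}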

The 3-2-1 Rule

The simplest discipline that covers almost every backup failure mode: 3 copies, 2 different media, 1 offsite.

  • 3 copies: the live DB, plus two independent backups. One of the backups can be derived from the other (e.g. WAL stream archived to disk, then disk snapshot to object storage).
  • 2 different media / providers: not just “two folders on the same disk.” A provider-managed object store and a local NAS, or two different cloud providers. The point is that one ransomware/corruption/policy event cannot take both.
  • 1 offsite: at least one copy is in a different physical location and a different ownership domain (different cloud provider, different region). A flooded datacentre or a billing-suspension at your primary provider should not be the end of your data.

Retail systems often run with 1 copy and call it “a backup.” That’s not a backup; that’s a single point of failure with extra steps.

Key Insight

Disaster recovery is the discipline of treating the worst case as inevitable. It will happen. The only question is whether your future self has 30 minutes of script-execution between disaster and resumption, or 30 hours of panic. The cost of preparing now is small; the cost of not preparing now is unbounded.

You’re Done When…

  • State DB has continuous WAL streaming plus periodic encrypted full snapshots
  • You have completed at least one successful restore drill in the last 30 days
  • The system can be rebuilt from a git repo in under an hour by someone unfamiliar with it
  • You deploy via blue/green with a documented one-flag rollback
  • You have a cold standby on a different provider and have tested failover at least once
  • The catastrophic-loss runbook (reconcile from venue) is written and rehearsed
  • Your backup setup satisfies 3-2-1

Module 10

Regime Detection & Macro Overlay
4 sections · ~3 hours

10.1 Why the Same Strategy Fails in Different Markets

A strategy validated on 2020–2021 data (explosive bull market) will get destroyed in 2022 (grinding bear). The strategy didn’t break. The market changed. This section explains why regime awareness is the single biggest factor in whether a system survives long-term.

The Regime Problem

Markets exist in distinct regimes. Each regime has different statistical properties:

Regime | Characteristics | What Thrives | What Dies
Bull Trend | Strong upward momentum, shallow pullbacks, high confidence | Trend following, momentum, dip buying | Short selling, mean reversion
Bear Trend | Sustained declines, relief rallies that trap longs, fear | Short selling (if gated by regime), cash | Dip buying, leveraged longs
Chop / Range | No direction, false breakouts, whipsaws | Mean reversion, range strategies, cash | Trend following, breakout strategies
High Volatility | Large daily moves, wide spreads, fast liquidations | Wider stops, smaller positions, volatility selling | Tight stops (get stopped by noise)
Low Volatility | Compressed ranges, narrow spreads, low volume | Patience, breakout anticipation | Active strategies (not enough movement)

The Most Expensive Mistake

Running a trend-following strategy during chop is the most common and most expensive regime mismatch. The system enters on a “breakout,” the breakout fails, the system exits at a loss, then enters again on the next “breakout,” which also fails. Each trade loses fees + slippage. After 10 whipsaw trades, you’ve lost 5–10% of your account while the market went nowhere. This is death by a thousand cuts.

The solution: don’t trade in regimes where your strategy has no edge. This is what regime gates are for.

Key Insight

The highest-value improvement you can make to any strategy is not a better entry signal. It is a regime gate that prevents the strategy from trading when the market is in the wrong state. Our regime-conditional short system added an efficiency-ratio gate (only trade during low-efficiency, choppy periods within bear regimes) and CAGR roughly doubled while max drawdown dropped meaningfully — well into double-digit percentage-point improvement.

You Understand This When…

  • You can name 5 market regimes and what works/fails in each
  • You understand why regime awareness matters more than signal refinement
  • You’ve identified which regimes your strategy is designed for

10.2 Building a Regime Detector

A regime detector classifies the current market state so your strategies can gate on it. It does not need to be complex. A simple moving average slope + volatility measure gets you 80% of the way.

Simple Regime Classification

The pattern below uses a slow weekly moving-average slope (direction) crossed with a volatility or efficiency measure (character of motion). Pick your own indicators — the structure is what matters:

Condition | Regime
Slow weekly slope positive AND volatility below its mid-range percentile | Bull (low-vol) — ideal for trend following
Slow weekly slope positive AND volatility in the upper percentile band | Bull (high-vol) — trend following with wider stops
Slow weekly slope negative AND daily efficiency-ratio low | Bear (choppy) — short-side strategies
Slow weekly slope negative AND daily efficiency-ratio elevated | Bear (trending) — cash or aggressive shorts
Slow weekly slope near zero (within a narrow neutral band) | Chop — mean reversion or sit out

This is not sophisticated. It does not need to be. The goal is to prevent your trend strategy from trading during chop and your short strategy from trading during bull markets. Broad strokes are enough.
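
A minimal sketch of the classifier, assuming a daily OHLC DataFrame indexed by date; the indicator choices, bands, and thresholds here are illustrative, not prescriptive:

CODE · PYTHON
import pandas as pd

def classify_regime(daily: pd.DataFrame) -> str:
    """Classify the current regime from a daily OHLC frame with a 'close' column."""
    weekly = daily["close"].resample("W").last()
    slow_ma = weekly.rolling(20).mean()
    slope = slow_ma.iloc[-1] / slow_ma.iloc[-5] - 1          # ~5-week slope of the slow MA

    vol = daily["close"].pct_change().rolling(14).std()
    vol_pct = vol.rank(pct=True).iloc[-1]                    # percentile of current volatility

    window = 20                                              # efficiency ratio: net move / path length
    net = abs(daily["close"].iloc[-1] - daily["close"].iloc[-window])
    path = daily["close"].diff().abs().iloc[-(window - 1):].sum()
    er = net / path if path > 0 else 0.0

    if abs(slope) < 0.005:                                   # neutral band around zero slope
        return "chop"
    if slope > 0:
        return "bull_low_vol" if vol_pct < 0.5 else "bull_high_vol"
    return "bear_trending" if er > 0.3 else "bear_choppy"

A strategy gate is then a one-line check against the labels this returns: refuse to enter unless the current label is in the strategy’s allowed set.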

Advanced: External Regime Signals

For more nuanced detection, add external data:

  • Fear & Greed Index: Extreme fear (<20) often precedes reversals. Extreme greed (>80) often precedes corrections.
  • Funding rates: Deeply positive = crowded longs (bullish sentiment). Deeply negative = crowded shorts (bearish sentiment). Extremes tend to revert.
  • VIX (if trading correlated assets): VIX >30 signals high fear in traditional markets, which often spills into crypto.
  • DXY (US Dollar Index): Strong dollar typically pressures BTC. Weak dollar typically supports BTC.

These are filters, not signals. They don’t tell you what to trade. They tell you whether conditions are favourable for your strategy type.
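
As an example of wiring one in, a sketch that reads the Fear & Greed Index from the alternative.me endpoint and exposes it as a veto-style filter (the URL and response shape are as publicly documented at the time of writing; verify them before depending on this):

CODE · PYTHON
import requests

def fear_greed_value(timeout=10) -> int:
    """Fetch the latest Fear & Greed Index reading (0-100) from alternative.me."""
    resp = requests.get("https://api.alternative.me/fng/?limit=1", timeout=timeout)
    resp.raise_for_status()
    return int(resp.json()["data"][0]["value"])

def sentiment_allows_longs(extreme_greed=80) -> bool:
    """Filter, not signal: veto new long entries when greed is extreme."""
    return fear_greed_value() < extreme_greed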

You Understand This When…

  • You have built a basic regime detector using SMA slope + volatility
  • Your strategy has a regime gate that prevents trading in unfavourable conditions
  • You know how to add external signals (fear/greed, funding, VIX) as additional filters

10.3 Macro Overlays

Crypto does not exist in a vacuum. It is influenced by the US dollar, interest rates, equity markets, geopolitical events, and broader risk appetite. A macro overlay gives your system awareness of these external forces.

Key Macro Indicators for Crypto

Indicator | Relationship to BTC | Data Source
DXY (US Dollar Index) | Inverse — strong dollar pressures BTC | Retail FX/CFD broker free API, or FRED
US 10Y Treasury Yield | Higher yields = tighter liquidity = BTC pressure | FRED, or a retail broker API
S&P 500 / NASDAQ | Positively correlated in risk-on periods | Retail FX/CFD broker free API (look for SPX500/NAS100 instruments)
Gold (XAU/USD) | Weakly correlated; both are “alternative” assets | Retail FX/CFD broker free API
VIX | High VIX = fear = BTC sell-off risk | CBOE official feed (spot index) or a market-data redistributor. Note that retail FX/CFD brokers often only offer a synthesised VIX-like instrument that tracks VIX futures, not the spot index — if you use one, label clearly and don’t conflate it with spot VIX.
Fear & Greed Index | Extremes tend to revert (contrarian signal) | alternative.me API (free)

You do not need to trade these instruments. You just need to read them as context for your crypto strategies. “DXY just spiked 2% and VIX is above 30” is important context when your BTC strategy wants to go long.

You Understand This When…

  • You know the key macro indicators that affect crypto prices
  • You have data feeds for at least DXY and VIX as context (and you know whether your VIX feed is spot or futures-tracker)
  • You understand these are filters/context, not direct trading signals

10.4 The Volatility Filter

Across almost every investigation we have run, the volatility filter is the single most impactful dimension. Strategies that are flat overall become strongly positive when filtered by volatility regime. This pattern recurs so consistently that it deserves its own section.

The Pattern

From a calendar-effect investigation (a deliberate test of a weak claim, used here as an example of how filters change a verdict):

  • The headline signal across all conditions: barely positive average return — effectively flat.
  • Filtered to low volatility only: strongly positive average return.
  • Filtered to medium volatility: negative — the signal flips sign with the regime.

From a derivatives-driven contrarian signal we tested:

  • All conditions: hit rate above 60%.
  • Filtered to low volatility only: hit rate well above 80% (small sample, but striking).

The same pattern appears in trend following, mean reversion, and derivatives signals. Low-volatility environments compress ranges, reduce noise, and make genuine signals cleaner. High-volatility environments are full of noise that triggers false signals.

Key Insight

If your strategy performs inconsistently, the first thing to test is a volatility split. Measure ATR (Average True Range) as a percentile of its 90-day distribution. Filter your backtest results by “ATR percentile < 50th” (low vol) vs “ATR percentile > 50th” (high vol). In our experience, this single filter is the most likely to turn a mediocre strategy into a strong one — or to reveal that the strategy only works in one volatility regime.
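
A sketch of the split, assuming a daily OHLC frame for the ATR percentile and a per-trade DataFrame whose entry_date values match the OHLC index (the names are illustrative):

CODE · PYTHON
import pandas as pd

def atr_percentile(ohlc: pd.DataFrame, period=14, lookback=90) -> pd.Series:
    """ATR expressed as a rolling percentile of its own 90-day distribution."""
    prev_close = ohlc["close"].shift(1)
    tr = pd.concat([ohlc["high"] - ohlc["low"],
                    (ohlc["high"] - prev_close).abs(),
                    (ohlc["low"] - prev_close).abs()], axis=1).max(axis=1)
    atr = tr.rolling(period).mean()
    return atr.rolling(lookback).apply(lambda w: (w <= w[-1]).mean(), raw=True)

def split_by_vol(trades: pd.DataFrame, ohlc: pd.DataFrame):
    """Split trade P&L by the volatility regime in force at entry."""
    pct = atr_percentile(ohlc).reindex(trades["entry_date"]).to_numpy()
    low, high = trades[pct < 0.5], trades[pct >= 0.5]
    return low["pnl"].describe(), high["pnl"].describe()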

You Understand This When…

  • You have tested your strategy split by volatility regime
  • You know whether your strategy works in low-vol, high-vol, or both
  • If it only works in one regime, you have added a volatility gate

Module 11

Continuous Improvement
4 sections · ~2 hours

11.1 Live Performance vs Backtest

Your system is live. Now you need to know: is it performing as expected? And if not, is it a normal deviation or a sign that the edge is dying?

What to Track

Metric | Compare To | Concern Threshold
Win rate | Backtest win rate | >10 percentage points below backtest after 20+ trades
Average win / Average loss | Backtest ratio | Ratio has degraded by >30%
Profit factor | Backtest PF | Dropped below 1.0 over 20+ trades
Max drawdown | Monte Carlo 95th percentile | Approaching or exceeding MC95
Trade frequency | Expected from backtest | Significantly more or fewer trades than expected

Some deviation is expected — live trading will never perfectly match backtesting due to slippage variance, execution timing, and market microstructure differences. The question is whether the deviation is within the range your Monte Carlo simulations predicted.
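
A minimal sketch of the comparison, encoding the table’s thresholds (the metric names are assumptions; wire it to your own trade log and backtest summary):

CODE · PYTHON
def live_vs_backtest_flags(live: dict, backtest: dict, n_trades: int) -> list:
    """Return concern flags per the thresholds in the table above."""
    flags = []
    if n_trades >= 20:
        if backtest["win_rate"] - live["win_rate"] > 0.10:
            flags.append("win rate >10pp below backtest")
        if live["profit_factor"] < 1.0:
            flags.append("profit factor below 1.0")
    if live["avg_win_loss_ratio"] < 0.7 * backtest["avg_win_loss_ratio"]:
        flags.append("avg win/loss ratio degraded by >30%")
    if live["max_drawdown"] >= backtest["mc95_drawdown"]:
        flags.append("drawdown at or beyond Monte Carlo 95th percentile")
    return flags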

The Review Cadence

  • Daily: Check that the system is running (watchdog alerts handle this)
  • Weekly: Review any trades from the past week. Do the entries and exits make sense?
  • Monthly: Compare cumulative live performance to backtest expectations. Check all metrics in the table above.
  • Quarterly: Full strategy review. Is the edge still intact? Has the market regime changed? Should position sizing be adjusted?

You Understand This When…

  • You have a tracking spreadsheet or dashboard comparing live vs backtest metrics
  • You know the concern thresholds for each metric
  • You have a review cadence (weekly, monthly, quarterly) scheduled

11.2 Strategy Degradation Detection

Edges die. Market structure changes. What worked in 2024 may not work in 2027. Detecting degradation early — before it costs serious money — is a core skill. The hard part is doing it statistically rather than by eyeballing a rolling chart, especially for low-frequency strategies where 20 trades take years to accumulate.

The 20-Trade Window Is Not a Test

A common heuristic is “watch the rolling 20-trade window.” This is fine as an attention trigger but useless as a decision rule, especially for low-frequency strategies. A strategy that fires 3 times a year needs ~7 years of live trading to fill a 20-trade window. By the time the heuristic flags a problem, you’ve already lost the money. Use the window for noticing; use the methods below for deciding.

Statistical Methods for Degradation Detection

  • Bootstrap confidence intervals on rolling metrics. Take the long-run trade list (backtest + live), bootstrap a distribution of N-trade-window Sharpe (or PF, or mean P&L) under the assumption that the strategy is unchanged. If the recent N-trade window’s metric falls outside the 95% CI of that long-run distribution, that is statistical evidence of drift. This is the most general method and works whether N is 10 or 200.
  • CUSUM (cumulative sum) tests. Track the cumulative sum of (actual P&L − expected P&L) trade by trade. Under the null of “strategy unchanged” the CUSUM is a random walk around zero; under degradation it drifts persistently below zero. Trigger when the CUSUM crosses a control limit calibrated to a target false-alarm rate (a sketch follows after this list). CUSUM is the textbook tool for detecting persistent small shifts that a noisy rolling mean would miss.
  • Bayesian posterior updating. Maintain a posterior distribution on the strategy’s key performance parameters — e.g. a Beta posterior on win rate, a Normal-Inverse-Gamma posterior on per-trade returns. Update with each new trade. Flag when the posterior mean shifts materially relative to the backtest prior, or when the posterior probability that the true edge is positive drops below a threshold (e.g. 80%). This naturally weights recent data without you having to choose a window length.
  • GLR (Generalized Likelihood Ratio) change-point tests. Formal change-point detection: at each new trade, compute the likelihood ratio of “the return distribution changed at some point in the recent past” vs “the distribution is unchanged.” Trigger when the ratio crosses a threshold. The most rigorous of the four; also the most computationally involved. Worth the cost on flagship strategies.

Pick one or two and commit to them in the strategy’s monitoring spec. The choice matters less than the discipline of applying it.
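
A minimal sketch of the CUSUM variant (one-sided, watching for downward drift; k and h are in units of per-trade P&L standard deviation, and the defaults are illustrative — calibrate h to your target false-alarm rate):

CODE · PYTHON
def cusum_degradation(pnls, expected_mean, sigma, k=0.5, h=5.0):
    """One-sided CUSUM on per-trade P&L. Returns (triggered, trade index or None).

    k: slack in sigma units (drifts smaller than k*sigma are ignored)
    h: control limit in sigma units (trigger once drift accumulates past it)
    """
    s = 0.0
    for i, pnl in enumerate(pnls):
        z = (pnl - expected_mean) / sigma    # standardise against the backtest expectation
        s = min(0.0, s + z + k)              # accumulates only persistent downside drift
        if s < -h:
            return True, i                   # statistical evidence of degradation
    return False, None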

Cross-Instrument Sanity Check (When Stats Are Too Thin)

For low-frequency strategies where even bootstrap CIs are unreliable, fall back to a structural sanity check: does the strategy still produce signals on instruments where it should, and do those signals still correlate with the things they used to correlate with? If you have a funding-rate-extreme reversal strategy and funding extremes still occur but no longer mean-revert, that is degradation evidence even without any new live trades. If the strategy is signal-silent on instruments it used to fire on, the underlying condition is gone.

This is qualitative but it bridges the gap when the trade count is too thin to support a formal test.

Surface-Level Symptoms (Use as Triggers, Not Decisions)

  • Rolling win rate declining below backtest average and trending downward
  • Profit factor below 1.0 over recent trades
  • Increasing whipsaw frequency (entering and exiting rapidly without capturing moves)
  • Drawdown exceeding bootstrap 95th percentile from Module 5.3
  • Trade frequency change — significantly more or fewer signals than expected

Each of these is a flag to investigate, not a verdict. Run the statistical tests above before acting.

What to Do When Degradation Is Confirmed

  1. Reduce position size immediately. Don’t wait for full diagnosis — cut risk first, investigate after.
  2. Check the market regime. Is the strategy running in a regime it wasn’t designed for? If so, the fix might be a tighter regime gate, not a strategy change.
  3. Re-run the backtest with recent data included. Does the strategy still work historically? If adding recent data destroys the edge, the market structure has shifted.
  4. If degradation persists: suspend the strategy. Paper trade it for another cycle to see if the edge returns. Do not throw good money after bad.

Practical Advice

Set a hard kill switch on a statistically grounded trigger — e.g. CUSUM crossing its control limit, or posterior probability of positive edge below 50% — not on a raw 20-trade rolling number. Have it auto-reduce position size to 25% and alert you. The 20-trade window can sit alongside as an attention prompt; it should not be the trigger.

You Understand This When…

  • You can name at least three formal statistical methods for degradation detection (bootstrap CI, CUSUM, Bayesian update, GLR)
  • You know why the 20-trade rolling window is an attention heuristic, not a decision rule
  • You have a defined response process: cut risk → diagnose → suspend if needed
  • Your kill switch is calibrated to a statistical trigger, not a raw threshold
  • You have a fallback (cross-instrument sanity check) for when trade counts are too thin for formal tests

11.3 Automated Research Loop

Once the data, backtester, falsification suite, and paper-trading rig are in place, the same components can be wired into a loop that runs without you driving each step. The loop runs periodic anomaly scans, generates candidate hypotheses, routes survivors through validation, and queues them for paper trading. A human still approves anything that touches live capital.

The Research Pipeline

Pipeline

  1. SCAN: Cross-market anomaly scanner runs on a periodic schedule over the operator’s instrument universe.
  2. FILTER: Drop findings that don’t survive statistical and multiple-testing correction.
  3. EVALUATE: LLM assesses plausibility and maps to strategy templates.
  4. GENERATE: Creates a testable hypothesis with entry/exit rules.
  5. REGISTER: Hypothesis added to the registry with metadata, parameters, and kill criteria.
  6. PAPER TRADE: Live signals, simulated execution, parity-tracked against backtest.
  7. HUMAN GATE: Operator review and approval before live capital. The system never deploys autonomously.

↻ Results feed back into the scanner so successful (and failed) hypotheses inform future scans.

The automated research loop. Anomaly scans run on a schedule; survivors are filtered, evaluated, formalised into testable hypotheses, registered, and routed to paper trading. Human approval is the gate between paper and live.

What the Scanner Looks For

The scanner is a battery of statistical tests run on a fixed schedule against the operator’s instrument universe. The test categories cover:

  • Cross-asset relationships: Do two instruments move together in unusual ways? Does one lead another with a time delay?
  • Temporal effects: Are returns at specific hours, days, or other calendar windows consistently different from random?
  • Regime-conditional patterns: Does a pattern only appear in bull markets, or only in high-volatility environments?
  • Feature interactions: Do combinations of indicators predict returns better than the same indicators do individually?

A typical scan produces a long list of raw findings. After filtering and LLM-assisted evaluation, a smaller subset becomes new hypotheses. Most will fail validation. Some will survive. The ones that survive become candidates for the paper-trading pipeline.

The Human Gate

No strategy goes live without human approval. The system scans, evaluates, registers, and paper-trades on its own. The decision to allocate real capital is a separate, manual step. The reason is structural, not philosophical: paper-trading parity is never perfect, and the cost of a mis-approved live deployment is much higher than the cost of a slow approval queue.

Why Build This

The point of the loop is throughput. A human researcher can investigate one or two hypotheses a week. The scan-evaluate-paper pipeline can run hundreds of candidates through the same falsification process while you sleep, and surface only the small number that survived. You don’t need to build this on day one — the early-stage operator should run scans manually first — but every component you build (data pipeline, backtester, falsification suite, paper trader) is reusable here. This is what they compose into.

You Understand This When…

  • You understand the SCAN → FILTER → EVALUATE → GENERATE → REGISTER → PAPER TRADE pipeline as a composition of components you already have
  • You know human approval is always the gate between paper and live, and why
  • You understand the loop is a throughput tool, not a magic engine

11.4 Cross-Pollination

When you find a signal on one instrument, test it on every other instrument you have data for. Edges that transfer across markets are more likely to be real. Edges that only work on one asset are more likely to be noise.

How It Works

You discover that BTC shows a mean-reversion pattern after extreme funding rate readings. Instead of only trading BTC with this signal, test it on ETH, SOL, and every other perpetual futures instrument in your data. If the pattern persists across 5+ instruments, the underlying mechanism (crowded positioning creates mechanical pressure) is likely real. If it only works on BTC, it might be BTC-specific or overfitted.
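
A sketch of the transfer test: run the same signal function over every instrument you have data for and compare forward-return statistics (signal_fn and the data layout are assumptions; adapt them to your own research API):

CODE · PYTHON
def cross_pollinate(signal_fn, data_by_symbol, horizon=24):
    """Apply one signal to every instrument; report hit rate and sample size."""
    results = {}
    for symbol, df in data_by_symbol.items():
        mask = signal_fn(df).fillna(False).astype(bool)       # boolean entry Series
        fwd = df["close"].shift(-horizon) / df["close"] - 1   # forward return per bar
        hits = fwd[mask].dropna()
        if len(hits) > 0:
            results[symbol] = {"n": len(hits),
                               "hit_rate": float((hits > 0).mean()),
                               "mean_fwd": float(hits.mean())}
    return results   # an edge that holds across 5+ symbols is harder to dismiss as noise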

Cross-Asset Discovery

Even more powerful: test signals across asset classes entirely.

  • Does a signal from the FX market predict crypto behaviour? (DXY strength → BTC weakness)
  • Does a signal from equity indices predict crypto behaviour? (S&P fear → BTC sell-off)
  • Does a pattern in gold appear in BTC? (Both are “alternative store of value” assets)

Our research environment maintains an instrument universe spanning crypto, FX, commodities, and indices. Every finding is automatically tested across all of them. This cross-pollination is where the most surprising and robust edges are found — because an edge that works across multiple markets is much harder to explain away as noise.

Starting Point

You don’t need a large instrument universe to cross-pollinate. Start with 3: BTC, ETH, and one non-crypto asset (gold, or a major equity index via a retail FX/CFD broker). If a signal works on all three, it is almost certainly capturing a real market dynamic. If it only works on one, investigate why before trusting it.

You Understand This When…

  • You test every finding across multiple instruments before trusting it
  • You have data for at least 3 instruments (2 crypto + 1 non-crypto)
  • You understand that cross-market validity is a strong indicator of a real edge

Module 12

Tax & Accounting
7 sections · ~1.5 hours

12.1 Why Tax Matters from Day One

The systematic trader’s tax obligation is non-trivial, jurisdiction-specific, and easy to get wrong in ways you only discover at year-end. Getting it wrong once is expensive. Building the records to handle it is something you do at the start, not at the end.

The Cost of Treating Tax as an Afterthought

The pattern shows up regularly: an operator runs a high-frequency strategy for a year, makes money, and arrives at tax season with a screenshot of the venue’s P&L tab and the assumption that it will suffice. It will not. The venue’s P&L tab is not a tax record — it’s a marketing surface. It typically excludes funding payments, bundles fees in ways your jurisdiction may not accept, doesn’t track per-lot cost basis, and goes back only as far as the venue feels like keeping it. The accountant asks for the data the tax authority will ask for, you don’t have it, and the cost is either an estimated assessment (almost always against you) or a forensic reconstruction project that costs more than the year’s profits.

The defence is structural and cheap if done early: the records you need at tax time are exactly the records the trading system already produces. You just have to make sure they’re saved, exportable, and reconcilable. That’s a one-time engineering cost that pays itself back the first time you produce a clean tax export in an hour rather than a panic-week.

What Differs by Instrument, Jurisdiction, and Structure

  • Instrument. Spot crypto, perpetual futures, dated futures, options, and stablecoin transfers each have distinct treatment in most jurisdictions. Capital gains vs ordinary income vs mark-to-market vs trading-stock all turn on what the instrument is and what your activity pattern looks like.
  • Jurisdiction. Cost basis methods, holding-period rules, treatment of perps, treatment of staking and DeFi, GST/VAT applicability, and what counts as a “disposal” vary materially. Two operators in the same trade pay different tax in different jurisdictions; the trade was the same.
  • Structure. Trading as an individual, through a discretionary trust, in a company, or inside a self-managed retirement vehicle each give you different rates, different deductibility, different reporting, and different paperwork. The choice is partly about tax and partly about asset protection and estate planning.

This Module Is Not Tax Advice

Nothing here is legal or financial advice. Tax law is jurisdiction-specific, changes annually, and is rarely as “obvious” as a layperson’s reading suggests. Use this module to know what records you need to keep and what questions to ask. Engage a tax specialist who has explicit experience with crypto derivatives in your jurisdiction before you have material P&L. The cost of a specialist is small relative to the cost of getting it wrong.

You Understand This When…

  • You can articulate why “the venue’s P&L screen” is not a tax record
  • You know the three axes that determine your treatment: instrument, jurisdiction, structure
  • You have committed to engaging a tax specialist before you have material profits, not after

12.2 Cost Basis Methods

When you sell 1 BTC, which BTC did you sell? It is not a rhetorical question — the answer changes your taxable gain. Cost basis methods are the rules for assigning a purchase price to each disposal. The crucial thing is to pick one, document it, and apply it consistently.

The Four Methods

  • FIFO (First-In, First-Out). The oldest acquired lot is the first sold. Default in most jurisdictions; simplest to compute; produces the largest gain in a rising market (because the oldest lots have the lowest cost basis).
  • LIFO (Last-In, First-Out). The most recently acquired lot is the first sold. Some jurisdictions allow it; others prohibit it. Produces a smaller gain in a rising market — useful for tax deferral but not legal everywhere.
  • Specific identification. You explicitly identify which lot is sold per disposal. Most flexible: you can choose to realise gains or harvest losses. Requires per-lot tracking with timestamps and quantities. Often the best method for active traders, where allowed.
  • Average cost. The cost basis of all units of an asset is averaged into a single per-unit number; each disposal uses that average. Common for long-term holders, rare for active traders — loses precision and is often not optimal for tax outcomes.

FIFO Accumulator Pseudocode

The simplest production-grade implementation is a per-symbol FIFO queue of lots. Each lot stores (qty, price, timestamp); on a disposal, lots are popped from the front until the disposal qty is satisfied:

CODE · PYTHON
from collections import defaultdict, deque

class InsufficientLots(Exception):
    """A disposal exceeded the recorded acquisitions for a symbol; a record is missing."""
    def __init__(self, symbol, requested, matched):
        super().__init__(f"{symbol}: disposal of {requested} exceeds recorded lots ({matched} matched)")

class FifoCostBasis:
    def __init__(self):
        self.lots = defaultdict(deque)   # symbol -> deque of (qty, price, ts), oldest first

    def add_acquisition(self, symbol, qty, price, ts):
        self.lots[symbol].append((qty, price, ts))

    def realise_disposal(self, symbol, qty, price, ts):
        remaining = qty
        realised  = 0.0
        consumed  = []   # per-lot tax rows: (take, lot_price, lot_ts, price, ts)
        while remaining > 0 and self.lots[symbol]:
            lot_qty, lot_price, lot_ts = self.lots[symbol][0]
            take = min(lot_qty, remaining)
            realised  += take * (price - lot_price)
            consumed.append((take, lot_price, lot_ts, price, ts))
            remaining -= take
            if take < lot_qty:
                # Partially consumed lot: shrink it in place at the front of the queue
                self.lots[symbol][0] = (lot_qty - take, lot_price, lot_ts)
            else:
                self.lots[symbol].popleft()
        if remaining > 0:
            raise InsufficientLots(symbol, qty, qty - remaining)
        return realised, consumed   # consumed is your per-lot tax record

The consumed list is the row-level tax record: each entry is one (proceeds, cost basis, hold-period) tuple, which is exactly what a tax preparer needs. Persist it.
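
A quick usage sketch (hypothetical fills; assumes the class above):

CODE · PYTHON
fifo = FifoCostBasis()
fifo.add_acquisition("BTC", qty=0.5, price=30_000.0, ts="2025-01-10T00:00:00Z")
fifo.add_acquisition("BTC", qty=0.5, price=40_000.0, ts="2025-03-02T00:00:00Z")

# Dispose of 0.75 BTC at 50k: consumes all of lot 1 and half of lot 2 (FIFO order).
realised, consumed = fifo.realise_disposal("BTC", qty=0.75, price=50_000.0,
                                           ts="2025-06-01T00:00:00Z")
print(realised)        # 0.5*(50k-30k) + 0.25*(50k-40k) = 12500.0
for row in consumed:   # (take, lot_price, lot_ts, price, ts): persist every row
    print(row)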

Pick One. Document It. Don’t Switch.

The single biggest cost-basis mistake is switching methods between years — or worse, between trades within a year — because each year’s software defaulted differently. Tax authorities treat unexplained method changes as a flag: at minimum it forces a reconciliation; at worst it triggers an audit. Pick one method, document the choice in a written policy, apply it across every symbol and every year, and only switch with a tax specialist’s blessing and a paper trail.

You Understand This When…

  • You can describe the four cost-basis methods and the trade-offs between them
  • You have chosen a method that’s legal in your jurisdiction and documented it in writing
  • Your system records every acquisition and every disposal at the lot level, not just the net
  • You can produce a per-disposal tax record showing proceeds, cost basis, and hold period

12.3 Spot vs Derivatives Tax Treatment

Spot trades and perpetual futures are usually taxed as different categories of asset, even though to your bot they look identical. The treatment difference can be material — one is realised on disposal, the other is often realised continuously.

Spot

In most jurisdictions, spot crypto disposal is a capital gains event: you compute (proceeds − cost basis) at the moment of sale. Holding period often matters — many jurisdictions distinguish short-term (taxed as ordinary income) from long-term (lower rate). What counts as a “disposal” is broader than people expect: selling crypto for fiat is obvious; swapping one crypto for another is also a disposal in most jurisdictions, as is using crypto to pay a fee. Spending it counts. Lending it sometimes counts.

Perps and Futures

Perpetual futures and dated futures are often treated as a separate asset class, frequently with mark-to-market treatment: at year-end, every open position’s unrealised P&L is treated as if realised, taxed in that year, and the cost basis is reset for the next year. Two operator-relevant consequences:

  • You can owe tax on positions you haven’t closed. A long that was up at year-end and reverses in January generates a tax bill you have to fund from cash on hand, not from the position.
  • Wash-sale rules and carry-back/carry-forward rules differ from spot. Losses that would offset gains in spot may behave differently for derivatives. This is jurisdiction-specific; ask the specialist.

Funding Payments

On perpetuals, you receive or pay funding at each interval. Treatment varies: some jurisdictions treat it as ordinary income / expense at each tick; others bundle it into the position’s P&L. The system needs to record every funding event regardless — treatment decisions belong to the tax preparer.

Fees and Stablecoin Transfers

  • Trading fees are typically deductible against gains as a cost of acquisition or disposal — track them per fill, not just as a monthly aggregate.
  • Stablecoin transfers between exchanges are usually not a disposal — same asset, different wallet. But timestamps, quantities, and on-chain hashes must reconcile, or the transfer can be flagged as an unaccounted disposal-and-acquisition pair.
  • Stablecoin-to-stablecoin swaps (USDT to USDC) are usually disposals, even though the value is stable, because they are different assets in tax terms.

You Understand This When…

  • You know your jurisdiction’s treatment of spot vs perp vs dated future
  • You understand whether perps are mark-to-market in your jurisdiction and what that means for your January cashflow
  • You record fees per fill and funding events per tick, not just net month-end totals
  • You can articulate which transfers are taxable disposals and which are not

12.4 Records You Need to Keep

The good news: the system you built already records everything a tax authority will ask for. The work is making sure those records are complete, immutable, and exportable in a form your tax preparer can actually use.

The Minimum Record Set

  • Every fill. Timestamp (UTC, to the second or better), symbol, side, quantity, price, fee, fee asset, the venue’s fill_id, your clientOrderId, the strategy that owned it.
  • Every funding payment. Timestamp, symbol, position direction, quantity, funding rate, payment in stablecoin (or coin, for inverse contracts).
  • Every deposit and withdrawal. Timestamp, asset, quantity, source/destination address, on-chain transaction hash where applicable, network used. The on-chain hash is what reconciles your records to the blockchain if the venue or your records ever come into question.
  • Every fiat ramp transaction. Bank transfer in or out, date, amount, currency, crypto purchased or sold, the rate used, the fees charged. Banks ask for this; tax authorities ask for this; you cannot reconstruct it after the fact.
  • Every transfer between your own wallets/venues. Same data as a deposit/withdrawal. These should not be taxable disposals, but they have to be visible in the audit trail or they look like undocumented disposals.

Immutability and Backups

Tax records are an append-only log. Once written, never edited — if a fill needs correcting, write a correcting entry, don’t mutate the original. The same backup discipline from Module 9.6 applies: continuous WAL streaming plus periodic encrypted snapshots, with at least one offsite copy. Tax records are also subject to retention requirements — many jurisdictions require 5–7 years of immutable history; some longer.

Don’t Trust the Venue’s Tax-Export Tool

Most venues offer a “tax export” button. Use it as a sanity check, never as your primary record. The reasons are non-negotiable: venues change export formats year-to-year (sometimes mid-year); venues lose history after retention windows expire; venues delist symbols and the data goes with them; venues have been known to fail entirely, taking their export tool with them. Your records must be venue-independent. Keep them in a vendor-neutral format you control — CSV, Parquet, your own database — and treat the venue’s export as a cross-check, not a source of truth.

You’re Done When…

  • Your fills, funding payments, deposits, withdrawals, and fiat-ramp transactions are all captured at the row level with full metadata
  • The records are append-only and backed up per Module 9.6 discipline
  • You can produce a year of records without depending on any venue’s export tool
  • You retain records for at least your jurisdiction’s required period

12.5 Practical Setup

The practical implementation is small: a tax-export view that joins the row-level records you already have, an annual export procedure, and a reconciliation gate that catches mismatches before your accountant does.

The tax_export View

Build a database view (or materialised table refreshed nightly) that joins fills, funding payments, deposits, withdrawals, and fiat ramps into one chronological event stream. Columns at minimum:

CODE · STRUCTURE
timestamp_utc | event_type | venue | symbol | side | qty | price |
fee | fee_asset | counterparty_id | tx_hash | strategy | notes

Where event_type is one of fill, funding, deposit, withdrawal, fiat_in, fiat_out, transfer. The output is one append-only stream, sortable by timestamp, with every event in your trading history. From this view, any required tax report — capital gains schedule, income summary, fee deduction list — is a query.

Annual Exports

  • CSV is the lingua franca. Whatever else you do, the year-end deliverable is a CSV your accountant can open in their tooling. One file per event_type, or one master file with the type column — preferences vary.
  • Parquet for archive. CSV is fine for handoff but loses type fidelity at scale. Keep a Parquet copy of every annual export; it preserves dtypes, compresses well, and is portable across tools.
  • Generate the export deterministically. The same query against the same data must produce the same output every time. If your accountant comes back with a question in March, you should be able to regenerate the exact file you sent them in January.
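
A sketch of the deterministic export, assuming the tax_export view above is reachable through a DB-API connection (the table and column names follow the structure block; everything else is illustrative):

CODE · PYTHON
import csv
import sqlite3

def export_year(conn, year: int, path: str):
    """Write one tax year to CSV: fixed query, fixed ordering, same file every run."""
    cols = ["timestamp_utc", "event_type", "venue", "symbol", "side", "qty",
            "price", "fee", "fee_asset", "counterparty_id", "tx_hash",
            "strategy", "notes"]
    rows = conn.execute(
        f"SELECT {', '.join(cols)} FROM tax_export "
        "WHERE timestamp_utc >= ? AND timestamp_utc < ? "
        "ORDER BY timestamp_utc, event_type, symbol",   # stable ordering => determinism
        (f"{year}-01-01", f"{year + 1}-01-01"),
    ).fetchall()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(cols)
        writer.writerows(rows)

# usage: export_year(sqlite3.connect("trading.db"), 2026, "tax_2026.csv")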

The Reconciliation Gate

Before sending the export to your accountant, run a reconciliation:

  1. Sum realised P&L from your records — total proceeds minus total cost basis from disposals, plus funding income/expense, minus fees.
  2. Compare to the venue’s reported P&L for the same period. Some discrepancy is normal (the venue may bundle fees differently); large discrepancy is a flag.
  3. Compare to your own banking — total cash in minus total cash out plus current account value should approximate (cumulative realised P&L − tax paid). It will not match exactly because of unrealised P&L, but it should not be wildly off.
  4. If the numbers don’t reconcile, you have incomplete records. Find the gap before tax season, not during it.
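
A sketch of the arithmetic behind steps 1–2 (the input shapes are assumptions; the tolerance is a judgement call, not a standard):

CODE · PYTHON
def reconcile_pnl(disposals, funding, fees, venue_pnl, tol_pct=0.02):
    """Steps 1-2: our realised P&L vs the venue's, flagged if apart by > tol_pct."""
    ours = (sum(d["proceeds"] - d["cost_basis"] for d in disposals)
            + sum(funding)      # funding income/expense, signed
            - sum(fees))
    gap = abs(ours - venue_pnl) / max(abs(venue_pnl), 1e-9)
    return ours, gap, gap > tol_pct   # a flag means a record is missing: find it now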

Key Insight

If your tax export and your accountant’s computation disagree, one of three things is wrong: a record is missing, a method is being applied inconsistently, or you have misunderstood the treatment of one event type. All three are findable by reconciliation. None of them are findable by trusting the venue’s export tool.

You’re Done When…

  • You have a tax_export view or materialised table joining all event types
  • You can produce an annual CSV export with a single command
  • You have a reconciliation procedure that catches missing records before submission
  • Your last reconciliation matched (or you can explain every line item that didn’t)

12.6 Structures

The legal structure you trade through changes your effective tax rate, your asset protection, and your administrative overhead. The right structure is jurisdiction-specific and revenue-specific; this section is the framework, not the answer.

The Common Options

  • Individual. Simplest. No setup cost. Trade in your own name; report on your personal return. Full marginal-rate exposure, which becomes painful at higher income brackets. Limited asset protection — a personal lawsuit can reach the trading account.
  • Discretionary trust. Income earned by the trust can be distributed to beneficiaries (yourself, spouse, adult children, sometimes a corporate beneficiary) at the trustee’s discretion each year. This permits income-splitting where the lower-income beneficiary pays at a lower marginal rate. Common in jurisdictions that allow it; not all do. Setup cost is non-trivial; ongoing administration (separate accounts, separate returns, trustee resolutions) requires a competent accountant. Asset protection is meaningful in many jurisdictions.
  • Trading company. A separate corporate entity that trades and pays corporate tax on its profits. The corporate rate is often materially below the top personal marginal rate, so retained profits compound at a higher effective rate. The catch: most jurisdictions require you to demonstrate you are “carrying on a business” (regular activity, profit motive, business-like records) rather than passive investing — passive investment in a company is often treated punitively. Asset protection is strong; setup and ongoing compliance costs are higher.
  • Self-managed retirement vehicle. Long-horizon strategies inside a tax-advantaged retirement wrapper (the form varies by jurisdiction — SMSF, IRA, SIPP, etc.) can compound tax-deferred or tax-free for decades. Restrictions are heavy: contribution limits, rules on what you can invest in, prohibition on personal use, age-locked withdrawals. Best suited to slow, low-turnover strategies with clear tax advantages, not to high-frequency trading.

Don’t Choose This Yourself

Each structure interacts with your jurisdiction’s rules in ways that are not obvious. The wrong choice for your situation — e.g. a company that gets recharacterised as “not carrying on a business,” or a trust that doesn’t qualify for the income-splitting benefit you set it up for — can be more expensive than no structure at all. Talk to a specialist before earning material money. The cost of the consultation is a rounding error against the cost of choosing wrong.

You Understand This When…

  • You can describe the trade-offs of individual / trust / company / retirement-vehicle structures in your jurisdiction at a high level
  • You have a written rationale for the structure you chose, decided with a specialist
  • You revisit the structure question when revenue grows by an order of magnitude, not just at startup

12.7 Module Competency Checklist

Tax discipline isn’t about being a tax expert. It’s about having the records, the method, and the relationships that make tax season a one-day clean handoff instead of a multi-week reconstruction project.

The Bar

You’re ready when, asked at any point in the year, you can:

  • Produce a year’s P&L in a vendor-neutral format your accountant can use, in under an hour.
  • Articulate your jurisdiction’s treatment of perps vs spot, including whether perps are mark-to-market and what that means for your January cashflow.
  • Point to a written cost-basis method you have applied consistently across all symbols and all years.
  • Reconcile your own records against the venue’s — not because the venue is the source of truth, but because a discrepancy is a signal.
  • Show that you do not rely on any venue’s tax-export tool as your primary record.
  • Name your tax specialist and confirm they have explicit experience with crypto derivatives in your jurisdiction.

Key Insight

Tax is the unglamorous discipline that determines whether you keep what you make. A trader who clears 50% gross but loses a third of it to disorganisation, missing records, and worst-case-default tax positions has a worse net than a trader at half the gross with clean records and a sharp specialist. Build the records on day one.

You’re Done When…

  • You meet every item on the bar above
  • Your tax preparer has reviewed at least one full year of your records and confirmed they’re complete
  • You have an annual recurring task in your calendar — not a panic in March

Module 13

Operator Psychology & Discipline
7 sections · ~1.5 hours

13.1 The Drawdown Test

The system will draw down. It is not a question of whether; it is a question of when, and how deep, and how you behave while it’s happening. The drawdown is the test the rest of the playbook is preparing you for.

What the Drawdown Reveals

Up until the drawdown, “systematic” is just a posture. You backtest, you specify rules, you write a falsification suite, you put it on a server. The system makes money for a while. You feel virtuous. You haven’t been tested yet.

The test is the drawdown. The system is doing exactly what it was specified to do. The Monte Carlo distribution from Module 5.3 said this drawdown was within the 95% confidence band. The market is not broken; the strategy is not broken; the only question is whether you stay out of the way while the system finishes the recovery curve it was built to walk.

Most people don’t. They override. They “just close this one position because it doesn’t look right.” They “reduce exposure until things calm down.” They “temporarily turn off the strategy and watch.” Each of these is the moment the operator stopped being systematic. The drawdown didn’t cost them money — it was already costing them money on paper, and would have recovered. The override cost them the system itself.

War Story (Composite, Illustrative)

An operator runs a validated trend-following strategy through a deep drawdown — the system is at the worst point of its expected MC distribution but inside the band. At the trough, the operator manually closes the largest position because “it just feels wrong.” Two days later the position would have reversed and the system would have closed it for a small loss instead of a large one. The operator misses the recovery, sits in cash, and within a month is paper-trading their old strategy alongside three new ones “just to compare.” They never go fully systematic again.

The lesson is not “you missed gains.” The gains are recoverable. The lesson is that the operator now knows about themselves that under stress they will override the system — which means they don’t have a system, they have a tool they pick up and put down based on how they feel. The cost of breaking discipline once is the precedent that follows.

The Bar

You either trust your validated system more than your in-the-moment gut, or you don’t. There is no honourable middle position; the “hybrid” trap (Module 1.1) is the same trap dressed up. The drawdown is where you find out which side you’re actually on.

If the answer is “I don’t trust it,” the right move is not to override during a drawdown. The right move is to kill the strategy when it’s not in drawdown, with a clear head, and rebuild whatever was missing in your validation. Override-during-drawdown is the worst-case version of every decision; it is made under stress, with the loudest emotional input and the least information.

You Understand This When…

  • You can articulate why a validated strategy in drawdown is not a broken strategy
  • You understand that overriding during a drawdown is a decision about yourself, not the trade
  • You accept that the cost of breaking discipline once is the precedent it sets

13.2 The Three Override Modes

Not every manual intervention is illegitimate. Some are necessary; some are catastrophic. The skill is being able to tell, in the moment, which one you’re about to do.

The Three Modes

  • Engineering override. You discovered a real bug. The system is doing something it was not specified to do — a stop-loss isn’t firing, an indicator is computed on stale data, a strategy is duplicating positions. You halt to investigate and fix. This is legitimate, and the work is to define ahead of time what counts as “a real bug.” The lazy version — “the system did something I didn’t expect” — reclassifies any unwelcome valid behaviour as a bug. The disciplined version requires an objective discrepancy between specification and behaviour, demonstrated in code or logs, before you halt.
  • Risk override. Exposure has materially exceeded your risk budget — usually because correlated positions moved together more than the model assumed, or because a venue produced an unexpected fill. You cap exposure to bring it back inside the budget. This is legitimate only if the rule is pre-declared (“if portfolio gross exposure exceeds X% of equity, halve the largest position”) and applied mechanically. A “risk override” that’s improvised on the day is just an emotional override wearing a risk-management costume.
  • Emotional override. You don’t like the P&L. The position scares you. The drawdown is uncomfortable. The strategy hasn’t fired in weeks and you’re bored. None of these are legitimate reasons to touch the system. The whole point of building a systematic operation is to insulate execution from the operator’s in-the-moment emotional state — if the operator’s emotional state can override the system, the system isn’t systematic.

The Override Log

The discipline that makes this real: every manual override goes in a log, with a written reason, a timestamp, and a classification. Three columns plus a free-text reason field:

CODE · STRUCTURE
timestamp_utc     | classification     | action                  | reason
2026-04-12T14:33Z | Engineering        | halted strategy_X       | stop-loss didn't fire on fill at 14:31; logs show ...
2026-04-15T09:01Z | Risk               | reduced position to 50% | gross exposure 142% of budget after correlated fills
2026-04-22T22:18Z | Emotional (logged) | none taken              | wanted to close but recognised this is emotional

The third row is the most important kind of entry. Logging an emotional impulse you did not act on trains the muscle of recognising the impulse without acting. Over time, the proportion of (Emotional, action_taken) entries should fall to zero; the proportion of (Emotional, none_taken) entries should rise and then fall as the impulse itself fades.

The test of legitimacy: can you write a non-emotional reason? If the “reason” field reduces to “I don’t like how this looks,” the override is emotional. Don’t take the action.

Key Insight

The log is not bureaucracy — it is a structured pause. The act of writing “Engineering / Risk / Emotional” before the action forces the question. Three out of four times you reach for the keyboard in distress, the writing exercise itself reveals you don’t have a non-emotional case, and the override doesn’t happen.

You Understand This When…

  • You can name and define Engineering / Risk / Emotional override modes from memory
  • Every manual touch of the live system is logged with classification and reason
  • You have caught and logged at least one emotional impulse that you did not act on
  • You can define ahead of time what counts as “a real bug” rather than “unwelcome behaviour”

13.3 The Drawdown Protocol

A drawdown protocol is a contract you sign with your past self in writing, when your head is clear, that your future self — in distress, in the middle of a drawdown — agrees to honour. Pre-commitment is not a nice-to-have; it’s the only mechanism that beats in-the-moment emotional override.

The Three Pre-Committed Thresholds

Before going live, write down three numbers. Quantitative, specific, signed and dated. They will not be perfect — they don’t need to be. They need to exist:

  1. The investigation threshold. “If the strategy reaches drawdown level X, I halt new entries, leave existing positions to run their stops, and conduct a structured investigation: is the strategy operating inside its MC distribution? Is the regime materially different from the validation period? Has any input data changed?” The threshold should be calibrated to your strategy’s expected drawdown profile — deep enough that you don’t investigate every wiggle, shallow enough that you investigate before disaster.
  2. The kill threshold. “If the strategy reaches drawdown level Y, I close all positions, halt the strategy, and do not resume it without rebuilding the validation case from scratch.” This is the “the strategy is dead” threshold. Below it, you assume something fundamental has changed and the prior validation no longer applies. Y is materially deeper than X.
  3. The size-stays-fixed threshold. “If the strategy is performing above expectation by margin Z, I do not increase size on the basis of recent strength.” This is the under-discussed threshold. Most operators don’t need rules to prevent them from cutting a winner; they need rules to prevent them from upsizing a winner that’s fired five times in a row. Recency-bias upsizing is how a 1% strategy becomes a 5% strategy at the worst possible moment.
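
One way to make the contract mechanical is to encode the three numbers as configuration the system itself checks (the field names and values here are illustrative; yours come from your own MC distribution):

CODE · PYTHON
from dataclasses import dataclass

@dataclass(frozen=True)             # frozen: the contract is not editable at runtime
class DrawdownProtocol:
    investigate_dd: float = -0.12   # X: halt new entries, run the structured investigation
    kill_dd: float = -0.22          # Y: close all positions, rebuild validation from scratch
    no_upsize_gain: float = 0.15    # Z: above this, size stays fixed regardless of streak

    def action(self, dd_from_high: float) -> str:
        if dd_from_high <= self.kill_dd:
            return "KILL"
        if dd_from_high <= self.investigate_dd:
            return "INVESTIGATE"
        return "NORMAL"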

What “Pre-Specified” Actually Means

The thresholds must be quantitative, written, dated, and committed to in advance. Two failure modes to avoid:

  • Inventing the threshold during the drawdown. “I think 25% is my kill point” said at 23% drawdown is not a kill threshold; it’s a rationalisation gathering momentum. By 26% it will be 30%, and by 30% it will be 40%. Numbers chosen under stress drift.
  • Vague thresholds. “I’ll halt if things get bad” is not a threshold. “I’ll halt at -22% on the strategy’s own equity curve, measured against the all-time high of that strategy’s equity” is a threshold. Vagueness is where willpower goes to die.

Honouring the Contract

The protocol works on one principle: honour the contract with your past self even when your present self disagrees with it. Your past self chose those numbers in conditions of clear-headed analysis; your present self is in a drawdown. The past self has better epistemic access to the truth of the strategy than the present self does. Trust the past self’s numbers, not the present self’s gut.

If you find yourself wanting to renegotiate the contract during the drawdown, that wanting is itself diagnostic information — it tells you the protocol is doing exactly what it was built to do, which is sit between you and the worst version of your judgement. Honour it; renegotiate later, in writing, in non-stress conditions.

You’re Done When…

  • You have written, dated, and signed three quantitative drawdown thresholds before going live
  • The thresholds are visible somewhere you can’t pretend to forget — pinned in the dashboard, in the runbook, on a printout next to the screen
  • You have rehearsed the response action for each threshold (e.g. “run this script”) so it’s mechanical, not improvisational
  • You have not renegotiated the thresholds during a drawdown

13.4 Information Hygiene During Live Operation

Your system runs on its own clock. Your attention does not. The operator who watches the P&L tick by tick is not getting more information — they’re getting more emotional load on the same information. Hygiene matters.

Don’t Watch the Tick

Watching live P&L move minute by minute is corrosive even when the P&L is going up. The brain treats the equity curve as feedback — up feels good, down feels bad — and that feedback loop quietly shifts your relationship with the system away from “trust the validated process” and toward “feel the line.” The drawdown then triggers an emotional response disproportionate to its statistical significance, because you’ve been emotionally invested in every wiggle for weeks.

The discipline: read the dashboard once per day, at a fixed time, for a fixed duration. Pick a time that’s outside any major venue rollover or funding tick — mid-morning local works for most. Five to fifteen minutes. Look at the metrics that matter (current positions, current P&L, recent trades, watchdog status, any pending alerts), confirm the system is healthy, close the dashboard. Don’t graze.

Mute Alerts That Aren’t Action-Required

Module 9.5 covers the mechanics of alert routing; this is the operator-side complement. Every alert that fires without requiring an action is teaching your nervous system to ignore alerts in general. By the time the genuinely urgent alert fires — reconciliation hard-fail, drawdown threshold breach, exchange API blackout — you’ve already trained yourself to swipe it away with the rest.

The rule: every alert reaching your phone should require an action you would actually take in the next 15 minutes. Trade entries, daily P&L, “rate-limit recovered” — these belong in a dashboard or a daily-digest email, not on the lock screen.

Social Inputs During Drawdowns

Discretionary traders’ opinions on your validated systematic strategy, especially during a drawdown, are not signal. They are a hostile influence on your decision-making, even if the discretionary trader is a friend. Their emotional state is not yours, their information set is not yours, and their incentives during your drawdown are unaligned (they often want company in their pessimism).

Concrete moves: during drawdowns, mute or unfollow the noisy chat groups. Don’t doom-scroll instrument-specific Twitter. Don’t read Reddit threads about your strategy’s underlying instrument. The validated system is the system; its inputs are price, your indicators, and your gates — not other people’s panic.

Key Insight

Information hygiene is not stoicism cosplay; it is risk management. Every input you let into your decision loop during a drawdown is an input that can override the validated system. The validated system survives more than the operator’s real-time emotional state survives. Limit the inputs and the operator survives the drawdown alongside the system.

You Understand This When…

  • You read the dashboard at a fixed cadence, not on demand
  • Every alert reaching your phone requires an action you would actually take
  • You have a defined social-input protocol for drawdowns (mute lists, no instrument Twitter, etc.)
  • You can articulate why watching the tick is corrosive even in profitable periods

13.5 Incident Response When Emotional

The system has a bug. Money is on the line. You are stressed. This is the worst possible state in which to ship a fix — and it is exactly the state in which most fixes get shipped. Discipline is what stands between you and a worse bug than the one you started with.

The Checklist Beats the Hero

The temptation in an incident is to skip steps because you “know what’s wrong.” You don’t. You have a hypothesis. The hypothesis is contaminated by stress, by recency bias, by the symptom that’s loudest, and by the fix you would emotionally prefer to be the answer. Three out of four times you push a fix in this state, the symptom changes but the bug remains, and now there’s a second bug stacked on top.

The discipline is mechanical:

  1. Read the runbook. Module 9.5 specifies that every alert has a four-line runbook entry. Read it. The runbook was written when you were calm; your present self is not. Trust the runbook over the impulse.
  2. Verify the symptom. Reproduce it in logs. Confirm the bug is what you think it is, not just what you fear it is. Surprisingly often the “bug” is a misread of normal behaviour under unusual market conditions.
  3. Halt the system. If the bug is real and material, halt new entries before investigating. A halted system that’s safe is preferable to a running system you don’t trust.
  4. Investigate. Read the code. Read the data. Reproduce the bug in a non-production environment if possible.
  5. Fix. Write the fix. Tests for the fix. Run the tests.
  6. Deploy with blue/green. Module 9.6 covers the mechanics. Push the fix to the green instance; observe; only then promote.
  7. Document the incident. Write the post-mortem while it’s fresh: what happened, when, what you did, what you should have done, what changes to prevent it. The post-mortem is the longest-term ROI of the incident.

The Pair Voice

Stressed engineers ship typos. The single highest-leverage anti-pattern intervention is to not work alone during incidents. The pair partner does not need to be an expert — they need to be a second voice that asks “wait, what does that line do?” before you push it. A competent AI assistant in pair-programming mode counts; a human friend on a video call counts; the rule is just “not alone with the keyboard at 2am, scared.”

The pair’s job is to slow you down. They will catch the missing semicolon, the wrong sign on a comparison, the off-by-one in the time window, and the deployment-to-prod-instead-of-staging that you would otherwise make. Their cost is their attention; the saving is the bug they prevent.

The Anti-Pattern

“I know what’s wrong, let me push a quick fix.” This sentence, spoken at 3am during a P&L event, is responsible for more compounded losses in retail systematic trading than any single market move. The fix is rarely as quick as it sounds. The push is rarely as safe as it feels. If you find yourself saying it, the correct response is to halt the system, walk away from the keyboard for ten minutes, and then come back to the runbook.

You Understand This When…

  • Your incident response is checklist-driven, not hero-driven
  • Every alert has a runbook entry and you actually read it before acting
  • You do not push fixes alone during incidents — pair-with-AI or pair-with-human is the rule
  • You write a post-mortem after every material incident, even ones that resolved cleanly

13.6 Burnout

Running a 24/7 system you have to maintain is corrosive over time. The cost is not the work in any one week; it’s the cumulative absence of an off-switch. Burnout in systematic trading is not a personal failing — it’s a system architecture failure, and it has system architecture solutions.

If You Can’t Walk Away, You Don’t Have a System

The blunt test: can you go on holiday for two weeks, leave the system running, and not check it? If the answer is no — if you have to dial in daily, if certain manual interventions only you can do, if there are alerts that only your judgement can resolve — you don’t have a systematic operation. You have a job. The job pays you, but it owns you.

The remedy is automation, not willpower. Anything you find yourself doing manually on a schedule should be automated. Anything that requires your judgement to resolve should either become a rule (and therefore automated) or be acknowledged as an unsystematic dependency that limits the strategy’s viability long-term. The system runs without you; the only role you should be filling on a daily basis is “person who reads the dashboard once and confirms it’s healthy.”

Schedule Deliberate Downtime

Running a system continuously for years requires deliberate periods where you are not the operator. Not just “I’m not at the keyboard” — properly off. Phone notifications muted (except for true Critical-tier alerts), dashboard not opened, mental model of the system not engaged. A weekly half-day, a monthly weekend, a quarterly week.

Build the system to make this safe: redundant alerting that reaches a designated backup contact (a paid monitoring service, a second watchdog running alongside the system, anything that catches a true emergency without you), and a hard-coded rule that during your scheduled off-time the system reduces position size or halts new entries. The two are complementary — the system trades through your weekend at reduced risk; the watchdog is loud enough that a true emergency reaches you anyway; routine alerts don’t.
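
A sketch of what the hard-coded off-time rule could look like, assuming the configuration-driven style of Module 8.4. The dataclass, dates, and field names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class OffTimeWindow:
    start: date
    end: date
    size_multiplier: float  # 0.0 halts new entries entirely

# Illustrative: downtime is declared ahead of time, in config.
OFF_TIME = [
    OffTimeWindow(date(2025, 8, 4), date(2025, 8, 17), size_multiplier=0.5),
]

def size_multiplier_for(today: date) -> float:
    for window in OFF_TIME:
        if window.start <= today <= window.end:
            return window.size_multiplier
    return 1.0

def sized_quantity(base_qty: float, today: date) -> float:
    # Sizing applies the multiplier unconditionally, so the risk
    # reduction fires whether or not the operator remembers to be off.
    return base_qty * size_multiplier_for(today)
```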

Anxiety as a Sizing Signal

If you’re too anxious to sleep, the system is oversized for your psychological capital, even if it’s correctly sized for your financial capital. The fix is not “tough it out” — the fix is to reduce capital until you can sleep. Sleep-deprived operators make worse decisions during incidents, are more prone to emotional override, and burn out faster. The Sharpe of a strategy run by a rested operator is higher than the same strategy run by an exhausted one, holding everything else constant.

The ratchet works in both directions: as you live with the system through drawdowns, your psychological capital grows, and you can size up. But size up because you’re sleeping fine and have been for six months, not because the recent equity curve is flattering.

Key Insight

The systematic trader’s long-term P&L is bounded above by the number of years they can keep running the system. A strategy with a 30% CAGR run for two years before burnout produces less wealth than a 15% CAGR strategy run for fifteen years. Architect for endurance from the start. The less the operation depends on your daily presence, the longer the compounding window.
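
The arithmetic bears this out: 30% compounded for two years multiplies capital by 1.30² ≈ 1.69, while 15% compounded for fifteen years multiplies it by 1.15¹⁵ ≈ 8.1, nearly five times as much.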

You Understand This When…

  • You can leave the system unattended for at least a week without any manual intervention
  • You have scheduled downtime in your calendar and you take it
  • You have correctly diagnosed an anxiety spike as a sizing signal at least once
  • You have a backup contact / monitoring layer that catches true emergencies during your off-time

13.7 Operator Competency Markers

Psychological discipline isn’t a vibe; it’s a checklist. The markers below are concrete, observable, and either present or not. If they’re not, the system isn’t safe yet — not because the code is wrong, but because the operator is.

The Markers

  • Written drawdown protocol thresholds. Three numbers, signed and dated, visible in your runbook.
  • Override log with classification. Every manual touch tagged Engineering / Risk / Emotional, with a written reason. (A minimal logging sketch follows this list.)
  • Hands-off endurance. You can leave the system unattended for a week without itching to intervene.
  • Fixed dashboard cadence. Reading the dashboard at a defined time and duration, not on demand and not in response to emotional spikes.
  • System-trust hierarchy. You trust your validated system more than your in-the-moment gut, and you have a clear, pre-specified rule for when the gut is allowed to override (Engineering or Risk only, with documented reason).
  • Pair-during-incidents discipline. You do not push live fixes alone under stress.
  • Sleep test. You sleep through drawdowns. If you don’t, you’ve already identified the next adjustment to make.
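
One way the override log can be implemented: a minimal sketch using an append-only JSON-lines file. The field names and default path are illustrative.

```python
import json
from datetime import datetime, timezone

VALID_CLASSES = {"engineering", "risk", "emotional"}

def log_override(classification: str, reason: str,
                 path: str = "override_log.jsonl") -> None:
    """Append one manual-intervention record. Refusing to write
    without a classification is the point: 'emotional' has to be
    typed out, in full, before the override is on the record."""
    if classification not in VALID_CLASSES:
        raise ValueError(f"classification must be one of {VALID_CLASSES}")
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "class": classification,
        "reason": reason,
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")
```

Usage might look like log_override("risk", "exchange announced unscheduled maintenance; flattened early"). An append-only file keeps the history honest; nothing gets edited after the fact.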

The Final Bar

The systematic trader who has built every other module in this playbook but skipped this one will, statistically, blow up. Not because the code was bad, but because the operator overrode the code. The discipline modules are the cheapest insurance you can buy; the cost is doing the writing exercises now, when nothing is on fire, instead of trying to do them in the middle of a 25% drawdown when nothing else feels stable.

You’re Done When…

  • You meet every marker on the list above
  • You have lived through at least one material drawdown without overriding the system
  • You can describe, from your own log, an emotional impulse you recognised and did not act on
  • You no longer think of these markers as “extra discipline” — they’re part of the system

You now have the complete methodology
for building an automated trading system.

What You’ve Learned

  • What an edge is and how to verify you have one (Module 0)
  • The mindset shift from discretionary to systematic (Module 1)
  • How to set up exchange accounts safely with proper risk architecture (Module 2)
  • How to build a clean data pipeline (Module 3)
  • How to develop strategies from hypothesis to testable signal (Module 4)
  • How to backtest rigorously with Monte Carlo, walk-forward, and sensitivity analysis (Module 5)
  • How to try to kill your strategy before it kills your account (Module 6)
  • How to size positions using leverage as capital efficiency, not amplification (Module 7)
  • How to architect and build the actual system (Module 8)
  • How to deploy, monitor, and operate it 24/7 (Module 9)
  • How to detect market regimes and adapt (Module 10)
  • How to continuously discover new edges (Module 11)
  • How to keep the records, structures, and reconciliation that make tax season clean (Module 12)
  • How to operate the system without overriding it — the discipline that determines whether you survive long enough to compound (Module 13)

This is not a signal service. This is not copy-trading.

This is the methodology for building your own system, validated against real data, hardened through adversarial testing, and deployed on infrastructure you control. The strategies in this playbook are examples of the process. The process itself is what you take away.

Every war story, every diagram, every falsification test came from building and operating a real system with real money. The expensive mistakes have already been made. This playbook is how you avoid making them again.

Disclaimer

General educational information only. The author is not a licensed financial advisor. Nothing in this material constitutes personal financial advice or a recommendation to trade. Past performance does not predict future results. Crypto trading carries substantial risk of total loss. Consider seeking advice from a licensed advisor in your jurisdiction before making any financial decisions.