Not signals. Not copy-trading. Not “trust me bro.”
The methodology behind a system that actually runs.
14 modules. One complete trading system.
What this is NOT
This is not a signal service. We do not publish trades you can copy.
This is not financial advice. Nothing in this playbook is a recommendation to buy, sell, or hold any asset, or to adopt any specific strategy.
This is not a guarantee. Past performance does not predict future results. Most systematic strategies fail. Most edges decay. Building a system that makes money is hard, and building one that survives multiple regimes is much harder.
This is not safe. Crypto trading carries substantial risk of total loss of capital. Leverage amplifies that risk. Decide what you can afford to lose before you start, then never deploy more than that.
This is the methodology. The process for building, testing, and operating your own automated trading system. The strategies described are examples of the process; the process is what matters.
Before you write a single line of code or open a single exchange account, you need to understand the only thing that matters in trading: whether you have an edge. Everything else — the servers, the algorithms, the dashboards — is infrastructure for exploiting an edge. Without one, you are building a very expensive random number generator.
A casino doesn’t win every hand. It wins 51% of them. Over thousands of hands, that 1% compounds into a fortune. That’s an edge: a small, repeatable, statistically verified advantage that manifests over many occurrences.
In trading, an edge is the same thing. It is not a hot tip. It is not a pattern you saw once on a chart. It is a measurable, repeatable tendency in price behaviour that persists across hundreds of trades, survives transaction costs, and holds up when you try to destroy it with statistical testing.
Most retail traders do not have an edge. They have opinions. Opinions do not compound.
Here is the shape of a real edge, drawn from a category of system we operate in production. The numbers are deliberately qualitative — what matters is the profile, not any single point estimate:
| Metric | Value | What It Means |
|---|---|---|
| Strategy | A long-only weekly trend-following system | Goes long when a slow moving-average derivative turns positive, gated by a close-position filter and an efficiency-ratio gate |
| Win Rate | Below 50% | Loses more trades than it wins |
| Profit Factor | High single digits | Winners are several times larger than losers |
| CAGR | Strongly positive over a multi-year window | Annualised return over the lookback |
| Max Drawdown | Contained well under 25% | Worst peak-to-trough decline |
| Trades per Year | Single-digit annual frequency | Extremely low frequency |
Notice: it loses more than half its trades. A beginner would look at that and say the strategy is broken. But the winners are so much larger than the losers that the overall expectancy is strongly positive. This is typical of trend-following systems.
Key Insight
An edge is not about being right most of the time. It’s about the ratio of what you make when you’re right versus what you lose when you’re wrong, multiplied across hundreds of occurrences. A 40% win rate with 3:1 reward-to-risk is more profitable than an 80% win rate with 1:4 reward-to-risk.
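Here is that arithmetic as a sanity check: a minimal sketch computing expectancy per unit risked for the two profiles above.

```python
def expectancy(win_rate: float, reward: float, risk: float) -> float:
    """Expected profit per unit risked, per trade."""
    return win_rate * reward - (1 - win_rate) * risk

print(expectancy(0.40, 3, 1))  # +0.60 per unit risked: a strong edge
print(expectancy(0.80, 1, 4))  #  0.00: the 80% winner makes nothing
```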
You don’t guess. You test. Rigorously. This playbook will teach you how to state each idea precisely, backtest it with realistic costs, and attack it with the full falsification battery: parameter shifts, out-of-sample holdouts, regime splits, cross-venue transfer, placebo baselines, and time-stability checks.
If it survives all of that, you might have an edge. If it fails any test, you don’t — and that just saved you real money.
Most people think of a market as a chart going up and down. That’s like thinking of the ocean as a line on a depth gauge. The chart is the output. Understanding the machinery underneath it is what separates people who build profitable systems from people who draw lines on screens.
Every trade happens because two people disagree. One thinks the price is going up and buys. The other thinks it’s going down and sells. The mechanism that matches them is called the order book.
Think of it as two queues facing each other: bids (buy orders) stacked from the highest price downward, and asks (sell orders) stacked from the lowest price upward.
When you hit Market Buy, your order eats through the ask side from the best price upward. The more size you push, the higher the average fill price climbs — this is slippage.
Simplified order book. The spread is the gap between the best bid and ask. Slippage is the price impact of your order eating through multiple levels.
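To make slippage concrete, here is a minimal sketch of a market buy walking a toy ask ladder (prices and sizes are invented for illustration):

```python
# Toy ask ladder: (price, size) from best ask upward.
asks = [(95_000.0, 0.5), (95_010.0, 0.4), (95_025.0, 1.0)]

def market_buy_avg_price(asks: list[tuple[float, float]], qty: float) -> float:
    """Average fill price of a market buy eating through the ask side."""
    filled, cost = 0.0, 0.0
    for price, size in asks:
        take = min(size, qty - filled)
        filled += take
        cost += take * price
        if filled >= qty:
            break
    return cost / filled

print(market_buy_avg_price(asks, 1.2))  # ~95009.6: above the 95000 best ask
```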
When you backtest a strategy, you see clean numbers: “buy at $95,080, sell at $95,500.” In reality, every trade pays exchange fees, crosses the bid/ask spread, suffers slippage on fills, and, on perpetuals, accrues funding.
A strategy that shows +2% per trade in backtesting might show +0.5% after costs — or negative. Always model costs. In this playbook, every backtest uses 25 basis points (0.25%) round-trip as a baseline.
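As a sketch of what that baseline does to a gross edge (simple subtraction; real costs also include funding and venue-specific fees):

```python
ROUND_TRIP_COST = 0.0025  # the 25 bps baseline used throughout this playbook

def net_return(gross: float, cost: float = ROUND_TRIP_COST) -> float:
    return gross - cost

print(net_return(0.0200))  # +2.00% gross -> +1.75% net
print(net_return(0.0050))  # +0.50% gross -> +0.25% net: half the edge is gone
```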
War Story
One of our strategies looked brilliant in backtesting: strong positive returns, great Sharpe ratio. When we modelled funding rates properly, the edge vanished. The strategy was holding long positions during periods of elevated perpetual premium — exactly when longs are paying. We were paying roughly 0.05% every 8 hours on the wrong side of funding, which compounds: over a multi-day hold that’s 0.15–0.3%+ in funding alone, on top of fees and slippage. The backtest without funding showed +15% per trade. With funding accrued per-interval: roughly −2%. The strategy was killed before it ever touched real money. The lesson: funding is path-dependent, not a constant — you must accrue it interval-by-interval over each actual hold.
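A minimal sketch of per-interval accrual, assuming you have the historical funding rate for each 8-hour interval of the actual hold:

```python
def funding_paid(rates: list[float], long: bool) -> float:
    """Sum funding over each interval actually held.
    Convention: longs pay when the rate is positive, shorts receive."""
    sign = 1 if long else -1
    return sum(sign * r for r in rates)

# Three days held long at an elevated +0.05% per 8h interval (9 intervals):
print(funding_paid([0.0005] * 9, long=True))  # 0.0045 -> 0.45% paid in funding
```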
Markets are not static. They behave differently under different conditions:
| Regime | Behaviour | What Works | What Fails |
|---|---|---|---|
| Bull / Trending Up | Strong directional moves, shallow pullbacks | Trend-following, momentum | Mean reversion, shorting |
| Bear / Trending Down | Sharp drops, relief rallies, low confidence | Short-selling, defensive positions | Dip-buying, averaging down |
| Chop / Range | Sideways, no clear direction, fake breakouts | Mean reversion, range strategies | Trend-following (gets whipsawed) |
| High Volatility | Large candles, wide spreads, fast moves | Wider stops, smaller positions | Tight stops (get stopped out by noise) |
| Low Volatility | Small candles, tight ranges, compression | Breakout anticipation, patience | Most active strategies (not enough movement) |
The same strategy can have a Sharpe ratio of 3.0 in one regime and -1.0 in another. Module 10 covers how to detect regimes and adjust — or sit out entirely.
Key Insight
The single most common mistake in strategy development is building a system that works in one regime and deploying it into all regimes. A trend-following strategy built on 2020–2021 bull market data will be destroyed in a choppy sideways market. You must either build regime-aware strategies or accept that some periods will be flat or negative. Module 10 covers this in depth.
A trading system is not a strategy. A strategy is one component. The system is everything — data collection, signal generation, risk management, execution, monitoring, and continuous improvement. All of it connected, all of it feeding back into itself.
Just like a business has a flywheel (traffic → leads → customers → reviews → more traffic), a trading system has one too. Every component feeds the next, and the system gets better over time.
1. Data collection: candles, funding, open interest
2. Data pipeline: clean, store, validate
3. Strategy: signal generation, gates, filters
4. Risk management: position sizing, stops, circuit breakers
5. Execution: exchange API, order handling, reconciliation
6. Monitoring: trades, PnL, fills

↻ Feedback loop: performance data feeds into research, anomalies feed into strategy refinement, and insights feed back into the top of the loop.

The trading system flywheel. Each component feeds the next. Live results feed back into research, creating a system that improves over time.
Most aspiring system builders jump straight to the strategy: “I want to build a MACD crossover bot.” They write the signal logic, backtest it, see positive numbers, connect it to an exchange, and deploy. Then it fails.
It fails because they built the engine without the chassis, the brakes, or the dashboard: no validated data pipeline feeding the signals, no risk management containing the losses, and no monitoring to notice when something breaks.
This playbook builds the entire system, in the right order. Strategy comes after data infrastructure, risk management, and architecture — not before.
War Story
Our bot once thought it was holding a short position for 6 days. It wasn’t. The entry order had failed and returned a permanent error, but the state management code didn’t roll back properly. The bot was trailing a stop on a phantom position — managing nothing. It would have continued indefinitely if we hadn’t built a reconciliation layer that checks the exchange’s actual state against the bot’s internal state every hour. Module 8 covers exactly how to build this.
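A minimal sketch of the idea, with positions reduced to symbol-to-size dicts (the real version also reconciles open orders and margin):

```python
def reconcile(believed: dict[str, float], actual: dict[str, float]) -> list[str]:
    """Compare the bot's believed positions against the exchange's reality;
    return desynced symbols (then halt trading on them until resolved)."""
    desynced = []
    for symbol in believed.keys() | actual.keys():
        if believed.get(symbol, 0.0) != actual.get(symbol, 0.0):
            print(f"DESYNC {symbol}: bot={believed.get(symbol, 0.0)} "
                  f"exchange={actual.get(symbol, 0.0)}")
            desynced.append(symbol)
    return desynced

# The phantom-short scenario from the war story: bot believes -0.5, exchange holds nothing.
print(reconcile({"BTCUSDT": -0.5}, {}))   # -> ['BTCUSDT']
```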
This is not a motivational scare tactic. It’s a diagnostic. If you understand precisely why most traders lose, you can build a system that avoids every failure mode. This section maps the failure modes so the rest of the playbook can address each one.
“BTC looks like it’s going up.” That is not a strategy. That is a feeling. Without a statistically validated edge, every trade is a coin flip minus fees. Over hundreds of trades, fees guarantee you lose. Most retail traders have never backtested a single idea. They trade on pattern recognition, gut feeling, or someone else’s signal. Module 4 solves this.
You test 200 parameter combinations and pick the one with the best returns. Congratulations — you’ve curve-fitted to historical noise. The strategy worked perfectly on data it was designed to fit and will fail on everything else. This is the single most common technical mistake. Modules 5 and 6 solve this.
A $500 account with 50x leverage and no risk management means a 2% adverse move wipes you out. The strategy could be excellent, but if one bad trade takes 100% of your equity, the strategy never gets to prove itself over hundreds of trades. Module 7 solves this.
The strategy was developed during a bull market. It worked because everything went up. Now the market is ranging sideways and the strategy is getting chopped to pieces on false signals. The trader thinks the strategy broke. It didn’t — the market changed. Module 10 solves this.
The bot says sell. You think “but it’ll come back.” You override the system. It doesn’t come back. This is why systematic beats discretionary for most people: the system doesn’t have emotions, FOMO, or ego. Module 1 covers the philosophical foundation for trusting the system.
Fees, funding rates, slippage, and spread. A strategy that trades 50 times a day and makes 0.05% per trade sounds profitable until you realise fees are 0.06% per trade. You are literally paying the exchange to lose money. Module 5 covers cost modelling.
The bot runs on a laptop. The laptop goes to sleep. The bot crashes. There’s no monitoring, no health checks, no alerts. An unmanaged position sits on the exchange for 12 hours. Module 9 covers deployment and operations.
The Good News
Every single failure mode listed above is solvable. That’s what this playbook is — a systematic solution to each one, in order. If you follow the modules sequentially and do the work, you will avoid the mistakes that destroy 95% of retail traders.
There are two ways to trade: make decisions yourself, or build a system that makes decisions for you. This section explains why this playbook is entirely about the second approach — and why the first approach fails for most people.
A discretionary trader looks at charts, reads news, considers context, and makes a decision: buy, sell, or do nothing. Every trade is a judgment call. The best discretionary traders in the world — the ones running hedge fund desks — are genuinely talented. They have spent 10,000+ hours reading order flow, developing intuition, and making real-time decisions under pressure.
You are probably not one of them. Neither am I. And that’s fine, because:
A systematic trader builds a set of rules. The rules are explicit, deterministic, and testable. “When X happens, do Y. When Z happens, do W.” The system executes those rules without deviation. The human’s job is to design and validate the rules, not to execute them.
| | Discretionary | Systematic |
|---|---|---|
| Decision maker | You, in real time | Code, based on tested rules |
| Testable | No (every decision is unique) | Yes (backtest across years) |
| Emotional influence | High (fear, greed, FOMO) | Zero (code has no feelings) |
| Scalable | No (limited by your attention) | Yes (runs 24/7 on a server) |
| Repeatable | No (you’ll make different decisions on different days) | Yes (same input = same output, every time) |
| Improvable | Slowly (requires experience) | Measurably (change a rule, re-test) |
| Skill required | Deep market intuition (rare) | Systems design + data analysis (learnable) |
Key Insight
Systematic trading converts the problem from “be a great trader” (which requires rare talent) into “be a great engineer” (which requires discipline and methodology). If you are someone who thinks in systems, processes, and rules — this is your domain.
Many people attempt a hybrid: build a system, but override it when they “feel” like the market is going to do something different. This is the worst of both worlds. You get the complexity of a system with the unreliability of discretion. The system was validated on data. Your override was validated on nothing. If you build a system, trust the system. If you don’t trust it, fix it — don’t override it.
Every system you build should have a single document that defines every constraint, every behaviour, and every authority. If it’s not in the spec, it doesn’t exist. This is the foundation of building systems that are deterministic, auditable, and reproducible.
A canonical specification is a single, authoritative document that describes your entire trading system. It contains:
The system we built has a canonical specification of approximately 4,000 lines. Every line of code traces back to a line in the spec. Nothing is assumed. Nothing is inferred.
Without a spec, you have “tribal knowledge” — the system works because the person who built it remembers what it’s supposed to do. That’s fine until memory fades, a second person touches the code, or a rewrite silently changes an assumption nobody wrote down.
War Story
Our backtest results shifted materially — double-digit-percent swing in headline metrics — when we accidentally switched from Monday-start weeks to Sunday-start weeks. Same strategy, same data, same indicators — but the moving-average values were different because the weekly close fell on a different day. This is the kind of thing that a canonical specification prevents: it locks down “weeks start on Monday, UTC, and here is the exact pandas resampling code.” No ambiguity. No drift.
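A minimal sketch of the kind of resampling rule a spec should pin down. Note the trap: the pandas anchor names the day each weekly bin ends on, not the day it starts. The DataFrame and column names here are assumptions.

```python
import pandas as pd

# Tiny illustrative frame: 14 days of daily OHLCV, UTC-indexed.
idx = pd.date_range("2024-01-01", periods=14, freq="D", tz="UTC")
df = pd.DataFrame({"open": range(14), "high": range(14),
                   "low": range(14), "close": range(14),
                   "volume": [1] * 14}, index=idx)

# 'W-SUN' -> bins end Sunday   -> Monday-start weeks (what the spec pins down)
# 'W-SAT' -> bins end Saturday -> Sunday-start weeks (the accidental variant)
agg = {"open": "first", "high": "max", "low": "min",
       "close": "last", "volume": "sum"}
print(df.resample("W-SUN").agg(agg).index)  # different bin edges...
print(df.resample("W-SAT").agg(agg).index)  # ...hence different indicator values
```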
The spec operates on a closed-world assumption: only things explicitly declared in the spec exist. If a behaviour is not specified, it is forbidden. If a parameter is not defined, it does not have a default — it is an error.
This sounds rigid. It is. That’s the point. Trading systems that “kind of work most of the time” will “kind of fail” at the worst possible moment. Rigidity in specification produces reliability in execution.
Practical Advice
You don’t need to write 4,000 lines on day one. Start with a one-page document for your first strategy: entry rule, exit rule, position size, stop-loss, data source, and timeframe. Then expand it as you discover edge cases. The spec grows with the system. The important thing is that it exists and is the single source of truth.
In this framework, data wins every argument. No matter how elegant the theory, if the backtest shows negative returns, the theory is wrong. This section establishes the epistemological foundation: we believe what the data shows, not what we think should be true.
Someone on Twitter says: “BTC tends to rally on Tuesdays.” Sounds suspect. But instead of dismissing it or believing it, you test it: group daily returns by weekday, compare the Tuesday mean against the rest, check statistical significance, and re-run the comparison conditioned on volatility regime.
The typical outcome of an investigation like this is: any apparent effect is small, often disappears once you condition on volatility regime, and is not tradeable standalone. The theory isn’t crazy — it just usually isn’t strong enough. You only know by testing.
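A sketch of that test in pandas, using a synthetic price series as a stand-in for your real daily closes (requires scipy for the t-test):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic stand-in for a real daily close series, UTC-indexed.
idx = pd.date_range("2020-01-01", "2024-12-31", freq="D", tz="UTC")
rng = np.random.default_rng(0)
daily = pd.DataFrame({"close": 20_000 * np.exp(np.cumsum(
    rng.normal(0.0005, 0.03, len(idx))))}, index=idx)

ret = daily["close"].pct_change().dropna()
tue = ret[ret.index.dayofweek == 1]    # pandas: Monday=0, Tuesday=1
rest = ret[ret.index.dayofweek != 1]

print(tue.mean(), rest.mean())
# Welch's t-test: can we distinguish the Tuesday mean from the rest?
print(stats.ttest_ind(tue, rest, equal_var=False))
```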
Key Insight
The correct response to any trading claim is not “that’s stupid” or “that makes sense.” The correct response is: “show me the backtest.” Every claim is a hypothesis. Every hypothesis can be tested. This playbook teaches you how to test any claim against real data in hours, not weeks.
Every time you hear a trading idea — from Twitter, a podcast, a friend, or your own intuition — run it through this loop: state it as a precise hypothesis, code it, backtest it with costs, try to falsify it, and only then decide whether it earns capital.
Module 4 walks through this in full detail with real examples.
The most dangerous moment in system development is when the backtest shows positive results. That is when most people stop testing and start deploying. That is also when the real work begins: trying to destroy your own strategy.
Confirmation bias is the tendency to find evidence that supports what you already believe. In trading system development, it manifests like this: you run one backtest, see positive numbers, and start hunting for reasons to deploy instead of reasons to doubt.
The adversarial mindset flips this: your default assumption is that the strategy does NOT work. Your job is to try to prove it does, and if you can’t break it after six different types of attack, then — tentatively — it might be real.
Module 6 covers these in full detail, but here is what you are going to throw at every strategy:
| # | Attack | What It Tests |
|---|---|---|
| 1 | Parameter Robustness | Move every parameter by ±10-20%. Does the edge survive? |
| 2 | Out-of-Sample Holdout | Test on data the strategy has never seen |
| 3 | Regime Stability | Does it work in bull, bear, AND chop markets? |
| 4 | Cross-Venue Transfer | Does it work on a different exchange’s data? |
| 5 | Placebo / Random Baseline | Is it better than random entry at the same frequency? |
| 6 | Time Stability | Does it work in the first half AND the second half of the data? |
If a strategy fails any of these, it goes back to the lab or gets killed. There is no “well, it mostly works.”
War Story
We spent two weeks investigating whether positioning footprints could produce a tradeable edge. After multiple phases of analysis, hundreds of trades across roughly a dozen candidate signals, and six falsification tests on each: one signal survived — a derivatives-driven contrarian setup, hit rate above 60% and profit factor above 1.5. But its p-value sat above the 0.05 line — not statistically significant. The honest verdict: “direction of evidence is positive but not strong enough to trade standalone.” We could have ignored the stats and deployed it. Instead, we shelved it as an overlay filter and moved on. That discipline is what keeps you alive.
Your exchange is the foundation of your entire operation. It holds your money, executes your trades, and provides your data. Choosing the wrong one can cost you everything — not from bad trades, but from the exchange itself. This section covers how to evaluate exchanges and what to watch out for.
| Criterion | Why It Matters | How to Check |
|---|---|---|
| Security track record | Has the exchange been hacked? How did they respond? | Search “[exchange name] hack” — look for hot wallet breaches, user fund freezes |
| API quality | Your bot communicates via API. A bad API means bad execution. | Read the API docs. Check rate limits, WebSocket support, error handling |
| Liquidity | Low liquidity = high slippage = worse fills | Check 24h volume on your trading pair. Compare bid/ask spread. |
| Fee structure | Maker vs taker fees directly impact your edge | Maker 0.01–0.02%, Taker 0.04–0.06% is competitive for crypto perps |
| Deposit/withdrawal handling | Can you get your money out? What happens with wrong-network deposits? | Test with a small amount first. Always. |
| Regulatory status | Some exchanges aren’t available in your jurisdiction | Check ToS for your country. Verify KYC requirements. |
War Story
We ran two trading containers on BingX. The September 2024 security incident — widely-reported losses in the $43–52M range [Yahoo Finance] [Bloomberg Law] [CoinDesk] [DL News] — was already on the public record when we onboarded the venue, and we accepted the residual custody risk knowing the history.
We later abandoned the venue entirely. One driver was BingX’s own published deposit-recovery policy: BingX states that it “generally does not provide token or coin recovery service” for wrong deposits to its addresses, with assistance offered only “at its sole discretion” for significant losses [BingX policy]. Combined with the post-hack handling already on the public record, that policy stance — custody-controlled funds, exchange-controlled discretion to return them — falls below the bar we set for venues holding our capital.
Compare that to venues like Binance, Bybit, and Bitget, which publish self-service or structured recovery workflows for incorrect deposits [Binance] [Bybit] [Bitget]. The fact that some venues build recovery as a default and others build non-recovery as a default is a real, observable, due-diligence dimension. We migrated to venues whose published commitments matched our standard.
The September 2024 incident itself, and BingX’s framing of a $43M+ loss as a “minor asset loss,” is illustrative. Read the public record before you onboard a venue; trust your operational experience when you’re already there.
Your exchange due diligence is part of your trading risk, not separate from it.
Critical: Test Deposits First
Before sending any meaningful amount to a new exchange, send a small test deposit ($10–50) and immediately withdraw it. Verify the round trip works. Check that deposits credit correctly and withdrawals arrive. We once sent $183 USDT to the wrong token deposit address, on an exchange that charged $200 to recover it. Yes: we paid more to recover the funds than they were worth. Always verify you are on the correct TOKEN deposit page, not just the correct network.
This section explains the three account modes available on crypto exchanges, what each one means for your risk, and when to use which. We teach them in order of complexity: spot is simplest, isolated is the next step (capped loss = posted margin only), and cross-margin is the most advanced and most dangerous (uses your entire account as collateral). Getting this wrong is how people wake up to a liquidated account.
You buy BTC with USDT. You now own BTC. If BTC goes to zero, you lose what you paid. You cannot lose more than you invested. There is no leverage, no liquidation, no margin calls. This is the simplest and safest mode. Use this for: long-term positions, conservative strategies, SMSF/retirement accounts, set-and-forget systems.
Each position has its own separate collateral. You decide how much margin to assign to each trade. If that position gets liquidated, only the assigned margin is lost — the rest of your account is untouched.
When to use: When running multiple strategies simultaneously, or when you want to contain the blast radius of any single trade. You might allocate $50 of margin to one position and $100 to another. If the first gets liquidated, you lose $50. The second position and the remaining account balance are unaffected. This is the natural next step beyond spot — you opt into leverage, but the maximum loss per trade is capped at the margin you posted.
Your entire account balance is collateral for every position. If you have $10,000 in the account and open a leveraged position, the exchange can use all $10,000 to keep your position alive. If the position moves against you far enough, your entire account gets liquidated — not just the margin allocated to that trade.
When to use: When you are running a single strategy on a dedicated account with robust risk management (stops, circuit breakers). The advantage is that your position can survive larger adverse moves without liquidation, because it has more collateral. The disadvantage is that a catastrophic failure liquidates everything — this is the most advanced mode and the most dangerous.
- Spot: no leverage, no liquidation. Simplest. Safest.
- Isolated margin: each position carries its own collateral. Blast radius is contained.
- Cross-margin: one bad trade can wipe the entire account.

The three account modes, in order of complexity. Spot is safest but offers no leverage. Isolated margin lets you control the blast radius of each position independently: max loss per trade is the margin you assigned. Cross-margin gives maximum collateral but maximum risk: a single liquidation can wipe the entire account.
Critical Decision
If you are starting out: use isolated margin. It forces you to think about how much you are willing to lose on each trade, and it prevents a single catastrophic trade from destroying your account. Move to cross-margin only when you have a battle-tested risk management layer with exchange-side stop-losses, circuit breakers, and a reconciliation system that verifies your actual exposure every hour.
This is possibly the most misunderstood concept in crypto trading. Most people hear “50x leverage” and think “50x risk.” That is one way to use it — the way that blows up accounts. There is another way, and it’s the foundation of how professional systems use leverage.
Account balance: $500. Leverage: 50x. The gambler thinks: “I can now control $25,000 worth of BTC.” They open a $25,000 position. BTC moves 2% against them. That’s $500. Their entire account is gone.
This is leverage used as amplification. It amplifies gains and losses equally. A 2% market move becomes a 100% account move. The gambler is one bad candle from zero.
Same account: $500. Same leverage setting: 50x. But the engineer uses it differently: the strategy determines the position size (say $350 of notional), an exchange-side stop caps the loss per trade, and the 50x setting means only about $7 of the $500 is locked up as margin.
The leverage setting enables the system to take precisely-sized positions with minimal margin consumption. The strategy’s built-in risk management (stop-losses, circuit breakers) ensures you are never exposed to the full notional.
Critical: Liquidation vs Stop-Loss
The naive view — “at 50x with isolated margin, a 5% adverse move only loses $17.50” — is dangerously wrong. At 50x leverage with isolated margin and a typical maintenance margin of ~0.5%, your liquidation price is roughly 1–2% away from entry. A 5% adverse move would liquidate the position long before any 5% calculation matters.
The rule: choose leverage low enough that the liquidation price sits well beyond your intended stop-loss, with a buffer for worst-case slippage and funding accrual. A 2% stop demands a liquidation price meaningfully further away — for that, you want low leverage on a small notional, not high leverage on the same notional.
Liquidation price (long, isolated, simplified):
liq_price ≈ entry × (1 - 1/L + MMR)
where L is leverage and MMR is the maintenance margin rate for your tier. At L=50, MMR=0.5% → liquidation roughly 1.5% below entry. At L=5, MMR=0.5% → roughly 19.5% below entry — which gives a 2% stop ample room.
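The formula above, as code you can interrogate (simplified: real venues use tiered MMR and fold fees and funding into the calculation):

```python
def liq_price_long(entry: float, leverage: float, mmr: float) -> float:
    """Simplified isolated-margin liquidation price for a long."""
    return entry * (1 - 1 / leverage + mmr)

def max_safe_leverage(stop_pct: float, mmr: float, buffer_pct: float) -> float:
    """Largest leverage whose liquidation sits beyond stop + buffer."""
    return 1 / (stop_pct + buffer_pct + mmr)

entry = 95_000.0
print(liq_price_long(entry, 50, 0.005))  # ~1.5% below entry: inside a 2% stop, unsafe
print(liq_price_long(entry, 5, 0.005))   # ~19.5% below entry: ample room
print(max_safe_leverage(0.02, 0.005, 0.02))  # ~22x cap for a 2% stop + 2% buffer
```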
Reframe: leverage is capital efficiency only if you have explicitly engineered liquidation safety. Otherwise it is amplification with extra steps.
Mark Price vs Last Price
Liquidations on most major venues are computed against the mark price, not the last traded price. Mark price is derived from a basket of spot prices (and/or a fair-value formula) specifically to prevent single-exchange wicks from triggering cascade liquidations. This is a feature, not a bug — it protects you from getting liquidated by a 0.5-second spike on one venue.
| The Gambler | The Engineer | |
|---|---|---|
| Account | $500 | $500 |
| Leverage | 50x | 50x |
| Position size | $25,000 (max) | $350 (strategy-determined) |
| Margin used | $500 (100% of account) | $7 (1.4% of account) |
| Stop-loss | “I’ll watch it” | 2% from entry, exchange-side |
| Max loss per trade (stop honoured) | $500 (entire account) | $7 |
| 2% adverse move | Liquidated | Stopped out at -$7 (liquidation engineered to sit well beyond the stop) |
| Drawdown after 10 consecutive losses (1% risk) | N/A — already wiped | ~9.6% drawdown ((1-0.01)^10) |
| Strategy validation | “Worked last time” | 10,000 Monte Carlo simulations |
Key Insight
Leverage is a tool for capital efficiency, not risk amplification. A 50x leverage setting does not mean you take 50x more risk. It means you can take the same position with 50x less capital locked up as margin. The risk is determined by your position size and your stop-loss, not by the leverage multiple. The leverage just determines how much collateral the exchange requires.
When a strategy has been validated with rigorous backtesting across regimes, the full falsification battery, and thousands of Monte Carlo simulations, and every position carries an exchange-side stop-loss:
…the probability of a single trade wiping out the account is effectively zero. The strategy has been designed to survive adverse moves. The leverage just means you don’t need to lock up $350 of your $500 account as margin for a $350 position. You lock up $7 instead, keeping $493 available for other strategies or as a safety buffer.
Warning
This approach ONLY works when the strategy has been rigorously validated AND has exchange-side stop-losses. “My bot has a stop-loss” is not enough. Bots crash. Servers go offline. Network connections drop. The stop-loss must be placed as an exchange-side order so that even if your bot is completely dead, the exchange will close the position at your predetermined loss level. Module 9 covers how to implement this.
Your API key is the connection between your trading bot and your money. Set it up wrong and someone else controls your account. This section covers how to create, secure, and manage API keys properly.
An API key is a pair of strings — a key (public identifier) and a secret (private password) — that allows your bot to interact with the exchange on your behalf. Some exchanges also require a passphrase as a third component.
Before creating any API key, ensure your exchange account has two-factor authentication enabled. Use an authenticator app (Google Authenticator, Authy), not SMS — SIM swapping attacks can bypass SMS 2FA.
Only enable the permissions your bot needs. For a trading bot: Read (to check positions and balances) and Trade (to place and cancel orders). Never enable Withdrawal permission on a trading API key. If the key is compromised, the attacker can trade but cannot steal your funds.
Most exchanges allow you to restrict an API key to specific IP addresses. Always do this. Set it to the IP address of the server your bot runs on. If the key leaks, it can only be used from your server’s IP.
Store your API secret in an environment file (.env) on your server, not in your code. Never commit API keys to git. Add .env to your .gitignore. If you accidentally commit a key, rotate it immediately — git history retains deleted files forever.
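A minimal loading pattern, assuming python-dotenv and environment variable names of your own choosing:

```python
# pip install python-dotenv
import os

from dotenv import load_dotenv

load_dotenv()  # reads the gitignored .env in the working directory

# Variable names are your own convention; fail fast if they are missing.
API_KEY = os.environ["EXCHANGE_API_KEY"]
API_SECRET = os.environ["EXCHANGE_API_SECRET"]
```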
War Story
We IP-whitelisted our server’s IPv4 address on the exchange. Orders kept failing with a cryptic “IP not whitelisted” error. After 3 days of debugging, we discovered the exchange’s API endpoint had inconsistent IPv6 routing — our server was sometimes resolving to and connecting over IPv6, a completely different address the exchange had never seen. The correct fix is to pin outbound traffic to IPv4 for that specific destination only — either via a route in the routing table, an /etc/hosts entry forcing the v4 record, an outbound firewall rule, or an HTTP client option. Do not globally disable IPv6 system-wide as a reflex — that degrades every other service on the box (DNS, package mirrors, monitoring) for an exchange-specific quirk. The lesson: verify which address family your server actually uses for outbound connections to each destination, and scope the fix to that destination.
Permissions, an IP whitelist, and a secret kept in a gitignored .env are the floor, not the ceiling. The following are the recurring failure modes we’ve seen across venues — the ones that fail at 03:00 with cryptic error codes and lose you a fill.
Every signed request you send is timestamped. The venue rejects requests whose timestamp is too far from its own clock — typically a window of a few hundred milliseconds to a few seconds. If your server clock drifts, you will see “invalid timestamp,” “recv window” or “request expired” errors. The fix is non-negotiable: install chrony or ntpd, point it at multiple stratum-2 sources, and verify drift stays under 500ms against your venue’s server time. Many venues expose a server-time endpoint — sample it on startup and alert if local-vs-venue offset exceeds your tolerance. Clock drift is silent: nothing tells you the box is drifting until orders start failing.
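A sketch of the startup check. The server-time endpoint URL and the response field name here are hypothetical; substitute your venue’s real ones.

```python
import time

import requests

# Hypothetical endpoint and field name; most venues expose an equivalent.
VENUE_TIME_URL = "https://api.example-exchange.com/api/v3/time"

def clock_offset_ms() -> float:
    t0 = time.time() * 1000
    server_ms = requests.get(VENUE_TIME_URL, timeout=5).json()["serverTime"]
    t1 = time.time() * 1000
    return (t0 + t1) / 2 - server_ms   # positive -> local clock runs ahead

offset = clock_offset_ms()
if abs(offset) > 500:   # the 500ms tolerance from the text
    raise RuntimeError(f"clock drift {offset:.0f}ms vs venue; fix NTP first")
```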
If your venue requires a nonce, it must be monotonically increasing across the lifetime of the API key. Most implementations derive the nonce from time.time_ns() or millisecond-epoch. This works until the clock steps backwards (NTP correction, VM migration, daylight-savings on a misconfigured box) and your next nonce is smaller than the last one the venue saw. The venue rejects every subsequent request from that key until you rotate it. Mitigation: persist the last-used nonce to disk and always emit max(persisted_nonce + 1, current_time_ns).
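That mitigation is a few lines; the sketch below persists the last-used nonce to a local file (one file per API key is assumed):

```python
import time
from pathlib import Path

NONCE_FILE = Path("last_nonce.txt")  # assumed location; one file per API key

def next_nonce() -> int:
    """Monotonic nonce that survives clock steps: max(persisted+1, now_ns)."""
    last = int(NONCE_FILE.read_text()) if NONCE_FILE.exists() else 0
    nonce = max(last + 1, time.time_ns())
    NONCE_FILE.write_text(str(nonce))
    return nonce
```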
The signed string is a recipe and the recipe is fussy. Common ways to silently produce an invalid signature: parameters serialised in a different order than they were signed, URL-encoding applied on one side but not the other, a request body included in the signature but omitted from the wire (or vice versa), a hex digest where the venue expects base64, and trailing whitespace or a newline in the secret loaded from your .env.
When debugging signature failures, log the exact pre-signature string (with secrets redacted) and compare against the venue’s reference example character-by-character.
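For reference, the generic shape of HMAC-SHA256 request signing. Field names, ordering, and encoding are venue-specific, so follow your venue’s reference example exactly; this is a sketch of the pattern, not any one venue’s scheme.

```python
import hashlib
import hmac
import time
from urllib.parse import urlencode

API_SECRET = b"replace-me"   # loaded from .env in practice, never hard-coded

def sign(params: dict) -> dict:
    """Generic HMAC-SHA256 signing of a sorted query string."""
    params = dict(params, timestamp=int(time.time() * 1000))
    query = urlencode(sorted(params.items()))   # deterministic parameter order
    sig = hmac.new(API_SECRET, query.encode(), hashlib.sha256).hexdigest()
    return {**params, "signature": sig}

print(sign({"symbol": "BTCUSDT", "side": "BUY", "quantity": "0.01"}))
```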
Every order-submission request must carry a clientOrderId that is deterministic from the underlying intent (see Module 8.3). On a network timeout you do not know whether the venue received and processed your order, so you retry. The deterministic ID lets the venue dedupe: if it already saw that ID, it returns the existing order rather than creating a second one. Without this, a retry on a 504 can give you a doubled position.
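A sketch of deriving the ID from intent, so a retry of the same intent reproduces the same ID:

```python
import hashlib

def client_order_id(strategy: str, symbol: str, signal_ts: int, side: str) -> str:
    """Same intent -> same ID, every time. A timeout-retry re-sends the
    identical ID, and the venue returns the existing order instead of
    creating a duplicate."""
    raw = f"{strategy}:{symbol}:{signal_ts}:{side}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]  # check venue length limits

print(client_order_id("weekly_trend", "BTCUSDT", 1700000000, "BUY"))
```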
HTTP 429 (or the venue-specific equivalent) is the venue telling you to slow down. Treat it with care:

- Back off exponentially with jitter: sleep min(cap, base * 2^attempt) + uniform(0, jitter). Jitter prevents a fleet of containers all retrying at the same instant after a transient outage (the “thundering herd”).
- Honour Retry-After: if the response includes a Retry-After header, use that as the floor; never retry sooner.
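A sketch of the retry loop, combining capped exponential backoff, jitter, and the Retry-After floor:

```python
import random
import time

import requests

def request_with_backoff(method: str, url: str, *, max_attempts: int = 6,
                         base: float = 0.5, cap: float = 60.0,
                         jitter: float = 0.5, **kwargs) -> requests.Response:
    """Retry 429/5xx with min(cap, base * 2^attempt) + uniform(0, jitter),
    honouring Retry-After as a floor when the venue sends one."""
    for attempt in range(max_attempts):
        resp = requests.request(method, url, **kwargs)
        if resp.status_code not in (429, 500, 502, 503, 504):
            return resp
        delay = min(cap, base * 2 ** attempt) + random.uniform(0, jitter)
        retry_after = resp.headers.get("Retry-After")
        if retry_after is not None:
            delay = max(delay, float(retry_after))   # never retry sooner
        time.sleep(delay)
    raise RuntimeError(f"{url}: still failing after {max_attempts} attempts")
```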
Getting money into and out of crypto exchanges is surprisingly frustrating. Banks block transfers, exchanges have hidden fees, and sending to the wrong address can lose your funds permanently. This section covers how to set up reliable fiat rails.
For most countries, the best approach is: deposit fiat to a local exchange with solid banking rails, buy a stablecoin via limit order on the fiat pair, then withdraw it to your trading venue over a cheap, widely supported network.
Critical: Deposit Address Traps
Exchanges have separate deposit addresses per token. If you go to “Deposit BNB” and get a BSC address, then send USDT to that address on the same BSC network — the exchange may consider those funds “lost” even though they arrived at an address the exchange controls. Policies vary: some venues auto-credit, some charge a flat recovery fee (often well into the hundreds of dollars), and some declare such transfers unrecoverable entirely. Read your specific venue’s policy before sending anything material. Always verify: correct TOKEN page, correct NETWORK, correct ADDRESS.
Reverse the process: consolidate USDT to your fiat exchange, sell via limit order on the trading pair, withdraw fiat to your bank. Key points: use limit orders to avoid paying the spread, verify with a small test withdrawal first, and expect banks to occasionally delay or question crypto-adjacent transfers.
Your trading P&L is denominated in stablecoins, your collateral is parked at venues, and both can fail. Most operators ignore this until the day it matters — at which point ignoring it costs them a quarter of their account or more. Treat the stablecoin you hold and the venue that holds it as risk exposures, not as cash.
A stablecoin is a promise. Different stablecoins make that promise via different mechanisms, and each mechanism has its own failure mode. The major fiat-pegged stables fall roughly into three buckets: fiat-custodied (reserves held as cash and short-dated treasuries at banks and custodians), crypto-over-collateralised (backed by on-chain collateral worth more than the issued supply), and algorithmic (backed by a mechanism rather than assets, the bucket with the worst failure record).
Every major stablecoin you can name has depegged at some point in its history. A 2% depeg is annoying. A 5% depeg on a portfolio that’s 70% in that stable, with positions sized off the assumption that collateral is dollar-stable, is a quarter of your year’s P&L gone in a weekend.
Diversify, but Diversify Across Mechanisms
Holding three stablecoins that all share the same failure mode (fiat-custodied, all backed by the same banks) is not diversification. If the failure is regulatory or banking-system, all three move together. Real diversification means picking stables with different reserve mechanisms: e.g. one fiat-custodied at a top issuer plus one crypto-over-collateralised. The point is that the same shock cannot take both out simultaneously.
Even if your stablecoin is sound, the venue holding it is a separate exposure. Exchange failures of the past decade share a common pattern: the failure is preceded by withdrawal slowdowns, then withdrawal pauses, then declarations that funds are “safe” while internally the venue scrambles. By the time the failure is public, withdrawing has been impossible for days.
The defensive posture, in priority order: cap the fraction of capital at any single venue, sweep profits off-venue on a schedule, monitor withdrawal latency as a leading indicator, and treat any slowdown as a signal to reduce exposure before a pause is announced.
Don’t over-engineer this. A back-of-envelope risk-score per venue, refreshed quarterly:
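A sketch with illustrative term names and weights (score each term 0 for worst to 1 for best); the discipline is writing your own down, not copying these:

```python
# Illustrative term names and weights; substitute your own.
WEIGHTS = {
    "security_history": 0.25,       # hacks, and how the venue handled them
    "custody_transparency": 0.20,   # proof-of-reserves, audits
    "withdrawal_reliability": 0.25, # observed latency, pauses, slowdowns
    "regulatory_standing": 0.15,    # licences, jurisdiction
    "recovery_policy": 0.15,        # published wrong-deposit stance (see 2.7)
}

def venue_risk_score(scores: dict[str, float]) -> float:
    """Weighted sum of 0-to-1 scores; higher is safer."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

print(venue_risk_score({"security_history": 0.6, "custody_transparency": 0.8,
                        "withdrawal_reliability": 0.9, "regulatory_standing": 0.7,
                        "recovery_policy": 1.0}))   # -> 0.79
```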
Per-venue risk score, refreshed quarterly. Each term is a discipline you can score honestly; the weighted sum is the artefact you compare against past versions of itself when concentration starts to creep up.
You will not get this exact for any venue — that’s fine. The exercise of writing the score down forces you to be specific about why you trust a venue, instead of trusting it because you’ve been there a while. When the score drops, reduce exposure before you have to.
Key Insight
The stablecoin and the venue are two separate counterparty risks layered on top of each other. A 5% stablecoin depeg while you’re fully collateralised in that stable, on a venue that’s simultaneously slowing withdrawals, is the worst-case scenario — and it has happened. Treat each layer independently: pick stables that survive different shocks, pick venues that hold a bounded fraction of your capital, and do not let success at one venue tempt you to concentrate there.
Wrong-rail deposits — sending USDT on TRC20 to an ERC20-only address, depositing an unlisted token, picking the wrong network in a withdrawal flow, forgetting a memo on an XRP send — happen constantly to active traders. The funds usually arrive at exchange-issued infrastructure and remain visible on-chain. The question is what the exchange does next. The answer is observable in their published policy before you ever onboard.
Active traders move money between venues, between networks, and between account types continuously. The mistakes are predictable: an address copied from an ERC20 deposit screen pasted into a TRC20 withdrawal flow; a token sent to a venue that doesn’t list it; a deposit fired into the wrong sub-account at the same exchange; a memo-required chain (XRP, XLM, ATOM) sent without the memo. In nearly all of these cases, the funds do not vanish. They arrive at deposit infrastructure that the exchange or its custody provider controls. They are visible on-chain. They are credited to some address inside the venue’s wallet topology, even if not to your account. What happens next is a policy decision, not a blockchain inevitability.
A venue that publishes a self-service recovery flow is treating your asset as your asset that landed in the wrong slot. A venue that publishes “generally not recoverable” is treating the same situation as your loss to absorb. Same asset, same on-chain reality, same custody footprint — two completely different ethical commitments. This is not a theoretical distinction. It is written down, in each venue’s own help-centre articles, in language that you can read in five minutes before you ever fund an account.
Most retail traders only discover their venue’s policy on this after making a wrong deposit. By then it is too late. The remedy is to read the policy first and let the answer inform where you concentrate capital.
For each venue you are considering, search their support centre for explicit policy on each of the following: deposits of a supported token sent over the wrong network, deposits of tokens the venue does not list, deposits missing a required memo or tag, and the fees and minimums attached to any recovery.
The exact answers will differ. The point is that the answers exist, are written down, and are searchable. A venue’s answers to these four questions form a tier-of-platform criterion you can apply before depositing a single dollar.
A strong-posture venue publishes some combination of: a self-service recovery flow inside the product, a documented recovery procedure with defined steps and timelines, and a transparent fee schedule for recoveries.
Examples (verified from the venues’ own published policy): Binance publishes a self-service retrieval flow plus a wrong-deposit FAQ [Binance]; Bybit documents an Unsupported-Deposit Recovery Procedure [Bybit]; Bitget offers self-service refunds for unlisted-coin and wrong-blockchain deposits [Bitget].
OKX sits a step behind these three but still publishes a recovery route — a support-ticket process plus an “untradable assets” withdrawal path for some cases [OKX].
A weak-posture venue publishes some combination of: a blanket “generally not recoverable” stance, assistance offered only at the venue’s sole discretion for “significant” losses, and warnings that wrong-network deposits may result in permanent loss.
Examples (verified from the venues’ own published policy): BingX states it generally does not provide recovery, with discretionary exceptions [BingX policy]; Swyftx warns that wrong-network deposits may result in permanent loss.
| Venue | Posture | Published Stance (paraphrased from each venue’s own help-centre) |
|---|---|---|
| Binance | Strong | Self-service retrieval flow plus wrong-deposit FAQ |
| Bybit | Strong | Documented Unsupported-Deposit Recovery Procedure |
| Bitget | Strong | Self-service refund for unlisted-coin / wrong-blockchain |
| OKX | Moderate | Support-ticket recovery plus untradable-assets withdrawal route |
| Kraken | Weak-to-moderate | “Likely non-recoverable” for unsupported networks, plus a recovery guide |
| BingX | Weak | “Generally does not provide” recovery; assistance “at sole discretion” |
| Swyftx | Weak | Wrong-network deposits may result in “permanent loss” |
Read each venue’s own page yourself before you trust this table — policies change. The discipline is the read, not the snapshot.
When funds land at exchange-issued infrastructure and are visible on-chain under custody the exchange controls, refusing to return them is not a blockchain inevitability — it is a policy decision. Some venues choose to design for the customer (build a recovery flow as the default); some venues choose to design for retention (build non-recovery as the default and offer discretionary exceptions). Both choices are observable in the published policy. Both choices tell you something durable about how the venue treats your funds when an edge case fires.
The fact that Binance, Bybit, and Bitget publish recovery workflows proves that wrong-rail deposits are operationally recoverable in many cases. A venue that declines to operate that workflow is making a deliberate trade-off — lower operational cost to them, higher loss-absorption by you. That is information you are entitled to before you fund an account.
Practical Checklist
- Read each candidate venue’s wrong-deposit policy in its own help centre before funding the account.
- Treat “sole discretion” recovery language as a due-diligence red flag; prefer venues with a published recovery flow.
- Verify TOKEN page, NETWORK, and ADDRESS on every transfer, and send a small test deposit first.
Your trading system is only as good as the data it runs on. This section covers the types of market data available, what each is used for, and what you need at minimum to build a working system.
Not all data is equal. Here’s the hierarchy from essential to advanced:
| Level | Data Type | What It Is | What It Enables |
|---|---|---|---|
| Essential | OHLCV Candles | Open, High, Low, Close, Volume for each time period | Indicators, backtesting, basic strategies |
| Important | Funding Rates | Periodic payments between longs and shorts on perpetual futures | Cost modelling, sentiment indicators, crowding signals |
| Important | Open Interest | Total number of outstanding futures contracts | Positioning analysis, crowding detection |
| Advanced | Bid/Ask Spread | Difference between best buy and sell price at any moment | Accurate slippage modelling, market quality assessment |
| Advanced | Liquidation Data | Forced closures of leveraged positions | Cascade detection, extreme event signals |
| Expert | Order Book Depth | Full list of resting orders at each price level | Liquidity analysis, support/resistance identification |
| Expert | Tick/Trade Data | Every individual trade that occurs | Tape reading simulation, market microstructure analysis |
Start with OHLCV candles. You can build your first strategy, backtest it, and deploy it with nothing else. Add funding rates and open interest when you want to model costs properly and explore positioning-based signals. The rest comes later.
Candles come in different timeframes, and each serves a different purpose: weekly candles for slow trend systems, daily candles for swing strategies, and hourly or finer candles for intraday work.
More timeframes = more data = more storage = more maintenance. Start with daily and weekly. Add smaller timeframes only when your strategy requires them.
Key Insight
The data sophistication ladder is: mid-price candles → bid/ask candles → tick data → full order book. Each level adds cost and complexity. Most profitable systematic strategies we’ve tested work on simple OHLCV candles. Don’t over-engineer your data infrastructure before you’ve proven a strategy works on basic data.
Every crypto exchange provides free historical data through their API. The challenge is not finding data — it’s fetching it reliably, handling pagination, and managing rate limits.
| Source Type | Data Available | Typical History Depth | Rate Limits |
|---|---|---|---|
| Tier-1 perpetual venues (CEX) | OHLCV, funding, OI | ~2017–present, varies by venue | Hundreds to ~1000+ req/min |
| Tier-1 spot venues | OHLCV, trades, order book snapshots | Often the deepest crypto history | Generally generous |
| Newer DEX-style perpetual venues | OHLCV, funding | Inception (recent) onward | Generous |
| FX / commodities / index broker APIs | OHLCV (FX, commodities, indices) | ~2005–present | Moderate |
For crypto: One tier-1 perpetual venue with a clean API and good depth, paired with a tier-1 spot venue with the deepest history, is a solid default. Use the deeper-history source for historical backfill and your primary trading venue for live data.
For non-crypto: A retail FX/CFD broker with a free practice account is the easiest free source for FX, commodity, and index data going back many years. Useful for cross-asset correlation testing.
You need a script that: fetches candles from a chosen start date, paginates through the API’s per-request limit, respects rate limits, validates what comes back, and saves to local storage with a checkpoint so it can resume.
This is a straightforward Python script. An LLM like Claude or ChatGPT can help you write it in under an hour if you describe exactly what you need.
Pro Tip
When using an LLM to help write your data fetcher, give it the exchange’s API documentation URL and say: “Write a Python script that fetches all BTCUSDT 1-day candles from [exchange] starting from 2020-01-01, handles pagination, respects rate limits, and saves to a lightweight embedded SQL database.” Review the output, test it, and iterate. This is exactly how production data pipelines start.
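A sketch of that fetcher’s skeleton. The endpoint URL and the kline response shape ([open_time, open, high, low, close, volume, ...]) are assumptions modelled on common venue APIs; substitute your venue’s real ones from its docs.

```python
import sqlite3
import time

import requests

# Hypothetical endpoint; the response shape is an assumption (see lead-in).
URL = "https://api.example-exchange.com/api/v3/klines"

def backfill(symbol="BTCUSDT", tf="1d", start_ms=1577836800000):  # 2020-01-01 UTC
    db = sqlite3.connect("candles.db")
    db.execute("""CREATE TABLE IF NOT EXISTS candles(
        symbol TEXT, tf TEXT, open_time INTEGER,
        open REAL, high REAL, low REAL, close REAL, volume REAL,
        PRIMARY KEY(symbol, tf, open_time))""")
    while True:
        rows = requests.get(URL, params={
            "symbol": symbol, "interval": tf,
            "startTime": start_ms, "limit": 1000}, timeout=10).json()
        if not rows:
            break  # caught up to the present
        db.executemany(
            "INSERT OR REPLACE INTO candles VALUES(?,?,?,?,?,?,?,?)",
            [(symbol, tf, r[0], float(r[1]), float(r[2]),
              float(r[3]), float(r[4]), float(r[5])) for r in rows])
        db.commit()
        start_ms = rows[-1][0] + 1   # paginate: next page starts after last candle
        time.sleep(1.5)              # stay inside rate limits
```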
Your candle data needs to live somewhere reliable, queryable, and fast. This section covers the options from simple to production-grade.
One CSV per symbol per timeframe. Easy to inspect, easy to load into pandas, no database knowledge required. Fine for a single strategy on a single symbol. Falls apart when you have 25 symbols across 5 timeframes and need to join data efficiently. Start here.
A single-file database that requires no server. Supports SQL queries. Fast for read-heavy workloads. Our production trading bots use a lightweight embedded SQL database for their local market data. Perfect for one strategy, one exchange, up to a few million candles. Graduate to this when CSV gets messy.
A full relational database with time-series optimisation. Supports concurrent access, complex queries, and can handle hundreds of millions of rows. Our research environment uses a time-series database with vector-similarity capabilities for embedding storage. Use this when you have multiple strategies, multiple exchanges, and want an analytics layer.
Practical Advice
Do not start with a server-class database. Start with CSVs. Then move to a lightweight embedded SQL database when you need queries. Then move to a server-class time-series database when you need scale. Premature infrastructure optimisation is how people spend 3 weeks setting up a database and 0 weeks testing a strategy.
Bad data is worse than no data. A corrupted candle can trigger a false signal, open a real trade, and lose real money. This section covers the data quality checks that must exist before any data touches your strategy.
| Issue | What Happens | How to Detect |
|---|---|---|
| Missing candles (gaps) | Indicators calculate wrong values, signals fire at wrong times | Check for expected number of rows per day/week |
| Incomplete candles | A candle fetched before the period closed has partial data | Compare volume vs typical; check fetch timestamp vs period end |
| Duplicate candles | Same timestamp appears twice, inflates averages | Check for duplicate timestamps after ingestion |
| Extreme outliers | A candle shows volume of 5 when the average is 5,000 | Flag candles where volume or range is >5 standard deviations from mean |
| Wrong timezone | Weekly candles calculated from the wrong start day | Verify first candle timestamp matches expected timezone |
War Story
Our candle update script had a checkpoint bug: it fetched weekly candles at 00:30 UTC on Monday, which meant the Sunday close candle was still incomplete. The script saved it, advanced the checkpoint, and never went back to correct it. The result: one week’s candle had volume of 5 instead of 5,000 and a close that was off by $2,000. The strategy calculated the wrong SMA slope and would have made a trade based on garbage data. We only caught it because we built a data quality validator that flags candles with volume below the 1st percentile.
Run these checks every time new data is ingested. Fail loudly if any check fails. Never let unvalidated data reach your strategy engine.
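A sketch of such a validator, assuming a UTC DatetimeIndex and a volume column; extend it with the remaining checks from the table:

```python
import pandas as pd

def validate_candles(df: pd.DataFrame, freq: str = "1D") -> None:
    """Fail loudly on the failure modes from the table above (sketch)."""
    # Duplicates: same timestamp twice inflates averages
    if df.index.duplicated().any():
        raise ValueError("duplicate candle timestamps")
    # Gaps: every expected period should be present
    expected = pd.date_range(df.index[0], df.index[-1], freq=freq)
    missing = expected.difference(df.index)
    if len(missing):
        raise ValueError(f"{len(missing)} missing candles, first: {missing[0]}")
    # Outliers: volume far outside its own distribution
    vol = df["volume"]
    if ((vol - vol.mean()).abs() > 5 * vol.std()).any():
        raise ValueError("volume outlier beyond 5 standard deviations")
    # Incomplete candles: suspiciously low recent volume
    if (vol.tail(5) < vol.quantile(0.01)).any():
        raise ValueError("recent volume below 1st percentile: incomplete candle?")
```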
Before you can backtest anything, you need years of clean historical data. This section covers how to do a proper historical backfill and verify it’s correct.
For BTC: January 2020 gives you 5+ years including the COVID crash, 2021 bull run, 2022 bear market, and 2023–2026 recovery. This is a rich dataset.
Most APIs return 200–1000 candles per request. Paginate from your start date to present, using the last candle’s timestamp as the start of the next request. Add a 1-2 second delay between requests to stay within rate limits.
Run all data quality checks from section 3.4. Verify the total number of candles matches what you expect (365 daily candles per year, 52 weekly candles per year, etc.).
Create a cron job or scheduled task that fetches new candles daily. Use an overlap window: always re-fetch the last few candles in case the previous fetch caught an incomplete candle.
Pro Tip: Overlap Window
When resuming a data fetch from a checkpoint, always re-fetch the last 3–5 candles from the previous run. This corrects any candles that were incomplete when first fetched (e.g., fetched at 00:30 UTC before the daily candle closed at 00:00 UTC the next day). This one pattern eliminates the most common source of data corruption in automated pipelines.
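A sketch of the overlap-window resume, paired with the INSERT OR REPLACE upsert from the fetcher in 3.2 so corrected candles overwrite the incomplete ones:

```python
import sqlite3

OVERLAP_CANDLES = 5
MS_PER_DAY = 86_400_000

def resume_start_ms(db: sqlite3.Connection, symbol="BTCUSDT", tf="1d") -> int:
    """Checkpoint minus an overlap window: the last few candles get
    re-fetched, and INSERT OR REPLACE overwrites any that were incomplete."""
    row = db.execute(
        "SELECT MAX(open_time) FROM candles WHERE symbol=? AND tf=?",
        (symbol, tf)).fetchone()
    last = row[0] or 1577836800000   # fall back to the original backfill start
    return last - OVERLAP_CANDLES * MS_PER_DAY
```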
Backfill gives you history. The live system needs now. The cheap option — polling REST in a loop — works at backtest pace and falls apart at production pace. WebSocket discipline is what separates a system that catches fills in real time from one that’s always ten seconds behind reality.
For anything that changes faster than your cron interval — ticks, your own fills, your own order updates, position state — polling is the wrong tool. Three reasons: you are always stale by up to one full poll interval, every poll burns rate-limit budget whether or not anything changed, and anything that changes twice between polls is invisible to you.
The two are complements, not substitutes. A reasonable default split: stream ticks, order updates, and your own fills over WebSocket; poll REST on a slow schedule for full snapshots of balances and positions, and reconcile the two.
The WebSocket is the source of truth for change. The REST snapshot is the source of truth for state. Both, together, defend against the failure modes of either alone.
Connections drop. Networks blip. Venues restart. Your reconnect logic decides whether you trade through it or sit blind for an hour.
Most venues stamp every WebSocket message with a monotonically-increasing sequence number per channel. The discipline is small and absolute: if you receive seq N+2 after seq N, you missed N+1. There is no “probably nothing important happened.”
An order book that’s silently missing one update is worse than no order book at all — it makes confident decisions on stale state. Force-resnap is cheap insurance.
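Gap detection is a few lines of state per channel; a minimal sketch:

```python
class SequenceGuard:
    """Detect missed messages on a sequence-numbered channel."""
    def __init__(self):
        self.last_seq = None

    def check(self, seq: int) -> bool:
        """True if the stream is intact; False -> force a resnapshot."""
        ok = self.last_seq is None or seq == self.last_seq + 1
        self.last_seq = seq
        return ok

guard = SequenceGuard()
for seq in (1, 2, 4):                 # seq 3 went missing
    if not guard.check(seq):
        print("gap detected at seq", seq)   # -> discard state, resnapshot
```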
Most venues that stream incremental order-book updates also publish a periodic checksum — typically a hash of the top-N levels of bids and asks. The contract is: if your locally-reconstructed book’s checksum doesn’t match the venue’s on the same tick, your local book is wrong. Causes range from a dropped delta to a mis-handled price-level deletion. The remedy is the same regardless: force-resnap. Discard the local book, fetch a full snapshot, replay any deltas that have arrived since.
If you trade off the order book at all — even just for sizing slippage estimates — verify checksums. A book you don’t verify is a book you can’t trust.
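A sketch of the common CRC32 pattern (used in some form by venues such as Kraken and OKX). The exact level count, ordering, and string formatting are venue-specific, so match the docs character-for-character; this shows only the shape.

```python
import zlib

def book_checksum(bids, asks, depth=10) -> int:
    """CRC32 over the top-N levels. Assumes asks ascending from best,
    bids descending from best; formatting rules vary per venue."""
    parts = []
    for price, qty in list(asks)[:depth] + list(bids)[:depth]:
        parts.append(f"{price}{qty}")
    return zlib.crc32("".join(parts).encode())

local = book_checksum([(94_990, 1.2)], [(95_000, 0.5)])
# Compare `local` against the venue's published checksum on the same tick;
# mismatch -> discard the local book and force-resnap.
```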
The same fill might land in your handler twice: once from the WebSocket fill stream, once from the REST snapshot you took on reconnect. Or three times, if the websocket re-delivers a buffered message after reconnect. The defence is a single line: dedupe by the venue’s fill_id (or order_id, or whatever immutable identifier the venue assigns). Before you process any fill, check whether you’ve already recorded it; if yes, drop. The cost is one indexed lookup per event; the benefit is that double-counting fills cannot corrupt your state.
The same applies to order updates: dedupe by (order_id, status, update_ts). Many venues will re-emit the “FILLED” status under various edge cases; your state machine should be a no-op the second time.
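The whole defence, as a sketch (the fill_id field name varies by venue; persist the seen-set to disk in production):

```python
processed: set[str] = set()   # persist to disk in production, not memory

def on_fill(fill: dict) -> None:
    """Process each fill exactly once, however many times it's delivered."""
    fid = fill["fill_id"]              # venue's immutable ID (field name varies)
    if fid in processed:
        return                         # WS redelivery or REST overlap: drop
    processed.add(fid)
    print("applying fill", fid)        # stand-in for your real state update

on_fill({"fill_id": "abc123"})   # applied
on_fill({"fill_id": "abc123"})   # silently dropped
```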
Key Insight
The WebSocket is not “the same thing as REST, but faster.” It is a different reliability model. REST gives you snapshot consistency at the cost of latency; streaming gives you change notifications at the cost of having to manage gaps, reconnects, and dedupe. Build for the streaming reliability model from day one — gap detection, checksum verification, dedupe, reconcile-on-reconnect — or you will eat the cost in silent state corruption when it matters most.
Every strategy starts as an idea. The skill is converting that idea into a precise, testable, falsifiable hypothesis with exact entry and exit conditions. Vague ideas cannot be backtested. Precise hypotheses can.
Most trading ideas sound like this: “BTC tends to rally after big dips.” That is not a strategy. It is an observation. To make it testable, you need to answer five questions: what instrument and timeframe, what exact condition triggers entry, what triggers exit (target, time, or signal), where the stop-loss sits, and how the position is sized.
| Vague Idea | Precise Hypothesis |
|---|---|
| “BTC rallies after big dips” | When BTC daily close drops >10% from its 30-day high, go long at the next daily open. Exit after 14 days or at +8%, whichever comes first. Stop-loss at -5%. |
| “Trend following works” | When a slow weekly moving-average derivative turns positive AND a close-position filter confirms strong-conviction candles AND an efficiency-ratio gate confirms a trending (not choppy) tape, go long. Exit when the moving-average derivative turns negative. |
| “Funding rate extremes revert” | When 8h funding rate exceeds the 95th percentile of its 90-day distribution, go short. Exit when funding rate returns to median. Stop-loss at -3%. |
The vague versions are untestable. The precise versions can be coded and backtested in an afternoon.
Key Insight
If you cannot write the entry and exit conditions as an if statement in code, the hypothesis is not precise enough. “Buy when it looks like a reversal” cannot be coded. “Buy when RSI(14) crosses below 30 and then crosses back above 30 on the next candle” can be coded in 3 lines of Python.
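That example as runnable pandas, with toy RSI values standing in for a real RSI(14) series:

```python
import pandas as pd

# Toy stand-in for a real RSI(14) series.
rsi = pd.Series([45, 33, 28, 31, 40])

dipped = (rsi.shift(2) >= 30) & (rsi.shift(1) < 30)  # crossed below 30 last candle
entry = dipped & (rsi >= 30)                          # back above 30 on this candle
print(entry.tolist())   # [False, False, False, True, False]
```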
Not all strategies are the same. They trade on different timeframes, exploit different market behaviours, and suit different personality types. This section maps the landscape so you can choose where to start.
| Type | Holds For | Trades/Year | Edge Source | Complexity |
|---|---|---|---|---|
| Weekly Trend Following | Weeks to months | 3–8 | Riding large trends, cutting losers early | Low |
| Daily Swing | Days to weeks | 15–40 | Multi-day momentum or mean reversion | Medium |
| Intraday | Hours | 100–500 | Session patterns, liquidity sweeps, time-of-day effects | High |
| Derivatives / Positioning | Hours to days | 5–20 | Funding rate extremes, OI crowding, liquidation cascades | Medium |
| Statistical Anomaly | Varies | Varies | Day-of-week effects, cross-asset correlation, regime-conditional patterns | Medium-High |
Start with weekly trend following. It sits at the low-complexity end of the table above, its rules are simple enough to state as a mechanism, its trade count leaves you time to operate the rest of the system properly, and the weekly format forces the overfitting defences described next.
Important nuance: low trade count does not automatically mean “hard to overfit.” The opposite is true — low N means high variance, wider confidence intervals on every metric, and an easier path to fitting noise. The defence against overfitting at low N is parsimony (very few parameters), long observation windows (multiple regimes), simple rules (mechanism you can articulate), and cross-instrument validation (does the same rule work on ETH, SOL, etc. without re-tuning?). A weekly system is “safer” only because the format forces these constraints — not because few trades is inherently more honest.
Our simplest live strategy — a weekly trend-following system — trades a handful of times per year and shows a strong profit factor over a multi-year window. Simple does not mean weak — but a multi-year, low-double-digit-trade sample is still a wide confidence interval, which is why we lean on cross-instrument and walk-forward checks rather than the single point estimate.
A long-only weekly system that goes long when a slow moving-average derivative turns positive, gated by a close-position filter (the candle closed strongly into its range) and an efficiency-ratio gate (the tape has been trending, not chopping). Goes flat when the derivative turns negative. No shorting — the underlying has strong positive long-term drift, so shorting is structurally expensive. Single-digit annual trade count. Profit factor in the high single digits over a multi-year window. Max drawdown contained well under 25%.
A short-side system that fires only when a confirmed bear regime is in force AND the daily efficiency-ratio is low (choppy / ranging). Exits when efficiency-ratio recovers or the bear regime closes. The edge comes from the underlying bleeding slowly during low-efficiency bear windows — slow drift, not sharp drops. Low double-digit annual trade count. Compound returns roughly proportional to the regime it operates in, with drawdown profile in line with that regime.
A system that fires when derivatives positioning data signals one side of the book is crowded and price action confirms the squeeze. Mechanical logic: crowded positioning creates forced unwind cascades when price moves against it. The edge is extracted from positioning, not from technicals. Hit rate above 60%, profit factor above 1.5, single-digit annual trade count. Used as an overlay rather than a standalone — statistical confidence is moderate, not strong.
During specific intraday session windows, identify a liquidity sweep on a higher intraday timeframe, wait for a market-structure shift on a lower one, enter on the retracement into a price-imbalance zone. Stop below the sweep wick. Target the next opposing liquidity pool, typically with an asymmetric reward-to-risk profile. This is the most complex strategy type — requires fine-grained data and multi-timeframe analysis. We treat it as a research direction, not a primary capital allocation.
Practical Advice
Build your first strategy at the simple end of the spectrum. Get it live, profitable, and boring. Then explore complexity. The temptation is to start with intraday multi-timeframe systems because they feel sophisticated. Resist. Complexity is earned, not chosen.
A raw signal tells you when to trade. Gates and filters tell you when not to trade. The difference between a mediocre strategy and a great one is often not the entry signal — it’s the trades the system refuses to take.
A gate is a condition that must be true in addition to the entry signal. If the signal fires but the gate is closed, the system does nothing. Gates filter out low-conviction setups.
| Gate | What It Measures | Why It Helps |
|---|---|---|
| Close Position (CP) | Where the candle closed within its range (0 = low, 1 = high) | A high-CP threshold means price closed strongly into the top of its range — strong conviction. Filters out indecisive candles. |
| Efficiency Ratio (ER) | Direction vs noise ratio over N periods (0 = pure chop, 1 = pure trend) | An ER threshold above pure-noise levels means the market has been trending rather than chopping. Filters out choppy periods where trend-following gets whipsawed. |
| Regime Gate | Current market regime (bull, bear, chop) | Only trade in regimes where the strategy has proven edge. A long-only strategy gated by “bull regime” avoids bear markets entirely. |
| Volatility Filter | Current volatility relative to historical (e.g., ATR percentile) | Some strategies only work in low-vol (mean-reversion, day-of-week effects) or high-vol (breakout, momentum). The filter is the differentiator. |
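For concreteness, here is one conventional way to compute both gate inputs — a sketch in pandas; the threshold defaults mirror the sample YAML config in Module 8.4, not a recommendation:

import numpy as np
import pandas as pd

def close_position(df: pd.DataFrame) -> pd.Series:
    """Where each candle closed within its range: 0 = at the low, 1 = at the high."""
    candle_range = (df["high"] - df["low"]).replace(0, np.nan)
    return (df["close"] - df["low"]) / candle_range

def efficiency_ratio(close: pd.Series, n: int = 10) -> pd.Series:
    """Kaufman efficiency ratio: net direction vs total noise over n periods."""
    direction = (close - close.shift(n)).abs()
    noise = close.diff().abs().rolling(n).sum()
    return direction / noise

def gated(raw_signal: pd.Series, df: pd.DataFrame,
          cp_min: float = 0.75, er_min: float = 0.20, er_n: int = 10) -> pd.Series:
    """Only act on the raw signal when both gates are open."""
    return (raw_signal
            & (close_position(df) >= cp_min)
            & (efficiency_ratio(df["close"], er_n) >= er_min))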
Here is the directional impact gates had on one of our live strategies. The numbers are deliberately qualitative — the point is the shape of the improvement, not specific values:
| Metric | Without Gates | With Close-Position + Efficiency-Ratio Gates |
|---|---|---|
| CAGR | Positive but modest | Substantially higher |
| Profit Factor | Low single digits | High single digits |
| Max Drawdown | Above 30% | Well under 20% |
| Trades | Higher count | Lower count (gates filter many out) |
Fewer trades, higher returns, lower drawdown. The gates eliminated trades that would have been losers — low-conviction entries during choppy or indecisive markets. The signal was the same. The gates made it profitable.
Key Insight
The best strategies are defined as much by what they refuse to trade as by what they trade. Most profitable traders sit out 60–80% of sessions. Your system should do the same. Gates are how you codify patience.
This is the repeatable process for testing any trading idea. Hear a claim, formulate the hypothesis, pull data, test it, find the filters that matter, deliver a verdict. You will use this template dozens of times.
From Twitter, a podcast, a friend, a paper, or your own observation. “Funding rate extremes revert.” “Tuesday and Wednesday are the best trading days.” “Returns are higher right after a sharp VIX spike.” Don’t judge it — just write it down.
Convert the claim into a precise, testable statement with exact conditions: “When BTC 8h funding rate exceeds the 95th percentile of its 90-day distribution, the 24-hour forward return is negative on average.” If you cannot make it precise, it is not testable.
Fetch the specific data you need. For the funding rate hypothesis: 5 years of 8-hour funding rate history + daily OHLCV candles. You already have this from Module 3.
Write a simple backtest. Identify every occurrence of the condition. Measure the forward return at your target horizon (4h, 24h, 7d). Calculate hit rate, average return, profit factor. Chart the results.
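A sketch of that measurement step in pandas, assuming df is a daily OHLCV DataFrame (the example condition is the “big dip” hypothesis from Module 4.1):

import pandas as pd

def event_study(df: pd.DataFrame, condition: pd.Series, horizon: int = 14) -> dict:
    """Forward-return stats at every occurrence of a condition on daily bars."""
    fwd = df["close"].shift(-horizon) / df["close"] - 1    # forward return
    hits = fwd[condition].dropna()
    wins, losses = hits[hits > 0], hits[hits < 0]
    return {
        "occurrences": len(hits),
        "hit_rate": len(wins) / max(len(hits), 1),
        "avg_return": hits.mean(),
        "profit_factor": wins.sum() / max(abs(losses.sum()), 1e-12),
    }

# e.g. daily close more than 10% below its 30-day high
dipped = df["close"] / df["close"].rolling(30).max() - 1 < -0.10
print(event_study(df, dipped, horizon=14))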
Split the results by regime (bull/bear/chop), volatility (high/low), day of week, and any other relevant dimension. Often a signal that is flat overall becomes strong in one specific condition. In one investigation we ran, a flat headline result turned strongly positive once we filtered to low-volatility windows only — the filter was the entire signal.
Be honest. The verdicts are: “Strong signal, worth developing further” (rare), “Weak signal, useful as overlay/filter only” (common), or “No signal, kill it” (most common). Do not force a positive result.
This is a textbook illustrative example — not a proprietary investigation — chosen because it’s familiar and shows the template clearly. Imagine you’ve heard the claim that a major equity index has a “Tuesday effect” — that Tuesdays are systematically stronger than other weekdays. Here is the shape of how that investigation would play out under this template: formulate the precise claim (“Tuesday close-to-close returns exceed the other weekdays’ average”), pull several years of daily index data, measure per-weekday forward returns, slice the results by volatility regime, and deliver the verdict.
Total time on a real investigation of this shape: about half a day. The investigation produces real knowledge: a conditional relationship exists, it’s not strong enough to trade, and the volatility filter is the key variable. That last insight is useful for other research.
Pro Tip
Keep a research log. Every investigation — even the ones that produce nothing — generates knowledge about what doesn’t work and which filters matter. Over time, patterns emerge across investigations: “volatility regime matters for almost everything” is a finding that improves all future research.
Large language models (Claude, ChatGPT, Gemini, Grok) are extraordinary tools for accelerating every step of the investigation process. They cannot tell you whether a strategy works — only data can do that — but they can write the code to test it, generate hypotheses you hadn’t considered, and help you interpret results.
Critical: Look-Ahead Bias in LLM-Generated Code
The most common bug in LLM-generated backtesting code is look-ahead bias — using information that would not have been available at the time of the decision. Example: calculating an indicator using today’s close to make a decision that should have been made at today’s open. Always review generated code line by line and ask: “at the moment this decision is made, has this data point been observed yet?”
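The bug and its fix side by side — a minimal pandas illustration of the question above:

import pandas as pd

sma_fast = df["close"].rolling(20).mean()
sma_slow = df["close"].rolling(50).mean()
signal = sma_fast > sma_slow                  # computed from today's CLOSE

open_to_close = df["close"] / df["open"] - 1

# WRONG: today's signal needs today's close, but this books the trade
# from today's open — information from the future leaks into the entry.
ret_lookahead = signal * open_to_close

# RIGHT: decide on yesterday's close, execute from today's open.
ret_honest = signal.shift(1) * open_to_close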
For maximum rigour, use multiple LLMs in an adversarial workflow: have one model formulate the hypothesis and write the test code, hand the code and results to a second model with explicit instructions to attack the methodology, and let a third model (or a fresh conversation) weigh the attack against the defence before you decide.
This mirrors how professional quant teams work: the researcher proposes, the risk team attacks, and the portfolio manager decides. You can simulate all three roles with different LLMs or different conversations.
A backtesting engine simulates your strategy against historical data as if you were trading in real time. The key word is “as if.” Every decision the engine makes must only use information that would have been available at that moment. Violate this and your results are fantasy.
Every backtester, no matter how sophisticated, runs the same basic loop:
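In skeleton form, assuming candles in chronological order (the names strategy.evaluate, open_trade, close_trade, and FEES are illustrative):

position = None
for i in range(warmup, len(candles) - 1):
    history = candles[: i + 1]             # up to and including candle i — never i+1
    signal = strategy.evaluate(history)    # may only look backwards
    if signal == "BUY" and position is None:
        position = open_trade(price=candles[i + 1].open, fee_pct=FEES)   # next open
    elif signal == "SELL" and position is not None:
        close_trade(position, price=candles[i + 1].open, fee_pct=FEES)
        position = None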
The backtesting core loop. Every decision uses only data available up to and including the current candle. Never the next candle.
Sharpe ratio, profit factor, win rate, and max drawdown are the starting set, but no single metric captures a strategy. Pros look at a panel. Add at least these three: the Sortino ratio (Sharpe’s sibling that penalises only downside volatility), the Calmar ratio (annualised return divided by max drawdown), and the information coefficient (IC — the correlation between your signal and subsequent returns).
When each is most informative: Sharpe for symmetric / mean-reverting systems and broad portfolio comparisons. Sortino for trend-following or any strategy where upside vol is the point. Calmar when you’re sizing capital or comparing strategies for retirement-style accounts. IC for signal research before strategy construction. Profit factor for trade-by-trade asymmetry. Win rate only in combination with average-win/average-loss ratio — high win rate with poor PF is a red flag for “picking up pennies in front of a steamroller.”
Look at all of them. A strategy that looks good on Sharpe but ugly on Calmar has hidden tail risk. A strategy with a high IC but low Sharpe likely has an execution problem, not a signal problem. The panel tells you which.
You have two options: build your own engine (a few hundred lines of pandas for a daily-timeframe system), or use a framework — backtrader, vectorbt, and zipline provide pre-built infrastructure, faster to start but harder to customise and easier to misuse. We recommend building your own for your first strategy, using an LLM to help with the code. This forces you to understand every line. Once you understand the mechanics, frameworks become useful for speed.
Prompt for Your LLM
“Write a Python backtester using pandas that: (1) loads daily OHLCV from a CSV, (2) calculates SMA(20) and SMA(50), (3) goes long when SMA(20) crosses above SMA(50), exits when it crosses below, (4) models 0.1% fees per trade, (5) tracks each trade with entry/exit prices and PnL, (6) outputs CAGR, profit factor, max drawdown, and win rate. The decision to trade must be made on the candle’s close, with the entry/exit at the next candle’s open.”
These rules are non-negotiable. Violate any one of them and your backtest results are meaningless. Memorise them.
Every decision must use only data available at the time of the decision. If your strategy decides to buy based on today’s close, the earliest you can execute is tomorrow’s open. Using today’s close to enter at today’s open is impossible in real life. This is the #1 bug in backtesting code.
Every trade incurs: exchange fees (0.02–0.06% per side), slippage (0.01–0.05% per side), and for perpetuals, funding rates (variable, every 8 hours). Use 25 basis points (0.25%) round-trip as a baseline for crypto perpetuals. If your edge doesn’t survive 25bps of costs, it’s not an edge — it’s noise.
If you only test on assets that exist today, you miss the ones that went to zero. In crypto this matters less (BTC and ETH have survived), but for altcoins it is critical. A strategy that “worked on the top 20 coins” might have been tested on the 20 that survived — not the 200 that didn’t.
A strategy with 8 trades over 2 years is not statistically meaningful. You need enough trades and enough time to cover different market regimes. Minimum: 3 years of data covering at least one full bull-bear cycle. No fixed trade count is sufficient on its own. What matters is: (a) the confidence interval on the chosen metric (Sharpe, profit factor) is tight enough to act on, (b) the rule is robust across multiple instruments without re-tuning, (c) walk-forward out-of-sample performance holds up. Rule of thumb: under ~30 trades, confidence intervals on Sharpe and PF are too wide to trust as a standalone claim — you must lean on cross-instrument and walk-forward evidence. 100+ trades is preferred when basing a decision on a single instrument’s point estimate. Low-N strategies (e.g. ~3 weekly trades/year) are not invalid — they just have to earn trust through parsimony, multi-regime exposure, and cross-instrument robustness rather than through a tight confidence band on the trade sample alone.
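To make “tight enough to act on” concrete: under an i.i.d. assumption (optimistic, per Module 5.3), the standard error of an observed Sharpe ratio is roughly sqrt((1 + SR²/2) / N) (Lo, 2002). A sketch:

import math

def sharpe_ci(sr: float, n: int, z: float = 1.96) -> tuple:
    """Approximate 95% confidence interval for an observed Sharpe ratio."""
    se = math.sqrt((1 + 0.5 * sr * sr) / n)
    return sr - z * se, sr + z * se

print(sharpe_ci(1.0, 25))    # ~(0.52, 1.48): far too wide to act on
print(sharpe_ci(1.0, 250))   # ~(0.85, 1.15): starting to mean something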
If you sweep 200 parameter combinations and pick the best one, your backtest is measuring how well you fit to historical noise, not how well the strategy works. Split your data: optimise on the first 70%, test on the remaining 30%. If it works on both, it might be real.
War Story
We ran a comprehensive investigation into retail and institutional trading footprints. Across 10 engineered signals and 1,260 total trades: 7 of 10 signals were at or below coin-flip after costs. The overall portfolio produced +0.039% mean return — effectively zero. Only one signal survived all falsification checks, and even that had a p-value of 0.22. The investigation produced more “no” answers than “yes” answers. That is normal. That is the process working correctly.
A single backtest gives you one realised path. Monte Carlo gives you a distribution of plausible alternative paths so you can size risk against the worst-case end of that distribution rather than against the lucky single sample you actually lived through. The catch: how you generate those alternative paths matters enormously, because trading P&L is not i.i.d.
Your backtest shows a max drawdown of -15%. Great. But what if the three worst trades had happened consecutively instead of spread out? The drawdown might have been -35%. The specific order of trades in history is one random sample from a much larger distribution of plausible orderings. Monte Carlo’s job is to estimate that distribution.
The textbook introduction to Monte Carlo on backtest output is “take your N trades, randomly shuffle the order, simulate the equity curve, repeat 10,000 times, look at the distribution.” This is useful as a first-pass illustration of variance — it shows that the historical equity curve is one of many possible paths — but it is the wrong production method for trading data.
The reason: shuffling assumes trades are independent and identically distributed (i.i.d.). They are not. Real trading P&L exhibits serial dependence (wins and losses arrive in streaks), regime dependence (behaviour differs across bull, bear, and chop), and volatility clustering (quiet stretches and violent stretches bunch together).
This matters because real drawdowns come precisely from clustered losses — runs of correlated bad trades during the wrong regime. A naive shuffle systematically underestimates tail drawdown risk by destroying exactly the dependence structure that produces it. If you size your account on naive-shuffle 95th-percentile drawdown, you are sizing on an optimistic distribution.
Block bootstrap preserves local autocorrelation by resampling contiguous blocks of trades instead of individual trades.
A runnable sketch (numpy; trade returns as fractions, e.g. 0.02 for +2%):

import numpy as np

def block_bootstrap(trades, n_sims=10_000, block_size=None, seed=0):
    """Resample contiguous blocks of trades to preserve local autocorrelation."""
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades)                    # chronological trade returns
    n = len(trades)
    if block_size is None:
        block_size = max(1, int(round(n ** 0.5))) # rule of thumb; see below
    n_blocks = -(-n // block_size)                 # ceil(n / block_size)
    final_returns, max_drawdowns = [], []
    for _ in range(n_sims):
        starts = rng.integers(0, n - block_size + 1, size=n_blocks)
        sampled = np.concatenate([trades[s:s + block_size] for s in starts])[:n]
        equity = np.cumprod(1 + sampled)           # simulate equity curve
        peak = np.maximum.accumulate(equity)
        final_returns.append(equity[-1] - 1)       # record final return
        max_drawdowns.append(((equity - peak) / peak).min())  # record max DD
    return np.array(final_returns), np.array(max_drawdowns)   # the 10,000-path distribution

Block-length rule of thumb: block ≈ √N is a reasonable default. For higher-frequency strategies (intraday, hundreds of trades), 5–10 trades per block is usually enough to preserve short-horizon dependence. For lower-frequency strategies with strong regime structure, push toward 15–20 trades per block so each block spans a meaningful slice of regime time. Sensitivity-test the result across two or three block sizes — if your tail estimates wobble dramatically, the dependence structure is doing real work and the answer is uncertain.
Fixed block lengths have an arbitrariness problem: why 5? why 20? The stationary bootstrap (Politis & Romano, 1994) draws a random block length each time from a geometric distribution with mean 1/p. This produces a resampled series that is itself stationary (the fixed-block version is not), and is generally more robust to block-size mis-specification.
Recipe: at each step, with probability p start a new block at a random position; otherwise continue the current block. Choose p so 1/p matches your target average block length (e.g. p = 0.1 for an average block length of 10 trades). For most retail-scale playbook work, stationary bootstrap is the default to reach for.
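A sketch of that recipe in numpy (circular wrap at the series end is the standard trick):

import numpy as np

def stationary_bootstrap(trades, n_sims=10_000, p=0.1, seed=0):
    """Politis-Romano stationary bootstrap: geometric block lengths, mean 1/p."""
    rng = np.random.default_rng(seed)
    trades = np.asarray(trades)
    n = len(trades)
    finals = np.empty(n_sims)
    for s in range(n_sims):
        idx = np.empty(n, dtype=int)
        idx[0] = rng.integers(n)
        for t in range(1, n):
            # With probability p start a new block; otherwise continue, wrapping.
            idx[t] = rng.integers(n) if rng.random() < p else (idx[t - 1] + 1) % n
        finals[s] = np.prod(1 + trades[idx]) - 1
    return finals     # final-return distribution; track drawdowns the same way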
If you have already labelled each trade with the regime it occurred in (bull / bear / chop, or high-vol / low-vol), you can resample within each regime separately and recombine in proportion to the regime exposure you expect going forward. This preserves regime-conditional behaviour rather than averaging it out.
This is useful when the strategy clearly behaves differently across regimes and you care about scenarios like “what happens if the next 12 months are 70% chop and 30% bear?” You can also use it to stress-test against a regime mix that is more hostile than the historical mix.
Regime-stratified resample preserves the regime composition of the original sample so bootstrap distributions don’t blend incompatible market environments. Recombine in the proportion you expect going forward, or stress-test against a more hostile mix than the historical one.
Block bootstrap typically produces a heavier left tail than naive shuffle on the same trade list, because it preserves the loss-clustering that drives real drawdowns. The honest distribution is wider and uglier than the i.i.d. one.
Key Insight
Always plan for the 95th-percentile drawdown from a dependence-preserving bootstrap (block or stationary), not the historical value and not the naive-shuffle value. If your sizing survives that bootstrap 95th-percentile drawdown, it can survive realistic clustered losses. If you size on the historical or i.i.d. result, you are one regime-cluster away from ruin.
Sample-Size Honesty
If you only have 50–100 trades, even block bootstrap is unreliable — the resulting confidence intervals will be very wide and the tail estimates noisy. Don’t let Monte Carlo give you false comfort. A pretty distribution chart computed from 60 trades is still a chart computed from 60 trades. In low-N regimes, lean on cross-instrument robustness and walk-forward consistency rather than on bootstrap percentiles, and state explicitly in your strategy report that the MC tails are estimated from a small sample.
The most powerful test of a strategy is whether it works on data it has never seen. Walk-forward testing simulates this by training on one period and testing on the next, rolling forward through time.
The simplest version: split your data into two parts — an in-sample set (say the first 70%) used to develop and tune the strategy, and an out-of-sample set (the remaining 30%) that is touched exactly once, for the final test.
If the strategy performs similarly on both sets, the edge might be real. If it performs well in-sample but fails out-of-sample, you overfitted.
A more rigorous version: roll through time in windows — train on, say, two years, test on the next six months, then slide forward six months and repeat until the data runs out.
Each test period is truly unseen. If the strategy is consistently profitable across all test windows, the edge is robust. If it works in some windows and fails in others, investigate which market conditions caused the failures — that tells you about regime sensitivity.
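The windowing itself is a few lines (a sketch; window lengths illustrative, and optimise/evaluate/record are hypothetical stand-ins for your own pipeline):

def walk_forward_windows(n_bars: int, train: int = 730, test: int = 182,
                         step: int = 182):
    """Yield (train_slice, test_slice) pairs rolling forward through time."""
    start = 0
    while start + train + test <= n_bars:
        yield (slice(start, start + train),
               slice(start + train, start + train + test))
        start += step

# fit on df[tr], then evaluate the FROZEN parameters on df[te]
for tr, te in walk_forward_windows(len(df)):
    params = optimise(df[tr])
    record(evaluate(df[te], params))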
Two variants, with a real trade-off: a rolling window trains on a fixed-length window that slides forward and discards old data, adapting quickly to regime change at the cost of a smaller sample; an anchored window trains from a fixed start date that expands forward, keeping all history for more stable estimates at the cost of being slow to forget dead regimes.
Rule of thumb: rolling for crypto and fast-moving microstructure-driven edges; anchored for macro-structural or cross-asset patterns. If unsure, run both and compare — if rolling is materially better than anchored, that itself tells you the edge is regime-sensitive and you have a degradation-risk dimension to monitor in production.
Walk-forward is only as honest as the discipline around it. Three rules: freeze the rules and parameters before opening each test window; touch each out-of-sample window exactly once; and log every OOS test you ever run, because repeated peeks silently convert out-of-sample data into in-sample data.
Warning: The Peeking Problem
The temptation is enormous: your strategy fails on the out-of-sample data, so you “adjust” it and re-test. You have now contaminated the OOS data — it is no longer unseen. The only honest approach: develop on in-sample, test on OOS once, and accept the result. If it fails, fold the OOS into in-sample, develop a new hypothesis, and freeze a fresh forward window. Do not pretend the same data is still untouched.
If changing a parameter by 10% destroys your edge, you don’t have an edge. You have a coincidence. Parameter sensitivity analysis tests whether your strategy is robust or fragile.
Here is the shape of a healthy parameter sweep. Imagine sweeping a gate threshold across its plausible range and recording CAGR and profit factor at each value:
| Gate Threshold (relative) | CAGR | Profit Factor | Verdict |
|---|---|---|---|
| Loose (low end) | Strong | Strong | Strong |
| Slightly tighter | Strong | Strong | Strong |
| Mid-range | Strong | Strong | Strong |
| Selected operating point | Best | Best | Selected |
| Slightly tighter still | Strong | Strong | Strong |
| Tighter | Acceptable | Acceptable | Acceptable |
| Tightest (sample-starved) | Lower | Lower | Degrading |
This is a robust parameter: performance is strong across a wide range, and the selected value sits in a broad plateau. Moving the threshold by ±10% barely changes the result. This is what you want.
A fragile parameter would show a sharp spike at exactly one value, with sudden collapse one tick in either direction. If one tick destroys the strategy, the “edge” is an artefact of the specific data, not a real market phenomenon.
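The sweep itself is trivial; the judgment is in reading the plateau. A sketch (run_backtest is a hypothetical stand-in for your engine):

import numpy as np

thresholds = np.linspace(0.50, 0.95, 10)        # plausible range for a CP-style gate
for th in thresholds:
    result = run_backtest(df, cp_min=th)        # everything else held fixed
    print(f"cp_min={th:.2f}  CAGR={result.cagr:+.1%}  PF={result.profit_factor:.2f}")
# Look for a broad plateau around the operating point, not a single spike.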
Key Insight
Robust strategies have flat plateaus in parameter space. Overfitted strategies have sharp spikes. If your strategy only works at one specific lookback and fails one tick in either direction, it is almost certainly curve-fitted. Real edges are broad. Coincidences are narrow.
The moment you start sweeping — across parameters, instruments, hypotheses, or anomaly-scanner output — the naive p-value framing breaks down. You are running many simultaneous hypothesis tests, and the standard 5% threshold guarantees a fixed rate of false positives by chance alone. Without correction, your “winners” are mostly luck.
Run 100 strategy variants and apply a p < 0.05 cutoff. If none of them have any real edge, you still expect ~5 of them to look statistically significant by pure chance. The standard p-value is calibrated to a single test. Run many tests and the probability that at least one looks significant climbs fast: with 20 independent tests and no real signal, there is a 64% chance at least one comes in under p < 0.05. Run 100, it is essentially certain.
This affects almost everything in trading research: parameter sweeps, anomaly-scanner output, testing one hypothesis across many instruments, and comparing dozens of strategy variants against the same history.
If you don’t correct for the number of tests, you are guaranteeing a steady stream of fake winners.
The simplest fix: divide your significance threshold α by the number of tests N. To claim a finding is significant at α = 0.05 across 100 tests, that finding must individually satisfy p < 0.0005. Bonferroni controls the family-wise error rate (FWER) — the probability of any false positive across the whole family of tests — at α.
Bonferroni is the right call for confirmatory tests: when a false positive is genuinely costly (you’re about to deploy capital), and you would rather miss real edges than ship a fake one.
The downside: it is brutally conservative. Real but moderate edges will fail Bonferroni in any large sweep, and you’ll under-discover.
The Benjamini–Hochberg (BH) procedure controls the False Discovery Rate (FDR) — the expected fraction of false positives among the tests you call significant, rather than the probability of any false positive at all. This is usually what you actually want: “of the 12 strategies I’m flagging as winners, I’m willing to tolerate ~10% being noise, in exchange for finding more real edges.”
Recipe: sort the N p-values ascending, p(1) ≤ p(2) ≤ … ≤ p(N); find the largest rank k such that p(k) ≤ (k/N)·α; declare tests 1 through k significant and reject the rest.
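A sketch of the procedure in numpy:

import numpy as np

def benjamini_hochberg(p_values, alpha=0.05):
    """Boolean mask of discoveries, controlling the FDR at alpha."""
    p = np.asarray(p_values)
    n = len(p)
    order = np.argsort(p)
    ranked = p[order]
    passing = ranked <= (np.arange(1, n + 1) / n) * alpha   # p_(k) <= (k/N) * alpha
    discoveries = np.zeros(n, dtype=bool)
    if passing.any():
        k = np.nonzero(passing)[0].max()      # largest passing rank (0-indexed)
        discoveries[order[: k + 1]] = True    # everything at or below that rank
    return discoveries

mask = benjamini_hochberg([0.001, 0.012, 0.034, 0.21, 0.60], alpha=0.05)
# -> the first two tests survive; the 0.034 fails because its rank threshold is 0.03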
BH is the right call for exploratory research: anomaly scans, parameter sweeps, multi-instrument hypothesis testing — situations where you can tolerate some false positives downstream because they get filtered by paper trading and the falsification suite, but you want to bound the noise rate so the candidate list is meaningful.
| Situation | Use | Why |
|---|---|---|
| About to deploy live capital on the “winner” | Bonferroni | Fake winner is expensive; conservative is correct |
| Filtering 80 scanner anomalies down to a candidate set | BH / FDR | Want a meaningful shortlist, not zero |
| Parameter sweep within one strategy | BH / FDR + plateau check | Combine FDR with sensitivity (Module 5.5) — isolated p-value spikes are suspect even if they survive correction |
| Cross-instrument hypothesis test (“does this work on the other 22 coins?”) | BH / FDR | You want a calibrated set of survivors to investigate further |
Practical note: track the total number of tests run on a research idea across the whole project lifetime, not just inside one notebook. If you tested 50 variants last week, killed them, and are now testing 50 more, the relevant N is 100. This is uncomfortable but honest. Selection bias compounds across sessions if you don’t.
Key Insight
The naive “p < 0.05” cutoff is a single-test concept. The moment you sweep, scan, or compare alternatives, you owe the data a correction — Bonferroni for confirmation, BH/FDR for exploration. Without it, your research pipeline is a noise factory that produces a steady stream of plausible-looking strategies that don’t survive deployment.
A positive backtest is the most dangerous moment in strategy development. It feels like validation. It is usually an illusion. This section explains the specific mechanisms by which backtests mislead, so you can defend against each one.
You tested 200 parameter combinations and picked the best. The strategy is not exploiting a market phenomenon — it is exploiting the specific random sequence of your historical data. It will fail on new data because it was built to fit the noise in old data. The more parameters you tune, the more opportunities for overfitting.
You tested 50 different strategy ideas and the one that worked is the one you are presenting. But if you test 50 random strategies, some will show positive results by chance alone. At a 5% significance level, you expect 2–3 false positives out of 50 tests. The strategy that “worked” might just be the lucky random one.
Subtle bugs that make the backtest easier than reality: using the close price to enter a trade on the same candle, not modelling slippage on large orders, ignoring the funding payments that accrue every 8 hours on perpetuals, or assuming fills at the mid-price when you would actually cross the spread. Each leak adds a few basis points of phantom edge.
War Story — Framework Working as Intended
One of our derivatives-based reversal candidates looked promising in early single-window testing. The early stage of stress-testing — rerunning on clean, full-period data with the same rules — surfaced the problem immediately: the apparent edge was concentrated inside one short window and reversed sign outside it. The candidate was killed at the falsification gate, before any capital was committed. The lesson is not that a bad strategy slipped through; it is the opposite. Stress-testing exists precisely so candidates like this one die in the lab. The framework worked because we tested before deploying.
Every strategy that passes backtesting must survive all six of these tests before it is considered for live deployment. Fail any one, and the strategy goes back to the lab or gets killed. There is no “well, it mostly passed.”
Move every tuneable parameter by ±10–20%. Does the edge survive? If the strategy only works at exactly the chosen parameters and collapses at nearby values, it is curve-fitted. Pass condition: Performance remains positive across the parameter neighbourhood. (Covered in Module 5.5)
Test the strategy on data it has never seen. Develop on 2020–2023, test on 2024–2026. Pass condition: OOS performance is in the same ballpark as in-sample. It doesn’t need to be identical, but it must be positive and directionally consistent.
Split your backtest by market regime: bull, bear, and chop. A strategy does not need to be profitable in all three, but you must know which regimes it works in and which it doesn’t. Pass condition: Profitable in at least two of three regimes, or clearly designated as a single-regime strategy with a regime gate (Module 10).
Run the strategy on data from a different exchange. The primary reasons performance can diverge across venues are deeper than “different volume profiles”: each venue has its own index price constituents (the basket of spot exchanges feeding the mark price — this directly drives liquidation prices, funding payments, and stop fills), its own liquidation engine mechanics (partial liquidation tiers, ADL queues, maintenance-margin schedules), and its own fee schedule (maker rebates, taker tiers, VIP discounts). On top of that sit venue-specific quirks — funding-rate caps, tick-size differences, COIN-M conventions, USDT-M vs COIN-M margining. Surface-level differences (close times, volume profiles) matter, but they are secondary. If the edge survives on one venue but dies on another, the dominant cause is usually one of the deep mechanics, not the cosmetics. Pass condition: Profit factor and direction are consistent across at least two data sources after applying each venue’s actual fee schedule, funding accrual, and liquidation rules.
Generate random entry signals as a baseline and compare your strategy against the distribution of random outcomes. The naive version (“random entries at the same frequency, beat the 95th percentile”) is only valid if the random baseline matches the strategy on every dimension that drives P&L. Otherwise the p-value is meaningless. The baseline must match: (a) average entry frequency (entries per year); (b) holding-period distribution (same mean and variance of trade duration, not just the mean); (c) time-in-market (% of bars in position); and (d) regime exposure — baseline trades must be drawn from the same regime mix the strategy actually traded in. If the strategy only enters in trending regimes, the baseline must be stratified to do the same; otherwise you are comparing strategy-in-trend vs random-in-everything, and the p-value is comparing two different distributions. If any of these don’t match, the placebo test is invalid and the p-value is misleading. Pass condition: Strategy performance exceeds the 95th percentile of matched random baselines (p < 0.05) and the matching dimensions are documented.
# Stratified-randomisation baseline construction
import numpy as np

def placebo_test(strategy_trades, bars, simulate_pnl, strategy_metric,
                 n_runs=1000, seed=0):
    """Random baselines matched on regime exposure and holding period.

    strategy_trades: list of (entry_time, hold_bars, regime_at_entry)
    bars:            DataFrame with a 'regime' column, indexed by time
    simulate_pnl:    the SAME execution model used for the real strategy
    """
    rng = np.random.default_rng(seed)
    metrics = np.empty(n_runs)
    for run in range(n_runs):
        baseline_trades = []
        for _entry_time, hold_bars, regime in strategy_trades:
            # Match regime exposure: only sample entry times from
            # bars in the SAME regime as the original entry
            candidates = bars.index[bars["regime"] == regime].to_numpy()
            random_entry = rng.choice(candidates)
            # Match holding period exactly
            baseline_trades.append((random_entry, hold_bars))
        metrics[run] = simulate_pnl(baseline_trades)    # same fees, same slippage
    return (metrics >= strategy_metric).mean()          # p-value

Split your data in half chronologically. Does the strategy work in the first half AND the second half? If it only works in one period, the edge may have been regime-specific or the market microstructure may have changed. Pass condition: Positive performance in both halves.
The falsification funnel. Each successive stage kills the vast majority of remaining candidates. By the time a hypothesis reaches paper trading, it has survived statistical, walk-forward, regime, cross-venue, placebo, and time-stability pressure. This is normal. This is the process working correctly. If every idea survived, your tests are not rigorous enough.
Not every failure means the strategy is worthless. Some failures point to fixable problems. Others point to fundamental issues. This section helps you distinguish between the two.
Key Insight
The default is kill. Tuning should be the exception, not the rule. The temptation to “fix” a failing strategy by adding parameters, filters, and exceptions is how overfitting happens. Every filter you add to rescue a strategy is an opportunity to fit to noise. Be honest with yourself: if the core signal is weak, no amount of filtering will make it strong.
Before any strategy goes to paper trading, it gets attacked by an independent reviewer — someone (or something) whose job is to find flaws. In our system, this means giving the strategy and its results to a different LLM with explicit instructions to destroy it.
Give a fresh LLM (one that did not help build the strategy) the following: the full strategy specification, the backtest code, the performance metrics, and the results of the falsification suite.
Then ask: “Your job is to find every reason this strategy might fail in live trading. Attack the methodology, the statistics, the assumptions, and the implementation. Assume the builder has confirmation bias. What are they not seeing?”
A good adversarial review will surface things like: ambiguities in the spec that the code silently resolved one way, missing handling for a gap down through the stop level, and returns concentrated in a handful of trades.
Each of these is either a fixable issue (update the spec, add gap-down handling) or a genuine threat (if 3 trades drive all the returns, the sample is too concentrated). The review process surfaces these before real money is at risk.
The question is not “how much should I buy?” The question is “how much am I willing to lose on this trade?” Position size is derived from risk tolerance, not from conviction or account size.
Every position size calculation follows this structure:
The position sizing chain: risk tolerance determines risk per trade, stop-loss distance determines position size, leverage determines margin required. You control the risk. The leverage is just plumbing.
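The chain as arithmetic — a sketch whose numbers echo the 2%-stop example in Module 7.2:

def position_size(equity: float, risk_pct: float, entry: float, stop: float) -> float:
    """Risk tolerance + stop distance --> notional size. Leverage is just plumbing."""
    risk_amount = equity * risk_pct                 # e.g. $10,000 * 1% = $100
    stop_distance = abs(entry - stop) / entry       # e.g. 2% adverse move
    return risk_amount / stop_distance              # notional that loses $100 at the stop

print(position_size(10_000, 0.01, entry=80_000, stop=78_400))   # -> 5000.0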
Under fixed-fractional sizing (always risk a fixed % of current equity), the account is never literally “depleted” by a fixed number of consecutive losses — each loss is smaller than the last in absolute terms. The correct compounding formula is:
equity_remaining = (1 - r)^N
where r is risk per trade and N is the number of consecutive losses.
| Risk Per Trade (r) | Equity remaining after 50 losses | Equity remaining after 100 losses | Appropriate For |
|---|---|---|---|
| 0.5% | ~77.8% | ~60.6% | Conservative, high-frequency strategies |
| 1.0% | ~60.5% | ~36.6% | Standard for most systematic strategies |
| 2.0% | ~36.4% | ~13.3% | Aggressive, high-conviction strategies |
| 5.0% | ~7.7% | ~0.6% | Dangerous — deep drawdowns are likely |
| 10%+ | ~0.5% | ~0.003% | Effectively gambling |
Start at 1%. 100 consecutive losses at 1% leaves you with roughly 36.6% of starting equity — a brutal 63% drawdown, but not zero. The real metric to focus on is probability of ruin (or probability of hitting a chosen drawdown threshold), which depends jointly on win rate, payoff ratio (avg win / avg loss), risk per trade, and the drawdown level you treat as ruin. Naive “consecutive-loss-to-zero” math both overstates safety (you don’t actually go to zero) and understates damage (you can hit a 50% drawdown long before any “ruin” threshold). Model probability of ruin explicitly using your validated strategy’s edge stats.
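A sketch of that explicit model — Monte Carlo over your validated edge stats (the numbers below are illustrative, not a recommendation):

import numpy as np

def prob_drawdown(win_rate, payoff, risk, n_trades=500,
                  dd_limit=0.50, sims=10_000, seed=0):
    """P(hitting a drawdown threshold) under fixed-fractional sizing."""
    rng = np.random.default_rng(seed)
    hits = 0
    for _ in range(sims):
        equity = peak = 1.0
        for won in rng.random(n_trades) < win_rate:
            equity *= 1 + (risk * payoff if won else -risk)
            peak = max(peak, equity)
            if equity / peak - 1 <= -dd_limit:
                hits += 1
                break
    return hits / sims

# 40% hit rate, 3:1 payoff, 1% risk: how often does a 50% drawdown appear?
print(prob_drawdown(0.40, 3.0, 0.01))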
Key Insight
Position sizing is the only lever you have that affects risk without changing the strategy. The same strategy at 1% risk per trade and 5% risk per trade has identical signals, identical win rate, and identical profit factor. The only difference is that the 5% version can blow up 5x faster during a drawdown. Size conservatively. You can always add leverage later. You cannot un-lose money.
A stop-loss is your contract with reality: “if I am wrong by this much, I accept I am wrong and exit.” Every position must have one. No exceptions.
Exit if the position moves X% against you. Simple, predictable, easy to calculate position size from. Example: 2% stop on a $5,000 position = $100 max loss. Best for: strategies where the entry logic is precise and you know exactly how much adverse movement is acceptable.
The stop moves in your favour as the trade progresses but never moves against you. Example: 20% trailing stop on a long trade — if BTC hits $100,000 from an entry at $80,000, the stop moves to $80,000 (20% below the peak). If BTC then drops to $80,000, you exit. Locks in profits during extended moves.
Exit when an indicator signals the trade thesis is invalidated. Example: exit a trend-following long when the SMA slope turns negative. This is the approach our weekly strategy uses. The stop is logical, not arbitrary.
Exit if the trade hasn’t reached its target within N candles. Prevents capital being tied up in dead trades. Example: if the trade hasn’t moved +2% in 14 days, exit at market.
Critical: Exchange-Side Stop-Losses
Your bot’s internal stop-loss is not enough. Bots crash. Servers go offline. Network connections drop. Every leveraged position must have an exchange-side stop-loss order placed at the time of entry. This means even if your bot is completely dead, the exchange will close the position at your predetermined price. This is non-negotiable for any leveraged system.
“Place a stop” sounds like one button. It isn’t. The flags you set on that order determine whether it does what you actually wanted in adverse conditions. The following are the parameters every operator should consciously choose, not accept by default.
Most perpetual venues let you trigger a stop on either the mark price (an index-derived fair value, often a moving average of multiple spot venues) or the last traded price. The trade-off: mark price is smoother and resistant to a single-venue flash wick, but it lags the local tape in a fast move; last price reacts at wick speed to what is actually trading on that venue, but a one-print anomaly can fire your stop on a move that never really happened.
Default to mark price for protective stops. Last price is acceptable only when you specifically need wick-speed reaction and your liquidity is deep enough that wicks reflect real flow.
A stop should only ever close exposure, never open new exposure. Set reduceOnly = true on every stop order. Without this flag, an edge case can flip you into a doubled position: the entry order is still partially filling when the stop fires, the stop sells the full intended size, and you end up short the unfilled portion. Reduce-only tells the venue “this order can only reduce or close my position; if there’s nothing to close, do nothing.” Belt-and-braces against the partial-fill race.
Every order has a TIF that governs how long it lives: GTC (good-til-cancelled) rests until filled or explicitly cancelled; IOC (immediate-or-cancel) fills whatever it can instantly and cancels the remainder; FOK (fill-or-kill) fills in full immediately or not at all. Protective stops should almost always be GTC — a stop that quietly expired is no stop at all.
If your strategy has both a stop-loss and a take-profit on the same position, you want them linked as an OCO (one-cancels-the-other) pair where the venue supports it: when one fills, the other cancels automatically. Otherwise the surviving order remains live with no position behind it — and on the next move it opens a fresh position in the wrong direction.
Most venues offer two position modes: one-way mode (a single net position per symbol — a buy and a sell offset each other) and hedge mode (separate long and short positions can coexist on the same symbol).
Stops behave differently across modes — a reduce-only sell stop in hedge mode reduces your long position; the same order in one-way mode could open a short if your long has already closed. Choose a mode explicitly per venue and document it in your config. Mismatch between local-state assumptions and venue-side mode is a classic source of phantom positions.
Your entry order is for 1.0 BTC; the venue fills 0.6 BTC and you decide to cancel the remainder. Your initial stop was sized for 1.0 BTC. Now it’s wrong — if it fires, it tries to sell 1.0 BTC when you hold only 0.6 (or, worse, with reduce-only off, it flips you short 0.4). The pattern:
def on_partial_fill(order, filled_qty, stop_price, existing_stop_order=None):
    current_position = filled_qty          # what you actually hold
    if existing_stop_order:
        cancel(existing_stop_order)        # remove the wrongly-sized stop
        wait_for_cancel_ack()              # confirm before placing new
    new_stop = place_stop(
        symbol=order.symbol,
        side=opposite(order.side),
        qty=current_position,              # match the actual fill
        trigger=stop_price,
        trigger_src="mark",
        reduceOnly=True,
        tif="GTC",
    )
    persist(new_stop.id)
Stop-losses protect individual trades. Circuit breakers protect the entire account. They are the emergency brake that stops everything when conditions become extreme.
| Trigger | Action | Resume Condition |
|---|---|---|
| Account drawdown exceeds your “soft” threshold (calibrated to your Monte Carlo distribution) | Close all positions, halt new entries | Manual review + a defined cooling-off pause |
| Consecutive-loss streak exceeds your threshold (calibrated to your hit rate and signal frequency) | Pause new entries for a defined window | Automatic resume after the window expires |
| Exchange API errors exceed threshold | Halt all trading, alert operator | Manual verification that API is working |
| Position reconciliation fails | Halt new entries, alert operator | Manual reconciliation of actual vs expected positions |
At the account level, set an explicit drawdown threshold calibrated to your strategy’s expected drawdown profile — an absolute circuit breaker. If the account drops past that threshold from its peak, everything stops. All positions are closed. The system enters a mandatory cooling-off pause.
The reason to fix this number in advance is asymmetry: recovering from a 40% drawdown requires a 67% gain (achievable); recovering from a 70% drawdown requires a 233% gain (functionally starting over). The circuit breaker exists to prevent the drawdown from ever reaching the point of no return — and to take the decision out of your hands when you’re emotional.
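In code, the account-level breaker is a guard evaluated before any new entry — a sketch with hypothetical state fields and helpers (close_all_positions, alert):

def entries_allowed(state, equity, now) -> bool:
    """Account-level circuit breakers, checked before every new entry."""
    state.peak_equity = max(state.peak_equity, equity)
    drawdown = equity / state.peak_equity - 1

    if drawdown <= -state.dd_soft_limit:           # calibrated past the MC 95th
        close_all_positions()                      # hypothetical helper
        alert("drawdown circuit breaker tripped")  # hypothetical helper
        state.halted_until = now + state.cooling_off
    if state.consecutive_losses >= state.loss_streak_limit:
        state.paused_until = now + state.pause_window   # auto-resumes later

    return now >= state.halted_until and now >= state.paused_until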
Practical Advice
Calibrate your circuit breakers using your Monte Carlo results (Module 5.3). If the 95th-percentile drawdown from Monte Carlo is X, set your circuit breaker a few points beyond X. This gives the strategy room to operate within its expected range while protecting against genuine failure. The same principle applies to consecutive-loss thresholds and reconciliation cadence: pick numbers calibrated to your strategy’s actual loss-streak distribution and the latency you can tolerate between an exchange-side change and your system noticing it.
Running multiple strategies introduces a new dimension of risk: correlation. Two strategies that are independently profitable can blow up together if they are correlated — meaning they both lose at the same time.
If you run a trend-following long strategy and a momentum long strategy on BTC, both will lose during a sudden market crash. Your portfolio drawdown is not the average of the two strategies — it’s additive. Two -15% drawdowns happening simultaneously become a -30% portfolio drawdown.
Mitigation strategies: treat highly correlated strategies as one position when sizing (cap their combined risk); measure rolling correlation between strategy return streams and de-rate size when it climbs, as sketched below; mix directions and mechanisms (trend plus mean-reversion, long plus short); and stagger timeframes so drawdowns don’t synchronise.
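Measuring the problem is cheap once you log per-strategy daily returns (a sketch; the column names are hypothetical):

import pandas as pd

# daily_returns: DataFrame with one column of daily P&L per strategy
recent_corr = daily_returns.tail(90).corr()       # pairwise, last ~90 days
rolling_pair = daily_returns["trend_btc"].rolling(90).corr(daily_returns["momo_btc"])
# Persistently above ~0.7: size the pair as if it were a single strategy.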
Key Insight
The most underrated risk in crypto is that everything is correlated during a crash. BTC, ETH, SOL, altcoins — they all drop together during a market-wide deleveraging event. Cross-asset diversification within crypto alone is limited. True diversification requires non-crypto assets (FX, commodities, indices) or strategies that profit from crashes (shorts, volatility strategies).
Before writing code, you need to decide how the system is structured. This decision affects everything: how easy it is to add strategies, how failures propagate, how you monitor and debug.
| Architecture | Description | When to Use |
|---|---|---|
| Single Script | One Python file does everything: fetch data, calculate signals, place orders | First prototype, one strategy, one exchange |
| Modular Monolith | One application with separate modules for data, strategy, execution, and monitoring | 1–3 strategies, one exchange, serious but not complex |
| Per-Exchange Containers | Each exchange gets its own Docker container with the full strategy stack. Shared data layer. | Multiple exchanges, multiple strategies, production deployment |
Our production system uses per-exchange containers. Each exchange runs in its own Docker container with its own strategy engine, order executor, and state management. They share a data layer (candle database) and a regime detection service. If one container crashes, the others keep running.
Per-exchange container architecture. Each container is independent and can crash without affecting others. Shared services provide data and regime detection.
Do not start with per-exchange containers. Start with a single script. Get it working. Then refactor into modules. Then containerise. Premature architecture is as dangerous as premature optimisation.
Every trading system, regardless of architecture, needs these six components. Miss any one and the system has a critical gap.
Pulls candle data from the exchange API, validates it (Module 3.4), and stores it. Runs on a cron schedule (e.g., daily at 00:30 UTC). Must handle: API rate limits, pagination, incomplete candle correction (overlap window), and network failures.
Loads candle data, calculates indicators, evaluates entry/exit conditions and gates, and produces a signal: BUY, SELL, or HOLD. Must be deterministic: same input always produces same output. All parameters come from a config file, not hardcoded values.
Translates signals into exchange API calls. Handles: order placement, order status checking, partial fills, order cancellation, retry on transient errors, and permanent error classification. Must know the difference between “try again in 5 seconds” and “stop, this will never work” (e.g., insufficient balance, invalid symbol).
Periodically checks: what does the bot think its position is vs what the exchange actually shows? If they differ, something went wrong. This catches: phantom positions (bot thinks it’s in a trade but isn’t), untracked external closes, and failed order acknowledgements.
Persists the bot’s state to disk (an embedded SQL database, JSON, or another local store) so it can resume correctly after a restart. State includes: current position, entry price, stop-loss level, strategy-specific variables, and last processed candle timestamp. Without this, a restart means the bot doesn’t know if it’s in a trade.
A way to see what the bot is doing and get notified of important events. Minimum: instant-messaging alerts (a chat-based alert bot) for trade entries, exits, and errors. Better: a web dashboard showing current position, recent trades, and system health. Our production systems use a Python web framework for internal dashboards and a chat-based instant-messaging channel for real-time alerts.
War Story
Our order executor initially treated whole ranges of exchange errors as “transient” (retryable). This meant permanent errors — “IP not whitelisted,” “bad authentication,” “insufficient balance,” “parameter error,” “position-size violation” — were retried hundreds of times over hours before giving up. The fix: a small allowlist of genuinely transient error codes (rate-limit, network-timeout, temporary-server-error) documented per exchange. Everything else is classified as permanent and fails immediately. Error classification is not glamorous work, but it’s the difference between a system that recovers gracefully and one that hammers a dead API for hours.
The executor sits between intent (“buy 1 BTC at market”) and reality (a possibly-partial fill on a possibly-flaky API). The patterns below are what separate a toy executor from one you can leave running unattended.
Every order has a deterministic clientOrderId derived from the underlying intent — not a fresh UUID per call. If you crash mid-submit and retry, the venue dedupes on the ID and gives you back the existing order rather than creating a duplicate. Pattern:
import hashlib

def make_client_order_id(strategy_id, symbol, intent_ts, nonce):
    # Deterministic from intent. Same inputs --> same ID.
    raw = f"{strategy_id}|{symbol}|{intent_ts}|{nonce}"
    return hashlib.sha256(raw.encode()).hexdigest()[:32]

def submit_idempotent(intent):
    coid = make_client_order_id(
        intent.strategy_id, intent.symbol,
        intent.intent_ts, intent.nonce,
    )
    try:
        return venue.place_order(client_order_id=coid, **intent.params)
    except VenueError as e:
        if e.code in {"DUPLICATE_CLIENT_ORDER_ID", "ORDER_ALREADY_EXISTS"}:
            return venue.get_order_by_client_id(coid)  # already accepted
        raise

Most tier-1 venues honour clientOrderId for deduplication for at least a few hours. Read your venue’s docs for the dedup window and design your retry policy to fit inside it.
Maintain a tight allowlist of retryable error codes per venue. Everything not on the list fails fast.
RETRYABLE = {
"RATE_LIMIT", # HTTP 429 or venue-specific
"NETWORK_TIMEOUT", # transport-level
"TEMP_SERVER_ERROR", # 5xx
"VENUE_OVERLOAD", # documented transient
}
PERMANENT = {
"INVALID_SIGNATURE", "INVALID_TIMESTAMP", # config error
"INSUFFICIENT_BALANCE", "POSITION_LIMIT", # state error
"INVALID_SYMBOL", "INVALID_PARAMETER", # logic error
"IP_NOT_WHITELISTED", "PERMISSION_DENIED", # auth error
}
# Anything not in either set: log, alert, treat as permanent until classified.Never retry on permanent errors. Hammering a dead API doesn’t fix it; it just buries the real problem under noise and burns your rate-limit budget.
To move a stop or change a price, you have two options: cancel-and-replace (cancel the old order, wait for the ack, submit a new one — leaving a window with no stop working) or amend-in-place (a single atomic modify request, where the venue supports it).
Prefer amend wherever the venue supports it. Especially for stop-loss adjustments, where the exposure window between cancel and replace is exactly the window during which you might need the stop.
Track filled_qty separately from order_qty in local state. Decide a policy per intent: let the remainder rest until filled or timed out, cancel the remainder and run with the partial position, or cancel and re-quote the unfilled balance at a new price.
The choice is strategy-dependent; the requirement is that you make it explicitly, encode it in config, and re-size every dependent order (stop, take-profit) to match the actual filled quantity (Module 7.2).
Two ways to learn what happened to your order: poll the REST endpoint or subscribe to the venue’s private order-update websocket. Differences: polling is simple and returns authoritative snapshots, but it burns rate-limit budget and can be a full poll interval late; the websocket pushes changes near-instantly, but it can drop, buffer, or replay messages, so it needs the gap-detection and dedupe machinery from Module 3.6.
Use websocket for order updates wherever the venue supports it. Keep polling as a fallback for reconnection scenarios and for the reconciliation pass — the websocket is for “tell me what changed,” the REST poll is for “tell me ground truth.”
The executor checklist: a deterministic clientOrderId for dedup on retry; a per-venue allowlist of transient errors, with everything else failing fast; amend over cancel-and-replace for working stops; an explicit partial-fill policy with dependent orders re-sized; and websocket updates for change notification, backed by REST polling for ground truth.
Your bot will crash. Your server will restart. The exchange will go down for maintenance. The question is not whether this happens, but whether your system recovers correctly when it does.
Store the bot’s state — current position, entry price, stop-loss level, strategy-specific variables, last processed candle timestamp — in an embedded SQL database or a JSON file. Update it after every state change. Read it on startup.
Three tables are the irreducible core: orders, positions, fills. Use any embedded SQL store you like — the shape is what matters. Schemas below are vendor-agnostic.
-- orders: every order ever submitted, current and historical
CREATE TABLE orders (
id INTEGER PRIMARY KEY AUTOINCREMENT, -- monotonic local ID
client_order_id TEXT NOT NULL UNIQUE, -- deterministic; idempotency key
strategy_id TEXT NOT NULL,
symbol TEXT NOT NULL,
side TEXT NOT NULL CHECK (side IN ('buy','sell')),
order_type TEXT NOT NULL CHECK (order_type IN ('market','limit','stop','stop_limit')),
qty REAL NOT NULL,
price REAL, -- NULL for market
status TEXT NOT NULL CHECK (status IN
('PENDING','SUBMITTED','ACK','PARTIAL_FILL',
'FILLED','CANCELLED','REJECTED','UNKNOWN')),
submitted_at INTEGER NOT NULL, -- epoch ms
last_updated_at INTEGER NOT NULL,
exchange_order_id TEXT, -- assigned by venue; NULL until ACK
error_code TEXT,
error_message TEXT
);
CREATE INDEX idx_orders_strategy_status ON orders (strategy_id, status);
CREATE INDEX idx_orders_symbol_status ON orders (symbol, status);
-- positions: net exposure per (strategy, symbol)
CREATE TABLE positions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
strategy_id TEXT NOT NULL,
symbol TEXT NOT NULL,
side TEXT NOT NULL CHECK (side IN ('long','short','flat')),
qty REAL NOT NULL,
avg_entry_price REAL NOT NULL,
unrealised_pnl REAL NOT NULL DEFAULT 0,
realised_pnl REAL NOT NULL DEFAULT 0,
opened_at INTEGER NOT NULL,
closed_at INTEGER -- NULL while open
);
CREATE INDEX idx_positions_strategy_symbol ON positions (strategy_id, symbol);
-- fills: every execution event the venue reports
CREATE TABLE fills (
id INTEGER PRIMARY KEY AUTOINCREMENT,
order_id INTEGER NOT NULL REFERENCES orders(id),
exchange_fill_id TEXT NOT NULL UNIQUE, -- dedup on replays
qty REAL NOT NULL,
price REAL NOT NULL,
fee REAL NOT NULL,
fee_currency TEXT NOT NULL,
ts INTEGER NOT NULL -- epoch ms
);
CREATE INDEX idx_fills_order ON fills (order_id);

Two design notes:
- client_order_id is UNIQUE — the database itself enforces idempotent submission. A retry that re-inserts the same intent gets a constraint violation, not a duplicate order.
- exchange_fill_id is UNIQUE in fills — a websocket replay or a polling-overlap won’t double-count a fill into your PnL.

An order moves through a finite set of states. Every transition has a trigger, every terminal state is reached intentionally or by timeout.
- SUBMITTED → ACK when the venue accepts the order and returns an exchange_order_id.
- On restart or reconnect, look each open row up by client_order_id and transition it to its real state.
- If a submit gets no response, mark the row UNKNOWN and trigger reconciliation by client_order_id. Terminal states (FILLED, CANCELLED, REJECTED) are write-once.

Order lifecycle state machine. Every transition has a trigger; every terminal state is reached intentionally or by timeout. The UNKNOWN state is the recovery hatch — no order ever stays lost.
Trigger and timeout rules:
- PENDING is created by the submit() call. Persist the row before the network call so a crash after-send leaves a recoverable record.
- ACK is set when the venue responds with an exchange_order_id.
- UNKNOWN is resolved by client_order_id: query the venue, find the order’s real state, transition the row.

Idempotent clientOrderId Pattern
Generate clientOrderId = hash(strategy_id + symbol + intent_timestamp + nonce) — deterministic from intent. On the wire, your submit code is just if not exists(clientOrderId): submit_order(...). The venue dedupes on its side; your DB’s UNIQUE constraint dedupes on yours. A network retry, a process restart mid-submit, a re-sent message from a flaky pipeline — none of them can produce a doubled position. This single pattern eliminates an entire class of phantom-position bugs.
Treat the exchange as the source of truth for state (positions, order statuses, fills). Treat your local DB as the source of truth for intent (strategy_id, signal_id, the why behind each order). Reconciliation is the periodic alignment of these two.
def reconcile(strategy_id, now):
    # 1. Pull ground truth from venue
    venue_orders = venue.get_open_orders(strategy_filter=strategy_id)
    venue_positions = venue.get_positions(strategy_filter=strategy_id)

    # 2. Pull local view
    local_orders = db.select_open_orders(strategy_id)
    local_positions = db.select_positions(strategy_id)

    # 3. Diff and classify each discrepancy
    diffs = []
    for vo in venue_orders:
        lo = find_by_client_id(local_orders, vo.client_order_id)
        if lo is None:
            diffs.append(("MISSING_LOCALLY", vo))  # venue has it, we don't
        elif lo.status != vo.status or lo.qty != vo.qty:
            diffs.append(("STATE_MISMATCH", lo, vo))  # status or qty differ
    for lo in local_orders:
        if not any(vo.client_order_id == lo.client_order_id for vo in venue_orders):
            diffs.append(("MISSING_ON_EXCHANGE", lo))  # we have it, venue doesn't
    # Same diff for positions, comparing (symbol, side, qty).

    # 4. Apply resolution rules atomically
    with db.transaction():  # all-or-nothing
        for d in diffs:
            kind = d[0]
            if kind == "MISSING_LOCALLY":
                # Venue wins on existence + state; we annotate with intent if recoverable
                db.insert_order_from_venue(d[1], strategy_id=strategy_id)
                log.warning("reconcile.insert", coid=d[1].client_order_id)
            elif kind == "STATE_MISMATCH":
                # Venue wins on qty/status; local keeps strategy_id, signal_id
                db.update_order_state(d[1].id, status=d[2].status, qty=d[2].qty)
                log.warning("reconcile.update", coid=d[1].client_order_id)
            elif kind == "MISSING_ON_EXCHANGE":
                # Order is gone (filled, cancelled, expired). Mark terminal, fetch final state.
                final = venue.get_order_history(d[1].client_order_id)
                db.update_order_state(d[1].id, status=final.status)
                log.warning("reconcile.terminal", coid=d[1].client_order_id)
    return diffs

Resolution rules in one line: exchange wins on state (qty, status, fills); local wins on intent metadata (strategy_id, signal_id, the reason this order exists).
- Event-triggered reconciliation (after an ambiguous submit, cancel, or timeout) can be scoped to the affected client_order_id, not the full sweep.

All updates from a single reconciliation pass run inside one DB transaction. Either every diff is applied or none is — you never want a torn state where half the positions match and half don't after a crash mid-loop.
Two Generals’ Problem
You cannot guarantee that exchange and local agree at any single instant. Between “I sent the cancel” and “I learned the cancel was processed,” reality and your view of reality are different. This is fundamental, not a bug to fix — the same impossibility result that prevents two generals from coordinating an attack over an unreliable channel applies here. Design for eventual consistency with a bounded delay: after at most one reconciliation cycle, local and remote should agree. Document the bound. Alert when it’s breached. Don’t pretend you’ve eliminated the gap — you haven’t; you’ve only narrowed it.
War Story
A pending entry order expired as “failed_permanent” but the state management code didn’t roll back properly. The strategy retained the entry_price and peak_price from the signal time, putting it in “holding mode” with no actual position. It was managing a phantom position for 6 days — trailing a stop on nothing. The fix: when any entry order fails, explicitly clear entry_price and peak_price back to null. The reconciliation loop would have caught this within an hour, but the state rollback prevented it from happening in the first place.
- orders, positions, and fills tables with appropriate uniqueness constraints
Strategy parameters should live in config files, not in code. This lets you change thresholds, add strategies, and adjust risk without modifying source code or redeploying.
# config/strategies/sma4_weekly.yaml
strategy:
  name: "SMA4 Weekly Slope"
  enabled: true
  direction: long_only
  timeframe: weekly

  entry:
    sma_period: 4
    slope_threshold: 0          # slope > 0 to enter
    close_position_min: 0.75    # CP gate
    efficiency_ratio_min: 0.20  # ER gate

  exit:
    slope_exit: true            # exit when slope turns negative
    crash_exit: -0.15           # exit on 15% weekly drop

  risk:
    position_pct: 1.0           # 100% of capital (spot, no leverage)
    stop_loss_pct: 0.20         # 20% trailing stop
    stop_type: exchange_side    # placed as exchange order

  trend_resumption:
    enabled: true
    momentum_lookback: 2        # re-enter if close > close[2 weeks ago]
YAML configuration for a strategy. Every parameter is explicit. Changing a threshold is a config edit, not a code change.
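A sketch of the loading side, assuming PyYAML and the file above; the EntryConfig dataclass and its validation thresholds are illustrative, not a prescribed schema:

from dataclasses import dataclass

import yaml  # PyYAML

@dataclass(frozen=True)
class EntryConfig:
    sma_period: int
    slope_threshold: float
    close_position_min: float
    efficiency_ratio_min: float

def load_strategy_config(path: str) -> dict:
    with open(path) as f:
        cfg = yaml.safe_load(f)["strategy"]
    entry = EntryConfig(**cfg["entry"])  # TypeError on missing or unknown keys
    assert 0 < entry.efficiency_ratio_min < 1, "ER gate must be a fraction"
    return cfg

cfg = load_strategy_config("config/strategies/sma4_weekly.yaml")

The dataclass round-trip is a cheap guard: a typo'd key in the YAML fails at load time, not at the first live signal.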
You do not need to be a professional software engineer to build a trading system. Modern AI coding assistants can write, debug, and refactor code at a level that would have required years of experience five years ago. Here is how to use them effectively.
| Tool | Best For | Access |
|---|---|---|
| Claude Code (CLI) | Full system builds — reads your codebase, writes files, runs tests | Terminal / IDE extension |
| ChatGPT | Exploration, hypothesis generation, explaining concepts | Browser / app |
| Replit Agent | Quick prototypes if you don’t have a server yet | Browser |
For building the actual production system, a CLI-based AI tool that can read your files, run your tests, and edit your code directly is dramatically more productive than copy-pasting between a chat interface and a text editor.
Your sizing function says “buy 0.13427 BTC at $94,517.83”. The venue rejects it. Then it rejects the next one, and the next, while you watch your bot fire and miss for an hour straight. Welcome to contract math — the unglamorous layer between “intended order” and “order the venue will actually accept.”
Every symbol on every venue advertises four numerical constraints. An order that violates any of them is rejected; you don't get partial credit for getting three out of four right.
- Minimum quantity (minQty). The smallest size the venue will accept. Below this, your order is rejected with a "below minimum" error. Different per symbol; sometimes different across the same symbol on linear vs inverse contracts.
- Quantity step (stepSize). Quantity must be a multiple of this increment. If stepSize = 0.001, then 0.13427 is invalid; 0.134 is valid. Round down (never up — rounding up can push you over your risk budget).
- Price tick (tickSize). Limit-order price must be a multiple of this increment. If tickSize = 0.5, then $94,517.83 is invalid; $94,517.50 is valid. For a buy, round down (you never pay above your intended price, at a slight cost in fill probability); for a sell, round up. The exact convention varies; pick one and document it.
- Minimum notional (minNotional). The order's value — quantity × price — must clear a venue-wide floor (often $5, $10, or similar). This is independent of minQty; an order can pass minQty and still fail minNotional if the price is low enough. Especially relevant for low-priced altcoins and for sizing-down trades during drawdowns.

Fetch these constraints from the venue's symbol-info / instrument endpoint at startup, cache them, and refresh on a schedule. They do change — venues adjust tick size after sustained price moves and shift step size for new contract series.
Two perpetual contract families dominate. They look superficially similar in a venue UI but the P&L math is fundamentally different, and getting them confused will cause your sizing to be off by a factor that depends on price.
- Linear (quote-margined): P&L = qty × (exit_price - entry_price) for a long. Long BTC at $90,000, exit at $100,000, qty 0.1 → P&L = 0.1 × $10,000 = $1,000. Simple.
- Inverse (coin-margined): P&L (in coin) = qty_usd × (1/entry_price - 1/exit_price) for a long. Same trade, expressed as "long $9,000 of BTC at $90,000, exit at $100,000" → P&L = 9000 × (1/90000 - 1/100000) ≈ 0.01 BTC.

Two consequences operators repeatedly miss: inverse P&L is denominated in the coin, so the fiat value of your profit itself moves with the market; and a sizing function written for linear contracts, applied to an inverse contract, is wrong by a factor that depends on price, because quantity is specified in USD rather than coin.
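The two formulas side by side, as a sketch; the asserts reproduce the example trade above:

def linear_pnl_usd(qty_coin: float, entry_price: float, exit_price: float) -> float:
    # Linear contract: P&L lands in the quote currency (USD)
    return qty_coin * (exit_price - entry_price)

def inverse_pnl_coin(qty_usd: float, entry_price: float, exit_price: float) -> float:
    # Inverse contract: P&L lands in the coin itself
    return qty_usd * (1.0 / entry_price - 1.0 / exit_price)

assert abs(linear_pnl_usd(0.1, 90_000, 100_000) - 1_000.0) < 1e-9
assert abs(inverse_pnl_coin(9_000, 90_000, 100_000) - 0.01) < 1e-9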
Rather than scatter rounding logic across every place that places an order, isolate it in one helper that takes intended values and returns venue-compliant ones (or raises). The contract is small and explicit:
from math import ceil, floor

class OrderTooSmall(Exception):
    pass

class RoundingHelper:
    def __init__(self, symbol_info):
        self.min_qty = symbol_info.min_qty
        self.step_size = symbol_info.step_size
        self.tick_size = symbol_info.tick_size
        self.min_notional = symbol_info.min_notional
        self.contract_type = symbol_info.contract_type  # "linear" | "inverse"

    def prepare(self, intended_qty, intended_price, side):
        # 1. Round qty DOWN to step
        qty = floor(intended_qty / self.step_size) * self.step_size
        # 2. Round price to tick (down for buy, up for sell)
        if side == "buy":
            price = floor(intended_price / self.tick_size) * self.tick_size
        else:
            price = ceil(intended_price / self.tick_size) * self.tick_size
        # 3. Validate against floors
        if qty < self.min_qty:
            raise OrderTooSmall(f"qty {qty} below min {self.min_qty}")
        if qty * price < self.min_notional:
            raise OrderTooSmall(f"notional {qty*price} below min {self.min_notional}")
        return qty, price

Two non-obvious choices baked in: rounding qty down (so we never accidentally exceed our risk budget by rounding up to the next step), and raising on impossible orders rather than silently shrinking. A silent shrink-to-zero is worse than a loud rejection — the loud rejection bubbles up and your strategy can decide whether to skip the trade or alert.
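Usage, with a hypothetical SymbolInfo carrying the four constraints (values illustrative):

from collections import namedtuple

SymbolInfo = namedtuple("SymbolInfo",
                        "min_qty step_size tick_size min_notional contract_type")

helper = RoundingHelper(SymbolInfo(min_qty=0.001, step_size=0.001,
                                   tick_size=0.5, min_notional=10.0,
                                   contract_type="linear"))
qty, price = helper.prepare(0.13427, 94_517.83, side="buy")
# qty   -> 0.134    (rounded DOWN to the 0.001 step, up to float fuzz)
# price -> 94_517.5 (rounded DOWN to the 0.5 tick for a buy)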
War Story: The First Fifty Orders Were All Rejections
A bot went live on a new symbol. Sizing function emitted clean fractional quantities. Venue’s step size was 0.001; the bot was emitting 0.0014213 with five digits of precision. Every order: rejected. The bot didn’t crash — it logged the rejection, moved on, and waited for the next signal. Forty-seven signals over the next eight hours, all rejected, none caught because the alert threshold for “high reject rate” was set at “5 in 5 minutes” and the signal frequency was lower than that. The fix was twelve lines of code. The miss was an entire day’s opportunity. Lesson: validate the rounding helper end-to-end on a one-tick test order on every new symbol you add, not just symbols you’ve traded before.
- A RoundingHelper that enforces minQty, stepSize, tickSize, and minNotional

The environment you use to discover a strategy and the environment you use to run a strategy have different requirements that conflict at every turn. Trying to satisfy both inside one environment gives you neither: a research environment too lean to explore in, or a production environment too fat to trust.
Research is exploratory. You want notebooks, large historical datasets sitting on disk, a half-dozen plotting libraries, the ability to reach for a GPU when you decide to fit something heavier, and tolerance for mutable state — you re-run cells, you keep variables around, you experiment. Dependencies sprawl naturally because you don’t know in advance what you’ll need.
Production is the opposite. You want a small, deterministic, immutable container that does exactly one thing. Every dependency in production is a security and reliability surface; every megabyte of image is something that has to download and start cleanly when a host fails over. The environment must be reproducible byte-for-byte from version control. Mutable state is your enemy.
The conflict is total. A single environment that satisfies research also drags Jupyter, plotting libraries, two ML frameworks, and a CUDA stack into your live trading container — multiplying the surface area of what can break and what can be exploited, while making the container slow to start and impossible to audit.
The asymmetry doesn’t mean two parallel implementations — that’s the worst of both worlds. The split that works:
- The research notebook imports the strategy — from lib.strategies.my_strategy import generate_signals — exactly like the production runner does. The research environment provides the data, the exploration tools, the plotting; the strategy logic is shared.
- The production runner calls the same generate_signals function with live data and routes the output to the order executor.

If the research notebook and the production runner are calling the same function with the same arguments, their behaviour is identical by construction. Bugs you find in one are fixed in the other for free. Backtest-vs-live divergence becomes a data problem, never a code problem.
The single most common environment-separation failure looks like this:
# In production_runner.py, deep in the strategy module
from research.notebooks.helpers import compute_indicator
Now your production code path imports a notebook helper. Three things have just gone wrong: production now requires Jupyter to be installed; production behaviour depends on a file that lives in the “mutable, exploratory” part of your repo; and a researcher refactoring their notebook has just changed live trading behaviour without realising it. The file path is the bug.
The remedy is mechanical: the production environment cannot import anything outside lib/. Enforce it with import path discipline; if you have a build pipeline, fail the build when production code imports from research/.
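One way to make the rule mechanical, as a sketch: a stdlib-only test that fails the build when anything under production/ imports from research/. Paths assume the layout shown next; the string match is deliberately crude and can be tightened with an AST walk:

# tests/test_import_discipline.py
import pathlib

FORBIDDEN = ("import research", "from research")

def test_production_never_imports_research():
    repo_root = pathlib.Path(__file__).resolve().parents[1]
    for py_file in (repo_root / "production").rglob("*.py"):
        source = py_file.read_text()
        for pattern in FORBIDDEN:
            assert pattern not in source, f"{py_file} imports from research/"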
The simplest layout that enforces the split:
repo/
├── lib/ # SHARED (production must only import from here)
│ ├── strategies/
│ │ └── my_strategy.py # pure functions: data in, signals out
│ ├── indicators/
│ ├── execution/
│ └── risk/
├── research/ # NEVER imported by production
│ ├── notebooks/
│ ├── adhoc_scripts/
│ └── data/ # large local datasets
├── production/ # the live runner
│ ├── runner.py # imports from lib/ only
│ ├── Dockerfile # small, lean, deterministic
│ └── requirements.txt # minimal
└── tests/
    └── strategies/        # tests run on lib/ — same code as production

The path from idea to live is the same every time, with explicit gates:
1. Explore in research/. The output is a clear yes/no on whether to invest the engineering effort to formalise it.
2. Formalise into lib/strategies/. Refactored into a pure function with explicit inputs and outputs. The notebook now imports the function and uses it; nothing strategy-relevant lives in the notebook anymore.
3. Promote. The same generate_signals the notebook uses, the production runner will use, and the backtester uses — one code path, three call-sites.

At every stage the strategy logic is the same shared code. The only thing changing is the data (historical vs live) and the side-effects (none in research; orders submitted in production).
Key Insight
“Research code” and “production code” is the wrong frame. There is strategy code, which is the same in both, and there is scaffolding — notebooks, plotters, datasets in research; runner, executor, watchdog in production — which is different by necessity. Get the shared/scaffold split right and the “backtest worked but live doesn’t” class of bug largely disappears.
- Strategy logic lives in lib/ and is imported, byte-identical, from both research notebooks and the production runner
- Production never imports from research/
Your trading bot runs 24/7. It cannot run on your laptop. You need a server — a virtual private server (VPS) in the cloud that is always on, always connected, and accessible from anywhere.
We run all production trading infrastructure on modest dedicated hardware (a few cores, ~64GB RAM) from a tier-1 European dedicated-server provider. Why this class of provider rather than a hyperscaler: dedicated hardware at a flat, predictable monthly price, no noisy-neighbour contention, and no surprise egress billing.
For a first system, a small shared VPS is sufficient (a few vCPU, 8GB RAM, a recent Ubuntu LTS). Upgrade to dedicated hardware when you have multiple strategies running.
Choose Ubuntu 24.04 LTS. Set up SSH key authentication (no password login). Configure the firewall (UFW) to allow only SSH (port 22) and any ports your dashboards need.
Most Ubuntu 24.04 installs come with Python 3.12. Verify with python3 --version. Install pip and venv.
Docker containerises your trading bot so it runs in an isolated environment with all dependencies. Install Docker Engine and Docker Compose. This is covered in section 9.2.
Clone your trading system repo. Set up deploy keys so the server can pull code from GitHub without your password.
Store all API keys, secrets, and configuration in a .env file on the server. Never commit this to git.
- API keys and secrets in a .env file on the server (not in git)
Docker wraps your trading bot and all its dependencies into a container that runs identically everywhere. No more “works on my laptop but not on the server” problems.
- The bot auto-restarts on crash (restart: unless-stopped).
- docker compose up -d starts everything. docker compose down stops everything.

# docker-compose.yml (minimal example)
version: "3.8"
services:
  btc-strategy:
    build: .
    container_name: btc-strategy
    restart: unless-stopped
    env_file: .env
    volumes:
      - ./data:/app/data      # persist database
      - ./config:/app/config  # strategy configs
    ports:
      - "8080:8080"           # dashboard
Minimal Docker Compose file for a trading bot. The bot auto-restarts on crash, loads secrets from .env, and persists data to a mounted volume.
The minimal compose above is a starting point. The version below adds the three pieces that separate a toy deployment from one you can leave running unattended: a healthcheck, a finite restart policy, and named volumes for stateful data.
# docker-compose.yml (operator-grade)
version: "3.8"
services:
  btc-strategy:
    build: .
    container_name: btc-strategy
    env_file: .env

    # Restart policy: bounded retries, not infinite crash loop
    restart: on-failure
    deploy:
      restart_policy:
        condition: on-failure
        max_attempts: 5
        window: 120s

    # Healthcheck: container is "healthy" only when /health responds 200
    healthcheck:
      test: ["CMD", "curl", "--fail", "--max-time", "5", "http://localhost:8080/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 30s  # grace window on first boot

    volumes:
      - btc_strategy_data:/app/data  # named volume; survives container recreate
      - ./config:/app/config:ro      # configs read-only into container
    ports:
      - "8080:8080"

volumes:
  btc_strategy_data:  # declared once, persists independent of container lifecycle
Why each change matters:
- restart: on-failure with max_attempts: unless-stopped retries forever, which means a bug that crashes on boot becomes an infinite crash-loop that fills logs and masks the real issue. on-failure with a finite retry count surfaces persistent failures to your alerting instead of hiding them under restart spam.
- healthcheck: Docker now knows the difference between "process is running" and "process is operational." Your watchdog (Module 9.5) and any orchestrator can read the health status; without it, a hung-but-not-crashed process looks healthy from the outside.
- Named volumes: a bind mount (./data:/app/data) ties your state to a path on the host, which is fine until you move servers or recreate the container with a different working directory. A named volume (btc_strategy_data) is owned by Docker, lives independently of the container, and survives docker compose down. For stateful containers (your trading bot's local SQL DB lives here) this is the safer default.
- Read-only configs: mounting ./config as :ro means a misbehaving container cannot write to your strategy YAMLs.

The application must expose the /health endpoint the healthcheck calls (covered in Module 9.5). Without it, the healthcheck can't do its job.
Practical Advice
Docker is overkill for your first prototype. Run the bot directly with python3 run.py first. Containerise once you have a working system and want it to survive server reboots and crashes automatically. Docker adds a layer of complexity that is not worth it during the “does this even work?” phase.
- The system starts and stops cleanly with docker compose up/down
Before risking real money, run the system in shadow mode: it processes real market data, generates real signals, but does not place real orders. It simulates what would have happened. This is the final validation step before going live.
Minimum: 2–4 weeks. Longer for low-frequency strategies. You need enough time to observe at least one full entry-and-exit cycle and the system's operational behaviour around it (restarts, reconciliations, alerts).
For a weekly strategy that trades 3 times per year, you might need 2–3 months to see a full signal cycle. For a daily strategy, 2 weeks may suffice.
Do Not Skip This
The temptation to skip paper trading and “just go live with a small amount” is strong. Resist. A bug that mismanages state or miscalculates position size will cost you real money. Paper trading costs nothing and catches errors that no amount of backtesting reveals. Every professional trading desk paper-trades new strategies before deploying capital.
Without an explicit fill-simulation spec, paper P&L cannot be compared to live P&L — you don’t know whether divergence is “the strategy decayed” or “the simulator was optimistic.” The spec below is the minimum operator-grade contract for what paper fills mean.
- Market orders fill at mid + slippage_bps in the adverse direction, where the slippage model is parametric or empirical. State which model is in use and why; a parametric sketch follows.
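A parametric version, as a sketch. The default slippage_bps is an assumption you calibrate against your own live fills, not a universal constant:

def simulate_market_fill(side: str, mid: float, slippage_bps: float = 5.0) -> float:
    # Fill at mid, moved slippage_bps in the adverse direction:
    # buys fill above mid, sells fill below.
    bump = slippage_bps / 10_000
    return mid * (1 + bump) if side == "buy" else mid * (1 - bump)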
For each funding tick (typically every 8h, sometimes 1h or 4h depending on venue) the position is open, accrue funding using the actual historical funding rate for that interval:
funding_payment = position_notional * funding_rate * sign
# sign: longs pay when funding is positive; shorts pay when funding is negative
position.realised_pnl += -funding_payment

Carry funding through the trade's PnL accounting, not as a separate ledger — otherwise paper PnL looks rosier than live PnL on funding-heavy markets.
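Folded into a loop, as a sketch; position and the (ts, rate) event stream are placeholders for your own structures:

def accrue_funding(position, funding_events):
    # funding_events: (ts, funding_rate) for every interval the position was open
    for ts, rate in funding_events:
        payment = position.notional * rate * (1 if position.side == "long" else -1)
        position.realised_pnl -= payment  # paying funding reduces PnL; receiving adds
    return position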
Byte-for-Byte Parity Rule
Paper signals must use the same signal-evaluation code as live. Identical inputs, identical indicator implementations, identical gates, identical config. The only difference is the order-submission step: paper writes a synthetic fill to a paper_fills table; live submits to the venue. Anything else — a separate “backtest engine” that re-implements the strategy, a config that flips off a gate “just for paper,” a different time source — produces drift between paper and live evaluation, which makes the paper test useless. If the paper system says enter and the live system would not have, paper P&L tells you nothing about live performance. This is a hard rule, not a guideline.
The moment of truth. Your strategy is validated, your infrastructure is tested, your paper trading is consistent. Here is how to transition to live trading safely.
Ensure USDT (or your collateral currency) is in the correct sub-account on the exchange. For isolated margin: ensure the margin is allocated to the correct position type.
In your config: mode: live and testnet: false. Double-check. This is the single most important config change you will ever make.
Even though your strategy is validated at 1% risk, start live with 0.25% or 0.5% for the first week. This limits damage if there is a bug that paper trading didn’t catch. Scale up after the first few live trades confirm everything works.
Watch the first live entry in real time. Verify: the order was placed, the fill price is reasonable, the exchange-side stop-loss was set, the state was persisted correctly, and the dashboard shows the correct position.
Run a watchdog: a separate process that monitors the bot and alerts you if anything goes wrong (container crash, API errors above threshold, reconciliation mismatch, or a missed cron job).
Key Insight
The transition from paper to live is harder psychologically than technically. You will feel the urge to override the system, to take profits early, to increase position size after a win. Trust the system. If you validated it properly through Modules 5–6, the system knows better than your emotions do. Your job now is to monitor, not to intervene.
A live trading system that you don’t monitor is a time bomb. This section covers the minimum monitoring setup to keep your system healthy and catch problems before they cost money.
| Event | Alert Method | Priority |
|---|---|---|
| Trade entry/exit | Instant-messaging alert with details | Informational |
| Stop-loss triggered | Instant-messaging alert | Important |
| API error rate spike | Instant-messaging alert | Urgent |
| Container crash | Instant-messaging alert from watchdog | Critical |
| Reconciliation mismatch | Instant-messaging alert | Critical |
| Cron job missed | Staleness check in watchdog | Important |
| Account drawdown beyond threshold | Instant-messaging alert + circuit breaker activation | Critical |
A separate script (not part of the trading bot) that runs on a cron schedule and checks:
- The trading container is running (docker ps)

If any check fails, send an instant-messaging alert. This is your insurance against the 3am crash you sleep through.
Every container exposes two HTTP endpoints. The distinction matters: a container can be alive without being operational.

- /health — process is alive. Returns 200 if the HTTP server is responding. Used by the Docker healthcheck and the watchdog's liveness probe.
- /ready — process is operational. Returns 200 only if: data is fresh (latest candle younger than 2 candle intervals), DB is reachable, last reconciliation succeeded, no fatal-error flag is set. Used by the watchdog's readiness probe.

@app.get("/health")
def health():
    return {"status": "alive", "ts": now_ms()}, 200

@app.get("/ready")
def ready():
    checks = {
        "data_fresh": latest_candle_age_seconds() < 2 * candle_interval_seconds(),
        "db_reachable": db_ping(),
        "reconcile_ok": last_reconcile_age_seconds() < max_reconcile_age,
        "no_fatal": not fatal_flag.is_set(),
    }
    if all(checks.values()):
        return {"status": "ready", "checks": checks}, 200
    return {"status": "not_ready", "checks": checks}, 503

An always-200 /health is a lie if the bot is hung, blocked on a deadlock, or has lost its data feed. /ready is what tells you whether to trust the system right now.
The watchdog is a separate process — usually a cron-driven shell or Python script — that exercises the system from the outside. The minimum check set:
# 1. Liveness: each container responds to /health
for container in $TRADING_CONTAINERS; do
curl --fail --max-time 5 "http://${container}:${PORT}/health" \
|| alert "container ${container} not alive"
done
# 2. Readiness: each container reports operational
for container in $TRADING_CONTAINERS; do
curl --fail --max-time 5 "http://${container}:${PORT}/ready" \
|| alert "container ${container} not ready"
done
# 3. Data freshness (sanity check, even if /ready already covers it)
latest_ts=$(query_db "SELECT MAX(ts) FROM candles WHERE symbol=$SYMBOL")
age=$((NOW - latest_ts))
[ "$age" -gt "$((2 * INTERVAL))" ] && alert "candles stale: ${age}s old"
# 4. Position reconciliation: exchange == local for every (strategy, symbol)
diffs=$(reconcile_dry_run --all-strategies)
[ -n "$diffs" ] && alert "reconcile diffs: ${diffs}"
# 5. Heartbeat-to-monitor: container writes last_loop_at to a status file
for container in $TRADING_CONTAINERS; do
last=$(stat -c %Y "/var/run/${container}.heartbeat")
age=$((NOW - last))
[ "$age" -gt "$HEARTBEAT_THRESHOLD" ] && alert "${container} heartbeat stale: ${age}s"
done
# 6. Disk and log volume
df -h | awk '$5 ~ /9[0-9]%|100%/ { print }' | grep . && alert "disk pressure"

Run the watchdog from a process that cannot share a failure mode with the trading containers: a different host where practical, or at minimum a separate systemd unit on the same host. If your watchdog dies with your bot, you have no watchdog.
- Log in structured form (ts, level, strategy, event, coid, etc.). Free-text logs are unsearchable at the volume a live system produces.

Pick a few measurable targets and alert when reality breaches them. Examples calibrated for a daily/intraday system at retail size:
| Metric | Target | Alert when |
|---|---|---|
| Signal-evaluation completeness | 99% of expected evaluations completed within 1 candle interval | Two consecutive candles missed |
| Order submission latency | 99% of orders submitted within 500ms of signal | P99 above 1s for 5+ minutes |
| Reconciliation discrepancy rate | <1% of orders show diff at reconciliation | Sustained >5% over an hour |
| Alert delivery latency | <60s from breach detection to alert delivered | End-to-end test fails |
| Heartbeat staleness | Last loop write within N × cycle | Stale beyond N=3 cycles |
| Error budget (per-endpoint 4xx/5xx) | <0.5% of calls in steady state | Spike above baseline by 5× |
Numbers are illustrative — calibrate to your strategy’s timescale. The point is to have numbers, not to invent them ad hoc when something breaks.
Not every alert deserves to interrupt you. Tier alerts by required action, following the priorities in the alert table above: Critical and Urgent interrupt you immediately; Important waits for your next scheduled check; Informational belongs in a daily digest, never on the lock screen.
Alert fatigue is a real failure mode. If every notification is “urgent,” the genuinely urgent ones get muted with the rest. Be ruthless about what gets to interrupt sleep.
For every alert your system can fire, write a runbook entry. Four lines, not a novel:
- Diagnose: "Run docker exec ... reconcile --dry-run; check the diff output."
- Act: "If the diff shows a position flagged manual in positions and it is unintentional, close it via venue UI and re-run reconcile."

Runbooks live in version control next to the code, not in someone's head. The point of writing them is that the next 3am page is handled by reading, not thinking.
- Every container exposes /health and /ready
Your monitoring catches problems. Your watchdog restarts crashed containers. Both fail when the host itself dies, the disk corrupts, the provider has an outage, or your DB silently rots. Disaster recovery is the layer below the watchdog — the plan for when everything above it has stopped working too.
The trading system's state DB — orders, positions, fills, strategy state — is the only piece of local data that, if lost, cannot be reconstructed in minutes. Candle history can be re-fetched. The DB cannot. Two layers of backup, both required: continuous WAL streaming, and periodic encrypted snapshots with at least one copy offsite.
Encryption is non-negotiable: the backup contains your full trading history and any secrets your DB persisted. Treat it as you would the live DB.
The Only Backup That Works Is the One You’ve Restored
Backups that you have never restored are not backups; they are hopes. Several of the most expensive failures in production systems share the same plot: backups were running for years, and on the day they were needed, the restore process failed — corrupted file, missing dependency, version mismatch, key lost. Schedule a monthly restore drill. Spin up a fresh container from your backup against a clean disk; verify the DB starts; verify the data is intact (row counts, recent timestamps, a known query). Treat a failed drill as a Sev-1 incident.
If the box dies right now, how long until you have a replacement running? The honest answer for most retail operators is “hours, while I remember how I set it up.” The target answer is “under 30 minutes, from a script.”
Every server in the system should be rebuildable from a git repository. The minimum content of that repo:
- The docker-compose.yml (or equivalent) that brings up the trading containers, the watchdog, the log shipper, the DB.
- A RECOVERY.md at the repo root with the literal commands, in order, to bring up a new host from cold metal to live trading.

If you can't hand the repo to a competent engineer who has never seen it and have them bring up a working replica in under an hour, the IaC isn't complete.
Deploying a new version of the trading bot directly over the running version is a needless risk. The blue/green pattern:
- Only the deployment flagged active = true places orders. Switch from blue to green; observe; if anything looks wrong, flip back.

Many failure modes — a subtle indicator change, a new bug introduced by a refactor, a venue API behaviour you didn't notice — only manifest under live data. Blue/green lets you catch them with a five-second rollback rather than a forty-minute redeploy.
Your provider can have an outage. Whole datacentres lose power. Networks partition. The defence isn’t complex multi-region orchestration on day one — it’s a documented runbook plus a cold standby:
The worst case: local DB is corrupted, latest backup is also corrupted, infra is intact but state is gone. You don’t know what you own.
The recovery is structural: the venue is the source of truth for fills, orders, and positions. Every fill that ever happened was recorded by the venue; every open position has a record there. The reconcile-from-venue procedure:
- Attribute each fill and position to its owning strategy via clientOrderId tags — this is why the clientOrderId discipline in Module 8 is non-negotiable. Without strategy tags in the order id, you cannot tell which strategy owns which position.

This procedure is slow and tedious. The point isn't that it's elegant; the point is that it exists, it's documented, and you have rehearsed it once. The day you need it is not the day to discover that fill history beyond 30 days isn't available on the venue's standard API.
The simplest discipline that covers almost every backup failure mode: 3 copies, 2 different media, 1 offsite.
Retail systems often run with 1 copy and call it “a backup.” That’s not a backup; that’s a single point of failure with extra steps.
Key Insight
Disaster recovery is the discipline of treating the worst case as inevitable. It will happen. The only question is whether your future self has 30 minutes of script-execution between disaster and resumption, or 30 hours of panic. The cost of preparing now is small; the cost of not preparing now is unbounded.
A strategy validated on 2020–2021 data (explosive bull market) will get destroyed in 2022 (grinding bear). The strategy didn’t break. The market changed. This section explains why regime awareness is the single biggest factor in whether a system survives long-term.
Markets exist in distinct regimes. Each regime has different statistical properties:
| Regime | Characteristics | What Thrives | What Dies |
|---|---|---|---|
| Bull Trend | Strong upward momentum, shallow pullbacks, high confidence | Trend following, momentum, dip buying | Short selling, mean reversion |
| Bear Trend | Sustained declines, relief rallies that trap longs, fear | Short selling (if gated by regime), cash | Dip buying, leverage longs |
| Chop / Range | No direction, false breakouts, whipsaws | Mean reversion, range strategies, cash | Trend following, breakout strategies |
| High Volatility | Large daily moves, wide spreads, fast liquidations | Wider stops, smaller positions, volatility selling | Tight stops (get stopped by noise) |
| Low Volatility | Compressed ranges, narrow spreads, low volume | Patience, breakout anticipation | Active strategies (not enough movement) |
Running a trend-following strategy during chop is the most common and most expensive regime mismatch. The system enters on a "breakout," the breakout fails, the system exits at a loss, enters again on the next "breakout," and that fails too. Each trade loses fees + slippage. After 10 whipsaw trades, you've lost 5–10% of your account while the market went nowhere. This is called death by a thousand cuts.
The solution: don’t trade in regimes where your strategy has no edge. This is what regime gates are for.
Key Insight
The highest-value improvement you can make to any strategy is not a better entry signal. It is a regime gate that prevents the strategy from trading when the market is in the wrong state. Our regime-conditional short system added an efficiency-ratio gate (only trade during low-efficiency, choppy periods within bear regimes) and CAGR roughly doubled while max drawdown dropped meaningfully — well into double-digit percentage-point improvement.
A regime detector classifies the current market state so your strategies can gate on it. It does not need to be complex. A simple moving average slope + volatility measure gets you 80% of the way.
The pattern below uses a slow weekly moving-average slope (direction) crossed with a volatility or efficiency measure (character of motion). Pick your own indicators — the structure is what matters:
| Condition | Regime |
|---|---|
| Slow weekly slope positive AND volatility below its mid-range percentile | Bull (low-vol) — ideal for trend following |
| Slow weekly slope positive AND volatility in the upper percentile band | Bull (high-vol) — trend following with wider stops |
| Slow weekly slope negative AND daily efficiency-ratio low | Bear (choppy) — short-side strategies |
| Slow weekly slope negative AND daily efficiency-ratio elevated | Bear (trending) — cash or aggressive shorts |
| Slow weekly slope near zero (within a narrow neutral band) | Chop — mean reversion or sit out |
This is not sophisticated. It does not need to be. The goal is to prevent your trend strategy from trading during chop and your short strategy from trading during bull markets. Broad strokes are enough.
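The table reduces to a few lines of code. A sketch with illustrative thresholds; the neutral band and the 0.5/0.3 cuts are assumptions to replace with your own percentile calibration:

def classify_regime(weekly_slope: float, vol_percentile: float,
                    efficiency_ratio: float, neutral_band: float = 0.002) -> str:
    if abs(weekly_slope) < neutral_band:
        return "chop"               # mean reversion or sit out
    if weekly_slope > 0:
        return ("bull_low_vol"      # ideal for trend following
                if vol_percentile < 0.5 else
                "bull_high_vol")    # trend following with wider stops
    return ("bear_choppy"           # short-side strategies
            if efficiency_ratio < 0.3 else
            "bear_trending")        # cash or aggressive shorts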
For more nuanced detection, add external data; the macro indicators covered in Module 10.3 are the natural starting point.
These are filters, not signals. They don’t tell you what to trade. They tell you whether conditions are favourable for your strategy type.
Crypto does not exist in a vacuum. It is influenced by the US dollar, interest rates, equity markets, geopolitical events, and broader risk appetite. A macro overlay gives your system awareness of these external forces.
| Indicator | Relationship to BTC | Data Source |
|---|---|---|
| DXY (US Dollar Index) | Inverse — strong dollar pressures BTC | Retail FX/CFD broker free API, or FRED |
| US 10Y Treasury Yield | Higher yields = tighter liquidity = BTC pressure | FRED, or a retail broker API |
| S&P 500 / NASDAQ | Positively correlated in risk-on periods | Retail FX/CFD broker free API (look for SPX500/NAS100 instruments) |
| Gold (XAU/USD) | Weakly correlated; both are “alternative” assets | Retail FX/CFD broker free API |
| VIX | High VIX = fear = BTC sell-off risk | CBOE official feed (spot index) or a market-data redistributor. Note that retail FX/CFD brokers often only offer a synthesised VIX-like instrument that tracks VIX futures, not the spot index — if you use one, label clearly and don’t conflate it with spot VIX. |
| Fear & Greed Index | Extremes tend to revert (contrarian signal) | alternative.me API (free) |
You do not need to trade these instruments. You just need to read them as context for your crypto strategies. “DXY just spiked 2% and VIX is above 30” is important context when your BTC strategy wants to go long.
Across almost every investigation we have run, the volatility filter is the single most impactful dimension. Strategies that are flat overall become strongly positive when filtered by volatility regime. This pattern recurs so consistently that it deserves its own section.
Two examples from our own research: a calendar-effect investigation (a deliberate test of a weak claim, used here as an example of how filters change a verdict) and a derivatives-driven contrarian signal we tested. Both went from unremarkable in aggregate to clearly positive once results were split by volatility regime.
The same pattern appears in trend following, mean reversion, and derivatives signals. Low-volatility environments compress ranges, reduce noise, and make genuine signals cleaner. High-volatility environments are full of noise that triggers false signals.
Key Insight
If your strategy performs inconsistently, the first thing to test is a volatility split. Measure ATR (Average True Range) as a percentile of its 90-day distribution. Filter your backtest results by “ATR percentile < 50th” (low vol) vs “ATR percentile > 50th” (high vol). In our experience, this single filter is the most likely to turn a mediocre strategy into a strong one — or to reveal that the strategy only works in one volatility regime.
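A sketch of the split, assuming pandas ≥ 1.4 (for Rolling.rank) and OHLC series as inputs; atr_pct_at_entry is a hypothetical column you would join onto your trade log:

import pandas as pd

def atr_percentile(high: pd.Series, low: pd.Series, close: pd.Series,
                   atr_len: int = 14, window: int = 90) -> pd.Series:
    prev_close = close.shift(1)
    true_range = pd.concat([high - low,
                            (high - prev_close).abs(),
                            (low - prev_close).abs()], axis=1).max(axis=1)
    atr = true_range.rolling(atr_len).mean()
    return atr.rolling(window).rank(pct=True)  # 0..1 within the trailing window

# Split trades by the regime that held at entry:
# low_vol  = trades[trades.atr_pct_at_entry < 0.5]
# high_vol = trades[trades.atr_pct_at_entry >= 0.5]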
Your system is live. Now you need to know: is it performing as expected? And if not, is it a normal deviation or a sign that the edge is dying?
| Metric | Compare To | Concern Threshold |
|---|---|---|
| Win rate | Backtest win rate | >10 percentage points below backtest after 20+ trades |
| Average win / Average loss | Backtest ratio | Ratio has degraded by >30% |
| Profit factor | Backtest PF | Dropped below 1.0 over 20+ trades |
| Max drawdown | Monte Carlo 95th percentile | Approaching or exceeding MC95 |
| Trade frequency | Expected from backtest | Significantly more or fewer trades than expected |
Some deviation is expected — live trading will never perfectly match backtesting due to slippage variance, execution timing, and market microstructure differences. The question is whether the deviation is within the range your Monte Carlo simulations predicted.
Edges die. Market structure changes. What worked in 2024 may not work in 2027. Detecting degradation early — before it costs serious money — is a core skill. The hard part is doing it statistically rather than by eyeballing a rolling chart, especially for low-frequency strategies where 20 trades take years to accumulate.
The 20-Trade Window Is Not a Test
A common heuristic is “watch the rolling 20-trade window.” This is fine as an attention trigger but useless as a decision rule, especially for low-frequency strategies. A strategy that fires 3 times a year needs ~7 years of live trading to fill a 20-trade window. By the time the heuristic flags a problem, you’ve already lost the money. Use the window for noticing; use the methods below for deciding.
Several statistical approaches work: rolling bootstrap confidence intervals on per-trade expectancy, a CUSUM control chart on returns, or a Bayesian posterior on the probability that the edge is still positive. Pick one or two and commit to them in the strategy's monitoring spec. The choice matters less than the discipline of applying it.
For low-frequency strategies where even bootstrap CIs are unreliable, fall back to a structural sanity check: does the strategy still produce signals on instruments where it should, and do those signals still correlate with the things they used to correlate with? If you have a funding-rate-extreme reversal strategy and funding extremes still occur but no longer mean-revert, that is degradation evidence even without any new live trades. If the strategy is signal-silent on instruments it used to fire on, the underlying condition is gone.
This is qualitative but it bridges the gap when the trade count is too thin to support a formal test.
Warning signs worth watching: live win rate sliding well below its backtest level, profit factor below 1.0 over a sustained window, drawdown approaching the Monte Carlo 95th percentile, and trade frequency drifting from expectation (the table in Module 11.1). Each of these is a flag to investigate, not a verdict. Run the statistical tests above before acting.
Practical Advice
Set a hard kill switch on a statistically-grounded trigger — e.g. CUSUM crossing its control limit, or posterior probability of positive edge below 50% — not on a raw 20-trade rolling number. Have it auto-reduce position size to 25% and alert you. The 20-trade window can sit alongside as an attention prompt; it should not be the trigger.
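A one-sided lower CUSUM on per-trade returns, as a sketch. mu0 comes from validation; k (the slack, often mu0/2) and h (the control limit) are calibration choices, not universal constants:

def cusum_alarm(returns, mu0: float, k: float, h: float) -> list[bool]:
    # Accumulates shortfall versus (mu0 - k); alarms once it exceeds h.
    s, alarms = 0.0, []
    for r in returns:
        s = min(0.0, s + (r - mu0 + k))
        alarms.append(s <= -h)
    return alarms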
Once the data, backtester, falsification suite, and paper-trading rig are in place, the same components can be wired into a loop that runs without you driving each step. The loop runs periodic anomaly scans, generates candidate hypotheses, routes survivors through validation, and queues them for paper trading. A human still approves anything that touches live capital.
The automated research loop. Anomaly scans run on a schedule; survivors are filtered, evaluated, formalised into testable hypotheses, registered, and routed to paper trading. Human approval is the gate between paper and live.
The scanner is a battery of statistical tests run on a fixed schedule against the operator's instrument universe. The test categories mirror the investigations described elsewhere in this playbook: calendar effects, derivatives-positioning extremes, volatility-regime splits, and cross-asset relationships.
A typical scan produces a long list of raw findings. After filtering and LLM-assisted evaluation, a smaller subset becomes new hypotheses. Most will fail validation. Some will survive. The ones that survive become candidates for the paper-trading pipeline.
No strategy goes live without human approval. The system scans, evaluates, registers, and paper-trades on its own. The decision to allocate real capital is a separate, manual step. The reason is structural, not philosophical: paper-trading parity is never perfect, and the cost of a mis-approved live deployment is much higher than the cost of a slow approval queue.
Why Build This
The point of the loop is throughput. A human researcher can investigate one or two hypotheses a week. The scan-evaluate-paper pipeline can run hundreds of candidates through the same falsification process while you sleep, and surface only the small number that survived. You don’t need to build this on day one — the early-stage operator should run scans manually first — but every component you build (data pipeline, backtester, falsification suite, paper trader) is reusable here. This is what they compose into.
When you find a signal on one instrument, test it on every other instrument you have data for. Edges that transfer across markets are more likely to be real. Edges that only work on one asset are more likely to be noise.
You discover that BTC shows a mean-reversion pattern after extreme funding rate readings. Instead of only trading BTC with this signal, test it on ETH, SOL, and every other perpetual futures instrument in your data. If the pattern persists across 5+ instruments, the underlying mechanism (crowded positioning creates mechanical pressure) is likely real. If it only works on BTC, it might be BTC-specific or overfitted.
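The mechanical form, as a sketch; run_backtest, the universe, and the survival criterion are placeholders for your own harness:

UNIVERSE = ["BTCUSDT", "ETHUSDT", "SOLUSDT", "XAUUSD", "SPX500"]  # illustrative

def cross_pollinate(signal_fn, universe=UNIVERSE):
    results = {sym: run_backtest(signal_fn, sym) for sym in universe}
    survivors = [sym for sym, r in results.items() if r.profit_factor > 1.5]
    return results, survivors  # breadth of survival is the robustness evidence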
Even more powerful: test signals across asset classes entirely.
Our research environment maintains an instrument universe spanning crypto, FX, commodities, and indices. Every finding is automatically tested across all of them. This cross-pollination is where the most surprising and robust edges are found — because an edge that works across multiple markets is much harder to explain away as noise.
Starting Point
You don’t need a large instrument universe to cross-pollinate. Start with 3: BTC, ETH, and one non-crypto asset (gold, or a major equity index via a retail FX/CFD broker). If a signal works on all three, it is almost certainly capturing a real market dynamic. If it only works on one, investigate why before trusting it.
The systematic trader’s tax obligation is non-trivial, jurisdiction-specific, and easy to break in ways you only discover at year-end. Forgetting it once is expensive. Building the records to handle it is something you do at the start, not at the end.
The pattern shows up regularly: an operator runs a high-frequency strategy for a year, makes money, and arrives at tax season with a screenshot of the venue’s P&L tab and the assumption that it will suffice. It will not. The venue’s P&L tab is not a tax record — it’s a marketing surface. It typically excludes funding payments, bundles fees in ways your jurisdiction may not accept, doesn’t track per-lot cost basis, and goes back only as far as the venue feels like keeping it. The accountant asks for the data the tax authority will ask for, you don’t have it, and the cost is either an estimated assessment (almost always against you) or a forensic reconstruction project that costs more than the year’s profits.
The defence is structural and cheap if done early: the records you need at tax time are exactly the records the trading system already produces. You just have to make sure they’re saved, exportable, and reconcilable. That’s a one-time engineering cost that pays itself back the first time you produce a clean tax export in an hour rather than a panic-week.
This Module Is Not Tax Advice
Nothing here is legal or financial advice. Tax law is jurisdiction-specific, changes annually, and is rarely as “obvious” as a layperson’s reading suggests. Use this module to know what records you need to keep and what questions to ask. Engage a tax specialist who has explicit experience with crypto derivatives in your jurisdiction before you have material P&L. The cost of a specialist is small relative to the cost of getting it wrong.
When you sell 1 BTC, which BTC did you sell? It is not a rhetorical question — the answer changes your taxable gain. Cost basis methods are the rules for assigning a purchase price to each disposal. The crucial thing is to pick one, document it, and apply it consistently.
The simplest production-grade implementation is a per-symbol FIFO queue of lots. Each lot stores (qty, price, timestamp); on a disposal, lots are popped from the front until the disposal qty is satisfied:
from collections import defaultdict, deque

class InsufficientLots(Exception):
    """Disposal larger than recorded acquisitions — a record is missing."""

class FifoCostBasis:
    def __init__(self):
        self.lots = defaultdict(deque)  # symbol -> deque of (qty, price, ts)

    def add_acquisition(self, symbol, qty, price, ts):
        self.lots[symbol].append((qty, price, ts))

    def realise_disposal(self, symbol, qty, price, ts):
        remaining = qty
        realised = 0.0
        consumed = []
        while remaining > 0 and self.lots[symbol]:
            lot_qty, lot_price, lot_ts = self.lots[symbol][0]
            take = min(lot_qty, remaining)
            realised += take * (price - lot_price)
            consumed.append((take, lot_price, lot_ts, price, ts))
            remaining -= take
            if take < lot_qty:
                self.lots[symbol][0] = (lot_qty - take, lot_price, lot_ts)
            else:
                self.lots[symbol].popleft()
        if remaining > 0:
            raise InsufficientLots(symbol, qty, qty - remaining)
        return realised, consumed  # consumed is your per-lot tax record

The consumed list is the row-level tax record: each entry is one (proceed, cost basis, hold-period) tuple, which is exactly what a tax preparer needs. Persist it.
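Usage, reproducing the "which BTC did you sell?" question:

fifo = FifoCostBasis()
fifo.add_acquisition("BTC", qty=1.0, price=60_000, ts=1)
fifo.add_acquisition("BTC", qty=1.0, price=80_000, ts=2)

gain, lots = fifo.realise_disposal("BTC", qty=1.5, price=90_000, ts=3)
# gain = 1.0 * 30_000 + 0.5 * 10_000 = 35_000
# lots holds the per-lot records to persist for the tax export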
Pick One. Document It. Don’t Switch.
The single biggest cost-basis mistake is switching methods between years — or worse, between trades within a year — because each year’s software defaulted differently. Tax authorities treat unexplained method changes as a flag: at minimum it forces a reconciliation; at worst it triggers an audit. Pick one method, document the choice in a written policy, apply it across every symbol and every year, and only switch with a tax specialist’s blessing and a paper trail.
Spot trades and perpetual futures are usually taxed as different categories of asset, even though to your bot they look identical. The treatment difference can be material — one is realised on disposal, the other is often realised continuously.
In most jurisdictions, spot crypto disposal is a capital gains event: you compute (proceed − cost basis) at the moment of sale. Holding period often matters — many jurisdictions distinguish short-term (taxed as ordinary income) from long-term (lower rate). What counts as a “disposal” is broader than people expect: selling crypto for fiat is obvious; swapping one crypto for another is also a disposal in most jurisdictions, as is using crypto to pay a fee. Spending it counts. Lending it sometimes counts.
Perpetual futures and dated futures are often treated as a separate asset class, frequently with mark-to-market treatment: at year-end, every open position's unrealised P&L is treated as if realised, taxed in that year, and the cost basis is reset for the next year. Two operator-relevant consequences: you can owe tax on gains you have not cashed out, so year-end liquidity planning matters; and a position held across the year boundary has its gain split across two tax years, so backtest P&L does not map one-to-one onto any single year's taxable income.
On perpetuals, you receive or pay funding at each interval. Treatment varies: some jurisdictions treat it as ordinary income / expense at each tick; others bundle it into the position’s P&L. The system needs to record every funding event regardless — treatment decisions belong to the tax preparer.
The good news: the system you built already records everything a tax authority will ask for. The work is making sure those records are complete, immutable, and exportable in a form your tax preparer can actually use.
- Every fill, with its fee, fee currency, clientOrderId, and the strategy that owned it.

Tax records are an append-only log. Once written, never edited — if a fill needs correcting, write a correcting entry, don't mutate the original. The same backup discipline from Module 9.6 applies: continuous WAL streaming plus periodic encrypted snapshots, with at least one offsite copy. Tax records are also subject to retention requirements — many jurisdictions require 5–7 years of immutable history; some longer.
Don’t Trust the Venue’s Tax-Export Tool
Most venues offer a “tax export” button. Use it as a sanity check, never as your primary record. The reasons are non-negotiable: venues change export formats year-to-year (sometimes mid-year); venues lose history after retention windows expire; venues delist symbols and the data goes with them; venues have been known to fail entirely, taking their export tool with them. Your records must be venue-independent. Keep them in a vendor-neutral format you control — CSV, Parquet, your own database — and treat the venue’s export as a cross-check, not a source of truth.
The practical implementation is small: a tax-export view that joins the row-level records you already have, an annual export procedure, and a reconciliation gate that catches mismatches before your accountant does.
The tax_export view

Build a database view (or a materialised table refreshed nightly) that joins fills, funding payments, deposits, withdrawals, and fiat ramps into one chronological event stream. Columns at minimum:
timestamp_utc | event_type | venue | symbol | side | qty | price |
fee | fee_asset | counterparty_id | tx_hash | strategy | notes

Where event_type is one of fill, funding, deposit, withdrawal, fiat_in, fiat_out, transfer. The output is one append-only stream, sortable by timestamp, with every event in your trading history. From this view, any required tax report — capital gains schedule, income summary, fee deduction list — is a query.
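A sketch in SQLite terms, built on the fills and orders tables from Module 8. The funding_payments table and the venue and strategy_id columns are assumptions; align the names with your own schema:

db.execute("""
CREATE VIEW IF NOT EXISTS tax_export AS
SELECT f.ts AS timestamp_utc, 'fill' AS event_type, o.venue, o.symbol, o.side,
       f.qty, f.price, f.fee, f.fee_currency AS fee_asset,
       NULL AS counterparty_id, NULL AS tx_hash, o.strategy_id AS strategy,
       NULL AS notes
FROM fills f JOIN orders o ON o.id = f.order_id
UNION ALL
SELECT ts, 'funding', venue, symbol, NULL, NULL, NULL, amount, asset,
       NULL, NULL, strategy_id, NULL
FROM funding_payments
""")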
Before sending the export to your accountant, run a reconciliation: check that fill counts and fee totals match the venue's own statements for the year, and that starting balances plus every exported event reproduce today's actual balances on each venue and wallet.
Key Insight
If your tax export and your accountant’s computation disagree, one of three things is wrong: a record is missing, a method is being applied inconsistently, or you have misunderstood the treatment of one event type. All three are findable by reconciliation. None of them are findable by trusting the venue’s export tool.
- A tax_export view or materialised table joining all event types
The legal structure you trade through changes your effective tax rate, your asset protection, and your administrative overhead. The right structure is jurisdiction-specific and revenue-specific; this section is the framework, not the answer.
Don’t Choose This Yourself
Each structure interacts with your jurisdiction’s rules in ways that are not obvious. The wrong choice for your situation — e.g. a company that gets recharacterised as “not carrying on a business,” or a trust that doesn’t qualify for the income-splitting benefit you set it up for — can be more expensive than no structure at all. Talk to a specialist before earning material money. The cost of the consultation is a rounding error against the cost of choosing wrong.
Tax discipline isn’t about being a tax expert. It’s about having the records, the method, and the relationships that make tax season a one-day clean handoff instead of a multi-week reconstruction project.
You're ready when, asked at any point in the year, you can: produce a complete, clean export of every taxable event to date; state the cost-basis method you apply and point to where it is documented; and name the specialist who will receive the handoff.
Key Insight
Tax is the unglamorous discipline that determines whether you keep what you make. A trader who clears 50% gross but loses a third of it to disorganisation, missing records, and worst-case-default tax positions has a worse net than a trader at half the gross with clean records and a sharp specialist. Build the records on day one.
The system will draw down. It is not a question of whether; it is a question of when, and how deep, and how you behave while it’s happening. The drawdown is the test the rest of the playbook is preparing you for.
Up until the drawdown, “systematic” is just a posture. You backtest, you specify rules, you write a falsification suite, you put it on a server. The system makes money for a while. You feel virtuous. You haven’t been tested yet.
The test is the drawdown. The system is doing exactly what it was specified to do. The Monte Carlo distribution from Module 5.3 said this drawdown was within the 95% confidence band. The market is not broken; the strategy is not broken; the only question is whether you stay out of the way while the system finishes the recovery curve it was built to walk.
Most people don’t. They override. They “just close this one position because it doesn’t look right.” They “reduce exposure until things calm down.” They “temporarily turn off the strategy and watch.” Each of these is the moment the operator stopped being systematic. The drawdown didn’t cost them money — it was already costing them money on paper, and would have recovered. The override cost them the system itself.
War Story (Composite, Illustrative)
An operator runs a validated trend-following strategy through a deep drawdown — the system is at the worst point of its expected MC distribution but inside the band. At the trough, the operator manually closes the largest position because “it just feels wrong.” Two days later the position would have reversed and the system would have closed it for a small loss instead of a large one. The operator misses the recovery, sits in cash, and within a month is paper-trading their old strategy alongside three new ones “just to compare.” They never go fully systematic again.
The lesson is not “you missed gains.” The gains are recoverable. The lesson is that the operator now knows about themselves that under stress they will override the system — which means they don’t have a system, they have a tool they pick up and put down based on how they feel. The cost of breaking discipline once is the precedent that follows.
You either trust your validated system more than your in-the-moment gut, or you don’t. There is no honourable middle position; the “hybrid” trap (Module 1.1) is the same trap dressed up. The drawdown is where you find out which side you’re actually on.
If the answer is “I don’t trust it,” the right move is not to override during a drawdown. The right move is to kill the strategy when it’s not in drawdown, with a clear head, and rebuild whatever was missing in your validation. Override-during-drawdown is the worst-case version of every decision; it is made under stress, with the loudest emotional input and the least information.
Not every manual intervention is illegitimate. Some are necessary; some are catastrophic. The skill is being able to tell, in the moment, which one you’re about to do.
The discipline that makes this real: every manual override goes in a log, with a written reason, a timestamp, and a classification. Three columns plus a free-text reason field:
timestamp_utc | classification | action | reason
2026-04-12T14:33Z | Engineering | halted strategy_X | stop-loss didn't fire on fill at 14:31; logs show ...
2026-04-15T09:01Z | Risk | reduced position to 50% | gross exposure 142% of budget after correlated fills
2026-04-22T22:18Z | Emotional (logged) | none taken | wanted to close but recognised this is emotional

The third row is the most important kind of entry. Logging an emotional impulse you did not act on trains the muscle of recognising the impulse without acting. Over time, the proportion of (Emotional, action_taken) entries should fall to zero; the proportion of (Emotional, none_taken) entries should rise and then fall as the impulse itself fades.
The test of legitimacy: can you write a non-emotional reason? If the “reason” field reduces to “I don’t like how this looks,” the override is emotional. Don’t take the action.
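A minimal sketch of such a log as an append-only CSV. The file location and the validation rules are illustrative choices, not a prescribed format; the point is that the classification and the written reason are mandatory fields.

```python
# override_log.py - append-only override log; a sketch, not a prescribed format.
import csv
from datetime import datetime, timezone
from pathlib import Path

CLASSIFICATIONS = {"Engineering", "Risk", "Emotional"}
LOG_PATH = Path("override_log.csv")  # hypothetical location

def log_override(classification: str, action: str, reason: str) -> None:
    """Append one entry. Refuses to write without a classification and a
    written reason; the pause is the point."""
    if classification not in CLASSIFICATIONS:
        raise ValueError(f"classification must be one of {CLASSIFICATIONS}")
    if not reason.strip():
        raise ValueError("a written reason is mandatory, even for 'none taken'")
    is_new = not LOG_PATH.exists()
    with LOG_PATH.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp_utc", "classification", "action", "reason"])
        writer.writerow([
            datetime.now(timezone.utc).isoformat(timespec="seconds"),
            classification, action, reason.strip(),
        ])

# The most important kind of entry is the one where no action was taken:
# log_override("Emotional", "none taken",
#              "wanted to close; recognised this as emotional")
```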
Key Insight
The log is not bureaucracy; it is a structured pause. The act of writing "Engineering / Risk / Emotional" before the action forces the question. More often than not, when you reach for the keyboard in distress, the writing exercise itself reveals that you don't have a non-emotional case, and the override doesn't happen.
13.3 The Drawdown Protocol
A drawdown protocol is a contract you sign in writing, when your head is clear, that your future self, in distress in the middle of a drawdown, agrees in advance to honour. Pre-commitment is not a nice-to-have; it's the only mechanism that beats in-the-moment emotional override.
Before going live, write down three numbers. Quantitative, specific, signed and dated. They will not be perfect — they don’t need to be. They need to exist:
The thresholds must be quantitative, written, dated, and committed to in advance. Two failure modes to avoid:
The protocol works on one principle: honour the contract with your past self even when your present self disagrees with it. Your past self chose those numbers in conditions of clear-headed analysis; your present self is in a drawdown. The past self has better epistemic access to the truth of the strategy than the present self does. Trust the past self’s numbers, not the present self’s gut.
If you find yourself wanting to renegotiate the contract during the drawdown, that wanting is itself diagnostic information — it tells you the protocol is doing exactly what it was built to do, which is sit between you and the worst version of your judgement. Honour it; renegotiate later, in writing, in non-stress conditions.
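What pre-commitment looks like when it lives in code rather than memory, as a minimal sketch over a daily equity series. The three tier names and the specific numbers are placeholders for whatever your past self signed, not recommendations:

```python
# drawdown_protocol.py - pre-committed thresholds, enforced in code.
# Tier names and values are illustrative placeholders, not advice.
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen: the contract is not editable at runtime
class DrawdownProtocol:
    review_dd: float  # depth at which you re-read the validation evidence
    derisk_dd: float  # depth at which position size is cut
    kill_dd: float    # depth at which the strategy halts for post-mortem

def current_drawdown(equity: list[float]) -> float:
    """Peak-to-trough decline at the latest equity point, as a fraction."""
    return 1.0 - equity[-1] / max(equity)

def protocol_action(dd: float, p: DrawdownProtocol) -> str:
    if dd >= p.kill_dd:
        return "halt"    # binding: no renegotiation mid-drawdown
    if dd >= p.derisk_dd:
        return "derisk"
    if dd >= p.review_dd:
        return "review"
    return "none"

# Committed to version control, signed and dated, before going live:
PROTOCOL = DrawdownProtocol(review_dd=0.10, derisk_dd=0.18, kill_dd=0.25)
```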
13.4 Information Hygiene
Your system runs on its own clock. Your attention does not. The operator who watches the P&L tick by tick is not getting more information — they’re getting more emotional load on the same information. Hygiene matters.
Watching live P&L move minute by minute is corrosive even when the P&L is going up. The brain treats the equity curve as feedback — up feels good, down feels bad — and that feedback loop quietly shifts your relationship with the system away from “trust the validated process” and toward “feel the line.” The drawdown then triggers an emotional response disproportionate to its statistical significance, because you’ve been emotionally invested in every wiggle for weeks.
The discipline: read the dashboard once per day, at a fixed time, for a fixed duration. Pick a time that’s outside any major venue rollover or funding tick — mid-morning local works for most. Five to fifteen minutes. Look at the metrics that matter (current positions, current P&L, recent trades, watchdog status, any pending alerts), confirm the system is healthy, close the dashboard. Don’t graze.
Module 9.5 covers the mechanics of alert routing; this is the operator-side complement. Every alert that fires without requiring an action is teaching your nervous system to ignore alerts in general. By the time the genuinely urgent alert fires — reconciliation hard-fail, drawdown threshold breach, exchange API blackout — you’ve already trained yourself to swipe it away with the rest.
The rule: every alert reaching your phone should require an action you would actually take in the next 15 minutes. Trade entries, daily PnL, “rate-limit recovered” — these belong in a dashboard or a daily-digest email, not on the lock screen.
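Enforced in code rather than in notification settings, the rule might look like the sketch below; the alert names and the two-tier split are illustrative:

```python
# alert_routing.py - action-based alert tiering; names are illustrative.
from enum import Enum

class Route(Enum):
    PHONE = "phone"          # lock screen: demands action within 15 minutes
    DIGEST = "daily_digest"  # read once per day at the fixed dashboard time

# The single routing question: does this alert require an action now?
ACTIONABLE = {
    "reconciliation_hard_fail",
    "drawdown_threshold_breach",
    "exchange_api_blackout",
}

def route_alert(kind: str) -> Route:
    return Route.PHONE if kind in ACTIONABLE else Route.DIGEST

assert route_alert("trade_entry") is Route.DIGEST           # informational
assert route_alert("rate_limit_recovered") is Route.DIGEST  # informational
assert route_alert("exchange_api_blackout") is Route.PHONE  # act now
```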
Discretionary traders’ opinions on your validated systematic strategy, especially during a drawdown, are not signal. They are a hostile influence on your decision-making, even if the discretionary trader is a friend. Their emotional state is not yours, their information set is not yours, and their incentives during your drawdown are unaligned (they often want company in their pessimism).
Concrete moves: during drawdowns, mute or unfollow the noisy chat groups. Don’t doom-scroll instrument-specific Twitter. Don’t read Reddit threads about your strategy’s underlying instrument. The validated system is the system; its inputs are price, your indicators, and your gates — not other people’s panic.
Key Insight
Information hygiene is not stoicism cosplay; it is risk management. Every input you let into your decision loop during a drawdown is an input that can override the validated system. The validated system will outlast your real-time emotional state; limit the inputs, and the operator survives the drawdown alongside the system.
The system has a bug. Money is on the line. You are stressed. This is the worst possible state in which to ship a fix — and it is exactly the state in which most fixes get shipped. Discipline is what stands between you and a worse bug than the one you started with.
The temptation in an incident is to skip steps because you "know what's wrong." You don't. You have a hypothesis. The hypothesis is contaminated by stress, by recency bias, by the symptom that's loudest, and by the fix you would emotionally prefer to be the answer. Push a fix in this state and, more often than not, the symptom changes but the bug remains; now there's a second bug stacked on top.
The discipline is mechanical: halt first, preserve the evidence, and work the runbook step by step instead of from memory.
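One way to make "halt first" mechanical rather than aspirational is to give incident work a single entry point. A sketch; `halt_all_strategies` and `snapshot_state` are hypothetical hooks into your own execution and logging layers:

```python
# incident.py - a halt-first entry point for incident work.
import logging
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("incident")

def halt_all_strategies() -> None:
    # Hypothetical hook: freeze new entries or flatten, per your runbook.
    log.info("halt_all_strategies(): wire this to your execution layer")

def snapshot_state() -> None:
    # Hypothetical hook: capture logs, positions, and open orders.
    log.info("snapshot_state(): wire this to your logging/DB layer")

def declare_incident(summary: str) -> None:
    """The only sanctioned way to begin incident work: halt, snapshot,
    then diagnose. No fix gets written before this function has run."""
    started = datetime.now(timezone.utc).isoformat(timespec="seconds")
    log.critical("INCIDENT %s: %s", started, summary)
    halt_all_strategies()
    snapshot_state()
    # From here: reproduce in staging, pair on the diff, then deploy.
```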
Stressed engineers ship typos. The single highest-leverage intervention against this anti-pattern is to never work alone during an incident. The pair partner does not need to be an expert; they need to be a second voice that asks "wait, what does that line do?" before you push. A competent AI assistant in pair-programming mode counts; a human friend on a video call counts; the rule is simply "not alone with the keyboard at 2am, scared."
The pair’s job is to slow you down. They will catch the missing semicolon, the wrong sign on a comparison, the off-by-one in the time window, and the deployment-to-prod-instead-of-staging that you would otherwise make. Their cost is their attention; the saving is the bug they prevent.
The Anti-Pattern
“I know what’s wrong, let me push a quick fix.” This sentence, spoken at 3am during a P&L event, is responsible for more compounded losses in retail systematic trading than any single market move. The fix is rarely as quick as it sounds. The push is rarely as safe as it feels. If you find yourself saying it, the correct response is to halt the system, walk away from the keyboard for ten minutes, and then come back to the runbook.
13.6 Burnout
Running a 24/7 system you have to maintain is corrosive over time. The cost is not the work in any one week; it's the cumulative absence of an off-switch. Burnout in systematic trading is not a personal failing. It is a system architecture failure, and it has system architecture solutions.
The blunt test: can you go on holiday for two weeks, leave the system running, and not check it? If the answer is no — if you have to dial in daily, if certain manual interventions only you can do, if there are alerts that only your judgement can resolve — you don’t have a systematic operation. You have a job. The job pays you, but it owns you.
The remedy is automation, not willpower. Anything you find yourself doing manually on a schedule should be automated. Anything that requires your judgement to resolve should either become a rule (and therefore automated) or be acknowledged as an unsystematic dependency that limits the strategy’s viability long-term. The system runs without you; the only role you should be filling on a daily basis is “person who reads the dashboard once and confirms it’s healthy.”
Running a system continuously for years requires deliberate periods where you are not the operator. Not just “I’m not at the keyboard” — properly off. Phone notifications muted (except for true Critical-tier alerts), dashboard not opened, mental model of the system not engaged. A weekly half-day, a monthly weekend, a quarterly week.
Build the system to make this safe: redundant alerting that reaches a designated backup contact (a paid monitoring service, a second watchdog process running alongside the first, anything that catches a true emergency without you), and a hard-coded rule that during your scheduled off-time the system reduces position size or halts new entries. Both can co-exist: the system trades through your weekend, the watchdog is loud enough that a true emergency still reaches you, and routine alerts don't.
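The off-time rule is simplest when it is literally hard-coded. A sketch below; the dates, the multiplier, and keying the rule off a calendar are all illustrative choices:

```python
# offtime_guard.py - scheduled off-time as a sizing rule, not willpower.
from datetime import date

# Declared in advance and committed to version control, like the protocol.
OFF_PERIODS = [
    (date(2026, 7, 6), date(2026, 7, 20)),  # example: a two-week holiday
]

def size_multiplier(today: date) -> float:
    """During scheduled off-time, cut size (return 0.0 to halt new
    entries instead). The point is that the rule lives in code."""
    for start, end in OFF_PERIODS:
        if start <= today <= end:
            return 0.5
    return 1.0

assert size_multiplier(date(2026, 7, 10)) == 0.5  # off: reduced size
assert size_multiplier(date(2026, 8, 1)) == 1.0   # on: full size
```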
If you're too anxious to sleep, the system is undersized for your psychological capital, even if it's correctly sized for your financial capital. The fix is not "tough it out"; the fix is to reduce capital until you can sleep. Sleep-deprived operators make worse decisions during incidents, are more prone to emotional override, and burn out faster. The realised Sharpe of a strategy run by a rested operator is higher than that of the same strategy run by an exhausted one, holding everything else constant.
The ratchet works in both directions: as you live with the system through drawdowns, your psychological capital grows, and you can size up. But size up because you’re sleeping fine and have been for six months, not because the recent equity curve is flattering.
Key Insight
The systematic trader’s long-term P&L is bounded above by the number of years they can keep running the system. A strategy with a 30% CAGR run for two years before burnout produces less wealth than a 15% CAGR strategy run for fifteen years. Architect for endurance from the start. The slower you build the operating dependence on your daily presence, the longer the compounding window.
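The arithmetic behind that claim is worth running once:

```python
# Terminal wealth multiples: intensity versus endurance.
fast = 1.30 ** 2    # 30% CAGR, operator burns out after 2 years
slow = 1.15 ** 15   # 15% CAGR, sustained for 15 years
print(f"{fast:.2f}x vs {slow:.2f}x")  # -> 1.69x vs 8.14x
```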
Psychological discipline isn’t a vibe; it’s a checklist. The markers below are concrete, observable, and either present or not. If they’re not, the system isn’t safe yet — not because the code is wrong, but because the operator is.
The Final Bar
The systematic trader who has built every other module in this playbook but skipped this one will, statistically, blow up. Not because the code was bad, but because the operator overrode the code. The discipline modules are the cheapest insurance you can buy; the cost is doing the writing exercises now, when nothing is on fire, instead of trying to do them in the middle of a 25% drawdown when nothing else feels stable.
This is the methodology for building your own system, validated against real data, hardened through adversarial testing, and deployed on infrastructure you control. The strategies in this playbook are examples of the process. The process itself is what you take away.
Every war story, every diagram, every falsification test came from building and operating a real system with real money. The expensive mistakes have already been made. This playbook is how you avoid making them again.
Disclaimer
General educational information only. The author is not a licensed financial advisor. Nothing in this material constitutes personal financial advice or a recommendation to trade. Past performance does not predict future results. Crypto trading carries substantial risk of total loss. Consider seeking advice from a licensed advisor in your jurisdiction before making any financial decisions.