Risk Control and Leverage

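All of rocky-bot's risk control lives in three mechanisms: RiskCaps (threshold configuration), CircuitBreaker (runtime tripping), and the Position-Cap Gate (proactive risk avoidance at the strategy layer). This page also looks back at why the backend constant LEVERAGE_V1 = 10 was the key to fixing the margin leak.

If you haven't read about the strategies themselves yet, start with Strategy Details.

1. RiskCaps: Threshold Configuration

Source: rocky_bot/risk.py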

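from dataclasses import dataclass

@dataclass
class RiskCaps:
    max_loss_usdc: float = 50.0          # cumulative PnL loss beyond this → CB trips
    max_notional_usdc: float = 200.0     # per-account position notional cap (overridden to 150 in main)
    max_leverage: int = 10               # (currently recorded only, not actively enforced)
    api_errors_to_trip: int = 5          # this many consecutive API errors → CB trips
    pause_seconds: int = 60              # cooldown after the CB trips
    feed_stale_seconds: int = 10         # how long without a Binance feed update counts as stale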

Each account gets its own independent RiskCaps instance, injected via main.py:

# rocky_bot/main.py
circuits = {
    acc.id: CircuitBreaker(RiskCaps(max_notional_usdc=150.0))
    for acc in accounts
}

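Note that max_notional_usdc=150.0 is an override in main.py (the default of 200 is too loose). This number directly sets each account's maximum position (150 USDC notional) and is the key parameter for keeping the funnel shape stable.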
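The sketch below shows the per-account setup. It reuses only the RiskCaps fields listed above, and the account ids are hypothetical. It demonstrates two things: the main.py-style override changes only max_notional_usdc, and each account holds its own instance, so adjusting one account's caps leaves the others untouched.

```python
from dataclasses import dataclass, replace

@dataclass
class RiskCaps:
    # Fields and defaults as listed for rocky_bot/risk.py above.
    max_loss_usdc: float = 50.0
    max_notional_usdc: float = 200.0
    max_leverage: int = 10
    api_errors_to_trip: int = 5
    pause_seconds: int = 60
    feed_stale_seconds: int = 10

# Hypothetical account ids, for illustration only.
account_ids = ["acc-1", "acc-2", "acc-3"]

# One independent instance per account, with the same override main.py applies.
caps_by_account = {acc_id: RiskCaps(max_notional_usdc=150.0) for acc_id in account_ids}

# replace() builds a new object, so tightening acc-2 leaves acc-1 and acc-3 untouched.
caps_by_account["acc-2"] = replace(caps_by_account["acc-2"], max_loss_usdc=25.0)

for acc_id, caps in caps_by_account.items():
    print(acc_id, caps.max_notional_usdc, caps.max_loss_usdc)
```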

2. CircuitBreaker: Runtime Tripping

Source: rocky_bot/risk.py

Each CircuitBreaker maintains the following state:

class CircuitBreaker:
    caps: RiskCaps
    _consecutive_errors: int
    _cumulative_pnl: float        # cumulative wallet PnL (running delta)
    _opened_at: float | None      # trip timestamp; None = not tripped
    _open_reason: str | None
    _last_wallet: float | None    # last seen wallet balance, used to compute the delta

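2.1 The three triggers

Trigger	Condition	Action
API error accumulation	`_consecutive_errors >= api_errors_to_trip`	Trip for `pause_seconds` seconds
Cumulative loss	`_cumulative_pnl <= -max_loss_usdc`	Permanent trip (until reset externally)
Stale feed	(checked by the strategy itself) `feed.mid()` raises `StaleFeedError`	Skip the current round; not counted as an API error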
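The first two rows of the table can be sketched as a small class. This is a minimal sketch, not the actual risk.py code. record_pnl and reset are hypothetical names: in the bot, losses arrive through update_wallet, and reset stands in for the "external reset". The injectable clock and the clearing of the error count on recovery are also assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class RiskCaps:
    # Only the fields this sketch needs; defaults as in rocky_bot/risk.py.
    max_loss_usdc: float = 50.0
    api_errors_to_trip: int = 5
    pause_seconds: int = 60

class CircuitBreaker:
    """Sketch of the API-error and cumulative-loss trip paths."""

    def __init__(self, caps: RiskCaps, clock: Callable[[], float] = time.monotonic):
        self.caps = caps
        self._clock = clock  # injectable so the cooldown can be tested without sleeping
        self._consecutive_errors = 0
        self._cumulative_pnl = 0.0
        self._opened_at: Optional[float] = None
        self._open_reason: Optional[str] = None

    def _trip(self, reason: str) -> None:
        if self._opened_at is None:
            self._opened_at = self._clock()
        if self._open_reason != "max_loss":  # a loss trip is never downgraded to a timed one
            self._open_reason = reason

    def record_api_error(self) -> None:
        self._consecutive_errors += 1
        if self._consecutive_errors >= self.caps.api_errors_to_trip:
            self._trip("api_errors")

    def record_api_success(self) -> None:
        self._consecutive_errors = 0

    def record_pnl(self, delta: float) -> None:
        # Hypothetical helper; in the bot, _cumulative_pnl is fed by update_wallet().
        self._cumulative_pnl += delta
        if self._cumulative_pnl <= -self.caps.max_loss_usdc:
            self._trip("max_loss")

    def is_open(self) -> bool:
        if self._opened_at is None:
            return False
        if self._open_reason == "max_loss":
            return True  # stays open until reset()
        if self._clock() - self._opened_at >= self.caps.pause_seconds:
            # Cooldown elapsed: auto-recover. Clearing the error count is an assumption.
            self._opened_at = None
            self._open_reason = None
            self._consecutive_errors = 0
            return False
        return True

    def reset(self) -> None:
        # Hypothetical name for the external reset mentioned in the table.
        self._consecutive_errors = 0
        self._cumulative_pnl = 0.0
        self._opened_at = None
        self._open_reason = None

# Demo with a fake clock (seconds) instead of time.monotonic.
now = [0.0]
cb = CircuitBreaker(RiskCaps(), clock=lambda: now[0])
for _ in range(5):
    cb.record_api_error()
print(cb.is_open())   # tripped by API errors
now[0] = 60.0
print(cb.is_open())   # cooldown elapsed: recovered
cb.record_pnl(-50.0)
now[0] = 10_000.0
print(cb.is_open())   # loss trip: still open much later
```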

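2.2 Behavior after tripping

async def iterate_once(self):
    if self.circuit.is_open():
        return    # ← all 3 strategy loops start with this line
    ...

The first thing each strategy loop does every round is check the CB. If it is open, the loop returns immediately and tries again next round. After an API-error trip, the CB recovers automatically once pause_seconds have elapsed, so a later round gets through. A cumulative-loss trip stays open until it is reset externally.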
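The skip-and-retry pattern can be simulated with a stub breaker and a fake clock. StubBreaker is hypothetical: it simply stays open until a given time. The clock advances by interval_s per round instead of sleeping. With the ladder's 3-second interval and a 60-second pause, the first 20 rounds are skipped and the loop then resumes on its own.

```python
import asyncio

class StubBreaker:
    """Hypothetical stand-in for CircuitBreaker: open until a fixed time on a fake clock."""

    def __init__(self, open_until: float):
        self.open_until = open_until
        self.now = 0.0  # fake clock in seconds, advanced by the driver loop below

    def is_open(self) -> bool:
        return self.now < self.open_until

class Strategy:
    def __init__(self, circuit: StubBreaker):
        self.circuit = circuit
        self.rounds_run = 0
        self.rounds_skipped = 0

    async def iterate_once(self) -> None:
        if self.circuit.is_open():
            self.rounds_skipped += 1
            return  # breaker open: do nothing this round
        self.rounds_run += 1  # placeholder for the real quoting logic

async def run(strategy: Strategy, rounds: int, interval_s: float) -> None:
    for _ in range(rounds):
        await strategy.iterate_once()
        # Advance the fake clock instead of awaiting asyncio.sleep(interval_s).
        strategy.circuit.now += interval_s

strategy = Strategy(StubBreaker(open_until=60.0))
asyncio.run(run(strategy, rounds=30, interval_s=3.0))
print(strategy.rounds_skipped, strategy.rounds_run)  # prints: 20 10
```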

2.3 State update points

self.circuit.record_api_success()    # after each successful API call: clears _consecutive_errors
self.circuit.record_api_error()      # after each failed API call: +1
self.circuit.update_wallet(usdc)     # after each round's balance(): updates _cumulative_pnl
self.circuit.record_realised_pnl()   # (not called anywhere in the bot; reserved)
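The sketch below covers the bookkeeping behind update_wallet. It assumes _cumulative_pnl is simply the sum of wallet-balance deltas since the first reading, as the field comments in 2 suggest. WalletPnl is a hypothetical class. In the bot this logic lives inside CircuitBreaker, and a loss beyond max_loss_usdc trips the breaker.

```python
from typing import Optional

class WalletPnl:
    """Hypothetical stand-in for the update_wallet() part of CircuitBreaker."""

    def __init__(self, max_loss_usdc: float = 50.0):
        self.max_loss_usdc = max_loss_usdc
        self.cumulative_pnl = 0.0
        self._last_wallet: Optional[float] = None  # None until the first balance() reading
        self.loss_tripped = False

    def update_wallet(self, usdc: float) -> None:
        # The first reading only sets the baseline; each later reading adds its delta.
        if self._last_wallet is not None:
            self.cumulative_pnl += usdc - self._last_wallet
        self._last_wallet = usdc
        if self.cumulative_pnl <= -self.max_loss_usdc:
            self.loss_tripped = True  # the real CircuitBreaker would trip here

tracker = WalletPnl()
for balance in [500.0, 490.0, 495.0, 470.0, 450.0]:
    tracker.update_wallet(balance)
    print(f"wallet={balance:.2f} pnl={tracker.cumulative_pnl:+.2f} tripped={tracker.loss_tripped}")
```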

Strategy code example (from ladder.py):

balances = await self.client.balance()
self.circuit.record_api_success()
for b in balances:
    if b["asset"] == "USDC":
        try:
            self.circuit.update_wallet(float(b["balance"]))
        except (KeyError, ValueError):
            pass
        break

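3. Position-Cap Gate: Proactive Risk Avoidance in the Strategy

Not every risk can wait for the CB to trip. Tripping the CB is an after-the-fact emergency brake; the position cap avoids the risk up front.

3.1 Why it's needed

The historical lesson (see Deployment and Operations §5 for details):

Early versions of the ladder had no position-cap check. Every fill caused the ladder to place the next order on the same side → the position built up on one side → wallet margin was used up → a flood of -2010 insufficient balance errors → no more orders could actually be placed → an empty order book.

Within 30 minutes, a single account's locked margin climbed from $0 to $98 (close to the wallet's limit). The CB never even had a chance to trip: there were no API errors, only new orders being rejected by the backend over and over.

3.2 The fix: a would-be check before every order placement

All three strategies (ladder / anchor / taker) run this check before place_order: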

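positions = await self.client.position_risk(symbol=binance_sym)
# Binance-style responses may return numeric fields as strings, so convert explicitly
pos_amt = float(positions[0]["positionAmt"])
mark = float(positions[0]["markPrice"])

sign = 1 if side == "BUY" else -1    # +1 grows a long, -1 grows a short
would_be = pos_amt + sign * qty
if abs(would_be * mark) > caps.max_notional_usdc:
    # cancel any existing same-side order (so it can't fill and keep adding to the position)
    if live_order:
        await self.client.cancel_order(order_id=live_order["orderId"])
    return     # don't place a new order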
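Pulled out as a pure function, the check is easy to exercise with concrete numbers. The function name and the "BUY"/"SELL" side strings are hypothetical. The arithmetic mirrors the snippet above, with a mark of 100 USDC and the 150 USDC cap from main.py.

```python
def would_exceed_cap(pos_amt: float, side: str, qty: float, mark: float, cap_usdc: float) -> bool:
    """True if placing this order would push |position notional| above the cap."""
    sign = 1.0 if side == "BUY" else -1.0  # assumption: BUY adds, SELL subtracts
    would_be = pos_amt + sign * qty
    return abs(would_be * mark) > cap_usdc

print(would_exceed_cap(1.2, "BUY", 0.4, mark=100.0, cap_usdc=150.0))   # ~160 USDC long: blocked
print(would_exceed_cap(1.2, "SELL", 0.4, mark=100.0, cap_usdc=150.0))  # ~80 USDC long: allowed
print(would_exceed_cap(2.0, "SELL", 0.1, mark=100.0, cap_usdc=150.0))  # ~190 USDC: still blocked
```

The last case shows a limit of the bare check. Suppose an account is already above the cap, for example because the mark price moved. A small reducing order is then rejected too, because only the resulting notional is compared. Letting the reduce side through regardless needs an explicit exemption on top of this arithmetic.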

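3.3 Key details

The gate runs before cancel-replace: check the cap first, then drift, to avoid cancelling the old order only to place an equally large new one.
The opposite side is always allowed: the cap only constrains the position-increasing direction. Reducing the position is always permitted, so the position can be taken out by counterparty orders or the taker.
positionRisk is re-fetched every round: the position may have changed between two loop iterations because of fills from the taker or other makers.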
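The first two rules can be combined into one decision function. This is a hypothetical sketch, not code from ladder.py or anchor.py. Several pieces are assumptions: the Action names, the "BUY"/"SELL" side strings, measuring drift relative to the target price, and the rule that a reducing order must not overshoot through zero.

```python
from enum import Enum
from typing import Optional

class Action(Enum):
    PLACE = "place"      # no live order on this side: place a fresh one
    KEEP = "keep"        # live order is within the drift threshold: leave it resting
    REPLACE = "replace"  # live order has drifted: cancel and re-place at the target
    BLOCK = "block"      # cap hit: cancel any same-side order and place nothing

def decide(pos_amt: float, side: str, qty: float, mark: float, cap_usdc: float,
           live_price: Optional[float], target_price: float, drift: float) -> Action:
    sign = 1.0 if side == "BUY" else -1.0  # assumption: BUY adds, SELL subtracts
    # A reducing order points against the current position and does not overshoot zero.
    reducing = pos_amt * sign < 0 and qty <= abs(pos_amt)
    would_be = pos_amt + sign * qty
    # 1) Cap gate first, and only for orders that grow (or flip) the position.
    if not reducing and abs(would_be * mark) > cap_usdc:
        return Action.BLOCK
    # 2) Drift check second, so a blocked side is never cancelled and re-placed at full size.
    if live_price is None:
        return Action.PLACE
    if abs(live_price - target_price) / target_price > drift:
        return Action.REPLACE
    return Action.KEEP

# Over the cap (200 USDC long) but reducing: allowed through.
print(decide(2.0, "SELL", 0.1, 100.0, 150.0, None, 100.0, 0.0002))
# Growing past the cap: blocked before the drift check is even reached.
print(decide(1.2, "BUY", 0.4, 100.0, 150.0, 99.0, 99.5, 0.0002))
```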

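3.4 How each strategy applies it

Strategy	Behavior when the gate triggers
Ladder	Cancel the current same-side order + return (nothing placed this round)
Anchor	Cancel only the order on the triggering side + skip that side; the other side proceeds normally
Taker	Flip the side to the reduce direction (not a skip: it crosses in the opposite direction)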
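The taker's flip differs from the makers' skip, and a minimal sketch makes the difference concrete. taker_order is a hypothetical function. Two behaviors are assumptions: returning None when flat (there is nothing to reduce), and clipping the flipped size to the current position so the order cannot push through zero.

```python
from typing import Optional, Tuple

def taker_order(pos_amt: float, side: str, qty: float, mark: float,
                cap_usdc: float) -> Optional[Tuple[str, float]]:
    """Return (side, qty) to send, or None to skip the round."""
    sign = 1.0 if side == "BUY" else -1.0  # assumption: BUY adds, SELL subtracts
    if abs((pos_amt + sign * qty) * mark) <= cap_usdc:
        return side, qty  # within the cap: trade as planned
    if pos_amt == 0:
        return None  # flat, and one order alone exceeds the cap: nothing to reduce
    # Gate hit: flip to the reducing side instead of skipping.
    reduce_side = "SELL" if pos_amt > 0 else "BUY"
    # Assumption: clip the size so the flipped order cannot push through zero.
    return reduce_side, min(qty, abs(pos_amt))

print(taker_order(0.5, "BUY", 0.2, mark=100.0, cap_usdc=150.0))  # ('BUY', 0.2)
print(taker_order(1.4, "BUY", 0.2, mark=100.0, cap_usdc=150.0))  # ('SELL', 0.2)
```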

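4. LEVERAGE_V1: The Backend Fix

Strictly speaking this part doesn't belong to the bot, but without this fix every bot-side cap is ineffective.

4.1 Symptoms

In the backend's apply_trade_matched (which books match events to the ledger):

let leverage = (notional / order_margin).round_dp(0);    // old version

Leverage was reverse-derived from notional and the original order_margin. Scenarios where this goes wrong: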

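After a partial fill, order_margin has already been released proportionally.
Price drift makes the fill price differ from the placement price.
When the two combine, notional / order_margin can round to integers other than 10, such as 7, 9, or 11.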
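Python's decimal module can reproduce the old expression and show how a reverse-derived leverage drifts away from 10. The numbers are illustrative, not a trace from the real ledger: a 100 USDC order placed at 10x, so order_margin = 10. Round-half-to-even is assumed to match rust_decimal's round_dp.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def derived_leverage(notional: Decimal, order_margin: Decimal) -> Decimal:
    """Python analogue of the old `(notional / order_margin).round_dp(0)`."""
    # Assumption: round half to even, mirroring rust_decimal's default for round_dp.
    return (notional / order_margin).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)

order_margin = Decimal("10")  # margin locked for a 100 USDC order at 10x
scenarios = {
    "fill at placement price": Decimal("100"),
    "fill 6% above placement": Decimal("106"),
    "fill 6% below placement": Decimal("94"),
    "partial fill, notional and margin out of step": Decimal("70"),
}
for label, notional in scenarios.items():
    print(f"{label}: leverage {derived_leverage(notional, order_margin)}")
```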

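Result: apply_margin_recompute computes new_locked with the wrong leverage, and that amount no longer matches the order margin released by decrement_with_margin_release → phantom increments appear in accounts.locked ($1-3 per trade).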

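4.2 Diagnostic data

After the invariant logger was deployed, it collected 956 violation records in 30 minutes. 180 trades pushed the diff up, and it never came back down. The pos_sum values contained the repeating 142857 cycle (i.e. 1/7), consistent with a leverage that had rounded to 7.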
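The 142857 cycle points at 1/7 for a simple reason. Any integer that is not a multiple of 7, divided by 7, has a repeating decimal whose six-digit block is a rotation of 142857. Computing the margin for a hypothetical 100 USDC notional at a few leverages shows that only the 7x case produces the pattern:

```python
from decimal import Decimal

notional = Decimal("100")  # hypothetical notional, USDC
for leverage in (7, 9, 10, 11):
    margin = notional / Decimal(leverage)  # default context: 28 significant digits
    print(leverage, margin, "142857" in str(margin))
```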

4.3 The fix

// services/internal-ledger/src/apply.rs
const LEVERAGE_V1: u32 = 10;
let taker_leverage = LEVERAGE_V1;
let maker_leverage = LEVERAGE_V1;

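Four lines replace twelve lines of derivation logic. This relies on the rest of the stack already hardcoding lev=10 (api-gateway/routes_orders.rs:96 has leverage: 10, // default until /v1/leverage endpoint lands), so the ledger now uses the same single source of truth as everything else.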

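4.4 Verification

After deployment, a 5-minute smoke test followed by 30 minutes of monitoring showed:

0 invariant violations
max_locked = $27.31 (within the cap)
over_80 == 0
-2010 count == 0
max_locked still stable after 63 minutes of continuous running

The full five-round margin-leak investigation is documented in rocky.interface/docs/superpowers/specs/2026-05-25-leverage-derivation-fix-design.md.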

5. Quick Reference: All Risk Configuration Points

Parameter	Location	Current value	Effect
`max_notional_usdc`	`main.py` (override)	150	Per-account position notional cap (USDC)
`max_loss_usdc`	`RiskCaps` default	50	Cumulative-loss trip threshold (USDC)
`api_errors_to_trip`	`RiskCaps` default	5	Consecutive-API-error trip threshold
`pause_seconds`	`RiskCaps` default	60	CB cooldown (s)
`LEVERAGE_V1`	`apply.rs` (backend)	10	Leverage the matching layer uses to compute locked margin
`TAKER_AGGRESSION`	`taker.py`	0.005 (50 bps)	How far the taker crosses
`DRIFT_BPS` (ladder)	`ladder.py`	0.0002 (2 bps)	Drift threshold for a resting order
`DRIFT_BPS` (anchor)	`anchor.py`	0.0001 (1 bp)	Same; anchor is more sensitive
`interval_s` (ladder)	`ladder.py`	3.0	Main loop period (s)
`interval_s` (anchor)	`anchor.py`	2.0	Same
`base_interval_s` (taker)	`taker.py`	30.0	Same
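The drift and aggression parameters are plain fractions of price (0.0002 = 2 bps). The sketch below shows how such thresholds could be applied. Two behaviors are assumptions: drift is measured relative to the target price, and the taker crosses mid by a fixed fraction. The function names and the per-strategy constant names are hypothetical.

```python
TAKER_AGGRESSION = 0.005   # 50 bps (taker.py)
DRIFT_BPS_LADDER = 0.0002  # 2 bps (ladder.py)
DRIFT_BPS_ANCHOR = 0.0001  # 1 bp (anchor.py)

def has_drifted(live_price: float, target_price: float, drift: float) -> bool:
    # Assumption: drift is measured relative to the target price.
    return abs(live_price - target_price) / target_price > drift

def taker_limit_price(mid: float, side: str, aggression: float = TAKER_AGGRESSION) -> float:
    # Assumption: the taker crosses mid by a fixed fraction (buy above, sell below).
    return mid * (1 + aggression) if side == "BUY" else mid * (1 - aggression)

# A resting order 1.5 bps away from target: the ladder keeps it, the anchor replaces it.
print(has_drifted(100.015, 100.0, DRIFT_BPS_LADDER))  # False
print(has_drifted(100.015, 100.0, DRIFT_BPS_ANCHOR))  # True
print(taker_limit_price(100.0, "BUY"), taker_limit_price(100.0, "SELL"))
```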

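6. Related Reading

Strategy Details: how each strategy invokes these risk mechanisms
Deployment and Operations: metrics for checking that the risk controls are working, plus post-mortems of past incidents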
