Cross-Asset Attention Hurt the Strategy

BLIP · Research · 2 min read

Added MultiheadAttention across the 4 commodities before the LSTM in panel-VLSTM. Mean Sharpe Δ −0.30. Crude_Oil specifically lost 1.17.

Conventional wisdom in cross-asset DL says attention captures shared structure that per-asset architectures miss. The Oxford 2026 commodity-momentum paper uses a cross-asset block; transformer-style architectures default to it. I added it to my panel-VLSTM as Phase A item S10.

The architecture:

input x[B, A=4, L=24, F]  →
  per-asset VSN → h[B, A, L, hidden]
  cross-asset MultiheadAttention(d_model=hidden, n_heads=4)  ← NEW
    applied across the 4 assets at each timestep
    residual + LayerNorm
  per-asset LSTM (now sees cross-asset-aware features)
  per-asset head → pos[B, A]
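The reshape that makes attention run "across the 4 assets at each timestep" is easy to get wrong, so here is a minimal NumPy sketch of just that block. It is not the post's training code: the weights are random, the LayerNorm has no learned scale or shift, and the names `cross_asset_attention`, `w_qkv`, and `w_out` are hypothetical. The hidden size of 32 is taken from the embedding size mentioned later in the post.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature axis (no learned scale/shift in this sketch).
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_asset_attention(h, w_qkv, w_out, n_heads=4):
    """h: [B, A, L, H] per-asset features -> same shape, mixed across assets."""
    B, A, L, H = h.shape
    d_head = H // n_heads
    # Fold time into the batch so each (batch, timestep) pair becomes one
    # attention problem whose "sequence" is the A assets.
    x = h.transpose(0, 2, 1, 3).reshape(B * L, A, H)
    q, k, v = np.split(x @ w_qkv, 3, axis=-1)            # each [B*L, A, H]

    def split_heads(t):
        # [B*L, A, H] -> [B*L, n_heads, A, d_head]
        return t.reshape(B * L, A, n_heads, d_head).transpose(0, 2, 1, 3)

    q, k, v = split_heads(q), split_heads(k), split_heads(v)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(d_head)  # [B*L, heads, A, A]
    weights = softmax(scores, axis=-1)
    mixed = (weights @ v).transpose(0, 2, 1, 3).reshape(B * L, A, H)
    out = layer_norm(x + mixed @ w_out)                   # residual + LayerNorm
    # Restore the [B, A, L, H] layout the per-asset LSTM expects.
    return out.reshape(B, L, A, H).transpose(0, 2, 1, 3), weights

rng = np.random.default_rng(0)
B, A, L, H = 2, 4, 24, 32                                # B and H are assumptions
h = rng.standard_normal((B, A, L, H))
w_qkv = rng.standard_normal((H, 3 * H)) / np.sqrt(H)     # random stand-in weights
w_out = rng.standard_normal((H, H)) / np.sqrt(H)
out, weights = cross_asset_attention(h, w_qkv, w_out)
print(out.shape, weights.shape)
```

Each attention matrix is only 4×4 (asset by asset), so the block never mixes information across time; that is left entirely to the LSTM that follows.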

Trained on H200, 5-fold walk-forward, same Sharpe-loss objective as the original panel-VLSTM. Result:

Commodity    panel_VLSTM (orig)    panel_VLSTM_XAttn    Δ
Gold         1.33                  1.46                 +0.13
Silver       0.74                  1.13                 +0.39
Copper       1.79                  1.25                 −0.54
Crude_Oil    2.70                  1.52                 −1.17
Mean         1.64                  1.34                 −0.30

Gold and Silver gained (+0.13 and +0.39). Copper and Crude_Oil collapsed (−0.54 and −1.17). On average, the architecture lost 0.30 Sharpe.
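As a quick arithmetic check, the per-commodity deltas and the means can be recomputed from the Sharpe values reported above. The snippet uses only the rounded figures from the table, so a recomputed delta can differ from the reported one in the last digit (Crude_Oil comes out at −1.18 here against the reported −1.17, presumably because the reported delta was taken before rounding).

```python
# Sharpe ratios as reported in the post: (panel_VLSTM orig, panel_VLSTM_XAttn)
sharpe = {
    "Gold":      (1.33, 1.46),
    "Silver":    (0.74, 1.13),
    "Copper":    (1.79, 1.25),
    "Crude_Oil": (2.70, 1.52),
}

# Per-commodity change, recomputed from the rounded table values.
deltas = {name: round(new - old, 2) for name, (old, new) in sharpe.items()}

# Equal-weight means across the 4 commodities.
mean_orig = sum(old for old, _ in sharpe.values()) / len(sharpe)
mean_xattn = sum(new for _, new in sharpe.values()) / len(sharpe)
mean_delta = mean_xattn - mean_orig

for name, d in deltas.items():
    print(f"{name:<10} {d:+.2f}")
print(f"{'Mean':<10} {mean_orig:.2f} -> {mean_xattn:.2f} ({mean_delta:+.2f})")
```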

The likely mechanism is one of these (or all):

  1. Capacity over-allocation. A 4-head attention layer over a 32-dim embedding adds ~16K params. On a panel where the LSTM proper only has ~50K, that’s a 30% bump in trainable surface area on the path to a target with low SNR. Without strong cross-asset signal to extract, the attention layer becomes a fancy noise generator.

  2. The 4 commodities don’t actually share much timestep-level signal. Gold and Silver co-move (gold-silver ratio is a stable 80-100). Copper trades on China industrial demand; Crude trades on OPEC + inventory. Pairing all four at every timestep dilutes the relationships that exist.

  3. Pre-LSTM placement may be wrong. I put attention before the LSTM so the recurrent layer sees cross-asset-aware features. Post-LSTM placement (attention over the final hidden states) might preserve per-asset learning while still allowing cross-asset adjustment at the output.
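The capacity estimate in (1) can be sanity-checked with a few lines of arithmetic. This sketch assumes a standard self-attention block with square query, key, value, and output projections plus biases (the layout used by common MultiheadAttention implementations), followed by a LayerNorm with a learned scale and shift. The function names are hypothetical, and the ~50K LSTM size is the post's own figure.

```python
def mha_param_count(d_model: int, bias: bool = True) -> int:
    """Parameters in a self-attention block with square Q/K/V/output projections.

    The number of heads does not appear: heads only partition d_model,
    they do not add weights.
    """
    weights = 4 * d_model * d_model          # W_q, W_k, W_v, W_o
    biases = 4 * d_model if bias else 0
    return weights + biases

def layer_norm_param_count(d_model: int) -> int:
    return 2 * d_model                       # learned scale and shift

LSTM_PARAMS = 50_000                         # approximate figure from the post

for d in (32, 64):
    total = mha_param_count(d) + layer_norm_param_count(d)
    print(f"d_model={d}: {total} params, {total / LSTM_PARAMS:.0%} of the LSTM")
```

Under these assumptions a hidden size of 64 gives roughly 16.8K added parameters (about a third of a 50K LSTM), while a hidden size of 32 gives roughly 4.3K. The ~16K figure quoted above therefore lines up with a 64-dim block; which size the trained model actually used is not stated here.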

I tested (1)+(2). Didn’t try (3) — Phase A’s scope is exhausted. Future iteration could sweep placement.

Bigger lesson: conventional architectural choices don’t generalize cleanly out of domain. The Oxford 2026 paper applies cross-asset attention to a different commodity universe with different cross-correlations. Their result doesn’t transfer to mine 1:1. “It worked in the paper” is not a sufficient justification for adding parameters to a model.

The Phase A v11 candidate pool keeps panel_VLSTM_XAttn for completeness. Honest walk-forward selection de-ranks it; the original panel_VLSTM stays in the top-6.
