Broken Trust — Rolling an ML-DSA Subkey Downhill

TL;DR

ML-DSA (Dilithium) hides its key behind rejection sampling: released signatures are statistically independent of the secret. Leak one bit of the per-signature masking randomness and that defense collapses — each leaked bit becomes an informative relation, and the true subkey becomes the unique bottom of a hill you can roll down, no lattice reduction required.

The paper does it with 5,000–35,000 leaks (37–68× fewer than prior work) in the exact setting and, given millions of relations, still succeeds at 45% bit-flip noise (paper-measured). Below, watch a toy version actually descend (toy · illustrative).

One leaked masking bit per signature turns ML-DSA's hidden subkey into the unique minimum of a score the attacker can evaluate without the key.

The mechanism: a score you can evaluate without the key is zero exactly at the key — so local search finds it.
What's live below: a tiny self-contained engine recovering a key it generated itself — to make the dynamics visible.
What this is not: not an attack on real ML-DSA, no real key, no side-channel capture. An educational reconstruction.

Reconstructing Schubert, Müller, Seifert, Margraf, “Descent into Broken Trust: Uncovering ML-DSA Subkeys with Scarce Leakage and Local Optimization”, IACR ePrint 2026/472. Real-scale numbers are the paper's; the live descent is a toy miniature.

Live readout toy · illustrative

Readout fields: Toy seed · Relations (leaks) · Noise p (bit-flip) · Current tier · Best score (violations) · Status.

Score = how many leaked relations the current candidate violates. Noiseless, it reaches 0 exactly at the true toy key. This is the toy's own run — its relation counts and noise limits are its own, not the paper's.

Watch the descent toy · illustrative

A seeded guess rolls downhill: each accepted move lowers the violation score, tier by tier (coarse → fine), toward the true toy subkey. Push the noise slider past the toy's ceiling and the descent visibly stalls — the honest failure mode.

Relations (leaks collected): 4000 · Leak-bit noise p: 0% · Toy seed: 1

Predict — will the score reach 0 at these settings?

Overlay a few alternate descents (different random starts) to see the spread

Score (violations) vs optimization step. Dashed verticals = tier boundaries (coarse → fine refinement). The curve only ever falls — it is the hill being descended.

Candidate subkey vs the true toy key (rows: true, guess)

Inside one relation toy · illustrative

An “informative relation” is just one yes/no constraint. Here is a single one, evaluated against the candidate at the current step — thousands like it carve the key out.

The score landscape toy · illustrative

The descent chart shows the optimizer's path; this shows why the path exists. Sweep any two coordinates (the other six fixed at their true values) and color each point by its violation score. The true key sits at the bottom of the bowl.
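The sweep described above can be sketched in a few lines. This is a minimal illustration, not the page's engine: the helper names (`predicts`, `violations`, `make_relations`, `landscape`), the coefficient range [−4, 4], the threshold range and the relation count are all assumptions chosen for readability.

```python
import random

def predicts(a, tau, s):
    """Toy relation: is <a, s> >= tau?"""
    return sum(ai * si for ai, si in zip(a, s)) >= tau

def violations(cand, rels):
    """Count relations whose leaked bit the candidate contradicts."""
    return sum(predicts(a, tau, cand) != bit for a, tau, bit in rels)

def make_relations(secret, m, rng):
    """Hypothetical toy generator: ternary a, small integer threshold, noiseless bit."""
    rels = []
    for _ in range(m):
        a = [rng.choice((-1, 0, 1)) for _ in secret]
        tau = rng.randint(-6, 6)
        rels.append((a, tau, predicts(a, tau, secret)))
    return rels

def landscape(secret, rels, i, j, lo=-4, hi=4):
    """Sweep coordinates i and j over [lo, hi]; all others stay at their true values."""
    grid = {}
    for u in range(lo, hi + 1):
        for v in range(lo, hi + 1):
            cand = list(secret)
            cand[i], cand[j] = u, v
            grid[(u, v)] = violations(cand, rels)
    return grid

rng = random.Random(1)
secret = [rng.randint(-4, 4) for _ in range(8)]
rels = make_relations(secret, 1500, rng)
grid = landscape(secret, rels, 0, 1)

# Text heat map: one row per value of coordinate 1, top to bottom.
for v in range(4, -5, -1):
    print(" ".join(f"{grid[(u, v)]:4d}" for u in range(-4, 5)))
```

The cell at the true values of the two swept coordinates is always 0 in the noiseless case, because the candidate there is the key itself; the surrounding cells show how quickly violations pile up as a candidate moves away.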


How reliable is the toy? toy · illustrative

One descent recovers one key. The paper reports success over 10 independent keys (10/10). Run many toy trials at the current relations and noise — each a fresh random subkey — to see the toy's own success rate by the same methodology.


How one leaked bit breaks the silence

Rejection sampling normally hides the key. ML-DSA forms a response z = y + c·s₁ from a fresh masking value y, the public challenge c, and the secret subkey s₁. It only releases z when z reveals nothing statistical about s₁ — that independence is the whole defense.
One leaked masking bit punctures it. Learn a single bit of y for a signature and, paired with the public (z, c), you get a constraint that does depend on s₁: an informative relation.
Many relations carve out the key. Each relation is one yes/no fact about a secret-dependent value: in the toy, “is ⟨a, s⟩ ≥ τ?” for a public a and threshold τ. Stack thousands and only one short vector fits them all.
The score is zero exactly at the key. Count the relations a candidate violates. The true subkey violates none (noiseless), and with enough relations it is the unique global minimum, while nearby candidates violate only a few. That gives the score a gradual slope toward the key, even though the count itself moves in integer steps.
Multi-tier hill-climbing rolls down it. From a guess, take coordinate steps that lower the score — big steps first (coarse tier), then progressively smaller ones (finer tiers) — until you hit the bottom. No lattice reduction; no other secret component needed.
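The steps above can be sketched as a small program. It is a simplified stand-in under stated assumptions, not the paper's optimizer and not the page's engine: dimension 8, coefficients in [−4, 4], thresholds in [−6, 6], tiers of step size 4, 2, 1, a start at the zero vector, and the names `predicts`, `score` and `descend` are all illustrative choices.

```python
import random

def predicts(a, tau, s):
    """Toy relation: is <a, s> >= tau?"""
    return sum(ai * si for ai, si in zip(a, s)) >= tau

def score(cand, rels):
    """Key-free score: how many leaked relations the candidate violates."""
    return sum(predicts(a, tau, cand) != bit for a, tau, bit in rels)

def descend(rels, n, tiers=(4, 2, 1), bound=8):
    """Coordinate hill-climb: big steps first, then finer ones; accept only strict improvements."""
    cand = [0] * n
    best = score(cand, rels)
    trace = [best]
    for step in tiers:
        improved = True
        while improved:
            improved = False
            for i in range(n):
                for delta in (step, -step):
                    trial = list(cand)
                    trial[i] += delta
                    if abs(trial[i]) > bound:
                        continue
                    s = score(trial, rels)
                    if s < best:
                        cand, best, improved = trial, s, True
                        trace.append(best)
    return cand, best, trace

rng = random.Random(1)
secret = [rng.randint(-4, 4) for _ in range(8)]
rels = []
for _ in range(1500):
    a = [rng.choice((-1, 0, 1)) for _ in range(8)]
    tau = rng.randint(-6, 6)
    rels.append((a, tau, predicts(a, tau, secret)))  # noiseless leaked bit

cand, best, trace = descend(rels, 8)
print("true :", secret)
print("guess:", cand)
print("final score:", best, "after", len(trace) - 1, "accepted moves")
```

Because only strict improvements are accepted, the recorded trace can only fall, which is the monotone curve the descent chart draws. A search this plain can also stall in a local minimum before reaching score 0; the paper's optimizer adds blocks, lateral moves, restarts and diversification for that reason.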

Why it survives noise: when leaked bits are flipped at random, the true key no longer scores exactly zero, but thousands of relations still jointly point downhill on average, so the count-based score keeps its minimum near the truth. That is why the paper still recovers keys at 45% noise, at the cost of far more relations than the exact setting needs (paper-measured).
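A back-of-envelope version of that argument, assuming each leaked bit is flipped independently with probability p: a candidate that disagrees with the true key on d of m relations has expected score p·m + (1 − 2p)·d, so the true key (d = 0) expects p·m and every other candidate sits above it by (1 − 2p)·d. The gap stays positive for any p below 50% but shrinks to a tenth of its noiseless size at p = 45%. The sketch below checks this on toy data; the generator, ranges and names are hypothetical, and the `flipped` flag is stored only so the check can be made.

```python
import random

def predicts(a, tau, s):
    """Toy relation: is <a, s> >= tau?"""
    return sum(ai * si for ai, si in zip(a, s)) >= tau

def noisy_relations(secret, m, p, rng):
    """Each relation is (a, tau, leaked_bit, flipped); leaked_bit = true bit XOR flipped."""
    rels = []
    for _ in range(m):
        a = [rng.choice((-1, 0, 1)) for _ in secret]
        tau = rng.randint(-6, 6)
        flipped = rng.random() < p
        rels.append((a, tau, predicts(a, tau, secret) != flipped, flipped))
    return rels

def score(cand, rels):
    """Violations against the leaked (possibly flipped) bits."""
    return sum(predicts(a, tau, cand) != leaked for a, tau, leaked, _ in rels)

def disagreements(cand, secret, rels):
    """d: relations on which cand and the true key give different answers."""
    return sum(predicts(a, tau, cand) != predicts(a, tau, secret)
               for a, tau, _, _ in rels)

rng = random.Random(7)
m, p = 20000, 0.45
secret = [rng.randint(-4, 4) for _ in range(8)]
rels = noisy_relations(secret, m, p, rng)

wrong = list(secret)
wrong[0] += 3  # a deliberately wrong candidate
d = disagreements(wrong, secret, rels)

print("score(true key):", score(secret, rels), "| expected about", p * m)
print("score(wrong)   :", score(wrong, rels), "| expected about", p * m + (1 - 2 * p) * d)
```

The printed scores fluctuate around their expectations from run to run, which is the practical reason a shrinking gap has to be paid for with more relations.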

Faithful-shape note. In real ML-DSA the challenge c has exactly τ nonzero ±1 entries (τ = 39 / 49 / 60 for ML-DSA-44 / -65 / -87), and a leaked bit turns a signature coefficient z = ⟨c,x⟩ + y, where x stands for the unknown subkey coefficients, into a two-sided interval constraint |⟨c,x⟩ − z̃| ≤ β with β = τ·η. The toy above uses a simpler one-sided threshold (⟨a,s⟩ ≥ τ) as the stand-in; its τ is just a cutoff value, not the challenge weight. The idea is the same (one leaked bit ⇒ one linear constraint whose violations score a candidate) at a size you can watch.
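To make the real-scale constraint in the note above concrete, the sketch below computes β = τ·η per parameter set and evaluates one two-sided relation. The τ values are those quoted in the note and the η values (2, 4, 2) are the FIPS 204 parameters. Everything else is an assumption for illustration: `sample_challenge` is a uniform stand-in for the real challenge sampler, and `z_tilde` is taken as an input because how it is built from z and the leaked bit is not spelled out here.

```python
import random

# (tau, eta) per parameter set.
PARAMS = {"ML-DSA-44": (39, 2), "ML-DSA-65": (49, 4), "ML-DSA-87": (60, 2)}

def sample_challenge(tau, rng, n=256):
    """Stand-in sampler: tau random positions set to +1 or -1, the rest 0."""
    c = [0] * n
    for idx in rng.sample(range(n), tau):
        c[idx] = rng.choice((-1, 1))
    return c

def inner(c, x):
    """Inner product <c, x>."""
    return sum(ci * xi for ci, xi in zip(c, x))

def interval_holds(c, x, z_tilde, beta):
    """Two-sided relation |<c, x> - z_tilde| <= beta for a candidate x."""
    return abs(inner(c, x) - z_tilde) <= beta

rng = random.Random(0)
for name, (tau, eta) in PARAMS.items():
    beta = tau * eta
    c = sample_challenge(tau, rng)
    x = [rng.randint(-eta, eta) for _ in range(256)]  # short candidate vector
    print(name, "beta =", beta, "| <c,x> =", inner(c, x),
          "| holds for z_tilde = 0:", interval_holds(c, x, 0, beta))
```

Since c has τ entries of magnitude 1 and every entry of x lies in [−η, η], |⟨c,x⟩| can never exceed β, which is why β is the natural width of the interval. The products come out as 78, 196 and 120 for ML-DSA-44, -65 and -87.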

No leakage vs. leakage toy · illustrative

The whole attack hinges on one thing. With zero leaked bits, the score is flat — every candidate scores the same, so nothing points toward the key. Each leaked bit adds a constraint that makes the true key stand out. Same toy key and the same candidates, scored both ways:

Panels: 0 leaked bits · With leaks (current setting)

Toy vs. paper — the honesty bridge

The claim this demo makes, precisely: the toy reproduces the descent's qualitative behaviour — monotone, multi-tier convergence to a unique zero, and graceful degradation under noise — while the paper establishes the quantitative scale on real ML-DSA. The toy's small numbers are never the paper's.

Toy engine runs live here

Subkey: —
Relations to converge (noiseless): computing…
Descent shape: monotone, multi-tier, to score 0
Noise ceiling: its own (see noise explorer)

Paper (ePrint 2026/472) paper-measured

Subkey: real ML-DSA s₁/s₂ (256-wide blocks)
Relations to recover: 5,000–35,000
vs prior state of the art: 37–68× fewer
Noise tolerance: feasible up to ~45% bit-flip p

Same algorithm shape, two scales. The bridge is the behaviour — not the magnitudes.

Full toy ↔ paper mapping (what corresponds, and what differs)

Aspect	Toy (illustrative)	Paper (measured)
Subkey dimension	`n = 8`	`n = 256` (real s₁/s₂ block)
One relation (the leak)	one-sided threshold `⟨a,s⟩ ≥ τ`	two-sided interval `|⟨c,x⟩ − z̃| ≤ β`, `β = τ·η`
Public vector	ternary `a ∈ {−1,0,1}⁸`	challenge `c ∈ {−1,0,1}²⁵⁶` with τ nonzero
Score	count of violated relations	count-based (noisy) / excess-based (exact)
True key is…	the unique zero of the score	the unique zero (Theorem 3.1, prob → 1)
Optimizer	multi-tier coordinate hill-climb	multi-tier + blocks, lateral moves, restarts, diversification
Success metric	success rate over random keys (Run-N-trials)	10/10 keys, fixed seed 42
Relations to recover	≈ 1,500 (toy's own)	5,000–35,000 (Table 2)
Noise ceiling	≈ 20–30% (toy's own)	45% (Table 4)

The rows in the top half are faithful in shape; those in the bottom half differ in scale. The toy's numbers are never the paper's.

Paper-scale replay paper-measured

The paper's measured band, replayed per ML-DSA parameter set (no computation here). The 5,000–35,000 figure is the paper's aggregate across these sets and leakage-bit indices; the exact per-set counts are those of Table 2.

The paper's results, visualized paper-measured

The headline reduction, drawn to scale. Prior work (Damm et al. [5], regression) needed 500k–2.4M informative relations in the high-leakage regime; this paper's hill-climbing needs 5k–35k — a 37–68× cut. Log scale.

Prior (Damm et al. [5]) vs this attack, high-leakage regime. Reduction factor above each set. Source: Tables 2–3.

Exact-setting relation counts — Table 2

The full grid behind the 5k–35k band: minimum relations for 10/10 key recovery, by parameter set and leakage-bit index j. The band's minimum (5,000) is ML-DSA-87 at j=6; its maximum (35,000) is ML-DSA-65 at j=9.

Table 2 — darker = fewer relations needed.

Noise explorer

Toy recovery rate vs leak-bit noise p (computed live from repeated runs at a fixed relation count). The toy's own ceiling is marked, alongside the paper's measured 45% — they need not be equal, and they aren't.


Why the toy's ceiling is lower than 45%: the toy is an 8-coordinate, single-start search; the paper's full optimizer works at real ML-DSA scale where thousands of relations average out far more noise. The toy's job is to show the descent stalls past its limit — not to match the paper's number. toy vs paper-measured

Implications & defenses

Does this mean ML-DSA is broken? No. With no leakage there is no attack, and the scheme's lattice math (FIPS 204) is untouched. What the paper sharpens is the implementation threat: the practical bar, meaning how many signatures an attacker must observe, is far lower than previously thought. Details below, distilled from the paper's discussion (§7).

What it means in practice

Organizations running certificate authorities, code-signing, or automated authentication routinely produce thousands of signatures per day with one long-lived key. An attacker with a side channel that leaks a single bit of the signing randomness per signature could accumulate enough leaky signatures within hours to days. Each recovered subkey becomes a perfect hint that feeds lattice reduction to recover the full signing key.

A counter-intuitive note from the paper: for subkey recovery, ML-DSA-87 (the highest security level) is actually the most exposed — its larger β = τ·η means more signatures are informative, so fewer are needed (5,000 at j=6).

Defenses — defense-in-depth

Treat individual randomness bits as security-critical. The whole attack rests on leaking even one bit per signature; protecting the masking randomness is the primary lever.
Higher-order masking — necessary but not sufficient. It raises extraction cost, but the paper notes masking alone doesn't eliminate leakage: Qiao et al. demonstrated a template attack recovering randomness bits from a masked Dilithium, and imperfect higher-order extraction maps directly onto this paper's noisy-leakage model.
Hardware-level shielding to reduce EM/power emanation at the source.
Bound signatures per key / rotate keys. Fewer signatures under one key ⇒ fewer accumulable informative relations.
Combine the above. No single measure suffices; the paper's takeaway is explicitly defense-in-depth.

What this does not change

The hardness of Module-LWE and the FIPS 204 security argument are intact. This is a leakage + local-optimization result about what becomes possible once bits leak — not a mathematical break, and not a fault-injection attack. No leakage, no attack.

Source: ePrint 2026/472, §7 (Conclusion & Discussion).

Quick self-check

Three questions. If these land, the demo did its job.

Misconceptions

Is ML-DSA broken?

No. This requires side-channel leakage of the masking randomness. With no leakage there is no attack. The standard's math is intact — this is an implementation / leakage result about what an attacker can do once bits leak, not a break of the scheme on paper.

Does this need fault injection / glitching?

No. It is a leakage + local-optimization attack, not a fault attack. Nothing is glitched; the attacker passively learns leaked masking bits and then optimizes a score. (Despite the “broken trust” framing, this is not a fault-injection result.)

Is the live demo recovering a real key?

No. It recovers a toy key the demo generated itself, at tiny dimension, purely to illustrate the algorithm's dynamics. No real ML-DSA key, no real signatures, no captured side channel. The real-scale numbers come from the paper and are badged paper-measured.

Why does it still work at 45% noise?

Because many relations jointly constrain the key. Each flipped bit only nudges the count-based score; averaged over thousands of relations, the score's minimum stays near the true key, so the descent still points downhill. (The toy, being tiny, gives up sooner — that's its own ceiling, shown above.)

Does this need a perfect score function or lattice reduction?

No. The verification routine scores a candidate using only the collected relations — no other secret component, no lattice reduction. That cheap, key-free score is precisely what makes local optimization viable.

Parameters & sources

Source of truth: Carsten Schubert, Niklas Julius Müller, Jean-Pierre Seifert, and Marian Margraf, “Descent into Broken Trust: Uncovering ML-DSA Subkeys with Scarce Leakage and Local Optimization”, IACR ePrint 2026/472 (published 2026-03-06). Full transcription with citations is in PAPER-NOTES.md.

Figure	Value	Provenance	Citation

Toy parameters toy

Verify it yourself

Known gaps (honest by construction)

The live engine is a toy miniature. Dimension 8, integer coefficients, a single-start multi-tier search — not the paper's full optimizer at real ML-DSA scale. Its relation counts and noise ceiling are its own, never the paper's 5,000–35,000 or 45%.
The paper's numbers are transcribed, not reproduced. The exact per-set, per-leakage-index counts (Table 2 → the 5,000–35,000 band), the 37.0/42.8/68.5× reductions (Table 3), and the ~45% noise result (Table 4) are read from ePrint 2026/472 and shown as paper-measured; this page does not re-run the real attack.
The noisy result is preliminary and costly. The paper's 45% figure was measured for ML-DSA-44 and -87 only (j ∈ {6,7,8}; ML-DSA-65 pending) and needs 2–6.5 million relations — two orders of magnitude more than the exact setting. "Survives 45% noise" is real, not free.
The toy leak is a stand-in. A real informative relation is a two-sided interval |⟨c,x⟩ − z̃| ≤ β (β = τ·η) derived from one leaked masking bit plus public signature data; the toy models it as a one-sided threshold on ⟨a, s⟩. Same idea (one leaked bit ⇒ one linear constraint), simpler arithmetic.
This models recovery GIVEN leakage, not how leakage is obtained. Acquiring real masking-bit leakage (power/EM/cache side channels, or a stuck-at fault) is out of scope and not simulated.
Figures verified against the full paper text. Tables 1–4 of ePrint 2026/472 were transcribed in full (see PAPER-NOTES.md); the PDF binary is intentionally not redistributed in this repo. The IACR site is Cloudflare-gated, so it isn't machine-fetched at build time.