Test Thresholds, Nulls & CI Methods

How HHM decides what “counts,” how we guard against coincidences, and how confident we are

What this page covers

HHM results are only as strong as their testing rules. This page documents how we set thresholds for operators (e.g., OP002 Rec, OP003 Echo, OP006 UnifiedEntropy, OP014 CollapseOrbit), how we construct null models to estimate chance levels, and how we compute confidence intervals and control error rates so that claims are reproducible.

Repro tip: Use the exact schemas in HHM_bundle.json when you (or an AI) run tests. Return a Result Card with thresholds, null choice, CI method, effect sizes, and decisions.
Run this with an AI
"Load HHM_bundle.json. For the attached dataset, compute OP002 (Rec), OP003 (Echo), OP006 (Entropy),
and OP014 (CollapseOrbit). Use the bundle's default thresholds and null models.
Return a Result Card per schema with CIs, effect sizes, and pass/fail decisions."

Default Thresholds (bundle-wide, adjustable)

These are the baseline values used in the HHM minimum validation set. Adjust per dataset via the calibration flow below.

| Metric / Operator | Symbol | Default Threshold | Decision Rule | Notes |
| --- | --- | --- | --- | --- |
| Recurrence similarity (OP002) | Rec(Ψ₁,Ψ₂) | ≥ 0.85 | Pass if Rec ≥ 0.85 | Cosine similarity on matched feature space; domain-normalized. |
| Self-echo (OP003) | Echo(Ψ) | ≥ 0.90 | Pass if max lag peak ≥ 0.90 (τ ≠ 0) | Autocorrelation/RQA; report lag τ* at peak. |
| Entropy delta (OP006) | \|ΔH\| | ≤ 0.15 | Pass if \|H₁ − H₂\| ≤ 0.15 | UnifiedEntropy on normalized modal distribution. |
| Orbit agreement (OP014) | ΔOrbit | ≤ 1 step | Pass if \|n₁ − n₂\| ≤ 1 | Small tolerance for noisy sequences. |
| Identity match (composite) | IMS | ≥ 0.85 | Pass if IMS ≥ 0.85 | Bundle-defined weighted mix of Rec/Echo/Entropy/Orbit. |
Result Card fields (excerpt)
{
  "operator": "OP002",
  "metric": "Rec",
  "value": 0.891,
  "threshold": 0.85,
  "null_model": "phase_randomization",
  "p_value": 0.004,
  "ci": {"method":"BCa","level":0.95,"lower":0.862,"upper":0.915},
  "effect_size": {"type":"Cohen_d","value":1.12},
  "decision": "pass"
}
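
A minimal sketch (Python) of how a card like this could be checked against its threshold. The direction map mirrors the decision rules in the table above; requiring a significant p-value alongside the threshold is an assumption here, not a stated bundle rule.

# Sketch: evaluate a Result Card against its calibrated threshold.
# HIGHER_IS_PASS encodes decision directions from the thresholds table;
# treating p <= alpha as a second requirement is an assumption.
HIGHER_IS_PASS = {"Rec": True, "Echo": True, "IMS": True,
                  "Entropy": False, "Orbit": False}  # Entropy/Orbit pass when <= threshold

def decide(card: dict, alpha: float = 0.05) -> str:
    higher = HIGHER_IS_PASS[card["metric"]]
    meets = card["value"] >= card["threshold"] if higher else card["value"] <= card["threshold"]
    return "pass" if meets and card["p_value"] <= alpha else "fail"

card = {"metric": "Rec", "value": 0.891, "threshold": 0.85, "p_value": 0.004}
print(decide(card))  # -> pass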

Calibrating Thresholds by Dataset

Thresholds are principled defaults, not commandments. For each dataset, we calibrate using distributional checks, null simulations, and ROC analysis; a minimal code sketch of steps 2-4 follows the list below.

  1. Normalize features (per-basis scaling, z-score by channel/segment).
  2. Estimate chance levels with at least one null model.
  3. Target α (e.g., 0.05) and derive threshold that keeps FPR ≤ α on null samples.
  4. Check ROC/AUC if labeled positives exist; pick threshold near Youden’s J.
  5. Lock threshold before looking at holdout data.
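
A minimal sketch of steps 2-4, assuming a 1-D array of null-model statistics and, for the ROC branch, binary labels; the function names are illustrative, not part of the bundle.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def threshold_from_null(null_stats, alpha=0.05):
    # Step 3: the (1 - alpha) quantile of the null distribution keeps
    # the false-positive rate at or below alpha on null samples.
    return np.quantile(null_stats, 1 - alpha)

def threshold_from_roc(scores, labels):
    # Step 4: with labeled positives, pick the threshold that maximizes
    # Youden's J = TPR - FPR, and report AUC alongside it.
    fpr, tpr, thresholds = roc_curve(labels, scores)
    best = np.argmax(tpr - fpr)
    return thresholds[best], roc_auc_score(labels, scores)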
Run this with an AI
"Using HHM_bundle.json, calibrate Rec/Echo thresholds for this dataset.
Generate null distributions via circular shift (time-series) and symbol shuffle (glyph).
Report ROC, AUC, chosen threshold for α=0.05, and freeze settings into a Thresholds block."

Null Models (choose by data type)

Nulls approximate “no structured match/recurrence.” Choose the null with the weakest assumptions that still preserves the nuisance structure you care about (e.g., spectrum, autocorrelation).

| Null | Use For | Preserves | Breaks |
| --- | --- | --- | --- |
| Random shuffle | Symbols, unordered events | Marginal histogram | Order, recurrence |
| Circular time shift | Time series (EEG, audio) | Spectrum, amplitude | Phase alignment across windows |
| Phase randomization | Stationary signals | Power spectrum | Temporal structure |
| IAAFT surrogate | Nonlinear time series | Amplitude distribution + spectrum | Higher-order dependencies |
| Block bootstrap | Autocorrelated data | Local correlation (block length) | Long-range order |
| Label permutation | Between-group tests | All within-sample structure | Condition linkage |
Rule of thumb: If you’re testing Echo (OP003), keep the spectrum (use phase-randomized or circular-shift nulls). For Rec (OP002) across domains, use label permutations plus domainwise normalizations.
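
Two of these nulls are easy to sketch for a 1-D signal; a minimal version (Python, with illustrative surrogate counts) might look like:

import numpy as np

rng = np.random.default_rng(0)

def circular_shift_null(x, n_surrogates=2000):
    # Preserves spectrum and amplitude; breaks phase alignment across windows.
    shifts = rng.integers(1, len(x), size=n_surrogates)  # never a zero shift
    return np.stack([np.roll(x, s) for s in shifts])

def phase_randomized_null(x, n_surrogates=2000):
    # Preserves the power spectrum; destroys temporal structure by
    # replacing Fourier phases with uniform random phases.
    magnitude = np.abs(np.fft.rfft(x))
    surrogates = []
    for _ in range(n_surrogates):
        phases = rng.uniform(0.0, 2 * np.pi, size=magnitude.shape)
        phases[0] = 0.0              # keep the DC bin real
        if len(x) % 2 == 0:
            phases[-1] = 0.0         # keep the Nyquist bin real
        surrogates.append(np.fft.irfft(magnitude * np.exp(1j * phases), n=len(x)))
    return np.stack(surrogates)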
Run this with an AI
"Compute null distributions for OP003 (Echo) using phase randomization (N=2000) and circular shifts (N=2000).
Return p-values, QQ plots, and a merged conservative p via max method. Update the Result Card."

Permutation, Resampling & Power

Permutation tests
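
A minimal sketch of a label-permutation test for a between-group statistic; the difference of means here is just a stand-in for whichever operator statistic is under test.

import numpy as np

def permutation_p(x, y, stat=lambda a, b: np.mean(a) - np.mean(b), n_perm=10000, seed=0):
    # Null hypothesis: condition labels are exchangeable. Pool both samples,
    # reshuffle labels, and count permutations at least as extreme as observed.
    rng = np.random.default_rng(seed)
    observed = abs(stat(x, y))
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(stat(perm[:len(x)], perm[len(x):])) >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0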

Bootstrap
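
For autocorrelated data, a moving-block bootstrap (as named in the null-model table) resamples contiguous blocks rather than single points. A minimal sketch, with block length given in samples:

import numpy as np

def block_bootstrap(x, stat, block_len, n_boot=2000, seed=0):
    # Resampling contiguous blocks preserves local correlation up to
    # block_len, so the replicate distribution reflects the dependency.
    # Assumes len(x) >= block_len.
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    reps = np.empty(n_boot)
    for i in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        pieces = np.concatenate([x[s:s + block_len] for s in starts])
        reps[i] = stat(pieces[:n])
    return reps  # bootstrap distribution of the statistic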

Power analysis (pragmatic)
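
One pragmatic option is power by simulation, assuming pilot window-level statistics and an already-calibrated threshold; "power" here is simply the fraction of simulated studies whose mean statistic clears the threshold.

import numpy as np

def power_by_simulation(pilot_stats, threshold, n_windows, n_sim=1000, seed=0):
    # Resample n_windows statistics from pilot data and count how often
    # the study-level mean clears the calibrated threshold. Increase
    # n_windows until this fraction reaches the target (e.g., 0.8 or 0.9).
    rng = np.random.default_rng(seed)
    hits = sum(rng.choice(pilot_stats, size=n_windows, replace=True).mean() >= threshold
               for _ in range(n_sim))
    return hits / n_sim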

Run this with an AI
"Estimate power to detect Rec ≥ 0.85 given empirical null from circular shifts.
Use block bootstrap (block=2s) to model dependency; report N windows for 0.8 and 0.9 power."

Confidence Intervals & Effect Sizes

CI methods
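
The Result Card uses BCa (bias-corrected and accelerated) intervals; a minimal sketch using scipy.stats.bootstrap, assuming a 1-D array of window-level values:

import numpy as np
from scipy.stats import bootstrap

def bca_ci(values, statistic=np.mean, level=0.95, seed=0):
    # BCa corrects the percentile interval for bias and skew in the
    # bootstrap distribution, matching the ci.method field above.
    res = bootstrap((values,), statistic, confidence_level=level,
                    method="BCa", vectorized=False, n_resamples=2000,
                    random_state=seed)
    return res.confidence_interval.low, res.confidence_interval.high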

Examples
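
Continuing the sketch above on synthetic window-level Rec scores (the numbers are illustrative only):

rng = np.random.default_rng(0)
rec_values = rng.normal(0.89, 0.02, size=40)   # stand-in for per-window Rec
low, high = bca_ci(rec_values)
print(f"Rec 95% BCa CI: [{low:.3f}, {high:.3f}]")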

Effect sizes
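
The Result Card reports Cohen's d, and the prompt below asks for Cliff's δ; minimal sketches of both, comparing observed values against a null sample:

import numpy as np

def cliffs_delta(x, null):
    # Cliff's delta: P(x > null) - P(x < null), a rank effect size in [-1, 1].
    diffs = np.subtract.outer(np.asarray(x), np.asarray(null))
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size

def cohens_d(x, null):
    # Standardized mean difference with a pooled standard deviation.
    nx, nn = len(x), len(null)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (nn - 1) * np.var(null, ddof=1)) / (nx + nn - 2)
    return (np.mean(x) - np.mean(null)) / np.sqrt(pooled_var)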

Run this with an AI
"Compute 95% BCa CI for Echo and Rec on the attached sequence (window=4s, step=0.5s).
Also report Cliff's δ vs null and interpret in plain language. Update the Result Card."

Multiple Comparisons & Reporting

When running many windows, channels, or operator variants, control false discoveries.

Report: raw p, adjusted p (method), number of tests, family definition, and pre-registered threshold choice.
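
A minimal sketch of Benjamini-Hochberg FDR control over one pre-defined family of p-values (statsmodels is an assumed dependency here):

import numpy as np
from statsmodels.stats.multitest import multipletests

def bh_fdr(p_values, q=0.05):
    # Benjamini-Hochberg: controls the expected fraction of false
    # discoveries among rejections at level q, within one family of tests.
    reject, p_adj, _, _ = multipletests(p_values, alpha=q, method="fdr_bh")
    return reject, p_adj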
Run this with an AI
"Apply BH-FDR (q=0.05) across 64 channels × 10 lags for Echo tests.
Return per-channel significant lags, adjusted p-values, and cluster summaries."

Cross-Domain Normalization (when comparing unlike things)

Note: Always state normalization in the Result Card. Different choices can change IMS slightly; thresholds should be calibrated under the same pipeline.
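
As one concrete example of a normalization to state in the Result Card, a per-channel z-score; the layout convention (channels × samples) is an assumption:

import numpy as np

def zscore_per_channel(x):
    # Z-score each channel independently so cross-domain features are
    # compared on a common scale; guard against zero-variance channels.
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)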

Result Card — Minimal Fields

| Field | Type | Description |
| --- | --- | --- |
| operator | string | OP id (e.g., OP002) |
| metric | string | Name (Rec, Echo, Entropy, Orbit) |
| value | number | Observed statistic |
| threshold | number | Decision cutoff after calibration |
| null_model | string | e.g., circular_shift, phase_rand |
| p_value | number | One- or two-sided p |
| ci | object | {method, level, lower, upper} |
| effect_size | object | {type, value} |
| multiplicity | object | {method, m, p_adj} |
| decision | string | pass / fail |
| notes | string | Plain-language interpretation |
Download HHM_bundle.json

Reproducibility Checklist

  - Freeze thresholds and the calibration pipeline before touching holdout data.
  - Name the null model and the number of surrogates or permutations.
  - Report the CI method and level, effect sizes, and both raw and adjusted p-values (with correction method and family definition).
  - State the normalization used for any cross-domain comparison.
  - Attach a complete Result Card for every operator run.

Safety & claims: HHM metrics are research signals. Don’t use them to diagnose, predict individual outcomes, or make high-stakes decisions without independent validation.