Test Thresholds, Nulls & CI Methods

How HHM decides what “counts,” how we guard against coincidences, and how confident we are

What this page covers

HHM results are only as strong as their testing rules. This page documents how we set thresholds for operators (e.g., OP002 Rec, OP003 Echo, OP006 UnifiedEntropy, OP014 CollapseOrbit), how we construct null models to estimate chance levels, and how we compute confidence intervals and control error rates so that claims are reproducible.

Repro tip: Use the exact schemas in HHM_bundle.json when you (or an AI) run tests. Return a Result Card with thresholds, null choice, CI method, effect sizes, and decisions.
Run this with an AI
"Load HHM_bundle.json. For the attached dataset, compute OP002 (Rec), OP003 (Echo), OP006 (Entropy),
and OP014 (CollapseOrbit). Use the bundle's default thresholds and null models.
Return a Result Card per schema with CIs, effect sizes, and pass/fail decisions."

Default Thresholds (bundle-wide, adjustable)

These are the baseline values used in the HHM minimum validation set. Adjust per dataset via the calibration flow below.

| Metric / Operator | Symbol | Default Threshold | Decision Rule | Notes |
| --- | --- | --- | --- | --- |
| Recurrence similarity (OP002) | Rec(Ψ₁,Ψ₂) | ≥ 0.85 | Pass if Rec ≥ 0.85 | Cosine similarity on matched feature space; domain-normalized. |
| Self-echo (OP003) | Echo(Ψ) | ≥ 0.90 | Pass if max lag peak ≥ 0.90 (τ ≠ 0) | Autocorrelation/RQA; report lag τ* at peak. |
| Entropy delta (OP006) | \|ΔH\| | ≤ 0.15 | Pass if \|H₁ − H₂\| ≤ 0.15 | UnifiedEntropy on normalized modal distribution. |
| Orbit agreement (OP014) | ΔOrbit | ≤ 1 step | Pass if \|n₁ − n₂\| ≤ 1 | Small tolerance for noisy sequences. |
| Identity match (composite) | IMS | ≥ 0.85 | Pass if IMS ≥ 0.85 | Bundle-defined weighted mix of Rec/Echo/Entropy/Orbit. |
Result Card fields (excerpt)
{
  "operator": "OP002",
  "metric": "Rec",
  "value": 0.891,
  "threshold": 0.85,
  "null_model": "phase_randomization",
  "p_value": 0.004,
  "ci": {"method":"BCa","level":0.95,"lower":0.862,"upper":0.915},
  "effect_size": {"type":"Cohen_d","value":1.12},
  "decision": "pass"
}
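
A minimal sketch (Python) of how a card like this could be checked against its threshold. The direction map mirrors the decision rules in the table above; requiring a significant p-value alongside the threshold is an assumption here, not a stated bundle rule.

# Sketch: evaluate a Result Card against its calibrated threshold.
# HIGHER_IS_PASS encodes decision directions from the thresholds table;
# treating p <= alpha as a second requirement is an assumption.
HIGHER_IS_PASS = {"Rec": True, "Echo": True, "IMS": True,
                  "Entropy": False, "Orbit": False}  # Entropy/Orbit pass when <= threshold

def decide(card: dict, alpha: float = 0.05) -> str:
    higher = HIGHER_IS_PASS[card["metric"]]
    meets = card["value"] >= card["threshold"] if higher else card["value"] <= card["threshold"]
    return "pass" if meets and card["p_value"] <= alpha else "fail"

card = {"metric": "Rec", "value": 0.891, "threshold": 0.85, "p_value": 0.004}
print(decide(card))  # -> pass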

Calibrating Thresholds by Dataset

Thresholds are principled defaults, not commandments. For each dataset, we calibrate using distributional checks, null simulations, and ROC analysis; a minimal code sketch of steps 2-4 follows the list below.

  1. Normalize features (per-basis scaling, z-score by channel/segment).
  2. Estimate chance levels with at least one null model.
  3. Target α (e.g., 0.05) and derive threshold that keeps FPR ≤ α on null samples.
  4. Check ROC/AUC if labeled positives exist; pick threshold near Youden’s J.
  5. Lock threshold before looking at holdout data.
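
A minimal sketch of steps 2-4, assuming a 1-D array of null-model statistics and, for the ROC branch, binary labels; the function names are illustrative, not part of the bundle.

import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

def threshold_from_null(null_stats, alpha=0.05):
    # Step 3: the (1 - alpha) quantile of the null distribution keeps
    # the false-positive rate at or below alpha on null samples.
    return np.quantile(null_stats, 1 - alpha)

def threshold_from_roc(scores, labels):
    # Step 4: with labeled positives, pick the threshold that maximizes
    # Youden's J = TPR - FPR, and report AUC alongside it.
    fpr, tpr, thresholds = roc_curve(labels, scores)
    best = np.argmax(tpr - fpr)
    return thresholds[best], roc_auc_score(labels, scores)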
Run this with an AI
"Using HHM_bundle.json, calibrate Rec/Echo thresholds for this dataset.
Generate null distributions via circular shift (time-series) and symbol shuffle (glyph).
Report ROC, AUC, chosen threshold for α=0.05, and freeze settings into a Thresholds block."

Null Models (choose by data type)

Nulls approximate “no structured match/recurrence.” Choose the null with the weakest assumptions that still preserves the nuisance structure you care about (e.g., spectrum, autocorrelation).

| Null | Use For | Preserves | Breaks |
| --- | --- | --- | --- |
| Random shuffle | Symbols, unordered events | Marginal histogram | Order, recurrence |
| Circular time shift | Time series (EEG, audio) | Spectrum, amplitude | Phase alignment across windows |
| Phase randomization | Stationary signals | Power spectrum | Temporal structure |
| IAAFT surrogate | Nonlinear time series | Amplitude distribution + spectrum | Higher-order dependencies |
| Block bootstrap | Autocorrelated data | Local correlation (block length) | Long-range order |
| Label permutation | Between-group tests | All within-sample structure | Condition linkage |
Rule of thumb: If you’re testing Echo (OP003), keep the spectrum (use phase-randomized or circular-shift nulls). For Rec (OP002) across domains, use label permutations plus domainwise normalizations.
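
Two of these nulls are easy to sketch for a 1-D signal; a minimal version (Python, with illustrative surrogate counts) might look like:

import numpy as np

rng = np.random.default_rng(0)

def circular_shift_null(x, n_surrogates=2000):
    # Preserves spectrum and amplitude; breaks phase alignment across windows.
    shifts = rng.integers(1, len(x), size=n_surrogates)  # never a zero shift
    return np.stack([np.roll(x, s) for s in shifts])

def phase_randomized_null(x, n_surrogates=2000):
    # Preserves the power spectrum; destroys temporal structure by
    # replacing Fourier phases with uniform random phases.
    magnitude = np.abs(np.fft.rfft(x))
    surrogates = []
    for _ in range(n_surrogates):
        phases = rng.uniform(0.0, 2 * np.pi, size=magnitude.shape)
        phases[0] = 0.0              # keep the DC bin real
        if len(x) % 2 == 0:
            phases[-1] = 0.0         # keep the Nyquist bin real
        surrogates.append(np.fft.irfft(magnitude * np.exp(1j * phases), n=len(x)))
    return np.stack(surrogates)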
Run this with an AI
"Compute null distributions for OP003 (Echo) using phase randomization (N=2000) and circular shifts (N=2000).
Return p-values, QQ plots, and a merged conservative p via max method. Update the Result Card."

Permutation, Resampling & Power

Permutation tests
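
A minimal sketch of a label-permutation test for a between-group statistic; the difference of means here is just a stand-in for whichever operator statistic is under test.

import numpy as np

def permutation_p(x, y, stat=lambda a, b: np.mean(a) - np.mean(b), n_perm=10000, seed=0):
    # Null hypothesis: condition labels are exchangeable. Pool both samples,
    # reshuffle labels, and count permutations at least as extreme as observed.
    rng = np.random.default_rng(seed)
    observed = abs(stat(x, y))
    pooled = np.concatenate([x, y])
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        hits += abs(stat(perm[:len(x)], perm[len(x):])) >= observed
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0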

Bootstrap
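
For autocorrelated data, a moving-block bootstrap (as named in the null-model table) resamples contiguous blocks rather than single points. A minimal sketch, with block length given in samples:

import numpy as np

def block_bootstrap(x, stat, block_len, n_boot=2000, seed=0):
    # Resampling contiguous blocks preserves local correlation up to
    # block_len, so the replicate distribution reflects the dependency.
    # Assumes len(x) >= block_len.
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    reps = np.empty(n_boot)
    for i in range(n_boot):
        starts = rng.integers(0, n - block_len + 1, size=n_blocks)
        pieces = np.concatenate([x[s:s + block_len] for s in starts])
        reps[i] = stat(pieces[:n])
    return reps  # bootstrap distribution of the statistic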

Power analysis (pragmatic)
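
One pragmatic option is power by simulation, assuming pilot window-level statistics and an already-calibrated threshold; "power" here is simply the fraction of simulated studies whose mean statistic clears the threshold.

import numpy as np

def power_by_simulation(pilot_stats, threshold, n_windows, n_sim=1000, seed=0):
    # Resample n_windows statistics from pilot data and count how often
    # the study-level mean clears the calibrated threshold. Increase
    # n_windows until this fraction reaches the target (e.g., 0.8 or 0.9).
    rng = np.random.default_rng(seed)
    hits = sum(rng.choice(pilot_stats, size=n_windows, replace=True).mean() >= threshold
               for _ in range(n_sim))
    return hits / n_sim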

Run this with an AI
"Estimate power to detect Rec ≥ 0.85 given empirical null from circular shifts.
Use block bootstrap (block=2s) to model dependency; report N windows for 0.8 and 0.9 power."

Confidence Intervals & Effect Sizes

CI methods
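
The Result Card uses BCa (bias-corrected and accelerated) intervals; a minimal sketch using scipy.stats.bootstrap, assuming a 1-D array of window-level values:

import numpy as np
from scipy.stats import bootstrap

def bca_ci(values, statistic=np.mean, level=0.95, seed=0):
    # BCa corrects the percentile interval for bias and skew in the
    # bootstrap distribution, matching the ci.method field above.
    res = bootstrap((values,), statistic, confidence_level=level,
                    method="BCa", vectorized=False, n_resamples=2000,
                    random_state=seed)
    return res.confidence_interval.low, res.confidence_interval.high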

Examples
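
Continuing the sketch above on synthetic window-level Rec scores (the numbers are illustrative only):

rng = np.random.default_rng(0)
rec_values = rng.normal(0.89, 0.02, size=40)   # stand-in for per-window Rec
low, high = bca_ci(rec_values)
print(f"Rec 95% BCa CI: [{low:.3f}, {high:.3f}]")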

Effect sizes
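
The Result Card reports Cohen's d, and the prompt below asks for Cliff's δ; minimal sketches of both, comparing observed values against a null sample:

import numpy as np

def cliffs_delta(x, null):
    # Cliff's delta: P(x > null) - P(x < null), a rank effect size in [-1, 1].
    diffs = np.subtract.outer(np.asarray(x), np.asarray(null))
    return (np.sum(diffs > 0) - np.sum(diffs < 0)) / diffs.size

def cohens_d(x, null):
    # Standardized mean difference with a pooled standard deviation.
    nx, nn = len(x), len(null)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (nn - 1) * np.var(null, ddof=1)) / (nx + nn - 2)
    return (np.mean(x) - np.mean(null)) / np.sqrt(pooled_var)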

Run this with an AI
"Compute 95% BCa CI for Echo and Rec on the attached sequence (window=4s, step=0.5s).
Also report Cliff's δ vs null and interpret in plain language. Update the Result Card."

Multiple Comparisons & Reporting

When running many windows, channels, or operator variants, control false discoveries.

Report: raw p, adjusted p (method), number of tests, family definition, and pre-registered threshold choice.
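
A minimal sketch of Benjamini-Hochberg FDR control over one pre-defined family of p-values (statsmodels is an assumed dependency here):

import numpy as np
from statsmodels.stats.multitest import multipletests

def bh_fdr(p_values, q=0.05):
    # Benjamini-Hochberg: controls the expected fraction of false
    # discoveries among rejections at level q, within one family of tests.
    reject, p_adj, _, _ = multipletests(p_values, alpha=q, method="fdr_bh")
    return reject, p_adj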
Run this with an AI
"Apply BH-FDR (q=0.05) across 64 channels × 10 lags for Echo tests.
Return per-channel significant lags, adjusted p-values, and cluster summaries."

Cross-Domain Normalization (when comparing unlike things)

Note: Always state normalization in the Result Card. Different choices can change IMS slightly; thresholds should be calibrated under the same pipeline.
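
As one concrete example of a normalization to state in the Result Card, a per-channel z-score; the layout convention (channels × samples) is an assumption:

import numpy as np

def zscore_per_channel(x):
    # Z-score each channel independently so cross-domain features are
    # compared on a common scale; guard against zero-variance channels.
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)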

Result Card — Minimal Fields

| Field | Type | Description |
| --- | --- | --- |
| operator | string | OP id (e.g., OP002) |
| metric | string | Name (Rec, Echo, Entropy, Orbit) |
| value | number | Observed statistic |
| threshold | number | Decision cutoff after calibration |
| null_model | string | e.g., circular_shift, phase_rand |
| p_value | number | One- or two-sided p |
| ci | object | {method, level, lower, upper} |
| effect_size | object | {type, value} |
| multiplicity | object | {method, m, p_adj} |
| decision | string | pass / fail |
| notes | string | Plain-language interpretation |
Download HHM_bundle.json

Reproducibility Checklist

  - Freeze thresholds and the calibration pipeline before touching holdout data.
  - Name the null model and the number of surrogates or permutations.
  - Report the CI method and level, effect sizes, and both raw and adjusted p-values (with correction method and family definition).
  - State the normalization used for any cross-domain comparison.
  - Attach a complete Result Card for every operator run.

Safety & claims: HHM metrics are research signals. Don’t use them to diagnose, predict individual outcomes, or make high-stakes decisions without independent validation.