What “Reproducible” Means Here
HHM analyses are reproducible when any independent person can re-run the steps and obtain the same decisions (pass/fail) and statistically consistent numbers (within confidence intervals).
- Single source of truth: `HHM_bundle.json` pins operators, thresholds, nulls/CI, and the result-card schema.
- Sealed inputs: dataset manifest + hashes; a provenance graph for every derived file (see the hash-check sketch below).
- Deterministic runs: seeded randomness; fixed windowing; declared preprocessing.
- Schema-checked outputs: machine-validated result cards, not free text.
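As a rough illustration of "sealed inputs", here is a minimal hash-verification sketch in Python. The manifest layout (a `files` map from relative paths to `sha256:` digests) is an assumption for illustration, not the pinned HHM manifest schema.

```python
# Minimal sketch: verify sealed inputs against a manifest.
# ASSUMPTION: manifest.json maps relative paths to "sha256:<hex>" strings;
# the pinned HHM manifest schema may differ.
import hashlib
import json
from pathlib import Path

def verify_manifest(manifest_path: str, root: str = ".") -> None:
    manifest = json.loads(Path(manifest_path).read_text())
    for rel_path, expected in manifest["files"].items():
        digest = hashlib.sha256(Path(root, rel_path).read_bytes()).hexdigest()
        if f"sha256:{digest}" != expected:
            raise ValueError(f"hash mismatch for {rel_path}")

verify_manifest("manifest.json")
```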
 
Pipeline at a Glance
| Stage | What it does | Artifacts |
|---|---|---|
| 1) Intake | Load dataset + manifest; verify hashes and licenses; record consent/PII status. | `dataset.json`, `manifest.json`, `provenance.json` |
| 2) Prep | Declared preprocessing (filters, resampling, windowing) with parameters pinned. | `prep_config.yaml`, `prep_log.jsonl` |
| 3) Measure | Apply HHM operators (OP001, OP003, OP002, OP006, …) from the bundle. | `ops_metrics.parquet` |
| 4) Nulls & CI | Generate nulls, run bootstraps, compute CIs; compare against preregistered thresholds. | `null_runs.parquet`, `ci.json` |
| 5) Result Card | Emit schema-valid JSON with metrics, decisions, seeds, versions, hashes. | `result_card.json` |
| 6) Audit | Automatic checks (schema, determinism, thresholds locked); export run report. | `audit_report.html`, `replay_cmd.txt` |
Minimal Runbook
Use this when you just want to get a clean, reproducible “Hello, World” pass/fail with CIs.
Runbook R1 — Minimal Pipeline
```
# R1_minimal_pipeline (pseudocode / steps)
inputs:
  - data: dataset.zip (hash: SHA256:...)
  - bundle: HHM_bundle.json (version: 2.2, hash: ...)
params:
  seed: 20250111
  windows: "2s @ 50% overlap"
  basis: "Fourier 1–40 Hz, Hamming"
steps:
  - verify_manifest(dataset.manifest)
  - preprocess(data, basis, windows, filters=["bandpass 1–40Hz"])
  - op(OP001_CollapsePattern)
  - op(OP003_Echo)
  - op(OP002_Rec)              # only if a comparison target is provided
  - op(OP006_UnifiedEntropy)
  - nulls: ["time_shuffle", "phase_randomize"]
  - bootstrap: n=1000
  - compare_to_thresholds(bundle.thresholds)
  - emit_result_card(schema=bundle.result_schema)
  - validate_json(result_card.json, bundle.result_schema)
outputs:
  - result_card.json
  - ci.json
  - prep_log.jsonl
  - ops_metrics.parquet
```
Runbook R2 — Benchmark & Transfer
```
# R2_benchmark_transfer (outline)
- load library of reference result_cards/*
- compute Rec/Entropy/Echo similarity surfaces
- report top-k matches + T014 status
- emit transfer_report.json (+ plots)
```
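As a rough sketch of the matching step, assuming each result card carries a flat `metrics` dict (names like `Echo` and `Entropy`, as in the example card later in this page). The Euclidean distance here is an illustrative proxy; the bundle's registered similarity surfaces are authoritative.

```python
# Illustrative sketch: rank reference result cards by metric similarity.
# ASSUMPTION: each card has a flat "metrics" dict; the real R2 similarity
# surfaces come from the bundle, not this Euclidean proxy.
import json
import math
from pathlib import Path

def top_k_matches(query_card: dict, library_dir: str, k: int = 5):
    query = query_card["metrics"]
    scored = []
    for path in Path(library_dir).glob("*.json"):
        ref = json.loads(path.read_text())["metrics"]
        shared = set(query) & set(ref)
        dist = math.sqrt(sum((query[m] - ref[m]) ** 2 for m in shared))
        scored.append((dist, path.name))
    return sorted(scored)[:k]

query = json.loads(Path("result_card.json").read_text())
print(top_k_matches(query, "result_cards"))
```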
Seeds, Determinism & Environment Capture
- Seed once, everywhere: one `global_seed` for Python/NumPy/random/backends; record it in the result card (see the seeding sketch below).
- Pin numeric libs: BLAS/LAPACK/GPU versions can change floating-point noise; capture them in an `env_lock`.
- Containerize: run inside a pinned image; record the image digest (immutable SHA).
```yaml
# pipeline.yaml (excerpt)
env:
  container: ghcr.io/hhm/hhm-pipeline:2.2.0@sha256:
  seeds:
    global: 20250111
capture:
  python: "3.11.7"
  numpy: "1.26.4"
  scipy: "1.11.4"
  mkl: "2024.1"
  cuda: ""
```
Download: Env Lock Template
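A minimal seeding sketch in Python, assuming a NumPy-based pipeline. The `torch` block is hypothetical for this pipeline and applies only if a GPU backend is in use.

```python
# Minimal sketch: apply one global seed everywhere and record it.
# ASSUMPTION: a Python/NumPy pipeline; the torch block is hypothetical
# and only matters if a GPU backend is present.
import os
import random

import numpy as np

def seed_everything(global_seed: int) -> int:
    random.seed(global_seed)
    np.random.seed(global_seed)
    os.environ["PYTHONHASHSEED"] = str(global_seed)  # affects subprocesses, not this interpreter
    try:
        import torch
        torch.manual_seed(global_seed)
        torch.use_deterministic_algorithms(True)  # fail loudly on nondeterministic ops
    except ImportError:
        pass
    return global_seed  # echo back so the caller can write it into the result card

seed_everything(20250111)
```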
Manifests & Provenance
Every file is content-addressed and every derivative is linked back to its sources.
- Manifest: paths, sizes, BLAKE3/SHA-256 hashes, access level.
- Provenance graph: nodes (files) + edges (transforms) with tool versions, params, seeds, and logs (sketched below).
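A toy sketch of recording one content-addressed provenance edge. The field names (`transform`, `inputs`, `outputs`, …) are illustrative, not the pinned provenance schema.

```python
# Toy sketch: append one content-addressed provenance edge.
# ASSUMPTION: field names are illustrative; the pinned HHM provenance
# schema may differ.
import hashlib
import json
from pathlib import Path

def file_address(path: str) -> str:
    return "sha256:" + hashlib.sha256(Path(path).read_bytes()).hexdigest()

def record_edge(transform: str, params: dict, inputs: list[str], outputs: list[str],
                graph_path: str = "provenance.json") -> None:
    graph = json.loads(Path(graph_path).read_text()) if Path(graph_path).exists() else {"edges": []}
    graph["edges"].append({
        "transform": transform,
        "params": params,
        "inputs": {p: file_address(p) for p in inputs},
        "outputs": {p: file_address(p) for p in outputs},
    })
    Path(graph_path).write_text(json.dumps(graph, indent=2))

record_edge("preprocess", {"window": "2s/50%"}, ["dataset.zip"], ["prep_log.jsonl"])
```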
 
Nulls, CI, and Threshold Decisions
Nulls and CIs are part of the pipeline, not an afterthought.
- Null generators: e.g., `time_shuffle`, `phase_randomize`, `block_bootstrap`, all defined in the bundle (one is sketched below).
- CI: bootstrap (default n=1000) at the bundle's confidence level (e.g., 95%).
- Decisions: pass/fail only when metrics clear the preregistered thresholds versus the null.
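A compact sketch of one null generator and a bootstrap CI, assuming a 1-D NumPy signal and a scalar metric function. The bundle's registered generators and CI settings are authoritative, not this code.

```python
# Compact sketch: phase-randomized null and a bootstrap CI for a scalar
# metric. ASSUMPTION: a 1-D NumPy signal; the bundle's registered null
# generators and CI settings take precedence.
import numpy as np

def phase_randomize(x: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(x))

def bootstrap_ci(x: np.ndarray, metric, n: int = 1000, level: float = 0.95,
                 seed: int = 20250111):
    rng = np.random.default_rng(seed)
    stats = [metric(rng.choice(x, size=len(x), replace=True)) for _ in range(n)]
    lo, hi = np.quantile(stats, [(1 - level) / 2, 1 - (1 - level) / 2])
    return float(lo), float(hi)

rng = np.random.default_rng(20250111)
signal = rng.standard_normal(1024)
null = phase_randomize(signal, rng)
print(bootstrap_ci(signal, np.std))
```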
 
thresholds.json (excerpt from the HHM bundle):
```json
{
  "Echo.min": 0.90,
  "Rec.min": 0.85,
  "Entropy.delta.max": 0.15,
  "CollapseOrbit.delta.max": 1
}
```
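A small sketch of the comparison step. The key convention (`<Name>.min`, `<Name>.max`) is read off the excerpt above and treated as illustrative; the bundle's own comparison logic is authoritative.

```python
# Sketch: turn preregistered thresholds into pass/fail decisions.
# ASSUMPTION: keys follow the "<Name>.min" / "<Name>.max" convention
# shown in the excerpt above.
def compare_to_thresholds(metrics: dict, thresholds: dict) -> dict:
    decisions = {}
    for key, bound in thresholds.items():
        name, rule = key.rsplit(".", 1)
        if name not in metrics:
            continue
        if rule == "min":
            decisions[f"{name}.pass"] = metrics[name] >= bound
        elif rule == "max":
            decisions[f"{name}.pass"] = metrics[name] <= bound
    return decisions

print(compare_to_thresholds({"Echo": 0.931}, {"Echo.min": 0.90}))
# {'Echo.pass': True}
```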
Read: Test Thresholds, Nulls & CI
Result Card — Schema & Example
Every run produces a machine-checkable JSON that locks versions, seeds, inputs, and outcomes.
```json
{
  "schema": "hhm.result_card.v2",
  "bundle_version": "2.2.0",
  "env_lock": "env-lock:sha256:...",
  "dataset_id": "hhm-bio-whale-001",
  "manifest_hash": "blake3:...",
  "operators": ["OP001","OP003","OP006"],
  "params": { "basis": "Fourier 1-40Hz", "window": "2s/50%" },
  "seed": 20250111,
  "metrics": { "Echo": 0.931, "Entropy": 0.821 },
  "nulls": { "method": ["time_shuffle","phase_randomize"], "n": 1000 },
  "ci": { "Echo": [0.914, 0.946], "Entropy": [0.79, 0.85] },
  "thresholds_version": "2.2.0",
  "decisions": { "Echo.pass": true, "T014.pass": false },
  "artifacts": {
    "ops_metrics": "ipfs://.../ops.parquet",
    "audit_report": "ipfs://.../audit.html"
  },
  "created_at": "2025-08-11T12:00:00Z",
  "run_id": "rc-2025-08-11-1200-abc123"
}
```
CLI: validate a result card
```bash
hhm-validate \
  --schema /downloads/schemas/hhm.result_card.v2.json \
  --input result_card.json
```
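If you prefer to validate in-process, here is a minimal equivalent using the third-party `jsonschema` package (an assumption, not part of HHM; the CLI above is the supported path).

```python
# Minimal in-process alternative to the hhm-validate CLI, using the
# third-party jsonschema package (an assumption, not part of HHM).
import json
from pathlib import Path

from jsonschema import validate  # pip install jsonschema

schema = json.loads(Path("hhm.result_card.v2.json").read_text())
card = json.loads(Path("result_card.json").read_text())
validate(instance=card, schema=schema)  # raises ValidationError on failure
print("result card is schema-valid")
```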
CI/CD Checks (Recommended)
- Schema check: the result card must validate.
- Determinism check: re-run with the same seed; decisions must be identical and metric deltas within numeric tolerance (see the sketch after this list).
- Threshold lock: the thresholds version must match the bundle; no local overrides.
- Provenance closure: no orphaned artifacts; every derived file traces back to raw inputs.
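A hedged sketch of the determinism check, assuming two result cards produced from the same seed. The tolerance value is illustrative; pin your own in the audit config.

```python
# Sketch of the determinism check: same seed, two runs, identical
# decisions, metric deltas within tolerance. The tolerance here is
# illustrative; pin your own in the audit config.
import json
from pathlib import Path

def check_determinism(card_a: str, card_b: str, tol: float = 1e-9) -> None:
    a = json.loads(Path(card_a).read_text())
    b = json.loads(Path(card_b).read_text())
    assert a["seed"] == b["seed"], "runs used different seeds"
    assert a["decisions"] == b["decisions"], "pass/fail decisions diverged"
    for name, value in a["metrics"].items():
        delta = abs(value - b["metrics"][name])
        assert delta <= tol, f"{name} drifted by {delta}"

check_determinism("run1/result_card.json", "run2/result_card.json")
```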
 
GitHub Actions — sample job
```yaml
name: hhm-ci
on: [push, pull_request]
jobs:
  verify:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Pull container
        run: docker pull ghcr.io/hhm/hhm-pipeline:2.2.0@sha256:<digest>
      - name: Run pipeline (dry)
        run: |
          docker run --rm -v $PWD:/work ghcr.io/hhm/hhm-pipeline:2.2.0 \
            hhm-run --pipeline pipeline.yaml --dry-run
      - name: Execute
        run: |
          docker run --rm -v $PWD:/work ghcr.io/hhm/hhm-pipeline:2.2.0 \
            hhm-run --pipeline pipeline.yaml
      - name: Validate result card
        run: |
          docker run --rm -v $PWD:/work ghcr.io/hhm/hhm-pipeline:2.2.0 \
            hhm-validate --schema /schemas/hhm.result_card.v2.json --input result_card.json
```
Local vs Cloud — Same Pipeline
Run the same `pipeline.yaml` locally or in the cloud. Only the `storage:` stanza changes.
```yaml
# pipeline.yaml (storage profiles)
storage:
  local:
    root: "./data"
  s3:
    root: "s3://hhm-datasets/proj-x/"
    profile: "default"
profile: "local"   # or "s3"
```
Common Failure Modes (and Fixes)
“My numbers moved slightly between runs.”
- Check seeds across all libraries; confirm the same container digest.
- Ensure BLAS/GPU kernels are pinned; disable nondeterministic ops if present.
- Increase sample sizes/bootstraps, or tighten numeric tolerances in the audit.
 
“Schema validation failed.”
- Diff your `result_card.json` against the schema; look for missing required fields.
- Ensure `thresholds_version` and `bundle_version` match the bundle you ran.

“Pass/Fail changed after I tweaked preprocessing.”
- Preprocessing is part of the registered pipeline. If you change it, bump `pipeline_version` and re-run the full nulls/CI.
Ethics, Consent & Privacy
- Respect dataset licenses and consent levels in the sheet; enforce PII policies at intake.
- Never upload sensitive data to third-party clouds without a DPA and documented consent.
- Prefer de-identified, aggregated outputs in result cards and reports.
 
Downloads & Starters
Everything you need to produce a clean, reproducible HHM run.
FAQ
Do I need Python execution to run HHM?
No. For planning and interpretation, file-aware AIs are enough. For measurement, use the provided container or your own pinned environment.
What counts as a “reproducible” change?
Any change that alters preprocessing, operators, thresholds, or nulls requires a new `pipeline_version` and fresh result cards.
Can I add custom operators?
Yes — follow the operator template, register in your pipeline, and validate on a known dataset before claiming results.