Paper 3 · Framework comparison

Beyond undocumented thresholds: a six-layer justification stack

Monorepo root commit: not recorded in the public portfolio system_snapshot.json (v1.2, 2026-04-11T07:37:21Z) used for this binding, so no value is stated on this page.
Tier-0 shared-core commit (portfolio snapshot): cd9ad79fe16f34ad861bd6527670dcfbef8fe864
Paper 3 repository commit (released): 951fb441e7564e9e84c2d0ccdb03578d3e167ae6
Zenodo DOI: https://doi.org/10.5281/zenodo.19499798
Release version: v2.0.0 (portfolio release designation). CITATION.cff may list package version 1.0.0; if the two diverge, treat the commit and DOI as authoritative.
Page generated (UTC): 2026-04-12

Executive overview

Problem: Clinical AI systems depend on numeric thresholds (performance targets, alert cut-offs, deployment gates) whose evidentiary justification is often thin. Many are “magic numbers” embedded in models without traceable documentation.

Why it matters: Undocumented thresholds are hard to audit, hard to defend after incidents, and easy to game. A structured documentation stack makes expectations explicit for safety leaders and committee review.

Core insight

Threshold failures cluster into six structural mechanisms (proxy thresholding, context collapse, threshold coupling, boundary gaming, epistemic asymmetry, audit–lifecycle disconnect). Addressing them requires layered documentation, not a single R² figure in isolation.

What was done

The repository renders manuscript Tables 1–3 into structured CSV from JSON sources: six failure mechanisms, the six-layer Threshold Justification Stack with tiered requirements (primary safety vs secondary operational), and an interpretive comparison to other governance instruments. Supplementary pathways cover gaming resistance sketches and NHS governance artefact mapping tables; illustrative TJS records demonstrate schema completion for both tiers.

What was found

The artefacts make the epistemic status of each table explicit—what is empirical versus interpretive—and separate in-repo reproduction scope from companion empirical papers (see boundary statement in the repository). Deterministic notebook outputs are hash-validated against expected manifests in QA.

Why this matters for regulation, safety, and deployment

Committees: receive a checklist-like stack for what must be documented before thresholds are accepted.
Integration: maps TJS layers to existing hospital risk artefacts to reduce duplicate paperwork while preserving safety intent.
Integrity: gaming pathways are named so assessments are proactive rather than purely reactive.

Limitations and ethics

Framework comparisons are author interpretive codings, not regulators’ official positions. Feasibility pilots, inter-rater reliability, and some empirical validations are explicitly required before scalability claims (P3-C14–C15) and sit outside this repository’s numerical scope per docs/claim_boundary_statement.md.

Reproducibility: in the QA session of 2026-04-12, reproduce_all.py reported VALIDATION PASSED for the pinned outputs referenced in the traceability matrix.

Technical detail: notebook walkthrough (conceptual)

01_tjs_framework_and_failure_mechanisms: Failure mechanism taxonomy and TJS specification tables (P3-C01–C07).

02_framework_comparison_and_mappings: Table 3 comparison, NHS mappings, gaming pathways (P3-C08–C12).

03_tjs_record_schema_demonstration: Illustrative TJS JSON records for both tiers (P3-C13, P3-C19).

04_release_validation: Release validation and scope alignment with boundary documentation (P3-C14, P3-C20).
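
As a rough illustration of the JSON-to-CSV rendering step that notebooks 01 and 02 perform, the sketch below writes the `data` rows of a table document to CSV and leaves the `_caveat` metadata out of the tabular output. Only the `data` and `_caveat` keys come from the traceability table; the function name `render_table`, the column names, and the example rows are hypothetical, and this is not the repository's own code.

```python
import csv
import io
import json


def render_table(json_text: str) -> str:
    """Render the `data` rows of a table JSON document as CSV text."""
    doc = json.loads(json_text)
    rows = doc["data"]  # assumed layout: a list of dicts, one per table row

    # Keep first-seen column order so repeated runs give identical output.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-row source; the column names are illustrative only.
example = json.dumps({
    "_caveat": "Interpretive coding by the author.",
    "data": [
        {"mechanism": "Proxy thresholding", "summary": "placeholder"},
        {"mechanism": "Context collapse", "summary": "placeholder"},
    ],
})
csv_text = render_table(example)
print(csv_text)
```

Fixing the column order and the line terminator is what makes a later byte-level hash comparison meaningful; a renderer that depended on platform newlines would produce different digests for the same table.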


Full claim traceability (P3-C01–P3-C20)

Two sub-tables mirror docs/claim_traceability.md (manuscript-grounded claims; then repository fidelity claims).

Manuscript-grounded claims

Claim ID	Claim (paraphrase)	Manuscript anchor	Notebook / code	Output / evidence path	Status
P3-C01	Clinical AI governance often operationalises safety via quantitative thresholds whose methodological rationale is undocumented (“magic numbers”).	Introduction; “What is already known”	`notebooks/01_tjs_framework_and_failure_mechanisms.ipynb`	Narrative setup; `data/tables/table1_failure_mechanisms.json` context	Traced
P3-C02	Conceptual synthesis identified six structural failure mechanisms in threshold design (proxy thresholding; context collapse; threshold coupling; boundary gaming; epistemic asymmetry; audit–lifecycle disconnect).	Results; Table 1	`01_…ipynb`	`outputs/tables/table1_failure_mechanisms.csv`; `data/tables/table1_failure_mechanisms.json` (6 `data` rows)	VERIFIED (QA 2026-04-12: tabular row count + repro pipeline)
P3-C03	Methods combine conceptual synthesis and regulatory analysis (targeted searches, Jan 2018–Dec 2025; structural inclusion criterion).	Methods	`01_…ipynb` (epistemic notice); `manuscript/Paper3_Manuscript.docx`	`data/tables/table1_failure_mechanisms.json` (`_caveat`)	Traced
P3-C04	The six mechanisms are argued structurally distinct and to cover major structural failure modes; not claimed exhaustive; inter-rater exercise proposed.	Methods (Reproducibility subsection)	`01_…ipynb`	JSON `_caveat` on interpretive tables	Traced
P3-C05	Regulatory alignment assessments are interpretive inferences by the author, not statements of regulatory intent or legal requirement.	Methods	`01_…ipynb`; `02_…ipynb`	Table JSON `_caveat` fields	Traced
P3-C06	Threshold Justification Stack (TJS) specifies six documentation layers with tiered requirements: Primary Safety (failure can directly harm patients) vs Secondary Operational; Primary applies by default when uncertain.	Results; Table 2	`01_…ipynb`	`outputs/tables/table2_tjs_specification.csv`; `data/tables/table2_tjs_specification.json` (6 `data` rows; tier fields)	VERIFIED (QA 2026-04-12: tabular structure + repro pipeline)
P3-C07	Table 2 documents each TJS layer (description, mechanisms addressed, tier requirement, regulatory counterpart).	Table 2	`01_…ipynb`	`data/tables/table2_tjs_specification.json` → CSV	VERIFIED (QA 2026-04-12: same artefact as C06)
P3-C08	Table 3 compares TJS threshold documentation expectations to other governance instruments; all characterisations interpretive.	Results; Table 3	`02_…ipynb`	`outputs/tables/table3_framework_comparison.csv`	VERIFIED (QA 2026-04-12: artefact + hash manifest; interpretive caveat retained)
P3-C09	TJS is positioned to augment NHS clinical risk management under DCB0129/0160, not replace hazard logs.	Discussion / integration	`02_…ipynb` (mappings); `03_…ipynb`	Narrative in notebooks; `outputs/tables/nhs_governance_mapping.csv`	Traced
P3-C10	Hospital integration entails procedural changes (tier classification; populate layers; route record to committee); field mapping to audit schemas in Supplementary Appendix C (see supplementary PDF).	Discussion	`02_…ipynb`; `03_…ipynb`	`inputs/supplementary.pdf`; schema demos	Traced
P3-C11	Gaming resistance pathways and assessment methodology are detailed in Supplementary Appendix F.	Results / Discussion	`02_…ipynb`	`outputs/tables/gaming_resistance_pathways.csv`; `inputs/supplementary.pdf`	Traced
P3-C12	NHS governance artefact mapping is specified in Supplementary Appendix G (see supplementary PDF).	Supplementary (referenced in manuscript)	`02_…ipynb`	`outputs/tables/nhs_governance_mapping.csv`; `inputs/supplementary.pdf`	Traced
P3-C13	Full worked examples for both threshold tiers appear in supplementary appendices.	Abstract; Methods	`03_…ipynb`	`outputs/schemas/*_rendered.json`; supplementary	Traced
P3-C14	Empirical validation (e.g. inter-rater reliability; feasibility pilot with median completion time and kappa for tier classification) is required before scalability claims.	Conclusions; Discussion	`04_…ipynb` (scope checks); `docs/claim_boundary_statement.md`	QA harness; boundary doc XC-7	Traced
P3-C15	TJS is advanced as a normative governance proposal; adoption at scale requires empirical feasibility assessment.	Abstract; Discussion	`01_…ipynb`; `03_…ipynb` (epistemic notices)	Schema `_epistemic_status`	Traced
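
The tier rule stated in P3-C06 (Primary Safety applies when failure can directly harm patients, and also by default when the classification is uncertain) can be sketched as a small function. The names `Tier` and `classify_tier` are hypothetical, and representing "uncertain" as `None` is an assumption made for this illustration rather than part of the published schema.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    PRIMARY_SAFETY = "primary_safety"
    SECONDARY_OPERATIONAL = "secondary_operational"


def classify_tier(failure_can_directly_harm_patients: Optional[bool]) -> Tier:
    """Return the documentation tier for a threshold.

    None means the reviewer could not decide (an assumption of this sketch);
    in that case the Primary Safety tier applies by default.
    """
    if failure_can_directly_harm_patients is False:
        return Tier.SECONDARY_OPERATIONAL
    # True or None (uncertain) both resolve to the stricter tier.
    return Tier.PRIMARY_SAFETY


print(classify_tier(None).value)
```

Only an explicit "no direct harm" answer selects the lighter tier, so a missing or unresolved classification cannot silently reduce the documentation burden.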

Repository fidelity claims (`claim_boundary_statement.md`)

Claim ID	Claim	Source doc	Notebook / script	Output / artefact	Status
P3-C16	Manuscript Tables 1–3 rendered as structured CSV from JSON sources extracted from the manuscript text.	RC-1	`01_…ipynb`, `02_…ipynb`	`outputs/tables/table1_failure_mechanisms.csv`, `table2_tjs_specification.csv`, `table3_framework_comparison.csv`	VERIFIED (QA 2026-04-12: `python reproduce_all.py` + `VALIDATION PASSED`; hashes match `config/expected_outputs.json`)
P3-C17	Glossary of key terms consolidated from the manuscript.	RC-2	`02_…ipynb`	`outputs/tables/glossary.csv`	VERIFIED (QA 2026-04-12: same validation pass)
P3-C18	TJS layer → NHS governance artefacts mapping as specified in Supplementary Appendix G.	RC-3	`02_…ipynb`	`outputs/tables/nhs_governance_mapping.csv`	VERIFIED (QA 2026-04-12: same validation pass)
P3-C19	Two illustrative TJS records (Primary Safety; Secondary Operational) demonstrate the audit schema from worked examples.	RC-4	`03_…ipynb`	`outputs/schemas/tjs_record_primary_safety_rendered.json`, `tjs_record_secondary_operational_rendered.json`	VERIFIED (QA 2026-04-12: same validation pass)
P3-C20	Notebook outputs are deterministic and hash-validated against baselines.	RC-5	`reproduce_all.py` → `scripts/hash_manifest.py`, `scripts/validate_outputs.py`	`config/expected_outputs.json`, `logs/actual_manifest.json`	VERIFIED (QA 2026-04-12: `VALIDATION PASSED`; manifests aligned in session)
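
The hash validation described in P3-C20 compares a manifest built from fresh outputs with a stored baseline. The sketch below shows one way such a comparison could work. It assumes SHA-256 digests and a flat mapping of relative path to hex digest; neither the algorithm nor the layout of `config/expected_outputs.json` is stated on this page, and `build_manifest` and `validate` are hypothetical names, not the functions in `scripts/hash_manifest.py` or `scripts/validate_outputs.py`.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path, relative_paths: list[str]) -> dict[str, str]:
    """Map each relative path to the SHA-256 hex digest of its bytes."""
    manifest = {}
    for rel in sorted(relative_paths):
        manifest[rel] = hashlib.sha256((root / rel).read_bytes()).hexdigest()
    return manifest


def validate(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return a list of discrepancies; an empty list means validation passed."""
    problems = []
    for path in sorted(set(expected) | set(actual)):
        if path not in actual:
            problems.append(f"missing output: {path}")
        elif path not in expected:
            problems.append(f"unexpected output: {path}")
        elif expected[path] != actual[path]:
            problems.append(f"hash mismatch: {path}")
    return problems


# Demonstration on a temporary file standing in for a rendered table.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "table.csv").write_text("a,b\n1,2\n", encoding="utf-8")
    baseline = build_manifest(root, ["table.csv"])
    rerun = build_manifest(root, ["table.csv"])
    print("VALIDATION PASSED" if not validate(baseline, rerun) else "VALIDATION FAILED")
```

Reporting missing and unexpected files separately from digest mismatches makes it easier to tell a changed table from a change in the set of outputs.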