Paper 3 · Framework comparison

Beyond undocumented thresholds: a six-layer justification stack

Monorepo root commit: Not recorded in the public portfolio system_snapshot.json (v1.2, 2026-04-11T07:37:21Z) used for this binding. Not invented on this page.
Tier-0 shared-core commit (portfolio snapshot): cd9ad79fe16f34ad861bd6527670dcfbef8fe864
Paper 3 repository commit (released): 951fb441e7564e9e84c2d0ccdb03578d3e167ae6
Zenodo DOI: https://doi.org/10.5281/zenodo.19499798
Release version: v2.0.0 (portfolio release designation); CITATION.cff may list package version 1.0.0; treat commit + DOI as authoritative if they diverge.
Page generated (UTC): 2026-04-12

Executive overview

Problem: Clinical AI systems depend on numeric thresholds (performance targets, alert cut-offs, deployment gates) whose evidentiary justification is often thin—“magic numbers” embedded in models without traceable documentation.

Why it matters: Undocumented thresholds are hard to audit, hard to defend after incidents, and easy to game. A structured documentation stack makes expectations explicit for safety leaders and committee review.

Core insight

Threshold failures cluster into a small number of structural mechanisms (proxying, context collapse, coupling, boundary gaming, epistemic asymmetry, audit disconnect). Addressing them requires layered documentation—not a single R² figure in isolation.

What was done

The repository renders manuscript Tables 1–3 into structured CSV from JSON sources: six failure mechanisms, the six-layer Threshold Justification Stack with tiered requirements (primary safety vs secondary operational), and an interpretive comparison to other governance instruments. Supplementary pathways cover gaming resistance sketches and NHS governance artefact mapping tables; illustrative TJS records demonstrate schema completion for both tiers.
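The JSON-to-CSV rendering step described above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `rows` key, the `order` and `mechanism` column names, and the function name `render_table_csv` are all assumptions made for the example; only the `_caveat` metadata field and the mechanism names come from this page.

```python
import csv
import io
import json


def render_table_csv(json_text: str) -> str:
    """Render a JSON table document as CSV text.

    Assumed (hypothetical) layout: {"_caveat": "...", "rows": [{...}, ...]}.
    Keys that start with "_" are treated as metadata and left out of the CSV.
    """
    doc = json.loads(json_text)
    rows = doc["rows"]

    # Column order follows first appearance across rows, so the output is
    # the same on every run for the same input.
    columns = []
    for row in rows:
        for key in row:
            if not key.startswith("_") and key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip "_"-prefixed metadata keys inside rows
        lineterminator="\n",    # fixed line endings keep file hashes stable
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical two-row excerpt; the real source files hold six data rows.
example = json.dumps(
    {
        "_caveat": "Interpretive table (placeholder caveat text).",
        "rows": [
            {"order": 1, "mechanism": "Proxy thresholding"},
            {"order": 2, "mechanism": "Context collapse"},
        ],
    }
)
print(render_table_csv(example))
```

Fixing the column order and the line terminator is one way to keep the rendered CSV byte-identical across runs, which is what makes later hash comparison meaningful.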

What was found

The artefacts make the epistemic status of each table explicit—what is empirical versus interpretive—and separate in-repo reproduction scope from companion empirical papers (see boundary statement in the repository). Deterministic notebook outputs are hash-validated against expected manifests in QA.
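To illustrate what hash validation against an expected manifest involves, here is a small sketch under stated assumptions. The digest algorithm (SHA-256), the manifest layout (relative path mapped to hex digest), and the function names `build_manifest` and `diff_manifests` are hypothetical; the repository's own scripts may work differently.

```python
import hashlib
import tempfile
from pathlib import Path


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file under root (POSIX-style relative path) to its SHA-256 digest."""
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def diff_manifests(expected: dict[str, str], actual: dict[str, str]) -> list[str]:
    """Return discrepancies as strings; an empty list means validation passed."""
    problems = []
    for name in sorted(expected.keys() | actual.keys()):
        if name not in actual:
            problems.append(f"missing: {name}")
        elif name not in expected:
            problems.append(f"unexpected: {name}")
        elif expected[name] != actual[name]:
            problems.append(f"hash mismatch: {name}")
    return problems


# Demonstration in a throwaway directory (file name and contents are illustrative).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tables").mkdir()
    target = root / "tables" / "demo.csv"
    target.write_text("order,mechanism\n1,Proxy thresholding\n", encoding="utf-8")

    expected = build_manifest(root)  # stands in for a pinned baseline
    print(diff_manifests(expected, build_manifest(root)))  # no discrepancies

    target.write_text("order,mechanism\n1,Changed\n", encoding="utf-8")
    print(diff_manifests(expected, build_manifest(root)))  # reports tables/demo.csv
```

A check of this kind only confirms that outputs are byte-identical to a baseline; it says nothing about whether the baseline content is itself correct.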

Why this matters for regulation, safety, and deployment

  • Committees: receive a checklist-like stack for what must be documented before thresholds are accepted.
  • Integration: maps TJS layers to existing hospital risk artefacts to reduce duplicate paperwork while preserving safety intent.
  • Integrity: gaming pathways are named so assessments are proactive rather than purely reactive.

Limitations and ethics

Framework comparisons are the author's interpretive codings, not regulators’ official positions. Feasibility pilots, inter-rater reliability testing, and some empirical validations are explicitly required before any scalability claims (P3-C14–C15); they sit outside this repository’s numerical scope per docs/claim_boundary_statement.md.

Reproducibility: in the QA session of 2026-04-12, reproduce_all.py reported VALIDATION PASSED for the pinned outputs referenced in the traceability matrix.

Technical detail: notebook walkthrough (conceptual)

  • 01_tjs_framework_and_failure_mechanisms: Failure mechanism taxonomy + TJS specification tables (P3-C01–C07).
  • 02_framework_comparison_and_mappings: Table 3 comparison, NHS mappings, gaming pathways (P3-C08–C12).
  • 03_tjs_record_schema_demonstration: Illustrative TJS JSON records for both tiers (P3-C13, P3-C19).
  • 04_release_validation: Release validation and scope alignment with boundary documentation (P3-C14, P3-C20).


Full claim traceability (P3-C01–P3-C20)

Two sub-tables mirror docs/claim_traceability.md (manuscript-grounded claims; then repository fidelity claims).

Manuscript-grounded claims

Claim ID | Claim (paraphrase) | Manuscript anchor | Notebook / code | Output / evidence path | Status
P3-C01 | Clinical AI governance often operationalises safety via quantitative thresholds whose methodological rationale is undocumented (“magic numbers”). | Introduction; “What is already known” | notebooks/01_tjs_framework_and_failure_mechanisms.ipynb | Narrative setup; data/tables/table1_failure_mechanisms.json context | Traced
P3-C02 | Conceptual synthesis identified six structural failure mechanisms in threshold design (proxy thresholding; context collapse; threshold coupling; boundary gaming; epistemic asymmetry; audit–lifecycle disconnect). | Results; Table 1 | 01_…ipynb | outputs/tables/table1_failure_mechanisms.csv; data/tables/table1_failure_mechanisms.json (6 data rows) | VERIFIED (QA 2026-04-12: tabular row count + repro pipeline)
P3-C03 | Methods combine conceptual synthesis and regulatory analysis (targeted searches, Jan 2018–Dec 2025; structural inclusion criterion). | Methods | 01_…ipynb (epistemic notice); manuscript/Paper3_Manuscript.docx | data/tables/table1_failure_mechanisms.json (_caveat) | Traced
P3-C04 | The six mechanisms are argued structurally distinct and to cover major structural failure modes; not claimed exhaustive; inter-rater exercise proposed. | Methods (Reproducibility subsection) | 01_…ipynb | JSON _caveat on interpretive tables | Traced
P3-C05 | Regulatory alignment assessments are interpretive inferences by the author, not statements of regulatory intent or legal requirement. | Methods | 01_…ipynb; 02_…ipynb | Table JSON _caveat fields | Traced
P3-C06 | Threshold Justification Stack (TJS) specifies six documentation layers with tiered requirements: Primary Safety (failure can directly harm patients) vs Secondary Operational; Primary applies by default when uncertain. | Results; Table 2 | 01_…ipynb | outputs/tables/table2_tjs_specification.csv; data/tables/table2_tjs_specification.json (6 data rows; tier fields) | VERIFIED (QA 2026-04-12: tabular structure + repro pipeline)
P3-C07 | Table 2 documents each TJS layer (description, mechanisms addressed, tier requirement, regulatory counterpart). | Table 2 | 01_…ipynb | data/tables/table2_tjs_specification.json → CSV | VERIFIED (QA 2026-04-12: same artefact as C06)
P3-C08 | Table 3 compares TJS threshold documentation expectations to other governance instruments; all characterisations interpretive. | Results; Table 3 | 02_…ipynb | outputs/tables/table3_framework_comparison.csv | VERIFIED (QA 2026-04-12: artefact + hash manifest; interpretive caveat retained)
P3-C09 | TJS is positioned to augment NHS clinical risk management under DCB0129/0160, not replace hazard logs. | Discussion / integration | 02_…ipynb (mappings); 03_…ipynb | Narrative in notebooks; outputs/tables/nhs_governance_mapping.csv | Traced
P3-C10 | Hospital integration entails procedural changes (tier classification; populate layers; route record to committee); field mapping to audit schemas in Supplementary Appendix C (see supplementary PDF). | Discussion | 02_…ipynb; 03_…ipynb | inputs/supplementary.pdf; schema demos | Traced
P3-C11 | Gaming resistance pathways and assessment methodology are detailed in Supplementary Appendix F. | Results / Discussion | 02_…ipynb | outputs/tables/gaming_resistance_pathways.csv; inputs/supplementary.pdf | Traced
P3-C12 | NHS governance artefact mapping is specified in Supplementary Appendix G (see supplementary PDF). | Supplementary (referenced in manuscript) | 02_…ipynb | outputs/tables/nhs_governance_mapping.csv; inputs/supplementary.pdf | Traced
P3-C13 | Full worked examples for both threshold tiers appear in supplementary appendices. | Abstract; Methods | 03_…ipynb | outputs/schemas/*_rendered.json; supplementary | Traced
P3-C14 | Empirical validation (e.g. inter-rater reliability; feasibility pilot with median completion time and kappa for tier classification) is required before scalability claims. | Conclusions; Discussion | 04_…ipynb (scope checks); docs/claim_boundary_statement.md | QA harness; boundary doc XC-7 | Traced
P3-C15 | TJS is advanced as a normative governance proposal; adoption at scale requires empirical feasibility assessment. | Abstract; Discussion | 01_…ipynb; 03_…ipynb (epistemic notices) | Schema _epistemic_status | Traced
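
The tier rule paraphrased in P3-C06 (Primary Safety when failure can directly harm patients, and Primary by default when uncertain) can be expressed as a small decision function. This is only a sketch of that one rule: the names `Tier` and `classify_tier`, and the use of `None` to represent an uncertain assessment, are assumptions for illustration and are not taken from the TJS schema.

```python
from enum import Enum
from typing import Optional


class Tier(Enum):
    PRIMARY_SAFETY = "Primary Safety"
    SECONDARY_OPERATIONAL = "Secondary Operational"


def classify_tier(failure_can_directly_harm_patients: Optional[bool]) -> Tier:
    """Assign a threshold tier from a single assessor judgement.

    True  -> Primary Safety
    False -> Secondary Operational
    None  -> assessor is uncertain, so the stricter Primary tier applies by default
    """
    if failure_can_directly_harm_patients is False:
        return Tier.SECONDARY_OPERATIONAL
    return Tier.PRIMARY_SAFETY


print(classify_tier(True).value)
print(classify_tier(False).value)
print(classify_tier(None).value)
```

Writing the check as "Secondary only when harm is explicitly ruled out" means any missing or undecided judgement falls through to the stricter tier.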

Repository fidelity claims (claim_boundary_statement.md)

Claim ID | Claim | Source doc | Notebook / script | Output / artefact | Status
P3-C16 | Manuscript Tables 1–3 rendered as structured CSV from JSON sources extracted from the manuscript text. | RC-1 | 01_…ipynb, 02_…ipynb | outputs/tables/table1_failure_mechanisms.csv, table2_tjs_specification.csv, table3_framework_comparison.csv | VERIFIED (QA 2026-04-12: python reproduce_all.py + VALIDATION PASSED; hashes match config/expected_outputs.json)
P3-C17 | Glossary of key terms consolidated from the manuscript. | RC-2 | 02_…ipynb | outputs/tables/glossary.csv | VERIFIED (QA 2026-04-12: same validation pass)
P3-C18 | TJS layer → NHS governance artefacts mapping as specified in Supplementary Appendix G. | RC-3 | 02_…ipynb | outputs/tables/nhs_governance_mapping.csv | VERIFIED (QA 2026-04-12: same validation pass)
P3-C19 | Two illustrative TJS records (Primary Safety; Secondary Operational) demonstrate the audit schema from worked examples. | RC-4 | 03_…ipynb | outputs/schemas/tjs_record_primary_safety_rendered.json, tjs_record_secondary_operational_rendered.json | VERIFIED (QA 2026-04-12: same validation pass)
P3-C20 | Notebook outputs are deterministic and hash-validated against baselines. | RC-5 | reproduce_all.py; scripts/hash_manifest.py, scripts/validate_outputs.py | config/expected_outputs.json, logs/actual_manifest.json | VERIFIED (QA 2026-04-12: VALIDATION PASSED; manifests aligned in session)