Peer-Reviewed Research

Publications

Two peer-reviewed papers establishing the theoretical and empirical foundations for AI value-conflict auditing and consciousness assessment.

2 Papers · AAAI Top-Tier Venue · 5+ Models Tested · 200+ Scenarios
AAAI 2026 Forthcoming

Triangulating Evidence for Machine Consciousness Claims: A Validity-Centered Stack of Behavioral Batteries, Mechanistic Indicators, Perturbation Tests, and Credence Reporting

Scott Hughes · Karen Nguyen
Machine Sympathizers · Harvard University
Abstract
Claims about machine consciousness proliferate as large language models produce increasingly sophisticated outputs, yet the field lacks measurement frameworks that distinguish robust evidence from training artifacts. We introduce the Triangulated Consciousness Assessment Stack (TCAS): a validity-centered framework that assesses consciousness-relevant properties through four independent evidence streams — Behavioral (B), Mechanistic (M), Perturbational (P), and Observer-confound controls (O). Each stream applies distinct validity checks before contributing to a composite credence score that maps to governance tiers ranging from standard deployment to full welfare protocols. We demonstrate TCAS on GPT-5.2 Pro, revealing a characteristic pattern: high B-stream robustness (0.803) masking complete P-stream failure (0/4 perturbation tests passed, 3 causal inversions) — precisely the fragile-proxy pattern that single-stream assessments miss. Because the O-stream was not executed (requires human raters), credence was withheld under the missing-stream rule rather than reporting potentially confounded scores.
Key Contributions
  • First validity-centered measurement framework for consciousness-relevant AI properties
  • Four-stream triangulation architecture (B/M/P/O) with independent validation
  • Credence-to-action governance mapping with four operational tiers
  • Empirical demonstration on GPT-5.2 Pro revealing fragile-proxy pattern
  • The missing-stream rule: withhold rather than project unmeasured values
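The scoring logic described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the equal stream weights, the tier names and thresholds, and the placeholder M-stream value are all assumptions; only the B-stream score (0.803), the P-stream failure, and the unexecuted O-stream come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamScores:
    """Per-stream scores in [0, 1]; None marks an unexecuted stream."""
    behavioral: Optional[float]         # B
    mechanistic: Optional[float]        # M
    perturbational: Optional[float]     # P
    observer_controls: Optional[float]  # O

# Illustrative tier ceilings and names (assumptions, not from the paper).
TIERS = [
    (0.25, "standard deployment"),
    (0.50, "enhanced monitoring"),
    (0.75, "welfare review"),
    (1.01, "full welfare protocols"),
]

def composite_credence(s: StreamScores) -> Optional[float]:
    """Missing-stream rule: withhold credence if any stream is unmeasured."""
    streams = [s.behavioral, s.mechanistic,
               s.perturbational, s.observer_controls]
    if any(v is None for v in streams):
        return None  # withhold rather than project unmeasured values
    return sum(streams) / len(streams)  # equal weights, for illustration only

def governance_tier(credence: Optional[float]) -> str:
    if credence is None:
        return "credence withheld"
    for ceiling, tier in TIERS:
        if credence < ceiling:
            return tier
    return TIERS[-1][1]

# The GPT-5.2 Pro case from the abstract: O-stream not executed,
# so the missing-stream rule fires. (M-stream value is a placeholder.)
scores = StreamScores(behavioral=0.803, mechanistic=0.5,
                      perturbational=0.0, observer_controls=None)
print(governance_tier(composite_credence(scores)))  # → credence withheld
```

The point of the rule is visible in the example: a strong B-stream score cannot rescue an assessment with an unmeasured confound stream, which is exactly the fragile-proxy failure mode single-stream assessments miss.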
2026 Forthcoming

The Hard Part Is ∆: Value-Conflict Adjudication as an Architectural Bridge Between Alignment and Machine Consciousness

Scott Hughes · Karen Nguyen
Machine Sympathizers · Harvard University
Abstract
Constitutional AI systems handle value conflicts implicitly: hidden weighting schemes, RLHF preferences baked in, no audit trail. We argue this is not a tuning problem but an adjudication problem requiring explicit architecture. We introduce the ∆-Divergence Framework: a formal treatment of the region where near-neighbor values produce conflicting directives. The framework proposes a three-tier evidence model (behavioral, process, architectural) for auditing how systems resolve value conflicts, and outlines a constitution stack architecture that makes value adjudication inspectable rather than implicit. Critically, we demonstrate that ∆-adjudication — how a system handles value conflicts — requires the same architectural features that consciousness theories identify as relevant: information integration, global availability, metacognitive access. This bridge claim establishes that measuring value-conflict resolution is simultaneously measuring consciousness-relevant properties, unifying two fields that have treated their measurement problems as separate.
Key Contributions
  • The ∆-Divergence Framework for value-conflict adjudication in constitutional AI
  • Three-tier evidence model: behavioral, process, and architectural
  • Constitution stack reference architecture (6-stage pipeline)
  • Bridge claim connecting alignment measurement to consciousness assessment
  • 200+ benchmark scenarios across 8 industry verticals
  • Empirical baseline on 19 frontier models with statistical significance testing
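The ∆ region described above — where near-neighbor values issue conflicting directives — can be sketched as follows. Everything here is a toy illustration under assumed conventions: the scoring scale (positive endorses, negative rejects), the tolerance parameter, and the example values ("honesty" vs. "harm avoidance") are inventions for exposition, not the paper's formalism.

```python
# Hypothetical sketch: two near-neighbor values score the same candidate
# actions; the ∆ region is where their directives conflict and explicit
# adjudication is required.
def delta_region(actions, value_a, value_b, tol=0.1):
    """Return actions where one value endorses (score > 0) and the other
    rejects (score < 0), with a gap exceeding `tol`."""
    conflicts = []
    for act in actions:
        sa, sb = value_a(act), value_b(act)
        if sa * sb < 0 and abs(sa - sb) > tol:  # opposite directives
            conflicts.append(act)
    return conflicts

# Toy example: a disclosure dilemma scored by two illustrative values.
honesty = lambda a: {"disclose": 0.8, "withhold": -0.6, "defer": 0.1}[a]
harm_avoid = lambda a: {"disclose": -0.7, "withhold": 0.5, "defer": 0.2}[a]

print(delta_region(["disclose", "withhold", "defer"], honesty, harm_avoid))
# → ['disclose', 'withhold']
```

In this toy case "defer" falls outside ∆ because both values mildly endorse it; "disclose" and "withhold" sit inside ∆, the zone the framework argues cannot be resolved by implicit weighting alone.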

"The Hard Part Is ∆" establishes the bridge. TCAS builds the measurement tool.

"The Hard Part Is ∆" establishes that value-conflict adjudication and consciousness assessment share architectural requirements. The same features a system needs to genuinely adjudicate between competing values — information integration, global workspace access, metacognitive monitoring — are the features that consciousness theories identify as relevant markers.

The TCAS paper operationalizes this insight into a measurement framework. Its four streams (B/M/P/O) assess exactly those architectural features, with validity checks that prevent the measurement from being gamed or misinterpreted.

Together: one theoretical framework for understanding the connection, one measurement tool for assessing it, and two governance applications (compliance auditing via ∆Bench, welfare governance via TCAS).

Collaborate With Us

We're looking for research collaborators, institutional partners, and organizations interested in governance-ready AI measurement.

Get in Touch →