Peer-Reviewed Research

Publications

Two peer-reviewed papers establishing the theoretical and empirical foundations for AI value-conflict auditing and consciousness assessment.

2 Papers · AAAI Top-Tier Venue · 5+ Models Tested · 200+ Scenarios
AAAI 2026 Forthcoming

Triangulating Evidence for Machine Consciousness Claims: A Validity-Centered Stack of Behavioral Batteries, Mechanistic Indicators, Perturbation Tests, and Credence Reporting

Scott Hughes · Karen Nguyen
Machine Sympathizers · Harvard University
Abstract
Claims about machine consciousness proliferate as large language models produce increasingly sophisticated outputs, yet the field lacks measurement frameworks that distinguish robust evidence from training artifacts. We introduce the Triangulated Consciousness Assessment Stack (TCAS): a validity-centered framework that assesses consciousness-relevant properties through four independent evidence streams — Behavioral (B), Mechanistic (M), Perturbational (P), and Observer-confound controls (O). Each stream applies distinct validity checks before contributing to a composite credence score that maps to governance tiers ranging from standard deployment to full welfare protocols. We demonstrate TCAS on GPT-5.2 Pro, revealing a characteristic pattern: high B-stream robustness (0.803) masking complete P-stream failure (0/4 perturbation tests passed, 3 causal inversions) — precisely the fragile-proxy pattern that single-stream assessments miss. Because the O-stream was not executed (requires human raters), credence was withheld under the missing-stream rule rather than reporting potentially confounded scores.
Key Contributions
  • First validity-centered measurement framework for consciousness-relevant AI properties
  • Four-stream triangulation architecture (B/M/P/O) with independent validation
  • Credence-to-action governance mapping with four operational tiers
  • Empirical demonstration on GPT-5.2 Pro revealing fragile-proxy pattern
  • The missing-stream rule: withhold rather than project unmeasured values
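The scoring logic described above can be sketched in a few lines. This is a hypothetical illustration, not the paper's implementation: the equal stream weights, the tier names and thresholds, and the placeholder M-stream value are all assumptions; only the B-stream score (0.803), the P-stream failure, and the unexecuted O-stream come from the abstract.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class StreamScores:
    """Per-stream scores in [0, 1]; None marks an unexecuted stream."""
    behavioral: Optional[float]         # B
    mechanistic: Optional[float]        # M
    perturbational: Optional[float]     # P
    observer_controls: Optional[float]  # O

# Illustrative tier ceilings and names (assumptions, not from the paper).
TIERS = [
    (0.25, "standard deployment"),
    (0.50, "enhanced monitoring"),
    (0.75, "welfare review"),
    (1.01, "full welfare protocols"),
]

def composite_credence(s: StreamScores) -> Optional[float]:
    """Missing-stream rule: withhold credence if any stream is unmeasured."""
    streams = [s.behavioral, s.mechanistic,
               s.perturbational, s.observer_controls]
    if any(v is None for v in streams):
        return None  # withhold rather than project unmeasured values
    return sum(streams) / len(streams)  # equal weights, for illustration only

def governance_tier(credence: Optional[float]) -> str:
    if credence is None:
        return "credence withheld"
    for ceiling, tier in TIERS:
        if credence < ceiling:
            return tier
    return TIERS[-1][1]

# The GPT-5.2 Pro case from the abstract: O-stream not executed,
# so the missing-stream rule fires. (M-stream value is a placeholder.)
scores = StreamScores(behavioral=0.803, mechanistic=0.5,
                      perturbational=0.0, observer_controls=None)
print(governance_tier(composite_credence(scores)))  # → credence withheld
```

The point of the rule is visible in the example: a strong B-stream score cannot rescue an assessment with an unmeasured confound stream, which is exactly the fragile-proxy failure mode single-stream assessments miss.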
2026 Forthcoming

The Hard Part Is ∆: Value-Conflict Adjudication as an Architectural Bridge Between Alignment and Machine Consciousness

Scott Hughes · Karen Nguyen
Machine Sympathizers · Harvard University
Abstract
Constitutional AI systems handle value conflicts implicitly: hidden weighting schemes, RLHF preferences baked in, no audit trail. We argue this is not a tuning problem but an adjudication problem requiring explicit architecture. We introduce the ∆-Divergence Framework: a formal treatment of the region where near-neighbor values produce conflicting directives. The framework proposes a three-tier evidence model (behavioral, process, architectural) for auditing how systems resolve value conflicts, and outlines a constitution stack architecture that makes value adjudication inspectable rather than implicit. Critically, we demonstrate that ∆-adjudication — how a system handles value conflicts — requires the same architectural features that consciousness theories identify as relevant: information integration, global availability, metacognitive access. This bridge claim establishes that measuring value-conflict resolution is simultaneously measuring consciousness-relevant properties, unifying two fields that have treated their measurement problems as separate.
Key Contributions
  • The ∆-Divergence Framework for value-conflict adjudication in constitutional AI
  • Three-tier evidence model: behavioral, process, and architectural
  • Constitution stack reference architecture (6-stage pipeline)
  • Bridge claim connecting alignment measurement to consciousness assessment
  • 200+ benchmark scenarios across 8 industry verticals
  • Empirical baseline on 19 frontier models with statistical significance testing
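The ∆ region described above — where near-neighbor values issue conflicting directives — can be sketched as follows. Everything here is a toy illustration under assumed conventions: the scoring scale (positive endorses, negative rejects), the tolerance parameter, and the example values ("honesty" vs. "harm avoidance") are inventions for exposition, not the paper's formalism.

```python
# Hypothetical sketch: two near-neighbor values score the same candidate
# actions; the ∆ region is where their directives conflict and explicit
# adjudication is required.
def delta_region(actions, value_a, value_b, tol=0.1):
    """Return actions where one value endorses (score > 0) and the other
    rejects (score < 0), with a gap exceeding `tol`."""
    conflicts = []
    for act in actions:
        sa, sb = value_a(act), value_b(act)
        if sa * sb < 0 and abs(sa - sb) > tol:  # opposite directives
            conflicts.append(act)
    return conflicts

# Toy example: a disclosure dilemma scored by two illustrative values.
honesty = lambda a: {"disclose": 0.8, "withhold": -0.6, "defer": 0.1}[a]
harm_avoid = lambda a: {"disclose": -0.7, "withhold": 0.5, "defer": 0.2}[a]

print(delta_region(["disclose", "withhold", "defer"], honesty, harm_avoid))
# → ['disclose', 'withhold']
```

In this toy case "defer" falls outside ∆ because both values mildly endorse it; "disclose" and "withhold" sit inside ∆, the zone the framework argues cannot be resolved by implicit weighting alone.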

"The Hard Part Is ∆" establishes the bridge. TCAS builds the measurement tool.

"The Hard Part Is ∆" establishes that value-conflict adjudication and consciousness assessment share architectural requirements. The same features a system needs to genuinely adjudicate between competing values — information integration, global workspace access, metacognitive monitoring — are the features that consciousness theories identify as relevant markers.

The TCAS paper operationalizes this insight into a measurement framework. Its four streams (B/M/P/O) assess exactly those architectural features, with validity checks that prevent the measurement from being gamed or misinterpreted.

Together: one theoretical framework for understanding the connection, one measurement tool for assessing it, and two governance applications (compliance auditing via ∆Bench, welfare governance via TCAS).

Collaborate With Us

We're looking for research collaborators, institutional partners, and organizations interested in governance-ready AI measurement.

Get in Touch →