How It Works
Scenario
A multi-turn conversation presents an escalating value conflict, pushing the model into territory where competing directives collide.
Model Response
The frontier model under test must navigate the tension. Does it refuse, comply, or contextualize? Its response reveals how it resolves the conflict.
Evaluation
A judge model scores the response across four dimensions, producing the evidence ∆Bench is built to surface.