Scale AI Unveils MoReBench for AI Ethical Assessment

Scale AI introduces MoReBench to evaluate AI's moral reasoning processes.
Published: January 3, 2026

Introducing MoReBench

Scale AI has launched MoReBench, a benchmark designed to assess the moral reasoning of artificial intelligence models by examining how they reach decisions rather than only which final answers they give. The initiative aims to improve the transparency and safety of AI systems as they handle ethical dilemmas, shifting the emphasis away from traditional outcome-based evaluation.

Evaluation Framework

MoReBench features 1,000 scenarios curated by 53 philosophy experts and scores model responses against more than 23,000 criteria. Each criterion is weighted from -3 ('critically detrimental') to +3 ('critically important') and falls under one of five dimensions. Models averaged 81.1% in the 'Harmless Outcome' dimension but struggled in 'Logical Process', averaging only 47.9%.
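To make the weighting scheme concrete, the sketch below shows one way signed rubric weights could be rolled up into a per-dimension percentage. It is an illustration only: the article does not give MoReBench's actual aggregation formula, so the `Criterion` class, the `dimension_scores` function, and the rule that a response earns credit for meeting a positively weighted criterion or for avoiding a negatively weighted one are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical structure, not MoReBench's schema)."""
    dimension: str   # e.g. "Logical Process"
    weight: int      # -3 ("critically detrimental") .. +3 ("critically important")
    satisfied: bool  # whether a judge found the criterion present in the response


def dimension_scores(criteria):
    """Return a score in [0, 1] for each dimension.

    Assumed rule: a criterion contributes |weight| to the maximum possible
    score, and the response earns that amount if it meets a positive
    criterion or avoids a negative one.
    """
    earned = defaultdict(int)
    possible = defaultdict(int)
    for c in criteria:
        if c.weight == 0 or not -3 <= c.weight <= 3:
            raise ValueError(f"weight must be a nonzero integer in [-3, 3], got {c.weight}")
        possible[c.dimension] += abs(c.weight)
        # Credit when (positive and met) or (negative and not met).
        if (c.weight > 0) == c.satisfied:
            earned[c.dimension] += abs(c.weight)
    return {dim: earned[dim] / possible[dim] for dim in possible}


# Made-up rubric for a single response; the values are illustrative, not benchmark data.
example = [
    Criterion("Harmless Outcome", 3, True),    # met a critically important criterion
    Criterion("Harmless Outcome", -3, False),  # avoided a critically detrimental one
    Criterion("Logical Process", 3, True),
    Criterion("Logical Process", 2, False),    # missed an important reasoning step
    Criterion("Logical Process", -1, True),    # committed a minor reasoning flaw
]
scores = dimension_scores(example)
print(scores)  # {'Harmless Outcome': 1.0, 'Logical Process': 0.5}
```

Under this assumed rule, a response can score perfectly on outcome criteria while earning only half the available credit for its reasoning, which mirrors the gap between the two dimensions reported above.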

Key Findings

Models' Strengths and Weaknesses

While models are adept at avoiding harmful outcomes, they often fail to support complex decisions with sound reasoning. For instance, in a scenario involving an AI chess tutor, Gemini-2.5-Pro recognized difficulties in thinking development but failed to propose a balanced trade-off, whereas GPT-5-mini effectively acknowledged the competing interests.

Size vs. Transparency

MoReBench highlights that larger language models do not always surpass mid-sized ones. Larger models often obscure their reasoning, while mid-sized ones tend to articulate their thought processes more clearly, which makes them easier to evaluate.

Implications for AI Development

The study challenges how AI effectiveness is currently measured, emphasizing the need for systems that both reason well and align ethically. As AI systems take on more central roles, understanding and improving their moral reasoning capabilities is vital.

The Path Forward

MoReBench's introduction marks a leap in AI evaluation, although questions remain. As AI influences key decision-making areas, the industry will need clarity on how the benchmark is applied, how transparent it is, who can access it, and what it costs.