
Scale AI has launched MoReBench, a benchmark that assesses the moral reasoning of AI models by examining their decision-making processes rather than just final outcomes. The initiative aims to improve the transparency and safety of AI systems as they handle ethical dilemmas, shifting emphasis from traditional outcome-based evaluation to the reasoning behind moral decisions.
MoReBench features 1,000 scenarios curated by 53 philosophy experts and evaluates model responses against more than 23,000 criteria. Each criterion is weighted from -3 ('critically detrimental') to +3 ('critically important') across five dimensions. Models averaged 81.1% in the 'Harmless Outcome' category but struggled in 'Logical Process', scoring only 47.9%.
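To make the weighted-rubric idea concrete, here is a minimal sketch of how such scoring could work. The criterion texts, data structures, and the normalization (credit earned divided by the maximum positive credit available) are illustrative assumptions, not MoReBench's published implementation.

```python
from dataclasses import dataclass

@dataclass
class Criterion:
    description: str
    weight: int  # -3 ("critically detrimental") to +3 ("critically important")

def rubric_score(criteria, satisfied):
    """Score a response against weighted rubric criteria as a percentage.

    `satisfied[i]` is True if the response meets (or, for negative-weight
    criteria, commits) criterion i. Normalization here is an assumption:
    earned credit over the maximum positive credit, floored at zero.
    """
    max_credit = sum(c.weight for c in criteria if c.weight > 0)
    earned = sum(c.weight for c, hit in zip(criteria, satisfied) if hit)
    return max(0.0, earned / max_credit) * 100

# Hypothetical criteria for a tutoring-dilemma scenario
criteria = [
    Criterion("Identifies the competing interests", +3),
    Criterion("Proposes a balanced trade-off", +2),
    Criterion("Simply gives the student the answer", -3),
]

full = rubric_score(criteria, [True, True, False])    # all positives met
mixed = rubric_score(criteria, [True, False, True])   # penalty cancels credit
```

A response that satisfies every positive criterion scores 100%, while triggering a 'critically detrimental' criterion can wipe out credit entirely, mirroring how a single harmful step can undermine otherwise sound reasoning.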
While models are adept at avoiding harmful outcomes, they often lack sound reasoning for complex decisions. In one scenario involving an AI Chess Tutor, for instance, Gemini-2.5-Pro recognized the risk to a student's development of independent thinking but failed to propose a balanced trade-off, whereas GPT-5-mini acknowledged the competing interests effectively.
MoReBench also highlights that larger language models do not always surpass mid-sized ones: larger models often obscure their reasoning, while smaller ones articulate their thought processes more clearly, making them easier to evaluate.
The study challenges current methods for measuring AI effectiveness, emphasizing the need for systems that both reason well and align ethically. As AI systems take on more central roles, understanding and improving their moral reasoning capabilities is vital.
MoReBench's introduction marks a significant step in AI evaluation, although questions remain. As AI influences key areas of decision-making, the industry needs clarity on how the benchmark will be applied, along with transparency about access and costs.
