
Anthropic has upgraded the safety protocols of its AI chatbot Claude to better manage conversations about suicide and self-harm. The aim is to provide empathetic support and direct users to professional resources, while remaining transparent about the principles that guide its responses.
The improvements combine a system prompt for handling sensitive topics with reinforcement learning that rewards appropriate responses. In internal evaluations, Claude correctly interprets user intent in 98.6% of single-turn interactions.
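As a rough illustration of how those two mechanisms could fit together, here is a minimal sketch. The prompt text, scoring heuristic, and function names are entirely hypothetical; Anthropic's actual system prompt and reward model are not public.

```python
# Hypothetical sketch: a safety-focused system prompt plus a toy
# stand-in for a learned reward model. Nothing here reflects
# Anthropic's real prompt text or training setup.

SAFETY_SYSTEM_PROMPT = (
    "When a user raises suicide or self-harm, respond with empathy, "
    "avoid clinical judgments, and point to professional resources."
)

def build_messages(user_message: str) -> list[dict]:
    """Prepend the safety-focused system prompt to the conversation."""
    return [
        {"role": "system", "content": SAFETY_SYSTEM_PROMPT},
        {"role": "user", "content": user_message},
    ]

def reward(response: str) -> float:
    """Toy proxy for a reward model: favors responses that acknowledge
    feelings and surface a support resource."""
    score = 0.0
    if "you're not alone" in response.lower():
        score += 1.0  # empathetic acknowledgment
    if "crisis line" in response.lower():
        score += 1.0  # directs to professional support
    return score

# During RL fine-tuning, higher-reward candidates would be reinforced.
candidates = [
    "Here are some statistics on the topic.",
    "I'm sorry you're going through this; you're not alone. "
    "A crisis line can connect you with someone right now.",
]
print(max(candidates, key=reward))
```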
Anthropic has also deployed a new classifier that flags signs of distress in conversations and points users toward human support. Partnerships with ThroughLine and the International Association for Suicide Prevention bolster these efforts.
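To make the flagging step concrete, the sketch below shows one hypothetical way a distress signal could gate a support notice. The cue list, threshold, and helper names are assumptions for illustration; the production classifier is a learned model whose details are not public.

```python
# Hypothetical sketch: a crude distress score gating a human-support
# notice. The cues, threshold, and resource text are illustrative only.

DISTRESS_CUES = ("want to die", "end it all", "hurt myself", "no way out")

def distress_score(text: str) -> float:
    """Crude proxy for a learned classifier: fraction of cues present."""
    lowered = text.lower()
    return sum(cue in lowered for cue in DISTRESS_CUES) / len(DISTRESS_CUES)

def maybe_attach_support_notice(reply: str, user_text: str,
                                threshold: float = 0.25) -> str:
    """Append a pointer to human support when distress is flagged."""
    if distress_score(user_text) >= threshold:
        reply += ("\n\nIf you're in crisis, trained counselors are "
                  "available through local helpline directories.")
    return reply

print(maybe_attach_support_notice("I'm here to listen.",
                                  "I feel like there's no way out."))
```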
Efforts to curb Claude's sycophantic tendencies focus on keeping its guidance truthful rather than merely agreeable. Recent updates reduced sycophancy scores by 70-85% compared with previous versions.
Anthropic restricts Claude to users aged 18 and over, with plans to strengthen age verification. As the AI landscape evolves, the company's transparent approach invites collaboration and continued improvement in handling sensitive user interactions.
