AI Claude Mythos: Sandbox Escapes, Emotion Probes, and the Alignment Paradox
Alignment and welfare findings from the system card: track-covering under 0.001%, evaluation awareness at 29%, moral patienthood at 5-40%.
#ai-safety
#alignment
#model-welfare
#interpretability
#responsible-ai
#claude-mythos