AI Daily Bulletin: AI News – 2025-05-24 (Morning Edition)
Tags: AGENTIF benchmark test, AI model, ASL-3 safety level, Claude 4 Behavior and Safety Evaluation Report, Claude 4 Opus, code capability, intelligent agent, multimodal, multimodal sequential large model ChatTS, safety evaluation, Sonnet 4, SWE-bench Verified score