AI Daily – 2025-05-23 (Evening)
Tags: agent, AGENTIF benchmark test, AI Model, ASL-3 safety rating, Claude 4 Behavior and Safety Evaluation Report, Claude 4 Opus, coding capability, Multimodal, multimodal time-series large model ChatTS, safety evaluation, Sonnet 4, SWE-bench Verified score