AI Daily – 2025-05-24 (Evening)
Tags: AI Agent, AI Model, Claude 4, Claude Opus 4 coding benchmark, coding capability, GRPO algorithm, Multimodal, Pixel Reasoner framework, reasoning ability, Reinforcement learning, TensorRT-LLM optimization, VCBench mathematical visual reasoning
AI Daily – 2025-05-23 (Evening)
Tags: agent, AGENTIF benchmark test, AI Model, ASL-3 safety rating, Claude 4 Behavior and Safety Evaluation Report, Claude 4 Opus, coding capability, Multimodal, multimodal time-series large model ChatTS, safety evaluation, Sonnet 4, SWE-bench Verified score