AI Daily - 2025-12-22 (Evening)

Keywords: MiniMax M2.1, Kling 2.6, GLM-4.7, AI Agent, Video generation model, LLM training, Humanoid robot, AI commercial applications, MiniMax M2.1 programming capability enhancement, Kling 2.6 motion control technology, GLM-4.7 Agentic Coding optimization, AI Agent workspace integration, 192K context length recall rate

🎯 Trends

MiniMax M2.1/M2.5 Model Progress and Agent Capability Enhancement: MiniMax has released its M2.1 model, with significant improvements in programming, Agent capabilities, and long-context recall. According to the company, it clearly outperforms its predecessor M2 on Agent tasks, achieves a 94% recall rate at a 192K context length, and brings major upgrades in design and visual output quality, while hinting at further breakthroughs in M2.5. MiniMax is also integrating its Agentic models into workspaces, aiming to solve complex real-world problems rather than stopping at chat. (Source: karminski3, MiniMax__AI, MiniMax__AI, MiniMax__AI, MiniMax__AI, MiniMax__AI)

Kling 2.6/Wan 2.6 Video Generation Model Capability Upgrade: The 2.6 versions of Kling AI and Alibaba’s Wan show significant progress in video generation, particularly in motion control and multi-shot storytelling. Kling 2.6’s motion control reproduces character movements and facial expressions fluidly and can render complex dance routines precisely; it also supports real-time video AI models with long-context memory to keep content consistent. Wan 2.6 emphasizes multi-shot narrative and cinematic camera control, supporting intelligent storyboarding, cross-shot consistency, synchronized audio generation, and videos up to 15 seconds long, making generated video more coherent and expressive. (Source: karminski3, Alibaba_Wan, Kling_ai, Alibaba_Wan, Alibaba_Wan, Alibaba_Wan, Alibaba_Wan, Alibaba_Wan, connerruhl, Kling_ai, Kling_ai, Kling_ai, Alibaba_Wan, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, Kling_ai, seo_leaders)

China’s GLM-4.7 Model Released, Leading in Programming and Agent Capabilities: Zhipu AI has released GLM-4.7, which significantly improves coding, long-horizon task planning, and tool orchestration, with particular optimization for Agentic Coding scenarios. The model leads other open-source models on multiple public benchmarks, including the LMArena Code Arena blind test and SWE-bench Verified, reportedly surpassing even GPT-5.2 and Claude Sonnet 4.5 on some of them, and it achieves a SOTA score on LiveCodeBench V6. (Source: dejavucoder, Reddit r/LocalLLaMA, Reddit r/LocalLLaMA)

Jan-v2-VL-Max 30B Multimodal Model Released: The Jan team has released Jan-v2-VL-Max, a 30B multimodal model designed for long-horizon task execution. On the “Illusion of Diminishing Returns” benchmark, which measures how long a model can keep executing a task correctly, it outperformed Gemini 2.5 Pro and DeepSeek R1. It is built on Qwen3-VL-30B-A3B-Thinking and uses LoRA-based RLVR (reinforcement learning with verifiable rewards) to improve stability and reduce error accumulation across multi-step execution. (Source: Reddit r/LocalLLaMA)

Gemini 3 Flash Released with Long-Context Capability: Google DeepMind has released Gemini 3 Flash, claiming state-of-the-art performance at 3 times the speed of Gemini 2.5 Pro. On OpenAI’s MRCR long-context benchmark, the model reached 90% accuracy at a 1-million-token context length, a strong result given that most models are limited to around 256K tokens of context. (Source: GoogleDeepMind, agihippo)

Humanoid Robot Industry Progress and Market Outlook: Technology development and commercialization in the humanoid robot sector are accelerating. Tesla’s Optimus is iterating rapidly on motion control and scene interaction, and Tesla plans to begin building million-unit-scale production capacity in 2026. Chinese companies such as UBTECH, ZHIYUAN Robotics, and Unitree Robotics are also ramping up mass production. The Beijing Humanoid Robot Innovation Center has open-sourced XR-1, an embodied VLA large model, with the goal of making robots “fully autonomous and easier to use.” The market is expected to shift from being driven by “speculative hype” to being driven by orders and earnings growth, with domestic substitution of upstream core components a key investment focus. (Source: Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, Sentdex, 36氪)

Anthropic Bloom Tool Released for Evaluating AI Behavioral Misalignment: Anthropic has released Bloom, an open-source tool for generating behavioral misalignment evaluations of frontier AI models. Bloom allows researchers to define specific behaviors and automatically generate scenarios to quantify their frequency and severity, aiming to enhance the safety and alignment of AI models. (Source: crystalsssup)

Qwen-Image-Layered Model Achieves Layered Image Editing: Alibaba has open-sourced Qwen-Image-Layered, a model with native image decomposition that enables Photoshop-style RGBA layer editing. Users can control the layer structure through prompts, specify 3 to 10 layers, and keep decomposing layers recursively to arbitrary depth, bringing new flexibility and precision to image generation and editing. (Source: RisingSayak, RisingSayak)

Improved Framework for Multi-Agent LLM Systems: New research proposes an adaptive coordination framework that uses dynamic routing, bidirectional feedback, and parallel agent evaluation to make multi-agent LLM systems better at handling ambiguity, shifting context, and tasks where individual agents perform unevenly. On SEC 10-K analysis tasks, the framework raised factual coverage to 92% and compliance accuracy to 94%, and it substantially reduced correction rates. (Source: omarsar0)

Runway Releases Gen-4.5, Enhancing Anatomical and Physical Understanding in Generated Videos: Runway has released Gen-4.5, a significant step forward in generative video’s understanding of anatomy, physics, and motion that should enable more realistic and coherent video content. (Source: c_valenzuelab)

🧰 Tools

Google LangExtract Library: LLM-Powered Structured Information Extraction: Google has released LangExtract, a Python library that uses LLMs to extract structured information from unstructured text. It features precise source attribution (each extraction is mapped to its exact location in the source text), reliable structured output, optimized handling of long documents, and interactive visualization. It supports Gemini as well as local models via Ollama, suits domains such as clinical notes and reports, and lets users define custom extraction tasks. (Source: GitHub Trending)
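
Source attribution is what makes extracted values auditable: every value points back to a character span in the input. The sketch below illustrates that idea with the standard library only, assuming an LLM has already returned (label, text) pairs; it is not LangExtract’s API, and `ground_extractions` and `GroundedExtraction` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class GroundedExtraction:
    label: str
    text: str
    start: int  # character offset in the source (inclusive)
    end: int    # character offset in the source (exclusive)


def ground_extractions(source: str, extractions: list[tuple[str, str]]) -> list[GroundedExtraction]:
    """Attach character offsets to (label, extracted_text) pairs.

    Each value is looked up verbatim, first searching forward from the previous
    match (extractions usually follow document order), then anywhere in the text.
    Values that never appear verbatim are dropped, because they cannot be
    attributed to the source.
    """
    grounded: list[GroundedExtraction] = []
    cursor = 0
    for label, text in extractions:
        idx = source.find(text, cursor)
        if idx == -1:
            idx = source.find(text)
        if idx == -1:
            continue  # not in the source text: likely hallucinated or paraphrased
        grounded.append(GroundedExtraction(label, text, idx, idx + len(text)))
        cursor = idx + len(text)
    return grounded


note = "Patient reports headache. Prescribed ibuprofen 400 mg twice daily."
for g in ground_extractions(note, [("medication", "ibuprofen"), ("dosage", "400 mg"), ("medication", "aspirin")]):
    print(g.label, repr(note[g.start:g.end]), g.start, g.end)
# medication 'ibuprofen' 37 46
# dosage '400 mg' 47 53
```

Exact string matching is the simplest possible grounding rule; a real tool would also need to cope with whitespace differences and paraphrased values.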

LLM-Assisted PPT and Infographic Generation: A user shared how they automate the generation of high-quality PPT slides and cartoon infographics with LLM tools (e.g., Google Gemini/Opal). Using structured prompts with JSON-formatted content, they can quickly edit slide content and generate multi-page decks, and they can also turn articles into hand-drawn, cartoon-style infographics, making content creation faster and more visually appealing. (Source: dotey, dotey)
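
Keeping slide content in JSON separates what each page says from how it is rendered, so editing one page means changing one JSON object. The sketch below shows that pattern under stated assumptions: the keys `title`, `bullets`, and `visual` and the function `slide_prompts` are hypothetical, not a schema required by Gemini or Opal.

```python
import json

# Hypothetical deck spec: the keys "title", "bullets" and "visual" are illustrative,
# not a schema required by Gemini or Opal.
DECK_JSON = """
[
  {"title": "Why RAG?", "bullets": ["Fresh knowledge", "Citations"], "visual": "flowchart"},
  {"title": "Evaluation", "bullets": ["Faithfulness", "Relevance"], "visual": "bar chart"}
]
"""


def slide_prompts(deck_json: str, style: str) -> list[str]:
    """Render one self-contained prompt per slide from a JSON list of slides."""
    slides = json.loads(deck_json)
    prompts = []
    for page, slide in enumerate(slides, start=1):
        missing = {"title", "bullets"} - set(slide)
        if missing:
            raise ValueError(f"slide {page} is missing {sorted(missing)}")
        bullets = "\n".join(f"- {b}" for b in slide["bullets"])
        prompts.append(
            f"Slide {page} of {len(slides)}. Visual style: {style}.\n"
            f"Title: {slide['title']}\n"
            f"Key points:\n{bullets}\n"
            f"Main visual: {slide.get('visual', 'none')}"
        )
    return prompts


for prompt in slide_prompts(DECK_JSON, "hand-drawn cartoon"):
    print(prompt, end="\n\n")
```

Because each prompt is self-contained, a single slide can be regenerated after an edit without touching the rest of the deck.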

Qdrant Supports Multi-faceted Text Search: Qdrant offers comprehensive text search support, including semantic search over dense vectors, lexical (keyword) search, and hybrid search that combines the two. Users can configure and tune the search strategy for each application, covering needs from intent understanding to exact keyword matching, which makes it suitable for both RAG and general-purpose search systems. (Source: qdrant_engine)
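
Hybrid search has to merge a dense-vector ranking with a keyword ranking whose scores are not comparable. A common way to do this is Reciprocal Rank Fusion (RRF), which Qdrant offers as a fusion option for hybrid queries. The sketch below is a plain-Python illustration of RRF itself, not Qdrant’s client code; the document IDs are made up.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k = 60 is the constant from the original RRF paper.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for deterministic output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense = ["doc3", "doc1", "doc7"]    # semantic (dense vector) results
keyword = ["doc1", "doc9", "doc3"]  # lexical (keyword) results
print([doc for doc, _ in reciprocal_rank_fusion([dense, keyword])])
# ['doc1', 'doc3', 'doc9', 'doc7']
```

RRF only uses ranks, so it needs no score normalization, which is why it is a popular default for combining semantic and keyword results.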

AI Coding Agent Testing and Applications: Ars Technica asked four AI coding Agents to rebuild the game Minesweeper, a test that revealed AI’s potential in game development and code generation. Separately, a developer used GPT-5.2-Codex to build a 3D dog-walking simulator, iterating on asset and prop placement logic from screenshots, showing how AI can assist in complex software development. (Source: Reddit r/artificial, kylebrussell)

Claude Chrome Extension Features and Applications: Users are applying the Claude Chrome extension to a variety of complex tasks, such as migrating Notion projects to a MySQL database (including creating the database and writing the code), completing work training, comparing the UI/UX of an application against its prototype, and managing schedules. By reading and manipulating web content, the extension significantly improves work efficiency and shows the potential of AI Agents in browser environments. (Source: Reddit r/ClaudeAI)

Open WebUI AI Support Bot: The Open WebUI Discord server has launched an “omniscient” Q&A/support bot that has indexed all of Open WebUI’s documentation, issues, and discussions. It can answer user questions about configuration, error codes, and more, with the aim of making community support more efficient. (Source: Reddit r/OpenWebUI)

AI News Aggregation Workflow: A user shared how they built an automated news summarization workflow with tools like n8n. The system automatically aggregates and summarizes news, publishes it to a website, and has even been indexed by Google News, pointing to AI’s commercial potential in content generation and news distribution. (Source: Reddit r/ArtificialInteligence)

📚 Learning

Evolution of LLM Training Eras and Reasoning Enhancement: LLM training has evolved from pre-training, RLHF with PPO, and LoRA SFT toward mid-training and RLVR with GRPO. Separately, research proposes Canon Layers, lightweight architectural components that promote lateral information flow between adjacent tokens. They significantly increase the depth and breadth of LLM reasoning, can let weaker architectures match SOTA models, and offer a low-cost, predictive way to evaluate future architecture designs. (Source: rasbt, HuggingFace Daily Papers)
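
To make “lateral information flow between adjacent tokens” concrete, here is a minimal sketch assuming the commonly described form of a Canon layer: a residual, causal, per-channel weighted sum of each token’s hidden state with those of the previous few tokens (in effect a short causal convolution). The function name, list-based tensors, and kernel size are illustrative assumptions; real implementations operate on batched tensors with learned weights.

```python
def canon_layer(hidden: list[list[float]], weights: list[list[float]]) -> list[list[float]]:
    """Residual causal mixing of each token with its predecessors.

    hidden:  T x d hidden states, one row per token position.
    weights: K x d per-channel kernel; weights[i] scales the token i steps back.
    Output row t = hidden[t] + sum_i weights[i] * hidden[t - i] over t - i >= 0,
    so each position only receives information from itself and earlier tokens.
    """
    T, d = len(hidden), len(hidden[0])
    out = []
    for t in range(T):
        row = list(hidden[t])  # residual connection keeps the original state
        for i, w in enumerate(weights):
            if t - i < 0:
                break  # causal: nothing exists before the start of the sequence
            src = hidden[t - i]
            for c in range(d):
                row[c] += w[c] * src[c]
        out.append(row)
    return out


# One channel, kernel [0.0, 0.5]: each token adds half of the previous token's value.
print(canon_layer([[1.0], [2.0], [3.0]], [[0.0], [0.5]]))
# [[1.0], [2.5], [4.0]]
```

The residual path means a layer with all-zero weights is an identity, so such components can be inserted into an existing architecture without disrupting it at initialization.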

Application and Optimization of Multi-Turn RL in Agentic LLMs: To address the challenges LLM Agents face in multi-turn interactive tasks in real environments, researchers propose the Turn-PPO algorithm. It performs advantage estimation over a turn-level MDP instead of a token-level MDP, making PPO more robust and effective for multi-turn RL. The method clearly outperformed the GRPO baseline on WebShop and Sokoban, especially in scenarios requiring long-horizon reasoning. (Source: HuggingFace Daily Papers)
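
The difference between token-level and turn-level MDPs is what counts as one time step. The sketch below applies standard Generalized Advantage Estimation (GAE) with each agent turn as a step, then gives every token in a turn that turn’s advantage. This is an illustration of turn-level advantage estimation in general; Turn-PPO’s exact critic, reward shaping, and hyperparameters are not described here, and the function names are hypothetical.

```python
def turn_level_gae(rewards: list[float], values: list[float],
                   gamma: float = 1.0, lam: float = 0.95) -> list[float]:
    """GAE where each time step is one agent turn rather than one token.

    rewards[t]: reward received after turn t.
    values[t]:  critic's value estimate at the start of turn t.
    The episode is assumed to end after the last turn (bootstrap value 0).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(values) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]  # one-turn TD error
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages


def broadcast_to_tokens(turn_advantages: list[float], turn_lengths: list[int]) -> list[float]:
    """Give every generated token in a turn the advantage of that turn."""
    return [adv for adv, n in zip(turn_advantages, turn_lengths) for _ in range(n)]


adv = turn_level_gae(rewards=[0.0, 0.0, 1.0], values=[0.5, 0.5, 0.5], gamma=1.0, lam=1.0)
print(adv)  # [0.5, 0.5, 0.5]
print(broadcast_to_tokens(adv, [2, 1, 3]))
```

With a single value estimate per turn, the critic only has to predict outcomes at turn boundaries, which is one plausible reason turn-level estimation is more stable over long multi-turn episodes.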

New Paradigm for LLM-as-a-Judge Evaluation: Sage: Existing LLM-as-a-Judge benchmarks rely on human annotation, which introduces bias and is hard to scale. The Sage evaluation suite instead measures judgment quality without human annotation, using two new metrics: local self-consistency (whether pairwise preferences stay stable) and global logical consistency (whether preferences are transitive). The researchers found that even SOTA models show significant “situational preference” problems on hard cases, underscoring the importance of clear evaluation criteria. (Source: HuggingFace Daily Papers)
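
Both properties can be checked without human labels because they only compare a judge with itself. The sketch below uses simplified readings of the two metrics (order-swap stability per pair, and counting cyclic triples), not Sage’s exact definitions; `judge` stands in for an LLM call and the toy judges are hypothetical.

```python
from collections.abc import Callable, Sequence
from itertools import combinations

Judge = Callable[[str, str], str]  # returns whichever of its two arguments it prefers


def order_stability(judge: Judge, items: Sequence[str]) -> float:
    """Local self-consistency: share of pairs whose winner survives swapping the order."""
    pairs = list(combinations(items, 2))
    stable = sum(judge(a, b) == judge(b, a) for a, b in pairs)
    return stable / len(pairs)


def cyclic_triples(judge: Judge, items: Sequence[str]) -> int:
    """Global logical consistency: count triples with cyclic preferences (a > b > c > a)."""
    cycles = 0
    for triple in combinations(items, 3):
        wins = dict.fromkeys(triple, 0)
        for a, b in combinations(triple, 2):
            wins[judge(a, b)] += 1
        # Among three items, preferences are intransitive exactly when each item wins once.
        if all(w == 1 for w in wins.values()):
            cycles += 1
    return cycles


def first_position_judge(a: str, b: str) -> str:
    return a  # pure position bias: always prefers whatever is shown first


BEATS = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}


def rps_judge(a: str, b: str) -> str:
    return a if (a, b) in BEATS else b  # order-stable but intransitive


print(order_stability(first_position_judge, ["x", "y", "z"]))  # 0.0
print(cyclic_triples(rps_judge, ["rock", "paper", "scissors"]))  # 1
```

The two toy judges show why both metrics are needed: one fails only the order check, the other passes it but fails the transitivity check.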

Anatomy and Challenges of Embodied AI VLA Models: A systematic review of Vision-Language-Action (VLA) models walks through their modules, milestones, and core challenges, analyzing in detail the revolutionary progress of VLA models in robotics. It focuses on five major challenges (representation, execution, generalization, safety, and datasets & evaluation), offering researchers both a learning guide and directions for future research. (Source: HuggingFace Daily Papers)

Meta-RL Exploration and Adaptation for LLM Agents: The LaMer framework combines cross-episode training with reflection-based in-context policy adaptation, enabling LLM Agents to actively explore their environment and learn from feedback at test time. This Meta-RL approach significantly improved Agent performance and generalized better in environments such as Sokoban, MineSweeper, and WebShop, offering a new route to robust Agent adaptation in complex, unfamiliar environments. (Source: HuggingFace Daily Papers)
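
One way to see why cross-episode training encourages exploration: if an episode is also credited with the (discounted) returns of later episodes in the same trial, an early episode that gathers useful information is rewarded even when it fails. The sketch below shows this generic meta-RL credit assignment; whether LaMer uses exactly this form is an assumption, `gamma_traj` is a hypothetical parameter name, and the reflection step between episodes is not modeled.

```python
def cross_episode_returns(episode_returns: list[float], gamma_traj: float) -> list[float]:
    """Credit each episode in a trial with its own return plus discounted later returns.

    G_n = R_n + gamma_traj * R_{n+1} + gamma_traj**2 * R_{n+2} + ...
    gamma_traj = 0 recovers independent episodes; gamma_traj > 0 rewards early
    episodes whose exploration pays off in later attempts.
    """
    out = [0.0] * len(episode_returns)
    running = 0.0
    for n in reversed(range(len(episode_returns))):
        running = episode_returns[n] + gamma_traj * running
        out[n] = running
    return out


# Three attempts at the same task: the first two fail, the third succeeds.
print(cross_episode_returns([0.0, 0.0, 1.0], gamma_traj=0.5))  # [0.25, 0.5, 1.0]
```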

Research on Enhancing LLM Reasoning Capabilities: Research from Carnegie Mellon University found that pre-training, mid-training, and reinforcement learning (RL) each affect reasoning gains differently: RL can genuinely improve reasoning under specific conditions, generalization across contexts depends on pre-training, mid-training plays a crucial role, and process-aware rewards are key. (Source: TheTuringPost, TheTuringPost)

Agentic AI Adaptation Strategies, Tech Stack, and Learning Path: Researchers from institutions including UIUC, Stanford, and Harvard have proposed four key adaptation strategies for Agentic AI: adapting the Agent using tool-execution results, training the Agent on signals from its own outputs, adapting tools independently of any Agent, and training tools with feedback from a fixed Agent. Together these offer guidance for developing and optimizing Agentic AI. Also shared were explainers on how Agentic AI works, its architectural features, seven common types of Agents, and a 50-step guide to mastering Agentic AI in 2025-2026. (Source: TheTuringPost, Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon, Ronald_vanLoon)

Claude XML Structured Prompt Strategy: Anthropic recommends structuring prompts with XML tags to improve Claude’s understanding and output quality. Wrapping the parts of a request in tags such as <task>, <context>, <constraints>, and <output_format> helps Claude parse the prompt more accurately, with the biggest gains on complex tasks. (Source: Reddit r/ClaudeAI)
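
A small helper keeps such prompts consistent across requests. This is a sketch only: the four tag names come from the post above, while `_tag` and `build_xml_prompt` are hypothetical helpers, and the resulting string would be sent as an ordinary user message to whatever client you use.

```python
def _tag(name: str, body: str) -> str:
    """Wrap body in an opening and closing XML-style tag on separate lines."""
    return f"<{name}>\n{body}\n</{name}>"


def build_xml_prompt(task: str, context: str, constraints: list[str], output_format: str) -> str:
    """Assemble a prompt whose parts are clearly delimited by XML tags."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return "\n".join([
        _tag("task", task),
        _tag("context", context),
        _tag("constraints", constraint_lines),
        _tag("output_format", output_format),
    ])


print(build_xml_prompt(
    task="Summarize the quarterly report for executives.",
    context="Q3 revenue grew 12%; churn rose in the EU segment.",
    constraints=["At most 100 words", "No jargon"],
    output_format="Three Markdown bullet points",
))
```

Putting each part in its own tag also makes it easy to reference a section by name elsewhere in the prompt (for example, “follow the rules in <constraints>”).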

End-to-End Evaluation Guide for RAG Pipelines: Qdrant shared a comprehensive guide to evaluating RAG (Retrieval-Augmented Generation) pipelines end to end. Combining RAGAS, LangGraph, Qdrant, and Opik, it shows how to build a production-grade evaluation pipeline, covering dataset creation, LLM-as-a-Judge evaluation, the effectiveness of binary (pass/fail) evaluation, and the RAG Triad method, so that RAG systems can be shown to be reliable before deployment. (Source: qdrant_engine)
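
Binary verdicts are easy to aggregate: each RAG Triad dimension (context relevance, groundedness, answer relevance) becomes a pass rate. The sketch below assumes the LLM-as-a-Judge calls have already produced pass/fail verdicts; `Verdict`, `triad_report`, and the metric keys are hypothetical names, not code from the guide or from RAGAS/Opik.

```python
from dataclasses import dataclass

TRIAD = ("context_relevance", "groundedness", "answer_relevance")


@dataclass
class Verdict:
    question_id: str
    metric: str   # one of TRIAD; unknown names raise KeyError in triad_report
    passed: bool  # binary pass/fail from an LLM judge (judge call not shown)


def triad_report(verdicts: list[Verdict]) -> dict[str, float]:
    """Pass rate per RAG Triad metric, plus the share of questions passing all three."""
    if not verdicts:
        return {}
    by_metric: dict[str, list[bool]] = {m: [] for m in TRIAD}
    by_question: dict[str, dict[str, bool]] = {}
    for v in verdicts:
        by_metric[v.metric].append(v.passed)
        by_question.setdefault(v.question_id, {})[v.metric] = v.passed
    report = {m: sum(vals) / len(vals) for m, vals in by_metric.items() if vals}
    # A question only counts as passing if every triad metric was judged and passed.
    report["all_three"] = sum(
        all(q.get(m, False) for m in TRIAD) for q in by_question.values()
    ) / len(by_question)
    return report


verdicts = [Verdict("q1", m, True) for m in TRIAD] + [
    Verdict("q2", "context_relevance", True),
    Verdict("q2", "groundedness", False),  # answer not supported by retrieved context
    Verdict("q2", "answer_relevance", True),
]
print(triad_report(verdicts))
```

The “all_three” rate is the stricter gate before deployment, since a fluent answer that is not grounded in the retrieved context still counts as a failure.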

NVIDIA Unsloth LLM Fine-tuning Guide: NVIDIA has released a beginner’s guide to fine-tuning LLMs with Unsloth. It covers training methods such as LoRA, full fine-tuning (FFT), and RL; when fine-tuning makes sense and typical use cases; how much data and VRAM are required; and how to train locally on hardware such as DGX Spark and RTX GPUs. (Source: Reddit r/LocalLLaMA)
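
The memory gap between LoRA and full fine-tuning comes down to simple arithmetic. LoRA replaces the update to a d_out x d_in weight matrix with the product B @ A, where A is rank x d_in and B is d_out x rank, so it trains rank * (d_in + d_out) parameters instead of d_in * d_out. The helper names below are illustrative, and `weight_memory_gb` counts model weights only, not activations, gradients, or optimizer states.

```python
def lora_trainable_params(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters of one LoRA adapter: A (rank x d_in) plus B (d_out x rank)."""
    return rank * (d_in + d_out)


def full_params(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in matrix is fine-tuned."""
    return d_in * d_out


def weight_memory_gb(num_params: int, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# One 4096 x 4096 attention projection with a rank-16 adapter:
lora = lora_trainable_params(4096, 4096, 16)  # 131072
full = full_params(4096, 4096)                # 16777216
print(lora, full, lora / full)                # 131072 16777216 0.0078125
# An 8B-parameter model quantized to 4 bits needs about 4 GB just for its weights.
print(weight_memory_gb(8_000_000_000, 4))     # 4.0
```

At rank 16, the adapter trains under 1% of that matrix’s parameters, which is why LoRA (especially on a quantized base model) fits on consumer RTX GPUs where full fine-tuning does not.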

💼 Business

Chinese AI Large Model Companies Zhipu and MiniMax Line Up for IPO: Chinese large model companies Zhipu and MiniMax (Xiyu Technology) have passed their Hong Kong Stock Exchange listing hearings and are pushing ahead with IPOs, each poised to become the world’s first publicly listed large model company. Both are valued in the tens of billions of RMB, still far below OpenAI’s valuation in the hundreds of billions of dollars. Zhipu focuses on enterprise (B2B) and government (G2B) markets with its MaaS platform, while MiniMax bets on multimodal AI, concentrating on consumer-facing products and pursuing a global strategy. Both companies have soaring revenue but still post heavy losses. (Source: 36氪)

JPMorgan CEO on AI’s Impact on Job Market and Future Skills: JPMorgan CEO Jamie Dimon believes AI will eliminate repetitive jobs but will not lead to widespread unemployment. He emphasized that the key to future career success lies in mastering three skills: technological fluency (effectively using AI tools), judgment (interpreting AI output and making high-stakes decisions), and human skills (communication, empathy, leadership). JPMorgan invests over $12 billion annually in technology, with AI already applied in hundreds of internal scenarios. (Source: Reddit r/ArtificialInteligence)

AI Accelerator Founderscape.ai: Founderscape.ai is an upcoming MMORPG (Massively Multiplayer Online Role-Playing Game) platform for founders that uses AI to speed up the startup journey, aiming to take entrepreneurs from idea to IPO and even to trillion-dollar valuations. (Source: amasad)

🌟 Community

AI’s Impact on the Job Market and Expert Warnings: In 2025, nearly 55,000 US job cuts were attributed to AI, out of 1.17 million total layoffs. Turing Award winner Yoshua Bengio and Anthropic CEO Dario Amodei have both warned that AI will lead to mass unemployment and a collapse of the labor market, with new jobs insufficient to offset those lost. Going forward, only people who master AI tools along with distinctly human skills such as judgment, interpersonal communication, and cross-domain collaboration will be able to adapt. (Source: 36氪, Reddit r/ArtificialInteligence, Reddit r/ChatGPT, ClementDelangue)

LLM Hallucinations and the ‘AI Psychosis’ Phenomenon in Scientific Discovery: As LLMs become more capable, a phenomenon dubbed “AI psychosis” (or “LLM psychosis”) has emerged, in which users, or the models themselves, claim major breakthroughs in fields they do not understand, such as claims that an LLM has solved the Navier-Stokes problem. Experts warn that LLMs’ fast answers can create a false sense of understanding, and that even a 1% hallucination rate can spread serious misinformation. This could breed excessive skepticism toward newcomers’ work and a return to credentialism, slowing scientific progress. (Source: teortaxesTex, demishassabis, hyhieu226, arohan)

Controversy Over AI Browser Utility: There is widespread skepticism on social media about how useful AI browsers (e.g., Comet, ChatGPT Atlas) really are. Users report that their automation features perform poorly on complex tasks, that setup, maintenance, and debugging are time-consuming, and that the browsers can degrade device performance. Developers acknowledge that these tools are still at an early stage, “more promise than reality,” but expect agent models and visual state management to let them solve complex problems in the future. (Source: Reddit r/artificial, TheTuringPost, TheTuringPost)

AI’s Impact on Content Creation and Information Trust: As AI-generated content proliferates, users increasingly trust AI answers, and many now read AI summaries instead of browsing full websites. This is pushing content creators to adjust their strategies so that their content can be discovered and summarized by AI models. Others argue that people trust AI for its speed and breadth but still verify information on websites: AI is a first stop, not the final authority. (Source: Reddit r/ArtificialInteligence)

Debate on the Existence and Definition of AGI: Yann LeCun argues that artificial general intelligence (AGI) does not exist, because human intelligence is itself highly specialized and its apparent generality is an illusion. DeepMind CEO Demis Hassabis counters that the brain is extremely general and that AI foundation models, as approximate Turing machines, can in principle learn anything computable. Separately, a paper proposes defining intelligence in terms of “entity fidelity,” the ability to generate new entities of the same concept from examples of that concept, aiming to provide a measurable, species-agnostic standard for intelligence. (Source: demishassabis, Reddit r/ArtificialInteligence)

AI Accelerates Video Production and Unsettles the Industry: A user shared how they used AI tools (Claude Code, Gemini CLI, ElevenLabs, Remotion) to produce an 18-minute animated explainer video in a few days, and said they were shocked by the result. They believe that even these early AI tools already reach a “good enough” professional level, which will put many mid-level motion designers, animators, and video editors at risk of losing their jobs, a sign that the industry is already being transformed. (Source: Reddit r/ArtificialInteligence)

Future Vision and Challenges of AI Agents: Sam Altman predicts that AI will become superhumanly persuasive before it achieves general intelligence, which could have unforeseen consequences. Meanwhile, companies like MiniMax are building Agentic models and workspaces that can solve complex real-world problems, emphasizing that visible state management is crucial for trust and usability. (Source: teortaxesTex, MiniMax__AI)

Discussion on Claude Model Performance and Memory Function: The Reddit community discussed Claude’s usage limits, bugs, and performance issues, as well as the power and potential impact of its memory feature. Users found that Claude’s memory retains a large amount of detail from past conversations, greatly improving productivity, though some disabled it because it drew on stored memories too aggressively. (Source: Reddit r/ClaudeAI, Reddit r/ClaudeAI)

AI Applications in Retail and the Human API: A machine learning researcher, drawing on his experience as a part-time stocker at Walmart, described the challenges AI and automation face in retail environments. He observed that human employees are often hired to cover the areas where systems fail, such as inventory drift, visual confusion, spoilage inference, and route optimization failures, effectively acting as “human APIs” for machines. This suggests that existing automation systems still perform best in environments designed for machines. (Source: Reddit r/ArtificialInteligence)

Challenges in LLM Long-Context Evaluation: Claude models’ weak results in long-context evaluations sparked community discussion. Although Anthropic’s Opus 4.5 is faster, it still struggles with long-context recall and understanding, which are crucial for Agent tasks that must process large amounts of information. (Source: scaling01, dejavucoder)

💡 Other

AI-Driven Military Technology and Drone Applications: Reports from the battlefield in Ukraine point to a growing role for drones in military operations, including coordinating airstrikes and carrying out FPV drone swarm attacks. Substantial military resources are being invested in drone forces, suggesting that future wars may involve clashes between industrialized drone forces. (Source: teortaxesTex, jpt401)

US Schools’ Deployment of AI Surveillance Technology Sparks Controversy: Schools across the US are rolling out AI-powered surveillance technologies, including drones, facial recognition, and even listening devices in bathrooms. This has raised privacy and trust concerns among students: 32% report feeling constantly monitored, and students have become less willing to report mental health issues to educators. (Source: Reddit r/artificial)

Firefox to Allow Users to Disable All AI Features: Mozilla Firefox has confirmed it will soon allow users to completely disable all AI features in the browser. This move aims to address some users’ dissatisfaction with AI features being forcibly pushed, providing users with more control. (Source: Reddit r/ArtificialInteligence)
