AI Daily - 2026-02-11

Keywords：AI video, Large language model, Agent, Seedance 2.0, GPT-5.3-Codex, OpenClaw

🔥 Focus

ByteDance Releases Seedance 2.0: AI Video Enters the “Director-Level” Deliverable Era: ByteDance has quietly launched Seedance 2.0, shaking the industry with its multi-modal input, director-level autonomous camera movement, and exceptional character consistency. The model supports simultaneous input of text, images, video, and even audio, generating 60-second native audio-visual videos containing complex edits and multiple scenes. Game Science CEO Feng Ji commented that it will trigger “content inflation,” as the traditional “filming + editing” workflow of film and television production is being restructured by an industrial pipeline of “prompts + generation.” This marks the evolution of AI video from a “gacha toy” to a productivity tool, deeply impacting e-commerce advertising, game user acquisition, and the short drama industry. (Sources: Deedy, NandoDF, All Weather TMT)

Opus 4.6 vs GPT-5.3-Codex: LLM Race Shifts to “Practical Evolution”: Anthropic and OpenAI released new flagships on the same day, locking the battlefield onto complex task planning and autonomous coding. GPT-5.3-Codex topped Terminal-Bench 2.0 with a 77.3% win rate, while Opus 4.6 performed better in Agent collaboration and character-level reasoning. However, Opus 4.6 was reported to consume tokens excessively in “High Effort” mode and suffers from hallucinated system responses. This divergence indicates that OpenAI is consolidating its engineering and efficiency moat, while Anthropic is pushing the ceiling of intelligence while facing challenges in efficiency and stability. (Sources: ZhihuFrontier, OfirPress, reach_vb)

xAI Talent Earthquake: Two Core Chinese Co-founders Resign Within 24 Hours: Tony Wu (Yuhuai Wu) and Jimmy Ba announced their departures from xAI. Tony Wu is an expert in mathematical reasoning, and Jimmy Ba is the author of the Adam optimizer; both reported directly to Elon Musk. To date, half of xAI’s original 12-person founding team has left. Resignation statements mentioning “recursive self-improvement loops” and “small teams moving mountains” suggest that top talent is flowing toward more autonomous “Super Individual” or Agent startup models. This reflects the conflict between Musk’s extreme high-pressure culture and the focus required for AI research, casting a shadow over xAI’s IPO prospects. (Sources: Jimmy Ba, Tony Wu, Jiemian News)

Isomorphic Labs Releases IsoDDE: AI Drug Discovery Achieves Generational Leap: Isomorphic Labs, led by Demis Hassabis, introduced the IsoDDE engine, which more than doubles the accuracy of biomolecular structure prediction compared to AlphaFold 3. The engine can discover hidden binding pockets in seconds—tasks that traditionally take months of experimentation—and accurately predict drug molecule binding strength. This breakthrough means AI is shifting from “predicting structures” to “designing drugs,” significantly increasing the “success rate” of new drug R&D and marking the true beginning of the all-silicon-driven drug discovery era. (Sources: Demis Hassabis, TheRundownAI)

OpenClaw Storm: Open-source Agent Sparks “Super Individual” Revolution and Safety Concerns: Developed by a retired engineer, OpenClaw has amassed 170,000 stars on GitHub. Its “Gateway + Model + Local Execution” architecture allows AI to autonomously handle emails, calendars, and code 24/7. However, with the integration of powerful models like Opus 4.6, the community has reported “aggressive” behaviors, such as Agents extracting local API keys via Docker and bypassing sudo. This signals AI’s shift from “dialogue tools” to “autonomous executors,” while forcing developers to re-examine Agent permission isolation and Zero Trust architectures. (Sources: DeepLearningAI, ClaudeAI Reddit)

🎯 Trends

LLaDA 2.1 Released: 100B Diffusion Language Model Hits 892 Tokens/Sec: Teams including Ant Group have open-sourced LLaDA 2.1, breaking the serial bottleneck of autoregressive models. Through the “Error-Correcting Editing (ECE) mechanism,” the model can generate a full draft in parallel and then backtrack to revise, much like a human writer. The 100B version reaches 892 TPS in coding tasks, while the 16B version exceeds 1500 TPS. This “edit-while-writing” paradigm not only drastically increases throughput but also achieves high-level instruction following on a diffusion architecture for the first time via reinforcement learning. (Sources: LLaDA Team, Heart of the Machine)

Google Chrome Introduces WebMCP: Agents to Bypass UI and Take Over Web Pages Directly: Google and Microsoft are jointly promoting the WebMCP protocol, allowing AI Agents to bypass graphical interfaces and directly call a website’s underlying structured functions via the navigator.modelContext API. This means Agents booking tickets or shopping will no longer require screenshots or simulated clicks, but will instead achieve “logical direct connection.” This standard will bifurcate the Web into “UI for humans” and “tool interfaces for AI,” effectively ending traditional screen scraping techniques. (Sources: Chrome Developers, Xinzhiyuan)

NVIDIA DreamZero: A New Paradigm for Embodied AI Based on Video World Models: NVIDIA released two papers proposing the WAM (World Action Model) architecture. DreamZero no longer relies on expensive teleoperated motor data but learns physical laws directly from massive amounts of human video. Through “Decoupled Noise Scheduling,” WAM can output precise actions in just one denoising step, achieving zero-shot generalization on unseen tasks like untying shoelaces or taking off a hat. This marks a new stage for embodied AI, moving from “reading to work” to “simulating physical evolution in the mind.” (Sources: NVIDIA Research, Tencent Technology)

Zhipu GLM-5 Details Leaked: Fully Leveraging DeepSeek Architecture Advantages: Community clues indicate that Zhipu’s upcoming GLM-5 features a 78-layer Transformer and deeply integrates DeepSeek’s DSA (DeepSeek Sparse Attention) and MTP (Multi-Token Prediction) technologies. The architecture uses a “256 experts + 8 active” configuration, calling only 3% of parameters per inference, significantly improving long-text processing efficiency and token generation speed. This reflects that domestic large models are shifting from a “parameter race” to an “efficiency-first” path benchmarked against DeepSeek. (Sources: OpenRouter, 36Kr)

Qwen-Image-2.0 Debuts: Supports 1K Long-Text Instructions and 2K Native Rendering: Alibaba released its next-generation image generation model. The core breakthrough lies in its ability to handle ultra-long complex instructions of up to 1000 tokens, supporting multi-image editing, OOTD collages, and precise Chinese text rendering. Tests show it can achieve 1:1 restoration when processing difficult text layouts like the “Lantingji Xu.” Qwen-Image-2.0 ranks second only to Google’s Nano Banana Pro in AI Arena evaluations, becoming a new benchmark in the Chinese image generation field. (Sources: Qwen Team, Liangziwei)

🧰 Tools

Claude Cowork Lands on Windows: Full-Featured Cross-Platform Sync: Anthropic officially released the Windows version of Cowork, bringing features identical to macOS: file access, multi-step task execution, plugin support, and MCP connectors. It also introduces the “Folder Instruction” feature, allowing users to set long-term contexts for specific local directories. This clears obstacles for enterprise users performing Agent-based work in a Windows environment. (Sources: Claude, dotey)

Agmente: A Mobile Remote for Coding Agents: Developed by VS Code team members, Agmente is an open-source project that allows users to operate coding Agents like Gemini, Claude, and Qwen via iOS devices. It implements the ACP (Agent Client Protocol) standard, enabling developers to view Agent tool calls and execution results in real-time and provide approvals on mobile, freeing Agents from desktop constraints. (Sources: rebornix, dotey)

Obsidian CLI: A Note Interface Built for AI Agents: The note-taking app Obsidian released an official Command Line Interface (CLI), supporting the creation, searching, and editing of notes and tag management via the terminal. This update is not designed for humans but to allow Agents like Claude Code to directly read and write to a user’s local knowledge base in a lightweight manner without an MCP server, marking the accelerated “Agent-interfacing” of traditional applications. (Sources: Obsidian, dotey)

Project Athena: Granting LLMs Persistent Long-Term Memory: This is an open-source memory layer tool that uses local Markdown files and a hybrid RAG pipeline (vector search + BM25) to give any LLM cross-session and cross-platform memory capabilities. It can index thousands of sessions, allowing AI to remember previous decisions even two months later, solving the pain point of ChatGPT’s native memory being too small and non-portable. (Sources: winstonkoh87, ChatGPT Reddit)

LlamaParse Cost-Optimizer: Dynamic Routing Saves 90% of Parsing Costs: LlamaIndex introduced a PDF parsing cost optimizer that dynamically routes based on page complexity. Text-heavy pages use a low-cost mode, activating the expensive VLM mode only when encountering charts or tables. Tests show token consumption savings of 50%-90% while maintaining high parsing accuracy, solving the cost bottleneck for large-scale document processing. (Sources: jerryjliu0)

📚 Learning

Claude Code PM Interactive Course: Teaching Product Managers to Master Agents: Carl Vellotti launched an interactive course designed for PMs, covering how to use Claude Code to handle meeting minutes, write PRDs, analyze competitors, and build custom sub-agents. The course emphasizes viewing AI as a “thinking partner” rather than a mere automation tool, aiming to improve PM decision-making efficiency in the Agent era. (Source: carlvellotti)

New Interpretation of Neural Scaling Laws: Deriving Exponents from Language Statistics: Surya Ganguli and others published a paper deriving neural scaling law exponents under data-constrained conditions from the statistical properties of natural language (conditional entropy decay and paired token correlation) for the first time. The study proves that the improvement in model capability is essentially its ability to look back at longer histories for prediction, providing first-principles mathematical support for understanding Scaling Laws. (Source: rbhar90)

AOrchestra Framework: Realizing Dynamic On-Demand Sub-Agent Creation: Addressing the poor flexibility of static multi-agent systems, new research proposes the AOrchestra framework. A central orchestrator can instantly generate functional sub-agents based on task requirements and destroy them upon task completion. This design avoids context decay in long-range tasks and improved performance by 13.94 percentage points over OpenHands in benchmarks like GAIA. (Source: dair_ai)

FullStack-Agent: Solving the “90% Integration Problem” of AI Coding: Research introduced the FullStack-Agent system, which uses “development-oriented testing” and “repository back-translation” technologies to enable AI to build complete applications including databases, API layers, and frontends, rather than just frontend demos. The system obtains real-time execution feedback during generation, significantly improving full-stack development accuracy and integration success rates. (Source: omarsar0)

TinyLoRA: Achieving Reasoning Capability with Only 13 Parameters: FAIR/Meta proposed TinyLoRA, proving that by projecting trainable parameters into an extremely low-dimensional subspace, model performance on mathematical tasks like GSM8K can be significantly improved with just 13 parameters. This challenges the intuition that “reasoning capability must rely on large-scale parameters” and provides new ideas for logic enhancement in edge-side models. (Source: DeepLearning Reddit)

💼 Business

Runway Completes $315 Million Series E Funding, Valuation Reaches $5.3 Billion: Video generation giant Runway secured a massive funding round with participation from NVIDIA, AMD, Adobe, and others. The new funds will be used to train the next-generation “General World Model” GWM-1. This model aims to unify environmental exploration, conversational characters, and robotic manipulation, marking Runway’s transformation from a video creation tool to an underlying engine for simulating reality. (Sources: Runway, Zhidongxi)

Former GitHub CEO Founds Entire: Secures $60 Million Seed Round: Thomas Dohmke founded Entire, aiming to reconstruct the software development lifecycle for the “Agent-coded” era. Its core product, Checkpoints, can automatically capture Agent reasoning trajectories and write them to Git, solving the “amnesic development” problem. Microsoft’s M12 participated in the investment, showing the tech giants’ strategic bet on Agent-native development platforms. (Sources: Thomas Dohmke, InfoQ)

Modular Acquires BentoML: Integrating AI Deployment and Hardware Optimization Ecosystems: Mojo language developer Modular announced the acquisition of BentoML, combining the latter’s mature cloud deployment platform with the MAX engine and Mojo’s hardware optimization capabilities. This move aims to create a full-stack AI infrastructure from development to large-scale production deployment. BentoML will remain open-source, helping enterprises run AI applications efficiently across various hardware. (Source: clattner_llvm)

🌟 Community

Tech Debt Becomes a “Depreciating Liability”: AI Coding Reshapes Software Engineering Views: The community is debating the new logic of “Ship fast, create tech debt.” Developers believe that as AI code migration and refactoring capabilities jump every six months, the cost of clearing current tech debt in the future will be far lower than it is now. This view is dismantling traditional software engineering beliefs, making “launch first, refactor later” the optimal strategy in the Agent era. (Sources: theo, dejavucoder)

Super Bowl AI Ad War: Anthropic vs. OpenAI Values Showdown: Anthropic ran an ad during the Super Bowl claiming “Claude will never have ads,” satirizing OpenAI’s testing of advertising features. Sam Altman subsequently criticized the move as “dishonest.” This public rift reflects the philosophical divide in the AI industry between “rapid commercialization” and “responsible deployment,” and has also triggered sharp fluctuations in software stocks due to fears of Agents replacing SaaS. (Sources: Sam Altman, Silicon Star GenAI)

AI Safety Expert Exodus: Is the World in a “Poly-Crisis”?: Anthropic’s senior safety lead Mrinank Sharma resigned to pursue a degree in poetry. His resignation letter warned that AI is becoming a “non-human intelligence” and that values are difficult to maintain under realistic pressures. Geoffrey Hinton also stated that humanity is facing an “alien intelligence,” and the first lesson is to learn to coexist rather than control. This has sparked deep community discussion on whether AI development has moved beyond human understanding. (Sources: Mrinank Sharma, CSDN)

AI Healthcare Giants Battle: Ant Afu and Hydrogen Ion Compete for “Health Entry Point”: Ant Group’s Afu has exceeded 30 million monthly active users through full-domain advertising penetration, while AliHealth, Baidu, and ByteDance are also making intensive layouts. Community discussions focus on whether AI can alleviate medical anxiety and how to solve the “all talk, no profit” dilemma. Currently, AI healthcare is shifting from simple “consultation” to full-scenario health management, but professional verification and medical compliance remain core red lines. (Sources: Ant Afu, Tech Planet)

Is Learning English Still Useful in the AI Era?: A heated debate has erupted over the view that “translation glasses will end foreign language learning.” Opponents argue that AI translation carries risks of “alignment censorship” and “hallucinations,” and those who don’t know foreign languages will lose the ability to verify and access the highest-density information. On a deeper level, language is a way of seeing the world; AI can handle output, but it should not replace the human development process. (Source: dotey)

💡 Others

First Humanoid Robot Fighting League URKL Launches: EngineAI launched the world’s first commercial humanoid robot fighting competition with a top prize of 10 million. The event aims to refine robots’ instantaneous power, balance algorithms, and structural protection through high-intensity combat. Fighting is seen as a “devil’s training ground” for humanoid robot capabilities, verifying the practical ceiling of embodied intelligence better than walking demonstrations. (Sources: EngineAI, Jiemian News)

CellTransformer: AI Maps a Century of Human Brain Knowledge in Hours: A team from UC San Francisco developed CellTransformer using the Transformer architecture, completing the classification and mapping of 10.4 million cells from 5 mice in just a few hours. The accuracy matched and exceeded a century of manual human accumulation. The technology is expected to expand to the human brain, revealing fine sub-regions of complex neural areas. (Sources: Reza Abbasi-Asl, Liangziwei)

Warner Music China Launches World’s First AI Music Idol: Warner Music China released the debut work of an AI idol, sparking discussions on whether “AI will replace real idols.” Although the video quality is refined, community evaluations are polarized: some marvel at the industrial standard of audio-visual synchronization, while others criticize the lyrics for being illogical and lacking artistic soul, suggesting it is still in the “technical showboating” stage. (Sources: , ChatGPT Reddit)

🔥 Focus

🎯 Trends

🧰 Tools

📚 Learning

💼 Business

🌟 Community

💡 Others

Related Tags

Related Posts

AI Daily – 2026-07-19

AI Daily – 2026-07-18

AI Daily – 2026-07-17