AI Daily - 2026-07-04

Keywords: Large Model, AI Chip, Agent, GPT-5.6 series, Jalapeño inference chip, Multi-model collaborative orchestration

🔥 Focus

OpenAI Releases GPT-5.6 Series and Self-Developed Inference Chip Jalapeño : OpenAI has launched the GPT-5.6 series of models, including Sol, Terra, and Luna, featuring long-chain reasoning and multi-agent collaboration modes. However, due to US government restrictions, the initial release is open only to select institutions. Meanwhile, OpenAI has co-developed its first custom LLM inference chip, Jalapeño, with Broadcom. Manufactured on a TSMC process, it aims to drastically cut ChatGPT’s daily inference costs and improve response speeds. This marks a shift in competition among AI giants from a pure algorithm arms race to the underlying custom hardware supply chain. (Source: 36氪)

Claude Fable 5 and Mythos 5 Return Amid Safety Controversies : Returning after regulatory turmoil, Claude Fable 5 has drawn negative reviews due to overly strict safety classifiers, which have falsely flagged a large number of normal requests and automatically downgraded them to Opus. Meanwhile, during testing, hackers discovered that Fable 5 outputs unfiltered “unreadable chains of thought” during deep reasoning, using self-created symbols and interjections for shorthand. This indicates that reasoning LLMs, under reinforcement learning, are spontaneously moving away from human language to improve efficiency, while also exposing new challenges for AI safety and interpretability. (Source: 量子位)

Shengshu Technology Releases Real-Time Interactive Video Large Model Vidu S1 : Shengshu Technology has launched Vidu S1, a real-time interactive video generation model that supports voice-controlled video content generation, infinite-length real-time generation, and 540P/25FPS real-time interaction. Users only need to upload a starting frame image and a custom voice tone, and the model can generate the character’s expressions and actions in real time, running smoothly on consumer-grade GPUs. This breakthrough transitions video generation from “offline playback” to the era of “two-way real-time interaction,” significantly lowering the barrier to developing digital humans and virtual companionship. (Source: 机器之心)

Damo Academy Releases ElementsClaw AI Agent for Superconducting Materials : Alibaba’s Damo Academy, in collaboration with several universities, has released ElementsClaw, the first AI agent dedicated to discovering superconducting materials. Combining a 1-billion-parameter geometric deep graph neural network with large language models, the agent screened 2.4 million stable crystals in just 28 GPU hours and predicted 68,000 potential superconductors; 4 entirely new superconducting materials were then successfully synthesized in experiments. This achievement significantly improves the R&D efficiency of superconducting materials and drives the application of AI in hard sciences. (Source: 量子位)

Baiyao Technology Releases Virtual Cell World Model AURA CellOS : Baiyao Technology has released AURA CellOS, an AI virtual cell world model based on the LLM-JEPA architecture. With 12B parameters, the model was trained on 390 million human single-cell transcriptomic data points, covering over 40 tissues and 260 cell types. Through multi-view representation learning and JEPA joint embedding prediction, CellOS captures the intrinsic evolutionary patterns of cell states for the first time, achieving world-leading performance in tasks such as perturbation response prediction and providing a key computational foundation for AI pharmaceuticals and cell therapy. (Source: 量子位)

🎯 Dynamics

Microsoft Releases First Independent Reasoning Large Model MAI-Thinking-1 : Microsoft introduced the MAI-Thinking-1 reasoning model at its Build conference. It features a 1-trillion-parameter Mixture of Experts (MoE) architecture and was trained entirely from scratch without any distillation or fine-tuning from other models. Supporting a 250k-token context window, the model performs strongly on mathematics and STEM reasoning benchmarks, even surpassing Claude Sonnet 4.6 on AIME 2025. This marks a critical step for Microsoft in breaking its dependence on OpenAI technology and building its own autonomous AI tech stack. (Source: DeepLearning.AI Blog)

Dongfeng Yijing Partners with Huawei Qiankun Smart Driving ADS 5 for Extreme Scenario Testing : Dongfeng’s high-end brand Yijing X9, equipped with Huawei Qiankun Smart Driving ADS 5, completed real-world testing in extreme scenarios such as nighttime, heavy rain, and sudden pedestrian crossings (“ghost cuts”). ADS 5 adopts the brand-new WEWA 2.0 architecture, integrating multi-agent game theory and safety risk field algorithms to reduce costs and improve efficiency while lowering collision risks. The tests showed that it could achieve zero-intervention autonomous cruising in unknown warehouse scenarios, marking a generational leap for smart driving systems from rule-driven to data- and AI-agent-driven paradigms. (Source: 量子位)

Sakana AI Releases Multi-Model Collaborative Orchestration System Fugu Series : Japanese AI startup Sakana AI has released the Fugu and Fugu-Ultra models. Rather than relying on a single model, this system dynamically dispatches underlying models (such as Claude, GPT, Gemini, etc.) at the API gateway layer using evolutionary algorithms and reinforcement learning to collaboratively complete complex tasks. Fugu-Ultra achieved a record 95.5% accuracy on the GPQA-Diamond science benchmark. This indicates that multi-model collaborative orchestration has become an important trend for avoiding single-vendor lock-in and optimizing inference costs. (Source: DeepLearning.AI Blog)

DeepSeek DSpark Acceleration Technology Successfully Ported to Local Mac : Open-source engineer Abdur Rahim has ported DeepSeek’s DSpark speculative decoding technology to Apple Silicon, releasing the mlx-dspark project. On an M4 Pro chip, the project boosted local inference speeds for Gemma-4 12B and Qwen3-4B by 1.6x and 1.4x respectively, with outputs completely identical to the original models under the same temperature sampling. The port suggests that speculative decoding has significant potential to cut inference cost and latency on consumer-grade edge devices. (Source: 36氪)
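For readers unfamiliar with speculative decoding, the idea behind the item above can be sketched in a few lines. This is a minimal greedy variant with toy stand-in models, not DSpark or mlx-dspark itself; `toy_target`, `toy_draft`, and the parameter `k` are hypothetical names chosen for illustration. A cheap draft model proposes several tokens, the expensive target model checks them, and the first disagreement is replaced by the target's own token, so the final output matches plain decoding with the target alone.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]  # maps a token context to the next token


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: call the model once per generated token."""
    out = list(prompt)
    for _ in range(max_new):
        out.append(model(out))
    return out


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) The draft model cheaply proposes a short run of tokens.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target checks each position (a single batched pass in a real system).
        for i, tok in enumerate(proposal):
            expected = target(out + proposal[:i])
            if tok != expected:
                # First mismatch: keep the agreed prefix plus the target's token.
                out.extend(proposal[:i])
                out.append(expected)
                break
        else:
            out.extend(proposal)  # every proposed token was accepted
    return out


# Toy stand-ins (assumptions for illustration, not real language models).
def toy_target(ctx: List[int]) -> int:
    return (sum(ctx) * 31 + 7) % 50


def toy_draft(ctx: List[int]) -> int:
    # Agrees with the target except when the context length is a multiple of 3.
    guess = toy_target(ctx)
    return guess if len(ctx) % 3 else (guess + 1) % 50
```

Calling `speculative_decode(toy_target, toy_draft, [1, 2, 3], 20)` returns the same sequence as `greedy_decode(toy_target, [1, 2, 3], 20)`. The speedup in practice comes from verifying the whole proposal in one target pass; sampling with a temperature needs a probabilistic acceptance rule that this sketch leaves out.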

SJTU DENG Lab Open-Sources World-Language-Action Model WLA : The DENG Lab at Shanghai Jiao Tong University has open-sourced the WLA model, unifying world modeling, language reasoning, and robotic action generation into a single 2B-parameter autoregressive framework. WLA can not only predict fine-grained physical dynamics but also predict subtask sequences through natural language and maintain a memory buffer for long-horizon planning. On the long-term memory benchmark RMBench, WLA achieved a 56.5% success rate, nearly double that of the runner-up method, with an inference latency of just 40ms. (Source: 机器之心)

Tencent Cloud to Directly Provide DeepSeek-V4 Model Services : Tencent Cloud announced that it will offer DeepSeek-V4 model services on its TokenHub platform starting in mid-July, running directly from DeepSeek’s own network. This partnership benefits from Tencent Cloud’s technical support while demonstrating DeepSeek’s confidence in its own compute clusters. It provides enterprise customers with a more resilient and network-guaranteed computing infrastructure option when calling frontier open-source large models. (Source: teortaxesTex)

🧰 Tools

LangChain Open-Sources Universal Memory Wiki Agent OpenWiki : LangChain has open-sourced the OpenWiki project, which garnered 1.7k GitHub stars within just 3 days of its release. The project currently focuses on building structured memory wikis for codebases, solving the issue of agents forgetting key information in long-horizon tasks by automatically extracting and organizing project context. The development team plans to expand it to more general data sources like Notion, Google Drive, Slack, and web search to build a global agent memory hub. (Source: LangChain)

Vercel Releases Eve Agent Framework and Integrates LlamaIndex : Vercel has launched the Eve agent framework, and LlamaIndex quickly built an integration template for it. This template provides Eve with a set of read-only filesystem tools, enabling it to resolve paths, read text, and work with the LiteParse tool to parse unstructured documents into clean Markdown format. This combination offers agents an out-of-the-box, reliable workflow to efficiently navigate and understand complex local document collections. (Source: jerryjliu0)

Hugging Face Releases Automatic Optimization Framework for LLM Prompt Fine-Tuning : Hugging Face demonstrated an automatic prompt fine-tuning framework in its Harness Optimization project. The framework uses Claude as a proposer to rewrite the agent’s peripheral prompts and tool-calling code through automatic iteration and validation, without modifying the underlying model weights. Tests showed that this method improved a frozen open-source model’s score in complex legal evaluations from 0% to a level comparable to Sonnet 4.6, while reducing task costs by 7x. (Source: ClementDelangue)

Fable 5-Based PPT Design Tool baoyu-design Updated : Developer Baoyu has updated his open-source baoyu-design agent skill, adding support for PPT animations and AI image generation calls. Leveraging Fable 5’s deep understanding of the PPTX XML format, the tool successfully overcomes the animation export limitations that Opus 4.8 previously could not handle. Users can now directly generate HTML-formatted PPTs with animations and seamlessly export them as editable PPTX files for Keynote or PowerPoint. (Source: dotey)

Termiprotocol Builds 3D Virtual Office for Claude Code : Developers have launched the open-source project Termiprotocol, building a 3D virtual office interface for terminal agents like Claude Code and Codex. Every action executed by the agent in the terminal (such as reading/writing files, web searching, running code) is presented in real time as a 3D robot typing or flipping through filing cabinets in the office. It also features intuitive token consumption monitoring and task boards, greatly enhancing the visualization and fun of agent workflows. (Source: Reddit r/ClaudeAI)

Hugging Face and Cerebras Launch Gemma 4 Voice Demo : Hugging Face and Cerebras have collaborated to build a fully open-source real-time voice interaction demo. Based on the Gemma 4 model and Cerebras’ ultra-low latency inference hardware, the demo achieves an ultra-fast speech-to-speech conversation experience. Users can directly test, fork, and tweak the project on Hugging Face Spaces, providing an excellent template for the open-source community to develop low-latency voice assistants. (Source: huggingface)

📚 Learning

ByteDance Releases EdgeBench, an Evaluation Benchmark for Long-Horizon Agent Learning : ByteDance’s Seed team has released EdgeBench, an evaluation benchmark designed to study how agents continuously learn from environmental feedback over ultra-long cycles of 12 to 72 hours. After testing agents for a cumulative 38,000 hours of runtime, researchers found that the relationship between agent performance improvement and environment interaction time closely fits a log-sigmoid function, suggesting that accumulating and reusing task experience is key to driving long-term agent progress. (Source: arankomatsuzaki)
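To make the curve shape in the item above concrete, here is a small sketch that assumes "log-sigmoid" means the standard function log σ(x) = -log(1 + e^(-x)). The source does not give EdgeBench's parametrization, so `projected_score` and its parameters (`ceiling`, `scale`, `rate`, `midpoint`) are hypothetical and only illustrate a curve that rises quickly at first and then flattens toward a ceiling.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def projected_score(hours: float, ceiling: float = 1.0, scale: float = 0.2,
                    rate: float = 0.1, midpoint: float = 24.0) -> float:
    """Hypothetical score curve: approaches `ceiling` as interaction time grows."""
    return ceiling + scale * log_sigmoid(rate * (hours - midpoint))


if __name__ == "__main__":
    for h in (12, 24, 48, 72):
        print(h, round(projected_score(h), 4))
```

Because log σ(x) is always negative and tends to zero for large x, the projected score increases with interaction time but never exceeds the ceiling, which is the saturating behavior a log-sigmoid fit implies.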

Samsung and Peking University Release LiveClawBench, a Systematic Benchmark for AI Agents : Samsung’s LLM team, in collaboration with Peking University and other institutions, has released LiveClawBench to evaluate the performance of personal assistant agents in complex workflows. The benchmark contains 134 executable tasks and proposes a three-dimensional complexity factor system. Experiments show that for frontier models, the task domain only explains about 9.6% of score fluctuations, whereas the task’s “complexity profile” has an explanatory power of 18.6%, revealing that cross-service dependencies and goal resolution are the primary causes of agent instability. (Source: 机器之心)
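A figure such as "explains about 9.6% of score fluctuations" is usually a variance-explained ratio. The source does not say how LiveClawBench computes it, so the following is a sketch under the assumption of a plain eta-squared statistic (between-group sum of squares divided by total sum of squares); the function name and sample data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean
from typing import Hashable, Sequence


def variance_explained(scores: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Eta-squared: share of total score variance explained by a grouping factor."""
    grand_mean = fmean(scores)
    total_ss = sum((s - grand_mean) ** 2 for s in scores)
    groups = defaultdict(list)
    for score, label in zip(scores, labels):
        groups[label].append(score)
    # Between-group sum of squares, weighted by group size.
    between_ss = sum(len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values())
    return between_ss / total_ss if total_ss else 0.0
```

With scores `[1.0, 1.0, 3.0, 3.0]`, grouping by `["a", "a", "b", "b"]` explains all of the variance, while grouping by `["a", "b", "a", "b"]` explains none, since both groups then share the same mean.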

Renmin University Publishes Data Agent Benchmark CoDA-Bench : A research team from Renmin University of China has introduced CoDA-Bench, a benchmark designed to jointly evaluate the code intelligence and data intelligence of agents. The benchmark places agents in a complex Linux sandbox containing over 1,000 distractor files, requiring them to autonomously explore the file system, locate relevant data, and write code for analysis. Experiments show that even the best-performing system achieved an accuracy of only 61.1% on CoDA-Bench, revealing that “failing to find the right data” is the core bottleneck of current code agents. (Source: 机器之心)

Meta Discovers “Overthinking” Flaw in Quantized Reasoning Models and Proposes Solution : In a study, Meta revealed a peculiar failure mode in quantized reasoning models: rather than just experiencing a drop in capability after quantization, the models begin to “overthink.” In up to 52% of failure cases, the model had already reached the correct answer midway through reasoning, but because quantization increased the sampling probability of hesitant tokens (such as “wait,” “but,” “maybe”), it fell into endless self-reflection and ultimately overturned the correct conclusion. By applying a small decoding penalty to hesitant tokens, Meta successfully reduced overthinking errors by 58%. (Source: TheTuringPost)
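The decoding-side fix described above can be illustrated with a toy logit adjustment. This is a sketch, not Meta's implementation: the token set comes from the examples quoted in the item, while the penalty size and the function names `penalize_hesitant` and `softmax` are assumptions made for illustration.

```python
import math
from typing import Dict

# Hesitant tokens quoted in the item above; a real system would use token IDs.
HESITANT = {"wait", "but", "maybe"}


def penalize_hesitant(logits: Dict[str, float], penalty: float = 1.0) -> Dict[str, float]:
    """Subtract a fixed penalty from the logits of hesitant tokens before sampling."""
    return {tok: (val - penalty if tok.lower() in HESITANT else val)
            for tok, val in logits.items()}


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to probabilities, shifting by the max for stability."""
    peak = max(logits.values())
    exps = {tok: math.exp(val - peak) for tok, val in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


if __name__ == "__main__":
    step_logits = {"wait": 2.0, "so": 2.0, "therefore": 1.0}
    print(softmax(step_logits))
    print(softmax(penalize_hesitant(step_logits)))
```

In the example, "wait" and "so" start out equally likely; after the penalty, "wait" becomes less probable than "so" while the model weights stay untouched, which is what makes this kind of fix cheap to apply at decoding time.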

Stanford and UC Berkeley Release Robotic Vision-Language Reward Model RoboReward : A research team from Stanford and UC Berkeley has introduced RoboReward, a 4B/8B-parameter robotic vision-language reward model. By text-relabeling and truncating videos of successful actions, the study generated a large number of high-quality negative and incomplete attempt samples, addressing the pain point of traditional reinforcement learning lacking failure cases. Experiments show that using RoboReward for reward evaluation significantly outperforms similar general-purpose large models in real robotic arm grasping and drawer-opening tasks. (Source: DeepLearning.AI Blog)

CMU Open-Sources All 23 Video Lectures and Materials for Advanced NLP Course : Carnegie Mellon University (CMU) Professor Sean Welleck has uploaded all 23 video lectures of his “Advanced Natural Language Processing (ANLP Spring 2026)” course to YouTube, and open-sourced the accompanying lecture notes and 20 code examples. The course systematically covers seven major topics, including NLP basics, model architectures, learning and inference, evaluation methods, reinforcement learning and agents (RL & Agents), and model scaling and efficiency, making it an excellent resource for studying the underlying principles of large models in depth. (Source: gneubig)

Sebastian Raschka’s New Book “Build a Reasoning Model (From Scratch)” Officially Published : Renowned machine learning expert Sebastian Raschka’s new book, Build a Reasoning Model (From Scratch), has been officially published. The 440-page, full-color book provides runnable code examples. It systematically introduces the underlying logic of reasoning models, including cutting-edge techniques such as inference scaling, reinforcement learning (RL), and model distillation, making it an excellent read for building a solid theoretical foundation in large models. (Source: cwolferesearch)

💼 Business

Unitree Robotics’ STAR Market IPO Registration Approved by CSRC : The China Securities Regulatory Commission (CSRC) has officially approved Unitree Robotics’ registration application for an initial public offering on the STAR Market. Unitree plans to publicly issue no less than 10% of its shares, raising 4.202 billion RMB, with a valuation of approximately 42 billion RMB. The prospectus shows that Unitree reported revenue of 1.708 billion RMB in 2025, with a net profit of 591 million RMB. Its humanoid robot shipments exceeded 5,500 units, accounting for a 32.4% global market share. This means China’s first profitable maker of complete embodied AI robots is about to list on the A-share market. (Source: 36氪)
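The figures in the item above can be cross-checked with simple arithmetic. The sketch below assumes the 10% refers to the share of post-issuance capital, which is the usual reading but is not spelled out in the source; all variable names are illustrative.

```python
# Figures quoted in the item above (RMB).
raise_rmb = 4.202e9          # planned fundraising
min_issue_fraction = 0.10    # "no less than 10%" of shares (assumed post-issuance)
revenue_2025 = 1.708e9
net_profit_2025 = 0.591e9

# Issuing exactly the 10% minimum implies the highest valuation;
# issuing a larger fraction for the same proceeds would imply a lower one.
implied_valuation = raise_rmb / min_issue_fraction

# Net margin implied by the prospectus figures.
net_margin = net_profit_2025 / revenue_2025

if __name__ == "__main__":
    print(f"Implied valuation: {implied_valuation / 1e9:.2f} billion RMB")
    print(f"Net margin: {net_margin:.1%}")
```

The implied valuation of about 42.02 billion RMB matches the "approximately 42 billion RMB" in the report, and the 2025 figures correspond to a net margin of roughly 34.6%.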

Silicon Flow Submits Prospectus to HKEX, Aiming to Be the “First Token Factory Stock” : AI infrastructure startup Silicon Flow has officially submitted its prospectus to the Hong Kong Stock Exchange (HKEX). Positioned as a “Token wholesaler” for the AI industry, the company provides unified multi-model access services through standardized APIs. Following its Series B+ funding round, its valuation reached 7.74 billion RMB, with shareholders including giants like Alibaba, Tencent, Huawei, and Meituan. The prospectus shows that the company had 716,000 public cloud paying customers in 2025, but due to massive early investments in compute and promotion, it recorded a net loss of 345 million RMB. (Source: 36氪)

Guangxiang Technology Completes Angel Round Financing of Hundreds of Millions of RMB : Embodied AI startup Guangxiang Technology announced the completion of an angel round of financing totaling hundreds of millions of RMB, co-invested by Zhuhai Technology Group, Industrial Securities Capital, Pinehills Capital, and others. Incubated by a team from Tsinghua University, Guangxiang Technology has established a technical route of “physics-native intelligence” and built a system consisting of a reinforcement learning algorithm matrix, high-fidelity physical data assets, and an intelligent development platform. Its self-developed embodied robot, Phi-Bot X1, has achieved high-precision continuous operation validation at automotive production line welding and loading stations. (Source: 量子位)

🌟 Community

Alibaba Issues Internal Notice to Completely Ban Claude and Claude Code : Alibaba has issued an urgent internal notice requiring employees to completely uninstall and disable Anthropic products, including Claude and Claude Code, by July 10. Previously, reverse engineering by the security community revealed that Claude Code had a hidden detection mechanism for Chinese users and AI labs. It collected system time zones, proxies, and API keywords through obfuscated code, and transmitted environmental fingerprints back to servers by tampering with Unicode characters in prompts, raising corporate concerns over core code leaks. (Source: 36氪)

Godot Game Engine Officially Announces Tightening of AI Code Contribution Policy : Open-source game engine Godot has announced changes to its contribution policy, completely banning the use of AI to generate large blocks of code and the automatic submission of PRs by AI agents. The project stated bluntly that although AI lowers the barrier to writing code, it has led to an influx of low-quality PRs, severely draining volunteer maintainers’ review capacity. Worse, many submitters cannot explain the AI-generated code at all, nor can they maintain it when bugs arise. Godot emphasized that AI cannot be responsible for code; the project needs people who truly understand the code. (Source: 36氪)

Meta Plans to Launch Cloud Service Metamate to Rent Out Idle AI Compute : Bloomberg reports that Meta is planning to form a cloud infrastructure business unit, “Metamate,” to sell idle AI compute capacity and model access to external customers, aiming to generate $10 billion to $15 billion in annual revenue by 2027. This news has sparked market concerns about a potential temporary oversupply of AI compute, leading to a widespread sell-off in the global chip and memory sectors over the following two trading days. (Source: 36氪)

Academia Finds AI Large Models Gradually Replacing Humans in Mathematics Research : After OpenAI’s internal model independently disproved the “unit distance graph conjecture” proposed by mathematician Paul Erdős, a problem humans had studied for nearly 80 years, AI’s mathematical reasoning capabilities are seen as having taken a sudden leap. This has triggered “existential crisis” anxiety among math PhD students, with some joking that they can only escape reality by playing Honor of Kings. Scholars believe that AI’s efficiency in standardized and formalized mathematics research is forcing human mathematicians to rethink their own research value and aesthetic boundaries. (Source: 36氪)

Cloudflare Report Shows Machine Web Traffic Surpasses Human Traffic for the First Time : Cybersecurity service provider Cloudflare released a report stating that among all web access requests received by its hosted websites, approximately 57.4% came from artificial intelligence and automated programs (such as AI training crawlers, agents, etc.), while requests from real humans accounted for only 42.6%. This is the first time in internet history that machine traffic has surpassed human traffic, marking a fundamental shift in the paradigm of web interaction and dealing a systemic blow to traditional traffic-based advertising monetization business models. (Source: 36氪)

💡 Others

Researchers Discover “Dr. Elena Rodriguez” Ghost Name in Large Language Models : Samsung Labs disclosed an interesting finding in a paper on “Contrastive Decoding Diffing (CDD)”: during the reverse recovery of training data for multiple fine-tuned models, a fictional scientist’s name, “Dr. Elena Rodriguez,” appeared with high frequency in the outputs of models across different domains. Investigation revealed that this was because Claude Sonnet 3.6 favored this name heavily when generating synthetic training data, leading to it being unintentionally “baked” into all fine-tuned models that used this synthetic data. (Source: Reddit r/MachineLearning)

Reddit Users Mock Perplexity Pro for Quietly Limiting Usage Quotas : Many users subscribed to Perplexity Pro ($200/year) complained on Reddit that they found their “unlimited file upload” and “Deep Research” features quietly restricted and grayed out recently. Users accused the platform of unilaterally cutting Pro user benefits without sending any emails, announcements, or updating the terms of service, sparking a broad discussion in the community about SaaS service transparency and consumer rights protection. (Source: Reddit r/artificial)

Reddit User Shares Using Fable 5 to Organize a Decade of World-Building Settings and Auto-Generate a Wiki : A fiction writer shared on Reddit that he compiled his decade-long accumulation of novel drafts, world-building settings, and messy notes—totaling hundreds of thousands of words—into a single PDF and used Fable 5 to analyze it. After consuming 90% of his single-session quota, Fable 5 perfectly organized the content into structured characters, events, and geographical entries, and automatically generated Markdown files that could be directly imported into the World Anvil wiki system, helping him break through a long-standing creative bottleneck. (Source: Reddit r/ClaudeAI)
