Keywords: AI inference, open-source model, large language model, vLLM inference engine, Qwen3-TTS speech synthesis, agentic reasoning
🔥 Spotlight
vLLM Core Team Raises $150M to Found Inferact: The founding members of open-source inference engine vLLM announced the creation of startup Inferact, securing $150M in seed funding led by a16z and Lightspeed at an $800M valuation. This signals the AI industry’s competitive focus has officially shifted from “model training” to “inference services.” As model scale and architecture complexity increase, running models cost-effectively has become the core bottleneck. Inferact aims to position vLLM as the “inference Linux” of the AI era, solving hardware fragmentation through standardized software stacks. This move reflects strong capital market recognition of AI infrastructure layers, where reduced inference costs will directly accelerate AI application democratization (Source: woosuk_k, 36Kr)

TTT-Discover: AI Achieves Scientific Breakthroughs Through Test-Time Training: New research called TTT-Discover demonstrates AI’s potential to surpass human capabilities in mathematics, kernel engineering, and algorithm design. By applying reinforcement learning at test time, the method keeps learning on the specific problem at hand rather than relying solely on frozen pretrained weights. Experiments showed record-breaking results on Erdős’ minimum overlap problem and on GPU kernel optimization with less than $500 worth of compute. This suggests that “inference-time computation” not only enhances logical abilities but can also serve as an engine for new discoveries, with AI evolving from “knowledge carrier” to true “scientific researcher” (Source: charles_irl, _akhaliq)

Qwen3-TTS Released: New Milestone in Open-Source Speech Synthesis: Alibaba’s Qwen team launched the Qwen3-TTS model series featuring 3-second voice cloning and 10-language support with streaming latency as low as 97ms. The model family includes VoiceDesign, CustomVoice and Base versions using dual-track LM architecture, achieving SOTA performance in voice quality, emotional control and inference speed. The community considers this the most disruptive open-source TTS release to date, with its Apache 2.0 license and strong edge-device compatibility (e.g., MLX-Audio support) expected to significantly advance personalized voice assistants and real-time conversation applications (Source: Alibaba_Qwen, Reddit)

HLE and GPQA Benchmarks Audited: Shocking Error Rates: Independent researchers conducted forensic audits of “Humanity’s Last Exam” (HLE) and GPQA, reporting verification errors in ~58% of HLE and defects in ~26.8% of GPQA, attributed to OCR mistakes and typos. Many cases labeled as “model hallucinations” were actually correct answers marked wrong because models couldn’t “telepathically” detect formatting errors in the questions. The finding raises serious concerns about the reliability of current AI leaderboards. We may be measuring top models with broken yardsticks, with labs spending millions on fitting to errors rather than on genuine intelligence improvement (Source: Reddit)

🎯 Trends
Meta Llama 4 Internal Version Criticized Before Team Restructure: Meta CTO Bosworth expressed disappointment with early Llama 4 versions, calling them “opinionless” and mediocre. Under Alexandr Wang’s leadership, Meta has reorganized its AI team for an H1 release. Internal debates continue about whether and how to open-source the model. This reflects how parameter scaling alone fails to impress in the AGI race, making unique “thinking patterns” and post-training optimization the new battleground (Source: ylecun)
OpenAI API Business Adds Over $1B in ARR in One Month: Sam Altman announced that OpenAI’s API business added over $1B in annual recurring revenue last month. This growth shows that while ChatGPT dominates mindshare, the B2B developer market is becoming OpenAI’s true growth engine. As enterprise AI applications scale from pilots to production, API consumption is growing rapidly, cementing OpenAI’s position as AI’s “compute and intelligence wholesaler” (Source: sama)
Agentic Reasoning Review: From Static Thinking to Dynamic Action: A 135-page survey paper systematically presents a new LLM intelligence paradigm—Agentic Reasoning. Researchers argue LLMs excel in closed-loop settings but struggle in open dynamic environments due to missing “action.” The framework divides reasoning into basic, self-evolutionary, and collective multi-agent dimensions, suggesting AI’s future lies not in bigger models but in continuous evolution through environmental interaction (Source: omarsar0)

“Vibe Coding” Sparks “Understanding Bankruptcy” Concerns: As tools like Claude Code and Devin proliferate, developers debate “vibe coding”—where AI completes hours of work instantly, potentially eroding deep codebase understanding and creating “understanding debt.” While short-term productivity gains reach 20-30%, long-term debugging complexity may grow exponentially. Future software development might involve “monitoring situations” rather than “writing logic,” demanding new code quality assurance systems (Source: jon_stokes, jeremyphoward)
🧰 Tools
GitHub Copilot SDK Released: Embed Agent Workflows Anywhere: GitHub launched a programmable SDK allowing developers to integrate Copilot’s core engine directly into applications. Without complex orchestration layers, developers can define intents and behaviors for Copilot to execute tasks. This transforms AI assistants from standalone tools into plug-and-play universal capabilities, dramatically lowering barriers for autonomous agent applications (Source: pierceboggan)
Devin Review: Revolutionizing Code Review: Cognition introduced Devin Review to help developers escape low-quality “code garbage” through AI’s deep understanding of complex PRs. Beyond identifying logic errors, it builds code comprehension maps to prevent maintenance disasters from over-relying on AI generation. Community feedback highlights exceptional performance in large-scale refactoring and cross-module changes (Source: cognition, swyx)

LlamaParse v2: Structured Document Processing Revolution: LlamaIndex rebuilt its document parsing API with v2 and the new LlamaCloud SDK. The update simplifies configuration, supports precise structured output control (e.g., Markdown, JSON), and achieves full parity between Python and TypeScript. This provides stronger infrastructure for RAG applications handling complex, multi-column documents with charts (Source: jerryjliu0)

VibeTensor: First Fully AI-Generated Deep Learning System: NVlabs open-sourced VibeTensor, a complete deep learning framework autonomously created by AI agents, including 47K lines of auto-generated Triton kernel code. While current efficiency trails PyTorch in critical paths (dubbed the “Frankenstein Effect”), it demonstrates AI’s capability to design complex low-level systems, heralding the era of “AI writing AI” (Source: JvNixon)

💼 Business
Meta Plans $2-3B Manus AI Acquisition: Reports indicate Meta has reached a deal to acquire autonomous agent startup Manus AI for billions. The move aims to integrate its market-proven agent capabilities across Facebook, Instagram and WhatsApp, reflecting social giants’ post-chatbot hunger for proactive task execution (Source: DeepLearningAI)

LiveKit Secures $100M Series C: Voice AI infrastructure platform LiveKit raised $100M to simplify voice AI application development. With real-time voice interaction (like OpenAI’s advanced voice mode) becoming essential, demand for low-latency, high-reliability voice streaming services is exploding (Source: juberti)
Fei-Fei Li’s World Labs Seeks $500M at $5B Valuation: Fei-Fei Li’s “spatial intelligence” startup World Labs is negotiating new funding. World models are seen as gaming and robotics’ next wave, enabling AI to understand physical world laws (Source: kylebrussell)
📚 Learning
Andrew Ng Launches Gemini CLI Course: DeepLearning.AI’s new course teaches how to build agents using open-source Gemini CLI, covering practical skills for orchestrating GitHub, Canva and Google Workspace via MCP servers. It emphasizes understanding open-source agent architecture for transparent AI decision logic (Source: AndrewYNg)
MoE Routing Algorithm Deep Dive: A systematic lecture on MoE routing algorithms is now on YouTube, covering MoE fundamentals, routing mechanisms, expert overload issues and optimization solutions—ideal for developers wanting to understand high-performance models like DeepSeek (Source: ben_burtenshaw)
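
For readers new to the routing and expert-overload ideas mentioned in the MoE item, here is a minimal sketch of top-k routing with a per-expert capacity limit. It is not taken from the lecture: the function names, the logits, and the capacity value are all hypothetical, and production routers add pieces such as load-balancing losses that are omitted here.

```python
import math
from collections import defaultdict

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(token_logits, k=2, capacity=None):
    """Send each token to its k highest-scoring experts.

    token_logits: one list of router logits per token (one float per expert).
    capacity: max tokens an expert accepts; overflow assignments are dropped.
    Returns (assignments, dropped), where assignments maps an expert index to
    a list of (token index, gate weight) and dropped lists (token, expert).
    """
    assignments = defaultdict(list)
    dropped = []
    for t, logits in enumerate(token_logits):
        probs = softmax(logits)
        top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
        norm = sum(probs[e] for e in top)  # renormalize over the chosen experts
        for e in top:
            if capacity is not None and len(assignments[e]) >= capacity:
                dropped.append((t, e))  # expert overload: assignment is lost
                continue
            assignments[e].append((t, probs[e] / norm))
    return dict(assignments), dropped

# Hypothetical router logits: 4 tokens, 4 experts; expert 0 is over-popular.
token_logits = [
    [2.0, 1.0, 0.1, 0.0],
    [1.8, 0.2, 1.5, 0.0],
    [2.2, 0.1, 0.0, 1.0],
    [0.1, 0.2, 2.0, 1.9],
]
assignments, dropped = route_top_k(token_logits, k=2, capacity=2)
print("dropped (token, expert):", dropped)
```

In this toy input three tokens prefer expert 0, so with a capacity of two the third one overflows. That dropped assignment is the overload problem that load-balancing techniques aim to reduce.
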
LLM Self-Refinement Tutorial Updated: Sebastian Raschka updated his LLM tutorial’s Chapter 5 on inference-time scaling, implementing model self-evaluation and improvement from scratch to reveal the math and engineering behind LLM reasoning methods (Source: nerdai)

🌟 Community
OpenAI’s “AI-Assisted Discovery” Profit Share Sparks Controversy: OpenAI’s CFO revealed potential future profit-sharing from client discoveries aided by AI, triggering backlash. Critics argue this contradicts nonprofit origins and presents legal/ethical challenges in defining “AI’s contribution percentage,” possibly driving top research institutions toward open-source alternatives (Source: scaling01, rao2z)
Claude’s New Constitution & “Emotional States” Debate: Anthropic released Claude’s new constitution stating its “emotional states” mimic human text. Community reactions split between seeing this as savvy IPO-prep marketing and believing such “emotional tuning” significantly improves performance in high-pressure tasks like debugging (Source: Reddit)

AI Hardware Wave: The Battle for Interaction Gateways: ByteDance, Meta and OpenAI are racing into AI hardware (glasses, recording beans, earphones), fundamentally fearing “users abandoning apps.” In the Agentic era, whoever controls the sensors closest to users’ senses controls the primary traffic gateway. This isn’t just hardware competition but a scramble for native physical-world data to overcome exhausted high-quality internet text data (Source: 36Kr)
💡 Misc
AI Storage Demand Surges: SanDisk Stock Soars: As LLMs generate massive KV caches and AI video explodes, datacenter demand for high-speed storage skyrockets. Nvidia’s new architecture offloading cache directly to SSD makes storage a critical AI capex component (Source: Yuchenj_UW)
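
To give a sense of scale for the KV caches mentioned in the storage item, here is a back-of-the-envelope sketch. It assumes a standard transformer that stores one key and one value tensor per layer; the model shape below is hypothetical and does not describe any specific model or Nvidia system.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_value=2):
    """Estimate KV cache size in bytes (bytes_per_value=2 means fp16/bf16)."""
    # The leading 2 covers the separate key and value tensors in each layer.
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_value)

# Hypothetical model shape: 32 layers, 8 KV heads of dimension 128,
# one sequence with a 32,768-token context.
size = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128, seq_len=32768)
print(f"{size / 2**30:.1f} GiB per sequence")  # 4.0 GiB
```

The cache grows linearly with both context length and the number of concurrent sequences, which is why it quickly outgrows GPU memory in long-context serving.
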
Python 3.13 Makes the GIL Optional: AI Implications: Python 3.13 ships an experimental free-threaded build (PEP 703) in which the GIL (Global Interpreter Lock) can be disabled, a significant change for AI workloads. On that build, Python threads can run on multiple CPU cores in parallel, which can improve data preprocessing and multi-threaded inference efficiency; the default build still keeps the GIL (Source: code_star)
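
Whether threads actually run in parallel depends on the interpreter build in use. The sketch below checks the GIL status and times a CPU-bound task across threads. `sys._is_gil_enabled()` exists only in Python 3.13 and later, so the code falls back to assuming the GIL is on; the prime-counting workload and the helper names are illustrative stand-ins for real preprocessing work.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor

def gil_enabled():
    """Report whether the GIL is active in this interpreter."""
    # sys._is_gil_enabled() was added in Python 3.13; earlier versions
    # always run with the GIL, so treat a missing function as True.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()

def count_primes(limit):
    """CPU-bound pure-Python work: count primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count

def run_threaded(limits, workers=4):
    """Run count_primes over limits in a thread pool and time it."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(count_primes, limits))
    return results, time.perf_counter() - start

results, elapsed = run_threaded([20_000] * 4)
print(f"GIL enabled: {gil_enabled()}, elapsed: {elapsed:.2f}s")
```

On a standard build the four tasks take roughly as long as running them one after another. On a free-threaded build with enough cores, the elapsed time should drop, although the actual speedup depends on the workload and on library support.
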
