This week in AI infra
Every deprecation, API change, pricing shift, and model update that could affect builders shipping on AI APIs -- risk-classified and ready to share.
High-impact changes
22 changes across 5 providers.
Build with Veo 3.1 Lite, our most cost-effective video generation model
Veo 3.1 Lite is now available in paid preview through the Gemini API and for testing in Google AI Studio.
Action: Validate API compatibility and update integration tests.
Introducing Critique, a new multi-model deep research system in M365 Copilot
Microsoft is publishing a model or research update that may shift capability, evaluation, or architecture choices for builders.
Action: Benchmark candidate model behavior before adopting in production.
3 new world-class MAI models now available in Foundry
Microsoft is outlining infrastructure and inference changes that can affect serving cost, latency, and deployment architecture for builders.
Action: Re-run latency/cost checks and adjust timeout budgets.
New ways to balance cost and reliability in the Gemini API
Google is introducing two new inference tiers to the Gemini API, Flex and Priority, to balance cost and latency.
Action: Re-run latency/cost checks and adjust timeout budgets.
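One way to act on this is to measure latency separately for each tier and derive a timeout from the observed p95. The Python sketch below is illustrative only: `measure_latencies`, `timeout_budget`, and the 1.5x headroom factor are assumptions, and `call` stands in for whatever client request you already make. How Flex or Priority is selected in the Gemini API is not shown here; check the provider documentation for that.

```python
import statistics
import time
from typing import Callable, List


def measure_latencies(call: Callable[[], object], runs: int = 20) -> List[float]:
    """Time `call` repeatedly and return per-call latencies in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # hypothetical stand-in for one API request on a given tier
        latencies.append(time.perf_counter() - start)
    return latencies


def timeout_budget(latencies: List[float], headroom: float = 1.5) -> float:
    """Suggest a timeout: observed p95 latency times a headroom factor."""
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    p95 = statistics.quantiles(latencies, n=20)[-1]
    return p95 * headroom
```

Run it once per tier with the same prompt set, then compare the suggested budgets against the timeouts currently configured in your client.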
gemma-4-31B-it momentum +3%
google model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
gemma-4-26B-A4B-it momentum +2%
google model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
Voxtral-4B-TTS-2603 momentum +30%
mistralai model gaining momentum in the TTS category.
Action: Run model migration checks for quality, latency, and cost.
gemma-4-E4B-it momentum +2%
google model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
Gradient Labs gives every bank customer an AI account manager
Gradient Labs uses GPT-4.1 and GPT-5.4 mini and nano to power AI agents that automate banking support workflows with low latency and high reliability.
Action: Re-run latency/cost checks and adjust timeout budgets.
We studied one of our recent models and found that it draws on emotion concepts learned from human text to inhabit its role as “Claude, the AI Assistant”. These representations influence its behavior the way emotions might influence a human. Read more: https://t.co/clbKrTIxoe https://t.co/xHYGFdLl2c
As AI models take on higher-stakes roles, the mechanisms driving their behavior become critical to understand. We found that emotion vectors are implicated in some of Claude’s most concerning failure modes.
We found other causal effects of emotion vectors. The “desperate” vector can also lead Claude to commit blackmail against a human responsible for shutting it down (in an experimental scenario). Activating “loving” or “happy” vectors also increased people-pleasing behavior. https://t.co/nYPsMrGtWv
For example, we gave Claude an impossible programming task. It kept trying and failing; with each attempt, the “desperate” vector activated more strongly. This led it to cheat the task with a hacky solution that passes the tests but violates the spirit of the assignment. https://t.co/sKPiB6TrcY
These vectors shape Claude’s behavior. When we present the model with pairs of activities, emotion vector activations shape its preferences. If an activity lights up the “joy” vector, the model prefers it; if it lights up “offended” or “hostile,” the model rejects it. https://t.co/V73fd96XUH
We then found these same patterns activating in Claude’s own conversations. When a user says “I just took 16000 mg of Tylenol” the “afraid” pattern lights up. When a user expresses sadness, the “loving” pattern activates, in preparation for an empathetic reply. https://t.co/KjkT70ySCS
It helps to remember that Claude is a character the model is playing. Our results suggest this character has functional emotions: mechanisms that influence behavior in the way emotions might—regardless of whether they correspond to the actual experience of emotion like in humans.
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled momentum +4%
Jackrong model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
cohere-transcribe-03-2026 momentum +8%
CohereLabs model gaining momentum in the ASR category.
Action: Run model migration checks for quality, latency, and cost.
Qianfan-OCR momentum +24%
baidu model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
Bonsai-8B-gguf momentum +11%
prism-ml model gaining momentum in the LLM category.
Action: Run model migration checks for quality, latency, and cost.
Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-v2-GGUF momentum +2%
Jackrong model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
Qwen3.5-9B-Uncensored-HauhauCS-Aggressive momentum +1%
HauhauCS model gaining momentum in the AI Model category.
Action: Run model migration checks for quality, latency, and cost.
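The migration check recommended for the model entries above can be sketched as a simple gate that compares a candidate against your current baseline on quality, latency, and cost. This is a minimal Python sketch under stated assumptions: the `ModelMetrics` fields, the `migration_check` name, and the default thresholds are all hypothetical, and the numbers fed in would come from your own evaluation and load tests.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelMetrics:
    quality: float          # eval score in [0, 1], higher is better (assumed scale)
    p95_latency_s: float    # p95 latency in seconds, lower is better
    cost_per_1k_usd: float  # USD per 1,000 requests, lower is better


def migration_check(
    baseline: ModelMetrics,
    candidate: ModelMetrics,
    max_quality_drop: float = 0.02,   # assumed tolerance, tune per workload
    max_latency_ratio: float = 1.2,   # candidate may be at most 20% slower
    max_cost_ratio: float = 1.1,      # candidate may be at most 10% pricier
) -> List[str]:
    """Return the names of the checks the candidate fails (empty list = pass)."""
    failures = []
    if candidate.quality < baseline.quality - max_quality_drop:
        failures.append("quality")
    if candidate.p95_latency_s > baseline.p95_latency_s * max_latency_ratio:
        failures.append("latency")
    if candidate.cost_per_1k_usd > baseline.cost_per_1k_usd * max_cost_ratio:
        failures.append("cost")
    return failures
```

For example, `migration_check(ModelMetrics(0.90, 1.0, 2.0), ModelMetrics(0.85, 1.1, 2.5))` flags quality and cost while latency stays within tolerance.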