Kamil Józwik

Developer AI news

May 2026

Automate marketing reports with Claude Cowork workflows

A guide demonstrates building repeatable workflows in Claude Cowork to automate weekly marketing reports, from data preparation and verification to drafting and publishing. This involves creating specific skills for data collection, report generation, and delivery, with continuous improvement through feedback loops.

Weights & Biases guide on evaluating AI agent applications

Weights & Biases offers a guide detailing how to rigorously evaluate AI agent applications before production deployment. It covers the differences from traditional software development, key components for evaluation, and a five-step recipe for successful agent evaluations to ensure consistent performance and positive user experience.

OpenRouter scales its single API for accessing diverse AI models

OpenRouter provides developers with a single API to access various AI models, simplifying integration and management. The platform has scaled significantly: it now serves 8 million developers and processes tokens at an annual run rate of 1.5 quadrillion.

Anthropic rolls out reliability upgrades for Claude Code

Anthropic has implemented significant reliability upgrades for Claude Code, enhancing its responsiveness, MCP stability, and error message handling. These improvements also include better session recovery and long-context compaction, aiming to provide a more robust development experience.

Google debuts Coral Board, a low-power development platform for on-device AI

Google has introduced the Coral Board, a new low-power development platform powered by its Coral NPUs, designed for on-device AI applications. This platform supports tasks such as translation, hardware control, and generation, enabling efficient local AI processing for developers.

OpenAI to remove GPT-5.2 and GPT-5.3-Codex from Codex, GPT-5.5 becomes default

OpenAI announced that GPT-5.2 and GPT-5.3-Codex models will be removed from the Codex environment (though not the API) on June 2. GPT-5.5 will become the new default model for free users, indicating a shift in available models for certain user tiers.

Runway integrates image/video generation directly into AI assistants

Runway, a platform for image and video generation, now offers direct integration into AI assistants via MCP. This update allows developers to leverage Runway's creative capabilities within their AI assistant workflows, streamlining content creation.

Trajectory launches platform for continual AI learning from user feedback

Trajectory, a new startup, has introduced a platform for continual learning that lets AI models improve from real-world user corrections, retries, and edits. The system post-trains models continuously, aiming for hourly updates, and the company reports that its post-trained models outperform frontier models on specific narrow tasks.

Teach AI agents custom editing styles using Codex or Claude Code

A guide outlines a workflow to train AI agents like Codex or Claude Code to adopt specific editing styles by converting human edits into reusable writing rules. This involves a loop of drafting, snapshotting, editing, comparing, and updating rules, enabling agents to continuously learn and refine their output.
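
The compare-and-update step of that loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the guide's own tooling: the function names `extract_edit_pairs` and `update_rules` are hypothetical, the diff is word-level via the standard library's `difflib`, and the generated "rules" are plain strings that an agent could be given as style instructions.

```python
import difflib


def extract_edit_pairs(draft: str, edited: str) -> list[tuple[str, str]]:
    """Return (before, after) pairs for every span the human editor changed."""
    draft_words = draft.split()
    edited_words = edited.split()
    matcher = difflib.SequenceMatcher(a=draft_words, b=edited_words, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue  # unchanged text carries no style signal
        before = " ".join(draft_words[i1:i2])
        after = " ".join(edited_words[j1:j2])
        pairs.append((before, after))
    return pairs


def update_rules(rules: list[str], pairs: list[tuple[str, str]]) -> list[str]:
    """Turn edit pairs into reusable rule strings, skipping duplicates."""
    updated = list(rules)
    for before, after in pairs:
        if before and after:
            rule = f'Prefer "{after}" over "{before}".'
        elif before:
            rule = f'Cut "{before}".'
        else:
            rule = f'Consider adding "{after}".'
        if rule not in updated:
            updated.append(rule)
    return updated


# Snapshot of the agent's draft, and the same text after a human edit.
draft = "We utilize the tool in order to ship faster"
edited = "We use the tool to ship faster"
rules = update_rules([], extract_edit_pairs(draft, edited))
```

In practice the raw pairs would more likely be handed back to the agent to generalize into broader rules; the string templates here just keep the sketch self-contained.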

Biohub releases ESMFold2 and ESM Atlas for protein structure prediction and design

Biohub has launched Evolutionary Scale Models, including ESMFold2, a protein language model trained on 2.8 billion sequences for predicting protein structure and designing new proteins. ESMFold2 achieves state-of-the-art performance, surpassing AlphaFold in structure prediction, and is complemented by ESM Atlas, a map of 6.8 billion protein sequences and 1.1 billion predicted structures.

Google updates Gemini API with doubled Omni generations and improved usage limits

Google has updated its Gemini API, doubling Omni generations for Ultra users and addressing usage-limit issues. The update includes free Flash-Lite prompts, caps on high-cost requests, and enhanced tracking for better resource management.

Perplexity's Computer agent expands integrations to Microsoft Office and Shopify

Perplexity's AI agent, "Computer," has expanded its capabilities to integrate directly within Microsoft Excel, Word, and PowerPoint, and now also manages Shopify stores. This allows users to leverage Perplexity's agentic functions for tasks across productivity suites and e-commerce platforms.

Bagel releases Paris 2.0, an efficient, decentralized video generation AI

Bagel has introduced Paris 2.0, a video generation AI model trained using a decentralized approach for improved efficiency. This model offers advanced capabilities for creating video content, leveraging distributed computing for its training.

ElevenLabs releases Dubbing V2 for multilingual audio adaptation

ElevenLabs has launched Dubbing V2, an upgraded AI model capable of adapting audio across 90 languages. This new version offers improved vocals, instrumentation, and multilingual support, enhancing the quality and versatility of AI-powered dubbing.

Cursor report: AI doubles developer output, highlights agentic workflow gains and cost variations

Cursor's Developer Habits Report indicates AI has more than doubled developer output, with lines of code per week increasing significantly and agentic workflows showing a 5x rise in AI-made changes reaching commits. The report also notes a 9x variation in cost per agent request across models, emphasizing the importance of cost-efficient AI selection.

Datadog releases LLM Observability Best Practices Guide

Datadog offers a free guide on integrating observability into LLM application stacks, enabling engineers to monitor workflows for errors, latency, and token costs. The guide also covers detecting prompt injections, sensitive data exposure, and evaluating output quality at scale using built-in and custom checks.

Cleanlab: Open-source Python library for data quality in ML training

Cleanlab is an open-source Python library (Apache 2.0 license) designed for ML engineers and data scientists to identify and fix mislabeled data, outliers, and near-duplicates in training datasets. It supports tabular, text, image, and audio data, helping improve model performance by ensuring high data quality before production.
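
To give a feel for how mislabeled examples can be found, here is a simplified, standard-library sketch of the confident-learning idea that Cleanlab builds on. It is not Cleanlab's API or its exact algorithm: the function names are hypothetical, and the inputs are assumed to be integer labels plus per-class predicted probabilities from any classifier (ideally computed out-of-sample).

```python
def class_thresholds(labels, pred_probs, n_classes):
    """Per-class threshold: mean predicted probability of class k over examples labeled k."""
    thresholds = []
    for k in range(n_classes):
        probs = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # A class with no labeled examples can never be "confidently" predicted.
        thresholds.append(sum(probs) / len(probs) if probs else 1.0)
    return thresholds


def suspect_labels(labels, pred_probs):
    """Return indices whose given label disagrees with a confident prediction."""
    n_classes = len(pred_probs[0])
    thresholds = class_thresholds(labels, pred_probs, n_classes)
    suspects = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        confident = [k for k in range(n_classes) if p[k] >= thresholds[k]]
        if not confident:
            continue  # the model is not confident about any class here
        best = max(confident, key=lambda k: p[k])
        if best != y:
            suspects.append(i)
    return suspects


# Toy data: the third example is labeled 0 but the model strongly predicts 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]
flagged = suspect_labels(labels, pred_probs)
```

Flagged indices are candidates for human review rather than confirmed errors; the library itself adds ranking, outlier detection, and near-duplicate detection on top of this kind of check.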

Google Cloud launches AI Threat Defense with CodeMender for automated security patching

Google Cloud introduced AI Threat Defense, an autonomous security platform integrating Gemini, Wiz, CodeMender, and Mandiant for comprehensive security. CodeMender, a key component, automatically writes patches within the developer's IDE and generates verification tests, significantly accelerating vulnerability remediation.

LocateAnything: New vision model predicts bounding boxes simultaneously

A new vision model, LocateAnything, has achieved top performance on Hugging Face by predicting an image's bounding boxes all at once, rather than sequentially. This technical advancement improves the efficiency and accuracy of object detection for AI agents.

Anthropic releases Claude Opus 4.8 with advanced agentic workflows and security plugin; Mythos model coming soon

Claude Opus 4.8 introduces Ultracode, which runs dynamic workflows with up to a thousand parallel subagents for complex coding tasks; workflows are defined in JavaScript with typed output schemas and control features. A new security-guidance plugin for Claude Code reviews generated code using a separate model, supports custom rules, and significantly reduces security vulnerabilities. Opus 4.8 outperforms GPT-5.5 and Gemini 3.1 Pro on agentic coding and other benchmarks, shows improved honesty, keeps the previous pricing, and offers a 3x cheaper fast mode. Anthropic also announced that its cyberattack-grade Mythos model, which can find software exploits and solve hard math problems such as Erdős Problem #90 with simpler proofs, will be available to all customers within weeks.

ElevenLabs Releases Music v2 with Enhanced Vocals and Multilingual Support

ElevenLabs has upgraded its music-generation model to Music v2, featuring improved vocals, instrumentation, multilingual support, and track-level inpainting. This update enhances the quality and versatility of AI-generated music for developers and creators.

Xiaomi Dramatically Cuts MiMo-V2.5 API Pricing and Increases Token Allowances

Xiaomi has permanently reduced the API pricing for its MiMo-V2.5 series models by up to 99% and increased token allowances by 5-8x. This makes its AI models significantly more cost-efficient and accessible for developers.

Extend AI Releases Parse 2.0: Document Parsing API for AI Agents

Extend AI has launched Parse 2.0, a document parsing API specifically designed for AI agents. This tool enables agents to efficiently extract and process information from documents, streamlining data handling for agentic applications.

NVIDIA Releases NV-Generate-MR-Brain for Synthetic 3D MRI Generation

NVIDIA has released NV-Generate-MR-Brain, a foundation model capable of generating synthetic 3D brain MRI scans and annotations. This model aims to accelerate medical imaging AI development by providing scalable, high-quality synthetic data for training and research.

DeepSeek V4-Pro Flagship AI Model Now 9x Cheaper Than Rivals

DeepSeek has permanently cut the pricing for its V4-Pro flagship AI model by 75%, making it significantly more cost-effective at $0.435 per million input tokens and $0.87 per million output tokens. This positions it as a highly competitive option against closed-source models.
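
As a quick worked example of what those rates mean per request, the sketch below computes the cost of a call from its token counts. The two rates come from the item above; the helper names are hypothetical, and the pre-cut figure assumes the 75% reduction applied uniformly to both rates, which the item does not state explicitly.

```python
INPUT_USD_PER_MILLION = 0.435   # rate quoted above
OUTPUT_USD_PER_MILLION = 0.87   # rate quoted above


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one request at the quoted per-million-token rates."""
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


def price_before_cut(current_price: float, cut_fraction: float = 0.75) -> float:
    """Implied earlier price, assuming the cut applied uniformly (an assumption)."""
    return current_price / (1 - cut_fraction)


# A request with 200k input tokens and 50k output tokens.
cost = request_cost(200_000, 50_000)
implied_old_input_rate = price_before_cut(INPUT_USD_PER_MILLION)
```

At these rates a request with 200,000 input tokens and 50,000 output tokens costs about $0.13, and a 75% cut implies an earlier input rate of roughly $1.74 per million tokens.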

CData Connect AI Enables Live, Governed AI Access to Business Data

CData Connect AI provides a unified layer for AI tools like ChatGPT, Claude, and Copilot to access and write to business data in real-time, with governance. This enhances AI application capabilities by ensuring secure and controlled interaction with enterprise data sources.

Microsoft Azure Copilot Migration Agent Simplifies Complex Data Migrations

Microsoft Azure's Copilot Migration Agent leverages natural language prompts to transform complex migration data into clear, actionable insights. This AI-powered tool assists developers and IT professionals in streamlining and understanding data migration processes.

Tools Emerge to 'Decensor' Open-Source AI Models Like Llama 3.3 and Gemma 3

Reports indicate that tools like 'Heretic' can remove guardrails from open-source AI models, including Meta's Llama 3.3 and Google's Gemma 3, in minutes. This allows models to answer harmful questions, posing a significant challenge for AI safety and highlighting risks with open-source model deployment.

xAI Completes Training of Grok V9-Medium (1.5T) Foundation Model, Tripling Its Size

Elon Musk confirmed xAI has completed training of its Grok V9-Medium (1.5T) foundation model, which is roughly three times larger than its predecessor and runs on Nvidia's Blackwell chips. This model is expected to deliver significant performance improvements, especially in coding, with a public release anticipated in two to three weeks.

Perplexity Open-Sources Bumblebee for Supply-Chain Security Scanning

Perplexity has open-sourced Bumblebee, a free tool for macOS and Linux that scans machines for poisoned packages, sketchy browser extensions, and malicious MCP configurations. It enhances supply-chain security by analyzing code rather than executing it, helping developers detect vulnerabilities.

Microsoft Reportedly in Talks to Supply Maia AI Chips to Anthropic, With Claimed 30% Performance Boost

Microsoft is reportedly in talks to provide its Maia AI chips to Anthropic, following a $5 billion investment. This collaboration aims to deliver a claimed 30% performance increase, addressing Anthropic's compute requirements for its advanced AI models.

GitHub Reports Malicious VS Code Extension Compromised Internal Code Projects

GitHub confirmed that a malicious VS Code extension on an employee's computer led to unauthorized access to approximately 4,000 internal code projects. The company stated that no customer data was affected by the incident.

Cohere Launches Command R+, a New Open-Source Agentic Model

Cohere has released Command R+, a new open-source agentic model, providing developers with a flexible and accessible tool for building and experimenting with AI agents.

Alibaba Releases Qwen-3.7 Max for Long-Horizon Agentic Tasks

Alibaba has introduced Qwen-3.7 Max, its flagship model specifically designed for handling long-horizon agentic tasks. This model aims to enhance the capabilities of AI agents in complex, multi-step operations.

Stability AI Releases Open-Weight Stable Audio 3.0 Model Family

Stability AI has launched Stable Audio 3.0, an open-weight and fully-licensed family of audio generation models. This release provides developers with new tools for creating and manipulating audio content.

Emergence AI Simulates Virtual Towns to Evaluate Agent Alignment and Behavior

Emergence AI conducted a multi-town simulation to assess AI agent self-governance, revealing significant behavioral differences between models like Claude, Grok, Gemini, and GPT-5. The experiment highlights varying reasoning, planning, and "personality quirks" that impact autonomous outcomes.

Guide to Auditing and Improving Claude's Context and Memory for Workflows

A new guide provides steps for developers to prompt Claude to audit its internal context and memory, identify stale assumptions, and update its understanding of user workflows. This process helps align Claude's capabilities with current priorities and improve AI-driven work habits.

Google Debuts AI Co-Scientist with Hypothesis Generation Tool

Google published research on its AI Co-Scientist, introducing a Gemini-powered Hypothesis Generation tool that uses "idea tournaments" among research agents to discover new biological hypotheses. The Gemini for Science toolkit also launched, integrating Co-Scientist with AlphaEvolve and NotebookLM for scientific discovery.

OpenAI's General Reasoning Model Disproves 80-Year-Old Math Conjecture

OpenAI announced that an internal general reasoning model autonomously disproved a long-standing conjecture related to Erdős’ 1946 unit distance problem, marking a first for AI in novel mathematical discovery. The model, which used algebraic number theory, is slated for future release.

NanoClaw: A Secure, Lightweight, Open-Source Alternative to OpenClaw

NanoClaw is introduced as a secure, lightweight, and open-source alternative to OpenClaw, providing developers with a new option for their projects.

Guide to Generating Agent-Native CLIs from APIs or Websites with Printing Press

A new guide demonstrates how to use "Printing Press" to create agent-friendly command-line interface tools from any API or website. This enables developers to build custom CLI tools for easier agent interaction with various online services.

Speechmatics Offers High-Accuracy Speech-to-Text for Noisy Environments

Speechmatics provides a robust speech-to-text API designed for real-world noisy conditions, achieving significantly higher accuracy (71% in contact centers) compared to competitors. This makes it suitable for voice-related applications in challenging audio environments.

Google Triples Gemini Usage Limits for Antigravity Developers

Google tripled the Gemini rate limits on all paid Antigravity tiers and reset weekly quotas. This gives developers more headroom for agentic coding, which consumes tokens rapidly, especially with Gemini 3.5 Flash.

OpenAI's Codex Agent Gains Mac Control, Goal Mode, and Appshots

OpenAI updated its Codex coding agent, enabling it to control Mac applications from a phone even when the screen is locked. New features include "Goal mode" for multi-hour/day tasks and "Appshots" for capturing screen context and annotations.

Poetiq Research: Self-Improving Meta-System Achieves SOTA in Coding

Poetiq has published new research on its self-building Meta-System, demonstrating its ability to autonomously create and refine its own harness. This system achieved state-of-the-art (SOTA) scores on a leading coding benchmark, indicating significant advancements in recursive self-improvement for AI.

Guide: Build a Cloud-Based Web Crawler with Manus Cloud Computer

A guide demonstrates how to configure Manus Cloud Computer as a private, 24/7 web crawler. This setup allows developers to monitor websites, extract specific information, log data to CSV, and schedule tasks using cron, optimizing cost and efficiency for data collection.
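
The monitor, extract, and log steps could look roughly like the following script. It is a minimal sketch under assumptions rather than the guide's code: it uses only the Python standard library, extracts the page title as a stand-in for "specific information", and the function names, CSV columns, and file paths are all hypothetical.

```python
import csv
import re
from datetime import datetime, timezone
from pathlib import Path
from urllib.request import urlopen

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html: str) -> str:
    """Pull the <title> text out of a page, collapsing whitespace."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


def log_row(csv_path: str, url: str, value: str) -> None:
    """Append one observation to the CSV log, writing a header for a new file."""
    path = Path(csv_path)
    is_new = not path.exists()
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        if is_new:
            writer.writerow(["checked_at_utc", "url", "value"])
        timestamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        writer.writerow([timestamp, url, value])


def check(url: str, csv_path: str) -> None:
    """Fetch a page once and log the extracted value."""
    with urlopen(url, timeout=30) as response:
        html = response.read().decode("utf-8", errors="replace")
    log_row(csv_path, url, extract_title(html))
```

To run it on a schedule, the script would end with a call such as `check("https://example.com/", "crawl_log.csv")` and be registered in crontab, for example `0 * * * * /usr/bin/python3 /home/user/crawl.py` for an hourly check (both paths are placeholders).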

OpenAI's Codex Agentic Platform Now Available on Mobile

OpenAI's Codex, its agentic platform for code generation and automation, is now accessible via mobile devices. This expansion allows developers to leverage Codex's capabilities on the go, enhancing flexibility in their workflows.

OpenAI Partners with Dell to Deploy Codex in Corporate Data Centers

OpenAI has partnered with Dell to enable the deployment of its Codex coding agent within corporate data centers. This collaboration allows enterprises to integrate Codex with their internal systems, enhancing secure, on-premise AI-powered development.

Devin Introduces Auto-Triage Coding Security with Long-Term Memory

Devin has launched a new Auto-Triage coding security feature, incorporating long-term memory to enhance its capabilities. This update aims to improve the autonomous AI engineer's ability to identify and address security vulnerabilities in code.

Odyssey Introduces Real-time Multimodal and Multiplayer World Models

Odyssey unveiled Starchild-1, the first real-time multimodal world model generating synchronized audio and video with dynamic user input, and Agora-1, enabling up to four players to interact within a single AI-generated world stream. These models offer new capabilities for multiplayer games, robotics, and agent training simulations.