OpenAI Dev Day 2025: The Day AI Became a Platform

I’ve been building AI-powered apps long enough to know the ground can shift under a shipped feature. Last year felt like setup. This year, OpenAI laid out a platform play: in-chat apps, an agent toolkit, a high-stakes reasoning tier, a production voice stack, and Sora 2 for video.

Below is what matters and where the edges are.

Apps inside ChatGPT: the chat becomes an OS layer

ChatGPT now runs third-party apps directly inside the thread. Early partners include Spotify, Canva, Zillow, Booking.com, Expedia, Coursera, and Figma. You invoke them conversationally and interact via embedded UI components. An Apps SDK is in preview; app submissions and a directory are planned for later this year. Expect commerce hooks to follow. (Sources: The Verge, WIRED)

Why it matters: This treats ChatGPT like a conversational operating system. If the discovery and permissions UX are good, many tasks that now require app-hopping collapse into one interface. Watch for data-sharing prompts and monetization terms to harden over the next quarter. (Source: Business Insider)

AgentKit: opinionated rails for agentic apps

OpenAI announced AgentKit:

  • Agent Builder: visual canvas and versioning for multi-agent workflows.

  • Connector Registry: centralized connections across products; Global Admin required; prebuilt Dropbox, Google Drive, SharePoint, Teams; MCP server support.

  • ChatKit: embeddable chat UIs with streaming, threads, and “model thinking” visualizations.

  • Evaluations: datasets with graders, trace grading, prompt optimization, and third-party measurement support.

Rollout is staggered: some parts are generally available, Agent Builder is in beta, and pricing follows standard API rates. (Source: VentureBeat)

Why it matters: It collapses the current “stack of glue” for agents. The trade-off is lock-in. If you adopt Agent Builder and the Connector Registry, migration costs rise even if the underlying models stay swappable. (Source: VentureBeat)

GPT-5 Pro: accuracy and depth, at a cost

OpenAI positioned GPT-5 Pro for high-stakes domains where deeper reasoning beats speed. Public material and testing notes point to very large context limits (~272k input tokens, ~128k output tokens) and heavier “thinking” by default through the Responses API. Multiple developer reports note higher latency and higher unit costs than GPT-4.1/4o. Use it when correctness matters more than responsiveness. (Sources: Wolfia, Simon Willison’s Weblog, OpenAI)

Practical take: Gate it behind routing. Default to GPT-5 or cheaper models for routine turns; escalate to 5 Pro only for steps that truly need it. (Source: OpenAI)

gpt-realtime: production voice, simpler pipeline

OpenAI’s realtime stack is now positioned as production-ready. The gpt-realtime model runs speech-to-speech with end-to-end audio processing at lower cost than earlier previews. Voice quality and instruction following improved, with faster turn-taking and less orchestration than STT→LLM→TTS chains. Azure parity lags; OpenAI’s own platform leads on availability. (Sources: OpenAI, WinBuzzer)

Why it matters: This makes natural voice agents feasible for support, IT help desks, and field workflows. Cost and latency profiles finally align with real call volumes. (Source: OpenAI)

Sora 2: higher fidelity video and a consumer app

Sora 2 ships with better physical consistency, synchronized audio, and more control over camera and style. It is available via API and a new consumer Sora app. Early brand pilots, such as Mattel using it for concepting, suggest real product-design workflows beyond social clips. (Sources: OpenAI, SiliconANGLE)

Why it matters: Expect creative and marketing teams to slot Sora 2 into storyboarding, pre-viz, and ad mockups. Governance and rights management remain open issues for enterprises. (Source: OpenAI)

What this changes for builders

1) Treat ChatGPT like a distribution surface. If your product benefits from zero-install, build an in-chat app and test its conversion against your web flow. Track discoverability in the new directory once it opens, and measure attach rates. (Source: The Verge)

2) Use AgentKit surgically. Start with Evaluations and ChatKit where they reduce toil. Keep business logic modular so you can re-host agents if needed. Document connector mappings to avoid sticky dependencies later. (Source: VentureBeat)
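
One way to keep business logic modular is to hide the agent runtime behind a small interface and record connector mappings as plain data. The sketch below is only an illustration: `AgentBackend`, `EchoBackend`, `CONNECTOR_MAP`, and `summarize_ticket` are hypothetical names of my own, not part of AgentKit or any OpenAI SDK.

```python
from dataclasses import dataclass
from typing import Protocol


class AgentBackend(Protocol):
    """Hypothetical seam: anything that can run a task and return text."""

    def run(self, task: str) -> str: ...


@dataclass
class EchoBackend:
    """Stand-in backend for local tests; a hosted agent would replace it."""

    prefix: str = "echo"

    def run(self, task: str) -> str:
        return f"{self.prefix}: {task}"


# Connector mappings kept as reviewable data (entries are illustrative).
CONNECTOR_MAP = {
    "contracts": {"connector": "sharepoint", "access": "read"},
    "design_assets": {"connector": "dropbox", "access": "read"},
}


def summarize_ticket(backend: AgentBackend, ticket_text: str) -> str:
    """Business logic depends only on the protocol, not on a vendor runtime."""
    task = "Summarize in one sentence: " + ticket_text.strip()
    return backend.run(task)


if __name__ == "__main__":
    print(summarize_ticket(EchoBackend(), "  Printer on floor 3 is offline.  "))
```

Swapping runtimes then means writing one new class with a `run` method, while `summarize_ticket` and the mapping table stay untouched.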

3) Route by stakes.

  • Routine turns: GPT-5 or smaller.

  • Critical steps: GPT-5 Pro via Responses API with strict timeouts and retries.

  • Voice turns: gpt-realtime for low-latency speech UX. (Source: OpenAI)
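
The routing above can be sketched as a lookup table plus a retry wrapper. Treat this as a minimal sketch under assumptions: the model strings are placeholders based on the names in this post, the timeouts and retry counts are made-up numbers, and `send` is a hypothetical callable you would supply around your own client. Real voice sessions are streamed rather than request/response, so the voice entry only shows where the routing decision lives.

```python
import time
from typing import Callable, Dict

# Model names follow the article; timeouts and retry counts are illustrative.
ROUTES: Dict[str, dict] = {
    "routine": {"model": "gpt-5", "timeout_s": 20.0, "retries": 1},
    "critical": {"model": "gpt-5-pro", "timeout_s": 120.0, "retries": 2},
    "voice": {"model": "gpt-realtime", "timeout_s": 5.0, "retries": 0},
}

# Hypothetical client signature: (model, prompt, timeout_s) -> text.
SendFn = Callable[[str, str, float], str]


def pick_route(stakes: str) -> dict:
    """Return the route for a stakes label; unknown labels get the routine tier."""
    return ROUTES.get(stakes, ROUTES["routine"])


def call_routed(send: SendFn, stakes: str, prompt: str,
                sleep: Callable[[float], None] = time.sleep) -> str:
    """Call the routed model, retrying on TimeoutError with exponential backoff."""
    route = pick_route(stakes)
    attempts = route["retries"] + 1
    for attempt in range(attempts):
        try:
            return send(route["model"], prompt, route["timeout_s"])
        except TimeoutError:
            if attempt == attempts - 1:
                raise  # retry budget exhausted: surface the failure
            sleep(0.5 * 2 ** attempt)  # wait 0.5s, then 1s, then 2s, ...
    raise RuntimeError("unreachable: the loop always returns or raises")
```

Passing `sleep` as a parameter keeps the wrapper easy to test without real delays, and defaulting unknown labels to the routine tier means a mislabeled step never lands on the expensive model by accident.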

4) Budget for inference. High-reasoning calls are slow and pricey relative to 4.1/4o. Set P95 latency budgets, stream outputs, and cache deterministically where possible. (Source: OpenAI Community)
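
To make the budgeting advice concrete, here is a small sketch of two pieces: a deterministic cache key and a P95 check. The function names and the nearest-rank percentile method are my own choices for illustration, and the cache only makes sense for requests whose outputs you are willing to reuse verbatim.

```python
import hashlib
import json
from typing import Sequence


def cache_key(model: str, prompt: str, params: dict) -> str:
    """Build a stable key: identical requests always hash to the same value."""
    payload = json.dumps(
        {"model": model, "prompt": prompt, "params": params},
        sort_keys=True,            # key order must not change the hash
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def p95(latencies_s: Sequence[float]) -> float:
    """Nearest-rank 95th percentile of observed latencies, in seconds."""
    if not latencies_s:
        raise ValueError("need at least one latency sample")
    ordered = sorted(latencies_s)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def over_budget(latencies_s: Sequence[float], budget_s: float) -> bool:
    """True when the observed P95 exceeds the latency budget."""
    return p95(latencies_s) > budget_s
```

Sorting the JSON keys is what makes the key deterministic: two callers that build the same parameters in a different order still hit the same cache entry.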

5) Plan governance for generated media. Lock templates, watermark outputs where policy requires, and pre-approve Sora styles for brand safety. (Source: OpenAI)

My takeaways

  • Platform > feature. Apps in-chat and AgentKit shift OpenAI from “model vendor” to “runtime + distribution.” That’s leverage if you play inside the walls and a risk if you need portability. (Source: WIRED)

  • Voice is ready. gpt-realtime and its lower pricing put live voice agents within reach for many call flows. Pilot with contained intents first. (Source: OpenAI)

  • Video is moving from novelty to workflow. Sora 2 belongs in prototyping pipelines today, with policy guardrails. (Source: OpenAI)
