OpenAI Releases Realtime API for Building Voice Agents

IBL News | New York

OpenAI made its Realtime API generally available this week, enabling developers to build voice agents. The API now supports remote MCP servers, image inputs, and phone calling through Session Initiation Protocol (SIP), giving voice agents access to additional tools and context.

The company also released gpt-realtime, its most advanced speech-to-speech model yet.

The new model follows complex instructions, shows stronger reasoning, produces more natural and expressive speech, and is better at interpreting system messages and developer prompts.

“The new speech-to-speech model in OpenAI’s Realtime API could make searching for a home on Zillow or exploring financing options feel as natural as a conversation with a friend, helping simplify decisions like buying, selling, and renting a home.”

AI companies are racing to offer voice agents that speak with the intonation, emotion, and pace of a human.