On May 13, OpenAI unveiled GPT-4o (“o” for “omni”), the latest iteration of the model that powers ChatGPT. The new flagship model marks a significant step forward, integrating voice, text, and vision capabilities in a single model.
Key Features:
- GPT-4o reasons across voice, text, and vision.
- It makes GPT-4o-level intelligence available to all users, free of charge.
- Paid users still enjoy message limits up to five times higher than those of free users.
- GPT-4o operates twice as fast as GPT-4 Turbo and at half the cost.
Statements:
- Mira Murati, Chief Technology Officer at OpenAI, highlighted the enhanced efficiencies of GPT-4o and expressed enthusiasm about extending AI benefits to all users.
- CEO Sam Altman emphasized the availability of GPT-4o to all ChatGPT users, regardless of subscription status, aligning with OpenAI’s mission to democratize AI access.
Inclusivity and Accessibility:
- By making its flagship model available without a subscription, OpenAI signals a shift towards inclusivity and accessibility in AI technology.
- GPT-4o will be rolled out gradually to support over 50 languages, including various Indian languages.
- Significant token usage reductions have been achieved for these languages through GPT-4o’s new tokenizer, enhancing accessibility across diverse linguistic communities (see the token-count sketch after this list).
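The reduction is straightforward to measure. Below is a minimal sketch using the open-source tiktoken library, assuming (as tiktoken documents) that “o200k_base” is GPT-4o’s encoding and “cl100k_base” is GPT-4 Turbo’s; the Hindi sample sentence is purely illustrative, and exact counts vary by text.

```python
# Compare token counts between GPT-4 Turbo's and GPT-4o's tokenizers.
import tiktoken

gpt4o_enc = tiktoken.get_encoding("o200k_base")       # GPT-4o
gpt4turbo_enc = tiktoken.get_encoding("cl100k_base")  # GPT-4 Turbo

# Illustrative Hindi sample: "Hello, how are you?"
text = "नमस्ते, आप कैसे हैं?"

print("GPT-4 Turbo tokens:", len(gpt4turbo_enc.encode(text)))
print("GPT-4o tokens:", len(gpt4o_enc.encode(text)))
```

Fewer tokens per message translate directly into lower per-request cost and more effective context for the same text.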
Additionally, the company provided a demonstration of the model’s real-time conversational speech capabilities. In a blog post, it stated that the model can generate responses to audio inputs in as little as 232 milliseconds, with an average response time of 320 milliseconds, comparable to human conversational speed.
Furthermore, the company highlighted that GPT-4o maintains GPT-4 Turbo-level performance in text, reasoning, and coding intelligence, while also pushing boundaries in multilingual, audio, and vision capabilities.
OpenAI Unveils Desktop Version of ChatGPT with Refreshed UI
In addition to its other announcements, the company revealed plans to introduce a desktop version of ChatGPT along with a revamped user interface. Both free and paid users will soon have access to a new ChatGPT desktop application designed for macOS, with a Windows version slated for release later in the year.
“We recognize the increasing complexity of these models, but our goal is to make the interaction experience more natural and seamless. We want users to focus solely on collaborating with GPTs without being distracted by the UI,” stated Murati.
This announcement coincides with the eve of Google I/O 2024, the highly anticipated event where Google is expected to unveil its latest AI products and innovations.
Model safety and limitations
GPT-4o incorporates safety measures across various modalities, including techniques like data filtering and post-training behavior refinement. New safety systems have been developed specifically to regulate voice outputs.
Evaluation of GPT-4o has been conducted according to OpenAI’s Preparedness Framework and voluntary commitments. Cybersecurity, CBRN (chemical, biological, radiological, and nuclear threats), persuasion, and model autonomy were assessed, revealing that GPT-4o does not exceed a Medium risk level in any category. These evaluations utilized automated and human assessments throughout the training process, testing both pre- and post-safety-mitigation versions of the model.
External Red Teaming:
GPT-4o underwent rigorous external red teaming involving over 70 experts from various domains such as social psychology, bias and fairness, and misinformation. The goal was to identify and address risks associated with newly introduced modalities, enhancing the safety of interactions with GPT-4o.
Future Modalities and Safety Measures:
While text and image inputs and text outputs are publicly released today, work is underway on the technical infrastructure, post-training usability, and safety required to release the remaining modalities. At launch, audio outputs will be limited to a selection of preset voices and will comply with existing safety policies. An upcoming system card will provide further details on GPT-4o’s modalities.
Model Limitations and Feedback:
Through testing and iteration, several limitations have been identified across all modalities of GPT-4o. OpenAI welcomes feedback to pinpoint tasks where GPT-4 Turbo still outperforms GPT-4o, aiding in the continuous improvement of the model.
Model availability
GPT-4o represents the latest step in deep learning, this time in the direction of practical usability. Two years of effort were dedicated to improving efficiency at every layer of the stack. The result is a GPT-4-level model that can be made far more widely accessible. GPT-4o’s capabilities will be rolled out iteratively, with extended red team access commencing immediately.
The rollout of GPT-4o’s text and image functionalities begins today within ChatGPT. The model is accessible in the free tier, while Plus users receive message limits up to five times higher. An alpha version of Voice Mode powered by GPT-4o will also arrive in ChatGPT Plus in the coming weeks.
Developers now have access to GPT-4o via the API, serving as both a text and vision model. GPT-4o boasts twice the speed, half the cost, and five times the rate limits compared to its predecessor, GPT-4 Turbo. Support for GPT-4o’s new audio and video capabilities will be gradually extended to a select group of trusted partners within the API shortly.
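For developers, adopting GPT-4o is largely a matter of changing the model name. The sketch below uses the OpenAI Python SDK (v1.x); the image URL is a placeholder, and an OPENAI_API_KEY environment variable is assumed.

```python
# Single request to GPT-4o combining text and image input.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is shown in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/photo.jpg"},
                },
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

The same chat-completions endpoint handles plain text prompts; only the content payload changes when an image is attached.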