Revolutionary New Voice Mode for ChatGPT

Jul 29, 2024·By NextMind

Future Prospects and Innovations from OpenAI

According to OpenAI, its advanced voice mode, which is hard to distinguish from a real human voice, will soon be available to ChatGPT's paid subscribers, opening up many new possibilities. OpenAI CEO Sam Altman announced on X that alpha testing of the new voice mode will begin next week. Initially it will be available only to paid subscribers, but ChatGPT's "human" voice will likely reach all users in the future.


To recap, ChatGPT first gained the ability to "listen" and "speak" last fall, and its voice already sounded quite natural, even in Russian. Then, in May this year, voice mode 2.0 was announced, and it is now becoming available to some users.

What is new in this mode? First and foremost, the ultra-fast response time. The current voice mode responds with a delay of 3-5 seconds; the new mode cuts that to just 230-320 milliseconds, roughly 10 to 20 times faster. This makes conversations with the neural network feel far more immersive, and it lets ChatGPT act as a real-time translator. Thanks to the rapid response time, users can also interrupt the assistant, and it will stop speaking instantly.
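For readers who want to check the arithmetic, here is a minimal sketch that turns the latency figures quoted above into a speedup range. The function name `speedup_range` is hypothetical and chosen only for illustration; the only inputs are the 3-5 second and 230-320 millisecond figures from the article.

```python
def speedup_range(old_seconds, new_millis):
    """Return the (smallest, largest) speedup implied by two latency ranges.

    old_seconds: (low, high) latency of the old mode, in seconds
    new_millis:  (low, high) latency of the new mode, in milliseconds
    """
    old_low_ms, old_high_ms = (s * 1000 for s in old_seconds)
    new_low_ms, new_high_ms = new_millis
    # Smallest speedup: fastest old response vs. slowest new response.
    # Largest speedup: slowest old response vs. fastest new response.
    return old_low_ms / new_high_ms, old_high_ms / new_low_ms


# Figures quoted in the article: 3-5 s before, 230-320 ms now.
low, high = speedup_range((3, 5), (230, 320))
print(f"{low:.1f}x to {high:.1f}x faster")  # prints "9.4x to 21.7x faster"
```

So depending on which ends of the two ranges are compared, the improvement works out to somewhere between about 9 and 22 times.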


Another important innovation is that ChatGPT has learned to imitate emotions better and to vary its intonation with the context of the conversation. It can laugh, express surprise, or be serious. The assistant also picks up on your intonation and on background sounds much better.

The most intriguing feature is the ability to create a custom voice for ChatGPT. Instead of choosing from several preset voices, users can now assign the assistant the voice of any famous or fictional character.

This week, OpenAI also announced alpha testing of SearchGPT, its search engine in which AI and web search work as a single unit.

The company may have even more ambitious plans for the end of the year. It intends to train ChatGPT not only to understand images but also to process video in real time, for example from a smartphone camera or a computer screen. In addition, a new, more powerful model with advanced reasoning, logic, and self-learning capabilities may replace GPT-4o. Finally, OpenAI promises to open access to its powerful video generator Sora, which was impressively demonstrated back in February.
