ChatGPT's Evolution: From Texts to Photos and Voice Conversations

In a world where voice-activated assistants and photo-based search are rapidly becoming the norm, OpenAI's ChatGPT is stepping up its game.

by Faruk Imamovic

With the ability to handle voice commands and respond to photo-based queries, ChatGPT is gearing up to meet the evolving needs of its users.

Voice Interactions: More Human than Ever Before

OpenAI is introducing a voice conversation feature for ChatGPT available on both Android and iOS platforms. This advancement, first rolled out for Plus and Enterprise users, allows individuals to engage in two-way voice conversations with ChatGPT.

How does this work? By navigating to Settings > New Features, users can activate the feature and choose from five distinct voices, created with the help of professional actors. OpenAI says these interactions are powered by a new text-to-speech model that can generate human-like audio from just text and a few seconds of sample speech.

OpenAI's Whisper speech recognition system transcribes the user's spoken words into text, ensuring smooth communication.

Innovations in Photo-based Queries

The chatbot's capabilities aren't limited to voice alone.

OpenAI's new photo-related feature promises to be a game-changer. Users can now prompt ChatGPT with images—whether it's a malfunctioning grill or a mathematical problem—and receive relevant suggestions or solutions.

Imagine pointing your camera at the contents of your fridge and getting dinner recommendations in return! This is made possible by multimodal versions of GPT-3.5 and GPT-4, which handle the image understanding. To use it, one simply taps the photo or "plus" icon and selects or takes a picture.

The platform even allows users to ask questions about multiple photos or use a drawing tool for emphasis.

Addressing Security and Ethical Concerns

However, with innovation comes responsibility. OpenAI is keenly aware of potential misuses—like impersonating voices for malicious intent—and is working proactively to mitigate risks.

The company has limited the new voice technology to ChatGPT's voice conversation feature and is collaborating with trusted partners to address other potential issues. Furthermore, OpenAI's collaboration with the Be My Eyes app reflects its commitment to socially responsible tech.

This application assists visually impaired individuals by connecting them with volunteers through video calls. Yet OpenAI is cautious: it has restricted ChatGPT's ability to make direct statements about people in photos, both because the model is not always accurate and to protect privacy.

Beyond ChatGPT: A Voice Revolution

While OpenAI is making waves with ChatGPT, other tech giants aren't far behind. Microsoft's Copilot AI interface showcases prowess in problem-solving, and Spotify's collaboration with OpenAI offers a Voice Translation tool, turning podcasts multilingual.
