Voice Archives

OpenAI’s new voice models already talk like customer service agents. Their next destination: call centers

March 21, 2025 by usatoday24

Since the beginning of the year, the goal of the big tech companies has been clear: to get us talking to artificial intelligence (AI). OpenAI, Microsoft, Google and Meta have all added voice features to their assistants. But this seems to be just the beginning. The industry is moving at a frantic pace, and the way we interact with these tools keeps evolving.
Say ‘hello’ to voice agents. Sam Altman’s company has been betting on text-based agents with tools such as Operator and the Computer-Using Agent. However, OpenAI already has its next big move ready to keep standing out in the AI race: pushing a new and powerful generation of voice agents.
New models take the stage. OpenAI has announced the launch of new audio models that convert speech to text and text to speech. They are not in ChatGPT but in the API, where developers can use them to build voice agents. The important part? They aim to be much more accurate and to take customization to the next level. The new models, built on GPT-4o and GPT-4o mini, promise to improve on Whisper and on the company’s previous text-to-speech tools, which will also remain available through the API. But it is not just a matter of performance: the models can now also adjust their tone to sound, for example, “like an empathetic customer service agent.”
Destination: call centers. OpenAI makes it clear what it is aiming for with this launch. It says that “for the first time, developers can tell the model not only what to say but also how to say it, which enables more personalized experiences for use cases ranging from customer service to creative storytelling.” According to OpenAI, this technology will make it possible to create much richer “conversational experiences.” Considering that ChatGPT, powered by GPT-3.5, arrived in November 2022, progress has clearly been dizzying. And everything suggests that these models will end up in call centers.
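To make the “what to say versus how to say it” idea concrete, here is a minimal sketch of how a developer might request speech with a given tone from Python. It assumes the official `openai` Python package in a version recent enough to accept the `instructions` parameter, the `gpt-4o-mini-tts` model name and the `coral` voice; the helper names `build_speech_request` and `synthesize`, and the sample sentences, are hypothetical and only for illustration.

```python
from typing import Any


def build_speech_request(text: str, tone: str, voice: str = "coral") -> dict[str, Any]:
    """Assemble keyword arguments for a text-to-speech call (hypothetical helper).

    `input` is WHAT the model should say; `instructions` describes HOW to say it
    (tone, pacing, persona). Model and voice names are assumptions based on
    OpenAI's March 2025 audio announcement.
    """
    if not text.strip():
        raise ValueError("text must not be empty")
    return {
        "model": "gpt-4o-mini-tts",
        "voice": voice,
        "input": text,
        "instructions": tone,
    }


def synthesize(request: dict[str, Any], path: str = "reply.mp3") -> None:
    """Send the request to the API and save the audio (needs OPENAI_API_KEY)."""
    # Imported lazily so the request builder works even without the SDK installed.
    from openai import OpenAI

    client = OpenAI()
    # Stream the generated audio straight to a file on disk.
    with client.audio.speech.with_streaming_response.create(**request) as response:
        response.stream_to_file(path)


if __name__ == "__main__":
    req = build_speech_request(
        "I'm sorry your order arrived late. Let me check what happened.",
        "Speak like an empathetic customer service agent: calm, warm and unhurried.",
    )
    print(req["instructions"])
    # synthesize(req)  # uncomment to make a real (billed) API call
```

Keeping `input` and `instructions` as separate fields mirrors the distinction OpenAI draws in its announcement: the same sentence can be reused with a different tone string, for example a cheerful one for a storytelling app.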
At first, interactions will probably be somewhat limited, but still well above today’s automated voice systems. They will move away from traditional automated assistants and feel much more natural. Over time, the line between a conversation with a person and one with an AI could become almost imperceptible.
Images | Charanjeet Dhiman | OpenAI
In Xataka | We have tried Sesame’s conversational model. It is the closest experience to a “human voice” that we have seen
In Xataka | China has found an unusual strategy to sidestep US obstacles in AI: betting on open source

Siri is not the only broken toy in the world of voice assistants. Its rivals don’t work well either

March 10, 2025 by usatoday24

Apple is still choking on its artificial intelligence rollout. At the end of last week, the company confirmed that Siri’s advanced features will take “longer than expected” to arrive. It gave no specific date, but suggested we won’t see them until 2026. Apple is late, but it is not the only company struggling with its smart assistant. The main alternative to Siri is Gemini, which most Android manufacturers are beginning to adopt in collaboration with Google, and which is still very, very immature.
Don’t expect the new Siri anytime soon. One of Apple Intelligence’s main selling points was the new Siri: native ChatGPT integration, natural language understanding, analysis of the content on our phone to know us in detail… We recently tested the Apple Intelligence beta, and the conclusion was clear: everything was either half-built or simply not there. Weeks later, Apple confirmed that the smartest version of Siri “will take longer than expected.” Its AI is still in beta, and all the new features were expected to arrive this spring. That won’t happen. Apple held off on the AI bandwagon while its main rivals reached relative maturity, and these delays have brought it to its current situation.
Its rival rubs its hands. Meanwhile, Android’s answer is becoming clear. The operating system belongs to Google, and Google has Gemini as its AI-boosted assistant. As a result, most phones sold in Europe (Oppo, Samsung, Xiaomi, etc.) come with Gemini. It is the agent replacing the classic Google Assistant (“OK, Google”) that we have been using on our phones for so many years, with the main difference that it is Google’s answer to tools such as ChatGPT.
All that glitters is not gold. Gemini has improved a lot since we tried it in February 2024. Gemini Live is now completely free and has no trouble performing simple actions (alarms, searches, etc.), but it is still very far from being a natural assistant.
One of its main problems is precisely that the split between Gemini and Gemini Live undermines its usefulness as an assistant. If, for example, I ask Gemini what I can do today, it gives me an especially long answer. If I want to make it stop talking (Gemini’s tone is also quite robotic and unnatural), I can’t do so comfortably, since the only mode that allows interruptions is Gemini Live. In a standalone app (such as Gemini or ChatGPT), this distinction between conversational modes makes sense. In a fast, native assistant, everything should be available in the most accessible way. And no: if you ask Gemini whether it can talk to you using Gemini Live, it doesn’t switch to that mode; it just starts talking nonstop about what that mode is. Gemini also has no access to native apps (it only works through extensions, and there are very few of them today). It can’t even handle adjustments as simple as raising or lowering the phone’s brightness, and the same goes for the volume. Much less can it change basic system settings when asked.
No other rivals in sight (yet). The only Android manufacturer that bet on a conversational assistant was Samsung with Bixby. Bixby is still alive in One UI 7, but it is so secondary that Samsung itself preinstalls Gemini on its phones, and Gemini’s extensions are key to how Galaxy AI works. In China, major manufacturers are beginning to integrate DeepSeek as their native assistant, but for now there is no advanced voice mode or deep system integration. Honor wants to change all that with its AI agent, one capable of handling all kinds of requests, including the most important ones: native system settings.
Image | Apple
In Xataka | The new Siri forgets the devices where it matters most: the HomePod and Apple Watch

We have tried Sesame’s conversational model. It is the closest experience to a “human voice” that we have seen

March 6, 2025 by usatoday24

Theodore Twombly, the main character of the movie ‘Her’, fell in love with a machine called Samantha. He didn’t even need to see or touch her. It was enough to hear her voice, which was actually that of actress Scarlett Johansson. That was science fiction, but little by little we are approaching a point where falling in love with a machine no longer is. We have been seeing it for some time with Replika, the AI service that lets virtual avatars become our friends or something more. Replika does this with a text-generating AI model, like ChatGPT. Until now we chatted with machines in writing, but little by little we are starting to talk to them. ChatGPT’s voice modes offer exactly that option, and in fact OpenAI had to withdraw one of its voices for sounding too similar to Scarlett Johansson’s.
But now an artificial intelligence startup called Sesame has gone one step further. At the end of February the company published a demo of its Conversational Speech Model (CSM), and its impact has been remarkable. Some users have reported feeling an emotional connection with the model’s male and female voices (“Miles” and “Maya”). One of them, who shared his impressions on Hacker News, explained: “I am even a little worried that I will start feeling emotionally attached to a voice assistant that sounds this human.” Anyone can try talking with Maya or Miles thanks to that demo on the Sesame website. The only obstacle is that conversations must be in English: these models do not speak other languages at the moment. I just tried it for a few minutes, and the way this conversational chatbot works is truly surprising. The voice is warm and close, but above all it perfectly imitates the way a person speaks, with pauses, hesitations and changes in intonation.
Voice generation is instantaneous, with no perceptible latency, and the feeling really is that of having a conversation with another human being. It’s strange, exciting and unsettling at the same time. As the team explains on its blog, “at Sesame, our goal is to achieve ‘voice presence’, the magical quality that makes spoken interactions feel real, understood, and valued.” They are aiming at something similar to Replika: creating “conversational companions” that offer genuine dialogue and build trust over time.
These models are not perfect. Maya, for example, has been shown to do strange things from time to time, but comments in discussion forums such as this Reddit thread make it clear that the quality of these models is spectacular. If you want to check the model’s quality for yourself, pay attention to this clip (source: Reddit). And if you’re still not convinced, take a look at this conversation that Gavin Purcell, one of the hosts of the AI for Humans podcast, posted on Reddit, in which he argues with the machine trying, unsuccessfully, to find its limits. He doesn’t seem to manage it, and in fact it is practically impossible to tell that one of the two speakers is a machine. Its response speed, its shifts in tone, its choice of phrases and words… it is amazing.
Sesame’s conversational chatbot also lets you act out different roles (roleplaying), something that OpenAI, for example, usually restricts. OpenAI has been working on ChatGPT’s voice modes, and Grok 3 has also introduced several synthesized voices that adapt to different personalities. There is even an “unhinged” voice and a “sexy” one, which shows once again that Musk and xAI don’t mind experimenting. As Ars Technica notes, Sesame achieved this advance with two models (a backbone and a decoder) that work together. Both are based on Meta’s Llama architecture, and Sesame has trained them in three different sizes.
The largest combines an 8-billion-parameter backbone with a 300-million-parameter decoder, for a combined 8.3 billion parameters. It was trained on about a million hours of English audio. Comments in a Hacker News discussion made it clear that Sesame’s voices sound almost human, but users still noticed that something was off. Brendan Iribe, one of Sesame’s co-founders, joined the discussion to thank people for their feedback and to acknowledge that there is still a lot of work ahead. The model is “still too eager and often inappropriate in its tone, prosody and pacing,” he explained, and it has problems with interruptions, timing and conversational flow. “Today, we’re firmly in the (uncanny) valley,” he said, “but we’re optimistic we can climb out of it.”
The possibilities for these kinds of models seem almost unlimited, but they cut both ways. Using them to impersonate people, for example, has already caused some serious scares. Here, creating a “family password” can help avoid some of those problems, although Sesame does not currently allow voice cloning. We will see how AI companies respond to these problems, but everything suggests that the future in which we constantly talk to machines (and even fall in love with them) is getting closer.
In Xataka | Be careful about falling in love with your chatbot: OpenAI warns that GPT-4o may reduce the need to socialize with other people

ChatGPT on WhatsApp: now you can send it voice messages and images

February 5, 2025 by usatoday24

At the end of last year, before DeepSeek burst onto the scene and shook up the artificial intelligence sector, OpenAI surprised us with an interesting move: we could save a number in our contacts and talk with ChatGPT directly from WhatsApp. The main advantage was being able to use the chatbot without installing its app. However, there were also some important limitations. When using ChatGPT in WhatsApp, we lost key features such as voice input and image analysis, two tools that are usually very handy.
OpenAI powers up ChatGPT in WhatsApp: voice notes and images
Now OpenAI has taken another step. If you have the number +1 (800) 242-8478 saved in your contacts, you can now send voice notes to ChatGPT as if it were any other contact. This means that, instead of typing, you can simply talk to it and receive a text reply without leaving the messaging app. But there is more. Another big new feature is that ChatGPT in WhatsApp now has vision capabilities. If you send it a photo, it can analyze it and interpret what it sees. Want to know what is in an image? Identify an object? Find out where a photo was taken? All of this is now possible.
Despite these improvements, the chatbot in WhatsApp still has some limitations. For example, we still can’t forward it that endless voice note to get a summary; we’ll have to keep listening to them ourselves. It also can’t search the internet in real time, which limits its usefulness for questions about recent topics. For those who want the most complete experience, the official ChatGPT app remains the best option. Not only does it offer all the features mentioned, but it also provides internet access and Advanced Voice Mode with the live camera feature, which lets it see the world in real time.
Images | OpenAI | Screenshot
In Xataka | AI is bringing a huge change to our phones, one with (at least) 32 GB of RAM
