In December 2024, OpenAI surprised the world with a stunning feature: ChatGPT had "eyes" and could see and interpret the world in real time. The demo was simply impressive: through the camera, the app could recognize everything it saw. And I mean everything.
In early 2025, Google announced a major addition to Gemini Live, its advanced voice mode: a mode that competes directly with that ChatGPT feature, and that is already available on the Google Pixel 9 and Samsung Galaxy S25. As long as you pay for the Advanced subscription, that is.
I have been able to try this feature on a Google Pixel 9 Pro. And yes, it is as impressive as you might think.
The interface. Activating the new "Vision" modes of Gemini Live is quite simple. You just have to open the app and tap the advanced voice mode (the icon in the lower right corner).


Once Gemini Live is open, you will see two new shortcuts: one to grant access to the camera and another to share your screen. Because yes, it can also read the content of your screen in real time.
Camera mode. When you activate camera mode, Gemini sees everything the camera captures. It is simply spectacular how capable it is of recognizing absolutely everything, and how quickly it identifies specifics such as plant species or the model of a tech device (without you typing anything in).
We can ask it anything, and it works as a guide, a translator and... a private tutor. That last one seemed spectacular to me: it solves equations, aptitude-test problems and all kinds of questions, explaining them step by step.
Screen mode. This mode is perhaps the most unsettling from a privacy standpoint but, if we are willing, Gemini can read everything you see on your screen. We can then ask it anything related to that content.
In this case it did not seem as useful to me, since Google Lens already gives us the necessary information at a glance when we are looking for something in particular. Still, it is another demonstration of Gemini's new potential.


Do not trust AI, ever. As always with AI, the recommendation is not to trust it blindly. Curiously, in a wide shot of my desk, it was able to recognize my computer perfectly. However, when I pointed the camera directly at it, it told me it did not see any computer.
I helped it along by asking whether it was a Mac mini M1 or an M4... and its answer was that it was an M1 (even though the two are very easy to tell apart by their ports and size). It also misread some numbers when I asked it about an aptitude test, and, ultimately, you have to stay on top of it for it to work well.
Don't trust its questions either. The problem shared by Gemini Live and ChatGPT's advanced voice mode is clear: they ask too many questions. To keep the conversation going, the answers always end with a question, something especially annoying in this vision mode.
It is very difficult to get to the point, since it usually tacks a question onto an otherwise complete answer. It is a minor problem shared by all AIs, but it breaks the conversational flow a bit.
Despite all this, Gemini Live's vision seems extraordinary.
Image | Xataka