Google thinks web browsing should be left up to AI with its new “Gemini 2.5 Computer Use” model

Google just introduced Gemini 2.5 Computer Use, an AI model with human-like web browsing capabilities

Meet Gemini 2.5 Computer Use, Google’s new AI model that interacts with the internet just like a human would. Earlier this year, Google promised to bring computer use capabilities to developers through the Gemini API, and this model delivers on that promise. So, let’s take a closer look at what it is and what it can do.

What is Gemini 2.5 Computer Use?

In case you didn’t notice, that name’s a dead giveaway.

This is an AI model that can use your computer like a human sitting at the keyboard, clicking and typing through apps or websites. It is primarily designed to interact with web browsers, is capable of 13 different actions, and is built on Gemini 2.5 Pro’s visual and reasoning capabilities. It does all this without needing structured APIs, simply by clicking, typing, and scrolling the way a person would.

It’s like giving the AI model the tools to navigate the digital world.

How it works

Gemini 2.5 Computer Use works in a loop that runs through the “computer_use” tool in the Gemini API:

First, the client application sends a request to the model, along with any custom user-defined functions. The request contains the user’s task as well as a screenshot of the current UI.

After that, the model analyzes the request and generates a response that describes a UI action, like “click at a certain coordinate” or “type certain text”. The response also indicates whether the action is considered safe. A risky action, like accepting unwanted cookies on a website, requires further confirmation from the user.

Then the client executes the action in its own environment (the browser).

Lastly, the client captures a new screenshot of the GUI and sends it back to the model as the function response. The flow changes if the user denies a safety confirmation, since the action in question is then not executed.

Gemini 2.5 Computer Use Model flow

The model repeats from the analysis step until the given task has been completed or an error occurs. The flowchart above illustrates the loop. Google has also published a demo video in which the Computer Use model organizes sticky notes.
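To make the loop described above more concrete, here is a minimal Python sketch of it. This is not Google's SDK: `Action`, `FakeBrowser`, `scripted_model`, and `run_agent` are all hypothetical names invented for illustration, the "model" is a stub that replays a fixed list of actions, and the action names are placeholders. A real client would call the Gemini API and drive an actual browser instead.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """One UI action proposed by the (stubbed) model. Hypothetical shape."""
    name: str                         # e.g. "click", "type", or "done"
    args: dict = field(default_factory=dict)
    needs_confirmation: bool = False  # True for actions flagged as risky


class FakeBrowser:
    """Stand-in for a real browser; it only records what was executed."""

    def __init__(self):
        self.log = []

    def screenshot(self) -> bytes:
        # A real client would capture the actual page here.
        return f"screen after {len(self.log)} actions".encode()

    def execute(self, action: Action) -> None:
        self.log.append(action.name)


def scripted_model(script):
    """Return a fake model that replays `script`, then reports 'done'."""
    steps = iter(script)

    def propose(goal, screenshot, history):
        return next(steps, Action("done"))

    return propose


def run_agent(goal, browser, propose, confirm, max_steps=10):
    """Screenshot -> model proposes action -> safety check -> execute -> repeat."""
    history = []
    for _ in range(max_steps):
        shot = browser.screenshot()             # capture the current UI
        action = propose(goal, shot, history)   # model analyzes and responds
        if action.name == "done":
            break
        if action.needs_confirmation and not confirm(action):
            history.append((action.name, "denied"))  # skipped, never executed
            continue
        browser.execute(action)                 # act in the client environment
        history.append((action.name, "executed"))
    return history


script = [
    Action("click", {"x": 100, "y": 200}),
    Action("accept_cookies", needs_confirmation=True),
    Action("type", {"text": "hello"}),
]
browser = FakeBrowser()
# The user denies every confirmation request in this run.
print(run_agent("fill in the form", browser, scripted_model(script), lambda a: False))
print(browser.log)
```

The `max_steps` cap is a design choice of this sketch rather than something the article's source describes; it simply keeps a confused agent from looping forever.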

Why is it useful?

If you have used AI agents before, you will know this is a pretty big deal. It should make automating workflows or building your own personal assistant much easier.

The Gemini 2.5 Computer Use model also impressed in benchmarks like Online-Mind2Web (web tasks) and AndroidWorld (mobile), scoring over 65% accuracy with latency below 225 ms, lower than competing models like Claude Sonnet 4.5 at around 280 ms.

Gemini 2.5 Computer Use: high accuracy while maintaining low latency


What are the downsides of Gemini 2.5 Computer Use?

As mentioned earlier, the model needs a screenshot of your screen to work. That screenshot, along with the URL and action history, is sent to Google’s servers. If you use the model to automate tasks involving personal data, the screenshot could capture sensitive information like your address or name.

The main problem arises when malicious sites are built to manipulate the model with fake UI elements. This could lead to it downloading malicious software or, as mentioned above, exposing your personal information. That said, we can expect the model to be trained to avoid such actions.

My final thoughts

All in all, I would say this step towards a “computer use” model by Google is quite innovative and useful. Think about the time and effort it can save on simple yet boring tasks that require you to do the same thing over and over again. But like I said before, it comes with its own risks and downsides too.

Nevertheless, if you want to try this model out, you can do so right now via Google AI Studio or demo it on Browserbase.
