Gemini’s New Screen Analysis Is Insane for Automation
AI Summary
The video demonstrates a new feature in Google’s Gemini app: users can upload a video, and Gemini processes both the visual content and the audio. This enables automation workflows in which users record their screen, explain the task verbally, and have Gemini turn the recording into an automation script. In the example shown, the user records a short video instructing Gemini to navigate a website, click specific buttons, and extract follower descriptions. The generated script is then run through Nanobrowser, an open-source browser automation tool, which executes the task fully automatically. This approach speeds up automation creation and improves reliability, because the recording shows Gemini exactly what to click while the narration explains the intent. The presenter highlights how easy and powerful Gemini is for building custom automations, and mentions Google AI Studio as a free alternative for similar uses.
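
The video uses the Gemini app directly, but the same record-and-describe workflow can be approximated through the Gemini API. The following is a minimal sketch, not the presenter's method: it assumes the `google-genai` Python SDK is installed and an API key is set in the environment. The model name, the prompt wording, and the helper names `build_prompt` and `video_to_steps` are illustrative assumptions. The returned text is meant to be pasted into Nanobrowser as a task description.

```python
import time


def build_prompt(goal: str) -> str:
    """Return the text instruction sent alongside the screen recording."""
    # The wording below is an assumption; adjust it to the task at hand.
    return (
        "Watch the attached screen recording and listen to the narration.\n"
        f"Goal: {goal}\n"
        "Write the task as numbered steps for a browser automation agent. "
        "For each step, name the page, the element to click or read, "
        "and the data to extract."
    )


def video_to_steps(video_path: str, goal: str, model: str = "gemini-2.5-flash") -> str:
    """Upload a recording to Gemini and return the generated step list."""
    # Imported lazily so build_prompt works without the SDK installed.
    from google import genai  # pip install google-genai

    client = genai.Client()  # reads the API key from the environment
    video = client.files.upload(file=video_path)

    # Uploaded videos are processed asynchronously; poll until ready.
    while video.state.name == "PROCESSING":
        time.sleep(2)
        video = client.files.get(name=video.name)
    if video.state.name != "ACTIVE":
        raise RuntimeError(f"Video processing ended in state {video.state.name}")

    # Send the video and the text instruction together in one request.
    response = client.models.generate_content(
        model=model,
        contents=[video, build_prompt(goal)],
    )
    return response.text
```

A call such as `video_to_steps("recording.mp4", "Collect the description of each follower")` would return the step list as plain text; the file name and goal here are placeholders.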