TalkBack uses Gemini Nano to increase image accessibility for users with low vision
Posted by Terence Zhang – Developer Relations Engineer and Lisie Lillianfeld – Product Manager
TalkBack is Android’s screen reader in the Android Accessibility Suite that describes text and images for Android users who are blind or have low vision. The TalkBack team is always working to make Android more accessible. Today, thanks to Gemini Nano with multimodality, TalkBack automatically provides users who are blind or have low vision with richer, more detailed image descriptions to better understand the images on their screen.
Increasing accessibility using Gemini Nano with multimodality
Advancing accessibility is a core part of Google’s mission to build for everyone. That’s why TalkBack includes a feature that describes images when developers haven’t included descriptive alt text. This feature was previously powered by a small ML model called Garcon. However, Garcon produced short, generic responses and couldn’t name relevant details like landmarks or products.
The development of Gemini Nano with multimodality was the perfect opportunity to use the latest AI technology to increase accessibility with TalkBack. Now, when TalkBack users opt in on eligible devices, the screen reader uses Gemini Nano’s new multimodal capabilities to automatically provide users with clear, detailed image descriptions in apps including Google Photos and Chrome, even if the device is offline or has an unstable network connection.
“Gemini Nano helps fill in missing information,” said Lisie Lillianfeld, product manager at Google. “Whether it’s more details about what’s in a photo a friend sent or the style and cut of clothing when shopping online.”
Going beyond basic image descriptions
Here’s an example that illustrates how Gemini Nano improves image descriptions: When Garcon is presented with a panorama of the Sydney, Australia shoreline at night, it might read: “Full moon over the ocean.” Gemini Nano with multimodality can paint a richer picture, with a description like: “A panoramic view of Sydney Opera House and the Sydney Harbour Bridge from the north shore of Sydney, New South Wales, Australia.”
“It’s amazing how Nano can recognize something specific. For instance, the model will recognize not just a tower, but the Eiffel Tower,” said Lisie. “This kind of context takes advantage of the unique strengths of LLMs to deliver a helpful experience for our users.”
Using an on-device model like Gemini Nano was the only feasible way for TalkBack to automatically generate detailed image descriptions, even while the device is offline.
“The average TalkBack user comes across 90 unlabeled images per day, and those images weren’t as accessible before this new feature,” said Lisie. The feature has gained positive user feedback, with early testers writing that the new image descriptions are a “game changer” and that it’s “wonderful” to have detailed image descriptions built into TalkBack.
Balancing inference verbosity and speed
One important tradeoff the Android accessibility team weighed when implementing Gemini Nano with multimodality was between inference verbosity and speed, which is partly determined by image resolution. Gemini Nano with multimodality currently accepts images at either 512-pixel or 768-pixel resolution.
“The 512-pixel resolution emitted its first token almost two seconds faster than 768 pixels, but the output wasn’t as detailed,” said Tyler Freeman, a senior software engineer at Google. “For our users, we decided a longer, richer description was worth the increased latency. We were able to hide the perceived latency a bit by streaming the tokens directly to the text-to-speech system, so users don’t have to wait for the full text to be generated before hearing a response.”
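To make the latency-hiding idea concrete, here is a minimal Kotlin sketch of streaming partial model output straight into Android’s standard TextToSpeech engine so speech can start before generation finishes. The token stream is represented as a plain Flow of strings; TalkBack’s actual on-device generation interface isn’t public, so that part is only an assumption, not the team’s implementation.

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import kotlinx.coroutines.flow.Flow
import kotlinx.coroutines.flow.collect

// Sketch: flush complete sentences to TextToSpeech as tokens arrive, instead of
// waiting for the full description. The Flow<String> of tokens stands in for a
// hypothetical streaming generation API; TextToSpeech is the standard Android API.
class StreamingDescriptionSpeaker(context: Context) {
    // Note: TextToSpeech initializes asynchronously; a real app should wait for
    // onInit to report TextToSpeech.SUCCESS before speaking.
    private val tts = TextToSpeech(context) { /* status handled here */ }

    suspend fun speakAsItGenerates(tokens: Flow<String>) {
        val buffer = StringBuilder()
        tokens.collect { token ->
            buffer.append(token)
            // Queue each finished sentence so the user hears it immediately.
            val end = buffer.lastIndexOfAny(charArrayOf('.', '!', '?'))
            if (end >= 0) {
                val sentence = buffer.substring(0, end + 1)
                buffer.delete(0, end + 1)
                tts.speak(sentence, TextToSpeech.QUEUE_ADD, null, "desc-${sentence.hashCode()}")
            }
        }
        // Speak whatever remains once the stream ends.
        if (buffer.isNotBlank()) {
            tts.speak(buffer.toString(), TextToSpeech.QUEUE_ADD, null, "desc-tail")
        }
    }
}
```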
A hybrid solution using Gemini Nano and Gemini 1.5 Flash
TalkBack developers also implemented a hybrid AI solution using Gemini 1.5 Flash. With this server-based AI model, TalkBack can provide the best of on-device and server-based generative AI features to make the screen reader even more powerful.
When users want more details after hearing an automatically generated image description from Gemini Nano, TalkBack gives them the option to hear more by running the image through Gemini 1.5 Flash: with an image in focus, a three-finger tap opens the TalkBack menu, and selecting the “Describe image” option sends the image to Gemini 1.5 Flash on the server for an even more detailed description.
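TalkBack’s own integration is internal, but apps that want a similar “describe this image” action can reach Gemini 1.5 Flash through the public Google AI client SDK for Android. The sketch below is only an illustration of that SDK, not TalkBack’s implementation; the prompt wording and fallback string are assumptions.

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Illustrative sketch: send an image plus a text prompt to server-based
// Gemini 1.5 Flash and return the generated description.
suspend fun describeImageInDetail(image: Bitmap, apiKey: String): String {
    val model = GenerativeModel(
        modelName = "gemini-1.5-flash",
        apiKey = apiKey
    )
    val response = model.generateContent(
        content {
            image(image)
            text("Describe this image in detail for a screen reader user.")
        }
    )
    return response.text ?: "No description available."
}
```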
By combining the unique advantages of Gemini Nano’s on-device processing with the full power of cloud-based Gemini 1.5 Flash, TalkBack provides blind and low-vision Android users with a helpful and informative experience with images. The “Describe image” feature powered by Gemini 1.5 Flash has launched to TalkBack users on more Android devices, so even more users can get detailed image descriptions.
Compact model, big impact
The Android accessibility team recommends that developers looking to use Gemini Nano with multimodality prototype and test on a powerful, server-side model first. This lets developers understand the UX faster, iterate on prompt engineering, and get a better idea of the highest quality possible using the most capable model available.
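As a rough illustration of that workflow, a prototype might compare candidate prompts against the server-side model before any on-device work begins. This sketch reuses the same public Kotlin SDK as above; the prompt list and test image are placeholders for your own evaluation set, not a recommended rubric.

```kotlin
import android.graphics.Bitmap
import com.google.ai.client.generativeai.GenerativeModel
import com.google.ai.client.generativeai.type.content

// Prototype loop: run several candidate prompts through the server-side model
// and print the outputs side by side to guide prompt engineering.
suspend fun comparePrompts(testImage: Bitmap, apiKey: String) {
    val model = GenerativeModel(modelName = "gemini-1.5-flash", apiKey = apiKey)
    val candidatePrompts = listOf(
        "Describe this image in one sentence.",
        "Describe this image in detail for a screen reader user.",
        "List the most important objects and landmarks in this image."
    )
    for (prompt in candidatePrompts) {
        val response = model.generateContent(content {
            image(testImage)
            text(prompt)
        })
        println("PROMPT: $prompt\nOUTPUT: ${response.text}\n")
    }
}
```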
While Gemini Nano with multimodality can fill in missing context to improve image descriptions, it’s still best practice for developers to provide detailed alt text for every image in their apps and on their websites. When alt text is missing, TalkBack can help fill in the gaps.
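In practice, providing alt text is usually a one-line change. Here is a minimal Jetpack Compose sketch; the drawable name and description text are placeholders.

```kotlin
import androidx.compose.foundation.Image
import androidx.compose.runtime.Composable
import androidx.compose.ui.res.painterResource

// TalkBack reads contentDescription aloud, so on-device models only need to
// step in when it is missing. R.drawable.opera_house is a placeholder resource.
@Composable
fun OperaHousePhoto() {
    Image(
        painter = painterResource(R.drawable.opera_house),
        contentDescription = "Sydney Opera House and the Harbour Bridge at night, seen from the north shore"
    )
}
```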
The Android accessibility team’s goal is to create inclusive and accessible features, and leveraging Gemini Nano with multimodality to automatically provide vivid and detailed image descriptions is a big step towards that goal. Furthermore, their hybrid approach to AI, combining the strengths of Gemini Nano on device and Gemini 1.5 Flash on the server, showcases the transformative potential of AI in promoting inclusivity and accessibility and highlights Google’s ongoing commitment to building for everyone.
Get started
Learn more about Gemini Nano for app development.
This blog post is part of our series: Spotlight Week on Android 15, where we provide resources — blog posts, videos, sample code, and more — all designed to help you prepare your apps and take advantage of the latest features in Android 15. You can read more in the overview of Spotlight Week: Android 15, which will be updated throughout the week.