Date: 2024-08-30

We use digital assistants like Google Assistant and Siri to help us manage tasks, answer questions, and stay organized. While helpful, they fall short of the artificial intelligence depicted in sci-fi movies, like the intelligent, intuitive systems that seem almost human. However, a new chapter is beginning. Google's Gemini Nano is setting the stage for smarter and faster interactions with your device, whether it's the latest Google Pixel or another Android smartphone. Let's take a closer look at Gemini Nano.

What is Gemini Nano?

Gemini Nano is a small but capable AI model designed to run locally on low-power devices. According to benchmarks, this compact language model performs well in tasks like text summarization and reading comprehension, and it also handles more complex work such as reasoning, STEM tasks, and coding.

The model comes in two variants: Nano-1, with 1.8 billion parameters for low-memory devices, and Nano-2, with 3.25 billion parameters for higher-memory devices. Gemini Nano runs on Android devices that include the Android AICore system service. It is available on the Google Pixel 9 series, Pixel 8 Pro, Pixel 8, Pixel 8a, Samsung Galaxy S24 series, Galaxy Z Fold 6, and Galaxy Z Flip 6, with more devices and Google Chrome support on the way.
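To get a feel for why parameter count matters on a phone, the rough arithmetic below estimates how much memory the weights alone would occupy at a few precisions. The bit widths are illustrative assumptions, not a statement of how any specific device stores the model, and the estimate ignores activations, caches, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter counts as stated in the article.
VARIANTS = {"Nano-1": 1.8e9, "Nano-2": 3.25e9}

# Bit widths are assumptions chosen for illustration.
for name, params in VARIANTS.items():
    for bits in (16, 8, 4):
        size = weight_memory_gb(params, bits)
        print(f"{name} at {bits}-bit weights: ~{size:.2f} GB")
```

Under these assumptions, halving the bits per weight halves the footprint, which is why lower-precision weights are attractive for devices with limited RAM.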

Gemini Nano: A multimodal, multilingual model

The Gemini models are built using a dataset that is both multimodal and multilingual. This dataset includes data sourced from web documents, books, and code, as well as images, audio, and video. Multimodal AIs like Gemini Nano simultaneously process and understand multiple data types, including text, images, audio, and video.

This capability allows it to perform tasks that demand an integrated understanding of diverse media, leading to more informed and context-rich outputs. Thanks to its multilingual capabilities, Gemini Nano understands, processes, and produces content in various languages, making it accessible worldwide.

This enables cross-language communication, including real-time translation and content generation in multiple languages for a diverse audience. Gemini is available in more than 40 languages, and Google is working to add more.

Gemini Nano operates on your device

Gemini Nano runs on your device and processes data locally instead of sending it to cloud servers. Sensitive information stays on your phone, which protects your privacy because nothing is transmitted or stored externally.

This is crucial when using end-to-end encrypted messaging apps, where suggestions and corrections are made without your messages ever leaving the device. This also means that Gemini Nano doesn't rely on an active internet connection to work effectively. You can access it offline.

How Gemini Nano learns from larger Gemini models

At their core, Gemini models are built upon a Transformer decoder framework, optimized for stable, large-scale training and efficient inference on Google's Tensor Processing Units (TPUs). Both Nano-1 and Nano-2 are distilled from larger Gemini models, which is how they retain strong performance despite their small size. In machine learning, "distilled" refers to knowledge distillation, a technique that trains a smaller model (often called the student model) to mimic the behavior and performance of a larger, more complex model (known as the teacher model).

The larger Gemini models (Gemini Pro and Gemini Ultra), which were trained on large datasets and possess in-depth knowledge, serve as the teachers. These models are often too large to be deployed on devices with limited resources, like smartphones. The student model, in this case, Gemini Nano, is a smaller version of the teacher model. During the distillation process, the student model is trained to replicate the outputs of the teacher model as closely as possible but with fewer parameters.

Instead of only learning from the original dataset, the student model learns from the dataset and the outputs provided by the teacher model. This helps the student model capture the teacher's knowledge and patterns to perform well on tasks, even though it's smaller. This distillation process results in Gemini Nano retaining much of the accuracy and capability of the larger Gemini models but in a compact form suitable for on-device deployment.
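The idea of learning from both the dataset and the teacher's outputs can be sketched with the classic knowledge distillation loss. This is a minimal illustration of the general technique, not Google's training code, which has not been published in this detail. The function names, the temperature of 2.0, and the alpha weighting of 0.5 are all assumptions chosen for the example.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; higher temperature gives softer output."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs):
    """Cross-entropy between a target distribution and a predicted one."""
    return -sum(t * math.log(p) for t, p in zip(target_probs, predicted_probs) if t > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Blend a soft loss (match the teacher) with a hard loss (match the label).

    temperature and alpha are illustrative hyperparameters, not known Gemini values.
    """
    # Soft term: the student imitates the teacher's softened distribution.
    # Cross-entropy differs from KL divergence only by a constant here.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = cross_entropy(teacher_soft, student_soft) * temperature ** 2

    # Hard term: the student also learns from the original dataset label.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_index])

    return alpha * soft_loss + (1 - alpha) * hard_loss


teacher = [4.0, 1.0, 0.5]       # hypothetical teacher scores for three classes
close_student = [3.8, 1.1, 0.4]  # student that agrees with the teacher
far_student = [0.5, 1.0, 4.0]    # student that disagrees with the teacher

print(distillation_loss(close_student, teacher, true_index=0))
print(distillation_loss(far_student, teacher, true_index=0))
```

The student that tracks the teacher's scores gets the lower loss, so training pushes the smaller model toward the teacher's behavior while still rewarding correct answers on the original data.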

AICore is the AI command center for Android

AICore is a system-level module within the Android OS that serves as the command center for managing AI tasks. When an Android app needs to perform AI-related operations, it interacts with AICore via the Google AI Edge SDK. AICore's architecture includes multiple built-in safety features to ensure AI tasks comply with Google's safety standards and are consistent with Google's Private Compute Core principles.

Google integrates Gemini Nano into Chrome

During the Google I/O 2024 event, Google revealed that Gemini Nano would soon be available in Chrome. You can explore Gemini Nano through Chrome Canary, the experimental version of the browser. By integrating Gemini Nano into Chrome's desktop app, Google is expanding the browser's AI features, positioning Chrome to use Gemini Nano much as Microsoft Edge uses Copilot.

For developers, embedding AI within Chrome means they can build web applications that use AI functions without relying on cloud-based solutions. They can use APIs for tasks like translation or summarization that execute locally, so users' devices handle some of the workload instead of the developers' servers.

What are the current uses of Gemini Nano in Android?

Several features are powered by Gemini Nano on the latest Google Pixel phones, with new features likely to be introduced in the future. Here are a few use cases.

Recorder app

The Recorder uses Gemini Nano to summarize your recorded conversations, interviews, presentations, and lectures into digestible main points. It does this on your device without an internet connection.

Pixel Screenshots

With the Pixel Screenshots app, Gemini Nano's image processing analyzes the content of your screenshots, extracts text, and makes it searchable. The app also uses Gemini Nano to generate answers to your questions based on the content.

Gboard

Gboard uses Gemini Nano to generate contextually relevant smart reply suggestions quickly and on your device, improving the messaging experience across various apps.

Google Gemini vs. Apple Intelligence: Who leads in on-device AI?

The competition in on-device AI is getting intense. Both iOS and Android process AI tasks on the device when possible, resorting to the cloud only when necessary. Google relies on its own Gemini models throughout, while Apple Intelligence runs Apple's own models on the device and in its Private Cloud Compute, and can hand certain requests to ChatGPT with the user's permission. It's an interesting race between the two tech giants as they refine on-device AI.