Google Gemini: Everything you need to know about Google's next-gen multimodal AI

By Tyler Lacoma, Manuel Vonau, and Steven Winkelman

Updated Feb 8, 2024

Google Gemini is here, with a whole new approach to multimodal AI: Here's what you should know.

Source: Google

Gemini is slated to become a central part of Google's identity. It's the new name for Google's experimental ChatGPT competitor Bard and the underlying large language model powering the answers. It's also replacing Duet AI in Workspace as well as Google Assistant on your phone, with its longer and more powerful AI-generated answers providing you with richer information. As you can see, Gemini is a sum of different products, which makes it rather complicated to explain. We're here to help you understand what it is, how it works, and what you can expect from it.

What is Google Gemini?

Gemini is Google's latest evolution of Bard and Assistant

On February 8, 2024, Google announced a big rebranding of Bard, its experimental AI chatbot. The tool is now known as Gemini, but it essentially still provides the same features as Bard before it, save for a minor redesign of the website. In the simplest terms, this version of Gemini is an interface that lets you use Google's large language model. Other popular generative AIs are ChatGPT and DALL-E. Generative AIs can make video, audio, and imagery. As an AI chatbot, Gemini focuses on creating text that answers your questions naturally and conversationally, but it was also recently updated to support image generation.

Gemini is available as a free product, but if you'd like to access more features and get more accurate answers, you can also subscribe to the new Gemini Advanced. It's part of the Google One subscription, which received a new tier following the introduction of Gemini. The new Google One AI Premium plan is available for $26 per month, or the equivalent of $20 per month if you pay yearly. On top of access to a better AI model, it also features 2TB of Google Drive storage and more Google One features.

Gemini is also coming to Google Workspace. Right now, the office suite features a "Duet AI" chatbot, which will be rebranded to "Gemini for Workspace." Those who subscribe to the Google One AI Premium plan will get access to Gemini in Gmail, Drive, Docs, and more, just like businesses. The new name will also roll out to Google Cloud customers.

Along with the Bard rebranding, Google also released a Gemini app for Android, something that was never available for the older version of the chatbot. Once you install Gemini on your phone or opt into it via Google Assistant, it's possible to switch over to it. This unlocks a few new features on your phone. You can use the familiar "Hey Google" voice command to access it and ask it questions. Gemini is aware of what's shown on your screen, so you can ask it to generate text or answers based on what's visible. A lot of Google Assistant features are also available through Gemini, like setting timers, the option to make calls, and smart home controls. Google is working on expanding this legacy functionality in the future.

Gemini Advanced is available in English in over 150 countries, and it will roll out to more regions and languages in the future. The new Gemini Android experience is only available in the US in English. Google is rolling it out in more regions quickly, so check back periodically to see if you have access to it.

Gemini is also Google's most powerful generative AI model yet

Google presentation with blue illustration showing the three versions of Gemini and their complexity.

Source: Google

Let's move on to the model that powers the chat and speech interface discussed earlier, which is confusingly also called Gemini. This Gemini is a suite of generative AI services marketed specifically for businesses interested in expanding their AI services. It’s a family of multimodal AI models (we’ll get more into that below) created by the Google DeepMind project.

Google Gemini is still new. Google added an English-tuned version of Gemini Pro to Google Bard in December 2023. Despite the name, Google describes Gemini Pro as the “lite” version of the AI model, although it looks more like the standard version to us. The family also includes Gemini Ultra, the premium AI that Google wants to be the flagship of the suite. This is the one that powers Gemini Advanced, the paid version of the chatbot.

Gemini Nano rounds out the trio. Nano is the mobile-friendly version of the large language model that launched on the Google Pixel 8 Pro with its December Feature Drop. It allows for on-device processing and will eventually make its way to other Android phones.

It looks like Google is slowly walking away from using Gemini as the name for its underlying language model. When it announced that Bard was rebranding to Gemini, it introduced the paid version as "Gemini Advanced with Ultra 1.0" and called the free version "Gemini with Pro 1.0." This avoids not-so-elegant naming schemes like "Gemini Advanced powered by Gemini Ultra."

Is Google Gemini a chatbot? Can it create content?

Google presentation showing Gemini AI specialties against a dark background.

Source: Google

As discussed above, Gemini can certainly create content, but Gemini is far more ambitious than a chatbot, and that requires some explanation.

Gemini is technically an LLM or large language model, which means it’s a machine learning framework that’s taught by dumping a bunch of human stuff (online content, generally) in it and helping it make rules to understand that content. Do that enough, and LLMs can process language data enough to put together their own sentences and mimic certain styles as we see ChatGPT and Bard doing — like expert puzzle solvers creating mathematical ways to “solve” human speech. The more they learn, the better they can get at it.
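To make the idea of "making rules from content" a little more concrete, here is a deliberately tiny sketch. It is not how Gemini or any real LLM is built: it is a toy word-pair (bigram) model, and the function names and sample text are made up for illustration. It only shows the general principle of learning which words tend to follow which, then using those patterns to put together new text.

```python
import random
from collections import defaultdict

def train_bigrams(text):
    """Record, for each word, every word that followed it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers

def generate(followers, start, length, seed=0):
    """Build a word sequence by repeatedly picking a word that followed the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:  # the last word never had a follower in training
            break
        out.append(rng.choice(options))
    return " ".join(out)

# Illustrative training text; a real model learns from vastly more content.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the", 6))
```

Real LLMs replace the simple lookup table with a neural network trained on enormous amounts of data, but the loop of "predict a plausible next piece, then repeat" is the same basic shape.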

Most LLMs specialize in only a couple of things, like speech or images. That helps keep them focused and reduces the enormous resources they tend to require. Google is particularly skilled at creating efficient AI models that are deeply trained on a more limited array of content, which contrasts with OpenAI’s system of throwing almost everything it can at the AI.

However, Gemini appears to be different from the existing models, because it’s been trained as multimodal from the very beginning. Multimodal just means that the AI can learn and create all kinds of content, not just one “language.” Gemini can handle speech, math, reasoning problems, code, images (including emojis), video, audio, and more. It’s like the polymath or Renaissance Man of the LLM world.

As you can see with our image examples, that seems to make Gemini very good at understanding context and interpreting that information correctly for users, regardless of subject matter.

Gemini recognizing a cat video and commenting on what's happening with a cat pun.

Source: Google

Based on the data we have, Gemini appears to be very good at what it does within scope. It scored a 90% on the Massive Multitask Language Understanding (MMLU) test, which is better than most human language experts and in line with Google’s past performance. Google also says Gemini beats out existing AI models in 30 out of 32 academic tests used to score LLMs. However, other reports say that Gemini Pro can beat GPT-3.5 (which powered much of the ChatGPT content we’ve seen this year) but is beaten by the newer GPT-4, while Gemini Ultra narrowly beats GPT-4. It’s, uhh, a very competitive field right now.

No AI currently on the market is quite as multimodal as Gemini, which means businesses that use this trained AI can adapt it to nearly anything. That holds particular value for companies, which may want to customize AI services to do anything from recognizing counterfeit handbags to imitating a helpful Swedish uncle on a customer service chat. Google also mentions a few other possibilities, like:

Explaining physics problems to students
Processing raw audio to look for certain signals
Analyzing user intent to create customizable kits and packages for a person
Helping scientists spot links in published research that they would have missed
Winning all the competitive programming contests that it’s allowed to enter

Is Google Gemini different from Google Bard?

Google Gemini shown recognizing that a small drawing is a bird in water.

Source: Google

Yes. Gemini is different from Google Bard, but a little context makes this answer far less confusing. Until February 2024, Google Bard was the user interface Google used with its various LLMs. The original Bard that was launched in early 2023 was a much earlier attempt at consumer-facing AI (remember, in the context of these early 2020s AI LLMs, even several months can be a long time).

When it launched in March 2023, Bard used Google's LaMDA (Language Model for Dialogue Applications) model. A few months later, Bard got its first major update with the release of PaLM 2 at Google I/O. In December 2023, Google gave Bard its biggest update yet with the switch to the Gemini Pro model. In February 2024, the Bard brand was discontinued altogether, with the interface itself now also called Gemini.

What's the deal with PaLM 2 now that Gemini has been released?

It’s complicated, and we don’t have a good look behind the scenes. PaLM 2 was a massive update to Google’s language-focused LLM made earlier in 2023. PaLM 2 excels at language tasks like translation, and while Google has made PaLM 2 modules that handle other things like reading medical scans, it’s not as natively multimodal as Gemini. However, it does provide lightweight AI services for businesses that want to build their own AIs by tapping into the work Google has already done, using Google's Vertex AI platform, which Gemini is also on.

Gemini and PaLM 2 don’t appear to be competitors, and Gemini is the model most people will interact with when using AI products and hardware. Google DeepMind, formed from the merging of the two previous projects Brain Team and DeepMind, is in charge of both. Google refers to PaLM 2 and Gemini as two separate AI models with different foci, though they may work together for certain tasks.

Where can I find Google Gemini?

Gemini recognizes drawings of planets and corrects their order.

Source: Google

If you want to use the user-facing version of Google Gemini, just visit the Gemini website or download the Gemini app on your Android phone. On the Apple iPhone, Gemini is available within the regular Google app.

If you're a developer interested in using the underlying AI model for your own projects, stop by DeepMind’s webpage for Gemini, and look for a sign-up option to learn more or a sign-in option for your developer account, so you can get started with the Gemini Pro API kit. Then you can start incorporating Gemini services into your apps and tailoring specific Gemini models to your needs.
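For a rough sense of what calling the model from your own code could look like, here is a minimal sketch using only Python's standard library. Treat the details as assumptions rather than a definitive reference: the endpoint URL, the "gemini-pro" model name, the request and response shapes, and the GEMINI_API_KEY environment variable are based on how the public Gemini API looked around launch and may have changed, so check Google's current developer documentation before relying on them.

```python
import json
import os
import urllib.request

# Assumed endpoint and model name; confirm both against Google's current API docs.
API_URL = "https://generativelanguage.googleapis.com/v1beta/models/gemini-pro:generateContent"

def build_request(prompt, api_key):
    """Return a POST request carrying a single text prompt as JSON."""
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        f"{API_URL}?key={api_key}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_gemini(prompt, api_key):
    """Send the prompt and return the text of the first candidate answer."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        data = json.load(resp)
    # Assumed response shape: candidates -> content -> parts -> text.
    return data["candidates"][0]["content"]["parts"][0]["text"]

if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
    if key:
        print(ask_gemini("Explain multimodal AI in one sentence.", key))
    else:
        print("Set GEMINI_API_KEY to send a real request.")
```

Without an API key, the script only prints a reminder and never touches the network, so it is safe to run as a dry check of the request format.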

Keep in mind, the underlying Gemini model is designed for organizational and developer use only, primarily via the Vertex platform. It’s for companies that want tailored AI solutions, which they would then offer to customers through their own apps and websites. If you, as a consumer, want to experience Gemini, your best bet is the Gemini chatbot (formerly Google Bard) or related Google services.

What does Gemini cost to use?

For consumers, the basic version of Gemini with Pro 1.0 is free to use. To get access to Gemini Advanced with Ultra 1.0, you need to subscribe to the Google One AI Premium plan. It costs $26 per month or $240 per year, with the yearly discount averaging out to $20 per month.
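If you want to check the subscription math yourself, this short sketch works through it using the two prices quoted in this article. The function and variable names are ours, purely for illustration.

```python
MONTHLY_PRICE = 26   # dollars per month, as quoted in this article
YEARLY_PRICE = 240   # dollars per year, as quoted in this article

def yearly_plan_summary(monthly, yearly):
    """Compare paying month by month for a year against paying yearly."""
    paid_monthly_for_a_year = monthly * 12
    return {
        "effective_monthly": yearly / 12,
        "annual_savings": paid_monthly_for_a_year - yearly,
    }

summary = yearly_plan_summary(MONTHLY_PRICE, YEARLY_PRICE)
print(summary)
```

With these figures, the yearly plan works out to $20 per month and saves $72 over twelve months of monthly billing.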

For developers and companies using the underlying Gemini AI model, specific Gemini pricing is difficult to parse right now. We suggest taking a look at Google Vertex and its pricing for all generative AI services, which vary based on the type of content and the specific service a business is interested in.

Is Google Gemini safe?

DeepMind says that Gemini was trained with safety in mind and will be deployed responsibly. Google is very vague about what that entails, but it likely means that Gemini won’t be able to do anything too naughty, invasive or illegal.

Left largely untouched is the question of how Gemini is consuming our content, proprietary work, and conversations, as well as how it could be used to take jobs, make money in unethical ways, or exploit vulnerable groups. Those are questions raised about all LLMs, and currently, we have a whole lot more questions than answers.

One thing to keep in mind when you converse with Google Gemini is that all of your words may be used to further train the AI. Your conversations could also be audited and reviewed by Google workers tasked with improving the product, as prominently disclosed when you first open Gemini. Be mindful of what you share with the AI and don't give out private information you wouldn't be comfortable saying out loud elsewhere on the internet.

Gemini is now on the board: Keep watching Google

Google continues to refine its AI models and introduce them as a way to position itself as the go-to source for professional AI development, something the company is working on in the face of steep competition from sources like OpenAI. Gemini is an ambitious entry that’s trained to do a little bit of everything, making it one of the most capable models yet. Gemini is now at the front and center of Google's AI efforts, with the branding referring not only to the underlying AI model but also to the consumer-facing products that are getting incorporated into all kinds of Google services. To learn a bit more, take a look at our piece on LLMs.