Date: 2023-12-06

You may have noticed that OpenAI, creator of ChatGPT, is going through some turmoil with mass resignations and reassignments, notably with its CEO Sam Altman leaving for a potential job with Microsoft, then returning with an all-new board. It’s wild over there. You may have also noticed that Google has been making great strides in business-and-consumer-facing AI over the past year, with the massive LLM (large language model) PaLM 2 update, the release of Google Bard, and a general push to solidify its AI services into cohesive platforms.

Put all that together, and it’s really no surprise Google has taken this opportunity to launch a brand-new generative (a.k.a., it makes stuff) AI, this one called Google Gemini. It’s new, it’s fun, it’s kinda weird: Here’s what you should know about Google Gemini.

What is Google Gemini?

Google Gemini is a new suite of generative AI services Google is launching, specifically for businesses interested in expanding their AI services. It’s a family of multimodal AI models (we’ll get more into that below) created by the Google DeepMind project.

Google Gemini is currently very new. Google added an English-tuned version of Gemini Pro to Google Bard in December 2023. Despite the name, Google describes Gemini Pro as the “lite” version of the AI model, although it looks more like the standard version to us. The family also includes Gemini Ultra, the premium AI that Google wants to be the flagship of the suite.

Gemini Nano rounds out the trio. Nano is the mobile-friendly version of the large language model that is launching on the Google Pixel 8 Pro with its December Feature Drop. It allows for on-device processing and will eventually make its way to other Android phones.

Is Google Gemini a chatbot? Can it create content?

Gemini can certainly create content, but Gemini is far more ambitious than a chatbot, and that requires some explanation.

Gemini is technically an LLM or large language model, which means it’s a machine learning framework that’s taught by dumping a bunch of human stuff (online content, generally) in it and helping it make rules to understand that content. Do that enough, and LLMs can process language data enough to put together their own sentences and mimic certain styles as we see ChatGPT and Bard doing — like expert puzzle solvers creating mathematical ways to “solve” human speech. The more they learn, the better they can get at it.
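To make the "learn rules from text, then write your own sentences" idea a little more concrete, here is a toy sketch in Python. It is nowhere near a real LLM and has nothing to do with how Gemini is actually built: it only remembers which word followed which in a tiny made-up sample, then strings words together from those observations. The function names and the sample text are invented for illustration.

```python
import random
from collections import defaultdict


def train_bigrams(text):
    """Record, for every word, the words that were seen right after it."""
    followers = defaultdict(list)
    words = text.split()
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return followers


def generate(followers, start, max_words=8, seed=0):
    """Build a sentence by repeatedly picking a word that followed the last one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same sentence
    out = [start]
    for _ in range(max_words - 1):
        options = followers.get(out[-1])
        if not options:  # this word was never followed by anything: stop
            break
        out.append(rng.choice(options))
    return " ".join(out)


# Hypothetical training text; a real model learns from vastly more data.
corpus = "the cat sat on the mat and the dog sat on the rug"
model = train_bigrams(corpus)
print(generate(model, "the"))
```

Every pair of neighboring words in the output already appeared somewhere in the sample text, which is the crude version of "mimicking" what was learned. Real LLMs replace the lookup table with a neural network trained on enormous amounts of content.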

Most LLMs specialize in only a couple of things, like speech or images. That helps keep them focused and reduces the enormous resources they tend to require. Google is particularly skilled at creating efficient AI models that are deeply trained on a more limited array of content, which contrasts with OpenAI’s approach of throwing almost everything it can at the AI.

However, Gemini appears to be different from the usual LLM, because it’s been trained as multimodal from the very beginning. Multimodal just means that the AI can learn and create all kinds of content, not just one “language.” Gemini can handle speech, math, reasoning problems, code, images (including emojis), video, audio, and more. It’s like the polymath or Renaissance Man of the LLM world.

As you can see with our image examples, that seems to make Gemini very good at understanding context and interpreting that information correctly for users, regardless of subject matter.

Based on the data we have, Gemini appears to be very good at what it does…within scope. It scored a 90% on the Massive Multitask Language Understanding (MMLU) test, which is better than most human language experts and in line with Google’s past performance. Google also says Gemini beats out existing AI models in 30 out of 32 academic tests used to score LLMs. However, other reports say that Gemini Pro can beat GPT-3.5 (which powered much of the ChatGPT content we’ve seen this year) but is beaten by the newer GPT-4, while Gemini Ultra narrowly beats GPT-4. It’s, uhh, a very competitive field right now.

However, no AI currently on the market is quite as multimodal as Gemini, which means businesses that use this trained AI can adapt it to nearly anything. That holds particular value for companies, which may want to customize AI services to do anything from recognizing counterfeit handbags to imitating a helpful Swedish uncle on a customer service chat. Google also mentions a few other possibilities, like:

Explaining physics problems to students

Processing raw audio to look for certain signals

Analyzing user intent to create customizable kits and packages for a person

Helping scientists spot links in published research that they would have missed

Winning all the competitive programming contests that it’s allowed to enter

Is Google Gemini different from Google Bard?

Not really. Bard was a much earlier attempt at consumer-facing AI (remember, in the context of these early 2020s AI LLMs, even several months can be a long time). But with the release of Gemini, Google is updating Google Bard with Gemini Pro technology, so all those benefits are now part of Bard. Of course, Bard’s tools are much more limited than what Gemini is capable of, but Bard is best seen as a part of Gemini now.

How does this all relate to PaLM 2?

It’s complicated, and we don’t have a good look behind the scenes. PaLM 2 was a massive update to Google’s language-focused LLM made earlier in 2023. PaLM 2 excels at language tasks like translation, and while Google has made PaLM 2 modules that handle other things like reading medical scans, it’s not as natively multimodal as Gemini. However, it does provide lightweight AI services for businesses that want to build their own AIs by tapping into the work Google has already done, using the Google Vertex platform, which Gemini is also on.

Gemini and PaLM 2 don’t appear to be competitors in any sense, at least not now. Google DeepMind, formed from the merging of the two previous projects Brain Team and DeepMind, is in charge of both. It seems likely the two are feeding into each other at some level. But for now, Google is still referring to them as two separate AI models with different foci.

Where can I find Google Gemini?

Stop by DeepMind’s webpage for Gemini, and look for a sign-up option to learn more or a sign-in option for your dev account so you can get started with the Gemini Pro API kit. Then you can start incorporating Gemini services into your apps and tailoring specific Gemini models to your needs. Only Gemini Pro will be available on December 13, 2023, with the other versions following later.
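For a rough sense of what "getting started with the API" can look like, here is a small Python sketch that only builds a request and prints it. Nothing is sent, and no API key is involved. The endpoint pattern, the "gemini-pro" model name, and the JSON body shape are assumptions based on the public Gemini REST interface at launch, not something confirmed by this article, and the helper name build_request is made up. Check Google's current documentation before relying on any of it.

```python
import json

# Assumed base URL for the Gemini REST API; verify against Google's docs.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, model="gemini-pro"):
    """Return the (url, json_body) pair for a text-only generation request.

    Hypothetical helper: it only assembles the request, it does not send it.
    """
    url = f"{API_ROOT}/models/{model}:generateContent"
    # Assumed body shape: a list of contents, each holding a list of parts.
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body)


url, payload = build_request("Explain why the sky is blue to a ten-year-old.")
print(url)
print(payload)
```

A real app would send that body with an API key from a developer account and read the generated text out of the JSON response. Businesses working through Vertex would go through that platform's own tooling instead.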

Keep in mind, Gemini is designed for organizational and developer use only, primarily via the Vertex platform. It’s for companies that want tailored AI solutions, which they would then offer to customers through their own apps and websites. If you, as a consumer, want to experience Gemini, your best bet is Google Bard or related Google services.

What does Google Gemini cost to use?

Specific Gemini pricing is difficult to parse right now. We suggest taking a look at Google Vertex and its pricing for all generative AI services, which vary based on the type of content and the specific service a business is interested in.

Is Google Gemini safe?

DeepMind says that Gemini was trained with safety in mind and will be deployed responsibly. Google is very vague about what that entails, but it likely means that Gemini won’t be able to do anything too naughty, invasive or illegal.

Left largely untouched is the question of how Gemini is consuming our content, proprietary work, and conversations…as well as how it could be used to take jobs, make money in unethical ways, or exploit vulnerable groups. Those are questions raised about all LLMs, and currently, we have a whole lot more questions than answers.

Gemini is now on the board: Keep watching Google

Google continues to refine its AI models and introduce them as a way to position itself as the go-to source for professional AI development, something the company is working on in the face of steep competition from sources like OpenAI. Gemini is an ambitious entry that’s trained to do a little bit of everything, making it one of the most capable models yet. Expect Gemini to get incorporated into all kinds of Google services in the coming year, which will remain a fascinating time for all AI. To learn a bit more, take a look at our piece on LLMs.