Google is making significant strides with Gemini, its flagship suite of generative AI models, apps, and services. But what exactly is Gemini? How can you use it? And how does it compare to other generative AI tools like OpenAI’s ChatGPT, Meta’s Llama, and Microsoft’s Copilot?

To help you stay up to date with the latest Gemini developments, we’ve created this comprehensive guide, which will be regularly updated as new models, features, and news about Google’s Gemini plans are released.

What is Gemini?

Gemini is Google’s flagship family of next-generation generative AI models. Developed by Google’s AI research labs, DeepMind and Google Research, Gemini comes in four main versions:

  1. Gemini Ultra – A very large model.
  2. Gemini Pro – A large model, though smaller than Ultra. The latest version, Gemini 2.0 Pro Experimental, is considered Google’s flagship.
  3. Gemini Flash – A faster, “distilled” version of Pro. There’s also a more compact and quicker variant called Gemini Flash-Lite, and a version with advanced reasoning abilities, Gemini Flash Thinking Experimental.
  4. Gemini Nano – Two small models: Nano-1 and the more capable Nano-2, designed to run on-device, including offline.

All Gemini models are natively multimodal, meaning they can process and analyze more than just text. Google claims these models were pre-trained and fine-tuned on various public, proprietary, and licensed data sets, including audio, images, videos, codebases, and multilingual text.

This multimodal capability differentiates Gemini from earlier models like Google’s LaMDA, which was trained solely on text data. LaMDA, for instance, could only process and generate text-based content (e.g., essays or emails), while Gemini models can handle a broader range of data types.

It’s worth noting that the ethics and legality of training AI models on public data without the explicit consent of data owners are still unclear. Google has an AI indemnification policy to protect certain Google Cloud customers from potential lawsuits, but this policy contains exceptions. Caution is advised, especially if you plan to use Gemini for commercial purposes.

What’s the difference between the Gemini apps and Gemini models?

Google’s Gemini apps function as client interfaces that connect to various Gemini models, overlaying a chatbot-like interaction on top. Think of these apps as the front end for Google’s generative AI, much like OpenAI’s ChatGPT and Anthropic’s Claude apps.

Google Gemini Mobile App

Gemini can be accessed on the web, as well as via the Gemini mobile app on Android and iOS. On Android, the Gemini app has replaced the traditional Google Assistant app. On iOS, Gemini functionality is integrated within the Google and Google Search apps.

A recent addition on Android allows users to bring up the Gemini overlay while using any app, enabling questions about the content on-screen (e.g., a YouTube video). Simply press and hold the power button or say, “Hey Google” to trigger the overlay.

Gemini apps support input through text, voice commands, and images (including files like PDFs and soon videos) from Google Drive. These conversations can seamlessly sync between the mobile app and the web version, provided you’re logged into the same Google Account.

Gemini Advanced

Gemini’s capabilities extend beyond the core apps into key Google services like Gmail and Google Docs. To access most of these advanced features, you’ll need the Google One AI Premium Plan, which costs $20 per month. This plan unlocks Gemini’s integration with Google Workspace apps (Docs, Maps, Slides, Sheets, Drive, and Meet), as well as advanced functionality through Gemini Advanced.

Gemini Advanced offers priority access to new features, allows you to run and edit Python code directly in Gemini, and supports a larger context window — about 750,000 words (1,500 pages) compared to the regular 24,000 words (48 pages) in the basic Gemini app.
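The word and page figures above are conversions, not raw model limits; context windows are measured in tokens. As a rough sketch of the arithmetic, assuming the common heuristics of about 0.75 words per token and about 500 words per page (these ratios are assumptions, not Google’s published conversion), a hypothetical 1-million-token window works out to the ~750,000 words cited above:

```python
# Rough context-window arithmetic. The 0.75 words-per-token and
# 500 words-per-page ratios are common heuristics, not official figures.

def words_from_tokens(tokens: int, words_per_token: float = 0.75) -> int:
    """Approximate word capacity for a given token limit."""
    return int(tokens * words_per_token)

def pages_from_words(words: int, words_per_page: int = 500) -> int:
    """Approximate page count for a given word count."""
    return words // words_per_page

print(words_from_tokens(1_000_000))  # 750000 words for a 1M-token window
print(pages_from_words(750_000))     # 1500 pages
print(pages_from_words(24_000))      # 48 pages
```

Under these assumptions, the basic app’s 24,000-word limit corresponds to a window of roughly 32,000 tokens.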

Advanced users also get access to the Deep Research feature, which generates research briefs from complex queries, such as redesigning a kitchen or exploring a research topic. Additionally, memory functionality lets Gemini build on previous conversations for more personalized responses, and the plan includes expanded NotebookLM usage, including the ability to turn PDFs into AI-generated, podcast-style audio overviews.

Other exclusive features for Gemini Advanced users include trip planning in Google Search, where the AI generates travel itineraries based on flight details, meal preferences, and local attractions. Corporate customers can access Gemini Business and Gemini Enterprise plans for additional tools and support within Google Workspace.

Gemini in Google Services

Gemini is also making its way into popular Google apps like Gmail, Docs, Slides, Sheets, and Maps. In Gmail, Gemini assists with writing and summarizing emails, while in Docs, it helps draft and refine content. In Slides, it can generate slides and custom images, and in Sheets, it organizes data and builds tables and formulas.

Google Maps and Drive also benefit from Gemini, as it can summarize reviews, generate travel recommendations, and provide quick facts about files and projects. In Meet, Gemini provides translated captions in real time.

On Chrome, Gemini introduces AI writing tools that help rewrite existing content or generate new text based on the current webpage. In Google Photos, Gemini aids in natural language search, while YouTube leverages Gemini for brainstorming video ideas.

Code Assistance and Security

Developers can tap into Gemini’s power via Google’s code assist tools, like the newly rebranded Code Assist (formerly Duet AI for Developers), which helps with code completion and error reduction. Gemini also supports security applications, such as Threat Intelligence, where it can analyze potentially malicious code and perform natural language searches for threats.

Gemini Extensions and Gems

Announced at Google I/O 2024, Gemini Advanced users can create “Gems” — custom chatbots powered by Gemini models. Gems can be generated from natural language prompts and shared with others or kept private. Available in 150 countries, Gems can integrate with various Google services, including Calendar, Keep, and YouTube Music.

Gemini’s apps also support “Gemini Extensions,” which enhance interaction with Google services like Drive, Gmail, and YouTube. In the future, Gemini will work with additional apps like Calendar, Keep, and more to perform specific tasks directly within those platforms.

Gemini Live and In-Depth Voice Chats

Gemini Live, available in mobile apps and on Pixel Buds Pro 2, allows users to engage in in-depth voice chats. This feature supports real-time interruptions and adaptation to speech patterns, making conversations feel more natural. Eventually, Gemini is expected to gain visual understanding, responding to photos and videos from your smartphone’s camera.

Image Generation via Imagen 3

Gemini users can generate images using Google’s Imagen 3 model, which promises more accurate text-to-image translations, improved creativity, and fewer visual errors compared to its predecessor. After a brief pause in early 2024, image generation of people has been reintroduced for users on paid Gemini plans.

Gemini for Teens and Smart Home Devices

In June 2024, a teen-focused Gemini experience was launched, providing additional safeguards and an AI literacy guide. It mirrors the standard Gemini experience, but with extra policies for responsible use.

Gemini is also expanding to Google smart home devices, such as Google TV, Nest thermostats, and speakers. For example, Gemini uses your preferences to curate content suggestions on Google TV and summarize reviews. The Nest ecosystem will soon benefit from AI-powered features like video summaries and automated actions based on natural language commands.

What Can the Gemini Models Do?

Gemini models are multimodal, meaning they can perform a wide range of tasks such as transcribing speech, captioning images and videos, and generating images. Google’s Gemini models promise to deliver a wealth of capabilities in the future, but users should be mindful of the current limitations, such as encoded biases and the risk of hallucinations.

Gemini Ultra

Gemini Ultra, while not widely available yet, is said to excel in tasks like physics problem-solving and scientific research, thanks to its multimodal capabilities. It can process images and generate scientific charts, though it’s still not fully accessible in product offerings.

Gemini Pro and Flash

Gemini 2.0 Pro, the latest version, outperforms its predecessors in coding and reasoning tasks, supporting up to 1.4 million words of input and offering advanced functionalities like code execution and research assistance. Gemini Flash, a lighter and faster model, excels in tasks like summarization and image captioning, providing significant speed improvements over its predecessors.

Gemini Nano

Designed to run directly on devices, Gemini Nano powers features like Smart Reply in Gboard and Summarize in Recorder on Pixel devices. It’s optimized for efficiency and privacy, handling tasks like summarizing recorded conversations offline.

Pricing for Gemini Models

Gemini’s models are available via API with a pay-as-you-go pricing structure. Free options exist, though they come with usage limits and exclude certain features. For example, Gemini 1.5 Pro starts at $1.25 per 1 million input tokens, while Gemini 2.0 Flash is priced at 10 cents per 1 million input tokens.

Project Astra and Future Developments

Project Astra, an ongoing initiative from Google DeepMind, is focused on developing AI-powered apps with real-time, multimodal understanding. Though still in the testing phase, Astra promises to enable simultaneous processing of live video and audio, with future plans potentially including smart glasses.

Will Gemini Be Available on iPhone?

There are indications that Gemini may come to the iPhone in the future. Apple is reportedly in talks with Google to integrate Gemini and other third-party models into its Apple Intelligence suite, though details are still sparse.