Gemini (Bard): Google's Multimodal Large Language Model

Introduction to Gemini

Gemini is a family of multimodal large language models developed by Google DeepMind, building on the technologies behind LaMDA and PaLM 2. It comes in three sizes: Gemini Ultra, Gemini Pro, and Gemini Nano. Gemini was announced on December 6, 2023, and is positioned as a direct competitor to OpenAI’s GPT-4. It shows marked improvements in handling long texts and focuses on enhancing user creativity and productivity.

Gemini is also the name of Google’s AI chat assistant (formerly Bard), which brings Google’s latest AI technologies to a wide range of everyday scenarios.

At its core, Gemini is a natively multimodal model that can process and understand text, code, video, audio, and images. This allows natural, rich multimodal interaction: Gemini offers conversational support and suggestions whether you are writing an article, making a plan, learning something new, or looking for inspiration.

It not only understands users’ questions and needs but also adjusts its responses during the conversation based on user feedback and preferences, which makes the dialogue more fluent and personalized. As an intelligent partner, Gemini helps users think more actively and creatively.

Gemini Feature List

Writing: Assists in generating stories, poems, emails, resumes, reports, and various other text types, providing inspiration and suggestions.
Planning: Supports diverse plan-making for travel, work, study, and life, offering solutions tailored to user goals and conditions.
Learning: Provides knowledge materials across languages, science, arts, history, etc., along with related questions and advice.
Creativity: Sparks inspiration, expands thinking, and offers fun ideas and challenges.
Video: Supports video search, summarization, editing, and other functions.
Audio: Implements speech recognition, synthesis, and voice assistant applications.
Image: Supports image recognition, generation, and editing, offering examples and suggestions.
Coding: Aids in understanding and generating code in Python, Java, C++, Go, and other programming languages, including explanations.

Gemini Usage Help

Access Website or Application: Visit the official Gemini website to start using it directly, or download the mobile app from Google Play or the Apple App Store.
Log In: Sign in with a Google account, creating one first if you do not have one.
Input Questions or Requests: Describe your question in natural language in the input box.
Send Request: Press Enter or click the “Send” button, then wait for Gemini to generate a reply.

After Gemini replies, you can copy or share the content, or rate it with like/dislike to provide feedback.

Settings notes:

Available on web and mobile; signing in with a Google account is required to chat.
Supports text and voice input in multiple languages, including Chinese, English, Japanese, and Spanish.
Different chat topics can be selected to get the corresponding features.
Feedback can be given at any time; Gemini continuously improves its answers based on user feedback.

The preferred language of the Google account affects which Gemini plugin features are available; choosing English enables more of them.

Google’s Free Open Flash Model API

At Google I/O 2024, Google released the Gemini 1.5 Flash model and gave developers an almost free way to try it.

The free tier nominally allows up to about 1.5 billion tokens per day. The detailed free quotas are as follows:

Gemini 1.5 Flash Free Package
- 15 requests per minute (RPM)
- 1 million tokens per minute (TPM)
- 1,500 requests per day (RPD)
- Free context caching service, storing up to 1 million tokens per hour
Gemini 1.5 Pro Free Package
- 2 requests per minute (RPM)
- 32,000 tokens per minute (TPM)
- 50 requests per day (RPD)
Text-embedding-004 Embedding Model
- 1,500 requests per minute (RPM)
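
The quotas above can be respected on the client side with a small rate limiter. The following is a minimal sketch, not part of any Google SDK: the class SlidingWindowLimiter, the helper daily_token_ceiling, and the constant names are hypothetical and introduced only for illustration, and the limit values are simply copied from the list above.

```python
import time
from collections import deque

# Free-tier limits for Gemini 1.5 Flash, copied from the list above.
FLASH_RPM = 15          # requests per minute
FLASH_TPM = 1_000_000   # tokens per minute
FLASH_RPD = 1_500       # requests per day


def daily_token_ceiling(tokens_per_minute: int) -> int:
    """Tokens per day if the per-minute token quota were used all day long."""
    return tokens_per_minute * 60 * 24


class SlidingWindowLimiter:
    """Allow at most max_calls in any window of window_seconds (hypothetical helper)."""

    def __init__(self, max_calls: int, window_seconds: float, clock=time.monotonic):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock          # injectable so the limiter can be tested without sleeping
        self._stamps = deque()       # timestamps of accepted calls, oldest first

    def try_acquire(self) -> bool:
        """Record a call and return True if it fits in the window, otherwise return False."""
        now = self._clock()
        # Forget calls that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self._window:
            self._stamps.popleft()
        if len(self._stamps) < self._max_calls:
            self._stamps.append(now)
            return True
        return False


if __name__ == "__main__":
    per_minute = SlidingWindowLimiter(FLASH_RPM, 60.0)
    print("first request allowed:", per_minute.try_acquire())
    print("token ceiling per day:", daily_token_ceiling(FLASH_TPM))
```

Using the per-minute token quota around the clock gives 1,000,000 × 60 × 24 = 1,440,000,000 tokens, which is close to the "1.5 billion tokens per day" figure quoted above. In practice the 1,500 requests-per-day limit also applies, so a second limiter with a 24-hour window would be needed alongside the per-minute one.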

Additionally, Google AI Studio offers free access for developers; Gemini 1.5 Pro supports a 2M token context window.

Detailed pricing information here: https://ai.google.dev/pricing

Google AI Studio official site: https://aistudio.google.com

According to Google’s official articles, Gemini 1.5 Pro is optimized for complex tasks such as translation, programming, and reasoning, while Gemini 1.5 Flash targets high-frequency, low-latency tasks such as summarization, chat, image and video captioning, and data extraction from documents and tables.

Gemini Model Overview

Gemini 1.5 Flash supports multimodal inputs, allowing users to directly send text, images, audio, and video for interaction, a feature currently not available in GLM-4-Flash.
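
As a rough illustration of sending text together with an image, the sketch below builds a request body in the shape used by the Gemini REST generateContent endpoint as I understand it (contents, parts, inline_data). The function name build_multimodal_request is hypothetical, and the field names should be checked against the official API reference before use. No network call is made here.

```python
import base64
import json


def build_multimodal_request(prompt: str, media_bytes: bytes, mime_type: str) -> dict:
    """Build a JSON-serializable body with one text part and one inline media part.

    Assumption: the endpoint accepts base64-encoded media under "inline_data".
    """
    encoded = base64.b64encode(media_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file read from disk.
    body = build_multimodal_request("Describe this image.", b"\x89PNG placeholder", "image/png")
    print(json.dumps(body, indent=2))
```

The resulting dictionary would be sent as the JSON body of a POST request to the model’s generateContent endpoint, with the API key obtained from Google AI Studio.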

Moreover, it supports structured output (e.g., JSON format), which makes responses more predictable and easier to parse when extracting information. This feature can be tried in Google AI Studio.
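
To make the structured output idea concrete, here is a minimal sketch that requests JSON and parses the reply. It assumes the REST request accepts a generationConfig block with response_mime_type set to "application/json" and that the reply text sits under candidates → content → parts; both are my reading of the public API and should be verified. The function names and the sample response are hypothetical.

```python
import json


def build_json_request(prompt: str) -> dict:
    """Build a request body that asks the model to answer in JSON (assumed field names)."""
    return {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {"response_mime_type": "application/json"},
    }


def extract_json(response: dict):
    """Parse the first candidate's text as JSON; raises ValueError if it is not valid JSON."""
    text = response["candidates"][0]["content"]["parts"][0]["text"]
    return json.loads(text)


if __name__ == "__main__":
    request_body = build_json_request("List two primary colors as a JSON object.")
    print(json.dumps(request_body, indent=2))

    # Hand-written stand-in for a server reply, used only to show the parsing step.
    sample_response = {
        "candidates": [
            {"content": {"parts": [{"text": "{\"colors\": [\"red\", \"blue\"]}"}]}}
        ]
    }
    print(extract_json(sample_response))
```

Because the reply is parsed with json.loads, a malformed answer fails loudly instead of being silently mis-extracted, which is the main practical benefit of asking for JSON in the first place.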

Gemini Pricing

Item	Price	Description
Gemini Advanced	$19.99/month	Includes use of the Ultra 1.0 AI model, 2TB Google One storage, and integration & discounts in Gmail, Docs, etc.
Gemini Basic	Free	Allows use of standard AI models, 15GB Google One storage, and integration in Gmail, Docs, etc.

Gemini offers two plans:

The free Basic plan, with limited functionality, a limit of 60 requests per minute, 15GB of storage, and no access to the Ultra 1.0 model.
The paid Advanced plan, which includes the more powerful Ultra 1.0 AI model, 2TB of storage, and additional Google One services.

Gemini Downloads

Google Gemini Web Version

Google Gemini Mobile App

Google AI Studio — Use Gemini on the Google AI Studio platform