Gemini Pro and Vision: Free Python API — A Quick Start-Up Guide

Abhishek Maheshwarappa
3 min read · Jan 20, 2024

In this article, we will explore how to use Gemini Pro and Gemini Pro Vision for free through their Python API.

For those who prefer a hands-on approach, there’s a Google Colab available for experimentation.

Introduction

Gemini is a family of multimodal large language models developed by Google DeepMind, designed to work with and interpret text, code, images, video, and audio. Gemini is seen as a rival to GPT-4, another multimodal model, developed by OpenAI.

Image credit: Google DeepMind

Gemini comes in three different versions:

  1. Gemini Ultra
    a. largest and most capable model for highly complex tasks
    b. still in preview access
    c. efficiently servable at scale on TPU accelerators
  2. Gemini Pro
    a. best model for scaling across a wide range of tasks
    b. available now
  3. Gemini Nano
    a. most efficient model for on-device tasks
    b. still in preview access
    c. trained by distilling from larger Gemini models
    d. two versions of Nano, with 1.8B (Nano-1) and 3.25B (Nano-2) parameters, targeting low and high memory devices respectively

Free Pricing

Developers have free access to Gemini Pro and Gemini Pro Vision through Google AI Studio, with up to 60 requests per minute, making it suitable for most app development needs.
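To stay under the 60 requests per minute mentioned above, a script can throttle itself on the client side. The following is a minimal sketch, not part of the SDK: the RateLimiter class and its parameters are names made up for this illustration, and it assumes a simple sliding-window policy is good enough for your use.

```python
import time
from collections import deque


class RateLimiter:
    """Sliding-window limiter (hypothetical helper): at most max_calls per period seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested without waiting
        self._sleep = sleep
        self._calls = deque()    # timestamps of the calls still inside the window

    def _drop_expired(self, now):
        # Forget calls that have left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def wait(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        self._drop_expired(now)
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._drop_expired(now)
        self._calls.append(now)
```

In a notebook, you would create one limiter (limiter = RateLimiter()) and call limiter.wait() right before each request to the model.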

Google’s Gemini Setup in Google Colab

Ready to play — Google Colab

Step 1

Use Google Colab to get started

Step 2

Google AI Python SDK

It enables developers to use Google’s state-of-the-art generative AI models:

  1. Gemini
  2. PaLM

Supports:

  • Generate text from text-only input
  • Generate text from text-and-images input (for Gemini only)
  • Build multi-turn conversations (chat)
  • Embeddings

Install the SDK (in a Colab cell, prefix the command with !):

pip install google-generativeai
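After installing, it can help to confirm that the package is actually visible to the notebook's Python environment. A small sketch using only the standard library; installed_version is a helper name invented for this example.

```python
from importlib import metadata


def installed_version(package="google-generativeai"):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(version if version else "google-generativeai is not installed")
```

If this prints that the package is not installed, rerun the pip command in the same runtime before moving on.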

Step 3

Set up your API key: follow the link and create the API key required for accessing Gemini.

Google API Key generation.

Add the key to the secrets manager in the left panel. Give it the name GOOGLE_API_KEY.
Configure it with the following code:

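import google.generativeai as genai
from google.colab import userdata

GOOGLE_API_KEY = userdata.get('GOOGLE_API_KEY')  # read the key from Colab secrets
genai.configure(api_key=GOOGLE_API_KEY)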
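The userdata module only exists inside Google Colab, so the configuration above fails elsewhere. Here is a hedged sketch of one way to make the lookup portable: resolve_api_key is a hypothetical helper (not an SDK function), and falling back to an environment variable with the same name is an assumption about how you store the key locally.

```python
import os


def resolve_api_key(name="GOOGLE_API_KEY", environ=None):
    """Look for the key in Colab secrets first, then in environment variables."""
    try:
        from google.colab import userdata  # importable only inside Colab
        key = userdata.get(name)
        if key:
            return key
    except Exception:
        # Not running in Colab, or the secret is missing or not shared with the notebook.
        pass
    environ = os.environ if environ is None else environ
    key = environ.get(name)
    if not key:
        raise RuntimeError(f"{name} was not found in Colab secrets or the environment")
    return key


# Usage (assumes `import google.generativeai as genai` has already run):
# genai.configure(api_key=resolve_api_key())
```

Keeping the key out of the notebook source this way also avoids leaking it when the notebook is shared.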

Step 4

1. Generate text from text inputs

model = genai.GenerativeModel('gemini-pro')

response = model.generate_content("What is medium blogs")

The response from gemini-pro can be read from response.text.
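Reading the reply is usually a one-liner, but as far as I know the SDK raises a ValueError from response.text when the reply contains no text part (for example, when it was blocked). A defensive sketch; response_text and its fallback string are names chosen for this illustration only.

```python
def response_text(response, fallback="(no text returned)"):
    """Return response.text, or a fallback if the response holds no text."""
    try:
        return response.text
    except ValueError:
        # Assumed SDK behavior: .text raises ValueError for blocked or empty replies.
        return fallback


# Usage in the notebook (needs the configured `genai` from Step 3):
# model = genai.GenerativeModel('gemini-pro')
# print(response_text(model.generate_content("What is medium blogs")))
```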

2. Generate text from image and text inputs

model = genai.GenerativeModel('gemini-pro-vision')

Example output for the image

Pic Credits — howtocook
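# img must be a PIL image loaded beforehand (the loading step is not shown here)
response = model.generate_content(img)
# stream=True returns the reply in chunks; call response.resolve() before reading response.text
response = model.generate_content(["How to make this Recipe", img], stream=True)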
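With stream=True, the reply arrives in pieces rather than all at once. A minimal sketch of consuming it, assuming each chunk exposes a .text attribute as the streamed responses do in the SDK; collect_stream and on_chunk are names invented for this example.

```python
def collect_stream(chunks, on_chunk=print):
    """Consume a streamed response chunk by chunk and return the full text."""
    parts = []
    for chunk in chunks:
        on_chunk(chunk.text)      # show each piece as soon as it arrives
        parts.append(chunk.text)
    return "".join(parts)


# Usage in the notebook (img is a PIL image, model is 'gemini-pro-vision'):
# response = model.generate_content(["How to make this Recipe", img], stream=True)
# full_text = collect_stream(response)
```

Printing chunks as they arrive makes long answers feel faster, while the returned string keeps the complete text for later use.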

The Google Colab notebook is linked here.

References

  1. https://blog.google/technology/ai/google-gemini-ai/
  2. https://ai.google.dev/models/gemini

