/gemini-ai-toolkit

Unlock the potential of Google's Gemini AI models with this versatile toolkit. Offering seamless chat, text generation, and multimodal interactions, supporting various file types, including PDF's, images, videos, audio, text and more. Enjoy real-time responses, customizable parameters, and easy integration for diverse AI tasks.

Primary LanguagePythonMIT LicenseMIT

Google Gemini AI

Gemini AI Toolkit

maintained - yes contributions - welcome

Google Gemini AI

Note

This toolkit supports Google's newest Gemini 2.0 model & 1.5 models, as well as the experimentals models (as of December 13, 2024)

The Gemini AI Toolkit is the easiest way for developers to build with Google's Gemini AI models. It offers seamless integration for chat, text generation, and multimodal interactions, allowing you to process and analyze text, images, audio, video, code, and more—all in one comprehensive package with minimal dependencies.

🚀 Features

  • Multimodal Interaction: Effortlessly process and analyze a wide array of file types—including PDFs, images, videos, audio files, text documents, and code snippets—unlocking new dimensions of AI-assisted understanding.
  • Interactive Chat: Engage in dynamic, context-aware conversations with Gemini, enabling real-time dialogue that adapts to your needs.
  • Smart File Handling: Seamlessly upload and process files from local paths or URLs, with automatic temporary storage management to keep your workspace clutter-free.
  • Command Support: Utilize intuitive commands to control the toolkit's functionality, enhancing efficiency and user experience.
  • Customizable Parameters: Tailor your AI interactions by enabling structured JSON output for automated processing, using streaming responses for faster interactions, and adjusting temperature, token limits, and safety thresholds and more to suit your needs
  • Lightweight Design: Enjoy a streamlined experience with minimal dependencies—primarily leveraging the requests package—making setup and deployment a breeze.

📋 Table of Contents

🛠 Installation

  1. Clone the repository:

    git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
  2. Navigate to the repository folder:

    cd gemini-ai-toolkit
  3. Install the required dependencies:

    pip install -r requirements.txt

🔑 Configuration

  1. Obtain an API key from Google AI Studio.

  2. You have three options for managing your API key:

    Click here to view the API key configuration options
    • Setting it as an environment variable on your device (recommended for everyday use)

      • Navigate to your terminal.
      • Add your API key like so:
        export GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI.

    • Using an .env file (recommended for development):

      • Install python-dotenv if you haven't already: pip install python-dotenv.
      • Create a .env file in the project's root directory or rename example.env in the root folder to .env and replace your_api_key_here with your API key.
      • Add your API key to the .env file like so:
        GEMINI_API_KEY=your_api_key

      This method allows the API key to be loaded automatically when using the wrapper or CLI, assuming you have python-dotenv installed and set up correctly.

    • Direct Input:

      • If you prefer not to use a .env file, you can directly pass your API key as an argument to the CLI or the wrapper functions.

        CLI

        --api_key "your_api_key"

        Wrapper

        api_key="your_api_key"

      This method requires manually inputting your API key each time you initiate an API call, ensuring flexibility for different deployment environments.

💻 Usage

Multimodal Mode

For processing multiple input types including audio, video, text, images, code and a wide range of files. This mode allows you to upload files (from local paths or URLs), chat with the AI about the content, and maintain a knowledge base throughout the conversation.

CLI

python cli.py --multimodal --prompt "Analyze both of these files and provide a summary of each, one by one. Don't overlook any details." --files file1.jpg https://example.com/file2.pdf

Wrapper

from gemini import Multimodal

Multimodal().run(prompt="Analyze both of these files and provide a summary of each, one by one. Don't overlook any details.", files=["file1.jpg", "https://example.com/file2.pdf"])

Chat Mode

For interactive conversations with the AI model.

CLI

python cli.py --chat

Wrapper

from gemini import Chat

Chat().run()

Text Mode

For generating text based on a prompt or a set of instructions.

CLI

python cli.py --text --prompt "Write a story about a magic backpack."

Wrapper

from gemini import Text

Text().run(prompt="Write a story about a magic backpack.")

🔧 Special Commands

During interaction with the toolkit, you can use the following special commands:

  • /exit or /quit: End the conversation and exit the program.
  • /clear: Clear the conversation history (useful for saving API credits).
  • /upload: Upload a file for multimodal processing.
    • Usage: /upload file_path_and_or_url [optional prompt]
    • Example: /upload file1.jpg https://example.com/file2.pdf Analyze the files and provide a summary of each

⚙️ Advanced Configuration

Description CLI Flags CLI Usage Wrapper Usage
Chat mode -c, --chat --chat See mode usage above.
Text mode -t, --text --text See mode usage above.
Multimodal mode -m, --multimodal --multimodal See mode usage above.
User prompt -p, --prompt --prompt "Your prompt here" prompt="Your prompt here"
File inputs -f, --files --files file1.jpg https://example.com/file2.pdf files=["file1.jpg", "https://example.com/file2.pdf"]
Enable streaming -s, --stream --stream stream=True
Enable JSON output -js, --json --json json=True
API Key -ak, --api_key --api_key "your_api_key" api_key="your_api_key"
Model name -md, --model --model "gemini-2.0-flash-exp" model="gemini-2.0-flash-exp"
System prompt -sp, --system_prompt --system_prompt "Set custom instructions" system_prompt="Set custom instructions"
Max tokens -mt, --max_tokens --max_tokens 1024 max_tokens=1024
Temperature -tm, --temperature --temperature 0.7 temperature=0.7
Top-p -tp, --top_p --top_p 0.9 top_p=0.9
Top-k -tk, --top_k --top_k 40 top_k=40
Candidate count -cc, --candidate_count --candidate_count 1 candidate_count=1
Stop sequences -ss, --stop_sequences --stop_sequences ["\n", "."] stop_sequences=["\n", "."]
Safety categories -sc, --safety_categories --safety_categories ["HARM_CATEGORY_HARASSMENT"] safety_categories=["HARM_CATEGORY_HARASSMENT"]
Safety thresholds -st, --safety_thresholds --safety_thresholds ["BLOCK_NONE"] safety_thresholds=["BLOCK_NONE"]

📊 Supported Models

Base Models

Model Inputs Context Length
gemini-2.0-flash-exp Text, images, audio, video 8192
gemini-1.5-flash Text, images, audio, video 8192
gemini-1.5-flash-8b Text, images, audio, video 8192
gemini-1.5-pro Text, images, audio, video 8192
gemini-1.0-pro ( Set to be deprecated on 2/15/2025 ) Text 2048

Experimental Models

Model Inputs Context Length
gemini-exp-1114 Text, images, audio, video 8192
gemini-1.5-pro-exp-0827 Text, images, audio, video 8192
gemini-1.5-flash-8b-exp-0924 Text, images, audio, video 8192

Note

The availability of specific models may be subject to change. Always refer to Google's official documentation for the most up-to-date information on model availability and capabilities. See base models docs here and experimental model docs here.

🔒 Error Handling and Safety

The Gemini AI Toolkit now includes robust error handling to help you diagnose and resolve issues quickly. Here are some common error codes and their solutions:

HTTP Code Status Description Solution
400 INVALID_ARGUMENT Malformed request body Check API reference for correct format and supported versions
400 FAILED_PRECONDITION API not available in your country Enable billing on your project in Google AI Studio
403 PERMISSION_DENIED API key lacks permissions Verify API key and access rights
404 NOT_FOUND Resource not found Check if all parameters are valid for your API version
429 RESOURCE_EXHAUSTED Rate limit exceeded Ensure you're within model rate limits or request a quota increase
500 INTERNAL Unexpected error on Google's side Retry after a short wait; report persistent issues
503 UNAVAILABLE Service temporarily overloaded/down Retry after a short wait; report persistent issues

For rate limit errors (429), the toolkit will automatically pause for 15 seconds before retrying the request.

📁 Supported File Types

The Gemini AI Toolkit supports a wide range of file types for multimodal processing. Here are the supported file extensions:

Category File Extensions
Images jpg, jpeg, png, webp, gif, heic, heif
Videos mp4, mpeg, mpg, mov, avi, flv, webm, wmv, 3gp
Audio wav, mp3, aiff, aac, ogg, flac
Text/Documents txt, html, css, js, ts, csv, md, py, json, xml, rtf, pdf

Note

Google's Files API lets you store up to 20 GB of files per project, with a per-file maximum size of 2 GB. Files are stored for 48 hours.

💾 Caching and Cleanup

The Gemini AI Toolkit implements a caching mechanism for downloaded files to improve performance and reduce unnecessary network requests. Here's how it works:

  1. When a file is downloaded from a URL, it's stored in a temporary cache folder (.gemini_ai_toolkit_cache).
  2. The file will be used to process the request and will be stored locally due to Google's upload requirements.
  3. The cache is automatically cleaned up at the end of each session to prevent accumulation of temporary files.

You don't need to manage this cache manually, but it's good to be aware of its existence, especially if you're processing large files or have limited storage space.

🤝 Contributing

Contributions are welcome!

Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.

🐛 Issues and Support

Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:

  1. Check if the issue has already been reported.
  2. Use the Bug Report template to create a detailed report.
  3. Submit the report here.

Your report will help us make the project better for everyone.

💡 Feature Requests

Got an idea for a new feature? Feel free to suggest it. Here's how:

  1. Check if the feature has already been suggested or implemented.
  2. Use the Feature Request template to create a detailed request.
  3. Submit the request here.

Your suggestions for improvements are always welcome.

🔁 Versioning and Changelog

Stay up-to-date with the latest changes and improvements in each version:

  • CHANGELOG.md provides detailed descriptions of each release.

🔐 Security

Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.

📄 License

Licensed under the MIT License. See LICENSE for details.