The Gemini AI Toolkit provides a comprehensive (lightweight) API wrapper and command-line interface for interacting with Google's Gemini Pro 1.0 and the upcoming Gemini Pro 1.5 & Ultra large language models. It simplifies complex API calls into accessible commands, facilitating tasks like multi-turn chat (chatbot), text generation, and image captioning & analysis functionality. This toolkit is ideal for everyday users who prefer the terminal, or users, developers, and researchers looking to integrate advanced AI capabilities into their projects without the need to understand the intricacies of direct API communication.
- Chat Functionality: Engage in interactive conversations with Gemini's advanced conversational models.
- Image Captioning: Analyze images to generate descriptive captions or insights.
- Text Generation: Produce creative and contextually relevant text based on prompts.
- Command-Line Interface (CLI): Access the full suite of functionalities directly from the command line.
- Python Wrapper: Simplify interaction with Google's Gemini models in only 2 lines of code.
- Streamed Responses: Receive responses as they are generated for real-time interaction.
- Safety Settings Integration: Tailor safety filters to prevent the generation of inappropriate or unsafe content.
- Flexible Configuration: Customize the token limits, safety thresholds, stop sequences, temperature and more.
- Minimal Dependencies: Built to be efficient and lightweight, requiring only the
requests
package for operation.
Python 3.x
- An API key from Google AI Studio
The following Python packages are required:
requests
: For making HTTP requests to Google's Gemini API.
The following Python packages are optional:
python-dotenv
: For managing API keys and other environment variables.
To use the Gemini AI Toolkit, clone the repository to your local machine and install the required Python packages.
Clone the repository:
git clone https://github.com/RMNCLDYO/gemini-ai-toolkit.git
Navigate to the repositories folder:
cd gemini-ai-toolkit
Install the required dependencies:
pip install -r requirements.txt
- Obtain an API key from Google AI Studio.
- Create or rename the .env file in the project's root directory and add your API key:
GEMINI_API_KEY=your_api_key
The Gemini AI Toolkit can be used in three different modes: Chat
, Text
, and Vision
. Each mode is designed for specific types of interactions with the Gemini models.
Chat mode is intended for chatting with an AI model (similar to a chatbot) or building conversational applications. It supports multi-turn dialogues with the model.
CLI
python cli.py --chat
Wrapper
from gemini import Chat
Chat().run()
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Text mode is suitable for generating text content based on a provided prompt.
CLI
python cli.py --text --prompt "Write a story about a magic backpack."
Wrapper
from gemini import Text
Text().run(prompt="Write a story about a magic backpack.")
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Vision mode allows for generating text based on a combination of text prompts and images.
CLI
python cli.py --vision --prompt "Describe this image." --image "image_path_or_url"
Wrapper
from gemini import Vision
Vision().run(prompt="Describe this image.", image="image_path_or_url")
An executable version of this example can be found here. (You must move this file to the root folder before running the program.)
Enable streaming mode to receive responses as they are generated without waiting for the full response.
CLI
python cli.py --chat --stream
Wrapper
from gemini import Chat
Chat().run(stream=True)
Description | CLI Flag(s) | CLI Usage | Wrapper Usage |
---|---|---|---|
Enable chat mode | -c , --chat |
--chat | See mode usage above. |
Enable text mode | -t , --text |
--text | See mode usage above. |
Enable vision mode | -v , --vision |
--vision | See mode usage above. |
User prompt | -p , --prompt |
--prompt "Write a story about a magic backpack." | prompt="Write a story about a magic backpack." |
Image file path or url | -i , --image |
--image "image_path_or_url" | prompt="Describe this image.", image="image_path_or_url" |
API key for authentication | -a , --api_key |
--api_key "your_api_key" | api_key="your_api_key" |
Model to use | -m , --model |
--model "gemini-1.0-pro-latest" | model="gemini-1.0-pro-latest" |
Enable streaming mode | -s , --stream |
--stream | stream=True |
Maximum tokens to generate | -mt , --max_tokens |
--max_tokens 1024 | max_tokens=1024 |
Sampling temperature | -tm , --temperature |
--temperature 0.7 | temperature=0.7 |
Nucleus sampling threshold | -tp , --top_p |
--top_p 0.9 | top_p=0.9 |
Top-k sampling threshold | -tk , --top_k |
--top_k 40 | top_k=40 |
Number of candidates to generate | -cc , --candidate_count |
--candidate_count 1 | candidate_count=1 |
Stop sequences for completion | -ss , --stop_sequences |
--stop_sequences ["\n", "."] | stop_sequences=["\n", "."] |
Safety categories for filtering | -sc , --safety_categories |
--safety_categories ["HARM_CATEGORY_HARASSMENT"] | safety_categories=["HARM_CATEGORY_HARASSMENT"] |
Safety thresholds for filtering | -st , --safety_thresholds |
--safety_thresholds ["BLOCK_NONE"] | safety_thresholds=["BLOCK_NONE"] |
- Supported MIME types:
PNG
,JPEG
,WEBP
,HEIC
,HEIF
. - Maximum 4MB of data (including images and text).
- Images larger than 3072 x 3072 pixels are scaled down while preserving aspect ratio.
Contributions are welcome!
Please refer to CONTRIBUTING.md for detailed guidelines on how to contribute to this project.
Encountered a bug? We'd love to hear about it. Please follow these steps to report any issues:
- Check if the issue has already been reported.
- Use the Bug Report template to create a detailed report.
- Submit the report here.
Your report will help us make the project better for everyone.
Got an idea for a new feature? Feel free to suggest it. Here's how:
- Check if the feature has already been suggested or implemented.
- Use the Feature Request template to create a detailed request.
- Submit the request here.
Your suggestions for improvements are always welcome.
Stay up-to-date with the latest changes and improvements in each version:
- CHANGELOG.md provides detailed descriptions of each release.
Your security is important to us. If you discover a security vulnerability, please follow our responsible disclosure guidelines found in SECURITY.md. Please refrain from disclosing any vulnerabilities publicly until said vulnerability has been reported and addressed.
Licensed under the MIT License. See LICENSE for details.