An OpenAI-compatible API server implementation for Google's Gemma 3 language model with function calling capabilities, optimized for Apple Silicon.
- OpenAI-compatible API endpoints
- Function calling support with robust validation
- Python script execution with sandbox support
- System information retrieval with real-time monitoring
- Streaming responses for long-running operations
- Metal GPU acceleration for Apple Silicon
- Comprehensive logging and error tracking
- Memory-efficient conversation handling
- Built-in benchmarking suite
- Extensive test coverage
- Advanced image analysis tools for Pokemon cards
- Smart image cropping with focus area detection
- Automated card part annotation with bounding boxes and labels
# Create and activate virtual environment using uv
uv venv
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
# Download model (if not already present)
mkdir -p gemma-3-4b-it-q4_0
# Download gemma-3-4b-it-q4_0.gguf into the above directory
gemma-3-python/
├── gemma_proxy.py                   # Main server implementation
├── example_tools.py                 # General purpose tool implementations
├── tool_parser.py                   # Tool parsing utilities
├── utils/
│   ├── monitoring.py                # Performance monitoring
│   └── schema_validator.py          # JSON schema validation
│
├── # Pokemon Card Analysis Modules
├── pokemon_card_utils.py            # Shared utilities for Pokemon card tools
├── pokemon_card_analyzer.py         # Card analysis functionality
├── pokemon_card_annotator.py        # Card annotation tools
├── pokemon_card_extractor.py        # Graded card extraction tools
│
├── # Test Scripts
├── test_analyze_pokemon.py          # Test script for Pokemon card analysis
├── test_annotate_pokemon.py         # Test script for Pokemon card annotation
├── test_pokemon_smart_crop.py       # Test script for smart cropping Pokemon cards
├── test_graded_card_extraction.py   # Test script for graded card extraction
│
├── prompts/
│   └── system_prompt.md             # System prompt template
├── scripts/                         # Directory for Python scripts
│   └── sample_script.py             # Example script
├── tests/                           # Test directory
│   ├── __init__.py
│   ├── test_api.py                  # API endpoint tests
│   ├── test_functions.py            # Function calling tests
│   ├── test_model.py                # Model behavior tests
│   ├── test_tool_executor.py        # Tool execution tests
│   ├── test_tool_framework.py       # Tool framework tests
│   └── test_tool_parser.py          # Tool parsing tests
├── examples/                        # Example implementations
│   ├── sample_script.py
│   └── test.sh
├── docs/                            # Documentation and example images
│   ├── api.md                       # API documentation
│   ├── functions.md                 # Function calling guide
│   └── development.md               # Development guide
├── samples/                         # Sample data for testing
├── output/                          # Directory for test output (timestamped subdirectories)
│   └── ANNOTATED_CARDS--*           # Timestamped directories with annotation results
└── requirements.txt                 # Project dependencies
The server can be configured through environment variables or a config file:
- GEMMA_MODEL_PATH: Path to model file (default: gemma-3-4b-it-q4_0/gemma-3-4b-it-q4_0.gguf)
- GEMMA_SYSTEM_PROMPT: Path to system prompt (default: prompts/system_prompt.md)
- GEMMA_CONTEXT_SIZE: Model context size (default: 8192)
- GEMMA_NUM_THREADS: Number of inference threads (default: 4)
- GEMMA_API_HOST: API host address (default: 127.0.0.1)
- GEMMA_API_PORT: API port (default: 1337)
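For example, the defaults can be overridden by exporting the variables before starting the server (the values shown here are the documented defaults):

```shell
# Override server settings via environment variables (names from the list above)
export GEMMA_MODEL_PATH="gemma-3-4b-it-q4_0/gemma-3-4b-it-q4_0.gguf"
export GEMMA_CONTEXT_SIZE=8192
export GEMMA_NUM_THREADS=4
export GEMMA_API_HOST=127.0.0.1
export GEMMA_API_PORT=1337
```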
Start the server:
uv run python gemma3.py
The server will be available at http://127.0.0.1:1337/v1
- List Models
curl http://127.0.0.1:1337/v1/models
- Chat Completion
curl http://127.0.0.1:1337/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "gemma-3-4b-it",
"messages": [
{"role": "user", "content": "What is the current CPU usage?"}
],
"functions": [
{
"name": "get_system_info",
"description": "Get system information",
"parameters": {
"type": "object",
"properties": {
"info_type": {
"type": "string",
"enum": ["cpu", "memory", "disk", "all"]
}
},
"required": ["info_type"]
}
}
]
}'
The server supports an OpenAI-compatible function calling implementation with these built-in functions:
- execute_python - Execute Python scripts in a controlled environment
  - Parameters:
    - script_name: Name of the script to execute
    - arguments: Command line arguments (optional)
    - venv_path: Virtual environment path (optional)
- get_system_info - Retrieve system information
  - Parameters:
    - info_type: Type of information to retrieve ("cpu", "memory", "disk", or "all")
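On the client side, handling a function call returned by the server reduces to parsing the call and dispatching by name. A minimal sketch, following the OpenAI function-calling response shape (the `get_system_info` stub below is illustrative, not the server's real implementation):

```python
import json

# Illustrative stand-in for the server-side implementation
def get_system_info(info_type: str) -> dict:
    return {"info_type": info_type, "cpu_percent": 12.5}

DISPATCH = {"get_system_info": get_system_info}

def handle_function_call(message: dict) -> dict:
    """Execute the function named in an assistant message's function_call."""
    call = message["function_call"]
    args = json.loads(call["arguments"])  # arguments arrive as a JSON string
    return DISPATCH[call["name"]](**args)

# Example assistant message in the shape the server would return
message = {
    "role": "assistant",
    "function_call": {
        "name": "get_system_info",
        "arguments": '{"info_type": "cpu"}',
    },
}
result = handle_function_call(message)
```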
Custom functions can be added by implementing a function handler:
from utils.function_handler import register_function

@register_function
def custom_function(param1: str, param2: int) -> dict:
    """
    Function documentation
    """
    # Implementation
    return {"result": "value"}

The project includes several image processing tools as examples:
Analyze an image and return descriptions of its contents.
curl http://localhost:1338/v1/tools/analyze_image \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/image.jpg",
"analyze_objects": true,
"analyze_text": false
}'
Apply various filters and transformations to an image.
curl http://localhost:1338/v1/tools/apply_image_filter \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/image.jpg",
"filter_type": "grayscale"
}'
The project includes specialized tools for Pokemon card analysis, organized in modular files:
Analyze a Pokemon card image and extract information about its contents.
curl http://localhost:1338/v1/tools/analyze_pokemon_card \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/pokemon_card.jpg",
"analyze_objects": true,
"analyze_text": false
}'
Intelligently crops an image to specified dimensions while focusing on the most important area.
curl http://localhost:1338/v1/tools/smart_crop_image \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/image.jpg",
"target_width": 300,
"target_height": 400,
"focus_area": "center"
}'
Identify and annotate parts of a Pokemon card with bounding boxes and labels.
curl http://localhost:1338/v1/tools/annotate_pokemon_card \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/pokemon_card.jpg",
"label_color": "red",
"box_type": "rectangle"
}'
Supported options:
- label_color: red, green, blue, yellow, white, black
- box_type: rectangle, circle
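Under the hood, annotation of this kind reduces to drawing primitives on the image. A minimal sketch with Pillow, assuming a detected bounding box and label (illustrative only; the actual logic in pokemon_card_annotator.py may differ):

```python
from PIL import Image, ImageDraw

def annotate_box(image, box, label, color="red", box_type="rectangle"):
    """Draw a labeled bounding box (or ellipse) on a copy of the image."""
    annotated = image.copy()
    draw = ImageDraw.Draw(annotated)
    if box_type == "circle":
        draw.ellipse(box, outline=color, width=3)
    else:
        draw.rectangle(box, outline=color, width=3)
    # Place the label just above the box, clamped to the image
    draw.text((box[0], max(0, box[1] - 12)), label, fill=color)
    return annotated

# Example: mark an artwork region on a blank placeholder card
card = Image.new("RGB", (300, 420), "white")
result = annotate_box(card, (20, 50, 280, 220), "artwork", color="red")
```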
Extracts a Pokemon card and grade label from a graded card case (PSA, BGS, CGC, etc.).
curl http://localhost:1338/v1/tools/extract_graded_card \
-H "Content-Type: application/json" \
-d '{
"image_url": "https://example.com/graded_pokemon_card.jpg"
}'
This tool performs the following operations:
- Detects if a Pokemon card is in a professional grading case
- Extracts just the card itself by cropping away the plastic case
- Separately extracts the grade label showing the numerical grade
- Identifies the grading company (PSA, BGS, CGC) when possible
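The crop geometry behind steps two and three can be sketched with plain arithmetic: inset the detected case bounding box to remove the plastic border, then slice off a top strip for the grade label. The border and label fractions below are illustrative assumptions, not the tool's actual values:

```python
def split_graded_case(box, border_frac=0.06, label_frac=0.18):
    """Given a case bounding box (left, top, right, bottom), return
    (label_box, card_box) crop rectangles in the same coordinates."""
    left, top, right, bottom = box
    w, h = right - left, bottom - top
    bx, by = int(w * border_frac), int(h * border_frac)
    # Grade label occupies a strip at the top of the case
    label_box = (left + bx, top + by, right - bx, top + int(h * label_frac))
    # The card sits below the label, inside the plastic border
    card_box = (left + bx, top + int(h * label_frac) + by, right - bx, bottom - by)
    return label_box, card_box

label_box, card_box = split_graded_case((0, 0, 600, 1000))
```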
The following examples demonstrate how to use the Pokemon card analysis tools in practice. The examples use the test scripts included in the repository.
This example shows a complete workflow for analyzing Pokemon cards:
- Run the test script:
source .venv/bin/activate
python test_annotate_pokemon.py
- Results Overview:
The script processes both front and back Pokemon card images and generates annotated versions with different styles. Results are saved to a timestamped directory (e.g., output/ANNOTATED_CARDS--2025-05-15-11-51-37/).
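The snippets below rely on two small helpers defined in the test scripts, image_to_data_uri and save_data_uri_to_file. A minimal sketch of what they might look like (the repository's versions may differ):

```python
import base64
import mimetypes

def image_to_data_uri(path: str) -> str:
    """Read an image file and return it as a base64 data URI."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"

def save_data_uri_to_file(data_uri: str, path: str) -> None:
    """Decode a base64 data URI and write the bytes to a file."""
    _, _, payload = data_uri.partition("base64,")
    with open(path, "wb") as f:
        f.write(base64.b64decode(payload))
```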
docs/pokenmon-card-front.webp
# Python code example for front card annotation
import httpx

async def annotate_front_card():
    image_path = "docs/pokenmon-card-front.webp"
    image_data_uri = image_to_data_uri(image_path)
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/annotate_pokemon_card",
            json={
                "image_url": image_data_uri,
                "label_color": "red",
                "box_type": "rectangle"
            }
        )
    # The response contains the annotated image as a data URI
    annotated_image_uri = response.text.strip('"')
    # Save or display the annotated image
    output_path = "output/annotated_front_card.png"
    save_data_uri_to_file(annotated_image_uri, output_path)

docs/pokenmon-card-back.webp.webp
# Python code example for back card annotation
import httpx

async def annotate_back_card():
    image_path = "docs/pokenmon-card-back.webp.webp"
    image_data_uri = image_to_data_uri(image_path)
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/annotate_pokemon_card",
            json={
                "image_url": image_data_uri,
                "label_color": "green",
                "box_type": "rectangle"
            }
        )
    # Process response...

# Python code example for smart cropping
import httpx

async def smart_crop_pokemon_card():
    image_path = "docs/pokenmon-card-front.webp"
    image_data_uri = image_to_data_uri(image_path)
    # Crop to focus on just the Pokemon image in the center
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/smart_crop_image",
            json={
                "image_url": image_data_uri,
                "target_width": 300,
                "target_height": 300,
                "focus_area": "center"
            }
        )
    # The response contains the cropped image as a data URI
    cropped_image_uri = response.text.strip('"')
    # Save or display the cropped image
    output_path = "output/cropped_pokemon.png"
    save_data_uri_to_file(cropped_image_uri, output_path)

This example shows how to combine analysis, cropping, and annotation in a single workflow:
import httpx

async def complete_pokemon_card_analysis(image_path):
    """Perform complete analysis of a Pokemon card"""
    image_data_uri = image_to_data_uri(image_path)
    results = {}

    # Step 1: Analyze the image
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/analyze_image",
            json={
                "image_url": image_data_uri,
                "analyze_objects": True
            }
        )
    results["analysis"] = response.json()

    # Step 2: Smart crop the image
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/smart_crop_image",
            json={
                "image_url": image_data_uri,
                "target_width": 400,
                "target_height": 400,
                "focus_area": "center"
            }
        )
    cropped_image_uri = response.text.strip('"')
    results["cropped_image"] = cropped_image_uri

    # Step 3: Annotate the card
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/annotate_pokemon_card",
            json={
                "image_url": image_data_uri,
                "label_color": "red",
                "box_type": "rectangle"
            }
        )
    annotated_image_uri = response.text.strip('"')
    results["annotated_image"] = annotated_image_uri

    return results

The following example demonstrates how to process graded Pokemon cards to extract the card from its case and the grade label for analysis:
import json

import httpx

async def process_graded_card(image_path):
    """Extract a card and grade label from a graded Pokemon card case"""
    image_data_uri = image_to_data_uri(image_path)
    async with httpx.AsyncClient() as client:
        response = await client.post(
            "http://localhost:1338/v1/tools/extract_graded_card",
            json={
                "image_url": image_data_uri
            }
        )
    result = json.loads(response.text)

    if result["is_graded_card"]:
        print(f"Detected graded card from {result['grade_type']}")
        # Save the extracted card
        card_path = "output/extracted_card.png"
        save_data_uri_to_file(result["card_image"], card_path)
        # Save the extracted grade label
        if result["grade_label_image"]:
            label_path = "output/grade_label.png"
            save_data_uri_to_file(result["grade_label_image"], label_path)
            # In a production application, you could now run OCR on the grade label
            # to extract the numerical grade
    else:
        print("This doesn't appear to be a graded card")

Running the test script creates output in the following format:
output/
├── GRADED_CARDS--2025-05-15-12-12-55/
│ ├── pokemon-card_visualization.png # Visualization of detection results
│ ├── pokemon-card_card.png # Extracted card without case
│ ├── pokemon-card_grade_label.png # Extracted grade label
│ ├── summary.json # Metadata about processed images
│ └── test_script.py # Copy of the test script used
The test scripts organize outputs into timestamped directories for easy reference:
output/
├── ANNOTATED_CARDS--2025-05-15-11-51-37/
│ ├── pokenmon-card-front_annotated_red_rectangle.png
│ ├── pokenmon-card-front_annotated_green_rectangle.png
│ ├── pokenmon-card-front_annotated_blue_circle.png
│ ├── pokenmon-card-front_annotated_yellow_circle.png
│ ├── pokenmon-card-back.webp_annotated_red_rectangle.png
│ ├── pokenmon-card-back.webp_annotated_green_rectangle.png
│ ├── pokenmon-card-back.webp_annotated_blue_circle.png
│ ├── pokenmon-card-back.webp_annotated_yellow_circle.png
│ ├── summary.json
│ └── test_script.py
Each run also creates a summary.json file with metadata about all processed images.
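The timestamped layout and summary file can be reproduced in a few lines. A sketch of how a test script might create the directory and write summary.json (names follow the output shown above; implementation details are illustrative):

```python
import json
import os
from datetime import datetime

def make_output_dir(prefix="ANNOTATED_CARDS", root="output"):
    """Create a timestamped directory like output/ANNOTATED_CARDS--2025-05-15-11-51-37/."""
    stamp = datetime.now().strftime("%Y-%m-%d-%H-%M-%S")
    path = os.path.join(root, f"{prefix}--{stamp}")
    os.makedirs(path, exist_ok=True)
    return path

def write_summary(out_dir, records):
    """Write metadata about all processed images to summary.json."""
    summary_path = os.path.join(out_dir, "summary.json")
    with open(summary_path, "w") as f:
        json.dump({"images": records}, f, indent=2)
    return summary_path
```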
Run the benchmark suite:
python -m benchmarks.benchmark_runner

This will test:
- Model loading time
- Inference latency
- Memory usage
- Function calling overhead
- Response streaming performance
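A latency measurement of the kind the suite performs can be sketched with time.perf_counter; the timed lambda below is a stand-in for a real inference call:

```python
import statistics
import time

def benchmark(fn, *args, runs: int = 5) -> dict:
    """Time fn over several runs and report mean/min latency in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        timings.append(time.perf_counter() - start)
    return {"mean_s": statistics.mean(timings), "min_s": min(timings), "runs": runs}

# Stand-in workload instead of a real model inference call
stats = benchmark(lambda: sum(i * i for i in range(10_000)))
```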
Run the test suite:
pytest tests/

See Development Guide for:
- Code style guide
- Pull request process
- Testing guidelines
- Documentation requirements
MIT License - See LICENSE file for details
- Fork the repository
- Create your feature branch
- Make your changes
- Run tests and benchmarks
- Submit a pull request
See Contributing Guide for details.


