GPT-5 API Testing Dashboard - Python & Streamlit

A comprehensive Streamlit-based testing dashboard for OpenAI's GPT-5 API, providing systematic evaluation of all model variants, advanced parameter testing, and detailed performance analytics.

Read the comprehensive blog post: GPT-5 API Testing: Building GPT-5 API Testing Dashboard with Streamlit.


🚀 Features

Core Testing Capabilities

  • Multi-Model Support: Test all GPT-5 variants (gpt-5, gpt-5-mini, gpt-5-nano, gpt-5-chat-latest)
  • Advanced Parameter Testing: Reasoning effort, verbosity controls, and custom tools
  • Real-Time Progress Tracking: Live updates during API calls with detailed status information
  • Comprehensive Error Handling: Robust response parsing with fallback strategies

Testing Categories

  • Basic Tests: Connectivity validation and model comparison
  • New Features: Reasoning effort and verbosity parameter exploration
  • Capabilities: Coding tests, instruction following, and factual accuracy
  • Performance: Speed comparison and token efficiency analysis

Analytics & Visualization

  • Interactive Dashboard: Real-time performance metrics and trend analysis
  • Advanced Charts: Plotly-powered visualizations for response time, token usage, and efficiency
  • Export Options: JSON export for test results
  • Session Management: Persistent result tracking across testing sessions

📦 Installation

Prerequisites

  • Python 3.8 or higher
  • OpenAI API key with GPT-5 access
  • 4GB+ available RAM for optimal performance

Quick Start

  1. Save the code: save the provided gpt-5-testing.py file to your local directory

  2. Create a virtual environment

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install streamlit openai pandas plotly python-dotenv requests
  4. Set up environment variables (optional)

    export OPENAI_API_KEY=your_openai_api_key_here
  5. Run the application

    streamlit run gpt-5-testing.py

The dashboard will open in your browser at http://localhost:8501

🔧 Configuration

API Key Setup

You can provide your OpenAI API key in two ways, as sketched below:

  1. Environment variable: Set OPENAI_API_KEY in your environment
  2. Direct input: Enter your key in the sidebar when running the app
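
A minimal sketch of how the two options might be combined in the sidebar; the widget labels and variable names here are illustrative, not the exact code in gpt-5-testing.py:

    import os

    import streamlit as st
    from dotenv import load_dotenv
    from openai import OpenAI

    load_dotenv()  # pick up OPENAI_API_KEY from a local .env file, if present

    # Prefer the environment variable; otherwise fall back to a password-type
    # sidebar field so the key is never echoed on screen or written to disk.
    api_key = os.getenv("OPENAI_API_KEY") or st.sidebar.text_input(
        "OpenAI API Key", type="password"
    )

    if api_key:
        client = OpenAI(api_key=api_key)
        st.sidebar.success("API key loaded")
    else:
        st.sidebar.warning("Enter an API key to run tests")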

Optional Environment Variables

export OPENAI_API_KEY=your_openai_api_key_here
export LOG_LEVEL=INFO
export MAX_CONCURRENT_TESTS=3
export RESULTS_RETENTION_DAYS=7
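
A short sketch of how these optional values might be read with defaults (the defaults shown simply mirror the example values above; the actual handling in gpt-5-testing.py may differ):

    import os

    log_level = os.getenv("LOG_LEVEL", "INFO")                          # logging verbosity
    max_concurrent_tests = int(os.getenv("MAX_CONCURRENT_TESTS", "3"))
    results_retention_days = int(os.getenv("RESULTS_RETENTION_DAYS", "7"))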

📊 Usage Guide

Available Test Categories

Basic Tests

  • Basic Connectivity: Validates API access across all GPT-5 models
  • Model Comparison: Side-by-side performance analysis

New Features

  • Reasoning Effort: Tests minimal, low, medium, and high reasoning levels
  • Verbosity Controls: Compares low, medium, and high verbosity outputs
  • Custom Tools: (Framework provided for future implementation)
  • Allowed Tools: (Framework provided for future implementation)

Capabilities

  • Coding Tests: Frontend generation, bug fixing, SQL generation, algorithm implementation
  • Instruction Following: (Framework provided)
  • Factual Accuracy: (Framework provided)

Performance

  • Speed Comparison: (Framework provided)
  • Token Efficiency: (Framework provided)

Custom Testing

The custom test tab (see the API call sketch after this list) allows you to:

  • Select any GPT-5 model variant
  • Configure reasoning effort (minimal, low, medium, high)
  • Set verbosity levels (low, medium, high)
  • Specify custom test names
  • Set maximum output tokens
  • Enter custom prompts
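
For reference, a hedged sketch of what a single custom test call might look like against the Responses API; the reasoning-effort and verbosity parameters follow OpenAI's documented GPT-5 options, but the exact call inside gpt-5-testing.py may be structured differently:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # Illustrative values matching the options exposed in the custom test tab.
    response = client.responses.create(
        model="gpt-5-mini",                  # any GPT-5 variant
        input="Explain the two-pointer technique with a short example.",
        reasoning={"effort": "minimal"},     # minimal | low | medium | high
        text={"verbosity": "low"},           # low | medium | high
        max_output_tokens=800,
    )

    print(response.output_text)  # convenience accessor for the generated text
    print(response.usage)        # input/output token counts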

Results Dashboard

View comprehensive analytics (see the sketch after this list), including:

  • Total tests run and success rates
  • Average response times and token usage
  • Interactive charts showing performance trends
  • Detailed results table with filtering options
  • JSON export functionality
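
A minimal sketch of how these analytics might be assembled from in-session results; the column names (success, response_time, model, test_name) are assumptions used for illustration:

    import json

    import pandas as pd
    import plotly.express as px
    import streamlit as st

    # Each completed test is assumed to be stored as a dict in session state.
    results = st.session_state.get("test_results", [])

    if results:
        df = pd.DataFrame(results)

        # Headline metrics.
        st.metric("Tests run", len(df))
        st.metric("Success rate", f"{df['success'].mean():.0%}")

        # Response-time comparison rendered with Plotly.
        fig = px.bar(df, x="test_name", y="response_time", color="model",
                     labels={"response_time": "Response time (s)"})
        st.plotly_chart(fig, use_container_width=True)

        # JSON export of the raw results.
        st.download_button("Export results (JSON)",
                           data=json.dumps(results, indent=2, default=str),
                           file_name="gpt5_test_results.json",
                           mime="application/json")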

🏗️ Code Structure

Main Components

gpt-5-testing.py                 # Complete Streamlit application (single file)

Key Classes and Functions

  • GPT5StreamlitTester: Core testing framework class
  • TestResult: Dataclass for storing test outcomes
  • make_responses_api_call(): Handles GPT-5 Responses API integration
  • extract_response_content(): Multi-layered response parsing
  • display_result(): Formats and displays test results
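
For orientation, a condensed sketch of how TestResult and the layered content extraction could be shaped; field names and the fallback order are assumptions rather than a copy of the real implementation:

    from dataclasses import dataclass, field
    from typing import Any, Dict, Optional

    @dataclass
    class TestResult:
        """Outcome of a single API test (illustrative fields)."""
        test_name: str
        model: str
        success: bool
        response_time: float               # seconds
        content: str = ""
        input_tokens: int = 0
        output_tokens: int = 0
        parameters: Dict[str, Any] = field(default_factory=dict)
        error: Optional[str] = None

    def extract_response_content(response: Any) -> str:
        """Multi-layered extraction: try the convenience accessor first,
        then walk the structured output, then fall back to repr for debugging."""
        text = getattr(response, "output_text", None)
        if text:
            return text
        for item in getattr(response, "output", []) or []:
            for part in getattr(item, "content", []) or []:
                if getattr(part, "text", None):
                    return part.text
        return repr(response)  # last resort: surface the raw object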

📈 Performance Metrics Tracked

Metric             Description
Response Time      API call latency in seconds
Token Usage        Input/output token consumption
Success Rate       Percentage of successful API calls
Word Count         Length of generated responses
Test Parameters    All API parameters used for each test

🔒 Security Features

  • Password-type input for API keys (not stored permanently)
  • Session-based API key management
  • No persistent storage of sensitive data
  • Built-in error handling for API failures

🐛 Troubleshooting

Common Issues

"Required packages not installed" Error

pip install openai requests python-dotenv streamlit plotly pandas

"Failed to create OpenAI client" Error

  • Verify your OpenAI API key is correct
  • Check that you have GPT-5 access enabled
  • Ensure your account has sufficient credits

"Response received but content extraction failed" Warning

  • This indicates a response parsing issue
  • The raw response will be displayed for debugging
  • Try different model variants or simpler prompts

Empty Results or Slow Performance

  • Check your internet connection
  • Verify API rate limits aren't exceeded
  • Clear test results using the sidebar button

💡 Usage Tips

  1. Start with Basic Connectivity to verify your setup
  2. Use Model Comparison to understand performance differences
  3. Test Reasoning Effort with complex problems to see the impact
  4. Experiment with Verbosity for different response lengths
  5. Export Results regularly to track performance over time
  6. Clear Results periodically to maintain performance

📝 Example Test Scenarios

Reasoning Effort Test

Default prompt: "You have 12 balls, one of which is either heavier or lighter than the others. Using a balance scale exactly 3 times, how can you identify the odd ball and determine if it's heavier or lighter?"

Verbosity Test

Default prompt: "Explain how HTTPS encryption works and why it's important for web security"

Coding Tests

  • Frontend Generation: React component with TypeScript and Tailwind CSS
  • Bug Fixing: Python function debugging
  • SQL Generation: Complex database queries
  • Algorithm Implementation: Dynamic programming solutions

🔄 Session Management

  • Test results persist during your browser session
  • Use "Clear All Results" in the sidebar to reset
  • Session data is not saved between browser sessions
  • Export important results before closing the application
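
This behaviour follows directly from Streamlit's session state; a minimal sketch (key names illustrative):

    import streamlit as st

    # Results live in st.session_state, so they survive reruns within one
    # browser session but are lost when the tab is closed.
    if "test_results" not in st.session_state:
        st.session_state["test_results"] = []

    if st.sidebar.button("Clear All Results"):
        st.session_state["test_results"] = []
        st.sidebar.info("Results cleared")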

📄 Notes

  • This is a single-file Streamlit application
  • All functionality is contained in gpt-5-testing.py
  • No external configuration files are required
  • API keys are handled securely without persistent storage

Single-file GPT-5 testing solution for comprehensive API evaluation and performance analysis.