Created for GKE Turns 10 Hackathon #GKETurns10 #GKEHackathon
The GKE Intelligent Monitoring System is a monitoring and management solution that combines Google Kubernetes Engine (GKE) with AI-driven insights using Google's Agent Development Kit (ADK) and the Model Context Protocol (MCP). It provides intelligent monitoring, automated troubleshooting, and proactive management capabilities for GKE clusters.
Intelligent Cluster Monitoring
- Real-time monitoring of GKE cluster health
- Pod lifecycle management and status tracking
- Service and deployment monitoring
- Resource utilization insights
AI-Powered Troubleshooting
- Automated problem detection and diagnosis
- Intelligent remediation suggestions
- Predictive analytics for potential issues
- Network connectivity analysis
Advanced Management Capabilities
- Dynamic scaling of deployments
- YAML manifest management
- Pod execution and log analysis
- Security context management
Integration Capabilities
- Seamless integration with GKE clusters
- Support for Prometheus metrics
- Cloud Monitoring integration
- Custom tooling support through MCP
Core tools
- Discoverable tool catalog via GET /tools on the MCP server.
- Tool proxy endpoints POST /tool/<name> accepting JSON { "args": [...], "kwargs": {...} } and returning { "ok": true, "result": ... } or { "ok": false, "error": "..." }.
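As a rough illustration of the request and response shapes described above, the sketch below builds a proxy call and unwraps the reply using only the Python standard library. The helper names build_tool_request, unwrap_result, and call_tool, the ToolError class, and the example base URL are assumptions made for illustration; they are not taken from the project.

```python
import json
import urllib.request
from typing import Any


class ToolError(RuntimeError):
    """Raised when the server replies with {"ok": false, "error": "..."}."""


def build_tool_request(base_url: str, name: str, *args: Any, **kwargs: Any) -> urllib.request.Request:
    """Build a POST /tool/<name> request carrying {"args": [...], "kwargs": {...}}."""
    body = json.dumps({"args": list(args), "kwargs": kwargs}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/tool/{name}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unwrap_result(envelope: dict) -> Any:
    """Return the "result" field, or raise ToolError with the "error" text."""
    if envelope.get("ok"):
        return envelope.get("result")
    raise ToolError(envelope.get("error", "unknown error"))


def call_tool(base_url: str, name: str, *args: Any, **kwargs: Any) -> Any:
    """Send the request and unwrap the JSON reply (needs a reachable server)."""
    request = build_tool_request(base_url, name, *args, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return unwrap_result(json.load(response))
```

With a server running, call_tool("http://localhost:8080", "list_pods", namespace="default") would return the tool's result or raise ToolError. The namespace keyword and the port are hypothetical; the actual arguments each tool accepts should be taken from the GET /tools catalog.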
Core tools included:
- get_cluster_info - Get basic cluster information, node status, and health
- list_pods - List pods with status, resource usage, and readiness
- get_pod_logs - Retrieve pod logs for troubleshooting
- describe_pod - Get detailed pod information and events
- get_service_status - Check service endpoints and networking
- get_deployment_status - Monitor deployment health and replica status
- delete_resource - Delete a Kubernetes resource (deployment, service, pod, etc.)
- suggest_troubleshooting - AI-powered troubleshooting recommendations
- automate remediation - Image pull remediation analysis for a pod
- get_gke_cluster_metrics - GKE-specific performance metrics
- scale deployment - Scale a deployment to a specific number of replicas
- exec pod command - Execute a command inside a pod container
- network_connectivity_test - Test network connectivity and DNS resolution
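To make the kind of output a tool such as list_pods reports more concrete, here is a minimal sketch that condenses a Pod object (in its JSON/dict form) into phase, readiness, and restart count. The fields it reads (status.phase, status.containerStatuses[].ready, and restartCount) are standard Pod API fields; the helper name summarize_pod and the shape of its return value are assumptions, not the project's actual implementation.

```python
from typing import Any


def summarize_pod(pod: dict[str, Any]) -> dict[str, Any]:
    """Condense a Pod dict into name, namespace, phase, readiness, and restarts."""
    metadata = pod.get("metadata", {})
    status = pod.get("status", {})
    # containerStatuses is absent until containers have been created.
    containers = status.get("containerStatuses") or []
    ready = sum(1 for c in containers if c.get("ready"))
    return {
        "name": metadata.get("name"),
        "namespace": metadata.get("namespace"),
        "phase": status.get("phase", "Unknown"),
        "ready": f"{ready}/{len(containers)}",
        "restarts": sum(c.get("restartCount", 0) for c in containers),
    }
```

Keeping the summary a pure function of the Pod dict makes it easy to unit test without a cluster.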
Technologies used
- Google Kubernetes Engine (GKE)
- Model Context Protocol (MCP) server via mcp.server.fastmcp
- Google ADK (google.adk.agents.LlmAgent) for the conversational agent
- Python 3.11, kubernetes Python client, httpx, requests
- Docker + Artifact Registry (or GCR) and Cloud Build for CI
- kubectl manifests and RBAC for in-cluster deployment
Data sources
- Kubernetes API (in-cluster via ServiceAccount or kubeconfig)
- Google Cloud Project metadata when configured (GCP_PROJECT_ID, cluster name/zone)
- Optional metrics-server for resource metrics (node/pod top)
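The choice between in-cluster ServiceAccount credentials and a local kubeconfig can be sketched as follows. The detection relies on the KUBERNETES_SERVICE_HOST variable and the ServiceAccount token path that Kubernetes provides inside pods, and on the kubernetes client's load_incluster_config and load_kube_config functions. The helper names detect_auth_mode and load_kubernetes_config are assumptions for illustration; the project may select its configuration differently.

```python
import os
from pathlib import Path
from typing import Mapping

# Standard location of the ServiceAccount token mounted into pods.
SA_TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def detect_auth_mode(env: Mapping[str, str] = os.environ, token_path: Path = SA_TOKEN_PATH) -> str:
    """Return "in-cluster" when running inside a pod, otherwise "kubeconfig"."""
    if env.get("KUBERNETES_SERVICE_HOST") and token_path.exists():
        return "in-cluster"
    return "kubeconfig"


def load_kubernetes_config() -> str:
    """Load client configuration for whichever mode was detected."""
    # Imported lazily so the detection helper works without the package installed.
    from kubernetes import config

    mode = detect_auth_mode()
    if mode == "in-cluster":
        config.load_incluster_config()
    else:
        config.load_kube_config()
    return mode
```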
The system consists of three main components:
- ADK Agent - Handles AI-driven interactions and decision-making
- MCP Server - Provides tooling and Kubernetes management capabilities
- GKE Integration - Direct interface with GKE clusters and resources
graph TB
subgraph "User Interaction"
UI[User/Client]
end
subgraph "ADK Layer"
ADK[ADK Agent]
FASTAPI[FastAPI Server]
end
subgraph "MCP Layer"
MCP[MCP Server]
Tools[K8s Tools]
Health[Health Check]
Monitor[Monitoring Tools]
end
subgraph "GKE Cluster"
API[Kubernetes API]
Pods[Pods]
Services[Services]
Deploy[Deployments]
end
subgraph "Google Cloud"
Monitoring[Cloud Monitoring]
Logging[Cloud Logging]
end
UI --> FASTAPI
FASTAPI --> ADK
ADK --> MCP
MCP --> Tools
MCP --> Health
MCP --> Monitor
Tools --> API
Health --> API
Monitor --> API
Monitor --> Monitoring
API --> Pods
API --> Services
API --> Deploy
Pods --> Logging
Services --> Logging
Deploy --> Logging
classDef gcp fill:#4285F4,stroke:#4285F4,color:white;
classDef k8s fill:#326CE5,stroke:#326CE5,color:white;
classDef adk fill:#34A853,stroke:#34A853,color:white;
classDef mcp fill:#EA4335,stroke:#EA4335,color:white;
class Monitoring,Logging gcp;
class API,Pods,Services,Deploy k8s;
class ADK,FASTAPI adk;
class MCP,Tools,Health,Monitor mcp;
Deployment steps:
- Build images (Cloud Build or docker build/push).
- Apply RBAC and Deployment manifests: kubectl apply -f k8s-manifests/k8s-mcp-rbac.yaml and kubectl apply -f k8s-manifests/k8s-mcp-deployment.yaml.
- Deploy ADK agent: kubectl apply -f deployment.yaml.
- Monitor logs: kubectl logs -f deployment/k8s-mcp-server and kubectl logs -f deployment/adk-agent.
Key environment variables:
- GCP_PROJECT_ID - Your Google Cloud project ID
- GKE_CLUSTER_NAME - Target GKE cluster name
- GKE_ZONE - GKE cluster zone/region
- MCP_SERVICE_URL - MCP server endpoint
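One way to read these variables in Python is sketched below. The Settings dataclass and the load_settings and missing helpers are hypothetical names. Because the README does not say which variables are mandatory, the sketch treats every value as optional and only reports which ones are unset.

```python
import os
from dataclasses import asdict, dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    gcp_project_id: Optional[str]
    gke_cluster_name: Optional[str]
    gke_zone: Optional[str]
    mcp_service_url: Optional[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables; unset or empty values become None."""
    return Settings(
        gcp_project_id=env.get("GCP_PROJECT_ID") or None,
        gke_cluster_name=env.get("GKE_CLUSTER_NAME") or None,
        gke_zone=env.get("GKE_ZONE") or None,
        mcp_service_url=env.get("MCP_SERVICE_URL") or None,
    )


def missing(settings: Settings) -> list[str]:
    """List the field names that have no value, in declaration order."""
    return [name for name, value in asdict(settings).items() if value is None]
```

A caller can then decide which gaps are fatal, for example refusing to start the agent when mcp_service_url is missing while tolerating an absent project ID.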
The project uses Google's ADK LlmAgent, which uses the Gemini 2.0 Flash model through Vertex AI. Important points about the AI configuration:
- Vertex AI authentication: The LlmAgent requires a Vertex AI API key, which for security reasons is deliberately not set through environment variables. Instead, it should be configured through Google Cloud's secret management system.
The system uses Kubernetes manifests for deployment, including:
- Deployment configurations
- RBAC settings
- Service accounts
- Network policies
During the development of this project for the GKE Turns 10 Hackathon, several key insights were gained:
- GKE's robust API and integration capabilities make it an ideal platform for building intelligent monitoring solutions
- Combining ADK with Kubernetes operations enables sophisticated automation and decision-making
- MCP provides a flexible framework for extending monitoring capabilities
- Real-time monitoring with AI-driven insights can significantly improve cluster management
Future enhancements:
- Enhanced predictive analytics for resource scaling
- Machine learning models for anomaly detection
- Extended automation capabilities
- Integration with additional Google Cloud services
This project was created and adapted for entry in the GKE Turns 10 Hackathon. Contributions and improvements are welcome. #GKEHackathon
MIT License
