awslabs/data-on-eks
DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
HCL · Apache-2.0
Pinned issues
Issues
Observability for rayserve-vllm pattern using Prom and Grafana dashboards.
#586 opened by shivam-dubey-1 - 2
Mistral 7B with vLLM, Ray Serve on Trn/Inf
#650 opened by askulkarni2 - 1
OpenMetadata on EKS
#610 opened by uniyati - 0
Cloudwatch metrics for jark stack
#645 opened by shivam-dubey-1 - 2
RAG pattern with LangChain or LlamaIndex
#567 opened by vara-bonthu - 0
LLM Gateway Using LiteLLM
#632 opened by vara-bonthu - 3
Create blueprint using S3 mountpoint for external jar files to run Spark workloads
#598 opened by ratnopamc - 0
YOLO with NVIDIA Triton Inference Server
#639 opened by freschri - 2
Using S3 MountPoint CSI driver and S3 Express one zone to expedite ML workloads
#601 opened by meetreks - 0
Support localization for the DoEKS website
#634 opened by hitsub2 - 6
[Feature] Include the Apache DolphinScheduler on Amazon EKS solution into unified repository maintenance
#621 opened by SEZ9 - 3
Deploy Starrocks on EKS via Terraform
#585 opened by seanmeng2022 - 4
[jupyterhub] Jupyter notebook is not launching
#566 opened by purnasanyal - 2
feat: Inference using vLLM with RayServe on Inf2
#591 opened by ratnopamc - 3
Aerospike Blueprint
#597 opened by chrismld - 2
Binpacking support for DoEKS
#614 opened by hitsub2 - 1
[ALL] Migrate all blueprints to use Karpenter v1
#611 opened by askulkarni2 - 1
Error: failed to create containerd task: failed to create shim task: OCI runtime create failed
#557 opened by pythonking6 - 3
StableDiffusion w/ RayServe on inf2 broken
#604 opened by askulkarni2 - 0
Add CloudFlare Analytics to website
#593 opened by linesa-dot - 1
JARK Stack - Error while launching training step in the dogbooth Jupyter notebook
#537 opened by rivasdam - 3
Support S3 gateway endpoint for EMR on EKS
#553 opened by hitsub2 - 2
feat: Observability Tooling
#519 opened by omrishiv - 2
Jupyterhub with OIDC Example
#576 opened by omrishiv - 0
Implement custom metric based scaling of GPUs for NVIDIA Triton vLLM pattern
#569 opened by ratnopamc - 2
NVIDIA NIM LLM Hosting Pattern
#560 opened by hustshawn - 1
Elyra feature implementation to Jhub environment
#572 opened by halilmr - 2
vLLM with RayServe pattern
#547 opened by shivam-dubey-1 - 3
Make it possible to disable kuberay-operator
#534 opened by askulkarni2 - 2
[Website] Add Scalability Best Practices & Considerations for DoEKS Workloads
#532 opened by brianhammons - 5
Failing to schedule pod with default configuration
#529 opened by JM322 - 3
Chore: Kubernetes cluster version upgrades
#520 opened by raykrueger - 0
Ray Logging and Dashboard Metrics Export to S3 with Custom Dashboard for Historical Clusters
#552 opened by vara-bonthu - 0
Ray Observability with Prometheus and AMP
#551 opened by vara-bonthu - 2
Incorrect command to provide Linux permission on the AWS Trainium on EKS Blueprint
#533 opened by AbrahamArellano - 0
Re-introduce plan-examples.yml with a proper fix
#525 opened by askulkarni2