awslabs/data-on-eks
DoEKS is a tool to build, deploy and scale Data & ML Platforms on Amazon EKS
HCL · Apache-2.0
Issues
- 5
spark-k8s-operator requires the awscli, which doesn't work on terraform enterprise
#513 opened by dacort - 3
Bump EFA Plugin to v0.5.0
#503 opened by lindarr915 - 0
- 2
Incorrect command to provide Linux permission on the AWS Trainium on EKS Blueprint
#533 opened by AbrahamArellano - 1
- 2
Failing to schedule pod with default configuration
#529 opened by JM322 - 0
JARK Stack - Error while launching training step in the dogbooth Jupyter notebook
#537 opened by rivasdam - 0
Make it possible to disable kuberay-operator
#534 opened by askulkarni2 - 3
- 0
[Website] Add Scalability Best Practices & Considerations for DoEKS Workloads
#532 opened by brianhammons - 1
Data on EKS does not support Kubeflow Platform
#483 opened by ajayvohra2005 - 1
- 1
Appropriate attribution?
#475 opened by asmacdo - 3
- 3
VPC Endpoints not working for multiple blueprints
#442 opened by bbgu1 - 0
[Inference]: Llama3 on Inf2 with Trainium-inferentia blueprint with RayServe
#517 opened by askulkarni2 - 0
Re-introduce plan-examples.yml with a proper fix
#525 opened by askulkarni2 - 1
Chore: Kubernetes cluster version upgrades
#520 opened by raykrueger - 3
Doc: Update DoEKS website for Ray content
#445 opened by askulkarni2 - 0
feat: Observability Tooling
#519 opened by omrishiv - 0
- 5
Add support for AWS Batch
#456 opened by delagoya - 0
[Inference]: Mistral7B on Inf2 with Trainium-inferentia blueprint with RayServe
#498 opened by vara-bonthu - 1
Spark Operator example with S3 express
#455 opened by vara-bonthu - 2
Add JARK Stack documentation into DoEKS Website
#468 opened by lusoal - 0
- 0
Move Trainium on EKS from under Blueprints to Gen AI -> Training -> BERT-Large on Trainium section
#488 opened by sheetaljoshi - 0
deploy gradio app for llama2 on inf2/ray to k8s
#495 opened by harishvs - 0
Add temperature, topk, topk and other input params to UI for llama2 gradio application on inf2/ray cluster
#493 opened by harishvs - 0
- 1
Test horizontal scaling of Ray Worker Pods
#449 opened by ratnopamc - 2
Enable JARK workshop to run on workshop studio
#443 opened by askulkarni2 - 1
Inf2 Worker Node Groups has multiple taints
#478 opened by lindarr915 - 2
Kueue with Ray
#444 opened by askulkarni2 - 6
- 4
failed calling webhook "mservice.elbv2.k8s.aws"
#458 opened by mayurbhagia - 0
Add model caching for Stable Diffusion
#448 opened by ratnopamc - 0
- 2
GradioUI App as a container deployment for trainium-inferentia blueprint examples
#454 opened by vara-bonthu - 0