georgian-io-archive/hydra
A cloud-agnostic ML platform that enables data scientists to run multiple experiments, perform hyperparameter optimization, evaluate results, and serve models (batch or real-time) while maintaining a uniform development UX across cloud environments.
HCL, Apache-2.0
Issues
Parallelize the job scheduling
#88 opened by albertoa - 0
Remove hard coded values
#86 opened by albertoa - 0
Cost estimation in hydra
#85 opened by coder46 - 0
Create Hydra library
#84 opened by coder46 - 0
Use smart_open to support YAML config files using cloud object storage
#82 opened by sayonsivakumaran - 0
Beam GCP Bug fix
#79 opened by coder46 - 0
Add Database to store metadata of each run
#75 opened by coder46 - 0
Support local/fast_local modes in Windows OS
#74 opened by coder46 - 0
IAC for AWS Batch Infra
#69 opened by coder46 - 0
Fix bug with GPU Training in AWS
#59 opened by coder46 - 0
Create hydra init in Hydra
#67 opened by sayonsivakumaran - 0
Setup MLFlow Tracking infra IAC
#40 opened by coder46 - 0
IAC - Add optional modules to allow VPC and subnet creation within Terraform
#64 opened by sayonsivakumaran - 0
Track grid runs as a single unit
#37 opened by coder46 - 0
IAC - add iac for instrumentation alerting
#58 opened by coder46 - 0
Allow for training even with uncommitted changes
#54 opened by coder46 - 1
Releasing to PyPI using github actions
#22 opened by coder46 - 0
Add AWS Support
#26 opened by coder46 - 0
Alchemy classifier training on Hydra
#39 opened by coder46 - 0
Clone Hydra-ml-projects repo and submit training jobs for GCP, AWS, local and fast_local modes
#44 opened by coder46 - 0
AWS Batch jobs stuck in RUNNABLE state
#51 opened by coder46 - 0
Using exception subclasses
#49 opened by MJafarMashhadi - 0
Persist job artifacts to S3
#33 opened by coder46 - 0
GCP showing not enough compute credits
#23 opened by coder46 - 0
Setting up encryption for training datasets
#25 opened by coder46 - 0
Store job metadata onto a database
#36 opened by coder46 - 0
Allow live debugging on training jobs
#34 opened by coder46 - 0
Stream job logs via command line or Cloudwatch
#32 opened by coder46 - 0
Add test coverage to github actions
#12 opened by coder46 - 0
Pip install failing
#21 opened by tsa87