[Request] Please provide run instructions
I'd like to collaborate, but I'm not very familiar with K8s provisioning, configuration, or the steps needed to deploy a workload on it. Would you be kind enough to provide a README, or at least links to instructions you found useful, so I can set up my own cluster, credentials, account, etc.?
I'd like to replicate your current setup and work, and be able to throw as much money at the problem as needed (I have corporate time and money becoming available soon). Once I'm up and running, I should be able to bring a lot more value.
Things I have had to do so far (although I'm still not up and running locally with the latest restructure):
- Install rabbitmq:
  brew install rabbitmq
- Enable recent_history_exchange:
  /usr/local/opt/rabbitmq/sbin/rabbitmq-plugins enable rabbitmq_recent_history_exchange
- Install some more Python modules:
  pprint
  aioamqp
  pika
- Comment out some GCP and CloudStorage code (blob-related) since I don't have a properly configured account yet.
AGMP:dotaclient andrzej.gorski$ git diff
diff --git a/optimizer.py b/optimizer.py
index 08c21c7..ba3fa7b 100644
--- a/optimizer.py
+++ b/optimizer.py
@@ -67,8 +67,8 @@ class DotaOptimizer:
# TODO(tzaman): Set logdir ourselves?
self.writer = SummaryWriter()
logger.info('Checkpointing to: {}'.format(self.log_dir))
- client = storage.Client()
- self.bucket = client.get_bucket(self.BUCKET_NAME)
+ #client = storage.Client()
+ #self.bucket = client.get_bucket(self.BUCKET_NAME)
if pretrained_model is not None:
logger.info('Downloading: {}'.format(pretrained_model))
@@ -257,8 +257,8 @@ class DotaOptimizer:
self.writer.add_scalar(name, metric, self.episode)
# Upload events to GCS
- blob = self.bucket.blob(self.events_filename)
- blob.upload_from_filename(filename=self.events_filename)
+ #blob = self.bucket.blob(self.events_filename)
+ #blob.upload_from_filename(filename=self.events_filename)
self.upload_model()
@@ -305,8 +305,8 @@ class DotaOptimizer:
logger.exception('Failed pushing latest weights to RMQ')
# Upload to GCP.
- blob = self.bucket.blob(rel_path)
- blob.upload_from_string(data=state_dict_b) # Model
+ #blob = self.bucket.blob(rel_path)
+ #blob.upload_from_string(data=state_dict_b) # Model
- Currently you have to run:
  python3.7 optimizer.py
  python3.7 agent.py
  python3.7 -m dotaservice
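For anyone who wants a single entry point, here is a minimal sketch of a launcher for the three processes listed above. It assumes they are started from the repository root and that start order does not matter (the order simply mirrors the list). `build_commands` and `launch_all` are hypothetical helper names, not part of the repo.

```python
import subprocess

# Interpreter name copied from the commands above; it may differ on your machine.
PYTHON = "python3.7"


def build_commands(python=PYTHON):
    """Return the three commands from the list above as argument lists."""
    return [
        [python, "optimizer.py"],
        [python, "agent.py"],
        [python, "-m", "dotaservice"],
    ]


def launch_all(commands):
    """Start every command, then wait until all of them have exited."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        # Ctrl-C: stop the children instead of leaving them running.
        for proc in procs:
            proc.terminate()


# launch_all(build_commands())  # uncomment to actually start the processes
```

Running the three commands in separate terminals works just as well; the sketch only saves some typing.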
Currently things run, but nothing seems to be happening (all rewards are 0). I'm not sure what I'm missing yet.
2019-01-08 09:19:17,422 INFO === Starting Episode 0.
2019-01-08 09:19:17,423 INFO Starting game.
2019-01-08 09:19:17,429 INFO Received new model: version=0, size=1207690b
2019-01-08 09:19:17,432 INFO Updated weights to version 0
2019-01-08 09:19:47,400 INFO Player 0 rollout.
2019-01-08 09:19:47,401 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:19:47,528 INFO Player 5 rollout.
2019-01-08 09:19:47,529 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:00,962 INFO Player 0 rollout.
2019-01-08 09:20:00,963 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:01,039 INFO Player 5 rollout.
2019-01-08 09:20:01,040 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:14,576 INFO Player 0 rollout.
2019-01-08 09:20:14,577 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:14,651 INFO Player 5 rollout.
2019-01-08 09:20:14,652 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:29,547 INFO Player 0 rollout.
2019-01-08 09:20:29,548 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:29,619 INFO Player 5 rollout.
2019-01-08 09:20:29,620 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:42,469 INFO Received new model: version=1, size=1207690b
2019-01-08 09:20:42,473 INFO Updated weights to version 1
2019-01-08 09:20:44,718 INFO Player 0 rollout.
2019-01-08 09:20:44,719 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:44,802 INFO Player 5 rollout.
2019-01-08 09:20:44,803 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,055 INFO Player 0 rollout.
2019-01-08 09:20:50,056 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,088 INFO Player 5 rollout.
2019-01-08 09:20:50,089 INFO Player 5 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:20:50,122 INFO Game finished.
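To check a longer run without reading every line, the reward sums can be pulled out of the log with a small script. This is only a sketch that assumes the log format shown above ("Player N reward sum: X"); `reward_sums` and `all_zero` are hypothetical helper names.

```python
import re

# Matches lines such as:
#   2019-01-08 09:19:47,401 INFO Player 0 reward sum: 0.00 subrewards:
REWARD_LINE = re.compile(r"Player (\d+) reward sum: (-?\d+(?:\.\d+)?)")


def reward_sums(log_text):
    """Collect the logged reward sums, grouped by player id."""
    sums = {}
    for line in log_text.splitlines():
        match = REWARD_LINE.search(line)
        if match:
            player = int(match.group(1))
            sums.setdefault(player, []).append(float(match.group(2)))
    return sums


def all_zero(sums):
    """True when at least one reward was logged and every one of them is zero."""
    values = [v for rewards in sums.values() for v in rewards]
    return bool(values) and all(v == 0.0 for v in values)


sample = """\
2019-01-08 09:19:47,401 INFO Player 0 reward sum: 0.00 subrewards:
{'death': -0.0, 'denies': 0.0, 'hp': 0.0, 'kills': 0.0, 'lh': 0.0, 'xp': 0.0}
2019-01-08 09:19:47,529 INFO Player 5 reward sum: 0.00 subrewards:
"""
print(all_zero(reward_sums(sample)))  # True
```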
Well, it looks like it is working. We just don't have any "driver" to make it go to Location(0,0,0), so initially it wanders around aimlessly.
I do see it moving around in GUI mode.
@TimZaman - thanks for the continued updates to README.md. They definitely help, although for someone who hasn't created accounts or used Google Cloud Platform, many things are still very unclear.
So far I have installed:
- ksonnet:
  brew install ksonnet/tap/ks
- kubernetes:
  brew install kubernetes-cli
Regarding GKE: it seems I need to enable a bunch of GCP APIs before I can do anything. After browsing around, I finally found:
https://www.kubeflow.org/docs/started/getting-started-gke/
It lists the following APIs as required:
- Compute Engine
- GKE
- Identity and Access Management (IAM)
- Deployment Manager
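The APIs listed above can probably be enabled in one go from a script rather than by clicking through the console. The sketch below only builds the `gcloud services enable` command. It assumes the gcloud CLI is installed and authenticated, the service identifiers are my own mapping of the list above, and `my-project-id` is a placeholder.

```python
import subprocess

# My mapping of the list above to gcloud service identifiers;
# double-check it against your project before running anything.
REQUIRED_SERVICES = {
    "Compute Engine": "compute.googleapis.com",
    "GKE": "container.googleapis.com",
    "Identity and Access Management (IAM)": "iam.googleapis.com",
    "Deployment Manager": "deploymentmanager.googleapis.com",
}


def enable_command(project_id):
    """Build a single `gcloud services enable` call covering all services."""
    return [
        "gcloud", "services", "enable",
        *REQUIRED_SERVICES.values(),
        "--project", project_id,
    ]


cmd = enable_command("my-project-id")  # placeholder project id
print(" ".join(cmd))
# subprocess.run(cmd, check=True)  # uncomment to actually enable the APIs
```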
I can work on getting those enabled next, but what "realistic" monthly pricing should I expect here?
On GCP you get $300 in free credit, which is roughly 40 cores for a month (20 agents). But you don't really need it: you can just install k8s locally (minikube). I currently use Google Cloud Storage (GCS, part of GCP) to save and resume the model and TensorBoard logs, which I find pretty handy. GCS itself is super cheap, so you can use the $300 towards that. Alternatively, we could make GCS optional; feel free to add support for that.
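As a starting point for optional GCS, here is a rough sketch that skips uploads when no bucket is available instead of requiring the calls to be commented out. It is not how optimizer.py currently works. The helper names are hypothetical, and the only google-cloud-storage calls used are the ones already visible in the diff earlier in this thread.

```python
import logging

logger = logging.getLogger(__name__)


def get_bucket_or_none(bucket_name):
    """Return a GCS bucket, or None when GCS is unavailable or unconfigured."""
    try:
        from google.cloud import storage  # optional dependency
        return storage.Client().get_bucket(bucket_name)
    except Exception:
        # Broad on purpose: missing package, missing credentials, missing bucket.
        logger.warning("GCS disabled: could not open bucket %r", bucket_name)
        return None


def upload_file(bucket, filename):
    """Upload a local file if a bucket is configured; return True if uploaded."""
    if bucket is None:
        return False
    bucket.blob(filename).upload_from_filename(filename=filename)
    return True


def upload_bytes(bucket, rel_path, data):
    """Upload raw bytes (e.g. a serialized model) if a bucket is configured."""
    if bucket is None:
        return False
    bucket.blob(rel_path).upload_from_string(data=data)
    return True
```

With something like this, the optimizer could keep a `self.bucket` that is allowed to be None, and the upload sites would call the helpers instead of touching the bucket directly.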
I also have K8s running on a Raspberry Pi cluster of 4 machines. However, those are ARM chips, so they cannot run Dota. You can also set up your own k8s cluster with a few old machines. Dota needs around 1 core per agent. I was running 8 agents on my Mac Pro (6-core) while using a total of 80% of all CPU.
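To turn the numbers quoted in this reply into a rough budget, here is a back-of-the-envelope sketch. All constants are the figures from the thread and should be read as estimates. The result covers agent CPU only, not storage, the optimizer, or any GPU.

```python
# Figures quoted in this thread; treat them as rough estimates.
FREE_CREDIT_USD = 300.0
CORES_PER_CREDIT_MONTH = 40    # "$300 ... around 40 cores for a month"
MAC_PRO_CORES = 6
MAC_PRO_AGENTS = 8
MAC_PRO_CPU_FRACTION = 0.80    # "a total of 80% of all CPU"


def usd_per_core_month():
    """Implied price of one core for one month."""
    return FREE_CREDIT_USD / CORES_PER_CREDIT_MONTH


def cores_per_agent():
    """Cores used per agent in the Mac Pro measurement."""
    return MAC_PRO_CORES * MAC_PRO_CPU_FRACTION / MAC_PRO_AGENTS


def monthly_cost_usd(agents, cores_each=1.0):
    """Estimated monthly CPU cost for the given number of agents."""
    return agents * cores_each * usd_per_core_month()


print(usd_per_core_month())          # 7.5
print(round(cores_per_agent(), 2))   # 0.6
print(monthly_cost_usd(20))          # 150.0
```

The "40 cores, 20 agents" figure budgets 2 cores per agent, which is more conservative than both the "around 1 core per agent" rule of thumb and the roughly 0.6 cores per agent implied by the Mac Pro run.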
Alternatively, you can run all agents in multiple Docker containers with port forwarding to your local machine, where RMQ and the optimizer are running.
Oh, by the way, nice that you got it working! Yes, what you posted is exactly how it should work. It will take a few hours for it to go towards mid, but then it will accelerate because of the XP, then the last hits, etc.
Here is a model trained last night with the latest Dota on top-of-tree (0.3.4) [your code!]:
exp1_job2_model_000001576.pt.zip
brew install gcc (prereq for rabbitmq)
xcode-select --install (because running on Mac requires Xcode)