dcgym/iroko

Trouble getting run_ray.py to work

viswanathgs opened this issue · 4 comments

Hey, nice work!

I'm trying to play with this and reproduce results from https://arxiv.org/abs/1812.09975. I followed the installation instructions and got run_basic.py working, but sudo python run_ray.py --tune fails with the following exception:

Error creating interface pair (s1-eth2,h1-eth0): RTNETLINK answers: File exists

This happens without --tune as well. I've tried doing sudo mn -c, but it doesn't help. Here's the entire log file: https://pastebin.com/3gvZwNTT.

Any help appreciated. Thank you!

Thanks!
Hmm, this is odd, I cannot reproduce it. This error normally appears when two environments are launched in parallel. The virtual interfaces are created by the _start_env() call in env_iroko.py, and if two of them end up with the same name, the Mininet call fails.
To mitigate this, we typically set config["env_config"]["parallel_envs"] = True and pass the config to the iroko environment. In run_ray.py this should also be triggered if you use ray scheduling or multiple workers at once.
Once parallel_envs is true, all interfaces are launched with a unique ID, which should prevent those conflicts.
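
To illustrate the idea, here is a minimal sketch of how a per-environment ID can keep interface names from colliding. Only the config["env_config"]["parallel_envs"] key comes from this thread; the helper names make_config and interface_name, the env_id parameter, and the naming scheme are hypothetical and are not taken from the iroko code.

```python
def make_config(parallel_envs):
    """Build a config dict with the parallel_envs flag mentioned above."""
    config = {"env_config": {}}
    config["env_config"]["parallel_envs"] = parallel_envs
    return config


def interface_name(base, config, env_id):
    """Hypothetical helper: suffix the name with an environment ID.

    Without parallel_envs every environment would ask for the same
    name (e.g. "s1-eth2"), and the second request would fail with
    "File exists".
    """
    if config["env_config"].get("parallel_envs"):
        return "{}-{}".format(base, env_id)
    return base


shared = make_config(parallel_envs=False)
unique = make_config(parallel_envs=True)

# Two environments without the flag request the same name: conflict.
clash = interface_name("s1-eth2", shared, 0) == interface_name("s1-eth2", shared, 1)

# With the flag, each environment gets its own name.
name_a = interface_name("s1-eth2", unique, 0)
name_b = interface_name("s1-eth2", unique, 1)
```

Note that Linux limits interface names to 15 characters, so any real suffix scheme has to stay short.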

Oh, this seems to happen with Python 3 only. I really should have tested for that...
I will take a look.

Okay, so the issue is the way Ray uses the reset() call. Unfortunately, it does not play well with my recent changes and causes conflicts when trying to clean up and reinitialize the Mininet network.
You can try this ugly hotfix until I have found a more elegant solution.

diff --git a/dc_gym/env_iroko.py b/dc_gym/env_iroko.py
index d97d97e..280eafa 100644
--- a/dc_gym/env_iroko.py
+++ b/dc_gym/env_iroko.py
@@ -87,6 +87,7 @@ class DCEnv(openAIGym):
         self._set_gym_spaces(self.conf)
         # Set the active traffic matrix
         self.input_file = None
+        self.net_man = None
         self.set_traffic_matrix(self.conf["tf_index"])
         self.state_man = StateManager(self.conf, self.topo)
         # handle unexpected exits scenarios gracefully
@@ -96,7 +97,9 @@ class DCEnv(openAIGym):
         atexit.register(self.close)
 
     def _start_env(self):
-        self.net_man = NetworkManager(self.topo, self.conf["agent"].lower())
+        if not self.net_man:
+            self.net_man = NetworkManager(
+                self.topo, self.conf["agent"].lower())
         # initialize the traffic generator and state manager
         self.traffic_gen = TrafficGen(self.net_man, self.conf["transport"])
         self.state_man.start(self.net_man)
@@ -205,7 +208,7 @@ class DCEnv(openAIGym):
         if hasattr(self, 'traffic_gen'):
             log.info("Stopping traffic")
             self.traffic_gen.stop_traffic()
-        if hasattr(self, 'net_man'):
+        if self.net_man:
             log.info("Stopping network.")
             self.net_man.stop_network()
         if hasattr(self, 'state_man'):
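
The pattern in this patch can be sketched in isolation as follows. This is a simplified stand-in, not the actual DCEnv code: FakeNetworkManager and SketchEnv are hypothetical classes that only mirror the net_man handling from the diff, and reset() calling _start_env() is an assumption made for the demonstration.

```python
class FakeNetworkManager:
    """Hypothetical stand-in for NetworkManager; counts creations."""
    created = 0

    def __init__(self):
        FakeNetworkManager.created += 1
        self.running = True

    def stop_network(self):
        self.running = False


class SketchEnv:
    def __init__(self):
        # Always define the attribute so later checks can test its value.
        self.net_man = None

    def _start_env(self):
        # Create the network only once, even if called repeatedly.
        if not self.net_man:
            self.net_man = FakeNetworkManager()

    def reset(self):
        # Assumption: reset() goes through _start_env() again.
        self._start_env()

    def close(self):
        # hasattr() would now always be True, so test the value instead.
        if self.net_man:
            self.net_man.stop_network()


env = SketchEnv()
env.reset()
env.reset()  # second reset reuses the existing manager
creations = FakeNetworkManager.created

untouched = SketchEnv()
untouched.close()  # safe: net_man is still None, nothing to stop
```

Because net_man is now set to None in the constructor, the old hasattr(self, 'net_man') check would always pass and could call stop_network() on None, which is why the patch switches the check to the value itself.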

Perfect, works now. Thanks for the quickfix!

Feel free to close this issue or keep this open until the actual fix if you prefer.