RealVNF/distributed-drl-coordination

Questions and ideas about moving to hierarchical multi-agent DRL

burnCalories opened this issue · 9 comments

Hi @stefanbschneider,
Recently, I finally completed migrating d-drl-coordination from SB3 to RLlib. After adding the curiosity module, I found that a similar success rate can be achieved even with a sparse success/failure reward.
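
In case it is useful, here is a minimal sketch of how the curiosity module can be enabled in RLlib (this assumes the old API stack, where exploration_config is supported; the environment id and hyperparameters are placeholders, not my exact settings):

```python
from ray.rllib.algorithms.ppo import PPOConfig

# Minimal sketch, assuming RLlib's old API stack: enable the built-in
# Curiosity (ICM) exploration. "CoordEnv-v0" and all hyperparameters
# below are placeholders, not the exact values from my project.
config = (
    PPOConfig()
    .environment("CoordEnv-v0")
    .framework("torch")  # RLlib's Curiosity module requires torch
    .exploration(
        explore=True,
        exploration_config={
            "type": "Curiosity",  # adds intrinsic reward for novel states
            "eta": 1.0,           # weight of the intrinsic reward
            "feature_dim": 288,   # size of the learned feature embedding
            "sub_exploration": {"type": "StochasticSampling"},
        },
    )
)
algo = config.build()
for _ in range(10):
    print(algo.train()["episode_reward_mean"])
```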

I also have some ideas that might be interesting. d-drl-coordination coordinates flows that arrive at different times. If the DRL is upgraded to MARL, for example, three to four agents could process flows in parallel. That might yield a higher success rate in MMPP mode or deterministic real-world trace mode.

After searching almost all of the RLlib documentation and community, I found that there is very little information on how to turn a custom environment into a multi-agent environment. Could you help me by providing some information about this? :)

I would be happy to share my project with you. However, I have been quite busy recently. Once I have time to upload the complete project, I will let you know as soon as possible. :)

Hi @burnCalories
Happy to hear that you have worked more with our code and have been able to use it with RLlib + successfully test it with curiosity.
Once your project is ready, I would be happy to learn about it and refer to it from our Readme. No rush; just let me know when it is complete :)


Regarding multi agent:

In our approach, the flow control decisions are made in a distributed way at each node whenever a flow arrives. This is in line with the discrete-event approach of coord-sim, which triggers an event and requests an action, e.g., for each flow arrival.

A different approach, more in line with typical multi-agent RL, is to follow a discrete-time approach, where multiple agents (e.g., nodes) take actions in fixed intervals/time steps in parallel (all agents in every time step).
This would require reframing the problem a bit, e.g., by installing scheduling rules at each node that are then updated regularly in each step. This is in line with what we did in DeepCoord - just with a single agent: https://github.com/RealVNF/DeepCoord

For implementation in RLlib, here is an example of a custom environment (mobile-env) that we built for a different use case (controlling multi-cell selection in wireless networks) and its wrapper for RLlib's MultiAgentEnv: https://github.com/stefanbschneider/mobile-env/blob/main/mobile_env/wrappers/multi_agent.py
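
As a rough illustration of the pattern such a wrapper follows, here is a minimal sketch; the `CoordCore`-style simulation core and its methods are hypothetical, not from mobile-env:

```python
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class MultiAgentCoordWrapper(MultiAgentEnv):
    """Sketch: expose a simulation core with per-node state as an RLlib
    MultiAgentEnv, so that all agents act in the same discrete time step.
    `core` is a hypothetical object with reset()/step() and node_ids."""

    def __init__(self, core):
        super().__init__()
        self.core = core
        self._agent_ids = set(core.node_ids)

    def reset(self, *, seed=None, options=None):
        obs = self.core.reset(seed=seed)           # dict: node_id -> observation
        return obs, {agent: {} for agent in obs}   # (obs dict, info dict)

    def step(self, action_dict):
        # One action per agent, all applied in the same time step.
        obs, rewards, done = self.core.step(action_dict)
        terminateds = {agent: done for agent in obs}
        terminateds["__all__"] = done              # RLlib ends the episode on __all__
        truncateds = {agent: False for agent in obs}
        truncateds["__all__"] = False
        infos = {agent: {} for agent in obs}
        return obs, rewards, terminateds, truncateds, infos
```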
Hopefully this helps.

I agree! Typical multi-agent DRL is more suitable for the DeepCoord approach (modeling based on each node). But because of the ingenuity of the DeepCoord approach, adjusting it to become multi-agent may undermine the advantages of centralization (multiple coordination schemes). Also, the MARL integrated in RLlib is more suitable for discrete actions.

In my ideal plan, network coordination is divided into two layers: the upper layer centrally controls the overall situation and formulates optimization plans based on different needs, while the lower layer quickly places flows according to the upper layer's scheduling plan (similar to a central SDN controller with lower-level forwarders). However, current MARL and hierarchical RL solutions seem unable to handle this task, as they mostly target competition and cooperation in game scenarios.

Nice idea! I've been thinking of a similar approach too: hierarchical coordination, where the top level(s) make coarse-grained decisions on a longer timeframe and a larger part of the network, and the lower level(s) perform fast, local, fine-grained coordination.

I have one paper with a hierarchical approach, but it focuses on typical mathematical optimization without DRL + it does not consider the different timeframes.

Would be really interesting to build something like that with hierarchical DRL: on the top level something like DeepCoord, and on the lower level distributed agents similar to this repo/paper here.
I saw that RLlib supports it, but I haven't worked with hierarchical DRL myself.
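
From what I remember of the RLlib docs, hierarchical setups are expressed as a MultiAgentEnv in which the levels simply act at different frequencies. A rough, untested sketch of that pattern (all names, observations, and rewards below are made up):

```python
import numpy as np
from ray.rllib.env.multi_agent_env import MultiAgentEnv


class HierarchicalCoordEnv(MultiAgentEnv):
    """Sketch: a 'high_level' agent acts every PERIOD steps and sets a
    goal (e.g., a scheduling scheme); the 'low_level' agent acts in every
    step, conditioned on the current goal. All values are placeholders."""

    PERIOD = 10        # coarse-grained decision interval
    EPISODE_LEN = 100

    def __init__(self, config=None):
        super().__init__()
        self._agent_ids = {"high_level", "low_level"}
        self.t = 0
        self.goal = 0

    def reset(self, *, seed=None, options=None):
        self.t, self.goal = 0, 0
        # The high level acts first; the low level waits for a goal.
        return {"high_level": self._obs()}, {}

    def step(self, action_dict):
        if "high_level" in action_dict:
            self.goal = int(action_dict["high_level"])  # new coarse-grained goal
        rewards = {}
        if "low_level" in action_dict:
            rewards["low_level"] = 1.0  # placeholder fine-grained reward
        self.t += 1
        obs = {"low_level": self._obs()}  # low level acts in every step
        if self.t % self.PERIOD == 0:
            # Time for the next coarse-grained decision.
            obs["high_level"] = self._obs()
            rewards["high_level"] = 0.0  # placeholder long-term reward
        done = self.t >= self.EPISODE_LEN
        return obs, rewards, {"__all__": done}, {"__all__": False}, {}

    def _obs(self):
        # Placeholder observation encoding time and the current goal.
        return np.array([self.t / self.EPISODE_LEN, self.goal], dtype=np.float32)
```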

Oh, great suggestion! I've read a few papers on hierarchical reinforcement learning, where a central policy controller issues sub-policies that are driven by different rewards given to the different levels in a game scenario.

In the RealVNF scenario, different policies could be distinguished based on the lifetime of each flow (e.g., for TTL < 200, use the distributed approach for rapid decision making; for TTL > 200, use a centralized strategy, which is better for optimization).
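
A minimal sketch of how I imagine this with RLlib's policy_mapping_fn; the agent-id format and the threshold of 200 are purely illustrative:

```python
from ray.rllib.algorithms.ppo import PPOConfig
from ray.rllib.policy.policy import PolicySpec


def policy_mapping_fn(agent_id, episode=None, worker=None, **kwargs):
    # Illustrative only: assumes agent ids like "flow_3_ttl_150".
    ttl = int(agent_id.rsplit("_", 1)[-1])
    return "distributed_fast" if ttl < 200 else "centralized_opt"


config = (
    PPOConfig()
    .environment("HierarchicalCoordEnv")  # hypothetical registered env
    .multi_agent(
        policies={"distributed_fast": PolicySpec(),
                  "centralized_opt": PolicySpec()},
        policy_mapping_fn=policy_mapping_fn,
    )
)
```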

Here I have some questions about whether unified observations are needed if hierarchical DRL is used, since DeepCoord and d-drl construct their observations in different ways. Because of the differences between the two approaches, I don't know yet how to unify the observations.
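
From what I can tell in the RLlib docs, each PolicySpec can carry its own observation and action space, so maybe full unification is not strictly required; a sketch with placeholder shapes (not the real observation layouts of DeepCoord or d-drl):

```python
from gymnasium import spaces
from ray.rllib.policy.policy import PolicySpec

# Sketch: per-policy spaces, so a DeepCoord-style (global) policy and a
# d-drl-style (local) policy would not need one shared observation format.
policies = {
    "centralized_opt": PolicySpec(
        observation_space=spaces.Box(0.0, 1.0, (64,)),  # global network view
        action_space=spaces.Box(0.0, 1.0, (12,)),       # scheduling weights
    ),
    "distributed_fast": PolicySpec(
        observation_space=spaces.Box(0.0, 1.0, (8,)),   # local node view
        action_space=spaces.Discrete(4),                # e.g., next-hop choice
    ),
}
```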

I'm going to continue working on hierarchical DRL in RLlib. Could you keep this issue open for now? I'll keep posting as there are new developments.

Sure, I'll keep it open :)

Hi @burnCalories, is your migrated RLlib version available anywhere? I'd love to check it out.

Hi @stefanbschneider. I have posted the Ray project for distributed-drl-coordination on my GitHub homepage; here is the link: https://github.com/burnCalories/distributed_VNF. Thank you for publicly releasing the project code, which has taught me a lot about DRL + coordination. I hope my project code can also help others.

In addition, regarding the underlying VNF simulator: I added a node memory module in a previous experiment. If needed, I am willing to create a branch to add it to my code.

@burnCalories Thanks for sharing! Would be great if you could add a reference to our GitHub repo + mention the corresponding paper to cite in your Readme since large parts of our code and work are copied in your repo.

I also added your project to our Readme :)

Is it ok to close this issue now?

Sure, thanks