Joint Computing, Pushing, and Caching Optimization for Mobile Edge Computing Networks via Soft Actor-Critic Learning
A deep reinforcement learning approach for joint computing, pushing, and caching optimization in mobile edge computing (MEC) networks.
Joint Computing, Pushing, and Caching Optimization for Mobile Edge Computing Networks Via Soft Actor-Critic Learning
Xiangyu Gao, Yaping Sun, Hao Chen, Xiaodong Xu, and Shuguang Cui
arXiv technical report (arXiv 2309.15369)
@ARTICLE{10275097,
  author={Gao, Xiangyu and Sun, Yaping and Chen, Hao and Xu, Xiaodong and Cui, Shuguang},
  journal={IEEE Internet of Things Journal},
  title={Joint Computing, Pushing, and Caching Optimization for Mobile Edge Computing Networks Via Soft Actor-Critic Learning},
  year={2023},
  volume={},
  number={},
  pages={1-1},
  doi={10.1109/JIOT.2023.3323433}}
(Dec. 3, 2023) Released the source code and sample data.
Mobile edge computing (MEC) networks bring computing and storage capabilities closer to edge devices, which reduces latency and improves network performance. However, to further reduce transmission and computation costs while satisfying user-perceived quality of experience, a joint optimization in computing, pushing, and caching is needed. In this paper, we formulate the joint-design problem in MEC networks as an infinite-horizon discounted-cost Markov decision process and solve it using a deep reinforcement learning (DRL)-based framework that enables the dynamic orchestration of computing, pushing, and caching. Through the deep networks embedded in the DRL structure, our framework can implicitly predict user future requests and push or cache the appropriate content to effectively enhance system performance. One issue we encountered when considering three functions collectively is the curse of dimensionality for the action space. To address it, we relaxed the discrete action space into a continuous space and then adopted soft actor-critic learning to solve the optimization problem, followed by utilizing a vector quantization method to obtain the desired discrete action. Additionally, an action correction method was proposed to compress the action space further and accelerate the convergence. Our simulations under the setting of a general single-user, single-server MEC network with dynamic transmission link quality demonstrate that the proposed framework effectively decreases transmission bandwidth and computing cost by proactively pushing data on future demand to users and jointly optimizing the three functions. We also conduct extensive parameter tuning analysis, which shows that our approach outperforms the baselines under various parameter settings.
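As a rough, repository-agnostic illustration of the action-discretization step described in the abstract, the continuous action produced by the relaxed SAC policy can be mapped back to the nearest feasible discrete decision by a nearest-neighbor (vector-quantization) search. The function and variable names below are illustrative assumptions, not this codebase's actual API.

```python
import numpy as np

def quantize_action(continuous_action, feasible_actions):
    """Map a relaxed (continuous) action to the closest valid discrete action.

    continuous_action: 1-D array output by the relaxed policy (illustrative).
    feasible_actions:  2-D array, one valid discrete action per row (illustrative).
    """
    # Euclidean distance from the relaxed action to every feasible discrete action
    dists = np.linalg.norm(feasible_actions - continuous_action, axis=1)
    # Vector quantization: return the nearest valid discrete action
    return feasible_actions[np.argmin(dists)]

# Toy example: three binary push/cache decisions, policy outputs values in [0, 1]
feasible = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]])
print(quantize_action(np.array([0.7, 0.2, 0.1]), feasible))  # -> [1 0 0]
```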
- Python 3.6
- Preferred system: Linux
- PyTorch 1.5.1
- Other packages (refer to the requirement file)
The configurations are in the config file. Note that the values of some parameters are limited to a few options because modifying them requires the support of a .csv file in the ./data/ folder.
usage: main.py [-h] [--env-name ENV_NAME] [--exp-case EXP_CASE] [--policy POLICY]
[--eval EVAL] [--gamma G] [--tau G] [--lr G] [--alpha G]
[--automatic_entropy_tuning G] [--seed N] [--batch_size N]
[--num_steps N] [--hidden_size N] [--updates_per_step N]
[--start_steps N] [--target_update_interval N] [--replay_size N]
[--cuda]
Note: There is no need to set the temperature (--alpha) if --automatic_entropy_tuning is True.
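For context on this flag, a common way SAC implementations tune α automatically is to learn log α by minimizing a temperature loss that pushes the policy entropy toward a target value. The snippet below is a minimal PyTorch-style sketch of that idea; the variable names and the -|A| target-entropy heuristic are assumptions for illustration, not this repository's exact code.

```python
import torch

# Illustrative heuristic: target entropy set to -|A| for an action space
# of dimension action_dim (an assumption, not taken from this repo).
action_dim = 4
target_entropy = -float(action_dim)
log_alpha = torch.zeros(1, requires_grad=True)
alpha_optim = torch.optim.Adam([log_alpha], lr=3e-4)

def update_alpha(log_probs):
    """One temperature update, given log pi(a|s) for a sampled batch."""
    # The gradient pushes alpha up when entropy falls below the target, down otherwise
    alpha_loss = -(log_alpha * (log_probs + target_entropy).detach()).mean()
    alpha_optim.zero_grad()
    alpha_loss.backward()
    alpha_optim.step()
    return log_alpha.exp().item()  # current temperature alpha
```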
- PTDFC: Proactive Transmission and Dynamic-computing-Frequency reactive service with Cache
python main.py --automatic_entropy_tuning True --target_update_interval 1000 --lr 1e-4 --exp-case case3 --cuda
Baselines:
- DFC: Dynamic-computing-Frequency reactive service with Cache
- DFNC: Dynamic-computing-Frequency reactive service with No Cache
- MFU-LFU: Most-Frequently-Used proactive transmission and Least-Frequently-Used cache replacement
- MRU-LRU: Most-Recently-Used proactive transmission and Least-Recently-Used cache replacement
Use the corresponding value for the --exp-case argument:
| Algorithm | --exp-case |
|---|---|
| PTDFC | case3 |
| DFC | case4 |
| DFNC | case2 |
| MFU-LFU | case7 |
| MRU-LRU | case6 |
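For example, assuming the baselines accept the same flags as the PTDFC command above, the DFC and MFU-LFU baselines can be launched by changing only the --exp-case value:

python main.py --automatic_entropy_tuning True --target_update_interval 1000 --lr 1e-4 --exp-case case4 --cuda

python main.py --automatic_entropy_tuning True --target_update_interval 1000 --lr 1e-4 --exp-case case7 --cuda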
To monitor training progress with TensorBoard:
tensorboard --logdir=runs --host localhost --port 8088
sac_joint_compute_push_cache Args
optional arguments:
-h, --help show this help message and exit
--env-name ENV_NAME Wireless Comm environment (default: MultiTaskCore)
--exp-case EXP_CASE Evaluation Algorithm (default: case3)
--policy POLICY Policy Type: Gaussian | Deterministic (default:
Gaussian)
--eval EVAL Evaluates the policy every 10 episodes (default: True)
--gamma G discount factor for reward (default: 0.99)
--tau G target smoothing coefficient (τ) (default: 0.005)
--lr G learning rate (default: 3e-4)
--alpha G Temperature parameter α determines the relative
importance of the entropy term against the reward
(default: 0.2)
--automatic_entropy_tuning G
Automatically adjust α (default: False)
--seed N random seed (default: 123456)
--batch_size N batch size (default: 256)
--num_steps N maximum number of steps (default: 5000001)
--hidden_size N hidden size (default: 256)
--updates_per_step N model updates per simulator step (default: 1)
--start_steps N Steps sampling random actions (default: 10000)
--target_update_interval N
Number of updates between value target updates
(default: 1000)
--replay_size N size of replay buffer (default: 1000000)
--cuda run on CUDA (default: False)
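As a usage example, a run that sets several of these options explicitly (values below are illustrative, not tuned recommendations) could look like:

python main.py --exp-case case3 --policy Gaussian --automatic_entropy_tuning True --lr 1e-4 --batch_size 256 --seed 123456 --num_steps 1000000 --target_update_interval 1000 --cuda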
This codebase is released under MIT license (see LICENSE).
This project would not be possible without several great open-source codebases. We list some notable examples below.