[ToN 2024] Ensuring Threshold AoI for UAV-assisted Mobile Crowdsensing by Multi-Agent Deep Reinforcement Learning with Transformer

Primary language: Python. License: GNU General Public License v3.0 (GPL-3.0).

DRL-UCS(AoI_th)

This is the code accompanying the paper "Ensuring Threshold AoI for UAV-assisted Mobile Crowdsensing by Multi-Agent Deep Reinforcement Learning with Transformer" by Hao Wang, Chi Harold Liu, et al., under review at IEEE/ACM Transactions on Networking.

📄 Description

Unmanned aerial vehicle (UAV) crowdsensing (UCS) is an emerging data collection paradigm that provides reliable, high-quality urban sensing services, with an age-of-information (AoI) requirement that measures data freshness in real-time applications. In this paper, we explicitly consider the case of ensuring that the attained AoI always stays within a specific threshold. The goal is to maximize the total amount of data collected from diverse points of interest (PoIs) while minimizing the AoI and the AoI threshold violation ratio under a limited energy supply. To this end, we propose a decentralized multi-agent deep reinforcement learning framework called "DRL-UCS($\text{AoI}_{th}$)" for multi-UAV trajectory planning, which consists of a novel transformer-enhanced distributed architecture and an adaptive intrinsic reward mechanism for spatial cooperation and exploration. Extensive results and trajectory visualizations on two real-world datasets from Beijing and San Francisco show that DRL-UCS($\text{AoI}_{th}$) consistently outperforms all six baselines when varying the number of UAVs, the AoI threshold, and the amount of data generated per timeslot.
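
As a rough illustration of the quantities named above, the sketch below tracks the AoI of a single PoI over discrete timeslots and computes the fraction of slots in which the AoI exceeds a threshold. It assumes a simple textbook AoI model (the age grows by one each slot and resets to one when a UAV collects data); the paper's exact definitions may differ, and the function names `aoi_trace` and `violation_ratio` are hypothetical and not part of this codebase.

```python
from typing import List


def aoi_trace(collected: List[bool], initial_aoi: int = 1) -> List[int]:
    """Return the AoI of one PoI after each timeslot.

    Assumption (not taken from the paper): the AoI grows by 1 per slot
    and resets to 1 in any slot where a UAV collects the PoI's data.
    """
    trace = []
    aoi = initial_aoi
    for got_data in collected:
        aoi = 1 if got_data else aoi + 1
        trace.append(aoi)
    return trace


def violation_ratio(trace: List[int], threshold: int) -> float:
    """Fraction of timeslots whose AoI is strictly above the threshold."""
    if not trace:
        return 0.0
    return sum(aoi > threshold for aoi in trace) / len(trace)


if __name__ == "__main__":
    # True marks a slot in which the PoI's data is collected.
    visits = [False, False, True, False, False, False, False, True]
    trace = aoi_trace(visits)
    print("AoI per slot:", trace)
    print("Violation ratio:", violation_ratio(trace, threshold=3))
```

Under this model, longer gaps between visits push the AoI above the threshold for more slots, which is the behavior the trajectory planner is meant to avoid.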

🔧 Dependencies

Installation

  1. Clone repo

    git clone https://github.com/BIT-MCS/DRL-UCS-AoI-Th.git
    cd DRL-UCS-AoI-Th
  2. Create Virtual Environment

    conda create -n ucs python==3.8
    conda activate ucs
    
  3. Install dependent packages

    pip install -r requirements.txt
    cd DRL-UCS-AoI
    python setup.py develop
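
After installation, a quick sanity check of the environment can save a failed training launch. The sketch below is a minimal example, assuming that `ray` and `adept` (both named in this README) are the importable top-level package names; adjust the list to match requirements.txt. The helper `check_environment` is hypothetical and not part of this codebase.

```python
import importlib.util
import sys
from typing import Dict, Iterable, Tuple


def check_environment(
    packages: Iterable[str], min_python: Tuple[int, int] = (3, 8)
) -> Dict[str, bool]:
    """Map each top-level package name to whether it is on the import path."""
    if sys.version_info < min_python:
        raise RuntimeError(
            f"Python {min_python[0]}.{min_python[1]}+ expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    # Assumption: "ray" and "adept" are the importable package names.
    for name, found in check_environment(["ray", "adept"]).items():
        print(f"{name}: {'ok' if found else 'MISSING'}")
```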
    

💻 Training

Get the usage information of the project:

cd /DRL-UCS-AoI-Th/DRL_UCS_AoI/adept/scripts

Running actorlearner.py with the -h flag (as with evaluate.py in the Testing section) should then print the usage information shown below. More configuration options can be found in the default config file config/bj.json.

Distributed Options:
    --nb-learners <int>         Number of distributed learners [default: 1]
    --nb-workers <int>          Number of distributed workers [default: 4]
    --ray-addr <str>            Ray head node address, None for local [default: None]

Topology Options:
    --actor-host <str>        Name of host actor [default: ImpalaHostActor]
    --actor-worker <str>      Name of worker actor [default: ImpalaWorkerActor]
    --learner <str>           Name of learner [default: ImpalaLearner]
    --exp <str>               Name of host experience cache [default: Rollout]
    --nb-learn-batch <int>    Number of worker batches to learn on (per learner) [default: 2]
    --worker-cpu-alloc <int>     Number of cpus for each rollout worker [default: 8]
    --worker-gpu-alloc <float>   Number of gpus for each rollout worker [default: 0.25]
    --learner-cpu-alloc <int>     Number of cpus for each learner [default: 1]
    --learner-gpu-alloc <float>   Number of gpus for each learner [default: 1]
    --rollout-queue-size <int>   Max length of rollout queue before blocking (per learner) [default: 4]

Environment Options:
    --env <str>             Environment name [default: PongNoFrameskip-v4]
    --rwd-norm <str>        Reward normalizer name [default: Clip]
    --manager <str>         Manager to use [default: SubProcEnvManager]
    --dataset <str>         "beijing" or "sanfrancisco"
    --nb_agent <int>        Number of agents 

Script Options:
    --nb-env <int>          Number of env per worker [default: 32]
    --seed <int>            Seed for random variables [default: 0]
    --nb-step <int>         Number of steps to train for [default: 10e6]
    --load-network <path>   Path to network file
    --load-optim <path>     Path to optimizer file
    --resume <path>         Resume training from log ID .../<logdir>/<env>/<log-id>/
    --config <path>         Use a JSON config file for arguments
    --eval                  Run an evaluation after training
    --prompt                Prompt to modify arguments

Optimizer Options:
    --lr <float>               Learning rate [default: 0.0007]
    --grad-norm-clip <float>  Clip gradient norms [default: 0.5]

Logging Options:
    --tag <str>                Name your run [default: None]
    --logdir <path>            Path to logging directory [default: /tmp/adept_logs/]
    --epoch-len <int>          Save a model every <int> frames [default: 1e6]
    --summary-freq <int>       Tensorboard summary frequency [default: 10]

Algorithm Options:
    --use_transformer <bool>   Whether use GTrXL
    --use_intrinsic   <bool>   Whether use RND-controlled intrinsic reward
    --bg  <float>              b_g in GTrXL
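
When sweeping over several settings, it can be convenient to assemble the command line programmatically from the flags listed above. The sketch below is one way to do that, using only flags that appear in the usage text; the helper `build_command` and the value chosen for `--nb_agent` are illustrative assumptions. Options documented as `<bool>` (such as `--use_transformer`) take an explicit value whose accepted spelling is not stated here, so they are left out of the example.

```python
import shlex
from typing import Dict, List, Union

Value = Union[str, int, float, bool, None]


def build_command(script: str, options: Dict[str, Value]) -> List[str]:
    """Turn a flag-to-value mapping into an argv list for subprocess.run."""
    argv = ["python", script]
    for flag, value in options.items():
        if value is True:
            # Bare switch without an argument, e.g. --eval.
            argv.append(flag)
        elif value is False or value is None:
            # Switch turned off or option unset: omit it entirely.
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


if __name__ == "__main__":
    options = {
        "--dataset": "beijing",
        "--nb_agent": 4,  # illustrative value, not a documented default
        "--nb-workers": 4,
        "--seed": 0,
        "--eval": True,
    }
    argv = build_command("actorlearner.py", options)
    # Print the command for inspection; pass argv to subprocess.run to launch it.
    print(shlex.join(argv))
```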

You can also train from a config file using the following commands:

python actorlearner.py --config ../config/bj.json # for Beijing Dataset
python actorlearner.py --config ../config/sf.json # for San Francisco Dataset
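
To run a variation of one of these configs without editing the original file, the settings can be copied into a new JSON file and passed via --config. The sketch below assumes the config is a flat JSON object; the key name `seed` and the output path are hypothetical, so check the keys actually present in ../config/bj.json before relying on them.

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_config(path: str) -> Dict[str, Any]:
    """Read a JSON config file into a dictionary."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def write_variant(
    base: Dict[str, Any], overrides: Dict[str, Any], out_path: str
) -> Dict[str, Any]:
    """Merge overrides into a copy of base and save the result as JSON."""
    merged = {**base, **overrides}  # base itself is left unmodified
    Path(out_path).parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(merged, f, indent=2)
    return merged


if __name__ == "__main__":
    base = load_config("../config/bj.json")
    # "seed" is an assumed key name; the output path is only an example.
    write_variant(base, {"seed": 1}, "../config/variants/bj_seed1.json")
```

The new file can then be passed to actorlearner.py with --config in the same way as the provided configs.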

🏁 Testing

Get the usage information of testing:

python evaluate.py -h 
Required:
    --logdir <path>     Path to train logs (.../logs/<env-id>/<log-id>)
Options:
    --epoch <int>           Epoch number to load [default: None]
    --actor <str>           Name of the eval actor [default: ACActorEval]
    --gpu-id <int>          CUDA device ID of GPU [default: 0]
    --nb-episode <int>      Number of episodes to average [default: 30]
    --start <float>         Epoch to start from [default: 0]
    --end <float>           Epoch to end on [default: -1]
    --seed <int>            Seed for random variables [default: 512]
    --custom-network <str>  Name of custom network class

To evaluate the trained model, use the following command:

python evaluate.py --logdir ${your_log_path}
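
If several runs have been trained, each run directory can be evaluated in turn. The sketch below builds one evaluate.py command per subdirectory, assuming the .../logs/<env-id>/<log-id> layout described in the usage text; the helper `evaluation_commands` and the example path are hypothetical.

```python
from pathlib import Path
from typing import List


def evaluation_commands(env_log_dir: str, nb_episode: int = 30) -> List[List[str]]:
    """Build one evaluate.py argv list per run directory under env_log_dir."""
    # Only subdirectories are treated as runs; stray files are ignored.
    runs = sorted(p for p in Path(env_log_dir).iterdir() if p.is_dir())
    return [
        ["python", "evaluate.py", "--logdir", str(run), "--nb-episode", str(nb_episode)]
        for run in runs
    ]


if __name__ == "__main__":
    # Example path only; replace it with your own <logdir>/<env-id> directory.
    for argv in evaluation_commands("/tmp/adept_logs/my_env"):
        print(" ".join(argv))  # pass argv to subprocess.run to execute it
```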

👏 Reference

This codebase is built on adept and Ray, both of which are open source. Please refer to those repositories for more documentation.

📧 Contact

If you have any questions, please email wanghao@bit.edu.cn.