The code of the paper Deep Reinforcement Learning for Online Assortment Customization: A Data-Driven Approach
is presented at this repository. The code works with Python3.9
and torch1.13
. The folder rlGT
is used for simulation with rank-list ground truth. The folder rlGT_reuse
is used for simulation with reusable products and rank-list ground truth. The folder expedia
is used for expedia data pre-processing. The folder realdata
is used for simulation with expedia data.
main.py
is the file to call to start the training. uti.py
includes functions for data generating and choice model fitting, and the results are saved in folder GT
. train.py
is used for training and testing. net.py
instantiates the A2C agent and other_agents.py
provides benchmark agents instantiations . env.py
simulates the rank-list based environment. feature.py
instantiates the resnet choice model.
to run the training for 10 products, each with 10 initial inventory levels, 4 types of customers, cardinality 4, and Load Factor 1.0 :
python main.py --A2C=True --num_products=10 --ini_inv=10 --num_cus_types=4 --cardinality=4 --seed_range=20 --net_seed=10 --net_seed=0 --share_lr=0.001 --actor_lr=0.0001 --critic_lr=0.0001 --step=100 --lr_min=0.00001 --e_rate=0.001 --a_rate=1 --c_rate=1 --lr_decay_lambda=0.999
to run with different Load Factors, change ini_inv to 8, 9, 11.
--detail=True : see the change of inventory and critic in the testing process.
different usage time distributions are chosen by line 71-74 and line 153-156 in env.py
.
run with :
python main.py --A2C=True --num_products=10 --ini_inv=10 --num_cus_types=4 --cardinality=4 --seed_range=20 --net_seed=10 --net_seed=0 --share_lr=0.001 --actor_lr=0.0001 --critic_lr=0.0001 --step=100 --lr_min=0.0001 --e_rate=0.001 --a_rate=1 --c_rate=1 --lr_decay_lambda=0.999
read_realdata.py
fits a gated-Assort-Net from expedia data to act as the environment feedback.
run with :
python main.py --num_products=30 --ini_inv=2 --num_cus_types=4 --selling_length=200 --seed_range=20 --net_seed=47 --share_lr=0.005 --actor_lr=0.005 --critic_lr=0.01 --step=50 --lr_min=0.00001 --e_rate=0.001 --a_rate=1 --c_rate=1 --lr_decay_lambda=0.999