rachelselinar/DREAMPlaceFPGA

Placement Runtime in DREAMPlaceFPGA


I ran the ISPD'2016 FPGA01 benchmark on a Linux server with an Intel(R) Xeon(R) W-2123 CPU @ 3.60GHz (8 cores), and got a GP runtime of 17.94 seconds and an L+D runtime of 52.141 seconds. Why are the runtimes this long?

Hi Kang,
Could you please share the run log?
It is hard to tell what options you used to run the tool.

Benchmark: ISPD'2016 FPGA03
CPU: Intel i9-10920X (24) @ 4.800GHz
GPU: NVIDIA TITAN Xp
Run with 24 threads:

[INFO   ] DREAMPlaceFPGA - Placement completed in 22.48 seconds
[INFO   ] DREAMPlaceFPGA - write placement solution to results/design/design.gp.pl took 0.567 seconds
[INFO   ] DREAMPlaceFPGA - Legalization and Detailed Placement run using elfPlace (CPU): ./thirdparty/elfPlace_LG_DP --aux benchmarks/ispd2016/FPGA03/design.aux --numThreads 24 --pl results/design/design_final.pl
[INF 2023-09-20 17:10:21    0.00 sec]  ----- Command-Line Options -----
[INF 2023-09-20 17:10:21    0.00 sec]  numThreads = 24
[INF 2023-09-20 17:10:21    0.00 sec]  --------------------------------
[INF 2023-09-20 17:10:21    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.aux
[INF 2023-09-20 17:10:21    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.lib
[INF 2023-09-20 17:10:21    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.scl
[INF 2023-09-20 17:10:21    0.01 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.nodes
[INF 2023-09-20 17:10:22    0.32 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.pl
[INF 2023-09-20 17:10:22    0.32 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.nets
[INF 2023-09-20 17:10:24    3.00 sec]  GP instance stddev = 2.05, trunc = 2.50
[INF 2023-09-20 17:10:24    3.00 sec]  Import placement from file gp.pl
[INF 2023-09-20 17:14:41  259.53 sec]  Export solution to file results/design/design_final.pl
[INFO   ] DREAMPlaceFPGA - Legalization and detailed placement completed in 260.486 seconds
[INFO   ] DREAMPlaceFPGA - Completed Placement in 283.556 seconds

Run with 12 threads:

[INFO   ] DREAMPlaceFPGA - Placement completed in 22.26 seconds
[INFO   ] DREAMPlaceFPGA - write placement solution to results/design/design.gp.pl took 0.595 seconds
[INFO   ] DREAMPlaceFPGA - Legalization and Detailed Placement run using elfPlace (CPU): ./thirdparty/elfPlace_LG_DP --aux benchmarks/ispd2016/FPGA03/design.aux --numThreads 12 --pl results/design/design_final.pl
[INF 2023-09-20 17:17:34    0.00 sec]  ----- Command-Line Options -----
[INF 2023-09-20 17:17:34    0.00 sec]  numThreads = 12
[INF 2023-09-20 17:17:34    0.00 sec]  --------------------------------
[INF 2023-09-20 17:17:34    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.aux
[INF 2023-09-20 17:17:34    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.lib
[INF 2023-09-20 17:17:34    0.00 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.scl
[INF 2023-09-20 17:17:35    0.02 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.nodes
[INF 2023-09-20 17:17:35    0.32 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.pl
[INF 2023-09-20 17:17:35    0.32 sec]  Parsing file benchmarks/ispd2016/FPGA03/design.nets
[INF 2023-09-20 17:17:37    3.00 sec]  GP instance stddev = 2.05, trunc = 2.50
[INF 2023-09-20 17:17:37    3.00 sec]  Import placement from file gp.pl
[INF 2023-09-20 17:22:01  266.03 sec]  Export solution to file results/design/design_final.pl
[INFO   ] DREAMPlaceFPGA - Legalization and detailed placement completed in 267.003 seconds
[INFO   ] DREAMPlaceFPGA - Completed Placement in 289.904 seconds

Here is my FPGA03 JSON:

{
    "aux_input" : "benchmarks/ispd2016/FPGA03/design.aux",
    "gpu" : 1,
    "num_threads" : 12,
    "num_bins_x" : 512,
    "num_bins_y" : 512,
    "global_place_stages" : [
        {"num_bins_x" : 512, "num_bins_y" : 512, "iteration" : 2000, "learning_rate" : 0.01, "wirelength" : "weighted_average", "optimizer" : "nesterov"}
    ],
    "target_density" : 1.0,
    "density_weight" : 8e-5,
    "random_seed" : 1000,
    "scale_factor" : 1.0,
    "global_place_flag" : 1,
    "legalize_flag" : 0,
    "detailed_place_flag" : 0,
    "dtype" : "float32",
    "deterministic_flag" : 0
}

When 'global_place_flag' is set to 1 and 'legalize_flag' is set to 0, global placement (GP) runs in DREAMPlaceFPGA, and the remaining stages, legalization (LG) and detailed placement (DP), run using the elfPlace binary included in the thirdparty folder.
When LG and DP run through elfPlace, their runtime depends on the number of threads actually available on the machine, irrespective of the thread count set in the JSON file. Any other jobs running on the same machine will also affect the CPU runtime.
Please try a controlled experiment with no other jobs running on the CPU to see the impact of the number of threads.
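To compare runs like the two posted above, a small script can pull the per-stage runtimes out of the logs. This is only a minimal sketch and not part of DREAMPlaceFPGA; the function name `stage_runtimes` and the regular expressions are assumptions based solely on the summary lines shown in this thread.

```python
import re

# Patterns for the DREAMPlaceFPGA summary lines as they appear in the logs
# posted in this thread (assumption: other versions may word them differently).
STAGE_PATTERNS = {
    "gp": re.compile(r"DREAMPlaceFPGA - Placement completed in ([\d.]+) seconds"),
    "lg_dp": re.compile(r"Legalization and detailed placement completed in ([\d.]+) seconds"),
    "total": re.compile(r"DREAMPlaceFPGA - Completed Placement in ([\d.]+) seconds"),
}


def stage_runtimes(log_text):
    """Return {stage: seconds} for every stage summary line found in log_text."""
    runtimes = {}
    for stage, pattern in STAGE_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            runtimes[stage] = float(match.group(1))
    return runtimes


# Summary lines copied from the two runs posted above.
LOG_24_THREADS = """\
[INFO   ] DREAMPlaceFPGA - Placement completed in 22.48 seconds
[INFO   ] DREAMPlaceFPGA - Legalization and detailed placement completed in 260.486 seconds
[INFO   ] DREAMPlaceFPGA - Completed Placement in 283.556 seconds
"""
LOG_12_THREADS = """\
[INFO   ] DREAMPlaceFPGA - Placement completed in 22.26 seconds
[INFO   ] DREAMPlaceFPGA - Legalization and detailed placement completed in 267.003 seconds
[INFO   ] DREAMPlaceFPGA - Completed Placement in 289.904 seconds
"""

if __name__ == "__main__":
    for label, log in (("24 threads", LOG_24_THREADS), ("12 threads", LOG_12_THREADS)):
        print(label, stage_runtimes(log))
```

For the two logs above, this reports an LG + DP runtime of 260.486 seconds with 24 threads and 267.003 seconds with 12 threads, a difference of about 2.5%.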

As GP is accelerated on the GPU and LG + DP run on the CPU, the runtime for LG + DP is expected to be larger than that of GP.
To run LG on the GPU, set 'legalize_flag' to 1.
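As an illustration of that change, the sketch below flips 'legalize_flag' in a config like the one posted above. The helper name `enable_gpu_legalization` is hypothetical, and the dictionary is only a subset of the posted JSON; only the keys shown in this thread are used, and 'detailed_place_flag' is deliberately left untouched.

```python
import json


def enable_gpu_legalization(config):
    """Return a copy of a config dict with 'legalize_flag' set to 1."""
    updated = dict(config)  # shallow copy so the caller's dict is not modified
    updated["legalize_flag"] = 1
    return updated


# Subset of the FPGA03 config posted in this thread.
config = json.loads("""
{
    "aux_input" : "benchmarks/ispd2016/FPGA03/design.aux",
    "gpu" : 1,
    "num_threads" : 12,
    "global_place_flag" : 1,
    "legalize_flag" : 0,
    "detailed_place_flag" : 0
}
""")

new_config = enable_gpu_legalization(config)

if __name__ == "__main__":
    # Print the updated config; save it to a JSON file of your choice to use it.
    print(json.dumps(new_config, indent=4))
```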