
⚠️ IMPORTANT NOTICE ⚠️

This is a submodule of the A2Perf project. For complete documentation and usage instructions, please refer to the main A2Perf README.


Circuit Training: An open-source framework for generating chip floor plans with distributed deep reinforcement learning.

Circuit Training is an open-source framework for generating chip floor plans with distributed deep reinforcement learning. This framework reproduces the methodology published in the Nature 2021 paper:

A graph placement methodology for fast chip design. Azalia Mirhoseini, Anna Goldie, Mustafa Yazgan, Joe Wenjie Jiang, Ebrahim Songhori, Shen Wang, Young-Joon Lee, Eric Johnson, Omkar Pathak, Azade Nazi, Jiwoo Pak, Andy Tong, Kavya Srinivasa, William Hang, Emre Tuncer, Quoc V. Le, James Laudon, Richard Ho, Roger Carpenter & Jeff Dean, 2021. Nature, 594(7862), pp.207-212. [PDF]

Our hope is that Circuit Training will foster further collaborations between academia and industry, and enable advances in deep reinforcement learning for Electronic Design Automation, as well as for general combinatorial and decision-making optimization problems. Capable of optimizing chip blocks with hundreds of macros, Circuit Training automatically generates floor plans in hours, whereas baseline methods often require human experts in the loop and can take months.

Circuit Training is built on top of TF-Agents and TensorFlow 2.x with support for eager execution, distributed training across multiple GPUs, and distributed data collection scaling to hundreds of actors.

Table of contents

Features
Installation
Quick start
Testing
Releases
Results
FAQ
How to contribute
AI Principles
Contributors
How to cite
Disclaimer

Features

  • Places netlists with hundreds of macros and millions of stdcells (in clustered format).
  • Computes both macro location and orientation (flipping).
  • Optimizes multiple objectives including wirelength, congestion, and density.
  • Supports alignment of blocks to the grid, to model clock strap or macro blockage.
  • Supports macro-to-macro, macro-to-boundary spacing constraints.
  • Supports fixed macros.
  • Supports DREAMPlace as the stdcell placer.
  • Allows users to specify their own technology parameters, e.g. routing resources (in routes per micron) and macro routing allocation.
  • Generates clustered netlists.
  • TILOS-AI-Institute has created a script to convert LEF/DEF and Bookshelf to the Netlist Protocol Buffer used as the input for Circuit Training.
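As a rough illustration of how the wirelength, congestion, and density objectives listed above might be folded into a single scalar, here is a minimal sketch of a weighted sum. The function name `proxy_cost` and the default weights are hypothetical and chosen only for illustration; they are not taken from the Circuit Training code, which defines its own cost configuration.

```python
def proxy_cost(wirelength, congestion, density,
               congestion_weight=0.5, density_weight=0.5):
    """Combine three proxy objectives into one scalar (lower is better).

    The weights are illustrative assumptions, not project defaults.
    """
    return wirelength + congestion_weight * congestion + density_weight * density


# Example with made-up objective values.
print(proxy_cost(wirelength=0.10, congestion=0.90, density=0.55))
```

Raising one weight makes the optimizer care more about that objective at the expense of the others, which is the trade-off a multi-objective placer has to manage.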

Installation

⚠️ Circuit Training only supports Linux-based OSes.

⚠️ Circuit Training requires Python 3.9 or greater.

Stable

Circuit Training is a research project. We are not currently creating PyPI builds. Stable in this instance is relative to HEAD and means that the code was tested at this point in time and branched. Because upstream libraries are constantly changing, older branches may end up rotting faster than expected.

The steps below install the most recent branch; the archive is in the releases section. There are two methods for installing, but before doing either one you need to run the preliminary setup.

Preliminary Setup

Before following the instructions set the following variables and clone the repo:

$ export CT_VERSION=0.0.3
# Currently supports python3.9, python3.10, and python3.11
# The docker is python3.9 only.
$ export PYTHON_VERSION=python3.9
$ export DREAMPLACE_PATTERN=dreamplace_20230414_2835324_${PYTHON_VERSION}.tar.gz
# If the version of TF-Agents in the table is not current, change this command to
# match the version of tf-agents that matches the branch of Circuit Training used.
$ export TF_AGENTS_PIP_VERSION=tf-agents[reverb]

# Clone the Repo and checkout the desired branch.
$  git clone https://github.com/google-research/circuit_training.git
$  git -C $(pwd)/circuit_training checkout r${CT_VERSION}
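To make the variable expansion above concrete, the sketch below mirrors it in Python. The helper name `stable_settings` is hypothetical; the string patterns are copied from the exports above.

```python
def stable_settings(ct_version="0.0.3", python_version="python3.9"):
    """Mirror the shell exports above as a dict (hypothetical helper)."""
    return {
        # Branch checked out by: git checkout r${CT_VERSION}
        "branch": f"r{ct_version}",
        # Value of DREAMPLACE_PATTERN after ${PYTHON_VERSION} is expanded.
        "dreamplace_pattern": f"dreamplace_20230414_2835324_{python_version}.tar.gz",
        # Value of TF_AGENTS_PIP_VERSION.
        "tf_agents_pip_version": "tf-agents[reverb]",
    }


for key, value in stable_settings().items():
    print(f"{key}={value}")
```

Changing `python_version` to `python3.10` or `python3.11` changes only the DREAMPlace archive name, which is why `PYTHON_VERSION` has to be exported before `DREAMPLACE_PATTERN`.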

Using the docker

Do not forget to do the preliminary setup. The cleanest way to use Circuit Training is to use the docker. These commands will create a docker image with all the dependencies needed:

$ export REPO_ROOT=$(pwd)/circuit_training

# Build the docker image.
$ docker build --pull --no-cache --tag circuit_training:core \
    --build-arg tf_agents_version="${TF_AGENTS_PIP_VERSION}" \
    --build-arg dreamplace_version="${DREAMPLACE_PATTERN}" \
    --build-arg placement_cost_binary="plc_wrapper_main_${CT_VERSION}" \
    -f "${REPO_ROOT}"/tools/docker/ubuntu_circuit_training ${REPO_ROOT}/tools/docker/

# Run the end2end smoke test using the image. Takes 10-20 minutes.
$ mkdir -p ${REPO_ROOT}/logs
$ docker run --rm -v ${REPO_ROOT}:/workspace --workdir /workspace circuit_training:core \
    bash tools/e2e_smoke_test.sh --root_dir /workspace/logs

Install locally

Do not forget to do the preliminary setup.

Circuit Training installation steps:

  • Install our DREAMPlace binary.
  • Install TF-Agents and the placement cost binary.
  • Run a test.

Install DREAMPlace

Follow the instructions for DREAMPlace but do not change the ENV VARS that you already exported previously.

Install TF-Agents and the Placement Cost binary

These commands install TF-Agents and the placement cost binary.

# Installs TF-Agents with stable versions of Reverb and TensorFlow 2.x.
$  pip install $TF_AGENTS_PIP_VERSION
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main_${CT_VERSION} \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main
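After the copy and `chmod` above, a quick check that the binary is on `PATH` and executable can save a confusing failure later. This is a minimal sketch using only the Python standard library; the helper names `is_executable` and `find_plc_wrapper` are hypothetical.

```python
import os
import shutil


def is_executable(path):
    """Return True if path is an existing file the current user can execute."""
    return os.path.isfile(path) and os.access(path, os.X_OK)


def find_plc_wrapper(name="plc_wrapper_main"):
    """Return the resolved path of the binary on PATH, or None if absent."""
    return shutil.which(name)


path = find_plc_wrapper()
if path is None:
    print("plc_wrapper_main is not on PATH")
else:
    print(f"found {path}, executable: {is_executable(path)}")
```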

Run a test.

These commands run a basic unit test. If the current stable tf-agents is not the version you installed, edit the tox.ini file and change tf-agents[reverb] to tf-agents[reverb]~=<version you want>.

tox -e py39-stable -- circuit_training/grouping/grouping_test.py
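The tox.ini edit described above can also be scripted. The sketch below is one possible way to do it, assuming the requirement appears literally as `tf-agents[reverb]` in the file; the helper name `pin_tf_agents` and the sample text are hypothetical and do not reproduce the real tox.ini.

```python
import re

# Matches tf-agents[reverb] with an optional existing ~= pin.
# It does not match tf-agents-nightly[reverb].
_TF_AGENTS_REQ = re.compile(r"tf-agents\[reverb\](~=[0-9A-Za-z.]+)?")


def pin_tf_agents(tox_ini_text, version):
    """Return tox.ini text with tf-agents[reverb] pinned to ~=version."""
    return _TF_AGENTS_REQ.sub(f"tf-agents[reverb]~={version}", tox_ini_text)


# Made-up fragment standing in for the deps section of a tox.ini.
sample = "deps =\n    tf-agents[reverb]\n    tf-agents-nightly[reverb]\n"
print(pin_tf_agents(sample, "0.16.0"))
```

Because an existing pin is matched as part of the pattern, running the helper twice replaces the old version instead of appending a second specifier.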

HEAD

We recommend using stable branches, but our team does work from HEAD. The main issue is that HEAD breaks when upstream libraries are broken, and our HEAD uses other nightly-built libraries, which adds to the variability.

The steps below install the most recent branch; the archive is in the releases section. There are two methods for installing, but before doing either one you need to run the preliminary setup.

Preliminary Setup

Before following the instructions set the following variables and clone the repo:

# Currently supports python3.9, python3.10, and python3.11
# The docker is python3.9 only.
$ export PYTHON_VERSION=python3.9
$ export DREAMPLACE_PATTERN=dreamplace_${PYTHON_VERSION}.tar.gz

# Clone the Repo and checkout the desired branch.
$  git clone https://github.com/google-research/circuit_training.git

Using the docker

Do not forget to do the preliminary setup. The cleanest way to use Circuit Training is to use docker. These commands will create an image with all the dependencies needed:

$ export REPO_ROOT=$(pwd)/circuit_training

# Builds the image with current DREAMPlace and Placement Cost Binary.
$ docker build --pull --no-cache --tag circuit_training:core \
    --build-arg tf_agents_version="tf-agents-nightly[reverb]" \
    -f "${REPO_ROOT}"/tools/docker/ubuntu_circuit_training ${REPO_ROOT}/tools/docker/

# Run the end2end smoke test using the image. Takes 10-20 minutes.
$ mkdir -p ${REPO_ROOT}/logs
$ docker run --rm -v ${REPO_ROOT}:/workspace --workdir /workspace circuit_training:core \
    bash tools/e2e_smoke_test.sh --root_dir /workspace/logs

Install locally

Circuit Training installation steps:

  • Install our DREAMPlace binary.
  • Install TF-Agents Nightly and the placement cost binary.
  • Run a test.

Install DREAMPlace

Follow the instructions for DREAMPlace but do not change the ENV VARS that you already exported previously.

Install TF-Agents and the Placement Cost binary

These commands install TF-Agents and the placement cost binary.

# Installs the nightly build of TF-Agents with Reverb.
$  pip install tf-agents-nightly[reverb]
# Copies the placement cost binary to /usr/local/bin and makes it executable.
$  sudo curl https://storage.googleapis.com/rl-infra-public/circuit-training/placement_cost/plc_wrapper_main \
     -o  /usr/local/bin/plc_wrapper_main
$  sudo chmod 555 /usr/local/bin/plc_wrapper_main

Run a test.

These commands run a basic unit test.

tox -e py39-nightly -- circuit_training/grouping/grouping_test.py

Install DREAMPlace

DREAMPlace is not provided as a PyPI package and needs to be compiled. We provide compiled versions of DREAMPlace taken from our branch for a range of Python versions built for our docker image (Ubuntu 20.04). We also use them for presubmit testing. If our binaries are not compatible with your OS toolchain, you will need to compile your own version. We use this script to create our DREAMPlace binary.

# These ENV VARS may have been set above, do not export again if already set.
$ export PYTHON_VERSION=python3.9
$ export DREAMPLACE_PATTERN=dreamplace_${PYTHON_VERSION}.tar.gz
# Installs DREAMPlace into `/dreamplace`. Anywhere is fine as long as PYTHONPATH
# is set correctly.
$  mkdir -p /dreamplace
# Picks the binary that matches your version of Python.
$  curl https://storage.googleapis.com/rl-infra-public/circuit-training/dreamplace/${DREAMPLACE_PATTERN} -o /dreamplace/dreamplace.tar.gz

# Unpacks the package.
$  tar xzf /dreamplace/dreamplace.tar.gz -C /dreamplace/

# Sets the python path so we can find Placer with `import dreamplace.Placer`
# This also needs to put all of DREAMPlace at the root because DREAMPlace python
# is not setup like a package with imports like `dreamplace.Param`.
$  export PYTHONPATH="${PYTHONPATH}:/dreamplace:/dreamplace/dreamplace"

# DREAMPlace requires some additional system and python libraries
# System packages
$  apt-get install -y \
      flex \
      libcairo2-dev \
      libboost-all-dev

# Python packages
# Each requirement is quoted so the shell does not treat ">" as a redirect.
$  python3 -m pip install "pyunpack>=0.1.2" \
      "patool>=1.12" \
      "timeout-decorator>=0.5.0" \
      "matplotlib>=2.2.2" \
      "cairocffi>=0.9.0" \
      "pkgconfig>=1.4.0" \
      "setuptools>=39.1.0" \
      "scipy>=1.1.0" \
      "numpy>=1.15.4" \
      "torch==1.13.1" \
      "shapely>=1.7.0"
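Once `PYTHONPATH` is set as above, it can be useful to confirm that Python can actually locate DREAMPlace before starting a long run. The following is a minimal sketch using the standard library; the helper name `module_available` is hypothetical, and the module names checked are the ones mentioned in the comments above.

```python
import importlib.util


def module_available(name):
    """Return True if `name` can be located on the current import path."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        return False


# Note: looking up a dotted name imports its parent package first.
for name in ("dreamplace", "dreamplace.Placer"):
    print(f"{name}: {'found' if module_available(name) else 'missing'}")
```

A "missing" result usually points at `PYTHONPATH` not including both `/dreamplace` and `/dreamplace/dreamplace`, or at an archive built for a different Python version.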

Quick start

The best quick start is to run the end2end smoke test and then look at the full distributed example Circuit training for Ariane RISC-V.

Testing

# Runs tests with nightly TF-Agents.
$  tox -e py39-nightly,py310-nightly,py311-nightly
# Runs with latest stable TF-Agents.
$  tox -e py39-stable,py310-stable,py311-stable

# Using our Docker for CI.
## Build the docker
$  docker build --tag circuit_training:ci -f tools/docker/ubuntu_ci tools/docker/
## Runs tests with nightly TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py39-nightly,py310-nightly,py311-nightly
## Runs tests with latest stable TF-Agents.
$  docker run -it --rm -v $(pwd):/workspace --workdir /workspace circuit_training:ci \
     tox -e py39-stable,py310-stable,py311-stable
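The tox environment names above follow a regular `py<major><minor>-<channel>` pattern. As a small sketch, a hypothetical helper `tox_env_list` can build the comma-separated list for any subset of the supported Python versions:

```python
def tox_env_list(python_versions, channel):
    """Build a comma-separated tox env list such as 'py39-nightly,py310-nightly'."""
    envs = []
    for version in python_versions:
        major, minor = version.split(".")
        envs.append(f"py{major}{minor}-{channel}")
    return ",".join(envs)


print(tox_env_list(["3.9", "3.10", "3.11"], "nightly"))
print(tox_env_list(["3.9", "3.10", "3.11"], "stable"))
```

The result can be passed directly to `tox -e`, for example to test only the Python version installed locally.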

Releases

While running at HEAD likely works, working from a branch has the advantage of being more stable. We have tagged the code base to mark compatibility with stable releases of the underlying libraries. For DREAMPlace, the filename pattern can be used to install DREAMPlace for the supported versions of Python. For the Placement Cost binary, the URL points to the version of the PLC used at the time the branch was cut.

| Release | Branch / Tag | TF-Agents | DREAMPlace | PL |
|---|---|---|---|---|
| HEAD | main | tf-agents-nightly[reverb] | | |
| 0.0.3 | v0.0.3 | tf-agents[reverb]~=0.16.0 | dreamplace_20230414_b31e8af_python3.9.tar.gz | placement cost binary |
| 0.0.2 | v0.0.2 | tf-agents[reverb]~=0.16.0 | | |

Results

The results below are reported for training from scratch, since the pre-trained model cannot be shared at this time.

Ariane RISC-V CPU

View the full details of the Ariane experiment on our details page. With this code, training from scratch gives results comparable to or better than fine-tuning a pre-trained model. At the time the paper was published, training from a pre-trained model gave better results than training from scratch for the Ariane RISC-V. Improvements to the code have also resulted in 50% fewer GPU resources needed and a 2x walltime speedup, even when training from scratch. Below are the mean and standard deviation for 3 different seeds run 3 times each. This is slightly different from what was used in the paper (8 runs, each with a different seed), but better captures the different sources of variability.

| Metric | Proxy Wirelength | Proxy Congestion | Proxy Density |
|---|---|---|---|
| mean | 0.1013 | 0.9174 | 0.5502 |
| std | 0.0036 | 0.0647 | 0.0568 |

The table below summarizes the paper result for fine-tuning from a pre-trained model over 8 runs with each one using a different seed.

| Metric | Proxy Wirelength | Proxy Congestion | Proxy Density |
|---|---|---|---|
| mean | 0.1198 | 0.9718 | 0.5729 |
| std | 0.0019 | 0.0346 | 0.0086 |
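To put the two tables side by side, the sketch below computes the relative change in each mean proxy cost when going from the paper's fine-tuned result to training from scratch with this code. The numbers are copied from the tables above; the helper name `relative_change` is hypothetical. This compares means only and ignores the reported standard deviations, so it should be read as a rough summary rather than a significance test.

```python
# Mean proxy costs copied from the two tables above.
FROM_SCRATCH = {"wirelength": 0.1013, "congestion": 0.9174, "density": 0.5502}
FINE_TUNED_PAPER = {"wirelength": 0.1198, "congestion": 0.9718, "density": 0.5729}


def relative_change(new, baseline):
    """Return (new - baseline) / baseline; negative means `new` is lower."""
    return (new - baseline) / baseline


for metric, baseline in FINE_TUNED_PAPER.items():
    change = relative_change(FROM_SCRATCH[metric], baseline)
    print(f"{metric}: {change:+.1%}")
```

All three proxy costs are lower-is-better, so a negative change means the from-scratch result has the lower mean.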

Frequently Asked Questions

We wrote this FAQ to answer frequently asked questions about our work. Please reach out to us if you have any other questions!

What is the goal and philosophy of our team?

Our goal is to help chip designers do their jobs better and faster, and we welcome any method that moves us in that direction. To ensure that we are solving real world problems, we work closely with chip designers to understand and address their needs.

What is the impact of our work?

To our knowledge, this is the first deep reinforcement learning (RL) method used in production to design hardware products. More specifically, the RL method described in the Nature paper generated macro placements that were frozen and taped out in Google’s AI accelerator chip (TPU-v5).

We are also excited to see that top EDA and chip design companies (e.g. Synopsys, Cadence, NVIDIA, etc.) have announced initiatives to use similar RL-based methods in their tools and chip design efforts.

Have we evaluated our method on open-source benchmarks?

We are focused on modern sub-10nm chips like TPU and Pixel, but we did publish an article in MLCAD 2021 led by Prof. David Pan and his student Zixuan Jiang, where we report results on the open-source ISPD 2015 benchmarks after unfixing macros. In any case, we have open-sourced our method, so the community is free to try it out on any benchmark.

How do we compare to commercial autoplacers?

Due to licensing agreements, we cannot publish any public comparison with commercial autoplacers. However, we can say that our strongest baseline is the physical design team working directly with the assistance of commercial autoplacers, and we outperform this baseline (see “manual” baseline in Table 1 of our Nature article).

How do we perform clustering of standard cells?

In our Nature paper, we describe how to use hMETIS to cluster standard cells, including all necessary settings. For detailed settings, please see Extended Data Table 3 from our Nature article. Internally, Google pays for a commercial license, but non-commercial entities are welcome to use a free open-source license.

Regardless, our method runs on unclustered netlists as well, so you can skip the preprocessing step if you wish, though we’ve found clustering to benefit both our RL method and baseline placers. The complexity of our method scales with the number of macros, not the number of standard cells, so the runtime will not be overly affected.

What netlist formats do we support?

Our placer represents netlists in the open-source protocol buffer format. You can learn more about the format here. To run on netlists in other formats (e.g. LEF/DEF or Bookshelf), you can convert to protocol buffer format. Please see our quick start guide for an example of how to use this format on the open-source RISC-V Ariane CPU.

Why do we claim “fast chip design” when RL is slower than analytic solvers?

When we say “fast”, we mean that we actually help chip designers do their jobs faster, not that our algorithm runs fast per se. Our method can, in hours, do what a human chip designer needs weeks or months to perform.

If an analytic method optimizes for wirelength and produces a result in ~1 minute, that’s obviously faster than hours of RL optimization; however, if the result does not meet design criteria and therefore physical design experts must spend weeks further iterating in the loop with commercial EDA tools, then it’s not faster in any way that matters.

In our Nature experiments, why do we report QoR metrics rather than wirelength alone?

Our goal is to develop methods that help chip designers do their job better and faster. We therefore designed the experiments in our paper to mimic the true production setting as closely as possible, and report QoR (Quality of Result) metrics.

QoR metrics can take up to 72 hours to generate with a commercial EDA tool, but are highly accurate measurements of all key metrics, including wirelength, horizontal/vertical congestion, timing (TNS and WNS), power, and area.

QoR metrics are closest to physical ground truth and are used by production chip design teams to decide which placements are sent for manufacturing. In contrast, proxy costs like approximate wirelength and congestion can be computed cheaply and are useful for optimization, but are not used to make real world decisions as they can vary significantly from QoR.

It is also worth noting that metrics like wirelength and routing congestion directly trade off against each other (e.g. placing nodes close to one another increases congestion, but reduces wirelength), so optimizing or evaluating for wirelength alone is unlikely to result in manufacturable chip layouts.

In our Nature experiments, do we perform any postprocessing on the RL results?

No. In our Nature experiments, we do not apply any postprocessing to the RL results.

In our open-source code, we provide an optional 1-5 minute coordinate descent postprocessing step, which we found to slightly improve wirelength. You are welcome to turn it on or off with a flag, and to compare performance with or without it.
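For readers unfamiliar with coordinate descent, the toy sketch below shows the general idea on a tiny grid: one macro is moved at a time to the free cell that most reduces a wirelength proxy, and sweeps repeat until nothing improves. This is not the Circuit Training implementation. The function names, the two-pin net model, and the Manhattan-distance cost are all assumptions made for illustration.

```python
from itertools import product


def wirelength(pos, nets):
    """Toy proxy: sum of Manhattan distances over two-pin nets."""
    return sum(abs(pos[a][0] - pos[b][0]) + abs(pos[a][1] - pos[b][1])
               for a, b in nets)


def coordinate_descent(pos, nets, grid, max_sweeps=10):
    """Greedily move one macro at a time to its best free grid cell."""
    pos = dict(pos)  # Work on a copy so the caller's placement is untouched.
    cols, rows = grid
    for _ in range(max_sweeps):
        improved = False
        for name in sorted(pos):
            occupied = {p for n, p in pos.items() if n != name}
            best_cell, best_cost = pos[name], wirelength(pos, nets)
            for cell in product(range(cols), range(rows)):
                if cell in occupied:
                    continue  # Two macros may not share a cell.
                cost = wirelength({**pos, name: cell}, nets)
                if cost < best_cost:
                    best_cell, best_cost = cell, cost
            if best_cell != pos[name]:
                pos[name] = best_cell
                improved = True
        if not improved:
            break  # A full sweep changed nothing: local optimum reached.
    return pos


# Hypothetical starting placement of three macros on a 4x4 grid.
start = {"a": (0, 0), "b": (3, 3), "c": (0, 3)}
nets = [("a", "b"), ("b", "c")]
result = coordinate_descent(start, nets, grid=(4, 4))
print(wirelength(start, nets), "->", wirelength(result, nets))
```

Because each move is accepted only if it strictly lowers the cost, the procedure can only improve or keep the starting wirelength, which matches the role of a cheap refinement step after the main optimization.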

What was the process for open-sourcing this code?

Open-sourcing our code involved partnering with another team at Google (TF-Agents). TF-Agents first replicated the results in our Nature article using our codebase, then reimplemented our method and replicated our results using their own implementation, and then open-sourced their implementation as it does not rely on any internal infrastructure.

Getting approval to open-source this code, ensuring compliance with export control restrictions, migrating to TensorFlow 2.x, and removing dependencies from all Google infrastructure was quite time-consuming; but we felt that it was worth the effort to be able to share our method with the community.

How to contribute

We're eager to collaborate with you! See CONTRIBUTING for a guide on how to contribute. This project adheres to TensorFlow's code of conduct. By participating, you are expected to uphold this code of conduct.

AI Principles

This project adheres to Google's AI principles. By participating, using or contributing to this project you are expected to adhere to these principles.

Main Contributors

We would like to recognize the following individuals for their code contributions, discussions, and other work to make the release of the Circuit Training library possible.

  • Sergio Guadarrama
  • Summer Yue
  • Ebrahim Songhori
  • Joe Jiang
  • Toby Boyd
  • Azalia Mirhoseini
  • Anna Goldie
  • Mustafa Yazgan
  • Shen Wang
  • Terence Tam
  • Young-Joon Lee
  • Roger Carpenter
  • Quoc Le
  • Ed Chi

How to cite

If you use this code, please cite both:

@article{mirhoseini2021graph,
  title={A graph placement methodology for fast chip design},
  author={Mirhoseini, Azalia and Goldie, Anna and Yazgan, Mustafa and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Wang, Shen and Lee, Young-Joon and Johnson,
  Eric and Pathak, Omkar and Nazi, Azade and Pak, Jiwoo and Tong, Andy and
  Srinivasa, Kavya and Hang, William and Tuncer, Emre and V. Le, Quoc and
  Laudon, James and Ho, Richard and Carpenter, Roger and Dean, Jeff},
  journal={Nature},
  volume={594},
  number={7862},
  pages={207--212},
  year={2021},
  publisher={Nature Publishing Group}
}
@misc{CircuitTraining2021,
  title = {{Circuit Training}: An open-source framework for generating chip
  floor plans with distributed deep reinforcement learning.},
  author = {Guadarrama, Sergio and Yue, Summer and Boyd, Toby and Jiang, Joe
  Wenjie and Songhori, Ebrahim and Tam, Terence and Mirhoseini, Azalia},
  howpublished = {\url{https://github.com/google-research/circuit_training}},
  url = "https://github.com/google-research/circuit_training",
  year = 2021,
  note = "[Online; accessed 21-December-2021]"
}

Disclaimer

This is not an official Google product.