HPC Deployments of AutoDRIVE Simulator

Primary language: Python. License: BSD 2-Clause "Simplified" (BSD-2-Clause).

AutoDRIVE Simulator: Rancher Cluster Deployments

This branch hosts resources for deploying the AutoDRIVE Simulator on a Rancher HPC cluster.

This project utilizes the Kubernetes API to enable dynamic, scalable, and disposable AutoDRIVE simulations on an HPC cluster. The basic structure of a deployment is displayed in the graphic below. Each simulation pod contains an AutoDRIVE Simulator container and an AutoDRIVE Devkit container integrated with the Python HPC Framework Data Logging Module. Simulation batches are scripted with the Python HPC Framework Automation Module to allow for dynamic simulation cases across HPC resources. Simulation data is collected from a control server pod located inside the Kubernetes cluster, which exports the data to a thin client. Additionally, live simulations can be monitored from the AutoDRIVE HPC Webviewer.

Workflow Diagram
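
As a rough illustration of the pod layout described above, the following sketch builds a two-container Pod manifest as a plain Python dictionary. It is not taken from this repository's Kubernetes directory: the function name, container names, label, image tags, and the restartPolicy value are all assumptions chosen for the example.

```python
import json


def simulation_pod_manifest(pod_name, simulator_image, devkit_image):
    """Return a Kubernetes Pod manifest (plain dict) with two containers.

    Container names, the label, and restartPolicy are illustrative
    assumptions, not values from this project's YAML files.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": pod_name, "labels": {"app": "autodrive-sim"}},
        "spec": {
            # Assumption: a disposable simulation pod is not restarted.
            "restartPolicy": "Never",
            "containers": [
                {"name": "simulator", "image": simulator_image},
                {"name": "devkit", "image": devkit_image},
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image tags, for illustration only.
    manifest = simulation_pod_manifest(
        "autodrive-sim-0",
        "registry.example.com/group/project/simulator",
        "registry.example.com/group/project/devkit",
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests as well as YAML, the printed output could be saved to a file and passed to kubectl apply -f for experimentation.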

SETUP

Prerequisites:

  • kubectl
  • Docker
  • Python 3.8+
  • Python packages listed in requirements.txt
  • Desired versions of AutoDRIVE Simulator & AutoDRIVE Devkit
  • kubectl configured with access to the desired cluster

Rancher/GitLab Container Instructions

These steps outline how to build and run Docker images hosted on a GitLab container registry, and how to configure a Rancher cluster to pull from a project's container registry. They assume Docker & kubectl are already installed on the user's machine.

  1. Download the KubeConfig file from the top right-hand corner of the Rancher dashboard and apply it to kubectl by pointing the KUBECONFIG environment variable to the downloaded file's location. Test your connectivity to the cluster with a basic kubectl command such as kubectl get pods.

KubeConfig Download

  2. An authentication token is needed to access the project's GitLab container registry. Use the auth command inside /Docker/Makefile to generate an authentication file for the container registry, and enter your token as the requested password. This also generates an rcd-reg-cred.yaml file, which can be applied to the cluster to give Kubernetes access to the GitLab registry. Be sure to update the Username & Server IP in the Makefile command.

  3. To build a Docker image that will be hosted in the container registry, tag it using the structure <server_url>/<gitlab_group>/<project_name>/<container_name>.

  4. After building the Docker image with the proper naming structure & authenticating with Docker, you should be able to push the built image to the container registry with docker push <tag>. The pushed image should appear in the selected GitLab project's registry, found by selecting Deploy → Container Registry. You may need your project owner to enable this feature. /Docker/Makefile has examples of commands to deploy the containers used in this project.

Container Registry

  5. You should now be able to run Docker images hosted in the container registry on the cluster & use them in your deployments.
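
The connectivity check, tagging, and push steps above can be sketched in Python as follows. This is a minimal sketch and not a replacement for /Docker/Makefile: the helper names (image_tag, kubectl_env, build_and_push_commands, check_cluster) and the example registry values are hypothetical, and the server URL is assumed to be a bare registry host without an https:// scheme.

```python
import os
import subprocess


def image_tag(server_url, gitlab_group, project_name, container_name):
    """Join the four components into <server_url>/<group>/<project>/<container>."""
    parts = [server_url, gitlab_group, project_name, container_name]
    cleaned = [p.strip("/") for p in parts]
    if not all(cleaned):
        raise ValueError("every tag component must be non-empty")
    return "/".join(cleaned)


def kubectl_env(kubeconfig_path):
    """Copy the current environment and point KUBECONFIG at the given file."""
    env = dict(os.environ)
    env["KUBECONFIG"] = os.path.abspath(os.path.expanduser(kubeconfig_path))
    return env


def build_and_push_commands(tag, context_dir="."):
    """Return the docker build and docker push commands as argument lists."""
    return [
        ["docker", "build", "-t", tag, context_dir],
        ["docker", "push", tag],
    ]


def check_cluster(kubeconfig_path):
    """Run `kubectl get pods`; raises CalledProcessError if the call fails."""
    subprocess.run(
        ["kubectl", "get", "pods"],
        env=kubectl_env(kubeconfig_path),
        check=True,
    )


if __name__ == "__main__":
    # Hypothetical values; substitute your own registry host, group, and project.
    tag = image_tag("registry.example.com", "my-group", "my-project", "simulator")
    for command in build_and_push_commands(tag):
        print(" ".join(command))
```

Running the module only prints the commands. Executing them, or calling check_cluster, requires Docker, kubectl, and valid registry credentials.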

USAGE

For users of Clemson University's Rancher cluster, example_script.py provides a baseline use case.

FILE STRUCTURE

  • Docker: The Docker directory contains all files needed to build the Docker images for the AutoDRIVE Simulator, AutoDRIVE Devkit, AutoDRIVE HPC Webviewer, and the backend control server. A Makefile contains the commands to build each image. The Dockerfiles expect the AutoDRIVE_API and AutoDRIVE_Simulator directories to be placed here (populated with Simulator & Devkit files). Note that the logger.py file may need to be moved into a new AutoDRIVE_API folder & integrated into the AutoDRIVE Devkit script to enable data collection from pods inside the cluster.

  • Kubernetes: The Kubernetes directory contains the YAML files for deployments used in the cluster.

  • Python: The Python directory holds all the necessary scripts to control simulations running in the cluster, using automation_module.py. Most variables can be updated using the config.ini file.
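
Since most variables are set through config.ini, a script might read it with Python's standard configparser module along the following lines. The section names and keys shown here (cluster, simulation, namespace, num_pods, timeout_seconds) are invented for illustration and may not match the actual file.

```python
import configparser

# Hypothetical config contents; the real config.ini may use different
# sections and keys.
EXAMPLE_INI = """
[cluster]
namespace = autodrive

[simulation]
num_pods = 4
timeout_seconds = 600
"""


def load_settings(text):
    """Parse INI text and return typed settings as a dictionary."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        "namespace": parser.get("cluster", "namespace"),
        "num_pods": parser.getint("simulation", "num_pods"),
        "timeout_seconds": parser.getfloat("simulation", "timeout_seconds"),
    }


if __name__ == "__main__":
    print(load_settings(EXAMPLE_INI))
```

To read from disk instead of a string, parser.read("config.ini") can be used in place of read_string.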

KNOWN ISSUES

  1. The control server can sometimes crash if the simulation configuration does not contain multiple conditions to iterate through. This can generally be fixed by restarting the pod or by adding duplicate conditions to match the desired number of iterations. A future goal of this project is to rework how simulation configurations are handled so that they are no longer generated by the control server inside the cluster.

  2. Queries to the Kubernetes API inside the cluster can sometimes result in network errors (whether called via Python subprocess + kubectl or via the Python Kubernetes client). The automation library has a large number of timeouts and repeat attempts built in, but this can still be an issue for large workloads that query the same node in a cluster multiple times.
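
For readers writing their own cluster queries, the retry pattern mentioned in the second issue can be sketched as below. This is a generic exponential-backoff helper, not the code in automation_module.py; the function name, its parameters, and the default exception type are assumptions. In particular, retry_on would need to be set to whatever your calls actually raise (for example subprocess.TimeoutExpired for kubectl subprocess calls).

```python
import random
import time


def call_with_retries(func, attempts=5, base_delay=1.0, max_delay=30.0,
                      retry_on=(OSError,), sleep=time.sleep):
    """Call func(), retrying on the given exceptions with exponential backoff.

    The exception from the final attempt is re-raised. `sleep` is injectable
    so the helper can be exercised without real waiting.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise
            # Double the delay on each failure, capped at max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Add jitter so many clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))


if __name__ == "__main__":
    state = {"calls": 0}

    def flaky_query():
        """Stand-in for a cluster query that fails twice, then succeeds."""
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("simulated network error")
        return "pod list"

    print(call_with_retries(flaky_query, base_delay=0.01))
```

The jitter is meant to spread out repeated queries against the same node, which is the situation described above as most prone to errors.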

CITATION

@eprint{AutoDRIVE-HPC-RZR-2024,
title={Off-Road Autonomy Validation Using Scalable Digital Twin Simulations Within High-Performance Computing Clusters}, 
author={Tanmay Vilas Samak and Chinmay Vilas Samak and Joey Binz and Jonathon Smereka and Mark Brudnak and David Gorsich and Feng Luo and Venkat Krovi},
year={2024},
eprint={2405.04743},
archivePrefix={arXiv},
primaryClass={cs.RO}
}

This work has been accepted at the 2024 Ground Vehicle Systems Engineering and Technology Symposium (GVSETS). Distribution Statement A. Approved for public release; distribution is unlimited. OPSEC #8451.