ilayna/Single-GPU-passthrough-amd-nvidia

script tweak ideas

Ferratauris opened this issue · 1 comments

Hi

I am not one of the best GitHub users and I do not code much,

so sorry if this is not the place to put this. But I really want to help you, as you have helped me.

I have a niche passthrough setup, so let me explain first.

I needed GPU passthrough for my VM, but I also wanted the host to use the dedicated GPU again once the virtual machine is turned off.

Single GPU passthrough was the best option. I copied your code, and it worked flawlessly.

But I have both shared and dedicated graphics. I wanted the VM to use the dedicated GPU while the host keeps displaying on the shared GPU.

I used Bard AI and ground away for hours trying to find a start script that passes through the dedicated GPU and leaves the shared GPU on,

and then an end script that reattaches the dedicated GPU, resumes display on it, and turns off display on the shared GPU again.

Here are the start and stop scripts, based on your script, and they work perfectly. The only problem I see is the display switching: it is set up specifically for my display setup, and I do not know how to make it work on any device.

You are the original script writer so I believe that you can take my "dirty" code and clean it up

And publish it for the world to see and use

Thank you

Start Script

#!/bin/bash

#############################################################################
##     ______  _                _  _______         _                 _     ##
##    (_____ \(_)              | |(_______)       | |               | |    ##
##     _____) )_  _   _  _____ | | _    _   _   _ | |__   _____   __| |    ##
##    |  ____/| |( \ / )| ___ || || |  | | | | | ||  _ \ | ___ | / _  |    ##
##    | |     | | ) X ( | ____|| || |__| | | |_| || |_) )| ____|( (_| |    ##
##    |_|     |_|(_/ \_)|_____) \_)\______)|____/ |____/ |_____) \____|    ##
##                                                                         ##
#############################################################################
###################### Credits ###################### ### Update PCI ID'S ###
## Lily (PixelQubed) for editing the scripts       ## ##                   ##
## RisingPrisum for providing the original scripts ## ##   update-pciids   ##
## Void for testing and helping out in general     ## ##                   ##
## .Chris. for testing and helping out in general  ## ## Run this command  ##
## WORMS for helping out with testing              ## ## if you dont have  ##
##################################################### ## names in you're   ##
## The VFIO community for using the scripts and    ## ## lspci feedback    ##
## testing them for us!                            ## ## in your terminal  ##
##################################################### #######################

################################# Variables #################################


## Adds current time to var for use in echo for a cleaner log and script ##
DATE=$(date +"%m/%d/%Y %R:%S :")

## Sets dispmgr var as null ##
DISPMGR="null"


################################## Script ###################################


echo "$DATE Beginning of Startup!"

function check_display_manager_status {
  ## Get display manager on systemd based distros ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    DISPMGR="$(grep 'ExecStart=' /etc/systemd/system/display-manager.service | awk -F'/' '{print $(NF-0)}')"
    echo "$DATE Display Manager = $DISPMGR"

    ## Check if display manager is running ##
    if systemctl is-active --quiet "$DISPMGR.service"; then
      echo "$DATE Display manager is running."
      return 0
    else
      echo "$DATE Display manager is not running."
      return 1
    fi
  fi
}

function restart_display_manager {
  ## Restart display manager using systemd ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    systemctl restart "$DISPMGR.service"
    echo "$DATE Display manager restarted."
  fi
}

####################################################################################################################
## Checks to see if the display manager is running. If not, it will start it. If it is in an error state, it will restart it. ##
####################################################################################################################

if ! check_display_manager_status; then
  echo "$DATE Display manager is not running. Starting it."
  restart_display_manager
elif systemctl status "$DISPMGR.service" | grep "failed" &>/dev/null; then
  echo "$DATE Display manager is in an error state. Restarting it."
  restart_display_manager
else
  echo "$DATE Display manager is running and healthy."
fi



##############################################################################################################################
## Unbind VTconsoles if currently bound (adapted and modernised from https://www.kernel.org/doc/Documentation/fb/fbcon.txt) ##
##############################################################################################################################
if test -e "/tmp/vfio-bound-consoles"; then
    rm -f /tmp/vfio-bound-consoles
fi
for (( i = 0; i < 16; i++))
do
  if test -x /sys/class/vtconsole/vtcon"${i}"; then
      if [ "$(grep -c "frame buffer" /sys/class/vtconsole/vtcon"${i}"/name)" = 1 ]; then
	       echo 0 > /sys/class/vtconsole/vtcon"${i}"/bind
           echo "$DATE Unbinding Console ${i}"
           echo "$i" >> /tmp/vfio-bound-consoles
      fi
  fi
done

sleep "1"

# Identify any processes using the GPU (pgrep does not match itself, unlike ps | grep)
GPU_PROCESSES=$(pgrep -f "nvidia|amdgpu|i915")

# Kill the GPU processes
for PID in $GPU_PROCESSES; do
 kill -9 "$PID"
done

# Unload the GPU drivers
sudo modprobe -r nvidia amdgpu i915

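# Identify only the GPU connected via PCI (Not the graphics on the APU or integrated graphics)
# Keep just the address field (first column); -D makes lspci print the PCI domain too
GPU_PCI_ADDR=$(lspci -D | grep -E "VGA controller|Display controller" | grep -v "VGA compatible" | head -n 1 | awk '{print $1}')

# Detach only the GPU connected via PCI (Not the graphics on the APU)
# virsh expects a node-device name such as pci_0000_03_00_0
sudo virsh nodedev-detach --driver vfio "pci_${GPU_PCI_ADDR//[.:]/_}"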

# Load all VFIO drivers
sudo modprobe vfio-pci

# Reload the display drivers
case $(lspci -nn | grep VGA | awk '{print $3}') in
 Intel)
  sudo modprobe i915
  ;;
 NVIDIA)
  sudo systemctl start nvidia-persistenced
  sudo modprobe nvidia
  ;;
 AMD)
  sudo amdconfig --load
  ;;
 *)
  # Other GPU types
  echo "Warning: Unknown GPU type. Display may not resume properly."
  ;;
esac

function check_display_manager_status {
  ## Get display manager on systemd based distros ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    DISPMGR="$(grep 'ExecStart=' /etc/systemd/system/display-manager.service | awk -F'/' '{print $(NF-0)}')"
    echo "$DATE Display Manager = $DISPMGR"

    ## Check if display manager is running ##
    if systemctl is-active --quiet "$DISPMGR.service"; then
      echo "$DATE Display manager is running."
      return 0
    else
      echo "$DATE Display manager is not running."
      return 1
    fi
  fi
}

function restart_display_manager {
  ## Restart display manager using systemd ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    systemctl restart "$DISPMGR.service"
    echo "$DATE Display manager restarted."
  fi
}

####################################################################################################################
## Checks to see if the display manager is running. If not, it will start it. If it is in an error state, it will restart it. ##
####################################################################################################################

if ! check_display_manager_status; then
  echo "$DATE Display manager is not running. Starting it."
  restart_display_manager
elif systemctl status "$DISPMGR.service" | grep "failed" &>/dev/null; then
  echo "$DATE Display manager is in an error state. Restarting it."
  restart_display_manager
else
  echo "$DATE Display manager is running and healthy."
fi

# Change the display options to only display on the first DisplayPort monitor and set it as the primary monitor
xrandr --output DisplayPort-1 --auto --primary

# Turn off the other two monitors
xrandr --output HDMI-A-0 --off
xrandr --output DisplayPort-2 --off

# Log back in to the user connected to Linux
sudo su - $USER
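
The xrandr calls above hard-code the output names from this particular setup (DisplayPort-1, HDMI-A-0, DisplayPort-2). A rough sketch of how the names could be discovered instead, assuming the script runs inside the X session so that `xrandr --query` works. `connected_outputs` and `build_xrandr_args` are hypothetical helper names, and keeping the first connected output when none is named is only a guess at a sensible default:

```bash
#!/bin/bash

# Hypothetical helper: reads `xrandr --query` text on stdin and prints the
# name of every output whose second field is exactly "connected".
connected_outputs() {
  awk '$2 == "connected" { print $1 }'
}

# Hypothetical helper: reads output names on stdin and prints xrandr
# arguments that make one output primary and switch the others off.
# $1 = output to keep (assumption: defaults to the first name read).
build_xrandr_args() {
  local keep="${1:-}" out args=()
  while read -r out; do
    # Fall back to the first output seen when no name was passed in.
    : "${keep:=$out}"
    if [[ "$out" == "$keep" ]]; then
      args+=(--output "$out" --auto --primary)
    else
      args+=(--output "$out" --off)
    fi
  done
  printf '%s\n' "${args[*]}"
}
```

It could then be called as `xrandr $(xrandr --query | connected_outputs | build_xrandr_args DisplayPort-1)`, with the command substitution left unquoted on purpose so the arguments split into words.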

End Script

## Adds current time to var for use in echo for a cleaner log and script ##
DATE=$(date +"%m/%d/%Y %R:%S :")

## Sets dispmgr var as null ##
DISPMGR="null"


################################## Script ###################################


echo "$DATE Beginning of Startup!"

function check_display_manager_status {
  ## Get display manager on systemd based distros ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    DISPMGR="$(grep 'ExecStart=' /etc/systemd/system/display-manager.service | awk -F'/' '{print $(NF-0)}')"
    echo "$DATE Display Manager = $DISPMGR"

    ## Check if display manager is running ##
    if systemctl is-active --quiet "$DISPMGR.service"; then
      echo "$DATE Display manager is running."
      return 0
    else
      echo "$DATE Display manager is not running."
      return 1
    fi
  fi
}

function restart_display_manager {
  ## Restart display manager using systemd ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    systemctl restart "$DISPMGR.service"
    echo "$DATE Display manager restarted."
  fi
}

####################################################################################################################
## Checks to see if the display manager is running. If not, it will start it. If it is in an error state, it will restart it. ##
####################################################################################################################

if ! check_display_manager_status; then
  echo "$DATE Display manager is not running. Starting it."
  restart_display_manager
elif systemctl status "$DISPMGR.service" | grep "failed" &>/dev/null; then
  echo "$DATE Display manager is in an error state. Restarting it."
  restart_display_manager
else
  echo "$DATE Display manager is running and healthy."
fi



##############################################################################################################################
## Unbind VTconsoles if currently bound (adapted and modernised from https://www.kernel.org/doc/Documentation/fb/fbcon.txt) ##
##############################################################################################################################
if test -e "/tmp/vfio-bound-consoles"; then
    rm -f /tmp/vfio-bound-consoles
fi
for (( i = 0; i < 16; i++))
do
  if test -x /sys/class/vtconsole/vtcon"${i}"; then
      if [ "$(grep -c "frame buffer" /sys/class/vtconsole/vtcon"${i}"/name)" = 1 ]; then
	       echo 0 > /sys/class/vtconsole/vtcon"${i}"/bind
           echo "$DATE Unbinding Console ${i}"
           echo "$i" >> /tmp/vfio-bound-consoles
      fi
  fi
done

sleep "1"

# Identify any processes using the GPU (pgrep does not match itself, unlike ps | grep)
GPU_PROCESSES=$(pgrep -f "nvidia|amdgpu|i915")

# Kill the GPU processes
for PID in $GPU_PROCESSES; do
 kill -9 "$PID"
done

# Unload the GPU drivers
sudo modprobe -r nvidia amdgpu i915

################################## Script ###################################

echo "$DATE Beginning of Teardown!"

## Unload VFIO-PCI driver ##
modprobe -r vfio_pci
modprobe -r vfio_iommu_type1
modprobe -r vfio

if grep -q "true" "/tmp/vfio-is-nvidia" ; then

    ## Load NVIDIA drivers ##
    echo "$DATE Loading NVIDIA GPU Drivers"
    
    modprobe drm
    modprobe drm_kms_helper
    modprobe i2c_nvidia_gpu
    modprobe nvidia
    modprobe nvidia_modeset
    modprobe nvidia_drm
    modprobe nvidia_uvm

    echo "$DATE NVIDIA GPU Drivers Loaded"
fi

if  grep -q "true" "/tmp/vfio-is-amd" ; then

    ## Load AMD drivers ##
    echo "$DATE Loading AMD GPU Drivers"
    
    modprobe drm
    modprobe amdgpu
    modprobe radeon
    modprobe drm_kms_helper
    
    echo "$DATE AMD GPU Drivers Loaded"
fi

DATE=$(date +"%m/%d/%Y %R:%S :")

## Sets dispmgr var as null ##
DISPMGR="null"


################################## Script ###################################


echo "$DATE Beginning of Startup!"

function check_display_manager_status {
  ## Get display manager on systemd based distros ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    DISPMGR="$(grep 'ExecStart=' /etc/systemd/system/display-manager.service | awk -F'/' '{print $(NF-0)}')"
    echo "$DATE Display Manager = $DISPMGR"

    ## Check if display manager is running ##
    if systemctl is-active --quiet "$DISPMGR.service"; then
      echo "$DATE Display manager is running."
      return 0
    else
      echo "$DATE Display manager is not running."
      return 1
    fi
  fi
}

function restart_display_manager {
  ## Restart display manager using systemd ##
  if [[ -x /run/systemd/system ]] && echo "$DATE Distro is using Systemd"; then
    systemctl restart "$DISPMGR.service"
    echo "$DATE Display manager restarted."
  fi
}

####################################################################################################################
## Checks to see if the display manager is running. If not, it will start it. If it is in an error state, it will restart it. ##
####################################################################################################################

if ! check_display_manager_status; then
  echo "$DATE Display manager is not running. Starting it."
  restart_display_manager
elif systemctl status "$DISPMGR.service" | grep "failed" &>/dev/null; then
  echo "$DATE Display manager is in an error state. Restarting it."
  restart_display_manager
else
  echo "$DATE Display manager is running and healthy."
fi



##############################################################################################################################
## Unbind VTconsoles if currently bound (adapted and modernised from https://www.kernel.org/doc/Documentation/fb/fbcon.txt) ##
##############################################################################################################################
if test -e "/tmp/vfio-bound-consoles"; then
    rm -f /tmp/vfio-bound-consoles
fi
for (( i = 0; i < 16; i++))
do
  if test -x /sys/class/vtconsole/vtcon"${i}"; then
      if [ "$(grep -c "frame buffer" /sys/class/vtconsole/vtcon"${i}"/name)" = 1 ]; then
	       echo 0 > /sys/class/vtconsole/vtcon"${i}"/bind
           echo "$DATE Unbinding Console ${i}"
           echo "$i" >> /tmp/vfio-bound-consoles
      fi
  fi
done

sleep "1"

############################################################################################################
## Rebind VT consoles (adapted and modernised from https://www.kernel.org/doc/Documentation/fb/fbcon.txt) ##
############################################################################################################

input="/tmp/vfio-bound-consoles"
while read -r consoleNumber; do
  if test -x /sys/class/vtconsole/vtcon"${consoleNumber}"; then
    if [ "$(grep -c "frame buffer" "/sys/class/vtconsole/vtcon${consoleNumber}/name")" = 1 ]; then
      echo "$DATE Rebinding console ${consoleNumber}"
      echo 1 > /sys/class/vtconsole/vtcon"${consoleNumber}"/bind
    fi
  fi
done < "$input"


echo "$DATE End of Teardown!"
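
For the detach and reattach steps, `virsh nodedev-detach` and `virsh nodedev-reattach` take a libvirt node-device name such as `pci_0000_03_00_0`, not a line of `lspci` output. A minimal sketch of the conversion, assuming the GPU sits in PCI domain 0000 whenever `lspci` prints no domain. `pci_to_nodedev` is a hypothetical helper name:

```bash
#!/bin/bash

# Hypothetical helper: turns a PCI address such as "03:00.0" or
# "0000:03:00.0" into the libvirt node-device form "pci_0000_03_00_0".
pci_to_nodedev() {
  local addr="$1"
  local domain_re='^[0-9a-fA-F]{4}:'
  # Plain `lspci` omits the domain; assume 0000 when it is missing.
  [[ "$addr" =~ $domain_re ]] || addr="0000:$addr"
  # Replace every ':' and '.' with '_' and add the pci_ prefix.
  printf 'pci_%s\n' "${addr//[.:]/_}"
}
```

The start script could then run `virsh nodedev-detach "$(pci_to_nodedev "$addr")"` and the end script `virsh nodedev-reattach "$(pci_to_nodedev "$addr")"`, where `addr` is the first field of the GPU's `lspci` line.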

Okay, Full disclosure

I got restless and honestly thought the start script could use some tweaking.
First, the script assumes that the host machine has shared graphics and runs accordingly.
Second, it assumes that the virtual machine is taking control of the GPU.

I realized that not all hosts will have a shared graphics and a dedicated graphics and not all guest machines will make use of PCI passthrough.
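
One way to tell those host layouts apart is to count GPU-class devices in `lspci -nn` output and to read the vendor from the PCI ID rather than from a word position in the line. A sketch under those assumptions. `list_gpus` and `host_layout` are hypothetical helper names, and only the three vendor IDs shown are recognised:

```bash
#!/bin/bash

# Hypothetical helper: reads `lspci -nn` text on stdin and prints
# "ADDRESS VENDOR" for every device whose class code is 0300 (VGA),
# 0302 (3D controller) or 0380 (display controller).
list_gpus() {
  local line addr vendor
  local class_re='\[03(00|02|80)\]:'
  local id_re='\[([0-9a-f]{4}):[0-9a-f]{4}\]'
  while IFS= read -r line; do
    [[ $line =~ $class_re ]] || continue
    addr="${line%% *}"              # first field is the PCI address
    [[ $line =~ $id_re ]] || continue
    case "${BASH_REMATCH[1]}" in    # vendor half of [vendor:device]
      10de) vendor=NVIDIA ;;
      1002) vendor=AMD ;;
      8086) vendor=Intel ;;
      *)    vendor=unknown ;;
    esac
    printf '%s %s\n' "$addr" "$vendor"
  done
}

# Hypothetical helper: reads list_gpus output on stdin and prints
# "none", "single" or "multi" depending on how many GPUs were listed.
host_layout() {
  local n=0 line
  while read -r line; do
    n=$((n + 1))
  done
  if (( n == 0 )); then
    echo none
  elif (( n == 1 )); then
    echo single
  else
    echo multi
  fi
}
```

`lspci -nn | list_gpus | host_layout` prints `single` on a host with one GPU and `multi` when a shared and a dedicated GPU are both present.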

I tweaked a bit and was able to get the start script really good, but I was not able to get the end script to run successfully.

Even when I used the 2 scripts posted above, I was not able to get the end script to do what it needs to do.

So here is the start script, tweaked (I hope) to work on all host machines, whether they have a shared GPU, a dedicated GPU, or a combination of the two.

Please use the script as I said. I never use this kind of stuff but I thought this is a fun project and wanted to see how we can make the perfect PCI passthrough script.

I am only able to test this on AMD hardware, so I only know that it works on my host device.
I will leave it up to you and the community to test on other combinations of hardware.

Please feel free to improve or tweak if needed.
I can't wait to see what improvement you can come up with.

I will post a new end script once I have a working one

Thank you

#!/bin/bash

#############################################################################
##     ______  _                _  _______         _                 _     ##
##    (_____ \(_)              | |(_______)       | |               | |    ##
##     _____) )_  _   _  _____ | | _    _   _   _ | |__   _____   __| |    ##
##    |  ____/| |( \ / )| ___ || || |  | | | | | ||  _ \ | ___ | / _  |    ##
##    | |     | | ) X ( | ____|| || |__| | | |_| || |_) )| ____|( (_| |    ##
##    |_|     |_|(_/ \_)|_____) \_)\______)|____/ |____/ |_____) \____|    ##
##                                                                         ##
#############################################################################
###################### Credits ###################### ### Update PCI ID'S ###
## Lily (PixelQubed) for editing the scripts       ## ##                   ##
## RisingPrisum for providing the original scripts ## ##   update-pciids   ##
## Void for testing and helping out in general     ## ##                   ##
## .Chris. for testing and helping out in general  ## ## Run this command  ##
## WORMS for helping out with testing              ## ## if you dont have  ##
##################################################### ## names in you're   ##
## The VFIO community for using the scripts and    ## ## lspci feedback    ##
## testing them for us!                            ## ## in your terminal  ##
##################################################### #######################

################################# Variables #################################

######################
## Define constants ##
######################

## Adds current time to var for use in echo for a cleaner log and script ##
DATE=$(date +"%m/%d/%Y %R:%S :")

GPU_PCI_ADDR=$(lspci | grep -E "VGA controller|Display controller" | grep -v "VGA compatible" | head -n 1)

################################## Script ###################################

echo "$DATE Beginning of Startup!"

##################################################################
## Check if the PCI graphics are connected to a virtual machine ##
##################################################################
  
if [[ $(virsh domblkinfo --domain | grep -q "$GPU_PCI_ADDR") -eq 1 ]]; then
  ## If the PCI graphics are connected to a virtual machine, skip the reattachment process. ##
  echo "$DATE PCI graphics are connected to a virtual machine, skipping reattachment."
else
  ## If the PCI graphics are not connected to a virtual machine, reattach them. ##
  sudo virsh nodedev-attach --driver vfio $GPU_PCI_ADDR

  ## Unload the VFIO driver if it is loaded. ##
  if lsmod | grep -q '^vfio_pci '; then
    sudo modprobe -r vfio-pci
  fi
fi

###################################
## Identify the display manager. ##
###################################

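## Strip the .service suffix (it is appended again below) and keep only the first match. ##
DISPMGR=$(systemctl list-unit-files --type=service | grep -E "sddm|gdm|lightdm" | awk '{print $1}' | sed 's/\.service$//' | head -n 1)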

########################################################
## Check if the display manager is in an error state. ##
########################################################

if systemctl is-failed --quiet "$DISPMGR.service"; then
  ## If the display manager is in an error state, reload it. ##
  echo "$DATE Display manager is in an error state, reloading."
  sudo systemctl reload "$DISPMGR.service"
fi

###########################
## Identify the GPU type ##
###########################

GPU_TYPE=$(lspci -nn | grep VGA | awk '{print $3}')

############################################################
## Check if the host machine has only a PCI connected GPU ##
############################################################

if [[ $(lspci -nn | grep VGA | wc -l) -eq 1 ]] && [[ $GPU_TYPE != "Intel" ]]; then
  ## If the host machine has only a PCI connected GPU, skip the parts where the display drivers are reloaded, and where the displaymanager is checked, instead it should turn off the display manager. and leave the display drivers unloaded. ##
  echo "$DATE Host machine has only a PCI connected GPU, skipping display driver reload and display manager check."
  sudo systemctl stop $DISPMGR.service
elif [[ $(lspci -nn | grep VGA | wc -l) -gt 1 ]] || [[ $GPU_TYPE == "Intel" ]]; then
  ## If the host machine has a combination of both an APU or intel share graphics and a PCI connected GPU, do everything shown in the script above. ##
  ## Reload the appropriate GPU driver. ##
  case $GPU_TYPE in
    Intel)
      sudo modprobe i915
      ;;
    NVIDIA)
      sudo systemctl start nvidia-persistenced
      sudo modprobe nvidia
      ;;
    AMD)
      sudo amdconfig --load
      ;;
    *)
      ## Other GPU types ##
      echo "Warning: Unknown GPU type. Display may not resume properly."
      ;;
  esac

  ## Reload the display manager if needed. ##
  reload_display_manager_if_needed

  ## Set the primary display output. ##
  xrandr --output DisplayPort-1 --auto --primary

  ## Turn off the HDMI-A-0 and DisplayPort-2 display outputs. ##
  xrandr --output HDMI-A-0 --off
  xrandr --output DisplayPort-2 --off

  ## Switch back to the normal user account. ##
  sudo su - $USER

  ## Print a message indicating that the startup process is complete. ##
  echo "$DATE End of Startup!"
fi
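
For checks such as "is the module loaded" or "does this output mention the device", `grep -q` prints nothing and reports its answer only through the exit status, so wrapping it in `$(...)` and comparing the result with `-eq 1` never succeeds. A small sketch of branching on the exit status instead. `output_contains` is a hypothetical helper name:

```bash
#!/bin/bash

# Hypothetical helper: run a command and succeed only if its output
# contains a line matching the given extended regular expression.
# $1 = pattern, remaining arguments = the command to run.
output_contains() {
  local pattern="$1"
  shift
  "$@" 2>/dev/null | grep -Eq -- "$pattern"
}

# Example calls (commented out because they need a real host):
#   if output_contains '^vfio_pci ' lsmod; then
#     modprobe -r vfio_pci
#   fi
```

The pipeline's exit status is that of `grep`, so the helper can be used directly as an `if` condition.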