Computer-Set-Up-SOP

  • This repo describes how to set up a computer in the lab
  • It contains some scripts and resources used during the setup
  • Follow the step-by-step instructions, and ideally your computer will be set up successfully
  • If you run into a situation where a step fails, please open an issue
  • You can also use this link as a reference
  • Here is some troubleshooting material
  • You can also use this link as a reference
  • E.g., Xeon + Ubuntu + GTX2080: https://www.cnblogs.com/Rohn/p/10971326.html

Step.1 Assemble the computer

Step.1-1 Set your budget and place your order

Step.2 OS Installation

Step.3 Network, Link, or Remote Setup

Step.3-1 SSH Setup

sudo apt-get install openssh-server
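After installing the package, you may want to make sure the server is running and find an address to connect to. The sketch below assumes a systemd-based Ubuntu where the OpenSSH server unit is named `ssh`; the function name `setup_ssh` is hypothetical, not a script from this repo.

```shell
# Hypothetical helper (not part of this repo): install, start, and check sshd.
# Assumes Ubuntu with systemd, where the OpenSSH server unit is called "ssh".
setup_ssh() {
  sudo apt-get install -y openssh-server
  # Start the server now and also on every boot
  sudo systemctl enable --now ssh
  # Show whether the service is active (no pager, so it also works on a plain TTY)
  systemctl status ssh --no-pager
  # Print this machine's IP addresses for connecting from another computer
  hostname -I
}
```

Run `setup_ssh`, then from another computer connect with `ssh <user>@<ip-address>`, using one of the printed addresses.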

Step.3-2 Remote Setup (Optional)

Step.4 Nvidia GPU Driver Setup

Step.4-1 Use the commands below

# Update the system and install the ubuntu-drivers tool
sudo apt update
sudo apt upgrade
sudo apt install ubuntu-drivers-common

# Add the graphics-drivers PPA, then install the recommended driver
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt update
sudo ubuntu-drivers autoinstall
  • Alternatively, you can run the script gpu_driver_setup.sh in this repo directly
  • Reboot afterwards so that the new driver is loaded
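The commands below can help confirm what the driver setup did. `ubuntu-drivers devices` lists detected GPUs and marks one driver as recommended; after a reboot, `nvidia-smi` should report each GPU and the driver version. Both function names are hypothetical helpers for illustration, not scripts shipped with this repo.

```shell
# Hypothetical helpers for illustration, not scripts shipped with this repo.

# List detected devices and the driver ubuntu-drivers would pick ("recommended")
show_recommended_driver() {
  ubuntu-drivers devices
}

# After rebooting, confirm the NVIDIA driver is loaded and talking to the GPU(s)
check_driver() {
  if ! command -v nvidia-smi >/dev/null 2>&1; then
    echo "nvidia-smi not found: the NVIDIA driver does not seem to be installed" >&2
    return 1
  fi
  # One CSV line per GPU: model name and driver version
  nvidia-smi --query-gpu=name,driver_version --format=csv,noheader
}
```

If `check_driver` fails after a reboot, continue with Q.1 in Step.7.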

Step.5 Install Anaconda
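A minimal sketch of a non-interactive Anaconda install, assuming you have already downloaded a Linux x86_64 installer from https://repo.anaconda.com/archive/ (the exact file name depends on the release, so it is passed in as an argument). `-b` runs the installer in batch mode, which accepts the license without prompting, and `-p` sets the install prefix; `install_anaconda` itself is a hypothetical helper.

```shell
# Hypothetical helper: install Anaconda without prompts and hook it into bash.
# $1: path to a downloaded Anaconda3-<version>-Linux-x86_64.sh installer
# $2: install prefix (defaults to ~/anaconda3)
install_anaconda() {
  local installer="$1"
  local prefix="${2:-$HOME/anaconda3}"
  if [ ! -f "$installer" ]; then
    echo "installer not found: $installer" >&2
    return 1
  fi
  # -b: batch mode (no prompts, license accepted), -p: install location
  bash "$installer" -b -p "$prefix"
  # Add conda's shell setup to ~/.bashrc so new terminals can run "conda"
  "$prefix/bin/conda" init bash
}
```

For example, `install_anaconda ~/Downloads/<installer>.sh`, then open a new terminal (or run `source ~/.bashrc`) and check with `conda --version`.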

Step.6 Create Computing Environment

Step.6-1 Create a virtual computing environment with Anaconda

conda create -n env_name
  • To specify which Python version to use, e.g., Python 3.6:
conda create -n env_name python=3.6
  • To also install all Anaconda packages:
conda create -n env_name python=3.6 anaconda
  • Here is a very useful link for using Anaconda
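Once the environment exists, `conda activate env_name` switches to it in an interactive terminal and `conda deactivate` leaves it. Inside a bash script, however, `conda activate` usually fails because the script does not read `~/.bashrc`; loading conda's shell hook first works around this. A sketch, with `activate_env` as a hypothetical helper name:

```shell
# Hypothetical helper: activate a conda env from inside a non-interactive bash script.
activate_env() {
  local name="$1"
  if ! command -v conda >/dev/null 2>&1; then
    echo "conda not found on PATH; is Anaconda installed and initialized?" >&2
    return 1
  fi
  # Define the "conda" shell function (normally set up by ~/.bashrc)
  eval "$(conda shell.bash hook)"
  conda activate "$name" || return 1
  # Confirm which interpreter is now active
  python --version
}
```

Inside a script, call `activate_env env_name`; `conda env list` shows all existing environments.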

Step.6-2 Install PyTorch (GPU)

conda install -c anaconda pytorch-gpu
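To check that the installed PyTorch can actually use the GPU, you can run a short Python snippet inside the activated environment. The wrapper name `check_torch_gpu` is hypothetical; the `torch.cuda` calls are standard PyTorch APIs.

```shell
# Hypothetical helper: run inside the activated conda env (e.g. env_name).
check_torch_gpu() {
  python - <<'EOF'
import torch

# CUDA version PyTorch was built with (None for a CPU-only build)
print("torch", torch.__version__, "| built with CUDA", torch.version.cuda)
# False usually means a CPU-only build or a driver problem (see Step.7)
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print("GPU", i, torch.cuda.get_device_name(i))
EOF
}
```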

Step.6-3 (Optional) Install CUDA & cuDNN

  • If you really need to install CUDA and cuDNN yourself, use this link as a reference
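Whether or not you install CUDA yourself, it helps to know which CUDA versions are in play: the `nvidia-smi` header shows the highest CUDA version the driver supports, `nvcc` exists only with a system-wide toolkit, and PyTorch reports the CUDA and cuDNN versions it was built against. A sketch with a hypothetical helper name:

```shell
# Hypothetical helper: print the CUDA/cuDNN versions seen by different tools.
show_cuda_versions() {
  # Highest CUDA version supported by the installed driver (nvidia-smi header)
  nvidia-smi | grep -i "cuda version"
  # System-wide CUDA toolkit, only present if you installed it yourself
  if command -v nvcc >/dev/null 2>&1; then
    nvcc --version | grep -i release
  else
    echo "nvcc not found (no system-wide CUDA toolkit on PATH)"
  fi
  # CUDA and cuDNN versions the PyTorch in the active env was built with
  python -c "import torch; print('torch CUDA', torch.version.cuda, '| cuDNN', torch.backends.cudnn.version())"
}
```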

Step.7 Troubleshooting

  • Q.1 Cannot log in / resolution is wrong after a reboot or a sudden power-off

    # Step.1 Go to TTY mode
      Press "ctrl + alt + F1" when you see the normal Ubuntu login screen
      (On Ubuntu 17.10 and later, use "ctrl + alt + F3" to enter TTY mode and "ctrl + alt + F1" to return)
      To leave TTY mode and return to the graphical login:
      Press "ctrl + alt + F7"
    
    # Step.2 Log in with your account and password
    
    # Step.3 Run "nvidia-smi" to check whether the driver can communicate with the GPU
    
    # Step.4 If Step.3 shows that the driver cannot communicate with the GPU, continue with the remaining steps.
      If it does not, please contact the manager or your senior
    
    # Step.5 Use the commands below:
      sudo apt-get update
      sudo apt-get upgrade
      # If your NVIDIA driver version is 375:
      sudo apt-get install nvidia-375
      sudo dpkg-reconfigure nvidia-375
      # Otherwise, replace 375 with your driver version (newer Ubuntu releases name
      # the package nvidia-driver-<version>, e.g. nvidia-driver-470):
      sudo apt-get install nvidia-<version>
      sudo dpkg-reconfigure nvidia-<version>
    
    # Step.6 Reboot the computer
      sudo reboot
    
    # Step.7 It should be back to normal now, and you can log in as usual.
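Step.5 needs your driver version. One way to find it from the TTY is to filter `dpkg -l` for installed driver packages, which are named `nvidia-<version>` on older releases and `nvidia-driver-<version>` on newer ones. A sketch of such a filter, where `installed_nvidia_driver` is a hypothetical helper that reads `dpkg -l` output on stdin:

```shell
# Hypothetical helper: print installed NVIDIA driver package names
# (nvidia-375 style or nvidia-driver-470 style) from `dpkg -l` output on stdin.
installed_nvidia_driver() {
  awk '
    # Only packages in state "ii" (installed)
    $1 == "ii" {
      name = $2
      sub(/:.*/, "", name)   # drop an architecture suffix such as ":amd64"
      if (name ~ /^nvidia-(driver-)?[0-9]+$/) print name
    }
  '
}
```

Run `dpkg -l | installed_nvidia_driver` and use the printed package name in the Step.5 commands.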
    
  • Q.2 Cannot log in: login loop

    $ sudo apt update
    $ sudo apt install --reinstall unity unity-common unity-lens* ubuntu-desktop lightdm
    $ sudo apt autoremove --purge
    $ sudo reboot
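The reinstall command above assumes the Unity desktop with the LightDM display manager, as on Ubuntu 16.04. Newer releases default to GNOME with GDM (`gdm3`), so it may be worth checking which display manager is configured before reinstalling. A sketch; `current_display_manager` is a hypothetical helper and relies on the `/etc/X11/default-display-manager` file that Debian and Ubuntu use:

```shell
# Hypothetical helper: print the configured display manager (e.g. lightdm or gdm3).
current_display_manager() {
  local f=/etc/X11/default-display-manager
  if [ -s "$f" ]; then
    # The file holds the path of the display manager binary, e.g. /usr/sbin/lightdm
    basename "$(cat "$f")"
  else
    echo "unknown"
  fi
}
```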
    
  • Q.3 The driver cannot communicate with the GPU
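A common order of checks goes from hardware to driver: whether the card is visible on the PCI bus, whether the `nvidia` kernel module is loaded, whether Secure Boot may be blocking the unsigned module, what the kernel logged, and whether `nvidia-smi` works. A sketch with a hypothetical helper name, assuming `lspci` (from pciutils) is installed:

```shell
# Hypothetical helper: walk from hardware to driver when the GPU is unreachable.
gpu_diagnose() {
  echo "== PCI bus (is the card detected at all?) =="
  lspci | grep -i nvidia || echo "no NVIDIA device found: check seating and power cables"
  echo "== Kernel module =="
  lsmod | grep -i '^nvidia' || echo "nvidia kernel module is not loaded"
  echo "== Secure Boot (can block unsigned driver modules) =="
  if command -v mokutil >/dev/null 2>&1; then mokutil --sb-state; else echo "mokutil not installed"; fi
  echo "== Recent kernel messages =="
  sudo dmesg | grep -i -E 'nvidia|nvrm' | tail -n 20
  echo "== nvidia-smi =="
  nvidia-smi || echo "nvidia-smi failed: continue with Q.1 Step.5"
}
```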

  • Q.4 Cannot use multiple GPUs for training, e.g., NVLink fails
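For multi-GPU problems, `nvidia-smi` can show how the GPUs are connected: in the `topo -m` matrix, `NV#` entries indicate NVLink connections, while entries such as `PIX`, `PHB`, or `SYS` indicate PCIe paths. A sketch with a hypothetical helper name, assuming PyTorch is installed in the active environment:

```shell
# Hypothetical helper: check GPU interconnect and how many GPUs a framework sees.
multi_gpu_check() {
  # Connection matrix between GPUs (NV# = NVLink, PIX/PXB/PHB/NODE/SYS = PCIe paths)
  nvidia-smi topo -m
  # Per-link NVLink status; inactive links suggest checking the NVLink bridge
  nvidia-smi nvlink --status
  # A restricted CUDA_VISIBLE_DEVICES can hide GPUs from training code
  echo "CUDA_VISIBLE_DEVICES=${CUDA_VISIBLE_DEVICES:-<unset>}"
  python -c "import torch; print('GPUs visible to PyTorch:', torch.cuda.device_count())"
}
```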

  • Q.5 A k8s container fails, e.g., you cannot deploy the model
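When a pod fails, `kubectl` usually says why: `describe` shows events such as scheduling or image-pull errors, and `logs` shows the container output. A sketch with a hypothetical helper; the pod name and namespace are placeholders, and the GPU check assumes the cluster exposes GPUs through the NVIDIA device plugin as the `nvidia.com/gpu` resource.

```shell
# Hypothetical helper: first-pass debugging of a failing pod.
# $1: pod name, $2: namespace (defaults to "default")
k8s_debug() {
  local pod="$1" ns="${2:-default}"
  if [ -z "$pod" ]; then
    echo "usage: k8s_debug <pod> [namespace]" >&2
    return 2
  fi
  # Pod status and the node it was scheduled on
  kubectl get pod "$pod" -n "$ns" -o wide
  # The Events section at the end explains Pending / ImagePullBackOff / CrashLoopBackOff
  kubectl describe pod "$pod" -n "$ns"
  # Last 50 lines of container output (add --previous for a container that restarted)
  kubectl logs "$pod" -n "$ns" --tail=50
  # GPUs each node advertises to the scheduler (NVIDIA device plugin resource)
  kubectl describe nodes | grep -i "nvidia.com/gpu"
}
```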