skat00sh/basic-deep-learning-tools

The repo was created as a part of a student round discussino at Fraunhofer AISEC

Python

basic-deep-learning-tools

The repo was created as a part of a student round discussion at Fraunhofer-AISEC

Steps

Create a conda environment using

    conda create -n basic_torch python=3.10

In order to run this repo,you can directly create the clone of the same environment using the following command(From directory root):

    conda env create --file basic_torch.yml

If you install more dependencies or change version of already existing libraries, make sure to update the enrionment file using:

  conda env export --name basic_torch > basic_torch.yml

Tips to use on server

Run the training using the screen command so that in case your connection times out and you're disconnected, your training won't halt
Some commonly used screen commands

   # Name your screen in case you are running multiple python scripts
   screen -S my_screen_name
   
   # To disconnect from the screen without killing the process press **Ctrl + a, d**
   
   # To reconnect with the screen
   screen -r my_screen_name
   
   # To list all the running screen
   screen -ls

Setup Pycharm or VsCode with remote Python interpreter to check if your local changes work with the libraries installed in the server conda env
In case you don't use PyCharm or VSCode, then using pudb is a good option for debugging directly from the terminal in case your code breaks on the server
There are many ways to organise a deep learning repo but the basic structure I prefer is for a barebones repo is:

 REPO_ROOT
├── basic_torch.yml
├── config.yml
├── data
│ └── MNIST
├── model.py
├── README.md
├── train.py
└── utils.py

Conda autocomplete bashscript link