johnolafenwa/DeepStack

Training locally

Opened this issue · 8 comments

Hello, I'm interested in training a custom model locally.
I tried to do it online, but I get kicked out after 12 hours.
I followed every step in the guide and only need the correct final command to see if it works.
- CUDA + cuDNN + PyTorch installed and verified with
python
followed by
import torch
torch.cuda.is_available()
- Cloned DeepStack Trainer with git clone https://github.com/johnolafenwa/deepstack-trainer
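
The interactive check above can also be run as a small script. This is a minimal sketch that assumes nothing about the setup except that PyTorch may or may not be installed; the helper name `cuda_status` is made up for illustration.

```python
import importlib.util


def cuda_status():
    """Return a short description of whether PyTorch can see a CUDA GPU."""
    # find_spec avoids an ImportError when PyTorch is not installed at all.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch

    if torch.cuda.is_available():
        # Device 0 is the first GPU visible to PyTorch.
        return "CUDA available: " + torch.cuda.get_device_name(0)
    return "torch is installed, but CUDA is not available"


if __name__ == "__main__":
    print(cuda_status())
```

Running it with the same interpreter that later runs train.py (python3 in this thread) shows what that particular environment sees, which can differ from another Python installed on the same machine.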

Everything looks fine and ready, but I lack the knowledge, and this command doesn't work: python3 train.py --dataset-path "/path-to/my-dataset"
Maybe I need to change the path. I tried, but I couldn't find the correct one.

Also, how can I use different values for --model, --batch-size, and epochs?
An example of a full command would help me a lot. My folder "my-dataset", with "train" and "test" subfolders (labeled with LabelImg in YOLO format), is ready to be used on my desktop :)
Thanks

Hello @kpally4, the --dataset-path should point to the directory where your train and test folders are located. For the --model and other parameters, see this guide: https://colab.research.google.com/drive/1gbTr_4xpDk3cpnbAVbMVxtyp-3XuUPix?usp=sharing
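
As a rough illustration of the reply above, the sketch below checks that a folder contains the train and test subfolders and assembles the command as an argument list. The function names are hypothetical. The flags --dataset-path, --model and --batch-size appear in this thread; --epochs is an assumption based on the question and should be checked against the linked guide.

```python
import tempfile
from pathlib import Path


def check_dataset_layout(dataset_path):
    """Return the required subfolders that are missing from dataset_path."""
    root = Path(dataset_path)
    return [name for name in ("train", "test") if not (root / name).is_dir()]


def build_train_command(dataset_path, model=None, batch_size=None, epochs=None):
    """Assemble the argument list for train.py without running it."""
    cmd = ["python3", "train.py", "--dataset-path", str(dataset_path)]
    if model is not None:
        cmd += ["--model", model]
    if batch_size is not None:
        cmd += ["--batch-size", str(batch_size)]
    if epochs is not None:
        # Assumption: flag name taken from the question, not confirmed here.
        cmd += ["--epochs", str(epochs)]
    return cmd


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "train").mkdir()
        # Only "train" exists in this throwaway folder.
        print("Missing folders:", check_dataset_layout(tmp))
        print("Command:", " ".join(build_train_command(tmp, batch_size=32)))
```

The list returned by build_train_command could be passed to subprocess.run from inside the deepstack-trainer folder once check_dataset_layout returns an empty list.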

Thanks for the reply, @johnolafenwa.
To be honest, I'm really confused. I've never done this before, and I don't know why nothing happens when I run the command.
As I said, CUDA, cuDNN and PyTorch appear to be working:
https://i.imgur.com/AykRDnQ.png
I'm not sure about the last requirement in the guide, pip install -r requirements.txt.
What I did was select, in order, Stable (1.7.1), Windows, Pip, Python, 10.1 on the PyTorch site, then copy and paste the generated line:
https://i.imgur.com/2CilGut.png
Then I cloned the trainer.
Then I tried to launch the trainer with python3 train.py --dataset-path "D:\My-Dataset" --model "yolo5s" --batch-size 32
where "My-Dataset" is the folder on drive D that contains the test and train folders.
Nothing happened, though, and I tried in several Command Prompt and PowerShell windows.
Maybe I'm missing something.
Please help when you can :)

Can anyone who has already trained on their own "train" and "test" folders locally help?
I'm stuck at the same point described above :(

The path inside your WSL (Linux) environment is not the same as the Windows path.

This is how I run it:

Open a command prompt.
bash <- Start bash to jump into your Linux environment.
cd /mnt/c/temp/deepstack/deepstack-trainer <- Change folder to my Windows C:\temp\deepstack\deepstack-trainer
python3 train.py --dataset-path "/mnt/c/temp/deepstack/data" <- My test and train folders are placed in C:\temp\deepstack\data
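
The mapping used in the steps above (C:\temp\... becomes /mnt/c/temp/...) can be sketched as a small helper. This is only an illustration that assumes the default WSL mount point /mnt/<drive letter>; the function name is made up. Inside WSL, the wslpath tool performs the same conversion.

```python
from pathlib import PureWindowsPath


def windows_to_wsl_path(windows_path):
    r"""Convert a path like C:\temp\data to /mnt/c/temp/data."""
    path = PureWindowsPath(windows_path)
    drive = path.drive  # e.g. "C:"
    if not path.is_absolute() or len(drive) != 2 or drive[1] != ":":
        raise ValueError("expected an absolute path with a drive letter")
    # path.parts[0] is the anchor (drive plus root); the rest are folder names.
    return "/".join(["/mnt", drive[0].lower(), *path.parts[1:]])


if __name__ == "__main__":
    print(windows_to_wsl_path(r"C:\temp\deepstack\data"))
    print(windows_to_wsl_path(r"D:\My-Dataset"))
```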

I hope this helps.

Thanks for the reply. I finally understand that a Linux environment is required... well, anyway.
I got it and changed the paths to match my own setup:
cd /mnt/c/windows/deepstack/deepstack-trainer
followed by python3 train.py --dataset-path "/mnt/c/windows/deepstack/data"

The result is:
Traceback (most recent call last):
  File "train.py", line 11, in <module>
    import numpy as np
ModuleNotFoundError: No module named 'numpy'

And I don't know what that means.

You are missing dependencies; I was too.

I fixed it by running the setup code from the Google Colab notebook in my bash environment:

!git clone https://github.com/johnolafenwa/deepstack-trainer
%cd deepstack-trainer
!pip install -r requirements.txt
!pip install torch==1.7.0+cu110 torchvision==0.8.1+cu110 torchaudio===0.7.0 -f https://download.pytorch.org/whl/torch_stable.html

I tried the same method, but I only succeeded by installing the packages one at a time,
like this:
python3 -m pip install numpy
python3 -m pip install torch
python3 -m pip install tensorboard
python3 -m pip install tqdm
pip install image
sudo apt install python3-opencv -y
pip3 install torchvision
pip3 install matplotlib
pip3 install scipy
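
Before launching train.py again, a quick check like the one below can show which packages are still missing from the interpreter in use. It is a sketch only: the module list is an assumption derived from the install commands above (python3-opencv is imported as cv2), not the trainer's actual requirements.txt.

```python
import importlib.util

# Assumption: import names matching the packages installed above.
MODULES = [
    "numpy", "torch", "torchvision", "tensorboard",
    "tqdm", "cv2", "matplotlib", "scipy",
]


def missing_modules(names):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(MODULES)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run it with python3, the same interpreter used for train.py, since pip, pip3 and python3 -m pip may install into different environments.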

At this point, this is what happened when I ran python3 train.py --dataset-path "/mnt/c/windows/deepstack/data":
(screenshot: mFQc23f)

I changed 2 lines in "yolo.py"
from
b[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
to this
b.data[:, 4] += math.log(8 / (640 / s) ** 2) # obj (8 objects per 640 image)
b.data[:, 5:] += math.log(0.6 / (m.nc - 0.99)) if cf is None else torch.log(cf / cf.sum()) # cls
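
The change to b.data resembles a common workaround for PyTorch refusing an in-place update on a view of a parameter that requires gradients. The original error is not preserved in this thread, so treating it as that error is an assumption. The sketch below uses a made-up six-element parameter, not the trainer's model, and returns None when PyTorch is not installed.

```python
try:
    import torch
except ImportError:  # lets the sketch load on machines without PyTorch
    torch = None


def demo_data_workaround():
    """Compare an in-place update on a parameter view with the .data variant."""
    if torch is None:
        return None

    # Attempt 1: update a view of a leaf tensor that requires grad.
    bias_a = torch.nn.Parameter(torch.zeros(6))
    view_a = bias_a.view(2, 3)
    try:
        view_a[:, 1] += 1.0
        raised = False
    except RuntimeError:
        # Recent PyTorch versions are expected to end up here.
        raised = True

    # Attempt 2: .data shares storage but is detached from autograd,
    # so the update goes through and changes the parameter itself.
    bias_b = torch.nn.Parameter(torch.zeros(6))
    view_b = bias_b.view(2, 3)
    view_b.data[:, 1] += 1.0
    return raised, bias_b.detach().tolist()


if __name__ == "__main__":
    print(demo_data_workaround())
```

Wrapping the update in a torch.no_grad() block is another way to do this kind of in-place initialization.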

At this point it looks like it starts to work, BUT it crashes with:
Segmentation fault (core dumped)

and stops.
It also uses the CPU instead of the GPU.
I don't know how to proceed.

(screenshot: cydj5tG)