An easy-to-use template for PyTorch deep learning projects.
- Use `python start_new_proj.py --proj_name xx --proj_loc /path/to/proj_parent_dir` to extend the template into a new project. What you need to implement are the data, model, loss, metric, and progress_img_saver. All other functions for training and evaluation are provided.
- When you start a new project with `proj_name`, the custom lib is renamed to your `proj_name`. CamelCase names (like `ProjName`) are recommended.
- Install the required libraries with `pip install -r requirements`. The major libraries are: torch, numpy, loguru, tensorboard, pyyaml.
- Run `pre-commit install` to install pre-commit for formatting, and `pre-commit run --all-files` to check all files.
Use `python train.py --config configs/default.yaml` to start training. For all params, refer to `configs/default.yaml`.
- Setting `--gpu_ids -1` uses the CPU only, which is good for debugging. Refer to `scripts/cpu.sh` for more detail.
- Use launch: You can refer to `script/gpu.sh` for training on GPU. Single/multi-GPU training on a local machine or across distributed machines is supported.
- Use slurm: You can refer to `script/slurm.sh` for training on GPU using `slurm`. Single/multi-GPU training on a local machine or across distributed machines is supported.
Adding `@master_only` to a function allows only the `rank=0` node to run it.
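As an illustration of the idea, a decorator of this kind can be sketched as follows. This is a minimal sketch, not the template's actual implementation: it assumes the process rank is exposed through the `RANK` environment variable (as common distributed launchers do), and `get_rank` and `save_checkpoint` are hypothetical names.

```python
import functools
import os


def get_rank() -> int:
    """Return the process rank. Assumption: the launcher sets the RANK env variable."""
    return int(os.environ.get("RANK", "0"))


def master_only(func):
    """Run the wrapped function on rank 0 only; other ranks skip it and get None."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if get_rank() == 0:
            return func(*args, **kwargs)
        return None

    return wrapper


@master_only
def save_checkpoint(path):
    # Hypothetical helper: only the rank-0 process should write files.
    return f"saved to {path}"


message = save_checkpoint("model.pt")
```

Reading the rank at call time rather than at decoration time keeps the decorator usable in modules that are imported before the distributed environment is set up.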
- Configs are stored as YAML, mainly under `configs/`. To set or update a value from the command line, add `--arg value` to the command.
- All arguments in the YAML are organized in levels, so command-line arguments should take the form `--level1.level2...`
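The level-based override can be pictured with the sketch below. It is only an illustration of how a dotted argument maps onto a nested config, not the template's own parser; `apply_override` is a hypothetical helper, and the sample keys are taken from the `progress` options mentioned in this README.

```python
def apply_override(cfg: dict, dotted_key: str, value):
    """Set cfg[level1][level2]... = value for a key such as '--level1.level2'."""
    keys = dotted_key.lstrip("-").split(".")
    node = cfg
    for key in keys[:-1]:
        # Create intermediate levels when they do not exist yet.
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return cfg


# Nested config as it might look after loading a YAML file.
cfg = {"progress": {"start_epoch": 0, "epoch_val": 5}}
apply_override(cfg, "--progress.start_epoch", -1)
```

Only the addressed leaf changes; sibling keys at the same level keep the values loaded from the YAML file.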
- We use `loguru` to save and show the log. Only the `rank=0` process shows the log. You can `add_log` and set `msg_level`.
- You can set `--resume` to a checkpoint path, or to a checkpoint folder, in which case `lastest.pt.tar` is loaded. This only reads the model; you also have to set `--configs xxx` to the configs in the existing expr folder.
- In `resume` mode, if you set `progress.start_epoch` to `-1`, training resumes from the checkpoint.
- If `progress.start_epoch` is `0`, the weights are loaded and fine-tuned from epoch 0. You should set a different expr name, like `xxx_finetune`, to keep the runs separate.
- All updated configs are saved in the experiment folder. You just need to run `job.sh` in that folder to reproduce the result.
- The script starts CPU training. You need to modify `job.sh` to use the GPU.
- You can add your model in `custom.models` as `xxx_model.py`.
- Add `@MODEL_REGISTRY.register()` to the class for registration.
- Some backbones/components are provided in `common.models`.
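The registration pattern can be sketched as below. This is a minimal, self-contained illustration under stated assumptions: the `Registry` class, its `get` method, and `XxModel` are hypothetical stand-ins, and the template's real `MODEL_REGISTRY` may behave differently.

```python
class Registry:
    """Map class names to classes so they can be built from a config string."""

    def __init__(self, name):
        self.name = name
        self._classes = {}

    def register(self):
        # Used as @REGISTRY.register() on a class definition.
        def decorator(cls):
            if cls.__name__ in self._classes:
                raise KeyError(f"{cls.__name__} is already in the {self.name} registry")
            self._classes[cls.__name__] = cls
            return cls

        return decorator

    def get(self, class_name):
        return self._classes[class_name]


MODEL_REGISTRY = Registry("model")


@MODEL_REGISTRY.register()
class XxModel:
    # Stand-in for a model class defined in xxx_model.py.
    def __init__(self, hidden_dim=8):
        self.hidden_dim = hidden_dim


# Look the class up by the name given in the config and build it.
model = MODEL_REGISTRY.get("XxModel")(hidden_dim=16)
```

With this layout the config only needs to carry the class name (for example `type: XxModel`), and new classes become available as soon as their module is imported.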
- `dir.data_dir` in the config is the main data directory for all datasets. Do not set it for any single dataset; instead, modify your `custom.xx_dataset.py` to build the path specific to your dataset.
- You can add your dataset in `custom.datasets` as `xxx_dataset.py`.
- Add `@DATASET_REGISTRY.register()` to the class for registration.
To set the datasets used in train/val/eval, set

    dataset:
        train:
            type: xxDataset
            augmentation:
                xxx:
        val:
        eval:

If `val` or `eval` is missing, validation or evaluation is skipped during training.
- You can modify the function `custom.dataset.transform.get_transforms` to choose the data transformation.
- Some basic functions are provided in `common.dataset.transform.augmentation`.
- You can add your loss in `custom.loss` as `xxx_loss.py`.
- Add `@LOSS_REGISTRY.register()` to the class for registration.
To set the loss:

    loss:
        loss1:
            weight: 1.0
            other: xxx
            augmentation:
        loss2:
            weight: 2.0

- Weights are combined in loss_factory in `custom.loss.__init__`, so you don't need to multiply by the weight in each implementation.
- When implementing a loss, you have to move `inputs` to the `output` device. Refer to `custom.loss.img_loss` for an example.

The resulting loss dict will be:

    loss:
        names: [loss1, loss2, ...]
        loss1: xx.xx
        loss2: xx.xx
        ...
        sum: xx.xx
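
How the weights and the `sum` entry fit together can be sketched as follows. This is an assumption-laden illustration rather than the template's `loss_factory`: `combine_losses` is a hypothetical function, plain floats stand in for loss tensors, and storing the weighted (rather than raw) value under each loss name is an assumption.

```python
def combine_losses(raw_losses, loss_cfg):
    """Apply per-loss weights from the config and build the loss dict."""
    loss_dict = {"names": list(loss_cfg.keys())}
    total = 0.0
    for name, cfg in loss_cfg.items():
        # The weight is applied here, so each loss implementation returns its raw value.
        weighted = cfg.get("weight", 1.0) * raw_losses[name]
        loss_dict[name] = weighted
        total += weighted
    loss_dict["sum"] = total
    return loss_dict


loss_cfg = {"loss1": {"weight": 1.0}, "loss2": {"weight": 2.0}}
raw_losses = {"loss1": 0.5, "loss2": 0.25}
loss_dict = combine_losses(raw_losses, loss_cfg)
```

Keeping the weighting in one place means a weight change in the YAML never requires touching a loss implementation.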
- Metrics work like losses: all metrics are calculated at once. You don't need to set weights here, and no `sum` is calculated.
- Add `@METRIC_REGISTRY.register()` to the class for registration.
- When implementing a metric, you have to move `inputs` to the `output` device. Refer to `custom.metric.custom_metric` for an example.
- The resulting metric dict will be:

        metric:
            names: [metric1, metric2, ...]
            metric1: xx.xx
            metric2: xx.xx
            ...

- Gradient clipping on the whole model is supported through `clip_gradients`. You can set `clip_warm` to a positive number to use `clip_gradients_warmup` after the warmup period.
- Validation is performed on the `val` dataset every `progress.epoch_val` epochs. The monitor records results such as losses and images.
- You can specify valid configs in `dataset.val` to change the dataset details.
- If `progress.save_progress_val` is `True`, `progress.max_samples_val` results are saved to `experiments/expr_name/progress/val`.
- Evaluation is performed on the `eval` dataset every `progress.epoch_eval` epochs. All results are recorded locally in `experiments/expr_name/eval` for each epoch. In general, though, you should not evaluate during training; running evaluation separately is better for avoiding over-fitting.
- You can specify valid configs in `dataset.eval` to change the dataset details.
- A metric is needed for quantitative evaluation.
- If `progress.init_eval` is `True`, the initial model or the resumed model is evaluated before training.
- To evaluate a trained model, run `python evaluate.py` with `--configs configs/eval.yaml` and `--model_pt /path/to/model`. Results are written to the location given by `--dir.eval_dir`, for example `--dir.eval_dir results/eval_sample`.
- `eval.yaml` should contain the params for `--dataset.eval`, `--model`, and `--metric`.
- Tests for the `common` and `custom` classes are in `tests`. You should implement your own tests for `custom` classes when needed.
- We use unittest. You can run
  - `python -m unittest test_file` for all tests in a file,
  - `python -m unittest discover test_dir` for all tests in a directory,
  - `python -m unittest test_dir.test_file.test_method` for a single test function.
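
A test for a `custom` class could look like the sketch below. It is only an illustration: `add_matrix` here is a hypothetical pure-Python stand-in (not the CUDA op shipped with the template), and the suite is run programmatically so the block is self-contained.

```python
import unittest


def add_matrix(a, b):
    """Hypothetical stand-in for a custom function: element-wise sum of two 2D lists."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("shape mismatch")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class TestAddMatrix(unittest.TestCase):
    def test_add(self):
        self.assertEqual(add_matrix([[1, 2], [3, 4]], [[10, 20], [30, 40]]),
                         [[11, 22], [33, 44]])

    def test_shape_mismatch(self):
        with self.assertRaises(ValueError):
            add_matrix([[1, 2]], [[1, 2, 3]])


# Run the two tests in this file without going through the command line.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestAddMatrix)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Saved as a file under `tests`, the same class would also be picked up by the `python -m unittest` commands listed above.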
- A tensorboard monitor is used during training to record train/val losses, values, images, etc.
- All progress results are saved in `experiments/expr_name/event`. Use `tensorboard --logdir=experiments/expr_name/event` to view them.
- In addition, if you set `progress.local_progress` to True, images are written to `experiments/expr_name/progress`.
- Change `render_progress_img` in `custom_trainer` for different visual results.

We provide a simple sample CUDA extension implementing an `add_matrix` function, plus a Python wrapper to use it like a `torch.nn.Module`. For more detail, please see the official doc.

Install it by going into `custom/ops` and running `python setup.py install`, or run `sh ./scripts/install_ops.sh`.

Run it with `python custom/ops/add_matrix.py`, or run the tests with `python -m unittest tests/tests_custom/tests_ops/tests_ops.py`.

You need a new folder in `custom/ops/` to hold the cpp wrapper source and the CUDA implementation. It is suggested to put a Python wrapper under `custom/ops/func.py` to expose the function.

- `__global__`: called by the CPU, runs on the GPU. The function must return `void`.
- `__device__`: called by the GPU, runs on the GPU.
- `__host__`: called by the CPU, runs on the CPU.
- `__host__ __device__`: both CPU and GPU.
- `__global__ __host__` is not allowed.

`grid - block - thread` is the hierarchy of GPU computation units.
- index = blockIdx.x * blockDim.x + threadIdx.x = the thread id in a grid
- stride = blockDim.x = total number of threads in a block. Commonly a block can be used to handle one batch.
- stride = blockDim.x * gridDim.x = total number of threads in a grid
  - looping with this stride is called a `grid-stride loop`
- 2d/1d grids/blocks are all supported, depending on your input tensor shape.
- Refer to doc1 and doc2 for detail.
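
To make the index and stride formulas concrete, the sketch below simulates on the CPU which elements each thread would visit in a 1D grid-stride loop. It is plain Python used for illustration only, not CUDA code from the template, and `grid_stride_indices` is a hypothetical helper.

```python
def grid_stride_indices(n, grid_dim, block_dim):
    """Return {(blockIdx.x, threadIdx.x): [element indices]} for n elements."""
    stride = block_dim * grid_dim  # total number of threads in the grid
    work = {}
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            index = block_idx * block_dim + thread_idx  # thread id in the grid
            # In a kernel: for (i = index; i < n; i += stride) { ... }
            work[(block_idx, thread_idx)] = list(range(index, n, stride))
    return work


# 10 elements processed by 2 blocks of 3 threads each (6 threads in total).
work = grid_stride_indices(n=10, grid_dim=2, block_dim=3)
```

With these numbers the stride is 6, so thread 0 of block 0 handles elements 0 and 6, thread 0 of block 1 handles 3 and 9, and every element is visited exactly once even though there are fewer threads than elements.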

To pass a tensor into a CUDA kernel, the sample uses

    AT_DISPATCH_FLOATING_TYPES(A.scalar_type(), "sample_cuda",  // this will switch actual scalar type
        ([&] {
            kernel_func<scalar_t><<<blocks, threads>>>(
                A.data_ptr<scalar_t>(),
                B.data_ptr<scalar_t>()
            );
        })
    );

If you pass the raw pointer with `A.data_ptr<scalar_t>()`, it is hard to index the elements inside the kernel function. You can instead use a `PackedAccessor`, such as `torch::PackedTensorAccessor<scalar_t, 2, torch::RestrictPtrTraits, size_t>()`, for easier access.

In some cases it is helpful to store by-products for the backward gradient calculation. In pure inference mode, however, doing that extra work during the forward pass is wasteful, so it helps to pass an indicator to the customized forward pass. This indicator should be [`any(input.requires_grad)` and `torch.is_grad_enabled()`], which checks whether any input requires grad and whether the call is outside a `no_grad` context. In the `.cu` kernel, you have to implement the grad calculation yourself.

- inference, demo
- onnx or other implementation
- deploy and web server
- online project homepage
- colab
- setup.py
This project template refers to: