AI 核心库

说明：此项目不在维护，以后学习和使用商汤的MMCV:https://github.com/open-mmlab/mmcv

一、项目由来

在工业流程中，深度学习应用过程包括：

TrainVal(针对特定场景，特定数据集训练一个模型)
Eval(使用验证/测试集测试，得到工业上的性能指标)
Demo(将模型做个demo给客户展示)
Export(将模型转成其他格式)
Deploy(将模型部署到具体的设备上，并权衡速度与准确率)
APP(将整个工业流程封装成带界面的APP)

在深度学习训练中，训练过程包括以下组件：

dataloaders(数据加载)
models(模型：分类、分割、检测、异常检测等Task)
losses(损失函数)
optims(优化器)
schedulers(学习率调整策略)
evaluator(训练过程中的性能评价)
trainers(训练过程, 将上述组件联系起来, 并且包含一些调优trick)
- resume
- fune turning
- 日志监控(训练过程日志输出，tensorboard，...)
- 权重输出
- multigpus(是否使用多GPU形式训练)
- mixedprecisions(是否使用混合精度进行训练)
- ......

目前，我所遇到的深度学习项目基本都能用这两个维度概括，为了方便以后使用，在此将两个维度整理成这个项目.

二、项目结构

1. 目录介绍

|-- configs # 分类、分割、目标检测、异常检测配置文件
|   |-- SemanticSegmentation
|   |-- ObejctDetection
|   |-- AnomalyDetection
|   `-- ImageClassification
|-- custom_modules  # 自定义组件
|   `-- __init__.py
|-- dao   # 核心库
|   |-- __init__.py
|   |-- dataloaders # 数据集、数据加载器、数据增强等
|   |-- evaluators  # 验证器
|   |-- losses      # 损失函数
|   |-- models      # 模型
|   |   |-- ImageClassification   # 分类
|   |   |-- ObejctDetection       # 目标检测
|   |   |-- SemanticSegmentation  # 分割
|   |   |-- anomaly               # 异常检测
|   |   `-- backbone              # 主干网络
|   |-- optimizers  # 优化器
|   |-- schedulers  # 学习率调整策略
|   |-- trainers    # 训练器
|   |-- register.py # 注册, dao中所有组件都注册在这里
|   `-- utils       # 工具库
|-- main.py         # 启动文件
|-- notes           # 笔记
|   |-- SemanticSegmentation
|   |-- ObejctDetection
|   |-- AnomalyDetection
|   `-- ImageClassification

2. Train流程

|- trainer
|	|- before_train(train之前的操作，eg. dataloader,model,optim,... setting)
|	|	|- 1.logger setting：日志路径，tensorboard日志路径，日志重定向等
|	|	|- 2.model setting：获取模型
|	|	|- 3.optimizer setting：获取优化器，不同权重层，优化器参数不同的设置
|	|	|- 4.resume setting：resume，fune turning等设置
|	|	|- 5.dataloader setting：数据集dataset定义-->Transformer(数据增强）-->Dataloader（数据加载)--> ...
|	|   |- 6.loss setting: 损失函数选择，有的实验可以略掉，因为在model中定义了
|	|   |- 7.scheduler setting：学习率调整策略选择
|	|   |- 8.other setting: 补充2model setting，EMA，DDP模型等设置
|	|   |- 9.evaluator setting：验证器设置，包括读取验证集，计算评价指标等
|	|- train_in_epoch(训练一个epoch的操作)
|	|	|- before_epoch(一个epoch之前的操作)
|	|	|	|- 判断此次epoch使用进行马赛克增强；
|	|	|	|- 修改此次epoch的损失函数；
|	|	|	|- 修改此次epoch的日志信息格式;
|	|	|	|- ...
|	|	|- train_in_iter(训练一次iter的操作，一次完整的forward&backwards)
|	|	|	|- before_iter(一次iter之前的操作)
|	|	|	|	|- nothing todo
|	|	|	|- train_one_iter(一次iter的操作)
|	|	|	|	|- 1.记录data time和iter time
|	|	|	|	|- 2.(预)读取数据
|	|	|	|	|- 3.forward
|	|	|	|	|- 4.计算loss
|	|	|	|	|- 5.backwards
|	|	|	|	|- 6.optimizer 更新网络权重
|	|	|	|	|- 7.是否进行EMA操作
|	|	|	|	|- 8.lr_scheduler修改学习率
|	|	|	|	|- 9.记录日志信息(datatime,itertime,各种loss,...)
|	|	|	|- after_iter(一次iter之后的操作)
|	|	|	|	|- 1.打印一个iter的日志信息(epoch,iter,losses,gpu mem,lr,eta,...)
|	|	|	|	|- 2.是否进行图片的resize，即多尺度训练
|	|	|- after_epoch(一个epoch之后的操作)
|	|	|	|- 1.保存模型
|	|	|	|- 2.是否进行，evaluator
|	|- after_train(训练之后的操作)
|	|	|- 输出最优的结果

3. Eval/Demo/Export流程

Eval/Demo/Export是Train中的子集，或者过程比较简单，此处不再赘述，看代码即可

三、环境搭建

1. 容器制作

本项目的实验环境全部是在Docker中完成的，docker镜像的制作过程如下

sudo docker pull nvcr.io/nvidia/pytorch:22.02-py3
sudo docker run -it -v /home/user/data:/root/temp nvcr.io/nvidia/pytorch:22.02-py3 bash

apt-get install net-tools
apt-get install inetutils-ping
apt install ssh
vim /etc/ssh/sshd_config
修改root可登录
把AIServer, AICore, setup, start.sh文件放到/ai目录下
ln -s /ai/data/AITorch/hub/ /root/.cache/torch/
ln -s /opt/conda/bin/python /usr/bin/python
ln -s /opt/conda/bin/pip /usr/bin/pip
pip install python-socketio eventlet nvgpu xlsxwriter # AIServer
pip install dotmap loguru albumentations torchcam timm torchsummary scikit-image xlsxwriter  # AICore

四、如何使用

以下只举例，分别列出了分类、分割、目标检测、异常检测中一个模型的train、evaluate、demo、export过程。

1. ImageClassification

1. trainval
# 单机单卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# 单机多卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# 多机多卡
# use master ip 10.1.130.111
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# use docker ip 172.17.0.2
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

2. eval
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-ClsDataloader-ClsEvaluator-eval-linux.json

3. demo
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-demo-linux.json

4. export
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-export-linux.json

2. SemanticSegmentation

1. trainval
# 单机单卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json

# 单机多卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 2 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json

# 多机多卡
# use master ip 10.1.130.111
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json

# use docker ip 172.17.0.2
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-sgdWarmupBiasBnWeight-segDataset-crossEntropyLoss-warmCosLr-segEvaluator-gpus-trainval-linux.json

2. eval
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-segDataset-segEvaluator-eval-linux.json

3. demo
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-demo-linux.json

4. export
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/SemanticSegmentation/seg-UNet_resnet50-export-linux.json

3. ObjectDetection

TODO

1. trainval
# 单机单卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# 单机多卡
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# 多机多卡
# use master ip 10.1.130.111
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  NCCL_SOCKET_IFNAME=eth0 NCCL_IB_DISABLE=1 NCCL_DEBUG=INFO  python main.py  --dist-url 'tcp://10.1.130.111:803' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

# use docker ip 172.17.0.2
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 0 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json
CUDA_VISIBLE_DEVICES=0,1  python main.py  --dist-url 'tcp://172.17.0.2:1234' --dist-backend 'nccl' --num_machines 2 --machine_rank 1 --devices 2 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-sgdWarmupBiasBnWeight-clsDataloader-crossEntropyLoss-warmCosLr-clsEvaluator-gpus-trainval-linux.json

2. eval
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-ClsDataloader-ClsEvaluator-eval-linux.json

3. demo
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-demo-linux.json

4. export
CUDA_VISIBLE_DEVICES=0,1  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/ImageClassification/cls-efficientnetb0-export-linux.json

4. AnomalyDetection

1. trainval
# 单机单卡
CUDA_VISIBLE_DEVICES=0  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/AnomalyDetection/anomaly-PaDiM2_L-MVTecDataset-trainval-linux.json

2. demo
CUDA_VISIBLE_DEVICES=0  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/AnomalyDetection/anomaly-PaDiM2_L-MVTecDataset-demo-linux.json

4. export
CUDA_VISIBLE_DEVICES=0  python main.py --num_machines 1 --machine_rank 0 --devices 1 -c /ai/AICore/configs/AnomalyDetection/anomaly-PaDiM2_L-MVTecDataset-export-linux.json

五、支持组件

整个项目是由trainer(train,eval,demo,export), dataloader, dataset,model,optimizer,loss,scheduler,evaluator组件组成。其中，trainer负责将其他组件拼接起来，接下来会说明一些组件，但是不是每个都说下，详细请看代码。

Dataset

Classification

分类数据集

        data_dir:str  数据集文件夹路径，文件夹要求是
            |-dataset
                |- 类别1
                    |-图片
                |- 类别2
                |- ......
                |- train.txt
                |- val.txt
                |- test.txt
                |- labels.txt

        image_set:str "train.txt", "val.txt" or "test.txt"
        in_channels:int  输入图片的通道数，目前只支持1和3通道
        input_size:tuple 输入图片的HW
        preproc:albumentations.Compose 对图片进行预处理
        cache:bool 是否对图片进行内存缓存
        separator:str labels.txt, train.txt, val.txt, test.txt 的分割符（name与id）
        images_suffix:list[str] 可接受的图片后缀

Segmentation

分割数据集

        data_dir:str  数据集文件夹路径，文件夹要求是
            |-dataset
                |- images
                    |-图片
                |- masks
                    |-图片
                |- train.txt
                |- val.txt
                |- test.txt
                |- labels.txt

        image_set:str "train.txt or val.txt or test.txt"
        in_channels:int  输入图片的通道数，目前只支持1和3通道
        input_size:tuple 输入图片的HW
        preproc:albumentations.Compose 对图片进行预处理
        cache:bool 是否对图片进行内存缓存
        images_suffix:str 可接受的图片后缀
        mask_suffix:str 可接受的图片后缀

AnomalyDetection

        异常检测数据集，（MVTecDataset类型）

        data_dir:str  数据集文件夹路径，文件夹要求是
            📂datasets 数据集名称
              ┣ 📂 ground_truth  test测试文件夹对应的mask
              ┃     ┣ 📂 defective_type_1    异常类别1 mask（0，255）
              ┃     ┗ 📂 defective_type_2    异常类别2 mask
              ┣ 📂 test  测试文件夹
              ┃     ┣ 📂 defective_type_1    异常类别1 图片
              ┃     ┣ 📂 defective_type_2    异常类别2 图片
              ┃     ┗ 📂 good
              ┗ 📂 train 训练文件夹
              ┃     ┗ 📂 good

        preproc:albumentations.Compose 对图片进行预处理
        image_set:str "train.txt or val.txt or test.txt"； train.txt是训练，其余是测试
        in_channels:int  输入图片的通道数，目前只支持1和3通道
        cache:bool 是否对图片进行内存缓存
        image_suffix:str 可接受的图片后缀
        mask_suffix:str 可接受的图片后缀

ObjectDetection

bboxes格式：https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/
pascal_voc
[x_min,y_min,x_max,y_max]->[98,345,420,462]

albumentations
normalized[x_min,y_min,x_max,y_max]->[0.153125,0.71875,0.65625,0.9625]

coco
[x_min,y_min,width,height]->[98,345,322,117]

yolo
normalized[x_center,y_center,width,height]->[0.4046875, 0.8614583,0.503125,0.24375]

我采用的是albumentations格式作为ObjectDetection的bbox数据格式, 即：
normalized[x_min,y_min,x_max,y_max, class_id] ->[0.153125,0.71875,0.65625,0.9625,15]
然后dataset读取的格式为yolo格式，即：
normalized[cx,cy,w,h], Dataset的__getitem__返回为image:ndarray(c,h,w), labels:ndarray normalized[batchSize_id, class_id,cx,cy,w,h], image_path:(str, str)


 Function: 目标检测数据集

        data_dir:str  数据集文件夹路径，文件夹要求如下
            |-dataset
                |- images   存放所有图片的文件夹
                    |-图片
                |- labels   存放所有标注文件的图片
                    |-txt文件
                |- train.txt    训练集相对路径
                |- val.txt      验证集相对路径
                |- test.txt     测试集相对路径
                |- labels.txt   标签

        image_set:str "train.txt or val.txt or test.txt"
        in_channels:int  输入图片的通道数，目前只支持1和3通道
        input_size:tuple 输入图片的HW, 当需要cache image时需要，此处用不到
        preproc:albumentations.Compose 对图片进行预处理
        preproc_pixel:albumentations.Compose 对图片进行预处理, 针对COCO数据集中无bbox情况
        cache:bool 是否对图片进行内存缓存
        images_suffix:str 可接受的图片后缀
        mask_suffix:str 可接受的图片后缀

Models

1. BackBone:notes

timm:code ref

2. ImageClassification:notes

分类全部在timm基础上修改的, 因此code ref同上

efficientnetb0
efficientnetb1
resnet18
resnet34
resnet50
resnet101

效果对比

模型	外观数据PZ-验证集top1	外观数据PZ-测试集top1	权重

3. SemanticSegmentation:notes

Unet:code ref
Unet++:code ref
PSPNet:code ref
PSPNet2:code ref
DeepLabv3:code ref
DeepLabv3Plus:code ref
DeepLabv3Plus2:code ref

效果对比

注意：以下实验均是跑通即可，均未调优，epoch为80.

模型	主干网络	PascalVoc-验证集mIoU	PascalVoc -测试集mIoU
UNet	resnet50	0.69	未测
UNet++	resnet50	0.70	未测
PSPNet	resnet50	0.72	未测
PSPNet2	resnet50	0.76	未测
DeepLabV3	resnet50	0.76	未测
DeepLabV3Plus	resnet50	0.76	未测
DeepLabV3Plus2	resnet50	0.75	未测

4. ObjectDetection:notes

YoloV3 code ref|code ref2
YOLOX code ref

5. AnomalyDetection:notes

PaDim code ref
PaDiM2 code ref

Loss:notes

CrossEntropyLoss: 多分类交叉熵
Binary Cross-Entropy：二进制交叉熵损失函数
Weighted Binary Cross-Entropy：加权交叉熵损失函数
Balanced Cross-Entropy：平衡交叉熵损失函数
Focal Loss: code ref

Multi-GPU

分布式组件

Pytorch原生分布式
- DDP原理1: https://zhuanlan.zhihu.com/p/76638962
- DDP原理2: https://zhuanlan.zhihu.com/p/343951042
- DDP随机种子: https://bbs.cvmart.net/articles/5491

六、部署说明

深度学习模型训练完毕后需要部署在嵌入式、主机、手机等设备上，部署过程请参考我的另一个项目AIDeploy

七、扩充步骤

扩充步骤

一个完整的训练包括dataset, dataloader, model, evaluator,loss,optimizer,scheduler,trainer等组件，因此，完成一个训练需要把这些组件都写一遍，在AICore这个项目中，编写顺序是：

dataset：依照参考代码，确定数据的输出格式和增强方式，编写Dataset类
dataloader: 依照参考代码，确定dataloader的所有参数，包括collect_fn等参数，编写Dataloader类
model: 依照参考代码，确定model的构造函数和forward函数返回值，编写model类

trainer：依照参考代码和已有trainer过程，编写trainer代码，并补充所需组件

    def run(self):
        self._before_train()
        # epochs
        for self.epoch in range(self.start_epoch, self.max_epoch):
            self._before_epoch()
            # iters
            for self.iter in range(self.max_iter):
                self._before_iter()
                self._train_one_iter()
                self._after_iter()
            self._after_epoch()
        self._after_train()
       
    
    1.self._before_train()
    	1.Logger Setting
        2.Model Setting;
        3.Optimizer Setting;
        4.Resume setting;
        5.DataLoader Setting;
        6.Loss Setting;
        7.Scheduler Setting;
        8.Evaluator Setting;

八、Debug

1. 问题1：

ONNX不支持adaptive_avg_pool2d

# RuntimeError: Unsupported: ONNX export of operator adaptive_avg_pool2d,
# since output size is not factor of input size.
# Please feel free to request support or submit a pull request on PyTorch GitHub.

解决方法

替换nn.AdaptiveAvg2d
参考1

hsaigroup/AICore

AI 核心库

一、项目由来

二、项目结构

1. 目录介绍

2. Train流程

3. Eval/Demo/Export流程

三、环境搭建

1. 容器制作

四、如何使用

1. ImageClassification

2. SemanticSegmentation

3. ObjectDetection

4. AnomalyDetection

五、支持组件

Dataset

Classification

Segmentation

AnomalyDetection

ObjectDetection

Models

1. BackBone:notes

2. ImageClassification:notes

效果对比

3. SemanticSegmentation:notes

效果对比

4. ObjectDetection:notes

5. AnomalyDetection:notes

Loss:notes

Multi-GPU

六、部署说明

七、扩充步骤

扩充步骤

八、Debug

1. 问题1：