RikoLi/PCL-CLIP

Is there an unsupervised implementation?

Opened this issue · 9 comments

Could you provide the implementation details and the log.txt for the unsupervised setting? The unsupervised code I modified myself cannot reach the metrics reported in the paper; it only gets to 80-something percent.

Hi!
The unsupervised part of the code has not been cleaned up yet, but you can refer to the unsupervised implementations in the following works, whose pipelines are essentially the same as ours:

  1. TMGF: https://github.com/RikoLi/WACV23-workshop-TMGF/tree/main
  2. ClusterContrast: https://github.com/alibaba/cluster-contrast-reid

Also, under the unsupervised setting, performance is noticeably affected by the clustering-related parameters and the contrastive temperature; you can refer to the settings used in the works above.
Good luck with your work!
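
To make the sensitivity to the contrastive temperature concrete, below is a minimal sketch of the ClusterContrast-style cluster-level InfoNCE objective used by the repositories linked above. The function name, signature, and the 0.05 default are illustrative assumptions, not code taken from either repository; ClusterContrast additionally updates the centroid memory with a momentum rule that is omitted here.

import torch
import torch.nn.functional as F

def cluster_contrastive_loss(
    features: torch.Tensor,       # (B, D) L2-normalized image features
    pseudo_labels: torch.Tensor,  # (B,) cluster index of each image, outliers already removed
    centroids: torch.Tensor,      # (C, D) L2-normalized cluster-centroid memory
    temperature: float = 0.05,    # contrastive temperature; results are sensitive to this value
) -> torch.Tensor:
    # Cosine similarity to every centroid, sharpened by the temperature,
    # then cross-entropy against the pseudo label (InfoNCE over centroids).
    logits = features @ centroids.t() / temperature
    return F.cross_entropy(logits, pseudo_labels)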


Got it, thank you very much. I will try tuning the parameters.

Hello @RikoLi, great work! Besides replacing the O2CAP backbone with PCL-CLIP, do I need to adjust any parameters? Should I load any pretrained weights?

With only the backbone changed, I have had difficulty getting the model to converge on the MSMT17 dataset.

The following is the code of the backbone that I used. Here I assume that cfg.MODEL.SIE_CAMERA and cfg.MODEL.SIE_VIEW are both False by default.

import torch
import torch.nn as nn
from timm.models.layers import trunc_normal_


def weights_init_kaiming(m):
    classname = m.__class__.__name__
    if classname.find("Linear") != -1:
        nn.init.kaiming_normal_(m.weight, a=0, mode="fan_out")
        nn.init.constant_(m.bias, 0.0)

    elif classname.find("Conv") != -1:
        nn.init.kaiming_normal_(m.weight, a=0, mode="fan_in")
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)
    elif classname.find("BatchNorm") != -1:
        if m.affine:
            nn.init.constant_(m.weight, 1.0)
            nn.init.constant_(m.bias, 0.0)


def weights_init_classifier(m):
    classname = m.__class__.__name__
    if classname.find("Linear") != -1:
        nn.init.normal_(m.weight, std=0.001)
        if m.bias is not None:  # `if m.bias:` is ambiguous for a multi-element tensor
            nn.init.constant_(m.bias, 0.0)


import clip.clip as clip


def load_clip_to_cpu(backbone_name, h_resolution, w_resolution, vision_stride_size):
    url = clip._MODELS[backbone_name]
    model_path = clip._download(url)

    try:
        # loading JIT archive
        model = torch.jit.load(model_path, map_location="cpu").eval()
        state_dict = None

    except RuntimeError:
        state_dict = torch.load(model_path, map_location="cpu")

    model = clip.build_model(
        state_dict or model.state_dict(), h_resolution, w_resolution, vision_stride_size
    )

    return model


class ClipPCLReID(nn.Module):
    def __init__(self):
        super(ClipPCLReID, self).__init__()
        self.model_name = "ViT-B-16"
        self.STRIDE_SIZE = [16, 16]
        self.SIZE_TRAIN = [256, 128]
        self.in_planes = 768
        self.in_planes_proj = 512
        self.num_features = self.in_planes_proj

        self.bottleneck = nn.BatchNorm1d(self.in_planes)
        self.bottleneck.bias.requires_grad_(False)
        self.bottleneck.apply(weights_init_kaiming)

        self.bottleneck_proj = nn.BatchNorm1d(self.in_planes_proj)
        self.bottleneck_proj.bias.requires_grad_(False)
        self.bottleneck_proj.apply(weights_init_kaiming)

        # Number of patch tokens along height/width for a 256x128 input
        # with 16x16 patches at stride 16: 16 x 8.
        self.h_resolution = int((self.SIZE_TRAIN[0] - 16) // self.STRIDE_SIZE[0] + 1)
        self.w_resolution = int((self.SIZE_TRAIN[1] - 16) // self.STRIDE_SIZE[1] + 1)
        self.vision_stride_size = self.STRIDE_SIZE[0]
        clip_model = load_clip_to_cpu(
            self.model_name,
            self.h_resolution,
            self.w_resolution,
            self.vision_stride_size,
        )
        clip_model.to("cuda")

        self.image_encoder = clip_model.visual

        # Trick: freeze patch projection for improved stability
        # https://arxiv.org/pdf/2104.02057.pdf
        for _, v in self.image_encoder.conv1.named_parameters():
            v.requires_grad_(False)
        print(
            "Freeze patch projection layer with shape {}".format(
                self.image_encoder.conv1.weight.shape
            )
        )

    def forward(self, x):
        # No SIE camera/view embedding since cfg.MODEL.SIE_CAMERA and
        # cfg.MODEL.SIE_VIEW are both False.
        cv_embed = None
        _, image_features, image_features_proj = self.image_encoder(x, cv_embed)
        img_feature = image_features[:, 0]            # class token before projection (768-d)
        img_feature_proj = image_features_proj[:, 0]  # class token after projection (512-d)

        feat = self.bottleneck(img_feature)
        feat_proj = self.bottleneck_proj(img_feature_proj)

        # Concatenate the two BN-ed features into a single 1280-d output.
        out_feat = torch.cat([feat, feat_proj], dim=1)
        return out_feat

    def load_param(self, trained_path):
        param_dict = torch.load(trained_path)
        for i in param_dict:
            if not self.training and "classifier" in i:
                continue  # ignore classifier weights in evaluation
            self.state_dict()[i.replace("module.", "")].copy_(param_dict[i])
        print("Loading pretrained model from {}".format(trained_path))

    def load_param_finetune(self, model_path):
        param_dict = torch.load(model_path)
        for i in param_dict:
            self.state_dict()[i].copy_(param_dict[i])
        print("Loading pretrained model for finetuning from {}".format(model_path))


# def make_model(cfg, num_classes, camera_num, view_num):
#     model = TransReID(num_classes, camera_num, view_num, cfg)
#     return model


def clip_base(**kwargs):
    model = ClipPCLReID()
    return model

Hello, I have also run into the same problem on the MSMT17 dataset. Have you found the cause yet?

@RikoLi Hi, my unsupervised modification works fine on the Market dataset, but I still cannot get it to run well on MSMT17: the mAP usually converges at around 20-something and stays there. I hope I can get your help.

For MSMT17 unsupervised training, you can try the following configs:

  1. Use k-reciprocal Jaccard-distance-based DBSCAN clustering (see the O2CAP implementation) with eps=0.7, k1=30, k2=6. Note: the clustering parameters are important for performance (a minimal sketch follows this list).
  2. Freeze the patch projection layer during training.
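
As an illustration of point 1 (a sketch, not the O2CAP code itself), the snippet below clusters L2-normalized features into pseudo identities with a k-reciprocal Jaccard distance matrix and DBSCAN. The helper import path and min_samples value are assumptions based on ClusterContrast.

import torch
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(features: torch.Tensor, eps: float = 0.7, k1: int = 30, k2: int = 6):
    # compute_jaccard_distance is the faiss-based k-reciprocal re-ranking helper
    # shipped with ClusterContrast / O2CAP; the import path below is an assumption
    # and may differ in your codebase.
    from clustercontrast.utils.faiss_rerank import compute_jaccard_distance

    dist = compute_jaccard_distance(features, k1=k1, k2=k2)  # (N, N) Jaccard distance matrix
    # min_samples=4 follows the ClusterContrast default; label -1 marks outliers,
    # which are usually dropped before building the cluster memory.
    cluster = DBSCAN(eps=eps, min_samples=4, metric="precomputed", n_jobs=-1)
    labels = cluster.fit_predict(dist)
    return labels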

For MSMT17 unsupervised training, you can try the following configs:

  1. Use k-reciprocal Jaccard-distance-based DBSCAN clustering (see the O2CAP implementation) with eps=0.7, k1=30, k2=6. Note: the clustering parameters are important for performance.
  2. Freeze the patch projection layer during training.
  3. Backbone pretrained weights (from TransReID-SSL): https://drive.google.com/file/d/18FL9JaJNlo15-UksalcJRXX-0dgo4Mz4/view?usp=sharing

But isn't this pretrained backbone supposed to be CLIP? Why are TransReID-SSL weights used here?
@RikoLi

Sorry, I mixed this up with a reply for another open-source codebase 😂. Thanks for the reminder, I will correct it!
The weights used here are indeed the pretrained weights of the CLIP image encoder.
As mentioned in #1, points 1 and 2 above matter a lot for the final performance; for the unsupervised setting, the clustering parameters are especially important.

After the following changes, I managed to reproduce the MSMT17 result on O2CAP (a minimal sketch of both changes follows below):

  1. change the optimizer to SGD
  2. normalize the output:
       out_feat = torch.nn.functional.normalize(out_feat)
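
For reference, a minimal sketch of these two changes, assuming the ClipPCLReID class from the snippet above is in scope; the wrapper class name and the SGD hyperparameters are illustrative assumptions, not values reported in this thread.

import torch
import torch.nn.functional as F

# Hypothetical wrapper around the ClipPCLReID backbone posted above: the only
# change is L2-normalizing the concatenated 1280-d output feature.
class NormalizedClipPCLReID(ClipPCLReID):
    def forward(self, x):
        out_feat = super().forward(x)
        return F.normalize(out_feat, dim=1)

model = NormalizedClipPCLReID()

# SGD in place of the original optimizer; the learning rate, momentum, and
# weight decay below are illustrative placeholders, not values from this thread.
optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad],
    lr=3.5e-4,
    momentum=0.9,
    weight_decay=5e-4,
)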