Is there an unsupervised implementation?
Could you please provide the implementation details and log.txt for the unsupervised setting? My own modified unsupervised code cannot reach the numbers reported in the paper, only 80-something percent.
Hi!
The unsupervised part of the code has not been cleaned up yet, but you can refer to the unsupervised implementations of the following works, whose pipelines are basically the same as ours:
- TMGF: https://github.com/RikoLi/WACV23-workshop-TMGF/tree/main
- ClusterContrast: https://github.com/alibaba/cluster-contrast-reid
Also, in the unsupervised setting, performance is clearly sensitive to the clustering-related parameters and the contrastive temperature; you can refer to the settings in those works (a minimal sketch of the two knobs follows below).
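To make that dependence concrete, here is a minimal sketch of the clustering step and the contrastive loss, assuming features are already extracted and pairwise distances precomputed; the `eps`, `min_samples`, and `temperature` values are illustrative placeholders, not recommended settings:

```python
import torch.nn.functional as F
from sklearn.cluster import DBSCAN

def generate_pseudo_labels(dist_mat, eps=0.6, min_samples=4):
    # Cluster on a precomputed distance matrix (e.g. jaccard distance);
    # eps is usually the most sensitive clustering parameter.
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="precomputed").fit_predict(dist_mat)
    return labels  # -1 marks outliers, typically dropped before training

def cluster_contrast_loss(feats, centroids, labels, temperature=0.05):
    # ClusterContrast-style InfoNCE: the temperature rescales the logits
    # and noticeably affects convergence.
    feats = F.normalize(feats, dim=1)
    centroids = F.normalize(centroids, dim=1)
    logits = feats @ centroids.t() / temperature
    return F.cross_entropy(logits, labels)
```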
Good luck with your work!
Got it, thank you very much. I will try tuning the parameters.
Hello @RikoLi, great work! In addition to replacing the backbone of O2CAP with PCL-CLIP, do I need to adjust any parameters? Should I load any pretrained weights?
By only changing the backbone, I have had difficulty getting the model to converge on the MSMT17 dataset.
The following is the backbone code I used. Here I assume that cfg.MODEL.SIE_CAMERA and cfg.MODEL.SIE_VIEW are both false by default.
```python
import torch
import torch.nn as nn

import clip.clip as clip


def weights_init_kaiming(m):
    classname = m.__class__.__name__
    if classname.find("Linear") != -1:
        nn.init.kaiming_normal_(m.weight, a=0, mode="fan_out")
        nn.init.constant_(m.bias, 0.0)
    elif classname.find("Conv") != -1:
        nn.init.kaiming_normal_(m.weight, a=0, mode="fan_in")
        if m.bias is not None:
            nn.init.constant_(m.bias, 0.0)
    elif classname.find("BatchNorm") != -1:
        if m.affine:
            nn.init.constant_(m.weight, 1.0)
            nn.init.constant_(m.bias, 0.0)


def weights_init_classifier(m):
    classname = m.__class__.__name__
    if classname.find("Linear") != -1:
        nn.init.normal_(m.weight, std=0.001)
        if m.bias is not None:  # fixed: `if m.bias:` is ambiguous for tensors
            nn.init.constant_(m.bias, 0.0)


def load_clip_to_cpu(backbone_name, h_resolution, w_resolution, vision_stride_size):
    url = clip._MODELS[backbone_name]
    model_path = clip._download(url)
    try:
        # loading JIT archive
        model = torch.jit.load(model_path, map_location="cpu").eval()
        state_dict = None
    except RuntimeError:
        state_dict = torch.load(model_path, map_location="cpu")
    model = clip.build_model(
        state_dict or model.state_dict(), h_resolution, w_resolution, vision_stride_size
    )
    return model


class ClipPCLReID(nn.Module):
    def __init__(self):
        super(ClipPCLReID, self).__init__()
        self.model_name = "ViT-B-16"
        self.STRIDE_SIZE = [16, 16]
        self.SIZE_TRAIN = [256, 128]
        self.in_planes = 768       # ViT-B/16 token dimension
        self.in_planes_proj = 512  # CLIP projection dimension
        self.num_features = self.in_planes_proj

        # BNNeck on both the raw CLS feature and the projected feature
        self.bottleneck = nn.BatchNorm1d(self.in_planes)
        self.bottleneck.bias.requires_grad_(False)
        self.bottleneck.apply(weights_init_kaiming)
        self.bottleneck_proj = nn.BatchNorm1d(self.in_planes_proj)
        self.bottleneck_proj.bias.requires_grad_(False)
        self.bottleneck_proj.apply(weights_init_kaiming)

        # Number of patch tokens along each spatial axis
        self.h_resolution = int((self.SIZE_TRAIN[0] - 16) // self.STRIDE_SIZE[0] + 1)
        self.w_resolution = int((self.SIZE_TRAIN[1] - 16) // self.STRIDE_SIZE[1] + 1)
        self.vision_stride_size = self.STRIDE_SIZE[0]
        clip_model = load_clip_to_cpu(
            self.model_name,
            self.h_resolution,
            self.w_resolution,
            self.vision_stride_size,
        )
        clip_model.to("cuda")
        self.image_encoder = clip_model.visual

        # Trick: freeze patch projection for improved stability
        # https://arxiv.org/pdf/2104.02057.pdf
        for _, v in self.image_encoder.conv1.named_parameters():
            v.requires_grad_(False)
        print(
            "Freeze patch projection layer with shape {}".format(
                self.image_encoder.conv1.weight.shape
            )
        )

    def forward(self, x):
        cv_embed = None  # no SIE camera/view embedding (SIE_CAMERA/SIE_VIEW are false)
        _, image_features, image_features_proj = self.image_encoder(x, cv_embed)
        img_feature = image_features[:, 0]            # CLS token
        img_feature_proj = image_features_proj[:, 0]  # CLS token after projection
        feat = self.bottleneck(img_feature)
        feat_proj = self.bottleneck_proj(img_feature_proj)
        out_feat = torch.cat([feat, feat_proj], dim=1)
        return out_feat

    def load_param(self, trained_path):
        param_dict = torch.load(trained_path)
        for i in param_dict:
            if not self.training and "classifier" in i:
                continue  # ignore classifier weights in evaluation
            self.state_dict()[i.replace("module.", "")].copy_(param_dict[i])
        print("Loading pretrained model from {}".format(trained_path))

    def load_param_finetune(self, model_path):
        param_dict = torch.load(model_path)
        for i in param_dict:
            self.state_dict()[i].copy_(param_dict[i])
        print("Loading pretrained model for finetuning from {}".format(model_path))


def clip_base(**kwargs):
    model = ClipPCLReID()
    return model
```
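For reference, a minimal usage sketch of this backbone (my addition, not from the original post): the input size matches `SIZE_TRAIN = [256, 128]` above, and a CUDA device is assumed because the module moves the CLIP weights to `"cuda"`.

```python
model = clip_base().cuda().eval()
with torch.no_grad():
    crops = torch.randn(4, 3, 256, 128, device="cuda")  # NCHW person crops
    feats = model(crops)
print(feats.shape)  # (4, 1280): 768-d CLS feature + 512-d projected feature
```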
Hello, I also encountered the same problem on the MSMT17 dataset. Have you found the cause?
@RikoLi Hi, my unsupervised modification works fine on the Market dataset, but I still cannot get MSMT17 to work well: the mAP usually converges at around 20-something and stops there. I would appreciate your help.
For MSMT17 unsupervised training, you can try the following configs (a sketch of the clustering step is given after this list):
- Use k-reciprocal jaccard distance based DBSCAN clustering (see the O2CAP implementation) with eps=0.7, k1=30, k2=6. Note: the clustering parameters are important to performance.
- Freeze the patch projection layer during training.
- Backbone pretrained weights (from TransReID-SSL): https://drive.google.com/file/d/18FL9JaJNlo15-UksalcJRXX-0dgo4Mz4/view?usp=sharing
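A minimal sketch of the suggested clustering step, under stated assumptions: it assumes a `compute_jaccard_distance(features, k1, k2)` helper like the one shipped in the ClusterContrast/O2CAP codebases (the exact import path may differ), and `min_samples=4` is an assumed value, not part of the suggestion above:

```python
from sklearn.cluster import DBSCAN
# Hypothetical import path; ClusterContrast provides an equivalent helper
# under clustercontrast/utils/faiss_rerank.py.
from utils.faiss_rerank import compute_jaccard_distance

def cluster_msmt17(features, eps=0.7, k1=30, k2=6):
    # k-reciprocal jaccard distance matrix (N x N) over all training features
    dist = compute_jaccard_distance(features, k1=k1, k2=k2)
    labels = DBSCAN(eps=eps, min_samples=4,
                    metric="precomputed").fit_predict(dist)
    return labels  # -1 denotes un-clustered outliers
```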
But isn't this pretrained backbone CLIP? Why would we use TransReID-SSL weights?
@RikoLi
Sorry, I mixed that up with a reply for another open-source repo 😂. Thanks for the reminder, I will correct it!
The weights used here are indeed the pretrained weights of the CLIP image encoder.
As mentioned in #1, points 1 and 2 above matter a lot for the final performance; for the unsupervised setting, the clustering parameters are especially important.
After the following changes, I managed to reproduce the MSMT17 result on O2CAP:
- change the optimizer to SGD
- normalize the output:
  out_feat = torch.nn.functional.normalize(out_feat)
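For anyone else reproducing this, a minimal sketch of those two changes (the SGD hyperparameters here are placeholders chosen for illustration, not confirmed settings):

```python
import torch

# 1. Switch the optimizer to SGD; lr/momentum/weight_decay are illustrative.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)

# 2. L2-normalize the concatenated feature at the end of forward():
out_feat = torch.nn.functional.normalize(out_feat, dim=1)
```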