amzn/pecos

Could not run 'aten::_foreach_norm.Scalar' with arguments from the 'SparseCUDA' backend

runningabcd opened this issue · 6 comments

Description

When I train an XTransformer model with PECOS, a training error occurs:

Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForXMC: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']

  • This IS expected if you are initializing BertForXMC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForXMC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
/usr/local/lib/python3.10/dist-packages/transformers/optimization.py:407: FutureWarning: This implementation of AdamW is deprecated and will be removed in a future version. Use the PyTorch implementation torch.optim.AdamW instead, or set no_deprecation_warning=True to disable this warning
  warnings.warn(
Constructed training corpus len=679174, training label matrix with shape=(679174, 679174) and nnz=1429299
Constructed training feature matrix with shape=(679174, 1134376) and nnz=1195014
training start >>>>>>>>
Some weights of the model checkpoint at bert-base-cased were not used when initializing BertForXMC: ['cls.seq_relationship.bias', 'cls.predictions.transform.LayerNorm.bias', 'cls.predictions.transform.dense.bias', 'cls.predictions.transform.dense.weight', 'cls.predictions.transform.LayerNorm.weight', 'cls.predictions.decoder.weight', 'cls.predictions.bias', 'cls.seq_relationship.weight']
  • This IS expected if you are initializing BertForXMC from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertForXMC from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Traceback (most recent call last):
  File "/home/Extreme_Label_Classification/tfidf/train.py", line 38, in <module>
    custom_xtf = XTransformer.train(prob)
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/model.py", line 447, in train
    res_dict = TransformerMatcher.train(
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/matcher.py", line 1382, in train
    matcher.fine_tune_encoder(prob, val_prob=val_prob, val_csr_codes=val_csr_codes)
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/matcher.py", line 1122, in fine_tune_encoder
    torch.nn.utils.clip_grad_norm_(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/utils/clip_grad.py", line 55, in clip_grad_norm_
    norms.extend(torch._foreach_norm(grads, norm_type))
NotImplementedError: Could not run 'aten::_foreach_norm.Scalar' with arguments from the 'SparseCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_foreach_norm.Scalar' is only available for these backends: [CPU, CUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

CPU: registered at aten/src/ATen/RegisterCPU.cpp:31034 [kernel]
CUDA: registered at aten/src/ATen/RegisterCUDA.cpp:43986 [kernel]
BackendSelect: fallthrough registered at ../aten/src/ATen/core/BackendSelectFallbackKernel.cpp:3 [backend fallback]
Python: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:144 [backend fallback]
FuncTorchDynamicLayerBackMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:491 [backend fallback]
Functionalize: registered at ../aten/src/ATen/FunctionalizeFallbackKernel.cpp:280 [backend fallback]
Named: registered at ../aten/src/ATen/core/NamedRegistrations.cpp:7 [backend fallback]
Conjugate: registered at ../aten/src/ATen/ConjugateFallback.cpp:17 [backend fallback]
Negative: registered at ../aten/src/ATen/native/NegateFallback.cpp:19 [backend fallback]
ZeroTensor: registered at ../aten/src/ATen/ZeroTensorFallback.cpp:86 [backend fallback]
ADInplaceOrView: fallthrough registered at ../aten/src/ATen/core/VariableFallbackKernel.cpp:63 [backend fallback]
AutogradOther: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradCPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradCUDA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradHIP: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradXLA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradMPS: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradIPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradXPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradHPU: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradVE: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradLazy: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradMeta: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradMTIA: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradPrivateUse1: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradPrivateUse2: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradPrivateUse3: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
AutogradNestedTensor: registered at ../torch/csrc/autograd/generated/VariableType_2.cpp:17472 [autograd kernel]
Tracer: registered at ../torch/csrc/autograd/generated/TraceType_2.cpp:16726 [kernel]
AutocastCPU: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:487 [backend fallback]
AutocastCUDA: fallthrough registered at ../aten/src/ATen/autocast_mode.cpp:354 [backend fallback]
FuncTorchBatched: registered at ../aten/src/ATen/functorch/LegacyBatchingRegistrations.cpp:815 [backend fallback]
FuncTorchVmapMode: fallthrough registered at ../aten/src/ATen/functorch/VmapModeRegistrations.cpp:28 [backend fallback]
Batched: registered at ../aten/src/ATen/LegacyBatchingRegistrations.cpp:1073 [backend fallback]
VmapMode: fallthrough registered at ../aten/src/ATen/VmapModeRegistrations.cpp:33 [backend fallback]
FuncTorchGradWrapper: registered at ../aten/src/ATen/functorch/TensorWrapper.cpp:210 [backend fallback]
PythonTLSSnapshot: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:152 [backend fallback]
FuncTorchDynamicLayerFrontMode: registered at ../aten/src/ATen/functorch/DynamicLayer.cpp:487 [backend fallback]
PythonDispatcher: registered at ../aten/src/ATen/core/PythonFallbackKernel.cpp:148 [backend fallback]
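For context, the failure is not tied to the language of the training data: it occurs whenever a parameter ends up with a sparse CUDA gradient and torch 2.0's clip_grad_norm_ routes it through torch._foreach_norm. The toy sketch below (an illustration added here, not PECOS code; assumes torch 2.0.x and a CUDA device) should raise the same NotImplementedError:

import torch

# An Embedding created with sparse=True leaves a sparse CUDA gradient on its weight
# after backward(); torch 2.0's clip_grad_norm_ then dispatches that gradient to
# torch._foreach_norm, which has no SparseCUDA kernel.
emb = torch.nn.Embedding(10, 4, sparse=True).cuda()
emb(torch.tensor([1, 2, 3], device="cuda")).sum().backward()
print(emb.weight.grad.layout)  # torch.sparse_coo
torch.nn.utils.clip_grad_norm_(emb.parameters(), max_norm=1.0)
# Expected: NotImplementedError: Could not run 'aten::_foreach_norm.Scalar' with
# arguments from the 'SparseCUDA' backend.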

How to Reproduce?

The training data looks like this:
4400,1580,5174 教育培训机构.道口财富是一家教育培训机构,由清控控股旗下公司联合上海陆家嘴旗下公司发起设立,为学员提供财富管理课程和创业金融课程。
5156,1188,1459 场景营销平台.北京蜂巢天下信息技术有限公司项目团队组建于2014年,总部位于北京,是基于Beacon网络的场景营销平台。专注于为本地生活服务商户提供基于场景的优惠分发,为用户提供一键接入身边优惠内容。
5156,1459 定制品在线设计及管理平台.时代定制是一个定制品在线设计及业务管理平台,主要服务于印刷和设计类企业、网站、影楼、文印店。

Steps to reproduce

from pecos.utils.featurization.text.preprocess import Preprocessor
from pecos.xmc.xtransformer.model import XTransformer
from pecos.xmc.xtransformer.module import MLProblemWithText

import os

parsed_result = Preprocessor.load_data_from_file(
    "./training-data.txt",
    "./output-labels.txt",
)
Y = parsed_result["label_matrix"]
X_txt = parsed_result["corpus"]

print(f"Constructed training corpus len={len(X_txt)}, training label matrix with shape={Y.shape} and nnz={Y.nnz}")

vectorizer_config = {
    "type": "tfidf",
    "kwargs": {
        "base_vect_configs": [
            {
                "ngram_range": [1, 2],
                "max_df_ratio": 0.98,
                "analyzer": "word",
            },
        ],
    },
}

tfidf_model = Preprocessor.train(X_txt, vectorizer_config)
X_feat = tfidf_model.predict(X_txt)

print(f"Constructed training feature matrix with shape={X_feat.shape} and nnz={X_feat.nnz}")

prob = MLProblemWithText(X_txt, Y, X_feat=X_feat)
custom_xtf = XTransformer.train(prob)

custom_model_dir = "multi_labels_model_dir"
os.makedirs(custom_model_dir, exist_ok=True)

tfidf_model.save(f"{custom_model_dir}/tfidf_model")
custom_xtf.save(f"{custom_model_dir}/xrt_model")

# custom_xtf = XTransformer.load(f"{custom_model_dir}/xrt_model")
# tfidf_model = Preprocessor.load(f"{custom_model_dir}/tfidf_model")
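Once training succeeds, the commented-out load calls above would typically be paired with prediction along these lines (a sketch following the PECOS README examples; new_texts is a hypothetical list of query strings, and keyword names may differ between versions):

custom_xtf = XTransformer.load(f"{custom_model_dir}/xrt_model")
tfidf_model = Preprocessor.load(f"{custom_model_dir}/tfidf_model")

new_texts = ["an example query text"]               # hypothetical input strings
new_feat = tfidf_model.predict(new_texts)           # same tfidf featurization as training
P = custom_xtf.predict(new_texts, X_feat=new_feat)  # sparse matrix of label scores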

Error message or code output

Traceback (most recent call last):
  File "/home/Extreme_Label_Classification/tfidf/train.py", line 38, in <module>
    custom_xtf = XTransformer.train(prob)
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/model.py", line 447, in train
    res_dict = TransformerMatcher.train(
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/matcher.py", line 1382, in train
    matcher.fine_tune_encoder(prob, val_prob=val_prob, val_csr_codes=val_csr_codes)
  File "/usr/local/lib/python3.10/dist-packages/pecos/xmc/xtransformer/matcher.py", line 1122, in fine_tune_encoder
    torch.nn.utils.clip_grad_norm_(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/utils/clip_grad.py", line 55, in clip_grad_norm_
    norms.extend(torch._foreach_norm(grads, norm_type))
NotImplementedError: Could not run 'aten::_foreach_norm.Scalar' with arguments from the 'SparseCUDA' backend. This could be because the operator doesn't exist for this backend, or was omitted during the selective/custom build process (if using custom build). If you are a Facebook employee using PyTorch on mobile, please visit https://fburl.com/ptmfixes for possible resolutions. 'aten::_foreach_norm.Scalar' is only available for these backends: [CPU, CUDA, BackendSelect, Python, FuncTorchDynamicLayerBackMode, Functionalize, Named, Conjugate, Negative, ZeroTensor, ADInplaceOrView, AutogradOther, AutogradCPU, AutogradCUDA, AutogradHIP, AutogradXLA, AutogradMPS, AutogradIPU, AutogradXPU, AutogradHPU, AutogradVE, AutogradLazy, AutogradMeta, AutogradMTIA, AutogradPrivateUse1, AutogradPrivateUse2, AutogradPrivateUse3, AutogradNestedTensor, Tracer, AutocastCPU, AutocastCUDA, FuncTorchBatched, FuncTorchVmapMode, Batched, VmapMode, FuncTorchGradWrapper, PythonTLSSnapshot, FuncTorchDynamicLayerFrontMode, PythonDispatcher].

docker stats screenshot:

[Screenshot: docker stats, 2023-06-20 18:13:35]

Environment

  • Operating system: Ubuntu 22.04 (Docker)
  • Python version: 3.10
  • PECOS version: 1.0.0
  • GPU: NVIDIA driver 515.48.07, CUDA 11.7
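The installed torch version is not listed above, but it is the deciding factor for this bug; a generic snippet (not from the issue) to record it:

import torch

# Report the runtime pieces that matter here: the torch build, the CUDA toolkit it
# was compiled against, and whether a GPU is visible.
print("torch:", torch.__version__)
print("CUDA (compiled):", torch.version.cuda)
print("GPU available:", torch.cuda.is_available())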

Help

The training data and classification labels are both in Chinese, and training with X-Transformer fails. Does X-Transformer not support Chinese?

Do I need to change BERT to a RoBERTa or a Chinese BERT model?

Hi @runningabcd, this is a known issue with torch 2.0 and has been fixed in a PR; the fix will be included in the next release. Downgrading torch to below 2.0 is a temporary fix.
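For anyone applying the temporary fix, a small guard like this (an illustrative sketch, not part of PECOS) fails fast if a torch 2.x build is still installed:

import torch

# Abort before training if torch >= 2.0 is installed, where clip_grad_norm_'s
# foreach path lacks a SparseCUDA kernel (the error reported in this issue).
if int(torch.__version__.split(".")[0]) >= 2:
    raise RuntimeError(
        f"torch {torch.__version__} hits the SparseCUDA aten::_foreach_norm error; "
        "install a pre-2.0 build, e.g. pip install 'torch<2.0'"
    )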

Thank you very much, I will try it.

It works!
Thanks again.