Some weights of BertLMHeadModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized
Closed this issue · 1 comments
commend I run
!python anygpt/src/infer/cli_infer_base_model.py
--model-name-or-path AnyGPT-base
--image-tokenizer-path models/seed-tokenizer-2/seed_quantizer.pt
--speech-tokenizer-path models/speechtokenizer/ckpt.dev
--speech-tokenizer-config models/speechtokenizer/config.json
--soundstorm-path models/soundstorm/speechtokenizer_soundstorm_mls.pt
--output-dir "infer_output/base"
Below is the error
NeMo-text-processing :: INFO :: Creating ClassifyFst grammars.
Using device: cuda
loading image tokenzier
/home//.cache/torch/hub/checkpoints/eva_vit_g.pth
INFO:root:freeze vision encoder
Some weights of BertLMHeadModel were not initialized from the model checkpoint at bert-base-uncased and are newly initialized: ['bert.encoder.layer.11.output_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.value.weight', 'bert.encoder.layer.5.output_query.LayerNorm.bias', 'bert.encoder.layer.8.output_query.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.self.query.weight', 'bert.encoder.layer.10.crossattention.output.dense.bias', 'bert.encoder.layer.5.output_query.dense.weight', 'bert.encoder.layer.2.output_query.LayerNorm.weight', 'bert.encoder.layer.7.output_query.LayerNorm.bias', 'bert.encoder.layer.7.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.LayerNorm.bias', 'bert.encoder.layer.11.output_query.dense.bias', 'bert.encoder.layer.1.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.dense.bias', 'bert.encoder.layer.9.intermediate_query.dense.bias', 'bert.encoder.layer.11.intermediate_query.dense.weight', 'bert.encoder.layer.6.crossattention.output.dense.weight', 'bert.encoder.layer.3.output_query.LayerNorm.bias', 'bert.encoder.layer.8.crossattention.self.key.weight', 'bert.encoder.layer.0.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.2.output_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.key.bias', 'bert.encoder.layer.6.crossattention.self.query.weight', 'bert.encoder.layer.8.crossattention.self.value.weight', 'bert.encoder.layer.8.crossattention.output.dense.weight', 'bert.encoder.layer.8.crossattention.output.dense.bias', 'bert.encoder.layer.10.output_query.LayerNorm.weight', 'bert.encoder.layer.10.output_query.dense.weight', 'bert.encoder.layer.6.crossattention.self.query.bias', 'bert.encoder.layer.6.output_query.LayerNorm.weight', 'bert.encoder.layer.6.crossattention.self.value.bias', 'bert.encoder.layer.2.crossattention.self.value.weight', 'bert.encoder.layer.8.intermediate_query.dense.weight', 'bert.encoder.layer.2.output_query.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.output.dense.bias', 'bert.encoder.layer.4.intermediate_query.dense.bias', 'bert.encoder.layer.10.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.1.intermediate_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.key.weight', 'bert.encoder.layer.2.crossattention.self.query.bias', 'bert.encoder.layer.7.intermediate_query.dense.weight', 'bert.encoder.layer.10.crossattention.self.query.weight', 'bert.encoder.layer.9.intermediate_query.dense.weight', 'bert.encoder.layer.6.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.9.output_query.LayerNorm.bias', 'bert.encoder.layer.3.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.self.query.weight', 'bert.encoder.layer.0.crossattention.self.value.bias', 'bert.encoder.layer.8.output_query.LayerNorm.bias', 'bert.encoder.layer.4.output_query.dense.bias', 'bert.encoder.layer.2.crossattention.self.key.bias', 'bert.encoder.layer.1.output_query.LayerNorm.bias', 'bert.encoder.layer.4.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.self.value.weight', 'bert.encoder.layer.4.crossattention.self.value.weight', 'bert.encoder.layer.0.output_query.LayerNorm.bias', 'bert.encoder.layer.9.output_query.LayerNorm.weight', 'bert.encoder.layer.4.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.0.crossattention.output.dense.weight', 'bert.encoder.layer.7.output_query.LayerNorm.weight', 'bert.encoder.layer.8.crossattention.self.key.bias', 'bert.encoder.layer.8.output_query.dense.bias', 'bert.encoder.layer.0.intermediate_query.dense.weight', 'bert.encoder.layer.2.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.output.dense.bias', 'bert.encoder.layer.0.crossattention.self.query.bias', 'bert.encoder.layer.3.output_query.LayerNorm.weight', 'bert.encoder.layer.6.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.10.crossattention.self.value.weight', 'bert.encoder.layer.2.crossattention.self.value.bias', 'bert.encoder.layer.11.output_query.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.self.key.weight', 'bert.encoder.layer.4.crossattention.self.key.bias', 'bert.encoder.layer.0.output_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.query.weight', 'bert.encoder.layer.6.crossattention.self.key.bias', 'bert.encoder.layer.5.intermediate_query.dense.weight', 'bert.encoder.layer.1.output_query.dense.weight', 'bert.encoder.layer.5.output_query.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.9.output_query.dense.weight', 'bert.encoder.layer.4.crossattention.self.query.bias', 'bert.encoder.layer.11.intermediate_query.dense.bias', 'bert.encoder.layer.6.output_query.dense.weight', 'bert.encoder.layer.5.output_query.dense.bias', 'bert.encoder.layer.6.intermediate_query.dense.weight', 'bert.encoder.layer.2.output_query.dense.bias', 'bert.encoder.layer.8.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.4.crossattention.self.value.bias', 'bert.encoder.layer.1.output_query.LayerNorm.weight', 'bert.encoder.layer.10.output_query.LayerNorm.bias', 'bert.encoder.layer.3.output_query.dense.weight', 'bert.encoder.layer.4.output_query.LayerNorm.weight', 'bert.encoder.layer.8.crossattention.self.value.bias', 'bert.encoder.layer.8.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.10.output_query.dense.bias', 'bert.encoder.layer.8.crossattention.self.query.bias', 'bert.encoder.layer.4.intermediate_query.dense.weight', 'bert.encoder.layer.0.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.crossattention.self.key.weight', 'bert.encoder.layer.10.crossattention.self.key.bias', 'bert.encoder.layer.0.output_query.dense.bias', 'bert.encoder.layer.4.output_query.LayerNorm.bias', 'bert.encoder.layer.3.output_query.dense.bias', 'bert.encoder.layer.7.output_query.dense.bias', 'bert.encoder.layer.3.intermediate_query.dense.bias', 'bert.encoder.layer.1.output_query.dense.bias', 'bert.encoder.layer.4.output_query.dense.weight', 'bert.encoder.layer.10.crossattention.output.dense.weight', 'bert.encoder.layer.8.crossattention.self.query.weight', 'bert.encoder.layer.10.crossattention.self.query.bias', 'bert.encoder.layer.9.output_query.dense.bias', 'bert.encoder.layer.4.crossattention.output.dense.bias', 'bert.encoder.layer.7.output_query.dense.weight', 'bert.encoder.layer.2.intermediate_query.dense.bias', 'bert.encoder.layer.10.crossattention.self.value.bias', 'bert.encoder.layer.10.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.0.output_query.LayerNorm.weight', 'bert.encoder.layer.5.intermediate_query.dense.bias', 'bert.encoder.layer.4.crossattention.output.dense.weight', 'bert.encoder.layer.8.output_query.dense.weight', 'bert.encoder.layer.6.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.output.dense.weight', 'bert.encoder.layer.10.intermediate_query.dense.weight', 'bert.encoder.layer.0.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.output.dense.bias', 'bert.encoder.layer.10.intermediate_query.dense.bias', 'bert.encoder.layer.8.intermediate_query.dense.bias', 'bert.encoder.layer.2.crossattention.self.key.weight', 'bert.encoder.layer.10.crossattention.self.key.weight', 'bert.encoder.layer.11.output_query.LayerNorm.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
missing keys: 511 unexpected keys: 146
oading music tokenizer
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
loading audio tokenizer
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
loading llm
No impact on model effectiveness