hiyouga/LLaMA-Factory

Bug in the Google Colab notebook code

Closed this issue · 7 comments

Reminder

  • I have read the README and searched the existing issues.

Reproduction

Fine-tune model via Command Line
I didn't change anything; I just ran it as is.


/content/LLaMA-Factory
2024-05-18 12:35:28.953301: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-05-18 12:35:28.953353: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-05-18 12:35:28.954649: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-05-18 12:35:30.175448: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
05/18/2024 12:35:33 - WARNING - llamafactory.hparams.parser - We recommend enable `upcast_layernorm` in quantized training.
05/18/2024 12:35:33 - INFO - llamafactory.hparams.parser - Process rank: 0, device: cuda:0, n_gpu: 1, distributed training: False, compute dtype: torch.float16
/usr/local/lib/python3.10/dist-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
  warnings.warn(
tokenizer_config.json: 100% 51.1k/51.1k [00:00<00:00, 66.3MB/s]
tokenizer.json: 100% 9.09M/9.09M [00:01<00:00, 6.23MB/s]
special_tokens_map.json: 100% 459/459 [00:00<00:00, 2.69MB/s]
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file tokenizer.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/tokenizer.json
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file added_tokens.json from cache at None
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file special_tokens_map.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/special_tokens_map.json
[INFO|tokenization_utils_base.py:2087] 2024-05-18 12:35:37,198 >> loading file tokenizer_config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/tokenizer_config.json
[WARNING|logging.py:314] 2024-05-18 12:35:37,601 >> Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
05/18/2024 12:35:37 - INFO - llamafactory.data.template - Replace eos token: <|eot_id|>
05/18/2024 12:35:37 - INFO - llamafactory.data.loader - Loading dataset identity.json...
Generating train split: 91 examples [00:00, 5377.99 examples/s]
Converting format of dataset: 100% 91/91 [00:00<00:00, 7138.91 examples/s]
05/18/2024 12:35:38 - INFO - llamafactory.data.loader - Loading dataset llamafactory/alpaca_gpt4_en...
Downloading readme: 100% 373/373 [00:00<00:00, 2.43MB/s]
Downloading data: 100% 43.3M/43.3M [00:00<00:00, 51.9MB/s]
Generating train split: 51983 examples [00:01, 32182.19 examples/s]
Converting format of dataset: 100% 500/500 [00:00<00:00, 42058.28 examples/s]
Running tokenizer on dataset: 100% 591/591 [00:00<00:00, 1637.02 examples/s]
input_ids:
[128000, 128006, 9125, 128007, 271, 2675, 527, 264, 11190, 18328, 13, 128009, 128006, 882, 128007, 271, 6151, 128009, 128006, 78191, 128007, 271, 9906, 0, 358, 1097, 445, 81101, 12, 18, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
inputs:
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful assistant.<|eot_id|><|start_header_id|>user<|end_header_id|>

hi<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Hello! I am Llama-3, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
label_ids:
[-100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, -100, 9906, 0, 358, 1097, 445, 81101, 12, 18, 11, 459, 15592, 18328, 8040, 555, 445, 8921, 4940, 17367, 13, 2650, 649, 358, 7945, 499, 3432, 30, 128009]
labels:
Hello! I am Llama-3, an AI assistant developed by LLaMA Factory. How can I assist you today?<|eot_id|>
config.json: 100% 1.15k/1.15k [00:00<00:00, 6.70MB/s]
[INFO|configuration_utils.py:726] 2024-05-18 12:35:49,628 >> loading configuration file config.json from cache at /root/.cache/huggingface/hub/models--unsloth--llama-3-8b-Instruct-bnb-4bit/snapshots/2950abc9d0b34ddd43fd52bbf0d7dca82807ce96/config.json
[INFO|configuration_utils.py:789] 2024-05-18 12:35:49,629 >> Model config LlamaConfig {
  "_name_or_path": "unsloth/llama-3-8b-Instruct-bnb-4bit",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 128000,
  "eos_token_id": 128009,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 14336,
  "max_position_embeddings": 8192,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 8,
  "pretraining_tp": 1,
  "quantization_config": {
    "_load_in_4bit": true,
    "_load_in_8bit": false,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": true,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 500000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.40.2",
  "use_cache": true,
  "vocab_size": 128256
}

05/18/2024 12:35:49 - INFO - llamafactory.model.utils.quantization - Loading ?-bit BITSANDBYTES-quantized model.
🦥 Unsloth: Will patch your computer to enable 2x faster free finetuning.
Traceback (most recent call last):
  File "/usr/local/bin/llamafactory-cli", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/llamafactory/cli.py", line 65, in main
    run_exp()
  File "/usr/local/lib/python3.10/dist-packages/llamafactory/train/tuner.py", line 34, in run_exp
    run_sft(model_args, data_args, training_args, finetuning_args, generating_args, callbacks)
  File "/usr/local/lib/python3.10/dist-packages/llamafactory/train/sft/workflow.py", line 34, in run_sft
    model = load_model(tokenizer, model_args, finetuning_args, training_args.do_train)
  File "/usr/local/lib/python3.10/dist-packages/llamafactory/model/loader.py", line 124, in load_model
    model = load_unsloth_pretrained_model(config, model_args)
  File "/usr/local/lib/python3.10/dist-packages/llamafactory/model/utils/unsloth.py", line 39, in load_unsloth_pretrained_model
    from unsloth import FastLanguageModel
  File "/usr/local/lib/python3.10/dist-packages/unsloth/__init__.py", line 113, in <module>
    from .models import *
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/__init__.py", line 15, in <module>
    from .loader  import FastLanguageModel
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/loader.py", line 15, in <module>
    from .llama import FastLlamaModel, logger
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/llama.py", line 27, in <module>
    from ._utils import *
  File "/usr/local/lib/python3.10/dist-packages/unsloth/models/_utils.py", line 60, in <module>
    import xformers.ops.fmha as xformers
ModuleNotFoundError: No module named 'xformers'


Expected behavior

No response

System Info

No response

Others

No response

Hi, I ran into this issue as well. At first I thought it was something local on my end, but it seems you hit the same error. My best guess as to why it happens is this line

!pip install --no-deps xformers<0.0.26

since the install logs show this:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
spacy 3.7.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.
weasel 0.3.4 requires typer<0.10.0,>=0.3.0, but you have typer 0.12.3 which is incompatible.

At least that's what happens with the command-line section in Colab; the LLaMA Board one works fine.
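One thing worth checking, though I haven't confirmed it is the root cause: when that line runs in a shell cell, an unquoted < is treated as input redirection, so pip may never see the version specifier (or the command may fail outright) and xformers ends up not installed at all. A minimal sketch of the quoted form, assuming the cell is executed through the shell with !:

# quote the specifier so the shell does not treat "<" as a redirect
!pip install --no-deps "xformers<0.0.26"

# quick sanity check that the module is importable afterwards
!python -c "import xformers; print(xformers.__version__)"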

I think it's a conflict with another package. I tried !pip install xformers, and I get this:

Traceback (most recent call last):
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\krafi\AppData\Local\Programs\Python\Python39\Scripts\llamafactory-cli.exe\__main__.py", line 4, in <module>
ModuleNotFoundError: No module named 'llmtuner'
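That second traceback looks unrelated to xformers: the llamafactory-cli.exe script on that Windows machine still imports llmtuner, the project's former package name, which suggests a stale console script left over from an earlier install. A hedged guess at a cleanup, assuming you are fine reinstalling from the current source tree:

# remove old installs whose entry points still reference the former "llmtuner" name
pip uninstall -y llmtuner llamafactory

# reinstall from current source so the llamafactory-cli entry point is regenerated
git clone https://github.com/hiyouga/LLaMA-Factory.git
cd LLaMA-Factory
pip install -e .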

I see that the line !pip install --no-deps xformers<0.0.26 already exists in the Colab notebook, but it still doesn't fix it for me, and I don't know what changed. Could someone please check? Just run all cells, close the GUI, and then run the command-line section.
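If reinstalling xformers still doesn't help, a possible workaround (a guess, not an official fix) is to disable the unsloth path so the import is never attempted; the traceback only reaches from unsloth import FastLanguageModel via load_unsloth_pretrained_model. The file name below is a placeholder for whatever config file the notebook writes, and use_unsloth is my assumption about the option that enables unsloth:

# hedged workaround: turn off unsloth so LLaMA-Factory never imports xformers.
# "train_llama3.json" is a placeholder config name; "use_unsloth" is assumed to be
# the option gating the unsloth code path.
!sed -i 's/"use_unsloth": true/"use_unsloth": false/' train_llama3.json
!llamafactory-cli train train_llama3.json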

I just tested this in a new browser with another Google account and it still didn't work. I ran this Colab last week and it worked at that time, so maybe something changed in the code on GitHub.

Yes, it works now. Check out my project for creating a custom private dataset with Llama 3 offline:
https://gitlab.com/krafi/tuna-asyncio-with-llama.git
Thanks a lot for your support.