replicate/replicate-python

Llama-2 model Training failed with input data

deepakkumar07-debug opened this issue · 4 comments

I'm fine-tuning the Llama 2 13B model with a JSONL file and the training fails. I've also tried the 7B model, and billing is enabled on my account.

import replicate

DESTINATION_MODEL_NAME = 'deepakkumar07-debug/llama-midjournery'
TRAINING_DATA_URL = 'https://sangli-training-dataset.s3.ap-south-1.amazonaws.com/midjourney_replicate_dataset.jsonl'

# Kick off a fine-tune of Llama 2 13B on the JSONL dataset above
training = replicate.trainings.create(
    version='meta/llama-2-13b:078d7a002387bd96d93b0302a4c03b3f15824b63104034bfa943c63a8f208c38',
    input={
        "train_data": TRAINING_DATA_URL,
    },
    destination=DESTINATION_MODEL_NAME,
)

print(training)
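For reference, the failure can also be inspected from the client instead of the web console. This is only a sketch: it assumes the Training object returned by replicate.trainings.get() exposes status, error, and logs fields the same way prediction objects do.

import time

import replicate

def wait_for_training(training_id, poll_seconds=30):
    # Poll the training until it leaves the in-progress states, then surface
    # the error message and the tail of the logs.
    # Assumption: Training exposes status/error/logs like predictions do.
    while True:
        t = replicate.trainings.get(training_id)
        print("status:", t.status)
        if t.status not in ("starting", "processing"):
            if t.error:
                print("error:", t.error)
            if t.logs:
                print("log tail:", t.logs[-2000:])
            return t
        time.sleep(poll_seconds)

# wait_for_training(training.id)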

From the Replicate console I'm getting the following error:

Downloading weights to models/llama-2-13b/model_artifacts/training_weights...
Downloading weights...
Downloading  https://weights.replicate.delivery/default/llama-2-13b/model-00001-of-00003.safetensors
Downloading  https://weights.replicate.delivery/default/llama-2-13b/model-00002-of-00003.safetensors
Downloading  https://weights.replicate.delivery/default/llama-2-13b/model-00003-of-00003.safetensors
Downloading  https://weights.replicate.delivery/default/llama-2-13b/config.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/generation_config.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/model.safetensors.index.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/special_tokens_map.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/tokenizer_config.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/tokenizer.json
Downloading  https://weights.replicate.delivery/default/llama-2-13b/tokenizer.model
[stdout]
models/llama-2-13b/model_artifacts/training_weights/model.safetensors.index.json took 0.563747s (59324 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/generation_config.json took 0.573989s (238 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/special_tokens_map.json took 0.581211s (707 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/tokenizer_config.json took 0.634014s (1175 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/config.json took 0.717872s (810 bytes/sec)
[stdout]
models/llama-2-13b/model_artifacts/training_weights/tokenizer.json took 0.831675s (2215729 bytes/sec)
[stdout]
Downloaded 500 kB bytes in 0.894s (559 kB/s)
[stdout]
Downloaded 6.2 GB bytes in 20.206s (306 MB/s)
[stdout]
Downloaded 9.9 GB bytes in 28.330s (350 MB/s)
[stdout]
Downloaded 9.9 GB bytes in 28.599s (348 MB/s)
Finished download in 32.95s
Local Output Dir: training_output
Number of GPUs: 8
Train.py Arguments:
['python3', '-m', 'torch.distributed.run', '--nnodes=1', '--nproc_per_node=8', 'llama_recipes/llama_finetuning.py', '--enable_fsdp', '--use_peft', '--model_name=models/llama-2-13b/model_artifacts/training_weights', '--pure_bf16', '--output_dir=training_output', '--pack_sequences=False', '--wrap_packed_sequences=False', '--chunk_size=2048', '--data_path=/tmp/tmpn81ez7iqcode_review_dataset.jsonl', '--num_epochs=1', '--batch_size_training=4', '--gradient_accumulation_steps=1', '--lr=0.0001', '--lora_rank=8', '--lora_alpha=16', '--lora_dropout=0.05', '--peft_method=lora', '--run_validation=True', '--num_validation_samples=50', '--validation_data_path=None', '--val_batch_size=1', '--validation_prompt=None', '--seed=42']
WARNING:__main__:
*****************************************
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.
*****************************************
Selecting observations 0 through -39 from data for training...
Traceback (most recent call last):
  File "/src/llama_recipes/llama_finetuning.py", line 366, in <module>
    fire.Fire(main)
  File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 475, in _Fire
    component, remaining_args = _CallAndUpdateTrace(
  File "/usr/local/lib/python3.11/site-packages/fire/core.py", line 691, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "/src/llama_recipes/llama_finetuning.py", line 119, in main
    dataset_train = get_preprocessed_dataset(
  File "/src/llama_recipes/utils/dataset_utils.py", line 43, in get_preprocessed_dataset
    return DATASET_PREPROC[dataset_config.dataset](
  File "/src/llama_recipes/ft_datasets/completion_dataset.py", line 102, in get_completion_dataset
    dataset = format_data(dataset, tokenizer, config)
  File "/src/llama_recipes/ft_datasets/completion_dataset.py", line 61, in format_data
    if "text" in dataset[0]:
  File "/usr/local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2803, in __getitem__
    return self._getitem(key)
  File "/usr/local/lib/python3.11/site-packages/datasets/arrow_dataset.py", line 2787, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/usr/local/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 583, in query_table
    _check_valid_index_key(key, size)
  File "/usr/local/lib/python3.11/site-packages/datasets/formatting/formatting.py", line 526, in _check_valid_index_key
    raise IndexError(f"Invalid key: {key} is out of bounds for size {size}")
IndexError: Invalid key: 0 is out of bounds for size 0
--> Running with torch dist debug set to detail
[The remaining ranks print the same "Selecting observations 0 through -39 from data for training..." message and fail with the same traceback, interleaved in the log.]
[2023-11-30 14:00:52,287] torch.distributed.elastic.multiprocessing.api: [ERROR] failed (exitcode: 1) local_rank: 0 (pid: 689) of binary: /usr/local/bin/python3
Traceback (most recent call last):
File "<frozen runpy>", line 198, in _run_module_as_main
File "<frozen runpy>", line 88, in _run_code
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 810, in <module>
main()
File "/usr/local/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
return f(*args, **kwargs)
^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 806, in main
run(args)
File "/usr/local/lib/python3.11/site-packages/torch/distributed/run.py", line 797, in run
elastic_launch(
File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
return launch_agent(self._config, self._entrypoint, list(args))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
llama_recipes/llama_finetuning.py FAILED
------------------------------------------------------------
Failures:
[1]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 1 (local_rank: 1)
exitcode  : 1 (pid: 690)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[2]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 2 (local_rank: 2)
exitcode  : 1 (pid: 691)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[3]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 3 (local_rank: 3)
exitcode  : 1 (pid: 692)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[4]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 4 (local_rank: 4)
exitcode  : 1 (pid: 693)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[5]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 5 (local_rank: 5)
exitcode  : 1 (pid: 694)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[6]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 6 (local_rank: 6)
exitcode  : 1 (pid: 695)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
[7]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 7 (local_rank: 7)
exitcode  : 1 (pid: 696)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
time      : 2023-11-30_14:00:52
host      : model-train-078d7a00-f07637437b65ecd8-gpu-8x-a-5655986495-c67lx
rank      : 0 (local_rank: 0)
exitcode  : 1 (pid: 689)
error_file: <N/A>
traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================
Traceback (most recent call last):
File "/usr/local/lib/python3.11/site-packages/cog/server/worker.py", line 217, in _predict
result = predict(**payload)
^^^^^^^^^^^^^^^^^^
File "/src/train.py", line 230, in train
raise Exception(
Exception: Training failed with exit code 1! Check logs for details

@deepakkumar07-debug you are trying to access an element from a dataset that has no elements (size 0). This can happen for various reasons, such as empty data, incorrect pre-processing, or configuration issues.
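A quick local sanity check of the JSONL before launching a run can rule out the "empty data" case. This is only a sketch: PATH points at a local copy of the training file, and REQUIRED_KEYS is an assumption about the field names the trainer expects (the traceback above probes for a "text" key), so adjust it to your dataset format.

import json

PATH = "midjourney_replicate_dataset.jsonl"  # local copy of the training file
REQUIRED_KEYS = ({"text"}, {"prompt", "completion"})  # assumed accepted formats

rows = []
with open(PATH) as f:
    for lineno, line in enumerate(f, 1):
        line = line.strip()
        if not line:
            continue  # blank lines silently shrink the dataset
        try:
            rows.append(json.loads(line))
        except json.JSONDecodeError as exc:
            print(f"line {lineno}: invalid JSON ({exc})")

print(f"parsed {len(rows)} records")
for i, row in enumerate(rows, 1):
    if not any(keys <= row.keys() for keys in REQUIRED_KEYS):
        print(f"record {i}: unexpected keys {sorted(row.keys())}")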

The dataset has enough data, say 20 objects. What do you mean by incorrect pre-processing or configuration issues?
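One thing worth checking against the log: the run used --num_validation_samples=50, and the "Selecting observations 0 through -39 from data for training..." message suggests the trainer slices the validation samples off the end of the data. If that reading is right (it is an assumption about the trainer's internals), a dataset with fewer than roughly 50 parsed records leaves an empty training split, which would match the IndexError. A rough sketch of the arithmetic:

# Assumption: train = data[0 : len(data) - num_validation_samples]
num_records = 20              # roughly the size mentioned above
num_validation_samples = 50   # value shown in the Train.py arguments in the log

end = num_records - num_validation_samples
print(end)                                   # -30
print(len(list(range(num_records))[0:end]))  # 0 -> empty training split
# The logged "0 through -39" would correspond to 11 parsed records (11 - 50 = -39),
# so passing a smaller num_validation_samples, or more data, may avoid the empty split.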