microsoft/Olive

data_config: data_name is set, but Olive always reports it as None

Elizabeth819 opened this issue · 3 comments

Describe the bug
  File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/olive/data/component/load_dataset.py", line 33, in huggingface_dataset
    assert data_name is not None, "Please specify the data name"
AssertionError: Please specify the data name
Even after I deleted the assert in the source code, it still does not work.

Olive config
"data_configs": [
    {
        "name": "dataset_default_train",
        "type": "HuggingfaceContainer",
        "params_config": {
            "data_name": "json",
            "data_files": "../datasets/datasets.json",
            "split": "train",
            "component_kwargs": {
                "pre_process_data": {
                    "dataset_type": "corpus",
                    "text_cols": [
                        "INSTRUCTION",
                        "RESPONSE",
                        "SOURCE"
                    ],
                    "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}\n( source : {SOURCE})<|end|>",
                    "corpus_strategy": "join",
                    "source_max_len": 2048,
                    "pad_to_max_len": false,
                    "use_attention_mask": false
                }
            }
        }
    }
],

Other information

  • OS: ubuntu 20.04
  • Olive version: 0.7.0

Same issue here.

The data config format has changed. Please use the following layout instead:

"data_configs": [
    {
        "name": "dataset_default_train",
        "type": "HuggingfaceContainer",
        "load_dataset_config": {
            "params": {
                "data_name": "json", 
                "data_files": "../datasets/datasets.json",
                "split": "train"
            }
        },
        "pre_process_data_config": {
            "params": {
                "text_cols": [
                    "INSTRUCTION",
                    "RESPONSE"
                ],
                "corpus_strategy": "join",
                "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>",
                "source_max_len": 2048,
                "pad_to_max_len": false,
                "use_attention_mask": false
            }
        }
    }
],

Note: please also reinstall Olive.
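For anyone with several old-style entries to update, the move from "params_config" to "load_dataset_config" and "pre_process_data_config" can be sketched in Python. This is only a rough illustration based on the two configs in this thread: `migrate_data_config` is a hypothetical helper and not part of Olive, the list of keys treated as load parameters is an assumption, and "dataset_type" is dropped by default only because the config above omits it.

```python
import copy
import json

# Assumption: these are the keys that belong under load_dataset_config.params,
# taken from the two configs shown in this thread.
LOAD_KEYS = ("data_name", "data_files", "split")


def migrate_data_config(old, drop_pre_process_keys=("dataset_type",)):
    """Rewrite one old-style data config entry into the newer layout.

    Hypothetical helper for illustration only; it is not an Olive API.
    """
    old = copy.deepcopy(old)  # never mutate the caller's dict
    params = old.get("params_config", {})
    pre_process = params.get("component_kwargs", {}).get("pre_process_data", {})

    new = {"name": old["name"], "type": old["type"]}

    # Dataset loading arguments move under load_dataset_config.params.
    load_params = {k: params[k] for k in LOAD_KEYS if k in params}
    if load_params:
        new["load_dataset_config"] = {"params": load_params}

    # Pre-processing arguments move under pre_process_data_config.params.
    pre_params = {
        k: v for k, v in pre_process.items() if k not in drop_pre_process_keys
    }
    if pre_params:
        new["pre_process_data_config"] = {"params": pre_params}

    return new


# A shortened version of the entry from the original report.
old_entry = {
    "name": "dataset_default_train",
    "type": "HuggingfaceContainer",
    "params_config": {
        "data_name": "json",
        "data_files": "../datasets/datasets.json",
        "split": "train",
        "component_kwargs": {
            "pre_process_data": {
                "dataset_type": "corpus",
                "text_cols": ["INSTRUCTION", "RESPONSE"],
                "corpus_strategy": "join",
                "source_max_len": 2048,
            }
        },
    },
}

new_entry = migrate_data_config(old_entry)
print(json.dumps(new_entry, indent=4))
```

The printed entry can be pasted into the "data_configs" list. Check the result against the documentation for your installed Olive version, since the accepted keys may differ between releases.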

Thank you so much Kinfey, the error was solved by your data_configs code!