data_config: data_name is set, but Olive always reports it as None
Elizabeth819 opened this issue · 3 comments
Describe the bug
File "/anaconda/envs/azureml_py310_sdkv2/lib/python3.10/site-packages/olive/data/component/load_dataset.py", line 33, in huggingface_dataset
assert data_name is not None, "Please specify the data name"
AssertionError: Please specify the data name
Even after deleting the assert from the source code, it still fails.
Olive config
"data_configs": [
  {
    "name": "dataset_default_train",
    "type": "HuggingfaceContainer",
    "params_config": {
      "data_name": "json",
      "data_files": "../datasets/datasets.json",
      "split": "train",
      "component_kwargs": {
        "pre_process_data": {
          "dataset_type": "corpus",
          "text_cols": [
            "INSTRUCTION",
            "RESPONSE",
            "SOURCE"
          ],
          "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}\n( source : {SOURCE})<|end|>",
          "corpus_strategy": "join",
          "source_max_len": 2048,
          "pad_to_max_len": false,
          "use_attention_mask": false
        }
      }
    }
  }
],
Other information
- OS: Ubuntu 20.04
- Olive version: 0.7.0
Same issue here.
The data config format has changed; please use this instead:
"data_configs": [
  {
    "name": "dataset_default_train",
    "type": "HuggingfaceContainer",
    "load_dataset_config": {
      "params": {
        "data_name": "json",
        "data_files": "../datasets/datasets.json",
        "split": "train"
      }
    },
    "pre_process_data_config": {
      "params": {
        "text_cols": [
          "INSTRUCTION",
          "RESPONSE"
        ],
        "corpus_strategy": "join",
        "text_template": "<|user|>\n{INSTRUCTION}<|end|>\n<|assistant|>\n{RESPONSE}<|end|>",
        "source_max_len": 2048,
        "pad_to_max_len": false,
        "use_attention_mask": false
      }
    }
  }
],
Note: please reinstall Olive.
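The fix in the reply amounts to moving keys: the loader arguments (`data_name`, `data_files`, `split`) move from `params_config` into `load_dataset_config.params`, and the contents of `component_kwargs.pre_process_data` move into `pre_process_data_config.params`. The sketch below performs that rewrite mechanically on one data config entry. `migrate_data_config` is a hypothetical helper, not part of Olive; the mapping is inferred only from the two configs in this thread, and dropping `dataset_type` simply mirrors the reply, which omits it. It leaves `text_cols` and `text_template` untouched, so check the result against the data config documentation for your installed Olive version.

```python
import copy
import json


def migrate_data_config(old: dict) -> dict:
    """Hypothetical helper (not part of Olive): rewrite one data config entry
    from the old ``params_config`` layout to the ``*_config.params`` layout
    used in the reply. The mapping is inferred from this thread only."""
    new = copy.deepcopy(old)  # never mutate the caller's dict
    params = new.pop("params_config", None)
    if params is None:
        return new  # already in the new layout (or has no params at all)

    components = params.pop("component_kwargs", {})
    pre_process = components.pop("pre_process_data", None)
    if components:
        # Only pre_process_data appears in this issue; refuse to guess the rest.
        raise ValueError(f"unhandled component_kwargs: {sorted(components)}")

    # Whatever remains (data_name, data_files, split, ...) feeds the loader.
    new["load_dataset_config"] = {"params": params}

    if pre_process is not None:
        # Assumption: the reply's config omits dataset_type, so drop it too.
        pre_process.pop("dataset_type", None)
        new["pre_process_data_config"] = {"params": pre_process}
    return new


# Usage: a trimmed copy of the reporter's original entry.
old_entry = {
    "name": "dataset_default_train",
    "type": "HuggingfaceContainer",
    "params_config": {
        "data_name": "json",
        "data_files": "../datasets/datasets.json",
        "split": "train",
        "component_kwargs": {
            "pre_process_data": {
                "dataset_type": "corpus",
                "text_cols": ["INSTRUCTION", "RESPONSE", "SOURCE"],
                "corpus_strategy": "join",
                "source_max_len": 2048,
            }
        },
    },
}

new_entry = migrate_data_config(old_entry)
print(json.dumps(new_entry, indent=2))
```

Raising on unknown `component_kwargs` entries is a deliberate choice: silently dropping a post-processing or dataloader setting would produce a config that loads but behaves differently.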
Thank you so much, Kinfey! Your data_configs fixed the error.