Lightning-Universe/lightning-transformers

training language model on custom data fails

Closed this issue · 2 comments

My dataset trains with no issues using the Hugging Face Trainer. I have no experience with Hydra, so I have a feeling I'm missing something basic.

! python train.py \
    task=nlp/language_modeling dataset.cfg.train_file = "/content/gdrive/MyDrive/nlp-chart/train charts.csv" \
    dataset.cfg.validation_file = "/content/gdrive/MyDrive/nlp-chart/test charts.csv"

traceback
2021-04-24 11:44:44.333439: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.11.0
Traceback (most recent call last):
  File "train.py", line 88, in <module>
    hydra_entry()
  File "/usr/local/lib/python3.7/dist-packages/hydra/main.py", line 33, in decorated_main
    config_name=config_name,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 370, in _run_hydra
    lambda: hydra.run(
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 214, in run_and_report
    raise ex
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 211, in run_and_report
    return func()
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/utils.py", line 373, in <lambda>
    overrides=args.overrides,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 90, in run
    run_mode=RunMode.RUN,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/hydra.py", line 524, in compose_config
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 149, in load_configuration
    from_shell=from_shell,
  File "/usr/local/lib/python3.7/dist-packages/hydra/_internal/config_loader_impl.py", line 223, in _load_configuration_impl
    parsed_overrides = parser.parse_overrides(overrides=overrides)
  File "/usr/local/lib/python3.7/dist-packages/hydra/core/override_parser/overrides_parser.py", line 100, in parse_overrides
    ) from e.__cause__
hydra.errors.OverrideParseException: Error parsing override 'dataset.cfg.train_file'
missing EQUAL at '<EOF>'
See https://hydra.cc/docs/next/advanced/override_grammar/basic for details

Hi @enpassanty! Thanks for checking out the repo. Make sure there are no spaces on either side of the equals sign:

! python train.py \
    task=nlp/language_modeling dataset.cfg.train_file="/content/gdrive/MyDrive/nlp-chart/train charts.csv" \
    dataset.cfg.validation_file="/content/gdrive/MyDrive/nlp-chart/test charts.csv"

Could you try this and let me know if it works?
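
For anyone hitting the same error, here is a rough sketch of why the spaces matter. The shell splits the command line on unquoted whitespace before Hydra sees it, so `key = "value"` arrives as three separate arguments and the first one, `dataset.cfg.train_file`, has no `=` in it. The snippet uses Python's standard `shlex` module to approximate that splitting. Note that `find_malformed_overrides` is a hypothetical helper written for this illustration, not part of Hydra, and its check (each argument must look like `key=value`) is a deliberate simplification of Hydra's real override grammar.

```python
import shlex
from typing import List

BROKEN = (
    "python train.py task=nlp/language_modeling "
    'dataset.cfg.train_file = "/content/gdrive/MyDrive/nlp-chart/train charts.csv"'
)
FIXED = (
    "python train.py task=nlp/language_modeling "
    'dataset.cfg.train_file="/content/gdrive/MyDrive/nlp-chart/train charts.csv"'
)


def find_malformed_overrides(command: str) -> List[str]:
    """Return the arguments that do not look like key=value.

    Hypothetical helper for illustration only. It assumes the first two
    tokens are the interpreter and the script, and it ignores the other
    forms Hydra's grammar accepts.
    """
    # shlex.split approximates POSIX shell word splitting and quote removal.
    tokens = shlex.split(command)
    overrides = tokens[2:]
    # Flag any argument with no '=' or with nothing before the '='.
    return [t for t in overrides if "=" not in t or t.startswith("=")]


if __name__ == "__main__":
    print(find_malformed_overrides(BROKEN))
    print(find_malformed_overrides(FIXED))
```

With the spaces, the key, the bare `=`, and the path each become their own argument and all three are flagged. Without them, the quoted path stays attached to its key as a single argument and nothing is flagged.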

That fixed the problem. Thanks!