Lightning-AI/litgpt

Specify cache for huggingface openwebtext download

srivassid opened this issue · 1 comment

Hello

I was trying to pretrain TinyLlama on the OpenWebText data, following https://github.com/Lightning-AI/litgpt/blob/main/tutorials/0_to_litgpt.md#pretrain-llms, and I was wondering how I can specify the cache directory for the download.

I tried modifying the script litgpt/litgpt/data/openwebtext.py, but it had no effect.

Thanks

Apparently setting these environment variables before running the download did it:

export HF_HOME="/path/.cache/huggingface"
export HF_DATASETS_CACHE="/path/.cache/huggingface/datasets"
export TRANSFORMERS_CACHE="/path/.cache/huggingface/models"
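For anyone who wants to avoid repeating the path, here is a minimal sketch of the same idea that derives the datasets cache from a single root. `CACHE_ROOT` is a hypothetical helper variable, not something litgpt or Hugging Face reads, and the default path is only a placeholder. As far as I can tell, the `datasets` library already falls back to `$HF_HOME/datasets` when `HF_DATASETS_CACHE` is unset, so setting it explicitly here is mostly for clarity.

```shell
# CACHE_ROOT is a hypothetical helper variable used only in this sketch;
# point it at a directory on a disk with enough free space.
CACHE_ROOT="${CACHE_ROOT:-$HOME/.cache/huggingface}"

# Root directory for Hugging Face caches.
export HF_HOME="$CACHE_ROOT"
# Location used by the datasets library for downloaded and processed data.
export HF_DATASETS_CACHE="$HF_HOME/datasets"

# Print the values so they can be checked before starting the download.
echo "HF_HOME=$HF_HOME"
echo "HF_DATASETS_CACHE=$HF_DATASETS_CACHE"
```

The variables need to be exported in the same shell session that launches the pretraining command, since they are read when the Python process starts.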

Thanks