Datasets dir DNE

Question

Closed this issue a year ago · 3 comments

Running cell 3 of the ecomm_csv_to_parquet.ipynb notebook:

# read and process initial dataset

ecomm_df = (
    spark.read.format("csv")
    .option("header", True)
    .schema(schema)
    .load(f"{dataset_dir}/{datasets[0]}")
)

gives the following error:
AnalysisException: Path does not exist: file:/opt/spark/work-dir/hitchhikers_guide/datasets/ecomm_behavior_data/2019-Oct-sm.csv
Listing the parent directory, I only see:

os.listdir('/opt/spark/work-dir/hitchhikers_guide')

['first-steps', 'pre-processing', 'when-things-go-bump-in-the-night']

I think fetching the dataset may not be included in the docker compose setup.
I did see the dataset in the repo, and after copying it in manually the cell works fine.
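
For anyone hitting the same error, a small pre-flight check before the spark.read call can make a missing mount obvious. This is only a sketch: the helper names missing_datasets and nearest_existing_parent are hypothetical and not part of the repo, and it assumes dataset_dir and datasets are the same variables the notebook already defines.

```python
from pathlib import Path


def missing_datasets(dataset_dir, datasets):
    """Return the dataset file names that are not present under dataset_dir.

    Hypothetical helper for illustration; not part of the repo.
    """
    base = Path(dataset_dir)
    return [name for name in datasets if not (base / name).is_file()]


def nearest_existing_parent(path):
    """Walk up from path until a directory that actually exists is found.

    Useful for seeing where the expected directory tree stops, which
    usually points at the volume that was not mounted.
    """
    current = Path(path)
    while not current.exists() and current != current.parent:
        current = current.parent
    return current


# Example use inside the notebook (dataset_dir and datasets come from
# the notebook itself, so this part is left commented out here):
#
# missing = missing_datasets(dataset_dir, datasets)
# if missing:
#     print("Missing files:", missing)
#     print("Tree stops at:", nearest_existing_parent(dataset_dir))
```

If the second helper returns the hitchhikers_guide directory rather than the datasets directory, the datasets folder was probably never mounted or copied into the container.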

Answer 1 · 2023-06-21T21:38:03.000Z

Looked at the PR; this might be moot.

Answer 2 · 2023-06-22T20:42:07.000Z

Yeah I botched the mount location in the non-arm docker-compose. Just made the change.

Answer 3 · 2023-06-22T20:43:09.000Z

newfront commented a year ago