Datasets dir DNE
Closed this issue · 3 comments
tristenwentling commented
Running cell 3 of the ecomm_csv_to_parquet.ipynb notebook:
# read and process initial dataset
ecomm_df = (
    spark.read.format("csv")
    .option("header", True)
    .schema(schema)
    .load(f"{dataset_dir}/{datasets[0]}")
)
gives the following error:
AnalysisException: Path does not exist: file:/opt/spark/work-dir/hitchhikers_guide/datasets/ecomm_behavior_data/2019-Oct-sm.csv
Looking at the parent directory, I only see:
os.listdir('/opt/spark/work-dir/hitchhikers_guide')
['first-steps', 'pre-processing', 'when-things-go-bump-in-the-night']
I think fetching the dataset may not be included in the docker compose setup.
I did see the dataset in the repo; after copying it in manually, everything works fine.
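For anyone hitting the same error, a quick check before the Spark read can confirm whether the files are actually visible inside the container. This is only a minimal sketch: `missing_datasets` is a hypothetical helper, not something from the repo, and the directory and file name are copied from the error message above.

```python
from pathlib import Path


def missing_datasets(dataset_dir, datasets):
    """Return the expected dataset files that are not present under dataset_dir."""
    base = Path(dataset_dir)
    return [name for name in datasets if not (base / name).is_file()]


if __name__ == "__main__":
    # Directory and file name taken from the AnalysisException above;
    # adjust them to match the notebook's dataset_dir and datasets values.
    dataset_dir = "/opt/spark/work-dir/hitchhikers_guide/datasets/ecomm_behavior_data"
    datasets = ["2019-Oct-sm.csv"]

    missing = missing_datasets(dataset_dir, datasets)
    if missing:
        print(f"Missing under {dataset_dir}: {missing}")
    else:
        print("All dataset files are present.")
```

If this reports missing files, the problem is with the container's volume mount rather than with the notebook code.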
tristenwentling commented
Looked at the PR; this might be moot.
newfront commented
Yeah, I botched the mount location in the non-ARM docker-compose file. Just made the change.
newfront commented