alteryx/open_source_demos

IndexError: list index out of range

akankshadash opened this issue · 7 comments

[image: screenshot of the code and the error traceback]

I am trying to run the same code as the open source demo, but it throws an error.

Please suggest how to resolve this.

@akankshadash This isn't really enough information to troubleshoot your problem. I assume you are attempting to run the predict-next-purchase notebook based on the code. This error appears to be coming from dask/pandas. What versions of the following libraries are you using?

  • pandas
  • dask

I do not get this error when I run the notebook.
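For reference, one way to report those versions is sketched below. It uses only the standard library (Python 3.8+), and the helper name installed_version is just an illustration, not something from the notebook.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string for a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the two libraries asked about above.
for name in ("pandas", "dask"):
    print(name, installed_version(name))
```

Running this in the same environment (or kernel) as the notebook should show the versions the notebook actually imports.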

@thehomebrewnerd Yes, I am working on the predict-next-purchase notebook.

pandas: 1.1.5
dask: 3.0

I'm still not sure what your exact issue might be, but I don't think 3.0 is a valid dask version. When I run pip install -r requirements.txt in my environment I get pandas version 1.1.5 and dask version 2022.1.1, and things still load fine for me. Can you double-check your dask version and try installing 2022.1.1?

Note, there are some other package version conflicts that appear when I install the requirements, and those could potentially cause other issues in this notebook.

We are currently in the process of updating all our demo notebooks to work with newer versions of libraries, namely pandas, Featuretools and EvalML. I just updated this particular notebook last week, but those changes have not yet been merged into main.

@thehomebrewnerd I have 8 GB of RAM. Could that also be contributing to the issue?

@akankshadash I don't believe memory issues would cause this particular error. Can you load other data into a dask dataframe with the same type of dd.read_csv command, or is the error specific to this dataset?

If you cannot load other data, that likely points to a problem with your environment.

Oh, wait. I just spotted something in your code that is the source of the problem. The problem is that you are specifying a wildcard in your filename, pointing to a file that does not exist. You do not need the -* included in your filenames. None of the files have a - at the end, and the wildcard is not needed in this case, since you are just reading single files.

If you change your read commands to this, I think you should be fine:

import os
import dask.dataframe as dd

# data_dir and blocksize are assumed to be defined earlier in the notebook
order_products = dd.concat([dd.read_csv(os.path.join(data_dir, "order_products__prior.csv"), blocksize=blocksize),
                            dd.read_csv(os.path.join(data_dir, "order_products__train.csv"), blocksize=blocksize)])
orders = dd.read_csv(os.path.join(data_dir, "orders.csv"), blocksize=blocksize)
departments = dd.read_csv(os.path.join(data_dir, "departments.csv"), blocksize=blocksize)
products = dd.read_csv(os.path.join(data_dir, "products.csv"), blocksize=blocksize)
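If it helps to confirm this before calling dd.read_csv, here is a minimal sketch that checks how many files a pattern actually matches. It uses only the standard library; the helper names matching_files and report are hypothetical, and a pattern such as "orders-*.csv" is only my guess at what a wildcard filename might look like.

```python
import glob
import os


def matching_files(data_dir, pattern):
    """Return the sorted list of paths in data_dir that match a glob pattern."""
    return sorted(glob.glob(os.path.join(data_dir, pattern)))


def report(data_dir, patterns):
    """Print how many files each pattern matches so empty matches stand out."""
    for pattern in patterns:
        found = matching_files(data_dir, pattern)
        print(f"{pattern!r}: {len(found)} file(s)")
```

For example, calling report(data_dir, ["orders-*.csv", "orders.csv"]) would show zero matches for the wildcard pattern when no file has a - in its name. A pattern that matches nothing is the likely trigger for the error here.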

I added the "-*" to my file names after going through various suggestions, and it resolved a previous error. But although that old error was solved, I landed on this new one.