aws/amazon-sagemaker-examples

[Bug Report] Error while creating a 'Table Summary' analysis with column names that include a dot in Data Wrangler

EunHyeokJung opened this issue · 1 comments

Link to the notebook
Not available

Describe the bug
While creating a Table Summary analysis in Data Wrangler for a CSV file, I got the following AnalysisException for the column named 'emp.var.rate':

OperatorCustomerError: AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name emp.var.rate cannot be resolved. Did you mean one of the following? [emp.var.rate, marital, age, campaign, contact].;

Full column names are:
[age,job,marital,education,default,housing,loan,contact,month,day_of_week,duration,campaign,pdays,previous,poutcome,emp.var.rate,cons.price.idx,cons.conf.idx,euribor3m,nr.employed,y]
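A possible workaround, not confirmed by the Data Wrangler team: every column that fails to resolve has a dot in its name, so renaming those headers before upload may avoid the error. The sketch below is a minimal example that rewrites only the header row of a CSV. The function names `sanitize_header` and `rewrite_csv_header` are hypothetical, and the choice of underscore as the replacement character is an assumption.

```python
import csv
import io


def sanitize_header(columns):
    """Replace dots in column names with underscores (assumed replacement)."""
    return [name.replace(".", "_") for name in columns]


def rewrite_csv_header(src, dst, delimiter=","):
    """Copy CSV rows from src to dst, sanitizing only the header row."""
    reader = csv.reader(src, delimiter=delimiter)
    writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
    header = next(reader)
    writer.writerow(sanitize_header(header))
    for row in reader:
        # Data rows are copied unchanged.
        writer.writerow(row)


if __name__ == "__main__":
    # Tiny in-memory sample; the values are illustrative, not taken from the real file.
    sample = io.StringIO("age,emp.var.rate,nr.employed,y\n56,1.1,5191.0,no\n")
    out = io.StringIO()
    rewrite_csv_header(sample, out)
    print(out.getvalue())
```

For a real file, open the source and destination with `open(path, newline="")` and pass the matching `delimiter`. Check that the renamed columns do not collide with existing names.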

To reproduce

  1. Create a new S3 bucket and upload the following example CSV file:

bank-additional-full.csv

This file (bank-additional-full.csv) is an example file.

  2. Run Data Wrangler and import the CSV file you uploaded.
  • SageMaker Studio (new) > Data > Data Wrangler > 'Run in Canvas' > 'Open in Canvas'
  • Import and prepare > Tabular
  • Select a data source: Amazon S3
  • Select 'yourexamplefile.csv', the file uploaded to the bucket.
  • Sampling method: Random
  • Sample size: 50000
  • Advanced > Multi-line-detection: Checked
  • Import
  3. Go to Analysis and create a new 'Data Quality And Insights Report'.
  • This step is only needed to reproduce the bug.
  4. Create a new 'Table Summary' analysis.
  • The AnalysisException occurs at this step.
[Image: Analysis_Exception_Image]

Logs
An error has occurred. See the error reason below.

OperatorCustomerError: AnalysisException: [UNRESOLVED_COLUMN.WITH_SUGGESTION] A column or function parameter with name emp.var.rate cannot be resolved. Did you mean one of the following? [emp.var.rate, marital, age, campaign, contact].; 'Aggregate [map(cast(count as string), cast(count(age#20798L) as string), cast(mean as string), cast(avg(age#20798L) as string), cast(stddev as string), cast(stddev_samp(cast(age#20798L as double)) as string), cast(min as string), cast(min(age#20798L) as string), cast(max as string), cast(max(age#20798L) as string)) AS age#34632, map(cast(count as string), cast(count(job#19891) as string), cast(mean as string), cast(avg(try_cast(job#19891 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(job#19891 as double)) as string), cast(min as string), cast(min(job#19891) as string), cast(max as string), cast(max(job#19891) as string)) AS job#34646, map(cast(count as string), cast(count(marital#19892) as string), cast(mean as string), cast(avg(try_cast(marital#19892 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(marital#19892 as double)) as string), cast(min as string), cast(min(marital#19892) as string), cast(max as string), cast(max(marital#19892) as string)) AS marital#34660, map(cast(count as string), cast(count(education#19893) as string), cast(mean as string), cast(avg(try_cast(education#19893 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(education#19893 as double)) as string), cast(min as string), cast(min(education#19893) as string), cast(max as string), cast(max(education#19893) as string)) AS education#34674, map(cast(count as string), cast(count(default#19894) as string), cast(mean as string), cast(avg(try_cast(default#19894 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(default#19894 as double)) as string), cast(min as string), cast(min(default#19894) as string), cast(max as string), cast(max(default#19894) as string)) AS 
default#34688, map(cast(count as string), cast(count(housing#19895) as string), cast(mean as string), cast(avg(try_cast(housing#19895 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(housing#19895 as double)) as string), cast(min as string), cast(min(housing#19895) as string), cast(max as string), cast(max(housing#19895) as string)) AS housing#34702, map(cast(count as string), cast(count(loan#19896) as string), cast(mean as string), cast(avg(try_cast(loan#19896 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(loan#19896 as double)) as string), cast(min as string), cast(min(loan#19896) as string), cast(max as string), cast(max(loan#19896) as string)) AS loan#34716, map(cast(count as string), cast(count(contact#19897) as string), cast(mean as string), cast(avg(try_cast(contact#19897 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(contact#19897 as double)) as string), cast(min as string), cast(min(contact#19897) as string), cast(max as string), cast(max(contact#19897) as string)) AS contact#34730, map(cast(count as string), cast(count(month#19898) as string), cast(mean as string), cast(avg(try_cast(month#19898 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(month#19898 as double)) as string), cast(min as string), cast(min(month#19898) as string), cast(max as string), cast(max(month#19898) as string)) AS month#34744, map(cast(count as string), cast(count(day_of_week#19899) as string), cast(mean as string), cast(avg(try_cast(day_of_week#19899 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(day_of_week#19899 as double)) as string), cast(min as string), cast(min(day_of_week#19899) as string), cast(max as string), cast(max(day_of_week#19899) as string)) AS day_of_week#34758, map(cast(count as string), cast(count(duration#20799L) as string), cast(mean as string), cast(avg(duration#20799L) as string), cast(stddev as string), 
cast(stddev_samp(cast(duration#20799L as double)) as string), cast(min as string), cast(min(duration#20799L) as string), cast(max as string), cast(max(duration#20799L) as string)) AS duration#34772, map(cast(count as string), cast(count(campaign#20800L) as string), cast(mean as string), cast(avg(campaign#20800L) as string), cast(stddev as string), cast(stddev_samp(cast(campaign#20800L as double)) as string), cast(min as string), cast(min(campaign#20800L) as string), cast(max as string), cast(max(campaign#20800L) as string)) AS campaign#34786, map(cast(count as string), cast(count(pdays#20801L) as string), cast(mean as string), cast(avg(pdays#20801L) as string), cast(stddev as string), cast(stddev_samp(cast(pdays#20801L as double)) as string), cast(min as string), cast(min(pdays#20801L) as string), cast(max as string), cast(max(pdays#20801L) as string)) AS pdays#34800, map(cast(count as string), cast(count(previous#20802L) as string), cast(mean as string), cast(avg(previous#20802L) as string), cast(stddev as string), cast(stddev_samp(cast(previous#20802L as double)) as string), cast(min as string), cast(min(previous#20802L) as string), cast(max as string), cast(max(previous#20802L) as string)) AS previous#34814, map(cast(count as string), cast(count(poutcome#19904) as string), cast(mean as string), cast(avg(try_cast(poutcome#19904 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(poutcome#19904 as double)) as string), cast(min as string), cast(min(poutcome#19904) as string), cast(max as string), cast(max(poutcome#19904) as string)) AS poutcome#34828, map(cast(count as string), cast(count('emp.var.rate) as string), cast(mean as string), cast(avg('emp.var.rate) as string), cast(stddev as string), cast(stddev_samp('emp.var.rate) as string), cast(min as string), cast(min('emp.var.rate) as string), cast(max as string), cast(max('emp.var.rate) as string)) AS emp.var.rate#34842, map(cast(count as string), cast(count('cons.price.idx) as string), 
cast(mean as string), cast(avg('cons.price.idx) as string), cast(stddev as string), cast(stddev_samp('cons.price.idx) as string), cast(min as string), cast(min('cons.price.idx) as string), cast(max as string), cast(max('cons.price.idx) as string)) AS cons.price.idx#34856, map(cast(count as string), cast(count('cons.conf.idx) as string), cast(mean as string), cast(avg('cons.conf.idx) as string), cast(stddev as string), cast(stddev_samp('cons.conf.idx) as string), cast(min as string), cast(min('cons.conf.idx) as string), cast(max as string), cast(max('cons.conf.idx) as string)) AS cons.conf.idx#34870, map(cast(count as string), cast(count(euribor3m#20806) as string), cast(mean as string), cast(avg(euribor3m#20806) as string), cast(stddev as string), cast(stddev_samp(euribor3m#20806) as string), cast(min as string), cast(min(euribor3m#20806) as string), cast(max as string), cast(max(euribor3m#20806) as string)) AS euribor3m#34884, map(cast(count as string), cast(count('nr.employed) as string), cast(mean as string), cast(avg('nr.employed) as string), cast(stddev as string), cast(stddev_samp('nr.employed) as string), cast(min as string), cast(min('nr.employed) as string), cast(max as string), cast(max('nr.employed) as string)) AS nr.employed#34898, map(cast(count as string), cast(count(y#19910) as string), cast(mean as string), cast(avg(try_cast(y#19910 as double)) as string), cast(stddev as string), cast(stddev_samp(try_cast(y#19910 as double)) as string), cast(min as string), cast(min(y#19910) as string), cast(max as string), cast(max(y#19910) as string)) AS y#34912] +- Repartition 16, true +- GlobalLimit 200000 +- LocalLimit 200000 +- Project [cast(age#19890 as bigint) AS age#20798L, job#19891, marital#19892, education#19893, default#19894, housing#19895, loan#19896, contact#19897, month#19898, day_of_week#19899, cast(duration#19900 as bigint) AS duration#20799L, cast(campaign#19901 as bigint) AS campaign#20800L, cast(pdays#19902 as bigint) AS pdays#20801L, 
cast(previous#19903 as bigint) AS previous#20802L, poutcome#19904, cast(emp.var.rate#19905 as double) AS emp.var.rate#20803, cast(cons.price.idx#19906 as double) AS cons.price.idx#20804, cast(cons.conf.idx#19907 as double) AS cons.conf.idx#20805, cast(euribor3m#19908 as double) AS euribor3m#20806, cast(nr.employed#19909 as double) AS nr.employed#20807, y#19910] +- Relation [age#19890,job#19891,marital#19892,education#19893,default#19894,housing#19895,loan#19896,contact#19897,month#19898,day_of_week#19899,duration#19900,campaign#19901,pdays#19902,previous#19903,poutcome#19904,emp.var.rate#19905,cons.price.idx#19906,cons.conf.idx#19907,euribor3m#19908,nr.employed#19909,y#19910] csv
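In the plan above, the four unresolved references ('emp.var.rate, 'cons.price.idx, 'cons.conf.idx, 'nr.employed) are exactly the column names containing a dot, while the Project node below them resolves the same names. This looks consistent with Spark parsing an unquoted dotted name as nested-field access. In Spark SQL, such names can be referenced by wrapping them in backticks. The sketch below is only an illustration for user-written Spark code. The helper names are hypothetical, and it is not known from this issue whether Data Wrangler offers a place to apply it.

```python
def quote_spark_identifier(name: str) -> str:
    """Wrap a column name in backticks so Spark SQL reads it as one identifier.

    A literal backtick inside the name is escaped by doubling it.
    """
    return "`" + name.replace("`", "``") + "`"


def columns_needing_quotes(columns):
    """Return the column names that contain a dot."""
    return [c for c in columns if "." in c]


if __name__ == "__main__":
    # Subset of the column names listed in this issue.
    columns = ["age", "emp.var.rate", "cons.price.idx",
               "cons.conf.idx", "euribor3m", "nr.employed", "y"]
    for name in columns_needing_quotes(columns):
        print(name, "->", quote_spark_identifier(name))
```

In PySpark code you control, the quoted form would be passed where a column name is expected, for example `df.select(quote_spark_identifier("emp.var.rate"))`.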

This should go to technical support instead.
My bad, sorry.