Snowflake-Labs/Excelerator

One option missing in Excelerator For Snowflake

Opened this issue · 7 comments

Please advise why I don't have the last option -- "Auto-generate data types when not specified for new columns or a new table " Stored Procedures needed". Thank you!

@z05794 Due to some complications we removed this feature.
Is it something that would be important to you? The other way of loading new columns would be to explicitly set the datatype.
I'd like to get a understanding of the value it would add to help determine if we should work at enabling it.
Thanks!
Steve

Hi, Is it complex to add this functionality back in? For me it is a nice to have, which saves me profiling the dataset. I have a some large files and it is not obvious what the ideal datatype is without profiling... which I might use Power Query to detect types. Are you able to suggest data types and populate the first row drop downs? Thanks.

@TonyGaul Sorry for delay. Yes, it's quite complex because the entire upload process is different when we use that feature.
I like your suggestion of profiling within Excel. Let me look into it and I'll let you know.
Thanks.

@ssegal100 Thanks. Even if it is something as simple as macro code showing basic stats for each column on a new blank row e.g. MIN value, MAX value, MAX length, MAX decimal places. I can then choose an appropriate data type. I had a dataset with 100k+ rows and thought a couple of fields were whole numbers, but after the import failed, I discovered a couple of random text values in the columns. I think basic stats would give me more confidence in setting the data types.

@ssegal100 I did give this some thought and attempted something which leverages Excel functions and your VBA. Just an idea anyway and probably should be beautified, maybe have its own button, perhaps less restrictive data types added (like Text instead of Varchar(n) ) & tested.

So I've modified 2 sub routines and added 1 function. The data type logic & profiling will run when the Define Data Types button is pressed:
Sub AddDataTypeDropDowns()
Sub uploadData()
Function CellType()


Sub AddDataTypeDropDowns()
Dim rRange As range
Dim t: t = Null
' Application.ScreenUpdating = False
Set dataWorksheet = getDataWorksheet()
' need to activate this because this Cells(giStartingRowForUpload, 1), will get the value of the active cell
dataWorksheet.Activate
If dataWorksheet.Cells(2, 1).value = "Column profiling start." Then
dataWorksheet.Rows(1).Resize(7).Delete
End If
Set UsedRng = dataWorksheet.UsedRange
LastColIndex = UsedRng.columns.Count 'UsedRng.Rows.Count
' If there isn't data then bail
If LastColIndex > 0 And Cells(1, 1) <> "" Then
'Check if the first cell has a dropdown already. If it does than it means that we should update not insert the row
On Error Resume Next
t = dataWorksheet.Cells(giStartingRowForUpload, 1).Validation.Type
On Error GoTo 0
'Before we start, clear all filters
On Error Resume Next
dataWorksheet.ShowAllData
On Error GoTo 0
If Not IsNull(t) Then 'There is no dropdown so Insert
'dataWorksheet.Rows(giStartingRowForUpload).Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
dataWorksheet.Rows(1).Delete
End If
' add profiling rows
dataWorksheet.Rows(1).Resize(7).Insert Shift:=xlDown, CopyOrigin:=xlFormatFromLeftOrAbove
' Set range row numbers
FirstRowIndex = UsedRng.row + 1 ' First row of data skip header
LastRowIndex = UsedRng.row + UsedRng.Rows.Count - 1 ' Last row of data
' Add the formula arrays
dataWorksheet.Cells(3, 1).FormulaArray = "=INDEX(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",MATCH(MAX(COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ","">""& A$" & FirstRowIndex & ":A$" & LastRowIndex & ")),COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ","">""& A$" & FirstRowIndex & ":A$" & LastRowIndex & "),0))" 'MIN Value
dataWorksheet.Cells(4, 1).FormulaArray = "=INDEX(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",MATCH(MAX(COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",""<""& A$" & FirstRowIndex & ":A$" & LastRowIndex & ")),COUNTIF(A$" & FirstRowIndex & ":A$" & LastRowIndex & ",""<""& A$" & FirstRowIndex & ":A$" & LastRowIndex & "),0))" 'MAX Value
dataWorksheet.Cells(5, 1).FormulaArray = "=MAX(LEN(A$" & FirstRowIndex & ":A$" & LastRowIndex & "))" 'MAX Length
dataWorksheet.Cells(6, 1).FormulaArray = "=MAX(LEN(A$" & FirstRowIndex & ":A$" & LastRowIndex & ")-FIND(""."",A$" & FirstRowIndex & ":A$" & LastRowIndex & "&"".""),0)" 'MAX Decimals
dataWorksheet.range(Cells(3, 1), Cells(6, LastColIndex)).FillRight ' Copy formula across for all columns
' Label the rows
With dataWorksheet.Cells(2, 1)
.Formula = "Column profiling start."
.Font.Color = vbRed
End With
With dataWorksheet.Cells(3, LastColIndex + 1)
.Formula = "MIN Value"
.Font.Color = vbRed
End With
With dataWorksheet.Cells(4, LastColIndex + 1)
.Formula = "MAX Value"
.Font.Color = vbRed
End With
With dataWorksheet.Cells(5, LastColIndex + 1)
.Formula = "MAX Length"
.Font.Color = vbRed
End With
With dataWorksheet.Cells(6, LastColIndex + 1)
.Formula = "MAX Decimals"
.Font.Color = vbRed
End With
With dataWorksheet.Cells(7, 1)
.Formula = "Column profiling end."
.Font.Color = vbRed
End With
' Set the number format of the MIN/MAX values, otherwise date values are formatted as serial numbers
For i = 1 To LastColIndex
dataWorksheet.range(Cells(3, i), Cells(4, i)).NumberFormat = dataWorksheet.range(Cells(FirstRowIndex, i), Cells(FirstRowIndex, i)).NumberFormat
dataWorksheet.range(Cells(5, i), Cells(6, i)).NumberFormat = "General"
sCellType = CellType(dataWorksheet.range(Cells(3, i), Cells(4, i)))
If sCellType = "Text" Then
If Cells(5, i).value < 255 Then
dataWorksheet.Cells(1, i).value = "Varchar(255)"
Else
dataWorksheet.Cells(1, i).value = "Text"
End If
ElseIf sCellType = "Value" Then
If Cells(6, i).value = 0 Then
dataWorksheet.Cells(1, i).value = "Integer"
Else
dataWorksheet.Cells(1, i).value = "Double" ' "Number(" & Cells(5, i).value & "," & Cells(6, i).value & ")"
End If
ElseIf sCellType = "Date" Then
If Cells(5, i).value <= 10 Then
dataWorksheet.Cells(1, i).value = "Date"
Else
dataWorksheet.Cells(1, i).value = "Timestamp"
End If
ElseIf sCellType = "Blank" Or sCellType = "Error" Then
dataWorksheet.Cells(1, i).value = "Varchar"
Else
dataWorksheet.Cells(1, i).value = sCellType
End If
Next i

    Set rRange = dataWorksheet.range(dataWorksheet.Cells(giStartingRowForUpload, 1), dataWorksheet.Cells(giStartingRowForUpload, LastColIndex))

    With rRange.Validation
        .Delete
        .Add Type:=xlValidateList, AlertStyle:=xlValidAlertStop, Operator:= _
  xlBetween, Formula1:=sgDatatypes
        .IgnoreBlank = True
        .InCellDropdown = True
        .InputTitle = ""
        .ErrorTitle = ""
        .InputMessage = ""
        .ErrorMessage = ""
        .ShowInput = True
        .ShowError = False
    End With
End If

Application.ScreenUpdating = True

End Sub

Function CellType(Rng)
Application.Volatile
Set Rng = Rng.range("A1")
Select Case True
Case IsEmpty(Rng)
CellType = "Blank"
Case WorksheetFunction.IsText(Rng)
CellType = "Text"
Case WorksheetFunction.IsLogical(Rng)
CellType = "Boolean"
Case WorksheetFunction.IsErr(Rng)
CellType = "Error"
Case IsDate(Rng)
CellType = "Date"
Case InStr(1, Rng.Text, ":") <> 0
CellType = "Time"
Case IsNumeric(Rng)
CellType = "Value"
End Select
End Function


in Sub uploadData I just added a 3 lines in this section to remove the new profiling rows before uploading

'************ Checking for Data Type row ************
Dim firstCellValue As String
firstCellValue = dataWorksheet.Cells(giStartingRowForUpload, 1).value
' Check if first row has the data types.
iHeaderRow = giStartingRowForUpload
Dim arrDatatypes() As String
arrDatatypes = Split(sgDatatypes, ",")
If IsInArray(firstCellValue, arrDatatypes) Or firstCellValue = "" Or InStr(firstCellValue, "(") Then
    iHeaderRow = iHeaderRow + 1
    bDataTypeRowExists = True
' ---------------------------------------------------------------------------------------------------------------------
' Remove data profiling rows 
    If dataWorksheet.Cells(2, 1).value = "Column profiling start." Then
        dataWorksheet.Rows(2).Resize(6).Delete
    End If
' ---------------------------------------------------------------------------------------------------------------------
End If
    
'************ Checking to make sure the first column name has a value ************

@TonyGaul Thank you!
I will dig into it.