This is a repository to show that analytical things can be done in SQL :P All are written as user-defined functions (UDFs) compatible with snowflake.
- Calculates the gaussian kernel (1D) - used in anomaly detection
- Can choose the level of smoothing to apply (includes 'rule-of-thumb' measure)
- Can proportionally decrease smoothness at edges by decreasing sigma by euclidean similarity
- Calculates the alpha and beta required for a univariate weighted least squares regression
- UDFs include partition windows so multiple models can be trained on different cuts of the same table (and tested on other cuts!)
- Added example comments to show how to extend to multivariate regressions if required (note in the example below, the x1 value has been log transformed for the beta calc)
- Also includes MAE and MAPE calculations to assess performance on training data
- Calculates the cosine similarity between every combination of two tables of strings.
- It fits TF-IDF for tokenisation (unigram, bigram, trigram)
- It then takes advantage of L2-normalisation as a cheeky math hack to use a linear kernel instead to calculate cosine similarity.
- So it essentially just creates a sum product on the left-join to get a dot product :D