A security screening technique based on simple K-Means clustering. Orient every input variable so that high values are preferable to low values. For example, in a value stock screen, supply B/P rather than P/B ratios. The optimal K is determined by a simple heuristic rule combined with a constraint on the average number of assets per cluster (see avgN and avgRange below). Values in the x data set are automatically standardized to zero mean and unit variance.
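As an illustration of the orientation and scaling described above, here is a minimal Matlab sketch. The data values and the variable names raw, mu, sigma and z are made up for this example and are not taken from kPortfolio(). The sketch uses the sample standard deviation (N-1 normalization); whether kPortfolio() does the same is an assumption.

```matlab
% Hypothetical raw data: 4 assets, 2 variables (column 1 = P/B, column 2 = ROE).
raw = [0.8 0.12; 2.5 0.20; 1.6 0.05; 4.0 0.15];

% Orient the variables so that higher is better: turn P/B into B/P.
x = raw;
x(:, 1) = 1 ./ raw(:, 1);

% Standardize each column to zero mean and unit variance.
% std(x, 0, 1) is the sample standard deviation along rows (an assumption
% about the normalization; kPortfolio() may differ).
mu    = mean(x, 1);
sigma = std(x, 0, 1);
z     = bsxfun(@rdivide, bsxfun(@minus, x, mu), sigma);
```

Only the orientation step has to be done by hand before calling the function, since the standardization is applied automatically. The second half is shown only to make clear what "0 mean and 1 variance" means for each column.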
Run the function kPortfolio() in Matlab with your data set. The file NCheck.m (with its NCheck() function) is a helper file that assists in finding the optimal K.
Inputs:
- x - an N (assets) by M (variables) matrix
- avgN - Required average number of assets per clustered portfolio. This rule is used to screen out undiversified, concentrated portfolios. Default: 25
- avgRange - Range within which the average number of assets may fall, e.g. [avgN - 2, avgN + 2]. This input guarantees that the final portfolio will have at least avgN - avgRange securities. Default: 2
Outputs:
- pIndex - Vector of indices of the assets in the portfolio, e.g. [1st security, 5th security, ...]
- C - Re-scaled mean of the centroid
- sse - Total sum of squared cluster errors
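To see how avgN and avgRange restrict the choice of K, consider this rough sketch. It is not the code in NCheck.m. It only works out which values of K keep the average cluster size N / K inside [avgN - avgRange, avgN + avgRange]. The asset count N = 500 and the names kMin, kMax and candidateK are hypothetical.

```matlab
N        = 500;   % hypothetical number of assets (rows of x)
avgN     = 25;    % documented default
avgRange = 2;     % documented default

% The average cluster size for K clusters is N / K, so requiring
%   avgN - avgRange <= N / K <= avgN + avgRange
% bounds K from below and from above.
kMin = ceil(N / (avgN + avgRange));    % smallest K with N / K <= 27
kMax = floor(N / (avgN - avgRange));   % largest K with N / K >= 23
candidateK = kMin:kMax;                % here: [19 20 21]
```

With the defaults and 500 assets, only K = 19, 20 or 21 satisfies the constraint. The heuristic rule would then pick among such candidates. How it does so is not covered by this sketch.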
Further reading:
- Using K-Means for Value Investors
- k-means clustering Matlab function
- k-means clustering Wikipedia