worldbank/ietoolkit

iesave: code breaks if too many unique values in a string variable

luizaandrade opened this issue · 0 comments

If a string/categorical variable has too many unique values, levelsof fails with the error message "cannot compute". I just ran into this with a variable that had more than 700k unique values.

The command now runs if I apply a workaround that replaces the following lines:

* Number of levels and complete observations
qui levelsof `var'
local varlevels = r(r)
local varcomplete = r(N)

with

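* Number of levels (distinct non-missing values; levelsof also excludes missing by default)
preserve
	keep `var'
	qui drop if missing(`var')
	qui duplicates drop
	qui count
	* count stores its result in r(N), not r(r)
	local varlevels = r(N)
restore

* Number of complete observations
qui count if !missing(`var')
local varcomplete = r(N)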

There may be a more elegant approach, though. If no one can think of one, I can open a PR with this workaround.
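One candidate, sketched here under assumptions and not tested inside iesave itself: tag one observation per distinct non-missing value with egen's tag() function and count the tags. This never builds the list of levels and avoids preserve/restore, which copies the whole dataset once per variable. The demo data, the variable name make, and the tempvar name tag are hypothetical; inside iesave, `var' would come from the existing loop. It is also worth confirming that egen leaves the sort order unchanged, in case iesave relies on it.

```stata
* Hypothetical demo data: a string variable with a repeated value and a missing ("")
clear
input str5 make
"a"
"b"
"a"
""
end
local var make

* Number of levels: tag one observation per distinct non-missing value
* (egen sets tag to 0 for observations excluded by the if condition),
* then count the tagged observations
tempvar tag
qui egen `tag' = tag(`var') if !missing(`var')
qui count if `tag' == 1
local varlevels = r(N)
drop `tag'

* Number of complete observations
qui count if !missing(`var')
local varcomplete = r(N)

display "levels: `varlevels'  complete: `varcomplete'"
```

On the demo data this counts two distinct non-missing values ("a" and "b") and three complete observations, which is what levelsof would report as r(r) and r(N) for this variable.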