Potential faster gsub for unaccenting
ioquatix opened this issue · 2 comments
ioquatix commented
UNACCENT = {
  'Ä'=>'A', 'ä'=>'a',
  'Á'=>'A', 'á'=>'a',
  'É'=>'E', 'é'=>'e',
  'Í'=>'I', 'í'=>'i',
  'ï'=>'i',
  'Ñ'=>'N', 'ñ'=>'n',
  'Ö'=>'O', 'ö'=>'o',
  'Ó'=>'O', 'ó'=>'o',
  'ß'=>'ss',
  'Ü'=>'U', 'ü'=>'u',
  'Ú'=>'U', 'ú'=>'u',
}

PATTERN = Regexp.union(UNACCENT.keys)

def unaccent_gsub(text, mapping)
  text.gsub(PATTERN, mapping)
end

text = "Apples and AÄÁaäá EÉeé IÍiíï NÑnñ OÖÓoöó Ssß UÜÚuüú"
puts unaccent_gsub(text, UNACCENT)
ioquatix commented
mapping is provided as an argument, while PATTERN is generated once from the UNACCENT mapping. So, in theory, building the pattern should probably be moved into the function. It depends on whether mapping is actually constant or not.
If it can vary, I'd suggest a class instance to cache the PATTERN.
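A minimal sketch of what such a class instance might look like. The class name Unaccenter and its call method are hypothetical and do not come from this thread or the library; the only assumption is that the pattern should be built once per mapping and reused across calls.

```ruby
# Hypothetical sketch: Unaccenter is an illustrative name, not part of the library.
class Unaccenter
  def initialize(mapping)
    # Copy and freeze so later changes to the caller's hash cannot
    # drift out of sync with the cached pattern.
    @mapping = mapping.dup.freeze
    # Build the pattern once per mapping instead of on every call.
    @pattern = Regexp.union(@mapping.keys)
  end

  # Replace every matched key with its value from the mapping.
  def call(text)
    text.gsub(@pattern, @mapping)
  end
end

unaccenter = Unaccenter.new('Ä' => 'A', 'ä' => 'a', 'ß' => 'ss', 'Ü' => 'U', 'ü' => 'u')
puts unaccenter.call("Äpfel, Straße, Übung")
```

With this shape, each distinct mapping gets its own instance, so the pattern always matches the mapping it was built from.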
geraldb commented
Good point. I added your optimization in unaccent_gsub_3b and updated the benchmark and readme. Thanks. Cheers.