Total system failure due to double conversion problems in non-"en-US" cultures
andy-soft opened this issue · 0 comments
andy-soft commented
Hi, I trained the sample English BIO NER labeler, and the training went fine.
But when running the TEST batch, it fails TOTALLY!
The labels I got were absolutely NUTS!
So I started to dig into the algorithms, and found a BIG PROBLEM.
When the training and model files are written, doubles are stored as strings. In the "en-US" culture the decimal separator is ".", while in Spanish cultures it is ",". So when the files are written and then read back under a different culture (or parsed with a fixed one), the values come out inconsistent. There are two ways to fix it:
- Parse with the invariant culture, like this (modelReader.cs):
    // Read cost_factor
    strLine = sr.ReadLine();
    cost_factor_ = double.Parse(strLine.Split(':')[1].Trim(), CultureInfo.InvariantCulture);
- Or add this at the very start of Main() in console apps, so that every thread uses "en-US":
    // requires: using System.Globalization; using System.Threading;
    static void Main(string[] args)
    {
        CultureInfo culture = CultureInfo.CreateSpecificCulture("en-US");
        CultureInfo.DefaultThreadCurrentCulture = culture;
        CultureInfo.DefaultThreadCurrentUICulture = culture;
        Thread.CurrentThread.CurrentCulture = culture;
        Thread.CurrentThread.CurrentUICulture = culture;
        .....
And now it works!
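For anyone who wants to reproduce this, here is a minimal sketch of the mismatch and of a culture-independent round trip. The class and the helpers `WriteLine` and `ReadValue` are hypothetical names of mine, not part of the library, and the "name: value" line layout is only assumed from the `Split(':')` call above. The key point is that the writer and the reader must both use the same fixed culture.

```csharp
using System;
using System.Globalization;

static class CultureRoundTrip
{
    // Hypothetical helper: write a "name: value" line with the invariant culture,
    // so the decimal separator is always "." whatever the machine settings are.
    public static string WriteLine(string name, double value) =>
        name + ": " + value.ToString("R", CultureInfo.InvariantCulture);

    // Hypothetical helper: read the value back with the same invariant culture.
    public static double ReadValue(string line) =>
        double.Parse(line.Split(':')[1].Trim(), CultureInfo.InvariantCulture);

    static void Main()
    {
        // Simulate a machine configured for a Spanish culture.
        CultureInfo.CurrentCulture = CultureInfo.GetCultureInfo("es-ES");

        double costFactor = 1.5;

        // Culture-sensitive formatting picks up the current culture's separator.
        string cultureSensitive = costFactor.ToString();
        Console.WriteLine("Written with current culture: " + cultureSensitive);

        // Reading that text back with a different culture does not restore 1.5.
        double misread = double.Parse(cultureSensitive, CultureInfo.InvariantCulture);
        Console.WriteLine("Read back with invariant culture: " +
                          misread.ToString(CultureInfo.InvariantCulture));

        // Using the invariant culture on both sides round-trips the value.
        string line = WriteLine("cost_factor", costFactor);
        double roundTripped = ReadValue(line);
        Console.WriteLine(line);
        Console.WriteLine("Round trip OK: " + (roundTripped == costFactor));
    }
}
```

Note that the mismatched parse usually does not throw. The wrong separator is accepted as a thousands separator, which would explain why training looked fine and only the labels went wrong.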