Total number of found ngrams changes after splitting modifiers
Closed this issue · 1 comment
Glitchy-Tozier commented
Splitting of ngrams changes number of found ngrams
For example, this made 276d0d4 necessary.
Example: in each pair below, the first number is the count before splitting and the second is the count after splitting.
Edit: Actually, it might make sense this way. I'm slightly confused.
uni 108783127.16766545
uni 115993432.49160336
tri 108134291.69376425
tri 158193331.2444684
bi 108458004.65033875
bi 140483393.64971808
uni 108783127.16766545
uni 115993432.49160337
tri 108134291.69376425
tri 158195158.18040243
bi 108458004.65033875
bi 140483393.649718
[2022-04-01T20:35:22Z INFO layout_optimization_sa::optimization] Process 0: Starting layout: .czjöqfsäxwt,lngmdiüyßbuaeoprhkv ( 643.6)
uni 108783127.16766545
uni 115993432.49160343
tri 108134291.69376425
tri 158193343.21547225
bi 108458004.65033875
bi 140483393.64971817
uni 108783127.16766545
uni 115993432.49160343
tri 108134291.69376425
tri 158186184.70580426
bi 108458004.65033875
bi 140482049.10099033
uni 108783127.16766545
uni 115993432.49160337
dariogoetz commented
This is not a bug. Modifier splitting takes an n-gram with "higher-layer" symbols and generates multiple new ones with symbols solely on the base layer. The sum of their weights is not necessarily equal to the "starting weight".
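
To illustrate why the totals can grow, here is a minimal Python sketch. It is not the optimizer's actual implementation: the toy layout `BASE_KEYS`, the helper names `split_bigram` and `split_all`, and the splitting rule itself (every resulting base-layer pair inherits the full weight of the original bigram) are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical toy layout: each symbol maps to the base-layer keys needed to
# type it. Higher-layer symbols need a modifier in addition to their base key.
BASE_KEYS = {
    "a": ["a"],
    "b": ["b"],
    "A": ["shift", "a"],
    "B": ["shift", "b"],
}


def split_bigram(first, second, weight):
    """Split one bigram into base-layer bigrams (assumed rule, see lead-in)."""
    keys1, keys2 = BASE_KEYS[first], BASE_KEYS[second]
    result = defaultdict(float)
    # Every key of the first symbol paired with every key of the second one.
    for k1, k2 in product(keys1, keys2):
        result[(k1, k2)] += weight
    # A modifier followed by its own base key also counts as a bigram.
    for keys in (keys1, keys2):
        if len(keys) > 1:
            result[(keys[0], keys[1])] += weight
    return dict(result)


def split_all(bigrams):
    """Split every bigram and accumulate the weights of identical results."""
    total = defaultdict(float)
    for (first, second), weight in bigrams.items():
        for pair, w in split_bigram(first, second, weight).items():
            total[pair] += w
    return dict(total)


bigrams = {("a", "b"): 10.0, ("A", "b"): 4.0}
split = split_all(bigrams)

before = sum(bigrams.values())
after = sum(split.values())
print("bi", before)
print("bi", after)
```

In this sketch the bigram `("A", "b")` turns into three base-layer bigrams, so the summed weight after splitting is larger than before, which has the same shape as the numbers in the log above. The real splitting rule may weight the generated n-grams differently.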