neslib/Neslib.MultiPrecision

Support of Type Extended for Win32

Andreas113 opened this issue · 5 comments

Hi Erik,
if you are going to make changes to the code anyway, I have one more request for you.

For the Win32 target platform, it is relatively easy to support the Extended type. I have already tried this and tested it several times, and it works very well. This would make your library more compatible with the 10-byte "Extended world" instead of the less accurate 8-byte "Double world" of Windows.

For this you would only have to make the following additions to your library:

1): In function MultiPrecisionInit: UInt32;

Instead of:
. . .
SetPrecisionMode(pmDouble); // Original

New:

{IF Defined(WIN32)}
  SetPrecisionMode(pmExtended);
{$ELSE}
  SetPrecisionMode(pmDouble);
{$EndIF}

This would be necessary anyway, because with SetPrecisionMode(pmExtended); in Win32 e.g. the routine FloatToStrF(..) no longer works correctly and cuts off trailing decimal places.
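To illustrate why the precision mode matters for Extended arithmetic, here is a minimal sketch that is not part of the library. It assumes an x87 FPU as used on Win32, where SetPrecisionMode from the Math unit changes how many mantissa bits the FPU keeps; the helper Divide and the program name are made up for this example. On targets where Extended is only an alias for Double, both results are expected to be equal.

```pascal
program PrecisionModeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  {$IFDEF FPC}Math{$ELSE}System.Math{$ENDIF};

// Hypothetical helper: divides at run time, so the FPU precision
// setting that is active during the call is the one that gets used.
function Divide(const A, B: Extended): Extended;
begin
  Result := A / B;
end;

var
  Saved: TFPUPrecisionMode;
  AsDouble, AsExtended: Extended;
begin
  // SetPrecisionMode returns the previous mode, so it can be restored.
  Saved := SetPrecisionMode(pmDouble);
  try
    AsDouble := Divide(1, 3);      // quotient kept with a 53-bit mantissa
    SetPrecisionMode(pmExtended);
    AsExtended := Divide(1, 3);    // quotient kept with a 64-bit mantissa
  finally
    SetPrecisionMode(Saved);
  end;

  if AsDouble <> AsExtended then
    Writeln('Results differ: pmDouble drops the extra Extended bits')
  else
    Writeln('Results are equal: Extended is probably Double on this target');
end.
```

Restoring the saved mode in the finally block keeps the sketch from changing the FPU state for any code that runs afterwards.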

2): In Interface:

Type
  DoubleDouble = record
  public
. . .
New:

class operator Implicit(const Value: Extended): DoubleDouble; inline; static;
class operator Implicit(const Value: DoubleDouble): Extended; inline; static;

3): In Implementation:

New:

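class operator DoubleDouble.Implicit(const Value: Extended): DoubleDouble;
var
  a, b: Double;
begin
  // Split Value into a leading Double and a Double remainder,
  // then renormalize with the QuickTwoSum algorithm:
  a := Value;      // Value rounded to Double
  b := Value - a;  // rounding error of that conversion

  Result.X[0] := a + b;
  Result.X[1] := b - (Result.X[0] - a);
end;

class operator DoubleDouble.Implicit(const Value: DoubleDouble): Extended;
begin
  // Add the low part to the high part in Extended precision
  Result := Value.X[1] + Value.X[0];
end;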
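The idea behind the conversion can be tried in isolation with the following sketch, which does not use the library. The names SplitExtended and RoundTripsExactly are hypothetical. It assumes that the value lies within the range of Double and that the FPU runs in extended precision (the Delphi default on Win32); under those assumptions the remainder has only about 11 significant bits, so both parts together should carry all 64 mantissa bits of the Extended value.

```pascal
program SplitExtendedDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Hypothetical helper: splits Value into a leading Double (Head)
// and a Double correction term (Tail).
procedure SplitExtended(const Value: Extended; out Head, Tail: Double);
var
  Remainder: Extended;
begin
  Head := Value;               // nearest Double to Value
  Remainder := Value - Head;   // what the rounding above discarded
  Tail := Remainder;
end;

// Hypothetical helper: True when Tail + Head, summed in Extended
// arithmetic, gives back Value.
function RoundTripsExactly(const Value: Extended): Boolean;
var
  Head, Tail: Double;
  Sum: Extended;
begin
  SplitExtended(Value, Head, Tail);
  Sum := Tail;
  Sum := Sum + Head;
  Result := Sum = Value;
end;

var
  Third: Extended;
begin
  Third := 1;
  Third := Third / 3;   // not exactly representable as a Double
  Writeln('1/3 round-trips: ', RoundTripsExactly(Third));
  Writeln('1.5 round-trips: ', RoundTripsExactly(1.5));
end.
```

If the FPU is switched to pmDouble, the final sum is rounded to 53 bits again, so the check for 1/3 can be expected to fail on Win32 in that mode.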

Thank you in advance!

Regards,
Andreas

Sorry for my typo. I wrote:
"This would be necessary anyway, because with SetPrecisionMode(pmExtended); in Win32 e.g. the routine FloatToStrF(..) no longer works correctly and cuts off trailing decimal places."

The correct statement is:
"This would be necessary anyway, because with SetPrecisionMode(pmDouble); in Win32 e.g. the routine FloatToStrF(..) no longer works correctly and cuts off trailing decimal places."
Andreas

Sorry, Erik,
here I see another typo: the $ is missing in {IF.
The correct version is:

{$IF Defined(WIN32)}
  SetPrecisionMode(pmExtended);
{$ELSE}
  SetPrecisionMode(pmDouble);
{$EndIF}

It's too bad that I cannot correct the posted text afterwards.
Andreas

I am personally not a big fan of the Extended type since it has only very limited support (Win32 only, which is going the way of the dinosaurs).

But I updated MultiPrecisionInit anyway since it doesn't seem to break any unit tests.

I also added support to convert to and from Extended, for both DoubleDouble and QuadDouble. However, I didn't use the Implicit operator for this, since it can lead to unintentional (expensive) conversions (for the same reason, I didn't add Implicit operators to convert from Double either). Instead, I added an Explicit operator to convert from an Extended to a DoubleDouble or QuadDouble, and a ToExtended method to convert from a DoubleDouble or QuadDouble to an Extended. I trust that your implementation is correct; I haven't tested this myself. Some unit tests for these new methods would be nice...
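The difference between the two operator kinds can be sketched with a stand-in record. TDoublePair and RoundTrip below are hypothetical and are not the library's actual declarations; only the general shape (an Explicit operator plus a ToExtended method) follows the description above. With only an Explicit operator, the conversion has to be requested with a cast, so it cannot happen unnoticed in an assignment or a parameter.

```pascal
program ExplicitOperatorDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for DoubleDouble, for illustration only.
  TDoublePair = record
    X: array [0..1] of Double;
    class operator Explicit(const Value: Extended): TDoublePair;
    function ToExtended: Extended;
  end;

class operator TDoublePair.Explicit(const Value: Extended): TDoublePair;
begin
  Result.X[0] := Value;                // leading Double
  Result.X[1] := Value - Result.X[0];  // rounding error of the line above
end;

function TDoublePair.ToExtended: Extended;
begin
  Result := X[1];
  Result := Result + X[0];
end;

// Hypothetical helper: converts to the pair type and back again.
function RoundTrip(const Value: Extended): Extended;
begin
  Result := TDoublePair(Value).ToExtended;
end;

var
  E: Extended;
  P: TDoublePair;
begin
  E := 2;
  E := Sqrt(E);
  P := TDoublePair(E);   // the conversion is visible at the call site
  // P := E;             // does not compile: no Implicit operator is declared
  Writeln('High part: ', P.X[0]);
  Writeln('Round trip exact: ', RoundTrip(E) = E);
end.
```

Whether the round trip is exact depends on the FPU precision mode and on Extended being a true 80-bit type on the target.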

Note that there are also Init methods that you can use to initialize a DoubleDouble or QuadDouble. Initially, I added an overload for this to initialize from an Extended value, but that caused a bunch of unit tests to fail. I could probably work around this, but since I am not a fan of Extended anyway, I believe it suffices to use the Explicit operator to initialize from an Extended value.

BTW: You should be able to edit your own comments by clicking the '...' button on the top-right of your comment.

Hi Erik,
Thank you for implementing support for the Extended type in your library! I ran extensive tests with it, and all results have been positive so far. Below are some results so that you can see for yourself. The QuickTwoSum algorithm works excellently: it almost always reproduces the binary encoding of the Extended numbers nearly exactly, to about 33 digits for DoubleDouble and 60-65 digits for QuadDouble.

Some Examples for DoubleDouble Tests:

Extended value        =  10114
Binary representation =  10114
DoubleDouble value    =  10113.9999999999999999999999999999594
First deviation at position = 4 but correct rounding to 33 places

Extended value        =  9.22167588163992834
Binary representation =  9.221675881639928338585898703883003690862096846103668212890625
DoubleDouble value    =  9.2216758816399283385858987038830
First deviation at position = 33

Extended value        =  10114
Binary representation =  10114.00000000000000266453525910037569701671600341796875
DoubleDouble value    =  10114.0000000000000026645352591004114
First deviation at position = 33

Extended value        =  -0.93675770447191639
Binary representation =  -0.93675770447191638950167058563778255120269022881984710693359375
DoubleDouble value    =  -0.9367577044719163895016705856378
First deviation at position = 33

Extended value        =  -0.34997857521926361
Binary representation =  -0.349978575219263610281660026313232947359210811555385589599609375
DoubleDouble value    =  -0.3499785752192636102816600263132
First deviation at position = 34

Extended value        =  9.22167588163992834
Binary representation =  9.22167588163992833945326044187140723806805908679962158203125
DoubleDouble value    =  9.2216758816399283394532604418714
First deviation at position = 33

Extended value        =  5057.00004943642476
Binary representation =  5057.000049436424763626973799546249210834503173828125
DoubleDouble value    =  5057.0000494364247636269737995462326
First deviation at position = 33

Extended value        =  5056.99995056357525
Binary representation =  5056.99995056357524791934565655537880957126617431640625
DoubleDouble value    =  5056.9999505635752479193456565553650
First deviation at position = 33

Extended value        =  0.999999980448319448
Binary representation =  0.9999999804483194484699092041690704490974894724786281585693359375
DoubleDouble value    =  0.9999999804483194484699092041691
First deviation at position = 32

Extended value        =  0.964809853809547903
Binary representation =  0.9648098538095479029795022152260486336672329343855381011962890625
DoubleDouble value    =  0.9648098538095479029795022152260
First deviation at position = 33

Some Examples for QuadDouble Tests:

Extended value        =  13806
Binary representation =  13806
QuadDouble value      =  13805.99999999999999999999999999999999999999999999999999999999999997
First deviation at position = 4 but correct rounding to 67 places

Extended value        =  9.53285855926342088
Binary representation =  9.532858559263420878716888129389417372294701635837554931640625
QuadDouble value      =  9.53285855926342087871688812938941737229470163583755493164062500
First deviation at position = 64

Extended value        =  13806
Binary representation =  13805.9999999999999982236431605997495353221893310546875
QuadDouble value      =  13805.99999999999999822364316059974953532218933105468749999999999999
First deviation at position = 54 but correct rounding to 66 places

Extended value        =  0.963481246950749523
Binary representation =  0.963481246950749522996344798020373900726553983986377716064453125
QuadDouble value      =  0.96348124695074952299634479802037390072655398398637771606445313
First deviation at position = 63

Extended value        =  -0.267775814393736488
Binary representation =  -0.2677758143937364875030761324214978458257974125444889068603515625
QuadDouble value      =  -0.26777581439373648750307613242149784582579741254448890686035156
First deviation at position = 65

Extended value        =  9.53285855926342088
Binary representation =  9.532858559263420878716888129389417372294701635837554931640625
QuadDouble value      =  9.53285855926342087871688812938941737229470163583755493164062500
First deviation at position = 64

Extended value        =  6903.00003621613791
Binary representation =  6903.000036216137910383139342229696922004222869873046875
QuadDouble value      =  6903.00003621613791038313934222969692200422286987304687500000000000
First deviation at position = 67

Extended value        =  6902.99996378386209
Binary representation =  6902.999963783862087840503818370052613317966461181640625
QuadDouble value      =  6902.99996378386208784050381837005261331796646118164062499999999998
First deviation at position = 55 but correct rounding to 65 places

Extended value        =  0.999999989507130894
Binary representation =  0.999999989507130893529214532566840034633059985935688018798828125
QuadDouble value      =  0.99999998950713089352921453256684003463305998593568801879882813
First deviation at position = 63

Extended value        =  0.97922314936463814
Binary representation =  0.97922314936463813995169702675269718383788131177425384521484375
QuadDouble value      =  0.97922314936463813995169702675269718383788131177425384521484375
First deviation at position = 64

Thank you again!

Regards,
Andreas

Hi Andreas,

Thank you for these extensive tests. Glad everything is looking good!

Closing this issue.