Floating point exceptions
twelsby opened this issue · 19 comments
The code to perform loss of precision checking in parser::parse_internal() for lexer::token_type::value_number types can cause unhandled floating point exceptions with certain combinations of hardware, compilation options and input data. Specifically the following line (currently found at line 6772) causes the exception:
// check if conversion loses precision
const auto int_val = static_cast<number_integer_t>(float_val);
The exception can be generated on Intel x64 processors if the following three conditions occur:
- The floating point unit is configured with the 'floating-point invalid' exception unmasked (enabled).
- The compiler uses a CVTTSD2SI (Convert Scalar Double-Precision Floating-Point Value to Signed Doubleword Integer with Truncation) instruction.
- The value of float_val exceeds that which can be represented (after truncation) in a signed 64 bit integer.
These conditions were found to occur in a DLL that was built using Visual Studio 2015 (Update 1 version 14.0.24720.00) that uses the JSON library. Interestingly it did NOT occur when the DLL was dynamically loaded in a test app that was built to test the DLL (built using the same compiler) but it did occur when the DLL was dynamically loaded by a commercial program (built separately and for which the source code and build options are not available). Both the test app and the commercial program were presented with identical data. This suggests that Visual Studio generates an executable that disables this floating point exception but that for whatever reason the commercial program does not.
A workaround is to manually disable the floating-point invalid exception using, for example, _controlfp_s() (or feenableexcept() under Linux), prior to calling the parse function and then restoring its state afterwards.
It may be better for the library to handle this automatically. This could be done by either:
- Disabling the exception by calling _controlfp_s()/feenableexcept(), probably on entry to, and exit from, parse().
- Preventing the exception by checking the magnitude of float_val before executing the above line to ensure it is capable of fitting in a signed 64 bit integer and if not, keep it as a float_val as is already done in the event of loss of precision.
- Implementing some other means of checking for loss of precision that does not result in a CVTTSD2SI instruction.
If the decision were taken to leave it to the library user to implement their own controls then this should be clearly spelled out in the documentation as this behavior does not seem to be well known and is definitely not expected.
Hi @twelsby, thank you for your detailed analysis!
Just a naive question: Is it possible to detect the exception, say via fetestexcept
? Then we could just keep working with float_val
just as in the else
branch.
I must be even more naive then because I hadn't heard of fetestexcept()
. I have had a look at it and I don't think it will work however. The problem is that fetestexcept()
might detect the exception, but only if program execution doesn't jump to the exception handler, which it will if exceptions have been enabled - hence fetestexcept()
would never be called.
But I think you are on the right track. The related functions in <cfenv>
(or <fenv.h>
) would be even better than what I was suggesting because they would be automatically cross platform without any macro tricks.
Something like the following works just as well as _controlfp_s()
/feenableexcept()
and I believe is functionally equivalent:
#include <cfenv>
...
/// public parser interface
basic_json parse()
{
#pragma STDC FENV_ACCESS on
fenv_t fe;
feholdexcept(&fe);
basic_json result = parse_internal(true);
feclearexcept(FE_ALL_EXCEPT);
feupdateenv(&fe);
...
}
This saves the floating point environment and resets it to the non-stop mode so that exceptions no longer interrupt the program but instead just set the flags that fetestexcept()
reads. It then restores the floating point environment before returning to the program, which is important because there was probably some reason for enabling the exceptions in the first place.
If you wanted to you could then detect the exception with fetestexcept()
, but there really wouldn't be any point because if the exception occurs (due to a float_val
that won't fit in a 64 bit signed integer) then, at least on a 64 bit Intel processor, the result will be set to 0x80000000 (see CVTTSD2SI). This will trigger your existing loss of precision check:
if (approx(float_val, static_cast<long double>(int_val)))
One point to note though is that the #pragma
directive seems to be problematic. Visual Studio doesn't recognize it but instead uses:
#pragma fenv_access (on)
It probably a good idea to define a macro to set the #pragma
accordingly. The #pragma
only needs to be turned on in the function in which the floating point environment is changed and it doesn't have to be turned back off.
The location above may not be the best place for the code but I thought that location may be a good spot as it seems to be the main entry point to the parser and doing it there means only doing it once per parse operation.
I tested both the _controlfp_s()
method and the above code and both work just as well - although I didn't apply it in the location above but instead in my own code before and after json::parse()
.
Of course the other option is simply to include a warning about it in the documentation and leave the user to ensure the floating point environment is set correctly before calling parse(), as I have done. You could really go either way - each has its advantages and disadvantages.
"Both the test app and the commercial program were presented with identical data. This suggests that Visual Studio generates an executable that disables this floating point exception but that for whatever reason the commercial program does not."
FYI, the floating point exceptions start out disabled. This means that the commercial program, or some DLL that it loads, is enabling them, and either leaving them enabled always, or calling something in your DLL while it is still enabled. This is very bad form, since most code does not expect floating point exceptions to be enabled. Enabling FPE should be done only around the specific code that needs it. For many years, I worked with a third-party component that enabled them and then called back into our code with them still enabled, and it was a nightmare. We had to create a class that disabled them and then restored the old state, and place that everywhere that could be called, directly or indirectly, by that component.
I don't think that this is something that a library should have to worry about, as it would have to protect every place that it does a floating point operation. However, if it could be done easily and portably, then I guess it's okay.
https://blogs.msdn.microsoft.com/oldnewthing/20080703-00/?p=21753
Unfortunately like you I have to work with a commercial program that doesn't play nice. It has already thrown another gotcha at me in the form of a necessary header file for its API that used a #pragma pack(2)
without restoring the original packing when it was done defining its structs. That was really bad as it caused weird semi-random memory corruption in string
objects in a completely unrelated class simply because I had the header file included in a different order in two source files. What is particularly irritating about this floating point problem is that the commercial program doesn't even do anything useful with the exception - it just records it in a log with an undocumented code and no other information.
I agree that there is an argument for NOT including code to handle the floating point environment in the library but leaving it to the user because, as you correctly point out, floating point exceptions are normally disabled by default, so it will be a somewhat uncommon problem. What I think is important though is to spell out that floating point exceptions may be generated during normal processing as this is not something that may be expected from a text file parser (as opposed to a graphics or math library).
The counter argument (for including it the code) is that it is quite simple to do, it has negligible performance impact, and it makes the least number of assumptions about the system state at invocation so it is the most robust solution and the one least likely to induce difficult to track bugs. I say difficult to track primarily because unless you think to throw very large numbers at it during testing (and > 9e18 is a magnitude you don't see too often in the wild) you won't know that there is a bug hiding in your code. I only ran across it because i am processing literally thousands of JSON files created by nearly as many random applications - and even then there were only a handful that had such large numbers.
@twelsby, could you provide me with an example of a JSON file which yields a crash. If you like, you can contact me via email: niels.lohmann@gmail.com.
The smallest legal JSON file that will reproduce the problem contains nothing more than the text 9223372036854775807
(which is LLONG_MAX
as defined in <climits>
).
The following is the shortest possible example that reproduces the problem. Note that it requires Visual Studio due to the #pragma
and must be built for x64 to cause the problem.
#include <iostream>
#include <float.h>
#include "json.hpp"
#pragma fenv_access (on)
int main()
{
// Something like this was done anywhere in the calling program for
// any reason - possibly in a closed source library
_controlfp_s(0, 0, _EM_INVALID);
// Parse the absolute minimum legal JSON that demonstrates the problem
nlohmann::json doc = nlohmann::json::parse("9223372036854775807");
// Never reaches here - unhandled exception
std::cout << doc.begin().value() << "\n";
return 0;
}
This can be fixed by the following outside the library:
#include <iostream>
#include <float.h>
#include <cfenv>
#include "json.hpp"
#pragma fenv_access (on)
int main()
{
// Something like this was done anywhere in the program for
// any reason - possibly in a closed source library
_controlfp_s(0, 0, _EM_INVALID);
// Parse the absolute minimum legal JSON that demonstrates the problem
fenv_t fe;
feholdexcept(&fe);
nlohmann::json doc = nlohmann::json::parse("9223372036854775807");
feclearexcept(FE_ALL_EXCEPT);
feupdateenv(&fe);
// Now reaches here
std::cout << doc.begin().value() << "\n";
return 0;
}
Alternatively I have now confirmed that the following fixes it inside the library (excepting the already mentioned #pragma
problem with Visual Studio) and if applied to the first example above will no longer result in an unhandled exception:
#include <cfenv>
#pragma fenv_access (on) // Compiler dependant
...
/// public parser interface
basic_json parse()
{
fenv_t fe;
feholdexcept(&fe);
basic_json result = parse_internal(true);
feclearexcept(FE_ALL_EXCEPT);
feupdateenv(&fe);
expect(lexer::token_type::end_of_input);
// return parser result and replace it with null in case the
// top-level value was discarded by the callback function
return result.is_discarded() ? basic_json() : result;
}
...
This is the first file that I encountered the problem with - example.txt. Not that it is relevant but it was found in \Users\<username>\AppData\Roaming\Adobe\Common\Media Cache Files
. The problem occurs when an attempt is made to parse "mAudioIndex":18446744073709551615
.
I should also point out that the workaround outside the library is good enough for my purposes - at least now that I have identified the problem with the commercial program. Even if the problem ends up being addressed inside the library, I will continue to also apply it at the boundaries of my DLL to head off these kinds of problems with the other libraries I use (I don't want to check every line of every library to see if it could cause an exception).
That said I still tend towards changing the library because it is good practice to assume that external code will do stupid things and try wherever possible to insulate yourself from such things.
After looking further into the C99 standard from which feholdexcept()
and its related functions arise it appears that it may not actually be necessary to invoke the #pragma
. From the standard:
7.6.1 The
FENV_ACCESS
pragma
TheFENV_ACCESS
pragma provides a means to inform the implementation when a program might access the floating-point environment to test floating-point status flags or run under non-default floating-point control modes
Further on it then states:
The purpose of the
FENV_ACCESS
pragma is to allow certain optimizations that could subvert flag tests and mode changes (e.g., global common subexpression elimination, code motion, and constant folding). In general, if the state ofFENV_ACCESS
is ''off'', the translator can assume that default modes are in effect and the flags are not tested.
The intention of feholdexcept()
is to set the default floating point control mode (non-stop) and there is no need to access status flags or change rounding behavior etc it none of the indicated optimizations are occurring, so I tentatively expect that it would be ok to drop it. This interpretation may not be correct however, and it is directly contradicted in this documentation for feholdexcept()
. That said, I suspect that documentation may be wrong as my expectations are born out in testing.
Specifically when I remove the #pragma
everything still works. Moreover, when I set compilation options that turn FENV_ACCESS
off and prohibit use of the #pragma
, such as /fp:fast
, it still works. I have only tested on VS2015 so this could be implementation specific.
In any event the worst case scenario is that setting non-stop mode fails and then it should simply fall back to the previous behavior.
@twelsby If I understand you correctly, you propose to "guard" the parsing call with something like
fenv_t fe;
feholdexcept(&fe);
basic_json result = parse_internal(true);
feclearexcept(FE_ALL_EXCEPT);
feupdateenv(&fe);
However, then it seems that the same surrounding call could be placed around the whole basic_json
code outside the library. I am not sure whether the library does anything extraordinary with floating-point numbers in the sense that it should add extra code to protect the user from floating-point exceptions. In that sense, I agree with @gregmarr not to extend the library.
What do you think? Would an extension be helpful beyond your specific scenario?
I think it could have wider application but only in quite rare circumstances.
If you were writing an application using the library and you weren't using any external code then this is not a trap you would be likely to fall into because you would know that you had unmasked the exceptions in your code and should anticipate this scenario.
The problem is most severe when an application depends on external code (especially where it is hidden in a closed source library) because you may have no reason to suspect that they have been unmasked (or if you have never had cause to use floating point exceptions you may not even be aware that they can be unmasked). You may also not expect that a json library would be using floating point operations. If you are not using floating point, you have no reason to suspect that the external code is using floating point and you don't anticipate the use of floating point by the json library, and your external code doesn't provide meaningful debug info, then you could be in for a tortuous debug session... provided your test suite traps the error. If it doesn't then its just a ticking time bomb until your app encounters some data that triggers it.
There are good arguments against the extension, chief among them is C99 itself, which says that functions should be entitled to expect that they are called in the default environment unless documented otherwise. Guarding is a 'belt and braces' approach to the problem and perhaps it is overkill. I can totally see that argument.
I would suggest though that we really do need to document the fact that exceptions may be raised during normal operation - specifically because it is something that may not be expected given the nature of the task this library performs. All it would take is a simple "Note: this function may perform floating point operations that could give rise to hardware exceptions during normal operation if floating point exceptions have been unmasked prior to it being called. Ordinarily such exceptions will not be caught by c++ try/catch blocks."
By the way I was wrong before, this doesn't just affect parse()
but also any of the get()
type functions that can perform static_casts
from floating point to integer types. That is another argument against changes to the library, and it sways me to that point of view.
I think it is out of scope of the library to fix potential floating point exceptions itself. Instead, I would propose to add the following note to the README file:
- As the exact type of a number is not defined in the JSON specification, this library tries to choose the best fitting C++ number type automatically. As a result, the type
double
may be used to store numbers which may yield floating-point exceptions in certain rate situations. These exceptions are not caused by the library and need to be fixed in the calling code.
I am not too happy about the wording, but I think the core message should be clear.
This sounds a good solution. I would amend the wording however to draw attention to the cause:
- As the exact type of a number is not defined in the JSON specification, this library tries to choose the best fitting C++ number type automatically. As a result, the type double may be used to store numbers which may yield floating-point exceptions in certain rare situations if floating point exceptions have been unmasked in the calling code. These exceptions are not caused by the library and need to be fixed in the calling code, such as by re-masking the exceptions prior to calling library functions.
Technically they are caused by the library, but it's only because the exceptions are unmasked, which is not the normal situation. We can say that the calling code needs to either catch the exception or make sure that exceptions are not unmasked before the library is called.
@gregmarr, the problem with talking about "catching" the exception is that in the c++ world when you say "catch", people tend to assume that a try
/catch
block will catch it but this will not occur unless you specifically install a function to redirect the hardware exceptions to c++ exceptions. Also, c++ exceptions do not allow the program to continue from the point of the exception, so if you do this it will abort processing of valid JSON.
By not redirecting to c++ it is possible to continue processing if you handle the exception in such a way - but this is far from common knowledge. Personally I think it is opening a can of worms to talk about catching exceptions because it is such a narrow set of circumstances where that would be possible, and still have the library function correctly.
I also think you may also struggle to adequately explain it in a succinct enough form for a readme but if you have something in mind give it a go - I'm not dead against the idea.
We could use handle instead of catch, but it's not a huge deal either.
That would certainly avoid the connotations associated with "catch" but you would need to emphasise that it is not simply enough to handle the exception, you must ensure also that program execution resumes unhindered inside the library.
For example, in my case the exception was actually being handled but it was handled inside the calling application so control was being rudely pulled away from my code (and consequently also json) and never returned. The calling application merely made a log entry blaming my code and aborted the whole operation.
In that case, then we have to say that the exceptions must not be unmasked when calling this library. Nothing else will allow the program execution to continue.
I think that is the best way to go, although the C signals mechanism would permit control to return so it could still made to work even with them unmasked but I think mentioning that just complicates things unnecessarily. Better to keep it short and sweet.
Thanks for the discussion! I added a note to the README file.