gazebosim/gz-math

Unsafe Constant Initialization

mahiuchun opened this issue · 12 comments

In ignition::math::Color the color constants are defined like public: static const Color Cyan;. While the resulting interface is nice to use, it could cause static initialization order fiasco as C++ makes no guarantee that global variables in this form is going to be initialized in usage order. In other words, another global variable could use Color::Cyan and be initialized first.

A simple fix is to convert to a function form like the following:
public: const Color& Cyan() { static auto *c = new Color(0, 1, 1, 1); return *c; }

Another solution is more involved but it could retain the ability to refer to constants without parentheses.

  1. Add a constexpr constructor to the class.
  2. In another class, say, ColorConstants, use public: static constexpr Color Cyan = {0, 1, 1, 1};

Both solutions represent a API/ABI change, so I would like to hear some feedback first before sending out PRs.

The Vector classes might be affected by this too.

The Vector classes might be affected by this too.

Yes, I believe that Vector3<T>::UnitX etc. could all affected, at least in theory. (Some compilers will optimize away the problematic portions, in practice.)

The full list of static const globals on the default branch seems to be: Angle, Color, Material, Matrix3, Matrix4, Pose3, Quaternion, Vector2, Vector3, Vector4.

Note that Material::Predefined() is particularly problematic, because it returns to the user a reference to std::map, which always implies a heap allocation and thus will always be leaked anytime the library is loaded and then unloaded. Similarly for kMaterialData in src/MaterialType.hh. Even if those collection types are fixed to be heapless, the class Material has a name which is required to always be on the heap.

Also note that both construction and destruction order are in play here. As library code, I'd argue that ign-math should have no global constructors nor global destructors registered at all.

For cross-reference, see also gazebosim/sdformat#552.

A simple fix is ...

A third option (similar to the first option in the OP) would be use function-local static std::aligned_storage, which still keeps the data in static global storage, instead of on the heap. Once example is Drake's never_destroyed implementation. This can be useful if the constructor is complicated enough that it cannot be made constexpr. For this option as well as the first one, the function-local static requires the compiler to emit a lock guard instruction and storage for the boolean lock, so that the constant will be latch-initialized upon first use. This is useful for complicated globals, but is overkill for the simple structs such as vector, color, quaternion, etc.

A fourth option (similar to the second option in the OP) is to keep the member fields static, but instead mark them constexpr at the definition. See PR #283 for one example. This is ABI-breaking but not API-breaking. Edit: This does not work on Windows vs dllexport. Therefore, I've tried option (4b) now -- use const references + constexpr storage.

Therefore, my suggestion is to use:

  • option (4) for the simple numeric constants (angle, vector3, vector4, pose, quaternion, color) as in PR #283;
  • either option (1) or option (3) for any global data that is const but cannot be constexpr (material);

Hopefully, there are no cases of mutable global data.

azeey commented

@scpeters, @mjcarroll, and I had a discussion and our preference is to use option (4) as in #283 for the simple numeric constants and option (3) for the rest. To that end, NeverDestroyed has been added to ign-utils in gazebosim/gz-utils#31.

Note that Material::Predefined() is particularly problematic, because it returns to the user a reference to std::map, which always implies a heap allocation and thus will always be leaked anytime the library is loaded and then unloaded. Similarly for kMaterialData in src/MaterialType.hh. Even if those collection types are fixed to be heapless, the class Material has a name which is required to always be on the heap.

(Possibly) related to this - I'm seeing a unit test failure on Fedora rawhide where it appears that the kMaterialData map in src/MaterialType.hh isn't being properly initialized before being used in src/Material.cc. The UNIT_Material_TEST fails with the following output (full log here):

        Start  35: UNIT_Material_TEST
 34/102 Test  #35: UNIT_Material_TEST .....................***Failed    0.01 sec
Running main() from gtest_main.cc
[==========] Running 3 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 3 tests from MaterialTest
[ RUN      ] MaterialTest.Init
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:52: Failure
Expected: (mats.find(static_cast<MaterialType>(i))->second.Density()) > (0.0), actual: 0 vs 0
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:52: Failure
Expected: (mats.find(static_cast<MaterialType>(i))->second.Density()) > (0.0), actual: 0 vs 0
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:52: Failure
Expected: (mats.find(static_cast<MaterialType>(i))->second.Density()) > (0.0), actual: 0 vs 0
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:52: Failure
Expected: (mats.find(static_cast<MaterialType>(i))->second.Density()) > (0.0), actual: 0 vs 0
[  FAILED  ] MaterialTest.Init (0 ms)
[ RUN      ] MaterialTest.Comparison
[       OK ] MaterialTest.Comparison (0 ms)
[ RUN      ] MaterialTest.Accessors
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:95: Failure
Expected equality of these values:
  2700.0
    Which is: 2700
  mat.Density()
    Which is: 3.1620201333839779e-322
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:123: Failure
Expected equality of these values:
  MaterialType::TUNGSTEN
    Which is: 4-byte object <0C-00 00-00>
  material.Type()
    Which is: 4-byte object <00-00 00-00>
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:124: Failure
Expected equality of these values:
  19300.0
    Which is: 19300
  material.Density()
    Which is: 6.924653541527823e-310
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:136: Failure
Expected equality of these values:
  MaterialType::TUNGSTEN
    Which is: 4-byte object <0C-00 00-00>
  material.Type()
    Which is: 4-byte object <00-00 00-00>
/builddir/build/BUILD/ignition-math-6.9.2/src/Material_TEST.cc:137: Failure
Expected equality of these values:
  19300
  material.Density()
    Which is: 6.924653541527823e-310
[  FAILED  ] MaterialTest.Accessors (0 ms)
[----------] 3 tests from MaterialTest (0 ms total)
[----------] Global test environment tear-down
[==========] 3 tests from 1 test case ran. (0 ms total)
[  PASSED  ] 1 test.
[  FAILED  ] 2 tests, listed below:
[  FAILED  ] MaterialTest.Init
[  FAILED  ] MaterialTest.Accessors
 2 FAILED TESTS

I did a little tracing/printing, and the density values for each of the kMaterialData map entries all contain garbage data when the kMaterials map is assembled in src/Material.cc. I wasn't able to track down the root cause - it could be static initialization issues, or it could be a bug in gcc-12 which was just introduced into the Fedora buildroot.

If I modify the initialization of kMaterialData per this patch, the density values are initialized correctly.

FYI recently #290 fixed this issue (with respect to the Material map), for the main branch.

Yes, @richmattes this has been fixed on main, but due to our versioning strategy preserving ABI/API within a major, there isn't a way to release this into ign-math6.

I would suggest either attempting to use the main branch (if you are willing to use development branches until garden is released later this year) OR building fortress from source with the patch that you indicated above.

Thanks, I probably should have noticed that PR in the thread above... I'm working on ign-math6 because gazebo-11 seems to be version locked to it, so I'll continue building the fedora package with my linked patch. Feel free to pull it into ign-math6 if other distros start to have issues.

I see some new instances of potentially unsafe constant initialization in #388, #390, and #393. They almost follow the pattern of #283 but are not constexpr. Perhaps for these new classes, it would be better to define the static variables inside of a const method rather than as static member variables?

azeey commented

I believe this was fixed by #283 and #290

There are still some constructor+destructor fiascos:

./include/gz/math/graph/Vertex.hh:57:    public: static Vertex<V> NullVertex;
./include/gz/math/graph/Edge.hh:208:    public: static UndirectedEdge<E> NullEdge;
./include/gz/math/graph/Edge.hh:271:    public: static DirectedEdge<E> NullEdge;

There are still some destructor-only fiascos. Hopefully the compiler optimizes it away, but I don't think its guaranteed:

./include/gz/math/Line2.hh:183:        static math::Vector2<T> ignore;
./include/gz/math/Line3.hh:266:        static math::Vector3<T> ignore;

The fix for those last two is easy, in any case. Just remove the static. It's way slower to use "static" anyway. A dummy on the stack should be free. A dummy as a mutable global requires checking an initialization lock every time we call that function.

azeey commented

Thanks. I'll reopen it since we'll be bumping the major version of gz-math soon. It would be good to fix these then.