tud-zih-energy/otf2xx

Make impl::undefined equal to nullptr

Closed this issue · 13 comments

The current implementation of <definition>_impl::undef returns a value that is in an "undefined" state. This has some drawbacks:

  • One needs to create such a value, and passing it around incurs overhead although there is no benefit to having it
  • Comparing an undefined def to itself yields false, because the comparison bails out on invalid operands:
    if (!a.is_valid() || !b.is_valid())
    This was already considered a mistake for NaNs and causes many headaches. E.g. one cannot do if(undef_instance == DefType::undef()) but needs to resort to if(undef_instance != undef_instance). See the sketch after this list.
  • There might not even be an invalid state. The comparison relies on the value of ref() but e.g. a property has no reference at all
  • One could change the undefined state: DefType::undef().get_weak_ref()->name() = foo
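To make the NaN analogy concrete, here is a minimal standalone sketch (hypothetical names, not the otf2xx sources) of a comparison that returns false whenever an operand is invalid, which is exactly what makes self-comparison fail:

#include <cassert>

struct def
{
    const int* data_ = nullptr; // nullptr encodes the undefined/invalid state

    bool is_valid() const { return data_ != nullptr; }

    friend bool operator==(const def& a, const def& b)
    {
        if (!a.is_valid() || !b.is_valid())
            return false; // an undefined def is unequal even to itself
        return a.data_ == b.data_;
    }
    friend bool operator!=(const def& a, const def& b) { return !(a == b); }
};

int main()
{
    def undef;
    assert(undef != undef); // holds, NaN-style; undef == undef would fail
}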

I propose the following changes:

  • Remove all *_impl::undef methods
  • Use nullptr as the undef value (this allows e.g. checks like if(undef_instance.is_valid()))
  • Make base::ref() return the undefined reference value for referable types
  • Compare the contained pointer only when comparing definitions. One should not have definitions with the same reference ids but different contents anyway, so this should not break anything and allows e.g. comparing undefined values and properties (a sketch follows after this list)
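A compact sketch of what the proposed design could look like (hypothetical names, heavily simplified):

#include <cassert>
#include <cstdint>

constexpr std::uint32_t undefined_ref = std::uint32_t(-1); // stands in for OTF2_..._UNDEFINED

struct string_impl { std::uint32_t ref; };

struct string_def
{
    string_impl* data_ = nullptr; // default constructed == undefined

    bool is_valid() const { return data_ != nullptr; }
    std::uint32_t ref() const { return data_ ? data_->ref : undefined_ref; }

    // compare the contained pointer only
    friend bool operator==(const string_def& a, const string_def& b)
    { return a.data_ == b.data_; }
};

int main()
{
    string_def undef;           // no static undef() instance needed
    assert(!undef.is_valid());  // the validity check detects undefined
    assert(undef == undef);     // self-comparison now holds
    assert(undef.ref() == undefined_ref);
}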

I think you're mixing things up here.

This code will run fine:

otf2::definition::string def;
assert(def.is_valid() == false);

otf2::definition::string def2;
assert(def != def2);

auto undef = otf2::definition::string::undefined();
assert(undef.is_valid() == true);
assert(undef == undef);

auto undef2 = otf2::definition::string::undefined();
assert(undef == undef2);

One needs to create such a value, and passing it around incurs overhead although there is no benefit to having it

It's there as a static. Non-issue.

Comparing an undefined def to itself yields false:

No, it doesn't.

There might not even be an invalid state. The comparison relies on the value of ref() but e.g. a property has no reference at all

This might be true, but is it really a problem?

One could change the undefined state: DefType::undef().get_weak_ref()->name() = foo

This is effectively a const_cast, but I'd agree it's an issue.

Use nullptr as the undef value (this allows e.g. checks like if(undef_instance.is_valid()))

Breaks the interface.

Compare the contained pointer only when comparing definitions. One should not have definitions with the same reference ids but different contents anyway, so this should not break anything and allows e.g. comparing undefined values and properties

This assumption is plain wrong. Think of a trace merger. Plus, you may not want to, or be able to, have the full information for a definition during runtime and still have to compare those definitions anyhow.

The misunderstanding is: IMO undefined == invalid == !is_valid(). So the question is: what is the semantic difference between undefined and invalid? Shouldn't they be the same, as OTF2 defines an undefined definition but not an invalid one? What would be the use of an invalid one that an undefined one does not cover?

It's there as a static. Non-issue.

Passing it around still has overhead (when passed by copy).

No, it doesn't.

Try assert(def != def): this passes. Why should it?

This might be true, but is it really a problem?

Maybe. You can't generically do definition == DEF::undef(), which might be required.

Breaks the interface.

If there are reasons for it, why not? You did the same in #9. As it stands, there are two invalid states for definitions, which is IMO a mistake. If it isn't, can you define the semantics and explain why not to merge them?
Besides: apart from !is_valid() and == DEF::undef() becoming the same thing, how would a user notice? Is writing such an undefined definition even valid OTF2?

This assumption is plain wrong. Think of a trace merger.

The reference number is always relative to the archive it belongs to. So actually you cannot compare 2 reference numbers from different traces at all, because there is no relation. Or why should "Foo"(2) == "Bar"(2) hold? The trace merger should rewrite the definitions. So for every unique definition (ptr to _impl) there should be a mapping to a new, unified definition.
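A minimal sketch of that mapping (hypothetical types, not actual merger code):

#include <cstdint>
#include <string>
#include <unordered_map>

struct string_impl { std::uint32_t ref; std::string name; };

std::unordered_map<const string_impl*, std::uint32_t> remap;
std::uint32_t next_ref = 0;

// every unique input definition (identified by its impl pointer) gets
// exactly one fresh reference number in the merged archive
std::uint32_t merged_ref(const string_impl* def)
{
    auto it = remap.find(def);
    if (it == remap.end())
        it = remap.emplace(def, next_ref++).first;
    return it->second;
}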

Plus, you may not want to, or be able to, have the full information for a definition during runtime and still have to compare those definitions anyhow.

Sorry? If you only have reference numbers and want to compare reference numbers, do that. Why create a definition instance from it, where all values are in an invalid state?

Undefined means the reference number is -1 (or the OTF2_..._UNDEFINED equivalent). Invalid means the definition object was either moved from or default constructed; or data_ == nullptr. There is no OTF2 equivalent, because definitions aren't objects there.
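Restating those two states in code (a sketch using the names from this thread; the header path is an assumption):

#include <cassert>
#include <otf2xx/otf2.hpp> // umbrella header (assumed)

int main()
{
    otf2::definition::string invalid;  // default constructed (or moved from):
    assert(!invalid.is_valid());       // data_ == nullptr, no OTF2 equivalent

    auto undef = otf2::definition::string::undefined();
    assert(undef.is_valid());          // a live object whose reference number
                                       // is the OTF2_..._UNDEFINED value
}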

If there are reasons for it, why not?

Yes, if there are compelling reasons. A performance problem in hot loops is a wholly different kind of reason than being able to compare undefined and invalid definitions. I don't see enough reason here.

Sorry? If you only have reference numbers and want to compare reference numbers, do that. Why create a definition instance from it, where all values are in an invalid state?

I didn't say you have no information at all. You may still have parts of it. And you have to somehow connect the reference number to whatever the domain gives you. Why not just use an in-part dummy definition?

After thinking a bit about it over lunch, I have several points. First of all, the undefined state is the natural extension of the OTF2_..._UNDEFINED reference numbers existing in OTF2. There is one reference number in OTF2 denoting an undefined definition, and there is one undefined definition in otf2xx.

From an interface perspective, I like the idea of removing the undefined state. But what concerns me is the following. Right now, a precondition for every member function of a definition is that the definition is valid (its data_ member is not a nullptr). If we change the ref() method to return -1 if the definition is not valid, we would introduce non-negligible overhead: to get the ref() of a definition while writing an event, you'd have to perform the validity check for every definition referenced by the event. We don't have that time when writing events.

Or to put it simply: while referencing undefined definitions from other definitions doesn't make much sense, sometimes you need to (maybe indirectly) reference an undefined definition from an event, because while writing events you may not have all the information required to fill in every field. Think of sampling a stripped binary: you'll probably end up with calling contexts without any notion of which code that is.

For the comparison operator, I'm actually tempted to insert the following:

assert(a.is_valid() && b.is_valid());

This would circumvent the NaN == NaN discussion.
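Spelled out, that would look roughly like this (a simplified sketch; as stated above, the comparison relies on ref()):

#include <cassert>

template <typename Def>
bool equal(const Def& a, const Def& b)
{
    assert(a.is_valid() && b.is_valid()); // invalid operands are a caller bug,
                                          // which sidesteps the NaN semantics
    return a.ref() == b.ref();
}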

All in all, I have already spent too much time on this and I'm not convinced that removing undefined() will bring much to the table, so I'd rather stay with it, as it might be needed, even if only as a template for users on how to fill in unknown or don't-care values with at least something.

First of all, the undefined state is the natural extension of the OTF2_..._UNDEFINED reference numbers existing in OTF2

The same purpose would hold for undefined <=> nullptr

I understand the point about the additional overhead. It is essentially: int ref(){ return ptr == nullptr ? INVALID : ptr->ref; } Note that the debug build already has this check, and in both cases this incurs a pointer dereference. So what about moving the reference into base (or better still into a subclass referable_base)? In the default ctor you set the ptr to nullptr and the ref to undefined; the other ctor requires a non-null ptr and the ref. From an interface perspective nothing has changed, as the ptr and the ref are both hidden in the definition and not changeable.
The only place this can be noticed is the weak_ref, where you could do that. But I would argue that this class is a design mistake anyway, a relic from when there actually was a shared_ptr; now weak_ref is very unsafe to use. For compatibility I'd define it to the actual definition instead and add an explicit operator bool to the definition base. Actually it is a detail type anyway, so usages of it in client code are invalid. And in library code it can easily be removed.

Downside: definitions are now ~ the size of 2 pointers, but copying the additional value is free, as the atomic ref count stalls the CPU anyway.
Advantage: much faster writing, as access to ref() now means one pointer indirection less.

Additionally this fits the concept I described: The definition objects are a handle, they are essentially the reference and for convenience act like their actual definition via the included pointer.

Comparison is now trivial: implement ref == ref for referable_base and nothing for base, until one can compare e.g. a property by value or whatever.
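Putting the whole referable_base idea into one sketch (hypothetical names; the real classes will differ):

#include <cstdint>

constexpr std::uint64_t undefined_ref = std::uint64_t(-1);

template <typename Impl>
class referable_base
{
public:
    referable_base() = default; // ptr == nullptr, ref == undefined

    referable_base(Impl* ptr, std::uint64_t ref) : ptr_(ptr), ref_(ref)
    {
        // precondition: ptr != nullptr; both members stay hidden from the user
    }

    std::uint64_t ref() const { return ref_; } // no pointer indirection
    bool is_valid() const { return ptr_ != nullptr; }

    // trivial comparison: ref == ref
    friend bool operator==(const referable_base& a, const referable_base& b)
    { return a.ref_ == b.ref_; }

private:
    Impl* ptr_ = nullptr;
    std::uint64_t ref_ = undefined_ref;
};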

So what about moving the reference into base

I just tried that, until I realized that this isn't a good idea at all. It will end horribly as soon as someone tries to change the reference of a definition. It effectively means that not only can you have identical refs with different data, but you can also have different refs with identical data.

Summary of call: No real changes to current behavior. References of existing definitions are constant.

I have a working patch set. It wasn't as easy as anticipated, because the events only hold weak_refs, which also needed the ref values. I really hope this will not explode...

However, there is one more thing that came to my mind. A while back @tilsche wanted proxy definitions. This change should bring those.

Now I "only" have to merge it with #34

Why use weak_ref anyway? Just give them a full definition; weak_ref is extremely unsafe (lock is useless and does not do what it promises; the definition could just as well be freed already)

"proxy definitions"?

Just give them a full definition; weak_ref is extremely unsafe (lock is useless and does not do what it promises; the definition could just as well be freed already)

That was the whole point of them. The precondition is: the referenced definition will outlive the weak_ref. This holds in the reading case, because the definitions live in the reader, which outlives the events. And it holds in the writer case, as long as the user passes the definitions into the writer first.
Based on this, we were able to strip some guarantees from the shared_ptr (and weak_ptr) and get a performance improvement by using the intrusive_ptr / weak_ref combination. Using definition objects in events means way too much overhead, because of the reference counting necessary.
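As a simplified model of that trade-off (not the otf2xx sources): weak_ref is nothing more than a non-owning pointer whose correctness rests entirely on the lifetime precondition above.

template <typename Impl>
class weak_ref
{
public:
    explicit weak_ref(Impl* ptr) : ptr_(ptr) {} // no ref count touched

    // precondition: the owning definition is still alive
    Impl* operator->() const { return ptr_; }

private:
    Impl* ptr_; // non-owning; dangles if the precondition is violated
};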

"proxy definitions"?

Cheap definitions. Basically something like only passing around reference numbers, which the new definition object allows.

I see. So the performance improvement comes from being able to copy events faster (no ref counting in the contained defs).
I was wondering, because every time you want to access a definition from an event, it gets converted to a full definition. You are not even able to return only a reference, so if you access the definitions in events more often than you copy events, you'll be slower. Quite a gamble... For the usual writing (create an event, immediately write it) it is the same. Well, if it works, don't touch it.

I was wondering, because every time you want to access a definition from an event, it gets converted to a full definition.

The point is that for writing an event you need nothing but the ref of a definition. This is a theoretical point of view; I don't necessarily know how to make events do that. There are certain hot loops where you do not want to copy or create any smart pointer whatsoever.
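To illustrate the theoretical point (hypothetical writer function, not the real API): in the hot loop only a plain reference number crosses the call, and no smart pointer is copied or created per iteration.

#include <cstddef>
#include <cstdint>

void write_enter(std::uint32_t region_ref); // stand-in for the real event writer

void write_samples(const std::uint32_t* regions, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        write_enter(regions[i]); // only the ref is needed to write the event
}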

I've just seen that there is a friend declaration there, so the writer indeed has no overhead.

I would even make the weak_ref much smaller: essentially just a holder of the definition_impl and (optionally) a ref, with it being implicitly convertible to a definition. All those pointer operators may lead to misuse... Or one could make it fully registry based, so all definitions live in the registry and one only needs the weak_ref and never a full definition. It would only change access from region.name() to region->name(), which most IDEs convert automatically. Just an idea for later.
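A rough sketch of that registry idea (hypothetical names): definitions live in the registry, users hold only small handles, and access goes through operator->.

#include <deque>
#include <string>
#include <utility>

struct region_impl { std::string name; };

class registry
{
public:
    struct handle
    {
        region_impl* ptr;
        region_impl* operator->() const { return ptr; } // region->name()
    };

    handle create_region(std::string name)
    {
        impls_.push_back({std::move(name)});
        return handle{&impls_.back()}; // deque keeps addresses stable
    }

private:
    std::deque<region_impl> impls_; // all definitions live here
};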