eclipse-cyclonedds/cyclonedds-python

idl enum generator

nate800 opened this issue · 19 comments

The idl generator allows for a python type to be generated from the following definition.

enum example {
  None,
  Another
}

When generating python code, None is a keyword.

Hi @nate800,

I do agree with you this is an issue. I am having some trouble with how to fix this without destroying assignability in XTypes, because if you change member names or literal names that check will fail. Some "shadow administration" with fake vs real member names will be necessary. For now I recommend you "just work around it" and wait till either I get around to go through the effort of reworking the name administration or someone else does.

IIRC, somewhere in the IDL spec it says a leading underscore is used to work around keywords. Something like a leading underscore is stripped off from symbols on input, and prepended to any symbol that clashes with a keyword in the target language.

I suppose being masters of the universe, the OMG need not worry about the possibility of a language disallowing a leading underscore and all kinds of other fun things. Or perhaps I have not read the spec carefully enough to have noted that way this transformation is done is language dependent, with the leading underscore simply being the one used for C/C++.

Hello, @eboasson and @thijsmie , I don't know if I have understood you well.

I have tried this IDL code:

module vehicles
{
    enum example {
      _None,
      Another
    };

    struct Vehicle
    {
        string name;
        int64 x;
        int64 y;
    };
};

When I compile it with IDLC for python I get this code (notice that it is generated without the leading underscore). This code it is not going to work for the None keyword:

class example(idl.IdlEnum, typename="vehicles.example", default="None"):
    None = auto()
    Another = auto()

When I compile it with IDLC for C I get this code:

typedef enum vehicles_example
{
  vehicles_None,
  vehicles_Another
} vehicles_example;

If we follow the C philosophy for python, we wouldn't have the problem with the keyword None, we could generate with IDLC for python code like this:

class example(idl.IdlEnum, typename="vehicles.example", default="None"):
    vehicles_None = auto()
    vehicles_Another = auto()

Could this solution be acceptable?

Hi @javiersorianoruiz,

While this proposition works in your example, there are some problems with it:

  • What about enums in the global namespace (e.g. not in a module). They still suffer the same problem with keywords.
  • It is not "developer-ergonomic" to force the user to type long strings, especially if there are 15 nested modules.

I see the real solution as follows:

  1. Add a cyclonedds.idl.annotations.member_name annotation that allows overriding the member name.
  2. Ensure that this annotation is used by cyclonedds.idl._xt_builder.XTBuilder when determining member names.
  3. Add a filter to idlpy that modifies names that are keywords. The OMG spec dictates we should prefix with an underscore (https://www.omg.org/spec/IDL/4.2/PDF section 7.2.3.1) which has some semantic meanings in Python but we'll have to ignore those, but we can add a command line option to use an Unicode character instead (again, not very developer-ergonomic).
  4. Implement the keyword filter in pure python too and apply in cyclonedds.idl._xt_builder.XTInterpreter when dynamically resolving types.

If those steps seem feasible to go through for you I would be happy to guide you towards making a PR. Join the discord for a chat for some tighter communication loops 😄

Hi everybody.

We (@Bit-Seq, @jordialexalar and I) have continued working on this issue. We have found a complete solution, it would be a new feature, that it could solve this problem for any target language (C, C++ or Python) and any reserved keyword.

The change will be very located, under the scope the IDLC compiler (cyclonedds tool), concretely in the source code file cyclonedds/src/idl/src/scanner.c (line 797), in the method tokenize, after checking if an analysed grammatically token is a keyword, a new check function, to be implemented by us, has to be added. This new function will check any possible identifier against 3 new lists of reserved keywords for every of the languages supported: C, C++ and Python.

We propose a new flag for IDLC compiler to control the behaviour of this new feature.

If this new flag is not set, the keyword check will not be performed (old behaviour is maintained).

A new flag -K will be defined, following the code seen in the IDLC for other similar parameters it could have this form:

&(idlc_option_t){
    IDLC_STRING, { .string = &config.keywords }, 'K', "", "warning|error",
    "Reserved keywords checking for any language suported.\n\t - warning: IDL conversion will be performed, and warnings will be displayed for each conflicting keyword indicating the language(s) in conflict\n\n\t - error: IDL conversion will not be performed, and an error message will be displayed for the conflicting keyword indicating the language(s) in conflict" },

As you can see in the code when this new flag is set, two values are possible for the flag: warning or error.

  • With the warning value, the IDL conversion will be performed, and warnings will be displayed for each conflicting keyword indicating the language(s) in conflict.

  • With the error value, the IDL conversion will not be performed, and an error message will be displayed for the conflicting keyword indicating the language(s) in conflict.

We have already implemented a prototype of this new function, for the problem of using the reserved keyword None. It has been integrated with the actual IDLC compiler in order to verify that this solution is doable, and the results have been successful. The reserved keyword None is detected during the IDLC conversion, and a warning or error can be displayed to the developer.

The change is very located and isolated. No risks have been identified, and it will have a low and depreciable impact in the performance for IDLC compilation process.

Our question is, should be see this as a “bug-fix” and use 105 as change-request? Or would it better to add a (new) “enhancement” issue and link them?

Any feedback on the preliminary options-names for the flags are welcome. More details will be added to the cyclonedds-docs.

Please, @thijsmie and @eboasson or someone else who can guide us, what do you think about the new feature that we are proposing to implement?

Hi @javiersorianoruiz,

Let me start by saying I really appreciate the effort you are putting into this, this is a real issue that needs fixing which at the moment we do not have time for. That is why I feel a little awkward in saying that no, what you propose is not a good solution of the actual problem. Let me go through why I think that in a couple of notes, and why I stand by my original idea where we implement a solution for the Python backend.

  • The problem is already "solved" in the OMG spec: it simply states that you should prefix with a "_" anytime you come across a keyword in your target language. Leading underscores are not allowed in IDL names so no chances for collisions exist.
  • This solution is already implemented for C and C++ backends and has been in use since before this millennium started in DDS/CORBA related worlds.
  • Emitting a warning/error doesn't actually solve the problem, it just notifies that there is one. What if you have existing C applications compiled last in 2005 running in your network with a "None" field in there? The sad truth is that that situation is not as ridiculous as you expect...
  • Maintaining keyword lists per language does not really scale. Not all cyclone idl backends are in this even in this github organisation! (I have a couple private ones even, Erik has a Haskell one and possibly more, there is a rust one out there, a C# one is in the works and who knows what is in peoples private repositories out there). If I am building something with C and C++ what do I care if I match a Java keyword? I would never enable those warnings if they bug me about programming languages I don't care about 😅 This leaks details about all possibly supported languages into the core repository which we do not want to do.

I hope this doesn't come over as harsh. I don't want to discourage you from contributing! I do hope that I've made my thought process clear and why I think we need to solve this on the Python end by the strategy I outlined in my previous comment.

Hello @thijsmie.

Ok, we understand your point of view.

We have been checking the IDLC compilation for C++.

For an IDL with this code:

module HelloWorldData
{
  struct Msg
  {
    long userID;
    string message;
    string void;
  };
  #pragma keylist Msg userID
};

We have seen that for any variable the suffix _ (not a prefix) is added to the end of the identifier in C++.

Also, if the name of the identifier is a C++ reserved keyword (in the above example, void) is added this prefix _cxx_ .

So, we obtain the following C++ code after IDLC compilation:

class Msg
{
private:
 std::string message_;
 std::string _cxx_void_;

public:
    message_(message),
    _cxx_void_(_cxx_void) { }

  void message(std::string&& _val_) { this->message_ = _val_; }
  const std::string& _cxx_void() const { return this->_cxx_void_; }

This C++ code compile with no problems. So, we think that for C++ the issue 105 is solved.

On the other hand, for C, we have seen that this behaviour is not followed.

With the same IDL we have generated C code with the IDLC compiler, and we have gotten this C code:

typedef struct HelloWorldData_Msg
{
  int32_t userID;
  char * message;
  char * void;
} HelloWorldData_Msg;

We don't see the suffix _ (not a prefix) added to the end of the all identifiers.

We don't see either the suffix _c_ (or similar) for the C reserved keyword void, and evidently this C code doesn't compile.

So, we have two questions:

  • Is not Issue 105 solved for C?
  • For solving issue 105 for Python, do we have to follow the same behaviour as in C++? (adding the suffix _ at the end of all identifiers and for Python reserved keywords we added at the beginning of the identifier the prefix _py_ (or similar)

Thanks and regards

Hi @javiersorianoruiz,

Indeed for C this issue is not solved, and it should be! I think we're all so used to C keywords in IDL land that we never needed it implemented. void should result in _void clearly. For Python we should solve by adding the _ prefix to reserved keyword, no other changes needed on names, just needing to add the member_name annotation as well for XTypes type object compatibility as described above.

Hello @thijsmie,

We see possible problems of interoperability between different languages if we implement this solution, maybe we are wrong and need more information.

For instance, if we define an IDL like this:

module HelloWorldData
{
  struct Msg
  {
    long userID;
    string break;
  };
  #pragma keylist Msg userID
};

For C++ the IDLC compiler generates a C++ code where the identifier break is translated into _cxx_break (because is a reserved keyword in C++).

For Python, when in the next future we implement the change, the IDLC compiler will generate a Python code where the identifier break is translated into _break or _py_break.

We have done manually this Python translation. The generated Python code has been manually modified, the identifier's name has been changed to _break. We have a C++ publisher sending this HelloWorldData.Msg message. The C++ subscriber does receive the message, but the Python subscriber doesn't.

We have done another test, where the IDL has a C++ reserved keyword (in this case void), which is not a Python reserved keyword. So, in C++ we have a translated identifier _cxx_void, but in Python, we have the unmodified identifier void.

For this test, both, C++ and Python subscribers, receive the message sent by the C++ publisher. So, we think that in some point, we don't know where, in the cyclonedds-cxx code, the identifier _cxx_void is being translated again to void, before doing the sending, but where?

We guess that we have to follow the same logic for Python. Working from the point of view of the programmer, with identifiers like _break, but internally, cyclonedds-python has to send and receive messages with identifiers like break, in order to maintain interoperability between languages.

Are we right?

Yep, you are fully right. That is why the generated classes need an option to retain the original member/type name. The implementation here functions as IDL annotations (see cyclonedds/idl/annotations.py. By adding a @class_real_name(name: str) decorator and a member_real_name(member_name: str, real_name: str) function and then accounting for those in XTBuilder and XTInterpreter in cyclonedds/idl/_xt_builder.py you can achieve the desired effect. Then these annotations should be emitted by idlpy when it detects a python keyword.

Hello @thijsmie.

We have this proposal about decorators.

Imagine that we have the following IDL

module HelloWorldData
{
  struct Msg
  {
    long userID;
    string message;
    string break;
    string None;
  };
  #pragma keylist Msg userID
};

This IDL will be translated into Python with the following decorators (notice the new decorator @annotate.keywords , using a python dictionary):

@dataclass
@annotate.keylist(["userID"])
@annotate.final
@annotate.autoid("sequential")
@annotate.keywords({'_py_break': 'break', '_py_None': 'None'})
class Msg(idl.IdlStruct, typename="HelloWorldData.Msg"):
    userID: types.int32
    message: str
    _py_break: str
    _py_None: str

Moreover, we will have to create the following function in the file cyclonedds/idl/annotations.py to attend decorator execution

def keywords(list_of_keyswords: Dict[str,str]):
    ....

Do we go by the right direction?

Hi @javiersorianoruiz,

This is indeed in the right direction, however I propose you don't use the class decorators in this case but the member modifiers. Since they might be used for more than just 'keywords' a more generic 'member_name' would be better. Next, I think a plain _ instead of _py_ is sufficient (and might line up better with the other backends). I've updated your example below:

@dataclass
@annotate.final
@annotate.autoid("sequential")
class Msg(idl.IdlStruct, typename="HelloWorldData.Msg"):
    userID: types.int32
    annotate.key('userID')
    message: str
    _break: str
    annotate.member_name("_break", "break")
    _None: str
    annotate.member_name("_None", "None")

Hello @thijsmie,

While we were making the changes to achieve use Python keywords in the IDL we think we have encountered an interoperability problem.

When we publish a message in C++ and another user working in Python subscribes to that topic, the subscriber receives messages.

But, when the publisher is the Python user and the subscriber is the C++ user, the subscriber is not able to receive any message.

We have verified that the problem persists with the main branch:

  • Python cyclonedds environment: version 0.10.2
  • C++ cyclonedds environment: version 0.10.1
  • gcc version: 11.1

Is this an error and we should open an issue?

Regards.

The careful consideration of names of members with keywords is part of this issue, let's get the solution @javiersorianoruiz is working on merged and then re-review the interoperability.

I am working with @javiersorianoruiz and as I said the interoperability problem was happening before any change from our side, that is, the problem is happening with the main branch.

We will do what you say, we will focus on solving members names with keywords.

Possibly the C++ branch is not careful enough in retaining the original member name in the XTypes typeobject. This will need some separate research. You can check by:

  1. Creating a topic with C++ with a keyword named field
  2. Checking the type of that topic as it comes over the wire with cyclonedds typeof topicname

If that is indeed broken an issue can be submitted to the cxx branch.

We have done this test, a cyclonedds-cxx publisher and other cyclonedds-cxx subscriber using a c++ reserved keywords.

module HelloWorldData
{
  struct Msg
  {
    @key
    long userID;
    string break;
  };
};

We have successfully checked that subscriber receives the message.

So, we think cyclonedds-cxx is correctly managing the use in the IDL of c++ reserved keywords.

We don't know how to check the second point that you indicate us in the previous comment:

"2. Checking the type of that topic as it comes over the wire with cyclonedds typeof topicname"

But we think cyclonedds-cxx is working properly.

Even with a cyclone version just downloaded and not using any keyword named field, the problem we have seen it is about interoperability between a cyclonedds-python publisher and cyclonedds-cxx subscriber.

In the other direction it works well (cxx -> python).

We are using an IDL without any keywords:

module HelloWorldData
{
  struct Msg
  {
    @key
    long userID;
    string message;
  };
};

To verify all the possibilities, we have also tested :

  • a cyclonedds-cxx publisher and and cyclonedds(C) subscriber, and it has worked. (in the opposite direction it has worked as well).
  • a cyclonedds-python publisher and cyclonedds(C) subscriber, and it has worked. (in the opposite direction it has worked as well).

So it seems is only not working with cyclonedds-python publisher and cyclonedds-cxx subscriber.

Are you sure it is not the same issue as earlier with the fix posted here: eclipse-cyclonedds/cyclonedds-cxx#331 ?

This was solved by the great work of @javiersorianoruiz and @Bit-Seq!