mransan/ocaml-protoc

allow adding ppx annotations on generated types

Closed this issue · 17 comments

For example, I need to add [@@deriving sexp] to all of the types generated by ocaml-protoc. Would be great to support this feature.

Thanks for this nice project. It has been very useful.

for [@@deriving sexp] I believe that you will also need to be able to open the appropriate top level module (Sexplib.Std/Conv?)

So we could do something like:

message Person {
    (option ocaml_ppx_annotation) = "@@deriving sexp";
    required string name = 1;
    required int32 id = 2 [(ocaml_type) = int_t];
    optional string email = 3;
    repeated string phone = 4;
    map<string, string> details = 5 [(ocaml_container) = hashtbl];
}

The string specified in the option would be simply be propagated to the generated type as is.

Another thing to consider is that some deriving ppx's allow for attributes on the fields themselves. Check out the show plugin for example.

Would be nice to support this as well.

Hi @rgrinberg, @agarwal

I implemented the support for ppx extension annotation for messages and enums. For instance:

import "ocamloptions.proto";

enum E {
  option (ocaml_enum_ppx) = "@@deriving show"; 
  EONE = 1; 
  ETWO = 2;
}

message M {
  option (ocaml_type_ppx) = "@@deriving show"; 
  required int32 f1 = 1;
  message Sub {
    required int32 sub_f1 = 1;
  }
  required Sub f2 = 2; 
  required E f3 = 3;
}

Would generate:

type e =
  | Eone 
  | Etwo 
[@@deriving show]

type m_sub = {
  sub_f1 : int32;
}
[@@deriving show]

type m = {
  f1 : int32;
  f2 : m_sub;
  f3 : e;
}
[@@deriving show]

Can you try it and let me know what you think.

As a a concequence I am thinking of removing generating the generation of pp_<type name> function the the _pb.{ml|mli} since it can be trivially added with @@deriving show. Let me know if that would be a concern.

Thanks for working on this. IIUC, this requires modification of the .proto file. In my case, I'm getting the .proto files from an external project that I don't control.

Correct it requires the modification of the .proto file. Depending on how frequent you get the updates merging the new .proto files might not be that bad and probably better than modifying the generated code.

A nice alternative would be to have a file level option which would apply the ppx to all the file types. You would then only need a single line update.

Depending on how frequent you get the updates

On every fresh build of my project. :) Of course, I could limit this, but right now I literally do wget calls as part of my build.

merging the new .proto files might not be that bad and probably better than modifying the generated code.

The files I'm dealing with have nearly 150 type definitions. Manual modification in any form is cumbersome.

I was thinking a more automated solution would be possible. For example a command line option -add-ppx-to-types "[@@deriving show]", which would blindly add that to every generated type definition.

for [@@deriving sexp] I believe that you will also need to be able to open the appropriate top level module (Sexplib.Std/Conv?)

You're correct.

Another thing to consider is that some deriving ppx's allow for attributes on the fields themselves. Check out the show plugin for example.

Right, so a general solution could get tricky unless we could use ocaml-protoc as a library? Could a function be exposed that let's us specify a hook that post-processes the result of ocaml-protoc's default work?

I would need to think a bit more about the library idea... we probably need some placeholders in the ocaml_type AST so that the caller can add data.

For the field attribute i think this could easily be implemented by introducing a "ocaml_field_ppx" custom option.

The required open statement is a bit more difficult and adhoc to this specific ppx. Looks like all the other ppx deriving are not like that!

The required open statement is a bit more difficult and adhoc to this specific ppx.

I'm not quite sure why ppx_sexp_conv does this, but other ppx rewriters do require something similar. I can't remember which one right now, but I recall some discussion about rewriters that allow customizing the "runtime" library that gets used. For example, the rewriter generates code containing List.existsi, but what library is providing such a function? I use Core, which provides such a function, so for me having an open Core.Std could satisfy the requirement. However, someone else is using only stdlib, which doesn't have this function, so they need to make it available some other way. So the ppx rewriter (call it foo) requires that the user do open Foo_runtime in the context of the generated code, i.e. the user can implement foo's runtime however they want.

Regarding the library idea, actually there's nothing to do. It already exists. You can always apply multiple rewriters. If you want to do something arbitrarily general, then writing your own rewriter is the solution. So I retract that proposal. This project should only provide a more specific solution that its users are likely to commonly want, e.g. adding annotations to generated types.

Regarding that, I still would push for a solution that doesn't require modifying .proto files. One of the benefits of .proto is that it is language agnostic. Adding OCaml specific stuff within your proto definitions seems the wrong direction.

@mransan the required open statement can be added without modifying the source by compiling the resultant module with something like -open Core.Std I believe.

the required open statement can be added without modifying the source by compiling the resultant module with something like -open Core.Std I believe.

Yes, but I wouldn't promote using -open as a requirement. The meaning of your code is no longer defined by the code itself, but also the commands it gets compiled with. (Of course, that's also what ppx is all about, but we already know ppx is a hack that the community decided is worth the benefits.)

@agarwal that last PR #90 should implement your workflow. I added 2 new file options:

  • ocaml_file_ppx : this allows the specification of file level ppx (ie [@@@ ... ])
  • ocaml_all_types_ppx: this allows the bulk specification of the type level ppx

Then subsequently i added support for all custom file options from the command line:

$ ./ocaml-protoc --help
ocaml-protoc -ml_out <output_directory> <file_name>.proto
  -debug enable debugging
  -I include directories
  -ml_out output directory
  -int32_type int32_type file option
  -int64_type int64_type file option
  -ocaml_file_ppx ocaml_file_ppx file option
  -ocaml_all_types_ppx ocaml_all_types_ppx file option
  -help  Display this list of options
  --help  Display this list of options

Can you try the following:

ocaml-protoc -ocaml_all_types_ppx "deriving show" -ml_out ./  blah.proto

Let me know!

Regarding that, I still would push for a solution that doesn't require modifying .proto files. One of the benefits of .proto is that it is language agnostic. Adding OCaml specific stuff within your proto definitions seems the wrong direction.

Command line arguments cannot solve everything as it can be hard to support message/enum/field options. Google is already using those extensions to affect the generated code in the backend they support; hence I think it's reasonable to be consistent with their approach.

I hope the last PR strikes a balance by allowing all file options to also be accessible on the command line. For all other options (field/enum/message one would have to modify the .proto).

As a side node, a workflow I have seen used and recommended to decouple application is to share the .proto (or other schema format) at the moment you integrate an API. The client would take a copy of the schema (annotate if needed) integrate the API and rely on the backward compatibility of the API. If the API evolve and new feature are added (in a backward compatible way) then whenever it is convenient, the client can choose to sync up its schema file and use the new functionality. This workflow allows for true decoupling of software components and independent releases. The drawback is that the backward compatibility constraint is likely to increase invariants and harder to understand schemas.

Thanks. This seems to work pretty well. I notice it doesn't compile with OCaml 4.02.3 so maybe you want to add available: [ocaml-version >= "4.03.0"] to the opam file.

Do you plan to add support for the deriving sexp requirement of adding some code to the top of the file? I'll be perfectly happy without that if you think it doesn't merit support. I can easily do something in my build script to insert extra text myself.

Thanks for all this work!

I fixed the compilation issue in 4.02.3 and released a new opam version with all the new functionality!

Thanks for all your feedback. If no one objects I will remove soon the pp_<type> function as mentioned in #91

@mransan Thanks!