lukehutch/pikaparser

Represent the parsing process in BPMN 2?

elfring opened this issue · 10 comments

It was chosen to represent the pika parsing algorithm in Java source code.
🤔 How do you think about to convert available information into a representation which would be supported by the standard “Business Process Model and Notation”?

Sorry, what would be the value of representing the algorithm in BPMN 2 notation?

I wasn't familiar with this notation, but looking at some BPMN diagrams, this looks derived from the old-school flowchart techniques of the 80s. These went the way of the dodo, and with good reason -- designing algorithms with flowcharts typically leads to spaghetti code. They are really only useful for large-scale design where you have multiple independent systems interacting via information transfer of some sort.

Specifying an algorithm using a working prototype accomplishes numerous important things:

  1. It completely eliminates ambiguity, since a real programming language is unambiguous in syntax and semantics (I am very against the use of pseudocode in papers, for this reason). It allows readers of the paper to see how the algorithm described in the paper can actually be implemented (many algorithms described in papers that do not come with source code are exceptionally hard to recreate).
  2. It proves that the algorithm works for some set of testcases, and allows further testcases to be written to strengthen that proof. It also in some cases allows for formal proofs of correctness.
  3. It allows interested parties to tinker with the algorithm without having to reimplement it themselves.

You are welcome to model the algorithm in BPMN 2 if you would like, and if it's useful, I could consider adding it to this repository. However I don't see the value in doing it myself. Sorry!

I imagine that additional data processing representations can help to convert your reference implementation into variants for other programming languages.

Ah, OK. Well I tried to write in a very generic style, which uses almost no Java-specific idioms. It should be very straightforward to port the reference parser to other programming languages. I bet a Python expert could do it in an afternoon, for example.

🤔 I imagine also that a possible conversion of the reference parser can become more challenging for programming languages which do not provide a convenient standard library so far.

Literally the only standard library routines that this parser uses is "read all the lines from a file". So if your programming language doesn't support that, you have much bigger problems!

Everything else that could be construed as standard library support is really basic stuff like string concatenation, and data structure stuff like the use of a priority queue. There are almost no standard library calls in the whole codebase. Priority queues are data structures 101, so if a language doesn't have support for those in either the standard libraries or some open source library, then it must be a very early-stage programming language indeed, and there are thousands of webpages that show you how to implement those using a heap.

The data processing convenience is varying between available programming languages (also according to function or class libraries to which you got used to).

Well languages use very different paradigms for almost everything, yes. But take a good look at the code, porting will not be complicated. However there are many tricky details to get right in an implementation that you would have to figure out from scratch if implementing from some sort of spec. It will be much easier to port the code directly, rather than trying to re-implement the algorithm from a spec, whether in pseudo code or in BPMN or other flowchart system.

Did you come along any efforts to transform Java code into code variants for other programming languages (which do not share similar virtual machine bytecode) automatically?

I have many times ported code (automatically, semi-automatically, and/or manually) to/from other programming languages, yes. I haven't tried that with this parser. But I consciously thought about porting when I wrote the code, since it is a reference parser, so I'm telling you, it's about as portable as it could possibly be.