Parameterize a script.Pipe with a user data struct?
fbaube opened this issue · 6 comments
There are many pipeline packages "out there", but script seems to be one that gets it conceptually right.
Question: Should it be possible to attach a data structure to a pipeline using generics? Something like
type ParamPipe[T any] struct {
    UserData T
    Pipe
}
Then a processing pipeline for an instance of a user-defined struct could easily be constructed in a one-liner, and new functions could process the user data specifically.
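A minimal sketch of what I have in mind, as a wrapper outside script itself (ParamPipe, NewParamPipe, and chunkMeta are illustrative names, not part of the script API):

package main

import (
    "fmt"
    "log"

    "github.com/bitfield/script"
)

// ParamPipe carries arbitrary user data alongside an embedded *script.Pipe,
// so all of Pipe's methods are still promoted onto it.
type ParamPipe[T any] struct {
    UserData T
    *script.Pipe
}

// NewParamPipe is the "one-liner" constructor: attach data to an existing pipe.
func NewParamPipe[T any](data T, p *script.Pipe) *ParamPipe[T] {
    return &ParamPipe[T]{UserData: data, Pipe: p}
}

type chunkMeta struct {
    SourceFile string
}

func main() {
    p := NewParamPipe(chunkMeta{SourceFile: "intro.md"}, script.Echo("# Intro\nsome content\n"))
    // A new function could consult both the stream and the attached user data.
    lines, err := p.CountLines()
    if err != nil {
        log.Fatal(err)
    }
    fmt.Printf("%s: %d lines\n", p.UserData.SourceFile, lines)
}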
I wish to write processing pipelines for chunks of content, and script's bash-style primitives provide a lot of helpful functionality.
Thanks for the suggestion, @fbaube! Can you come up with an example of the kind of program you'd like to write using this idea? That'll help me get a clearer picture of how it might work.
This is really similar to what I was messing with but never got around to doing.
Can I hit you with a use case?
I have 3 folders, and each is sort of its own binary microservice.
When one folder changes, I want to raise a change event to a broker like NATS. This is done by a filesystem watcher.
NATS broadcasts it to the other folders' binaries, which try to do some work and change their own file systems, and this raises more events.
This is called choreography. It's bottom-up workflow piping, where the workflow emanates from whatever file-change events are being broadcast and who is listening.
It's simple, like this project, and the schema is just which file changed in which project.
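Roughly what I mean, as a minimal sketch assuming fsnotify for the watcher and nats.go for the broker (the subject name "files.changed" and the path "./data" are made up); each folder's binary would run something like this:

package main

import (
    "log"

    "github.com/fsnotify/fsnotify"
    "github.com/nats-io/nats.go"
)

func main() {
    nc, err := nats.Connect(nats.DefaultURL)
    if err != nil {
        log.Fatal(err)
    }
    defer nc.Close()

    // React to events broadcast by the other folders' binaries.
    nc.Subscribe("files.changed", func(m *nats.Msg) {
        log.Printf("peer changed: %s", m.Data)
        // ... do some work here, which may change this folder and raise more events
    })

    // Watch this binary's own folder and publish change events to the broker.
    watcher, err := fsnotify.NewWatcher()
    if err != nil {
        log.Fatal(err)
    }
    defer watcher.Close()
    if err := watcher.Add("./data"); err != nil {
        log.Fatal(err)
    }
    for event := range watcher.Events {
        // The "schema" is just the path of the file that changed.
        nc.Publish("files.changed", []byte(event.Name))
    }
}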
Can you come up with an example of the kind of program you'd like to write using this idea? That'll help me get a clearer picture of how it might work.
My goal is something like a DSL for processing Lightweight DITA. LwDITA is a version of DITA with a greatly reduced tag set, plus support for HTML5 and Markdown. This is where script would be used, and I would create new ParamPipe functions.
(As an aside, I figure that when I have M pipelines for M files, each with N processing stages, there are a number of ways that this load could be distributed across multiple processors.)
So in the CLI program, the processing for a file looks (or will look) something like this:
- Gather CLI references to files and directories
- Expand directories into file lists
- Process in-file metadata (e.g. HTML <meta> elements)
- Read file content
- Analyze file content (MIME type? Is XML? Has DOCTYPE? Is valid XML? etc.)
- Parse file into an AST (e.g. using goldmark for Markdown, stdlib for HTML5 and XML)
- Extract "interesting" links (cross-references, ToC entries, etc.)
- (Note that up until this point, each file can be processed in isolation)
- Resolve and check validity of inter-file links
- Prepare file set for XSLT processing
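The first couple of steps map straight onto existing script primitives; here is a rough sketch of them (the later, LwDITA-specific stages would become the new ParamPipe functions):

package main

import (
    "fmt"
    "log"
    "os"
    "strings"

    "github.com/bitfield/script"
)

func main() {
    // Steps 1-2: expand a directory given on the CLI into a file list.
    // Match keeps only the Markdown sources here, as an example.
    files, err := script.FindFiles(os.Args[1]).Match(".md").Slice()
    if err != nil {
        log.Fatal(err)
    }
    // Steps 4-5: read and analyze each file; up to the link-resolution step,
    // every file can be handled in its own pipeline.
    for _, f := range files {
        content, err := script.File(f).String()
        if err != nil {
            log.Fatal(err)
        }
        isXML := strings.HasPrefix(strings.TrimSpace(content), "<?xml")
        fmt.Printf("%s: %d bytes, XML=%v\n", f, len(content), isXML)
    }
}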
fred
That sounds great! So what would the script code look like to do this?
Good question! I already have code that looks like this:
return p.
    st1a_ProcessMetadata().
    st1b_GetCPR().         // Concrete Parse Results
    st1c_MakeAFLfromCFL(). // Abstract Flat List from Concrete Flat List
    st1d_PostMeta_notmkdn()
A pure DSL, though, would need to deal with how a list of N files fans out into N separate pipelines. I'm not sure whether script can do this, and it's not a typical task for a shell script either. I don't know whether there is a best practice for DSLs to do this.
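One possible answer to the fan-out question, as a sketch: start one pipeline per file in its own goroutine and wait for the set. Here processFile stands in for the chained st1a_.../st1d_... stages above, and the paths list is hard-coded where the real program would expand the CLI arguments:

package main

import (
    "log"
    "sync"

    "github.com/bitfield/script"
)

// processFile stands in for the chained stage methods above.
func processFile(path string) error {
    _, err := script.File(path).CountLines() // placeholder stage
    return err
}

func main() {
    paths := []string{"a.md", "b.md", "c.md"} // would come from the CLI/dir expansion
    var wg sync.WaitGroup
    for _, path := range paths {
        wg.Add(1)
        go func(path string) {
            defer wg.Done()
            if err := processFile(path); err != nil {
                log.Printf("%s: %v", path, err)
            }
        }(path)
    }
    wg.Wait()
}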
To [Param]Pipe I would also add a debug io.Writer and a DB connection.
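A compact sketch of that extended wrapper (the package and field names are illustrative):

package pipex

import (
    "database/sql"
    "io"

    "github.com/bitfield/script"
)

// ParamPipe bundles the pipe with user data, a debug sink, and a DB handle
// that the stage functions can share.
type ParamPipe[T any] struct {
    UserData T
    Debug    io.Writer
    DB       *sql.DB
    *script.Pipe
}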
@bitfield what're your thoughts on this issue?
A generic pipe type is an intriguing idea—I don't think it would fit into script, but it could make an interesting package of its own.