nopgen: A Shell repository from mcamiano

NopGen was an exploratory project to consider how to factor out the dependencies among source code modules, derive common patterns, and integrate new modules in terms of existing patterns.

NopGen was a compositional language, mapping data streams into source code.

At the time I was learning a bit of calculus, and it was very exciting to consider the possibilities.

The NopGen markup, using delimited text with attributed blocks, was purely a case of convergent evolution.
I did not know about SGML while implementing NopGen, and XML didn't yet exist even as hype.
The notation was inspired by a need to conserve the source code identity, byte for byte, so that two factorings could be demonstrated to be equivalent.
Then I discovered Charles Goldfarb's SGML Bible, and a more literate colleague mentioned the entire family of LISP languages.
It was deflating to know that others had explored this wheel decades before my clunky, squarish version.

NopGen was a skunkworks project. Implemented in AWK, the tool was slow and buggy.
Fixing it is not worth the effort, given that a multitude of interconnected templating systems and transformational languages have far surpassed its original intent.
It is now trivial to use PHP or XSLT or any of the many Ruby DSLs as a source code composition engine, and some languages like Ruby have metacoding as a core idiom.

Nopgen got me interested in SGML, then Scheme and DSSSL, XML and XSLT.

NOPGEN

released to the public domain

No claim is made as to the appropriateness of this software to any task.

Use at your own risk.

No warranty is expressed or implied.

Abstract

Dependencies exist throughout typical application software, and they make
the code inflexible to external and internal changes. For example, most
data-intensive applications use some form of data dictionary to
maintain data structure information. Yet, during editing,
the implementer may make only cursory manual use of that information. The
resulting software text has numerous points at which data structure
'meta-data' is hard coded, unfactored, and unidentified. When (not if)
the structure of the data changes, maintenance costs increase because
each of those points must be found and changed by hand.

Nopgen is a source code manipulation tool that allows dependency
factoring. It is similar to a lexical analyzer and parser, with both
user-specified and predefined patterns and actions. Output is generated
by automating the typical editing process, using a predefined template
as input. A rough template representative of the output text is factored
so as to remove dependencies. The dependencies are documented in the
template text as they are removed, and rules are defined and stored
which are later used to recreate documents similar to the original.

Text Generation Model

The text generator works by editing a file containing source text, using
simple user-defined patterns and actions. A pattern and action together
are known as a "coupling". A coupling represents a dependency in the
source text. A file containing coupling definitions, referred to as a
"junction box," provides a single point of change for maintaining
these dependencies.

The source files, called pattern files, provide the baseline text from
which the output is generated. Couplings are included in the pattern
file at points referred to as "fittings." A fitting is a change point
in the pattern file text. It specifies, through Nopgen statements,
the effect that a dependency has on the baseline text.

Couplings represent lists, or sets, of items. They are connected to a
pattern file by inserting fittings. The fitting associates a coupling with
unblocked text (a plain macro), or with blocked text (via Nopgen statements).
An unblocked text fitting is made by prefacing and suffixing a coupling name
with the demarcation patterns, or by inserting an EVALUATE statement.
A fitting to a BLOCK-defined pattern is made with the PASTE statement.

When a coupling is accessed, it is evaluated as if it were a readable file
containing multiple space-delineated fields, in multiple newline delineated
rows. A PASTE statement can edit a text pattern in a BLOCK by using the
information supplied by a coupling. The text read from a coupling can
also be evaluated and reevaluated inline using the EVALUATE statement.
This allows the coupling mechanism to be used for pattern file text
inclusion.
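As a rough illustration of the coupling contract described above, the Python sketch below runs a coupling's command line through the shell and splits its output into newline-delimited rows of space-delimited fields. The function name `read_coupling` is hypothetical, and skipping blank lines is an assumption; this is not Nopgen's AWK implementation.

```python
import subprocess


def read_coupling(command):
    """Evaluate a coupling: run its command line and parse the output.

    Returns a list of rows; each row is a list of space-delimited fields.
    Blank lines are skipped (an assumption made for this sketch).
    """
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, check=False
    )
    rows = []
    for line in result.stdout.splitlines():
        # Split on single spaces and drop empty strings left by repeated spaces.
        fields = [field for field in line.split(" ") if field]
        if fields:
            rows.append(fields)
    return rows


# Hypothetical coupling command that "returns" two rows of table names.
print(read_coupling("printf 'customer orders\\nitems\\n'"))
# [['customer', 'orders'], ['items']]
```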

Description of the Code Generation Process

To produce output, couplings are joined to the pattern file via fittings.
A coupling is evaluated at code generation time, and the results are used to
control editing of the pattern file template.

There are three ways in which pattern file text can be processed
when a coupling is evaluated.

The first is simple replacement of a fitting with the value of its coupling.
In this case, text is processed line by line and passed through with one line
of output (usually) per line of input. This type of fitting is essentially
a type of macro expansion.
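A minimal sketch of this macro-style replacement, assuming the default "$-" and "-$" demarcation and coupling values that have already been evaluated into a dictionary. `expand_fittings` is a hypothetical helper; an unknown or empty coupling expands to the null string, following the "do nothing" default described for unblocked text.

```python
import re

# A discrete fitting under the default demarcation: $-name-$
FITTING = re.compile(r"\$-([A-Za-z_][A-Za-z0-9_]*)-\$")


def expand_fittings(line, coupling_values):
    """Replace every $-name-$ in one line with its coupling's value."""
    return FITTING.sub(
        lambda match: coupling_values.get(match.group(1), ""), line
    )


print(expand_fittings("FUNCTION fEdit$-table-$( l$-table-$ )", {"table": "customer"}))
# FUNCTION fEditcustomer( lcustomer )
```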

The second type of processing involves BLOCKs. It is essentially
an automated cut-and-paste process. A BLOCK specifies a frame of
text which is copied, cut, and pasted once for each set of fields
in the coupling output. All other fittings within the block are
processed as in the previous case.

BLOCKs may be parameterized so that they become a form of template.
A block's parameters may be fitted into the text within the block itself.
They will be replaced with fields read from a coupling, which is evaluated
when a PASTE statement is executed.
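The cut-and-paste behavior of a parameterized BLOCK might be sketched as follows. This assumes the default demarcation, that each PASTE iteration has already been reduced to a name-to-value dictionary, and that fittings which are not block parameters (such as `$-table-$`) are left in place for a later pass. `paste_block` is a hypothetical name.

```python
import re

FITTING = re.compile(r"\$-([A-Za-z_][A-Za-z0-9_]*)-\$")


def paste_block(body, iterations):
    """Emit one copy of a BLOCK body per iteration.

    Each iteration maps parameter names to values; parameters are matched
    by name, and any other fitting is kept untouched for later processing.
    """
    copies = []
    for values in iterations:
        copies.append(
            FITTING.sub(lambda m: values.get(m.group(1), m.group(0)), body)
        )
    return "".join(copies)


keylist_body = "l$-table-$.$-key-$\n"
print(paste_block(keylist_body, [{"key": "cust_id"}, {"key": "order_id"}]), end="")
# l$-table-$.cust_id
# l$-table-$.order_id
```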

By default, no text is output for a block until it is PASTEd into the
highest level block, the "fulltext" block. The "fulltext" block represents
the last pass of text generation, after all baseline text has been read in.

Code Generation Details

Demarcation of fittings from pattern text.

The default demarcation patterns are "$-" (start) and "-$" (end). These can
be changed with the environment variables "DEMARCSTART" and "DEMARCEND". The
patterns are set as regular expressions. See the man pages for grep, ed, awk,
or sed for use of regular expressions. Constant strings are recommended, since
matches via complex regular expressions are not easily debugged
by visual analysis and can lead to unexpected behavior.
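The sketch below shows one way the two environment variables could be combined into a single fitting matcher. The fallback values are the regex-escaped forms of "$-" and "-$"; the non-greedy capture and the `fitting_regex` name are assumptions of this sketch. Passing constant markers through `re.escape` follows the advice to prefer constants.

```python
import os
import re


def fitting_regex(env=os.environ):
    """Compile a matcher for fittings from DEMARCSTART and DEMARCEND.

    Both variables are treated as regular expressions; when unset, the
    escaped default markers "$-" and "-$" are used.
    """
    start = env.get("DEMARCSTART", r"\$-")
    end = env.get("DEMARCEND", r"-\$")
    # Non-greedy, so two fittings on one line are matched separately.
    return re.compile(start + r"(.*?)" + end)


print(fitting_regex({}).findall("l$-table-$.$-key-$"))  # ['table', 'key']

# Constant markers, escaped so no character acts as a metacharacter.
custom = {"DEMARCSTART": re.escape("<%"), "DEMARCEND": re.escape("%>")}
print(fitting_regex(custom).findall("l<%table%>.<%key%>"))  # ['table', 'key']
```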

Defining couplings.

Couplings are defined within pattern files using the COUPLING statement.
A default coupling definition file is provided, named "junction.box".
Each line in the junction box defines a coupling which will be used
in editing the pattern file for output.

The junction box entries are Nopgen coupling definition statements.
The Nopgen keyword COUPLING precedes the definition. Like a COUPLING
statement in a pattern file, it needs to be prefaced and suffixed
with demarcation patterns. The value assigned to the coupling represents
an executable command line.

As an example, the following defines a coupling named "tables" which
returns all table names in an SQL database:

$-coupling tables="isql - mydb <<EOT
select tabname from systables
EOT"-$

Since coupling definitions are actually command lines, the programs
called can be parameterized by the use of environment variables. This
example uses the environment variable "mytable", presumably set
to a table name before Nopgen is called
(note that newlines should NOT be present in the actual statement):

$-coupling columns="isql - mydb <<EOT
select colname from syscolumns,
systables where systables.tabname = \"$mytable\" and systables.tabid
= syscolumns.tabid
EOT"-$
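To show how an environment variable parameterizes a coupling, this sketch runs a one-line stand-in command with `mytable` set for the child shell only. The `echo` command replaces the `isql` call purely for illustration, and `run_coupling` is a hypothetical helper.

```python
import os
import subprocess


def run_coupling(command, **variables):
    """Run a coupling command with extra environment variables set.

    The variables are visible to the shell (and the programs it calls)
    only; the caller's own environment is not modified.
    """
    env = dict(os.environ, **variables)
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, env=env, check=False
    )
    return result.stdout


print(run_coupling('echo "table is $mytable"', mytable="customer"), end="")
# table is customer
```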

A coupling definition may not contain embedded newlines. The
definition may exceed the visible line length, as long as it remains a
single uninterrupted line.
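Reading a junction box could look roughly like the following Python sketch, which assumes the default demarcation, one definition per line, and that the keyword may be written as either `coupling` or `COUPLING` (both spellings appear in this document). Escaped quotes (`\"`) inside the value are unescaped so the command reaches the shell as written. `load_junction_box` is a hypothetical helper.

```python
import re

# $-coupling name="command line"-$  (default demarcation, one per line)
COUPLING_DEF = re.compile(r'^\$-(?:coupling|COUPLING)\s+(\w+)="(.*)"-\$$')


def load_junction_box(lines):
    """Return a dict mapping coupling names to their command lines."""
    couplings = {}
    for line in lines:
        match = COUPLING_DEF.match(line.rstrip("\n"))
        if match:
            name, command = match.groups()
            couplings[name] = command.replace('\\"', '"')
    return couplings


box = [
    '$-coupling tables="echo customer orders"-$\n',
    '$-coupling columns="echo \\"$mytable\\""-$\n',
]
print(load_junction_box(box))
# {'tables': 'echo customer orders', 'columns': 'echo "$mytable"'}
```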

Nopgen statements.

The Nopgen statements follow. Couplings are denoted by the word
'coupling_name'. Fully optional arguments are shown in square brackets.
When at least one choice must be made from a list of options, braces are
used. Ellipses represent an open-ended list of options.
Parentheses and quotes are literals and should be used as shown.

BLOCK block_name ([ param1{="val"} {, param2{="val2"},... paramN{="valN"} } ])
END BLOCK block_name

A block of text is delineated from other pattern file text by the use of
the demarcation patterns and the Nopgen statement BLOCK. By default,
there is always at least one implicit text block defined, identified by
the block id of "fulltext".

The BLOCK defines editable units of text. The block parameter
names may appear within the block, in which case the parameters are
replaced with text provided by a coupling, or by defaults. Parameters
which do not appear in the text are ignored. The parameters may also
appear in text inserted into a BLOCK body with an EVALUATE statement.
Parameters may be given default values, which take precedence if
the BLOCK is PASTEd with insufficient (or null) parameter values.
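Because BLOCK parameters are associative, resolving their values is essentially a dictionary lookup. In the hedged sketch below, `resolve_params` (a hypothetical name) gives each declared parameter the supplied value when it is non-null and otherwise its declared default, with the null string standing in for a missing default; supplied names the block does not declare are simply ignored.

```python
def resolve_params(declared, supplied):
    """Resolve BLOCK parameters by name.

    declared: parameter name -> default value ("" when none was declared).
    supplied: name -> value from the PASTE; undeclared names are ignored.
    A missing or null supplied value falls back to the default.
    """
    return {
        name: supplied.get(name) or default
        for name, default in declared.items()
    }


print(resolve_params({"field": "", "label": "none"},
                     {"field": "cust_id", "junkfield": "x"}))
# {'field': 'cust_id', 'label': 'none'}
```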

The block does not appear in the output unless it is instantiated with
a coupling in a PASTE statement.

Block definitions are processed before any other directives in the
pattern file. All statements (except nested BLOCK statements) may be used
inside a block definition. None of the statements inside the block are
processed until the "fulltext" block is processed for output.
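Since block definitions are processed first, a first pass over a pattern file can lift them out before anything else runs. The sketch assumes the default demarcation and the `BLOCK name (p1, p2="default")` form given in the syntax above; it does not handle commas or parentheses inside default values, and `extract_blocks` is a hypothetical name.

```python
import re

# $-BLOCK name (params)-$ body $-END BLOCK name-$ under the default demarcation.
BLOCK_DEF = re.compile(
    r"\$-BLOCK\s+(\w+)\s*\(([^)]*)\)-\$(.*?)\$-END BLOCK\s+\1-\$",
    re.DOTALL,
)


def extract_blocks(text):
    """First pass: remove BLOCK definitions from a pattern file.

    Returns (blocks, remaining_text), where blocks maps each BLOCK name to
    (defaults, body) and defaults maps parameter names to default values.
    """
    blocks = {}

    def take(match):
        name, param_text, body = match.groups()
        defaults = {}
        for item in param_text.split(","):
            item = item.strip()
            if not item:
                continue
            param, _, default = item.partition("=")
            defaults[param.strip()] = default.strip().strip('"')
        blocks[name] = (defaults, body)
        return ""  # a definition produces no output by itself

    return blocks, BLOCK_DEF.sub(take, text)


pattern = (
    "$-BLOCK keylist (key)-$l$-table-$.$-key-$\n"
    "$-END BLOCK keylist-$\n"
    "INPUT BY NAME\n"
)
blocks, rest = extract_blocks(pattern)
print(blocks)      # {'keylist': ({'key': ''}, 'l$-table-$.$-key-$\n')}
print(repr(rest))  # '\nINPUT BY NAME\n'
```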

EVALUATE coupling ([ param-1, param-2,... param-N ])

The EVALUATE statement lets text from a coupling be included directly
in the output or inserted into the body of a BLOCK.
The coupling may optionally be passed parameters. The parameters should
have been defined in a containing BLOCK statement. If the parameters
are not defined by an 'ancestor' BLOCK, they are assigned the null string.

Text which is included with an EVALUATE statement is recursively
reevaluated until all Nopgen statements are exhausted. When used within
a BLOCK of text, only successive EVALUATE statements are re-evaluated in
this way. Final evaluation of other statements and couplings is left
to the final output of the "fulltext" block.

PASTE block_name FOR EACH ([ field-1, field-2,... field-N ])
IN coupling ([ coup-param-1, coup-param-2,... coup-param-N ]) [ UNLESS NULL ]

PASTE statements will copy-and-paste the named BLOCK of text for each
space-separated field in the text of the evaluated coupling. Parameters
may optionally be passed to the coupling. The structure of coupling
text may be specified using optional positional field parameters. The names
of the field parameters should coincide with parameter names of the
named block.

It should also be noted that the interface between the text block and
the coupling is flexible. The BLOCK parameter list is NOT positional.
Instead, its parameters are associative; they are accessed by name only.
Any structure of text can come from an evaluated coupling, but couplings
used in PASTE statements generally deliver field-oriented strings. Some
reconciliation has been provided for in the PASTE statement field list.

If a coupling within a given PASTE statement evaluates to a number of
fields that is not a multiple of the number of fields in the field list,
the PASTE statement, on its last iteration, will set all unassigned fields
to the null string. (This is not an error. Any default parameter values in
the BLOCK definition will then supersede the null strings.)

If the UNLESS NULL clause is included, the last iteration of the previous
case will be prevented. Also, when a coupling evaluates to nothing, the
PASTE operation will not occur at all.
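The grouping rules for PASTE iterations can be summarized in a short sketch. It assumes the coupling's fields arrive as one flat, ordered list, as the description of a partial last iteration suggests; `paste_iterations` is a hypothetical helper, and rejecting an empty field list is an assumption of the sketch. Consistent with the notes on couplings that return nothing, an empty coupling still yields one all-null iteration unless UNLESS NULL is given.

```python
def paste_iterations(fields, field_list, unless_null=False):
    """Split a coupling's fields into one name -> value dict per PASTE pass.

    fields: all fields from the evaluated coupling, in order.
    field_list: the PASTE statement's positional field names.
    """
    if not field_list:
        raise ValueError("this sketch needs at least one field name")
    size = len(field_list)
    if not fields:
        # Empty coupling: one pass with null fields, unless UNLESS NULL.
        return [] if unless_null else [dict.fromkeys(field_list, "")]
    iterations = []
    for start in range(0, len(fields), size):
        chunk = fields[start:start + size]
        if len(chunk) < size:
            if unless_null:
                break  # UNLESS NULL drops the incomplete last pass
            chunk += [""] * (size - len(chunk))  # pad with null strings
        iterations.append(dict(zip(field_list, chunk)))
    return iterations


print(paste_iterations(["a", "b", "c"], ["field", "junkfield"]))
# [{'field': 'a', 'junkfield': 'b'}, {'field': 'c', 'junkfield': ''}]
print(paste_iterations(["a", "b", "c"], ["field", "junkfield"], unless_null=True))
# [{'field': 'a', 'junkfield': 'b'}]
```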

coupling_name

This is clearer when shown with the demarcation strings ($- and -$):

$-coupling_name-$

This is known as a discrete coupling fitting. As a final stage in
pattern file processing, all such discrete couplings are evaluated,
their place in the text being replaced with the text from the coupling.

Since couplings are used throughout the pattern file text, they are
considered to have a global namespace. They may not share names with
BLOCKs or BLOCK parameters.
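A pre-flight check for this global namespace rule might look like the hypothetical `find_name_clashes` helper below; it only reports collisions and assumes the coupling and BLOCK names have already been collected.

```python
def find_name_clashes(coupling_names, blocks):
    """Return names used both for a coupling and for a BLOCK or BLOCK parameter.

    coupling_names: iterable of coupling names.
    blocks: BLOCK name -> iterable of its parameter names.
    """
    couplings = set(coupling_names)
    clashes = set()
    for block_name, params in blocks.items():
        if block_name in couplings:
            clashes.add(block_name)
        clashes.update(couplings.intersection(params))
    return sorted(clashes)


sample_blocks = {"keylist": ["key"], "afterfield": ["field"], "infield": ["field"]}
print(find_name_clashes(["table", "column", "keyfield"], sample_blocks))  # []
print(find_name_clashes(["table", "keylist"], {"keylist": ["table"]}))
# ['keylist', 'table']
```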

(comments)

This is also clearer when shown with demarcation:

$-( This is a comment )-$

A pair of matching parentheses containing any string, prefixed and suffixed
with the demarcation patterns, provides a Nopgen comment. The
comment will be ignored.
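Comment removal amounts to a single substitution. This sketch assumes the default demarcation and that a comment ends at the first ")" immediately followed by "-$"; `strip_comments` is a hypothetical helper.

```python
import re

# A Nopgen comment under the default demarcation: $-( any text )-$
COMMENT = re.compile(r"\$-\(.*?\)-\$", re.DOTALL)


def strip_comments(text):
    """Delete every Nopgen comment, leaving the surrounding text as-is."""
    return COMMENT.sub("", text)


print(repr(strip_comments("WITHOUT DEFAULTS $-( keys above )-$\n")))
# 'WITHOUT DEFAULTS \n'
```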

Example Pattern File

A sample pattern file follows. Three couplings named "table", "column",
and "keyfield" have been defined, the demarcation patterns are set to
"$-" and "-$" respectively, and the code has been factored into three explicit
text blocks.

$-BLOCK keylist (key)-$ a list of discrete column variable

l$-table-$.$-key-$

$-END BLOCK keylist-$

$-BLOCK afterfield (field)-$ The default AFTER FIELD clause

AFTER FIELD $-field-$
IF l$-table-$.$-field-$ IS NULL THEN
ERROR $-field-$, " is empty. "
END IF # l$-table-$.$-field-$ IS NULL

$-END BLOCK afterfield-$

$-BLOCK infield (field)-$ A default infield clause

WHEN infield( $-field-$ )
CALL Pick$-field-$() RETURNING l2$-table-$.*
IF NOT int_flag THEN
LET l$-table-$.$-field-$ = l2$-table-$.$-field-$
DISPLAY BY NAME l$-table-$.$-field-$
NEXT FIELD $-field-$
END IF

$-END BLOCK infield-$

FUNCTION fEdit$-table-$( l$-table-$ )
DEFINE
l$-table-$ RECORD LIKE $-table-$.*

INPUT BY NAME
$-PASTE keylist(key) FOR EACH keyfield()-$

WITHOUT DEFAULTS

$-PASTE afterfield(field, junkfield) FOR EACH keyfield()-$

$-(junkfield is ignored by the afterfield block)-$

ON KEY (CONTROL-F)

CASE # infield( $-table-$.* )

$-PASTE infield(field) FOR EACH column()-$

END CASE # infield( $-table-$.* )

END INPUT

RETURN ( l$-table-$.* )

END FUNCTION # fEdit$-table-$()

Miscellaneous Notes
Hierarchical Editing.

A pattern file is normally the only source of pattern text for Nopgen.
However, couplings may be used along with Nopgen statements to create a
hierarchical framework. The EVALUATE statement is especially helpful
in this respect.

Since the BLOCK body is recursively rescanned for Nopgen statements,
a set of source management utilities can be written to provide library
management and text inclusion. Care must be taken to prevent infinite
recursion, since it could overflow the Nopgen stack.
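The rescanning of EVALUATE text, together with a guard against runaway recursion, could be sketched like this. Assumptions: the default demarcation, couplings already evaluated into a dictionary of text, EVALUATE without parameters, and a `max_depth` limit standing in for the Nopgen stack; `expand_evaluates` is a hypothetical name. The loop rescans the whole text after each round, which has the same effect as recursive reevaluation.

```python
import re

# $-EVALUATE name()-$ with no parameters, under the default demarcation.
EVALUATE = re.compile(r"\$-EVALUATE\s+(\w+)\s*\(\s*\)-\$")


def expand_evaluates(text, coupling_text, max_depth=32):
    """Replace EVALUATE fittings with coupling text, rescanning until none remain.

    coupling_text maps coupling names to their (already evaluated) output.
    Raises RecursionError when max_depth rounds are not enough, which
    usually means a coupling includes itself directly or indirectly.
    """
    depth = 0
    while EVALUATE.search(text):
        if depth == max_depth:
            raise RecursionError("EVALUATE nesting exceeded max_depth")
        text = EVALUATE.sub(lambda m: coupling_text.get(m.group(1), ""), text)
        depth += 1
    return text


library = {"header": "-- $-EVALUATE stamp()-$", "stamp": "generated by Nopgen"}
print(expand_evaluates("$-EVALUATE header()-$", library))
# -- generated by Nopgen
```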

Delineation of coupling fields.

The default coupling-field delineation pattern is a space; changing it
is not currently supported.

Evaluation of couplings.

It is assumed that a coupling will give back meaningful pipe input. In
particular, if a fitting is generally made in unblocked portions of
pattern text, the corresponding coupling will usually evaluate to only
one contiguous (undelineated) field. If the fitting is usually made
in BLOCKs, it will usually evaluate to multiple (space-delineated)
fields.

No checking is performed to ensure that a coupling actually gives any
readable text. The default action in this case is to do nothing
for unblocked text (print the original text alone without the fittings),
and to assign defaults to block parameters (null strings if no defaults).
A PASTE operation can be prevented when the coupling returns nothing by
using the UNLESS NULL clause.

If a coupling is associated with a parameterized block, it is assumed that
the coupling will evaluate to an exact multiple of the number of block
parameters. If the number of fields leaves a remainder, the remaining
parameters will be assigned the null string on the last iteration of the
PASTE statement.

Release Notes

Changing the field delineating string is not supported by the prototype.
It will be supported in a future release.

Nopgen is generally case sensitive, but certain operating systems are not;
this may cause porting problems when moving couplings across environments.

Don't use the same names for block names, couplings, and block parameters.
Nopgen doesn't support overloading of names, though it may work in some
cases.

The "fulltext" block cannot be redefined or PASTEd. It is for
referential use only.

If there is extra (trailing) input returned from a coupling, it will be
discarded rather than prepended to the next row. This was a time-driven
implementation constraint :-(.

Toward the Future

NOpGen needs a front-end. Such an interface should provide some mechanism
for full-function text editing. It should also provide hypertext-style
access to the system (inter- or infra-) dependencies. The front-end should
NOT be a character-based application; graphic-based client-server technologies
are the preferred form of implementation. Decomposition of the front-end
services will not be dictated at this point, but the services should be tightly
coupled and capable of communicating information among themselves. The front-end,
when considered as a system in itself, should be maintainable as a NOpGen
dependency.

The front-end services for NOpGen should facilitate the construction of
software systems. The worker should be allowed to access concurrent views
of any and all dependencies. A single application system should itself
be treatable as a kind of dependency, and this view should be promoted.

NOpGen also needs a back-end, to serve as a repository for its objects.
In NOpGen's case, the objects are usually considered as some form of the
system inter- or infra- dependencies mentioned above. The front-end
will make use of the repository services to provide multiple concurrent
views of a system, with features including context sensitivity, simple
keyword indexing, complex permuted indexing, and hypertext-style browsing.
(For the uninformed, a permuted index is one which has more than
one ordered column of identifying keywords.)

During a typical session with the front-end, a system maintenance worker
will look for patterns which depend on some other portion of the
system. These patterns may be factored, distributed, or possibly even
eliminated. These are all mental activities which code maintainers
currently engage in. The important aspect to note here is that, once
identified, a dependency can be generalized and made explicit. It can
be stored and retrieved with the repository, reviewed with the front
end, and finally, embedded and/or evaluated in the system with the NOpGen core
dependency evaluation service. The whole of the NOpGen services should
encourage the worker to recognize prototypical information patterns within
and at the boundaries of an application.

Dependencies may be parameterized, depending on the nature of the
inter-object coupling. Whether the dependency is evaluated only once
per application or many times, it is important that as much information
as possible concerning its usage be presented to the worker during
maintenance. It is not enough to give a name; there should be ample access
to the parameter-interface descriptions, parameter defaults, and intended usage.
The worker must be allowed to access as much information as they want, no
more and no less, so that they can make quick and informed decisions.
