atedeg/sbt-ubiquitous-scaladoc

Load configuration from `.ubidoc.yml` instead of `build.sbt`

Closed this issue · 2 comments

Since there are many configurations (tables, HTML tags, classes to include/exclude), it would be better to move them out of build.sbt in order to reduce its (unnecessary) verbosity. To do this, we could create a configuration file called .ubidoc.yml.

The .ubidoc.yml could contain the following properties:

  • The header of the table and the related HTML selectors used to extract the content
  • The names of the files to parse in order to build the table rows

An example of this .ubidoc.yml could be:

tables:
  - name: FooBar
    header:
      col1: "title"
      col2: "div > p"
      col3: "p"
    rows: ["Class1", "Class2", "Type1", "Enum1"]

  - name: FooBarBaz
    header:
      col1: "title"
      col2: "div > p"
      col3: "p"
    rows: ["Class3", "Class4", "Type2"]

This file could be deserialized into the following model:

final case class TableList(tables: List[Table])
final case class Table(name: String, header: Map[String, String], rows: List[String])

Note on the header map: it stores pairs of the form ("Column title" -> "HTML selector"), so there is no need to check that the number of table columns matches the number of HTML selectors. The map also makes it easy to look up the right HTML selector for a given column.
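As a rough sketch of how the header map could be used, the snippet below builds the FooBar table from the example and looks up selectors by column title. `HeaderSketch`, `columnTitles` and `selectorFor` are hypothetical names. It also swaps the plain `Map` for a `ListMap`, which is an assumption on my side: the `Map` trait itself makes no promise about iteration order, while `ListMap` iterates in insertion order.

```scala
import scala.collection.immutable.ListMap

object HeaderSketch {
  // Model from the proposal; ListMap is an assumption (the proposal uses a plain Map)
  // made so that columns are iterated in declaration order.
  final case class Table(name: String, header: ListMap[String, String], rows: List[String])

  val fooBar: Table = Table(
    name = "FooBar",
    header = ListMap("col1" -> "title", "col2" -> "div > p", "col3" -> "p"),
    rows = List("Class1", "Class2", "Type1", "Enum1")
  )

  // Column titles, in the order they were declared.
  def columnTitles(table: Table): List[String] = table.header.keys.toList

  // HTML selector for a given column title, if that column exists.
  def selectorFor(table: Table, column: String): Option[String] = table.header.get(column)
}
```

Since each title carries its own selector, the two can never get out of sync, which is the point made in the note above.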

Talking with @giacomocavalieri, we agreed that this approach could be more extensible and convenient.
Moreover, it lets us define the order in which the definitions appear in the table, following the order specified in the rows key.

I slightly changed the original description to allow for entities other than classes: each row of the table could correspond to a Class, an Enum, a case of an enum, a Type, and so on. Rather than calling this field "classes", I'd opt for "rows", which makes it clearer that you are specifying one entity for each row.

Why do we need this

  • It allows us to create multiple distinct tables. We really need this to auto-generate the tables in mdm (one for the incoming events, one for the outgoing events, one for the ubiquitous language). With our current approach it would be impossible to create distinct tables
  • We can specify the order in which the elements appear in a table. Right now the order is determined by the order in which each file is discovered in the file system. This method would make it easy to impose a fixed order
  • It makes explicit which files to include in each table. Right now we simply look up all the files in a given directory. This would give us much more fine-grained control over what files to include

Possible drawbacks

  • One has to explicitly specify the file for each row. If a table has many rows, each one has to be enumerated
    • This is not necessarily a drawback: we have maximum control over what is included without relying on a fixed underlying file system structure
    • However, this could be alleviated by allowing one to specify a directory whose files are all parsed as rows, instead of the usual file list. The argument of rows could be either a list of files or a single directory, allowing for both a batch and a fine-grained approach

Aspects to pay attention to
If a file cannot be parsed, rather than silently discarding it without notifying the user, we could:

  • issue a warning
  • create an empty row in the table with an explicit error message
  • do both (which is what I'd prefer)

I think this is crucial, since one may not be aware that some elements were discarded and so miss important parts of the documentation.
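To make the "do both" option concrete, here is a minimal sketch. `Row`, `toRow` and the error message format are all hypothetical, and the HTML parsing itself is left out: its outcome is passed in as an `Either`.

```scala
object UnparsableRowSketch {
  // Hypothetical row type: the file name plus one cell per column.
  final case class Row(name: String, cells: List[String])

  // `parsed` is Left(error message) when the file could not be parsed and
  // Right(cells) otherwise. Returns the row to add and an optional warning to log.
  def toRow(fileName: String, parsed: Either[String, List[String]]): (Row, Option[String]) =
    parsed match {
      case Right(cells) =>
        (Row(fileName, cells), None)
      case Left(error) =>
        val message = s"ERROR: could not parse $fileName: $error"
        // Keep the row so the gap is visible in the table, and warn as well.
        (Row(fileName, List(message)), Some(message))
    }
}
```

The caller would add the returned row to the table in both cases and pass the warning, when present, to the sbt logger.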

Possibly useful resources
The YAML file will most likely have a pretty straightforward structure, so I think using circe-yaml and circe to parse it could turn out to be quite easy; this tutorial already covers most of what we could need.

I propose the structure of the file could be the following:

ignore: 
  - "Class1"
  - "dir2"
  - "glob3"

tables:
  - name: "Table 1"
    columns: 
      - name: "col1"
        selector: "title"
      - name: "col2"
        selector: "div > p"
      - name: "col3"
        selector: "p"
    rows:
      - "MyClass"
      - "dir"
      - "glob"
      - "MyEnum"

final case class Table(name: String, columns: Seq[Column], rows: List[String])
final case class Column(name: String, selector: String)

  • tables is a list of Table objects, each of which has
    • a name field, the name of the table
    • a columns field, a list of Column objects with two fields each: a name and an HTML selector
    • a rows field, a list of Strings
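A possible way to decode this structure with circe-yaml is sketched below. It assumes circe-yaml and circe-generic are on the classpath; the wrapper type `Config`, the object `UbidocConfigSketch` and the `load` function are hypothetical names that are not part of the proposal.

```scala
import io.circe.generic.auto._
import io.circe.yaml.parser

object UbidocConfigSketch {
  final case class Column(name: String, selector: String)
  final case class Table(name: String, columns: Seq[Column], rows: List[String])
  // Hypothetical top-level type holding the two keys of the file.
  final case class Config(ignore: List[String], tables: List[Table])

  // Parses the YAML text into JSON, then decodes it into Config.
  // Both a parsing failure and a decoding failure end up on the Left.
  def load(yaml: String): Either[io.circe.Error, Config] =
    parser.parse(yaml).flatMap(_.as[Config])
}
```

With this derivation both `ignore` and `tables` must be present in the file; making `ignore` optional would need an `Option` field or a custom decoder.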

How the plugin could work
Each String in rows is used to find files to add as rows to the table. The algorithm should be:

  • For each String
    • Start in the base directory specified by the plugin key
    • Try to find a file with the name of the String
    • If not found, try to find a directory with the name of the String
    • If not found, try to interpret the String as a glob and see if it matches at least one element
    • If there are no matches, throw an exception
    • Accumulate the resulting matches (a single file, a single directory, or one or more files/directories sorted in alphabetical order) in a list
  • Then, when all the matches are accumulated in a list
    • Expand the directories to all the files they contain (ordering those files in alphabetical order)
    • Now the list only contains valid file names; for each file, open it, parse its HTML content, extract the columns and add them to the table
  • At the end
    • Expand the ignore field with the same strategy used for the rows field (first try to interpret each String as a file, then as a directory, then as a glob)
    • Scan the starting directory recursively, enumerating all the files it contains
    • Remove from the list of all files the ones added to a table and the ones to ignore
    • If any file remains, issue a warning or throw an exception
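The lookup steps above could be sketched roughly as follows, using only the JDK. All names (`RowResolutionSketch`, `resolve`, `leftovers`, ...) are hypothetical. Two simplifying assumptions: entries are matched against file names literally (extension included), and a glob is matched only against the entries directly inside the base directory. HTML parsing is left out.

```scala
import java.io.File
import java.nio.file.{FileSystems, Paths}

object RowResolutionSketch {
  // Entries directly inside `dir`, sorted alphabetically by name.
  private def children(dir: File): List[File] =
    Option(dir.listFiles()).map(_.toList).getOrElse(Nil).sortBy(_.getName)

  // All regular files under `dir`, recursively, directories expanded in alphabetical order.
  def allFiles(dir: File): List[File] =
    children(dir).flatMap(f => if (f.isDirectory) allFiles(f) else List(f))

  // Resolves one entry of `rows` (or `ignore`): file first, then directory, then glob.
  // Throws if nothing matches, so a missing concept cannot go unnoticed.
  def resolve(base: File, entry: String): List[File] = {
    val candidate = new File(base, entry)
    val matches =
      if (candidate.isFile) List(candidate)
      else if (candidate.isDirectory) List(candidate)
      else {
        val matcher = FileSystems.getDefault.getPathMatcher("glob:" + entry)
        children(base).filter(f => matcher.matches(Paths.get(f.getName)))
      }
    if (matches.isEmpty)
      throw new IllegalArgumentException(s"No file, directory or glob match for '$entry' in $base")
    // Expand matched directories into the files they contain.
    matches.flatMap(f => if (f.isDirectory) allFiles(f) else List(f))
  }

  // Files under `base` that are neither added to a table nor marked as ignored.
  def leftovers(base: File, rows: List[String], ignore: List[String]): List[File] = {
    val used = (rows ++ ignore).flatMap(resolve(base, _)).map(_.getCanonicalFile).toSet
    allFiles(base).filterNot(f => used(f.getCanonicalFile))
  }
}
```

Calling `rows.flatMap(resolve(base, _))` keeps the declaration order of rows while sorting the content of each directory or glob, and a non-empty result from `leftovers` is what would trigger the final warning or exception.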

Noteworthy

  • The rows should follow the declaration order; if there is a directory/glob, its content should be sorted alphabetically
  • If a file or a directory cannot be found, the plugin should be as loud as possible about it; I think throwing an exception could be the best option here
  • There may be several errors:
    • A file or a directory cannot be found: the plugin should throw an exception, since it means an important domain concept we needed is not actually there
    • An .html file is not parsable: the plugin should throw an exception, since it means an important domain concept will not be added to the table
    • An .html file does not contain any of the selectors specified by the columns: here we could either
      • Issue a warning and ignore the file
      • Issue a warning and create an empty row in the table with just the file name
      • Throw an exception: if someone asks for a row for the file XYZ, then it should be considered an important domain concept; if the plugin cannot create a row for it, it should be as loud as possible and refuse to create the table until the problem is solved
    • There are leftover files that are not marked as to be ignored: the plugin should at least issue a warning to notify that some concepts may have been left out
  • We need to decide how to generate the row name; we could either
    • Simply take the file name and turn it from camel case into something more readable by splitting the words
    • Let the programmer decide a strategy by providing a plugin key that is a function FilePath => String, so one can customise the name creation. The default behaviour could still be to split camel case
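A small sketch of the second option for row names: the strategy is just a function, and the default splits camel case. `RowNameSketch`, `RowNameStrategy` and `splitCamelCase` are hypothetical names, and `java.io.File` stands in for the `FilePath` type mentioned above.

```scala
object RowNameSketch {
  // The proposal types the strategy as FilePath => String;
  // java.io.File is used here as a stand-in for FilePath.
  type RowNameStrategy = java.io.File => String

  // Default strategy: drop everything after the first dot in the file name,
  // then insert a space wherever a lowercase letter or digit is followed by
  // an uppercase letter, e.g. "IncomingOrder.html" -> "Incoming Order".
  val splitCamelCase: RowNameStrategy = file => {
    val base = file.getName.takeWhile(_ != '.')
    base.replaceAll("(?<=[a-z0-9])(?=[A-Z])", " ")
  }
}
```

A user who wants different names would set the plugin key to their own function of the same type, for example one that returns the bare file name.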