Load configuration from `.ubidoc.yml` instead of from `build.sbt`
Since there is a lot of configuration (tables, HTML tags, classes to include/exclude), it would be better to move it out of `build.sbt` to reduce its (unnecessary) verbosity. To do this, we could create a configuration file called `.ubidoc.yml`.
The `.ubidoc.yml` file could contain the following properties:
- The header of the table and the related HTML selectors used to extract the content
- The names of the files to parse in order to build the table rows
An example of this `.ubidoc.yml` could be:
tables:
  - name: FooBar
    header:
      col1: "title"
      col2: "div > p"
      col3: "p"
    rows: ["Class1", "Class2", "Type1", "Enum1"]
  - name: FooBarBaz
    header:
      col1: "title"
      col2: "div > p"
      col3: "p"
    rows: ["Class3", "Class4", "Type2"]
A possible model to deserialize this file into could be:
final case class TableList(tables: List[Table])
final case class Table(name: String, header: Map[String, String], rows: List[String])
Note on the `header` map: it keeps each pair ("Column title" -> "HTML selector") together, so there is no need to check that the number of table columns matches the number of HTML selectors. The map also makes it easy to get the right HTML selector for a given column.
Talking with @giacomocavalieri we agreed that this approach could be more extensible and convenient.
Moreover, this approach lets us define the order in which the definitions appear in the table, according to the order specified in the `rows` key.
I slightly changed the original description to allow for entities other than classes: each row of the table could correspond to a Class, an Enum, a case of an enum, a Type, and so on. Rather than calling this field "classes" I'd opt for "rows", which makes it clearer that you are specifying an entity for each row.
Why do we need this
- It allows us to create multiple distinct tables. We really need this to auto-generate the tables in mdm (one for the incoming events, one for the outgoing events, one for the ubiquitous language). With our current approach it would be impossible to create distinct tables
- We can specify the order in which each element appears in a table. Right now the order is determined by the order in which each file is discovered in the file system. This method would make it easy to impose a fixed order
- It makes explicit which files to include in each table. Right now we simply look up all files in a given directory. This would give us much more fine-grained control over which files to include
Possible drawbacks
- One has to explicitly specify the file behind each row. If a table has many rows one has to enumerate each one
  - This is not necessarily a drawback: we have maximum control over what has to be included without relying on an underlying fixed file system structure
  - However, this could be alleviated by allowing one to specify a directory whose files are all parsed as rows, instead of the usual file list. The `rows` argument could be either a list of files or a single directory, allowing for both a batch and a fine-grained approach
Aspects to pay attention to
If a file cannot be parsed, rather than silently discarding it without notifying the user, we could:
- issue a warning
- create an empty row in the table with an explicit error message
- do both (which is what I'd prefer)
I think this is crucial, since one may not be aware that some elements were discarded and so miss important parts of the documentation
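The "do both" option could be sketched roughly as follows. This is only an illustration under stated assumptions: `Row`, `buildRows`, and the shape of the parse results (file name paired with either an error message or the extracted cells) are hypothetical and not part of the plugin.

```scala
object ErrorRowSketch {
  // Hypothetical row model: a name plus one cell per column.
  final case class Row(name: String, cells: List[String])

  /** Builds one row per file. A failed parse still yields a row (carrying the
    * error message in its first cell) and also produces a warning message.
    */
  def buildRows(
      parsed: List[(String, Either[String, List[String]])],
      columnCount: Int
  ): (List[Row], List[String]) = {
    val rows = parsed.map {
      case (file, Right(cells)) => Row(file, cells)
      case (file, Left(error)) =>
        // Pad the remaining cells so the row keeps the table's width.
        Row(file, s"ERROR: $error" :: List.fill(columnCount - 1)(""))
    }
    val warnings = parsed.collect {
      case (file, Left(error)) => s"Could not parse $file: $error"
    }
    (rows, warnings)
  }
}
```

The caller would then print the warnings through whatever logger the plugin uses and render the rows as usual, so a failed file is visible both in the build output and in the generated table.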
Possibly useful resources
The YAML file will most likely have a pretty straightforward structure, so I think using circe-yaml and circe to parse it could turn out to be quite easy; this tutorial already covers most of what we could need
I propose the structure of the file could be the following:
ignore:
  - "Class1"
  - "dir2"
  - "glob3"
tables:
  - name: "Table 1"
    columns:
      - name: "col1"
        selector: "title"
      - name: "col2"
        selector: "div > p"
      - name: "col3"
        selector: "p"
    rows:
      - "MyClass"
      - "dir"
      - "glob"
      - "MyEnum"
final case class Table(name: String, columns: Seq[Column], rows: List[String])
final case class Column(name: String, selector: String)
`tables` is a list of `Table` objects, each of which has:
- `name`: the name of the table
- a list of `Column` objects, each with 2 fields: a `name` and an HTML `selector`
- a `rows` field with `String`s
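As a rough sketch of how this structure could be decoded with circe-yaml and circe (the libraries suggested earlier in the thread): the `Config` wrapper, the `load` helper, and the choice to make `ignore` optional are assumptions made for illustration, and the snippet needs the `circe-yaml` and `circe-generic` dependencies on the classpath.

```scala
import io.circe.generic.auto._
import io.circe.yaml.parser

object UbidocConfigSketch {
  final case class Column(name: String, selector: String)
  final case class Table(name: String, columns: Seq[Column], rows: List[String])

  // Hypothetical top-level wrapper. `ignore` is modelled as optional here,
  // so a file without that key decodes to None.
  final case class Config(ignore: Option[List[String]], tables: List[Table])

  /** Parses the YAML text and decodes it into the case classes above.
    * Both parsing and decoding failures end up on the Left side.
    */
  def load(yaml: String): Either[io.circe.Error, Config] =
    parser.parse(yaml).flatMap(_.as[Config])
}
```

Keeping `columns` as a list rather than a map means the column order in the generated table is exactly the order written in the file.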
How the plugin could work
Each `String` in `rows` is used to find new files to add as rows to the table. The algorithm should be:
- For each `String`
  - Start in the base directory specified by the plugin key
  - Try to find a file with the name of the `String`
  - If not found, try to find a directory with the name of the `String`
  - If not found, try to interpret the `String` as a glob and see if it matches at least one element
  - If there are no matches, throw an exception
  - Accumulate the resulting matches (a single file, a single directory, or one or more files/directories sorted in alphabetical order) in a list
- Then, when all the matches are accumulated in a list
  - Expand the directories to all the files they contain (ordering those files in alphabetical order)
  - Now the list only contains valid file names; for each file
    - Open the file, parse its HTML content, extract the columns, add them to the table
- At the end
  - Expand the `ignore` field with the same strategy used by the `rows` field (first try to interpret as file, then as directory, then as glob)
  - Scan the starting directory recursively, enumerating all the files it contains
  - Remove from the list of all files the ones added to a table and the ones to ignore
  - If there is any file remaining, issue a warning or an exception
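The resolution steps above could be sketched with `java.nio.file` along these lines. All names (`resolve`, `expand`, `rowFiles`, `leftovers`) are hypothetical. The sketch also assumes that entries are matched literally against file names (no extension is appended), that directories are expanded recursively, and that globs use the JDK's `glob:` syntax matched against paths relative to the base directory. It targets Scala 2.13+ because of `scala.jdk.CollectionConverters`.

```scala
import java.nio.file.{FileSystems, Files, Path}
import scala.jdk.CollectionConverters._

object RowResolutionSketch {
  /** Everything under `dir` (including `dir` itself), sorted alphabetically. */
  private def walk(dir: Path): List[Path] = {
    val stream = Files.walk(dir)
    try stream.iterator().asScala.toList.sortBy(_.toString)
    finally stream.close()
  }

  /** Resolves one entry: first as a file or directory name, then as a glob. */
  def resolve(base: Path, entry: String): List[Path] = {
    val candidate = base.resolve(entry)
    if (Files.isRegularFile(candidate) || Files.isDirectory(candidate)) List(candidate)
    else {
      val matcher = FileSystems.getDefault.getPathMatcher(s"glob:$entry")
      val matches = walk(base).filter(p => p != base && matcher.matches(base.relativize(p)))
      if (matches.isEmpty)
        throw new IllegalArgumentException(s"No file, directory, or glob match for '$entry'")
      matches
    }
  }

  /** Replaces each directory with the files it contains, keeping declaration order. */
  def expand(paths: List[Path]): List[Path] =
    paths.flatMap { p =>
      if (Files.isDirectory(p)) walk(p).filter(f => Files.isRegularFile(f))
      else List(p)
    }

  /** The ordered list of files that become rows of one table. */
  def rowFiles(base: Path, rows: List[String]): List[Path] =
    expand(rows.flatMap(entry => resolve(base, entry)))

  /** Files under `base` that are neither used in a table nor ignored. */
  def leftovers(base: Path, used: List[Path], ignored: List[Path]): List[Path] = {
    val known = (used ++ ignored).toSet
    walk(base).filter(p => Files.isRegularFile(p) && !known(p))
  }
}
```

The `ignore` list can go through the same `rowFiles` function, after which a non-empty `leftovers` result is the trigger for the final warning or exception.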
Noteworthy
- The rows should follow the declaration order; if there is a directory/glob its content should be alphabetically sorted
- If a file or a directory cannot be found the plugin should be as loud as possible about it; I think throwing an exception could be the best option here
- There may be several errors:
  - A file or a directory cannot be found: the plugin should throw an exception, since it means an important domain concept we needed is not actually there
  - An .html file is not parsable: the plugin should throw an exception, since it means an important domain concept will not be added to the table
  - An .html file does not contain any of the selectors specified by the columns: here we could either
    - Issue a warning and ignore the file
    - Issue a warning and create an empty row in the table with just the file name
    - Throw an exception: if someone says they want a row for the file `XYZ` then it should be considered an important domain concept; if the plugin cannot create a row for it then it should be as loud as possible and refuse to create the table until the problem is solved
  - There are leftover files that are not marked as to ignore: the plugin should at least issue a warning to notify that some concepts may have been ignored
- We need to decide how to generate the row name; we could either
  - Simply take the file name and change it from camel case to something more readable by splitting the words
  - Let the programmer decide a strategy by providing a plugin key that is a function `FilePath => String` so one can customise the name creation. The default behaviour could still be to split camel case
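A possible default for that naming function, as a minimal sketch: `java.nio.file.Path` stands in for the `FilePath` type mentioned above, and `splitCamelCase` is a hypothetical name. It drops the file extension and inserts spaces at camel-case boundaries.

```scala
import java.nio.file.{Path, Paths}

object RowNameSketch {
  /** Default strategy: drop the extension, then split camel case into words. */
  val splitCamelCase: Path => String = { path =>
    val fileName = path.getFileName.toString
    val baseName = fileName.lastIndexOf('.') match {
      case i if i > 0 => fileName.substring(0, i)
      case _          => fileName
    }
    baseName
      .replaceAll("([a-z0-9])([A-Z])", "$1 $2")    // lower/digit followed by upper
      .replaceAll("([A-Z]+)([A-Z][a-z])", "$1 $2") // acronym followed by a word
  }
}
```

For example, `RowNameSketch.splitCamelCase(Paths.get("docs/IncomingEvent.html"))` yields `"Incoming Event"`. A user who wants different names would supply their own function of the same type through the plugin key.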