The library takes an existing .xlsx
file and populates the extracted data from provided JSON
, into the designated cells, which we call templates. If the extracted data is an array - it is expanded and occupies as many cells, as the size of the data. In both dimensions.
Each template follows a specific format and structure, and it defines these crucial aspects:
- Where is the data coming from?
- How is the data extracted?
- How are the values for the cells extracted from the data?
- How are the cell styles tweaked?
All these in the context of the provided JavaScript object.
Consider the following JSON:
{
"title": "An exemplary title",
"rows": [
{
"header": "Row 1",
"data": [11, 12, 13, 14, 15]
}, {
"header": "Row 2",
"data": [21, 22, 23, 24, 25]
}, {
"header": "Row 3",
"data": [31, 32, 33, 34, 35]
}
]
}
A cell with the following content: {{ | | title }}
will be expanded into An exemplary title
after the data is filled. More information on the template format will be given in a second.
Consider this cell content: {{ | rows | header }}
. It’ll be expanded into three cells, in a column. Like this:
A | |
---|---|
1 | Row 1 |
2 | Row 2 |
3 | Row 3 |
The template defines that the data should be extracted from rows
- which results in an array of 3 objects, and the values for the target cells - 3, of course - should be extracted using the header
path inside each of the entries in the extracted data.
So far, the data was extracted from the root of the initially provided JavaScript object. However, each template can refer another one, taking the data extracted from it, as a basis for its own processing.
Consider this spreadsheet:
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | {{ | rows | header }} | {{ A1 | 1 * data | }} | ||||
2 | ||||||
3 |
The template in A1
is clear - it expands into the range A1:A3
as expected. The template in B1
, however, introduces two new, interesting aspects. First, it refers another template - the one in A1
, and second - it gives strange notion of how the data is extracted - 1 * data
, instead of just data
.
The second one is simple - since the general data extraction form is <rows> * <columns>
and, for example rows
(in A1
) is a shorthand for rows * 1
, so the 1 * data
instructs the engine to expand the retrieved data horizontally (i.e. in many columns, but a single row), rather than vertically.
The first one - the reference - means that the B1
will not extract the data directly from the provided JS object, but rather - from the data already extracted from the referred template. And this happens for each data entry extracted from the referred template.
Replaying in "slow-mo", the whole data extract & placement process will look like:
- The engine processes
A1
template:- Extracts the data, resulting in an array of 3 objects (exactly the one referred by
rows
property). - From each of these object, a value is extracted using the
header
property, resulting in the following array:[“Row 1”, “Row 2”, “Row 3”]
. - The values are placed from the template’s cell (
A1
) downwards.
- Extracts the data, resulting in an array of 3 objects (exactly the one referred by
- For each of the objects in the array, extracted in [1.a], all dependent templates - in this case
B1
are processed. We’ll show the processing for only the{ “header”: “Row 1”, “data”: [...]}
object:- Data is extracted using the
data
property from the provided reference object, resulting in an array of 5 numbers. - Since there is no additional value extractor - the data is used as is, for filling the cells - in this case in the range
B1:F1
. - The same process is repeated for all three of the objects extracted in step [1.b].
- Data is extracted using the
The resulting table will look like:
A | B | C | D | E | F | |
---|---|---|---|---|---|---|
1 | Row 1 | 11 | 12 | 13 | 14 | 15 |
2 | Row 2 | 21 | 22 | 23 | 24 | 25 |
3 | Row 3 | 31 | 32 | 33 | 34 | 35 |
Great! That’s it!
One more thing... As the general syntax of the data extraction suggests - there is another, more elegant way to achieve the same result. The template in B1
could have be written in the following form: {{ | rows * data | }}
. Quite natural to write, and should be clear, by now, why it leads to the same result.
There is one more heavy lifting task that the engine does - it automatically merges cells, if the referring template turns to occupy more than one cell in the same dimension. In other words, if the template in B1
was written as {{ A1 | data | }}
, (i.e. without the 1*
part), this would instruct the engine to grow the data vertically. But, the data from A1
template, already grows vertically, so the engine will have to make the A1
cells “bigger”, i.e. occupying more rows. The result will look like this:
A | B | |
---|---|---|
1 | Row 1 | 11 |
2 | 12 | |
3 | 13 | |
4 | 14 | |
5 | 15 | |
6 | Row 2 | 21 |
7 | 22 |
... at least, these are the first 7 rows of it. Cells A1:A5
will be merged, just like A6:A10
, and A11:A15
.
Hope it is clear by now. Check this and the other examples.
The actual access to a XLSX notebook is delegated to an external library, through a so-called accessor, and there is currently one implementation, based on xlsx-populate
library. Check the API to see how a custom one can be implemented.
Considering the existing accessor implementation, the use of xlsx-datafill
is quite simple:
// Open the notebook and create the accessor for it
const wb = await XlsxPopulate.fromFileAsync(path);
const xlsxAccess = new XlsxPopulateAccess(wb, XlsxPopulate);
// Create an instance of XlsxDataFill and provide custom
// options.
const dataFill = new XlsxDataFill(xlsxAccess, {
callbacksMap: ... // Some custom handlers.
});
// Make the actual processing of `data`.
dataFill.fillData(data);
// The data in populated inside the `wb`, so it can be used.
wb.workbook().toFileAsync(...);
Note: The template definitions are overwritten by the actual data, so don’t expect to be able to run
fillData()
with different data.
Check the template options section for more information on how to configure the XlsxDataFill
instance.
Refer to the examples folder, as well as to the API documentation for more and deeper documentation.
The general format of each template is like follows:
{{ <reference cell>
| <iterators>
| <extractor>
| <padding>
| <styling> }}
Both the surrounding {{
mustache}}
brackets, and the |
separator symbol are configurable, via XlsxDataFill constructor’s options.
The meaning of each field is:
Field | Meaning |
---|---|
reference |
The address of a cell, to be used as a reference for data extraction. If empty - the provided object’s root is considered. In both cases this is referred as template data root in the rest of the description. |
iterators |
JSON paths, determining how the data should be extracted from the template data root, and it follows the form <row data path> * <column data path> . The <row path> is applied on the template data root, while the <col path works on the result of <row path> extraction. If one needs the data to grow vertically (i.e. only as a column), the form 1 * <col path> is allowed, in which case <col path> works directly on the template data root.Can be empty, if the template data root itself should be used. |
extractor |
A JSON path, determining how the value that needs to be written in the cell(s) should be extracted from the data, provided by the iterators. Can be omitted, in which case the iterators’ provided data is taken as a whole. |
padding |
A : delimited pair specifying how many cells on each direction row:column need to be added for each new entry from the extracted data. Can be omitted. |
styling |
A comma-delimited styling pairs of the format <style name>=<extractor> , setting each cell’s style (with the given name), to the value extracted from iterator’s data, using the extractor as a JSON path. E.g. fill=danger:dangerColor will set the fill style of each of the cells, to the value returned by dangerColor handler, when provided the value of danger property from each of the extracted entries. |
The JSON path, mentioned above, refers to the ability to provide a full path of properties from the template data root instead of just one property. So valid paths are, for example: rows
, genres.fiction
, data[0].name
, etc. Check the lodash’s get
helper, because this is what is used.
In order to add additional flexibility, one can reference a user-provided function (via template options) for both iterators and extractors, including those used for styling. If a JSON-path component is suffixed with :<handler name>
(e.g. data:dataFix
, or even just :dataGive
), the result of invoking the corresponding handler is used. The expected definition of such handler is:
/**
* @param {object} data The data base for the current context.
* @param {Cell} cell A target cell, if applicable.
* @returns {*} The required value.
*/
function myHandler(data, cell);
Few things need to be clarified. First, the context (i.e. this
) provided is the options object, as provided upon XlsxDataFill instantiation.
The data
object is the one that corresponds to the given context. For example, in the following template:
{{ A1 | rows:hRows * data:hData | :hNumber }}
All three handlers hRows
, hData
and hNumber
will be invoked with different data
argument - hRows
will receive whatever rows
extracted, hData
will receive, whatever hRows
returned (!), and hNumber
will be given whatever hData
returned.
In other words:
Handlers are applied after, and their result taken instead of, whatever data is extracted with the JSON path.
Another interesting thing is styling. It is quite straightforward, however. Each named style, as recognized by accessor’s setStyle()
method, is referred and the value is extracted in the usual way (JSON path + handler).
Here are the options and their defaults.
{
templateRegExp: new RegExp(/\{\{([^}]*)\}\}/),
fieldSplitter: "|",
joinText: ",",
callbacksMap: {
"": data => _.keys(data)
}
};
There are several specifics to be kept in mind:
- The iterators should resolve to arrays. It is possible for the last one (the innermost dimension) to resolve to an object, but in such case — it’ll be automatically converted to an array of object’s values. If you need the keys — append
:
to the last iterator - it’ll resolve to an empty extraction handler which has a default definition of returning the keys of the provided object. - If during value extraction, the result is an array — it is automatically joined, using the (configurable)
joinText
from the options. - No matter what part of the cell’s value the template definition occupies, at the end — the whole cell is overwritten with the resolved value(s). Since, this is not a simple find-and-replace, and the cells need duplication — it is not as trivial as expected. A possible workaround is to use handlers to append whatever is needed for each cell.
- The
xlsx-populate
library is not a dependency, because (potentially in the future) other accessors can be used, so don’t expect it to be there if you just referxlsx-datafill
.
Any help is appreciated! Check the list in the repo.
Thank you!