/exscalabur

Templated excel generation

Primary LanguageScalaApache License 2.0Apache-2.0

Scala version

Exscalabur

Exscalabur is a Scala library for creating Excel files from data and a template. Exscalabur allows excel layout and formatting to be specified by excel templates, allowing anyone with basic excel knowledge to design and modify templates.

Development

Requirements

  • Java 11 or later
  • Scala 2.12 or later
  • sbt 1.2.8 or higher

Common sbt Commands

Command Description
~compile ~ enables hot reloading
~run ~ enables hot reloading
test Runs all tests
testQuick Runs tests only affected by your latest code changes
clean Removes generated files from the target directory
update Updates external dependencies
package Creates JAR
publishLocal Publishes to local IVY repository, by default in ~/.ivy2/local

How To Use

Templates

To use Exscalabur to create Excel files, you first require an Excel Template file.

The Excel template contains formatting and layout but instead of data, there are keys in the cells that indicate where the data will be injected.

Any cell styling/cell borders made to the template will appear in the generated excel file.

An example template sheet:

template

Template Keys

A template key is a cell with the value prefixed by $KEY. or $REP.

  • A $KEY. cell will be substituted for a single piece of data.
  • A $REP. cell will be repeated with formatting for each row of data provided.

Cells that do not contain keys will be copied as is.

In the above example, this template has 6 keys:$KEY.fname, $KEY.lname, $REP.animal, $REP.weight, $KEY.conclusion,and $REP.element.

Schema Definition

Yaml

A schema definition is required, and may be provided as a YAML file with the structure:

KEYNAME1:
  dataType: oneOf("string", "double", "long", "date")
  excelType: oneOf("string", "number", "date")
KEYNAME2:
  dataType: oneOf("string", "double", "long")
  excelType: oneOf("string", "number", "date")  
# etc
  • KEYNAME is the key exactly as seen in the template.
  • dataType is the runtime type of the data provided to exscalabur.
  • excelType is the type of the cell in the final sheet.

com.carta.yaml.YamlReader is provided to read the schema definition yaml files.

For the example template Excel file above, the following is valid schema definition:

$KEY.fname:
  dataType: "string"
  excelType: "string"

$KEY.lname:
  dataType: "string"
  excelType: "string"

$KEY.conclusion:
  dataType: "string"
  excelType: "string"

$REP.animal:
  dataType: "string"
  excelType: "string"

$REP.weight:
  dataType: "double"
  excelType: "number"
  
$REP.element:
  dataType: "string"
  excelType: "string"

In-Code Map

Alternatively, this schema may be provided in-code as a Map[String, YamlEntry]. The Map's keys are the KEYNAMEs. import com.carta.yaml.YamlEntry is a provided case class representing the above YAML structure.

Data Substitution

Single-Substitution Data

To pass data in to be substituted into a $KEY. template cell, an instance of a DataCell(key: String, value: oneOf(String, Long, Double)) must be created. key is the key from the template, without the $KEY. prefix. value is the value to substitute. It's runtime type must match the schema definition.

For the above example, we may have DataCells like:

import com.carta.exscalabur.DataCell

DataCell("fname", "Joe");
DataCell("lname", "Person")
DataCell("conclusion", "EXSCALABUR")

Repeated Data Substitution

To pass data to be substituted into a repeated row, instances of DataRow are passed in, containing DataCells for each cell in the repeated row.

For the example, our DataRow instances might look like:

import com.carta.exscalabur.DataRow

DataRow().addCell("animal", "monkey").addCell("weight", 12.1)
DataRow().addCell("animal", "horse").addCell("weight", 12.2)
DataRow().addCell("element", "hydrogen")
DataRow().addCell("element", "helium")
DataRow().addCell("element", "lithium")

Writing

Exscalabur supports multi-step writes. This can be done with multiple calls to writeStaticData and writeRepeatedData.

Currently, Exscalabur only supports writing in an append-only manner. So, for the above example, the data for $KEY.fname must be provided before the data for $REP.animal is provided.

There are plans for Exscalabur to support writing to rows out of order, but this has yet to be implemented.

Exscalabur does not support sub-tables arranged horizontally

Lastly, all that's left is to write the data. To do so, create an instance of a Exscalabur object:

import com.carta.exscalabur.Exscalabur

Exscalabur(
  templates, // Iterable[String], representing paths to the template sheets
  schema, // The in-code schema representation explained above
  windowSize // number of rows in the output workbook to keep in memory at a time
)

For every sheet to be written to, create a SheetWriter:

exscalabur.getAppendOnlySheetWriter(sheetName) // sheetName is as defined in the template sheet.

Note, the sheetName used here comes from the tab in the template. It must match the template exactly

Write static data: sheetWriter.writeStaticData(staticData: Seq[DataCell])

and to write repeated data: sheetWriter.writeRepeatedData(repeatedData: Seq[DataRow])

And write to the output file: exscalabur.exportToFile(outputPath)

Doing so results in the final output sheet:

output

Supported Cell Types

Literal values

Currently, Exscalabur supports copying cells containing numeric, boolean, and string values.

Formulas

Exscalabur currently does not support copying formulas that reference cells on repeated rows.

If a formula contains a reference to a cell, there cannot be any repeated rows between that cell, and the formula cell itself

Formulas must also reference cells on the same sheet.

An example of a legal formula is:

formula usage

These formulas would not be considered legal if they contained a reference to a cell before row 31, or after 36.

Exscalabur supports copying cell formulas if all referenced cells in the formula are on the same sheet, and there are no repeated rows between any cell references and the formula itself. Formulas may not behave as expected if this requirement is not met.

Cells formula errors will not copied from the template to the output sheet.

Roadmap

  • Add support for Exscalabur to copy formulas on single-substitution template cells.