Work in Progress - status: functional, but limited testing
This repository contains a static site generator for TEI Publisher. It creates a static version of a website by pre-generating all content: it traverses the site's content via TEI Publisher's public API, transforms all documents via the associated ODDs and stores the output in the file system. The result is a website without dynamic content, so neither eXist-db nor TEI Publisher is required.
The generated website necessarily lacks functionality that requires a database backend, in particular:
- simple client-side search only
- no faceted browsing
On the upside, the resulting HTML files can be hosted on any web server at little or no cost (e.g. using GitHub Pages). Most web components and page layouts will still work as before. A static site is thus a viable option for small editions with a strong focus on text presentation and less need for advanced features.
The generator is written in Python and requires Python 3. Until TEI Publisher 8 is released, you also need a development build of TEI Publisher (master branch).
- clone the repository
- install dependencies:
pip3 install -r requirements.txt
To recursively fetch an entire site, simply run
python3 -m tpgen build
You can also specify a different configuration with
python3 -m tpgen build -c guidelines.yml
The `build` command includes the following tasks (if defined in the configuration):
1. fetch all assets (if any) and store them into the configured output directory
2. if enabled: recursively scan the root data collection of the application. This will store the information to be displayed in the document browser, which is, by default, the main entry point into a TEI Publisher application.
3. traverse and download all documents found during the collection scan by following links from the collection listing
4. fetch additional pages as defined in the `pages` section. Those are pages which do not directly correspond to a single TEI document listed in the document browser and therefore won't be processed by step 3.
For testing purposes you can also call steps 2 to 4 separately using the following commands:
python3 -m tpgen collection -r
Without the `-r|--recursive` option, only the collection listing for the document browser will be fetched, not the content of the documents linked from it. You can optionally specify a relative path to the root collection to fetch, e.g. `playground` if you only want documents from TEI Publisher's playground collection.
python3 -m tpgen document test/F-rom.xml
python3 -m tpgen pages
python3 -m tpgen assets
To see the result you can launch the built-in webserver of Python:
python3 -m tpgen serve --port 8001
The main collection task traverses the collection hierarchy recursively, downloading the HTML view for each page of documents to show. The retrieved content is stored into `static/collections`. It then inspects the HTML to collect the documents to be fetched.
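The link-collection step can be pictured roughly as follows. This is only a sketch: it assumes that documents show up in the listing as ordinary `<a href="...">` links ending in `.xml`, which may not match the markup TEI Publisher actually returns, and the names `DocumentLinkCollector` and `collect_documents` are made up for illustration.

```python
from html.parser import HTMLParser


class DocumentLinkCollector(HTMLParser):
    """Collect href values of <a> elements that look like TEI documents."""

    def __init__(self, suffix=".xml"):
        super().__init__()
        self.suffix = suffix
        self.documents = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep each document once, in the order it first appears.
        if href and href.endswith(self.suffix) and href not in self.documents:
            self.documents.append(href)


def collect_documents(listing_html):
    """Return the document paths linked from one page of the listing."""
    collector = DocumentLinkCollector()
    collector.feed(listing_html)
    collector.close()
    return collector.documents
```

Each returned path would then be handed to the per-document step.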
For each document, the generator performs the following operations:
- ask the server which HTML template and view mode ('div', 'page', 'single') should be used for the particular document
- create an output directory below `static`, reflecting the relative path to the document in TEI Publisher. For example, output for `test/graves6.xml` will be stored into `static/test/graves6.xml/`.
- try to find an HTML template with the same name in `templates`. If there is no local correspondence, it falls back to the default template: `view.html`. Expand the template and store the result as `index.html` into the output directory.
- check if there is a configuration for the named template in `config.yml`, listing the different views to fetch data for
- walk through all pages of the document (as a user would do), downloading their HTML content to the output folder
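The path mapping and the template fallback described in the list above could be sketched like this. The helper names `output_dir_for` and `pick_template` are hypothetical, not the generator's own functions; only the directory names `static` and `templates` and the default `view.html` come from the description.

```python
from pathlib import Path


def output_dir_for(doc_path, root="static"):
    """Map a document path such as 'test/graves6.xml' to 'static/test/graves6.xml'."""
    return Path(root) / doc_path


def pick_template(name, templates_dir="templates", default="view.html"):
    """Return the local template with the given name, or the default one."""
    candidate = Path(templates_dir) / name
    # Fall back to the default template when no local copy exists.
    return candidate if candidate.is_file() else Path(templates_dir) / default
```

The expanded template would then be written as `index.html` inside the directory returned by `output_dir_for`.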
For some use cases, traversing a collection may not be the correct approach, e.g. if you use different means of navigating the edition. In this case, collection scanning can be disabled (`collection: false` in the config) and you can instead directly define the pages to be fetched (see below).
Because the result should be a static website, the HTML templates used necessarily differ from the ones within TEI Publisher, though it should be easy enough to copy/paste and then modify the relevant bits. The static templates use the Jinja templating framework.
The existing templates in `templates` have been directly copied from TEI Publisher and then modified to match the different templating framework. When copying HTML, it is important to pay attention to the following caveats:
- while the HTML templates in TEI Publisher must be valid XHTML, the static templates are HTML5. You should thus not use closed empty elements. For example, `<pb-param name="..." value="..."/>` should be changed to `<pb-param name="..." value="..."></pb-param>`. Also, auto-closing HTML elements like `<link>` should not be closed with `</link>`
- add an additional attribute `static` to any `pb-view` and `pb-load` web component, instructing the components to rewrite URLs in order to retrieve content from a static server
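When many templates have to be converted, the first caveat can be handled mechanically. The following is a rough sketch, not part of the generator: it uses a regular expression to expand self-closed custom elements (those with a hyphen in their name, such as `pb-param`) and leaves everything else alone. A regex is fragile on unusual markup, for instance attribute values containing `>`, so the result should be reviewed by hand.

```python
import re

# Custom elements always contain a hyphen in their name (e.g. pb-param).
SELF_CLOSED_CUSTOM = re.compile(
    r"<([a-z][a-z0-9]*-[a-z0-9-]*)((?:\s[^<>]*?)?)\s*/>"
)


def expand_self_closed(html):
    """Rewrite <pb-x .../> as <pb-x ...></pb-x>; other markup is left alone."""
    return SELF_CLOSED_CUSTOM.sub(r"<\1\2></\1>", html)
```

Standard void elements such as `<br/>` or `<link>` do not match the pattern, since their names contain no hyphen.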
The different tasks can be configured via a YAML configuration file (default: `config.yml`).
At the top, the configuration defines a `variables` section, whose entries are passed on to the templating system. You can add your own variables here and use them in your templates.
Important variables are:
Variable | Description |
---|---|
title (required) | general title to be shown to the user if no other title is provided |
name | short name to be displayed when the title would not fit |
remote | Base URL of the TEI Publisher instance to fetch data from |
context | the prefix path under which the static content will be available. Use the empty string ("") if the content should be made available under the root context. |
cdn | the CDN host to use for loading the TEI Publisher web component library |
components | version of the TEI Publisher web component library to use |
The `templates` section defines the data to be fetched for a given HTML template. A template may include more than one view on the content (i.e. multiple `pb-view` or `pb-load` components). You can thus define a series of different views in `data`, each using a different configuration, corresponding to the HTTP parameters to be sent with the request. Whenever the template is used (e.g. to display a given TEI document), the generator walks through the list of views. We distinguish two kinds of views:
- static views which stay the same for all pages of a document: they simply require a link pointing to the static data
- dynamic views which support pagination: in this case the generator walks through the pages as a user would and stores their content each in a separate output file
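The page walk for a dynamic view can be sketched as a simple loop. This is an illustration under assumptions: each response is taken to be a mapping with a `content` entry and an optional `next` entry pointing at the following page, the actual HTTP request is hidden behind an injected `fetch_page` function, and the output file naming (`main-1.html`, ...) is invented for the example.

```python
from pathlib import Path


def walk_pages(fetch_page, doc, params=None):
    """Yield (page_number, content) for every page of `doc`."""
    params = dict(params or {})
    position = None  # None asks for the first page
    number = 1
    seen = set()
    while True:
        page = fetch_page(doc, position, params)
        yield number, page["content"]
        position = page.get("next")
        # Stop at the last page, and guard against a server that loops.
        if not position or position in seen:
            break
        seen.add(position)
        number += 1


def store_pages(fetch_page, doc, out_dir, view="main", params=None):
    """Write each page to its own file and return the file names written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for number, content in walk_pages(fetch_page, doc, params):
        target = out_dir / f"{view}-{number}.html"
        target.write_text(content, encoding="utf-8")
        written.append(target.name)
    return written
```

Passing the fetch function in as an argument keeps the loop independent of any particular HTTP client and makes it easy to try out with canned responses.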
For example, take the `documentation.html` template configuration:

    documentation.html:
      data:
        main:
        breadcrumbs:
          user.mode: breadcrumbs
        toc.html: "{{remote}}api/document/{{doc}}/contents?target=transcription&icons=false"
        documentation.css: "{{remote}}templates/pages/documentation.css"

The main text view is dynamic, but does not require additional parameters, which are thus left empty. The generator will contact the server-side endpoint and ask for the first page of the currently processed document. It then walks through all subsequent pages until it reaches the end.
The page also includes a `pb-view` used for displaying breadcrumbs, which again is dynamic, but, contrary to the main view, requires an additional parameter `user.mode=breadcrumbs`. This will thus be added when walking through the pages.
`toc.html` is a static view for the table of contents (loaded by a `pb-load` in the page). It only needs to be retrieved once for the document and is stored into `toc.html`. Finally, we download some additional CSS and store it as well.
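Read this way, the two kinds of views can be told apart by the shape of their configuration value. The sketch below encodes that reading: a string is treated as the URL of a static view, while an empty value or a mapping of request parameters is treated as a dynamic view. This rule is inferred from the example alone, and the function name `classify_views` is hypothetical; the generator may decide differently.

```python
def classify_views(data):
    """Split a template's `data` mapping into dynamic and static views."""
    dynamic, static = {}, {}
    for name, value in data.items():
        if isinstance(value, str):
            # A string is read as the URL of a resource fetched once.
            static[name] = value
        else:
            # Empty (None) or a mapping of request parameters: paginated view.
            dynamic[name] = dict(value or {})
    return dynamic, static


# The `data` mapping of documentation.html as a YAML parser would return it.
documentation_data = {
    "main": None,
    "breadcrumbs": {"user.mode": "breadcrumbs"},
    "toc.html": "{{remote}}api/document/{{doc}}/contents?target=transcription&icons=false",
    "documentation.css": "{{remote}}templates/pages/documentation.css",
}
dynamic_views, static_views = classify_views(documentation_data)
```

Here `dynamic_views` holds `main` and `breadcrumbs`, and `static_views` holds the two URL entries.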
You can use any of the variables declared in the `variables` section of `config.yml` as well as the variables `doc`, `odd` and `view`, which are set to the corresponding values reported by the server for the current document. Additional per-template variables can also be defined:

    documentation.html:
      variables:
        title: "TEI Publisher Documentation"
      data:
        ...

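To make the variable handling concrete, here is a small stand-in for the placeholder expansion. It is deliberately not Jinja: it only replaces simple `{{name}}` placeholders with a regular expression. The precedence shown (global variables, then per-template variables, then the per-document values `doc`, `odd` and `view`) is an assumption, and both function names are hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)\s*\}\}")


def expand(text, variables):
    """Replace {{name}} with its value; unknown names are left untouched."""
    def substitute(match):
        name = match.group(1)
        return str(variables[name]) if name in variables else match.group(0)
    return PLACEHOLDER.sub(substitute, text)


def variables_for(global_vars, template_vars, doc_vars):
    """Merge variables; later mappings win (assumed precedence)."""
    merged = dict(global_vars)
    merged.update(template_vars)
    merged.update(doc_vars)
    return merged
```

With such a merge, a per-template `title` would override the global `title` for that template only.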
The `pages` section defines pages which would not be found by traversing the collection hierarchy. This may include secondary documents like "about" pages, project documentation etc., or other views on the data like a listing of people, places, abbreviations or a bibliography.
The key of each entry in the `pages` section defines the output path where the fetched data will be stored. The value is an object. It must at least reference an HTML template and either a path to a single TEI document (`doc`) or an API endpoint returning a sequence of items to be processed (`sequence`).
The generator will look up the specified template in the `templates` section and retrieve the views there. If a single document was specified (via `doc`), the template will be instantiated once for the given document. If a sequence is given (via `sequence`), the generator expects a URL, which it will contact to retrieve a sequence of items. Each item should be an object defining one or more properties to be passed to the template as variables.
The template is called for each item in the sequence, resulting in a number of subdirectories in the output folder, whose name is derived from the `output` property.
For example, `guidelines.yml`, which will result in a static version of the TEI Guidelines app, defines the following:

    pages:
      "":
        template: guidelines_start.html
        doc: p5.xml
      "p5.xml":
        template: guidelines.html
        doc: p5.xml
      "ref":
        template: guidelines_ref.html
        sequence: "api/idents"
        output: "ref/{{ident}}"

The first mapping states that the template `guidelines_start.html` should be used as the entry page to the website. A single TEI document (`p5.xml`) is used as input. The second path, `/p5.xml`, gets generated based on the same input document, but using a different template. In each case, the generator then checks the `templates` section for a template specification describing the views to retrieve.
The third entry establishes a slightly more complex mapping: instead of outputting a single page, it generates a sequence of different pages based on the information returned by the API endpoint referenced in `sequence`. This endpoint is expected to return a JSON array. Each element in the array should be an object, defining parameter mappings.
For each parameter mapping, the HTML template is instantiated once with the additional parameters and any views it defines are retrieved. The resulting content is stored into the subdirectory path given by the `output` variable. As you can see above, we use the parameter `{{ident}}` as name of the final directory. This is a parameter returned by the endpoint we called to get the sequence (it will correspond to a TEI element or class name).
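The expansion of a sequence into output directories might look like the following sketch. It assumes the endpoint's response is already available as a JSON string and that `output` only contains simple `{{name}}` placeholders; `expand_sequence` is a hypothetical helper, not the generator's API.

```python
import json
import re


def expand_sequence(output_pattern, sequence_json):
    """Return (output_path, item) pairs for a JSON array of parameter mappings."""
    items = json.loads(sequence_json)
    if not isinstance(items, list):
        raise ValueError("expected a JSON array")
    results = []
    for item in items:
        # Fill {{name}} placeholders from the item's own properties.
        path = re.sub(
            r"\{\{\s*(\w+)\s*\}\}",
            lambda m: str(item[m.group(1)]),
            output_pattern,
        )
        results.append((path, item))
    return results
```

Each pair names one subdirectory to create and the variables to pass to the template for it.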
The `assets` section defines static assets which should be fetched or copied into an output directory. The output directory is specified as the key of each entry and the files to be fetched as a list.
- remote resources should be specified with a full URI. They will be retrieved and directly stored into the output directory.
- local resources should either be
  - a file path, which may be a glob expression (with wildcards) to potentially match multiple files
  - an object with properties `in` and `out`, where `out` denotes the file name of the output file
Text files (JavaScript, CSS, HTML, XML) will be treated as templates, i.e. expanded by the templating system before they are stored.
If a `worker` section is present in the configuration, the generator will also create a service worker, providing further instructions to the browser about caching strategies to be used for the different resources. Most importantly, you want to precache certain assets which are needed for every page, e.g. JavaScript or CSS files.
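The three forms an asset entry can take could be distinguished as in this sketch. It is illustrative only: `resolve_asset` is a made-up name, "full URI" is narrowed to `http` and `https` URLs, and the output name of a remote resource is assumed to be the last segment of its path.

```python
import glob
import os
from urllib.parse import urlparse


def resolve_asset(entry):
    """Return a list of (kind, source, output_name) tuples for one asset entry."""
    if isinstance(entry, dict):
        # Object form: explicit input file and output file name.
        return [("local", entry["in"], entry["out"])]
    if urlparse(entry).scheme in ("http", "https"):
        # Remote resource: keep the last path segment as file name.
        name = os.path.basename(urlparse(entry).path)
        return [("remote", entry, name)]
    # Local path, possibly a glob expression matching several files.
    return [
        ("local", path, os.path.basename(path))
        for path in sorted(glob.glob(entry))
    ]
```

A glob that matches nothing yields an empty list, which a real implementation would probably want to report.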
The service worker also increases offline usability: if users get disconnected from the network, they will still be able to browse any content visited before.
Within the `worker` section, the `precache` property should contain a list of assets to be added to the browser cache upfront. One can use glob expressions to add e.g. all files ending with `*.js` at once.
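How the glob expressions select files can be illustrated with the standard `fnmatch` module. This is a sketch under assumptions: the generator's real matching rules are not documented here, `precache_list` is a hypothetical helper, and with `fnmatchcase` a `*` also matches across directory separators, which differs from shell globbing.

```python
from fnmatch import fnmatchcase


def precache_list(patterns, generated_files):
    """Return the generated files matching any precache glob, without duplicates."""
    selected = []
    for path in generated_files:
        matches = any(fnmatchcase(path, pattern) for pattern in patterns)
        if matches and path not in selected:
            selected.append(path)
    return selected
```

The resulting list is what a service worker would be told to cache upfront.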
The general approach to take when converting a working HTML template from TEI Publisher is as follows:
- create a static template by copying/pasting the relevant bits from your TEI Publisher template, obeying the rules given above (see 'Templates')
- in `config.yml` add an entry below `templates` with the name of your template
- for each `pb-view` in the template add a view configuration as shown for `documentation.html` above:
  - every `pb-param` should be defined as a request param, prefixed by `user.`
  - if your `pb-view` uses a different ODD or view than the one defined as default for the document, add it as parameter as well
- for each `pb-load`, create an additional mapping using the original URL as value and an arbitrary filename as key (e.g. `toc.html`), then use this key as the `@url` attribute of `pb-load`
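The per-`pb-view` step lends itself to a small helper that drafts the view configuration from a template. The sketch below is an aid for writing `config.yml` by hand, not something the generator provides. It makes several assumptions: the `id` attribute of each `pb-view` is used as the view's key, and a differing ODD or view is passed under the same names as the `odd` and `view` attributes.

```python
from html.parser import HTMLParser


class ViewConfigBuilder(HTMLParser):
    """Collect pb-param children of each pb-view as `user.`-prefixed parameters."""

    def __init__(self):
        super().__init__()
        self.views = {}
        self._current = None  # key of the pb-view we are inside, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "pb-view":
            # Assumption: the element's id serves as the key of the view.
            self._current = attrs.get("id") or f"view{len(self.views) + 1}"
            self.views[self._current] = {}
            for name in ("odd", "view"):
                if attrs.get(name):
                    self.views[self._current][name] = attrs[name]
        elif tag == "pb-param" and self._current is not None:
            if attrs.get("name"):
                key = "user." + attrs["name"]
                self.views[self._current][key] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "pb-view":
            self._current = None


def view_config(template_html):
    """Return a draft `data` mapping for the pb-view elements of a template."""
    builder = ViewConfigBuilder()
    builder.feed(template_html)
    builder.close()
    return builder.views
```

The returned mapping can be dumped to YAML and pasted below the template's `data` key, then completed with the `pb-load` entries by hand.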