remarkjs/remark-math

rehype-mathjax: allow configuration of TeX input processor

MartyO256 opened this issue · 9 comments

Problem

The options for the rehype-mathjax plugin are not those of the whole MathJax instance.
Rather, they are only the options for the SVG output processor.

Hence, it is currently not possible to configure the TeX input processor, for example to enable automatic equation tagging (with the MathJax configuration { tex: { tags: "ams" } }).

Expected behavior

The options parameter for the renderSvg(options) function below should be of the form { tex: { /* ... */ }, svg: { /* ... */ } } as in the MathJax documentation.
This way, both the TeX input and SVG output processors may be configured.

function renderSvg(options) {
  return createRenderer(createInput(), createOutput(options))
}

Currently, those options are only those of the SVG output processor.
This could instead read something like return new Svg((options || {}).svg || {}).

function createOutput(options) {
  return new Svg(options)
}

Below, the createInput() function should have an options parameter, and still have all the packages as default value.
This could instead read something like return new Tex({packages: packages, ...(options || {}).tex}).

function createInput() {
  return new Tex({packages: packages})
}

One problem that could arise with these changes is that when using automatic equation numbering, the numbers could continue across multiple documents being processed using the same instance of rehype.
That is, new instances of the TeX input and SVG output processors should be used for each document being processed to prevent the numbering from continuing onto the next document.
This problem may be ignored if instances of rehype are not meant to be reused for multiple documents.
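
To make the numbering concern concrete, here is a minimal sketch. FakeTexInput is a hypothetical stand-in, not MathJax's TeX class; it only assumes that the equation counter lives on the processor instance. The two plugin functions are likewise illustrative names, showing a processor created once in the attacher versus once per document in the transformer.

```javascript
// Hypothetical stand-in for a TeX input processor with automatic numbering.
// It is NOT MathJax's class; it only models a counter stored on the instance.
class FakeTexInput {
  constructor(options = {}) {
    this.tags = options.tags ?? "none";
    this.counter = 0;
  }

  convert(source) {
    if (this.tags === "none") return source;
    this.counter += 1;
    return `${source} (${this.counter})`;
  }
}

// Attacher-scoped: one processor is shared by every document.
function sharedPlugin(options) {
  const input = new FakeTexInput(options);
  return (equations) => equations.map((eq) => input.convert(eq));
}

// Transformer-scoped: a fresh processor is created for each document.
function perDocumentPlugin(options) {
  return (equations) => {
    const input = new FakeTexInput(options);
    return equations.map((eq) => input.convert(eq));
  };
}

const shared = sharedPlugin({ tags: "ams" });
const firstShared = shared(["a", "b"]);
const secondShared = shared(["c"]); // numbering continues from the first document

const fresh = perDocumentPlugin({ tags: "ams" });
const firstFresh = fresh(["a", "b"]);
const secondFresh = fresh(["c"]); // numbering restarts for the second document
```

With the shared processor the second document's equation is numbered 3, while the per-document variant restarts at 1, at the cost of constructing the processors on every run.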

hi there!

A PR would be welcome to support tex input options. I’d prefer not to introduce a breaking change by changing the options format, though. Also: note that rehype-mathjax supports CHTML and a browser version (for loading mathjax in browsers) too.

This problem may be ignored if instances of rehype are not meant to be reused for multiple documents.

Instances of rehype indeed should be reusable for multiple documents

Upon closer inspection, I think it would be more useful to be able to configure the input and output processors using the VFile data rather than the plugin options.
That is because TeX macros are most likely to be declared for specific documents (in the frontmatter of markdown files for instance).
This requires that the renderer be created in the transformer rather than the attacher.
I do not know how well this would work with the browser plugin.

It could also be worthwhile to add an option to register the assistive MathML handler as in the non-component-based example usage of MathJax.
This could be a separate issue, but adding that configuration as a plugin option would mix SVG output processor options with handler options.
I have not figured out how to use the other accessibility extensions with pre-processing.

There is an unfortunate consequence of MathJax relying on a global instance whereby handlers are registered and not automatically unregistered.
That is, if registering handlers occurs in the attacher, then they cannot be unregistered properly unless there were destructors in JS.
These handlers may only pose a problem if the rehype processor is not reused for each document, so it's not that bad.
One way to avoid this would be to register handlers in the transformer rather than the attacher, but this may bog down performance.

I worked on a different implementation of rehype-mathjax for my project.
I'm pasting it here to better illustrate what I had in mind.
A default MathJax configuration may be provided as a plugin option, which may then be overridden by the VFile data.

import deepMerge = require("lodash.defaultsdeep");

import { JSDOM } from "jsdom";

import { jsdomAdaptor } from "mathjax-full/js/adaptors/jsdomAdaptor";

import { AsciiMath } from "mathjax-full/js/input/asciimath";
import { MathML } from "mathjax-full/js/input/mathml";
import { TeX } from "mathjax-full/js/input/tex";
import { AllPackages } from "mathjax-full/js/input/tex/AllPackages";

import { CHTML } from "mathjax-full/js/output/chtml";
import { SVG } from "mathjax-full/js/output/svg";

import { mathjax as MathJax } from "mathjax-full/js/mathjax";
import { RegisterHTMLHandler } from "mathjax-full/js/handlers/html";
import { AssistiveMmlHandler } from "mathjax-full/js/a11y/assistive-mml";

import * as unified from "unified";
import visit from "unist-util-visit";
import hastFromDom from "hast-util-from-dom";
import hastToText from "hast-util-to-text";

// eslint-disable-next-line import/no-unresolved
import { Node } from "unist";

export type InputJaxType = "MathML" | "AsciiMath" | "TeX";

const inputJaxBuilderSupplier = (type: InputJaxType = "TeX") => {
  switch (type) {
    case "AsciiMath":
      return (mathjaxOptions) => new AsciiMath(mathjaxOptions?.asciimath);
    case "MathML":
      return (mathjaxOptions) => new MathML(mathjaxOptions?.mml);
    case "TeX":
      return (mathjaxOptions) =>
        new TeX({
          packages: AllPackages,
          ...mathjaxOptions?.tex,
        });
    default:
      throw new Error(`Unrecognized MathJax input format type "${type}".`);
  }
};

export type OutputJaxType = "CommonHTML" | "SVG";

const outputJaxBuilderSupplier = (type: OutputJaxType = "CommonHTML") => {
  switch (type) {
    case "SVG":
      return (mathjaxOptions) => new SVG(mathjaxOptions?.svg);
    case "CommonHTML":
      return (mathjaxOptions) =>
        new CHTML({ fontURL: "/fonts", ...mathjaxOptions?.chtml });
    default:
      throw new Error(`Unrecognized MathJax output format type "${type}".`);
  }
};

export interface RehypeMathJaxOptions {
  input: InputJaxType;
  output: OutputJaxType;
  mathjax: unknown;
  a11y: Partial<{
    assistiveMml: boolean;
  }>;
}

export const attacher: unified.Attacher<
  [Partial<RehypeMathJaxOptions>?]
> = ({ input, output, mathjax, a11y } = {}): unified.Transformer => {
  const adaptor = jsdomAdaptor(JSDOM);
  const handler = RegisterHTMLHandler(adaptor); // Handler is never unregistered
  if (a11y?.assistiveMml) AssistiveMmlHandler(handler);

  const createInputJax = inputJaxBuilderSupplier(input);
  const createOutputJax = outputJaxBuilderSupplier(output);

  return (tree, { data }) => {
    const options = deepMerge(
      {},
      (data as Record<string, unknown>)?.mathjax,
      mathjax,
    );

    const input = createInputJax(options);
    const output = createOutputJax(options);

    const document = MathJax.document("", {
      InputJax: input,
      OutputJax: output,
    });

    let context = tree;
    let found = false;

    visit(tree, "element", (node: Node) => {
      const classes =
        ((node?.properties as Record<string, unknown>)
          ?.className as string[]) ?? [];
      const inline = classes.includes("math-inline");
      const display = classes.includes("math-display");

      if (node.tagName === "head") context = node;

      if (!inline && !display) return;

      found = true;
      node.children = [
        hastFromDom(document.convert(hastToText(node), { display })),
      ];

      return visit.SKIP;
    });

    if (found)
      (context.children as Node[]).push({
        type: "element",
        tagName: "style",
        properties: {},
        children: [
          {
            value: adaptor.textContent(
              output.styleSheet(document) as HTMLElement,
            ),
            type: "text",
          },
        ],
      });
  };
};
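
The precedence in the deepMerge call above is easy to misread: with a "defaults deep" merge, the first source that defines a key wins, so the VFile data overrides the plugin option. The sketch below shows that precedence with a simplified stand-in written for this example. It handles plain objects only and is not lodash's implementation, which may treat arrays and other values differently.

```javascript
function isPlainObject(value) {
  return value !== null && typeof value === "object" && !Array.isArray(value);
}

// Simplified stand-in for a "defaults deep" merge (plain objects only).
// Sources are applied left to right; a key that is already set is never replaced.
function defaultsDeep(target, ...sources) {
  for (const source of sources) {
    if (!isPlainObject(source)) continue; // skips undefined plugin options or data
    for (const [key, value] of Object.entries(source)) {
      if (target[key] === undefined) {
        // Copy nested objects so the sources are not mutated by later merges.
        target[key] = isPlainObject(value) ? defaultsDeep({}, value) : value;
      } else if (isPlainObject(target[key]) && isPlainObject(value)) {
        defaultsDeep(target[key], value);
      }
    }
  }
  return target;
}

// Illustrative values only.
const pluginDefaults = { tex: { tags: "none", macros: { R: "\\mathbb{R}" } } };
const fileData = { tex: { tags: "ams", macros: { N: "\\mathbb{N}" } } };

// Same argument order as in the transformer: file data first, plugin option second.
const merged = defaultsDeep({}, fileData, pluginDefaults);
```

Here merged.tex.tags comes from the file data, while merged.tex.macros contains both R and N, which is the "append" behaviour for macros; a "replace" policy would need a different merge.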

That is because TeX macros are most likely to be declared for specific documents

What would be the benefit of using different custom macros in different files? It sounds rather confusing to reason about, in my opinion: define X and Y for one page, and a Z and a different Y for another page.

I do not know how well this would work with the browser plugin.

The browser plugin passes math through uncompiled, so that MathJax in a browser can deal with it. So configuration would still live, as it already does, in the browser, and be the responsibility of users.


Is your issue about passing options to the TeX input processor, or about using MathML + ASCIIMath too? The first is much easier than the latter.

What would be the benefit of using different custom macros in different files? It sounds rather confusing to reason about, in my opinion: define X and Y for one page, and a Z and a different Y for another page.

The benefit would be the ability to define TeX macros in a way that is similar to how you would declare them in the preamble of LaTeX documents. Admittedly, the newcommand extension already enables adding macros on the fly in the contents of the document, and having macros in the preamble raises issues as to how to merge options (replace vs. append macros for instance).

Having a MathJax configuration object as part of the VFile data was my non-breaking solution to configuring the TeX input processor, with { tex: ..., svg: ... }. However, this has the consequence of moving the MathJax document instantiation from the attacher to the transformer.

Specifically for my project, I want to implement inheriting TeX macros from other files, so I will need to be able to configure MathJax from the VFile data. I understand that this is not something most people need when using rehype-mathjax.

Is your issue about passing options to the TeX input processor, or about using MathML + ASCIIMath too? The first is much easier than the latter.

It is just about passing options to the TeX input processor. I think that part may still be useful to other users of the plugin.

Since there is no tex property in the SVG output processor options, the TeX input processor may be configured by adding a tex property to the rehype-mathjax options. In the future, it may be preferable to introduce the breaking change of having the options follow the MathJax format, that is, { tex: ..., svg: ... }.

As it currently stands, I find the following statement in the readme to be misleading since the options are not in the same format as that of MathJax:

All options, except when using the browser plugin, are passed to MathJax.

For now, do you want me to open a PR to add that tex property to the options? That is, rehype-mathjax would have options type RenderSVGOptions & { tex: InputTeXOptions }?
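
A minimal sketch of how such an options object could be split, assuming a hypothetical splitOptions helper and a placeholder defaultPackages list (the plugin has its own packages value). The tex key goes to the input processor and everything else still goes to the output processor, so existing configurations keep working.

```javascript
// Placeholder package list for illustration; the plugin defines its own `packages`.
const defaultPackages = ["base", "ams"];

// Hypothetical helper: peel `tex` off the options, pass the rest to the output processor.
function splitOptions(options) {
  const { tex, ...outputOptions } = options || {};
  return {
    // User-provided tex options override the default package list if they set `packages`.
    input: { packages: defaultPackages, ...tex },
    output: outputOptions,
  };
}

const { input, output } = splitOptions({
  fontCache: "global",
  tex: { tags: "ams" },
});

// Options without a `tex` key behave as before.
const legacy = splitOptions({ scale: 2 });
```

In this sketch the results would be handed to new Tex(input) and new Svg(output) (or the CHTML equivalent); the output processor never sees the tex key.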

Yeah, I think a tex prop in the options that are now passed to either svg or chtml would indeed be an acceptable solution for now! And indeed: the docs should be clearer on what options is, too.
I’d appreciate a PR for that!


Specifically for my project, I want to implement inheriting TeX macros from other files, so I will need to be able to configure MathJax from the VFile data. I understand that this is not something most people need when using rehype-mathjax.

I’m fine discussing this too, although I’m a bit unclear about your use case. We can discuss this separately.

tani commented

I agree with @MartyO256. Classical LaTeX processors usually work on a file, not on a string.
Using the same MathJax instance file by file would be a reasonable option. However, I have no idea how to extend this plugin at the moment.

@wooorm @MartyO256 actually I'm a little confused about what options should look like, in the docs it seems as though it is something directly passed to MathJax, so I thought the following would be acceptable

const options = {
  tex: {
    inlineMath: [
      ['$', '$'],
      ['\\(', '\\)'],
    ],
    tags: 'ams',
  },
  svg: {
    fontCache: 'global',
  },
};

adding that to the example.js

unified()
  .use(parse, { fragment: true })
  .use(mathjax, options)
  .use(stringify)
  .process(vfile.readSync('example.html'), function (err, file) {
    if (err) throw err;
    console.log(String(file));
  });

I got the following error

---/node_modules/mathjax-full/js/util/Options.js:143
        finally { if (e_2) throw e_2.error; }
                           ^

Error: Invalid option "tex" (no default value).

I also tried the following and got a similar error

const options2 = {
  inlineMath: [
    ['$', '$'],
    ['\\(', '\\)'],
  ],
};

/node_modules/mathjax-full/js/util/Options.js:143
        finally { if (e_2) throw e_2.error; }
                           ^

Error: Invalid option "inlineMath" (no default value).

I think I must be misunderstanding the options; I would appreciate some clarification, thanks!

@dingram in that example.js file, the options are those of the SVG output processor. That is, the options default to the following:

{
    scale: 1,                      // global scaling factor for all expressions
    minScale: .5,                  // smallest scaling factor to use
    mtextInheritFont: false,       // true to make mtext elements use surrounding font
    merrorInheritFont: true,       // true to make merror text use surrounding font
    mathmlSpacing: false,          // true for MathML spacing rules, false for TeX rules
    skipAttributes: {},            // RDFa and other attributes NOT to copy to the output
    exFactor: .5,                  // default size of ex in em units
    displayAlign: 'center',        // default for indentalign when set to 'auto'
    displayIndent: '0',            // default for indentshift when set to 'auto'
    fontCache: 'local',            // or 'global' or 'none'
    localID: null,                 // ID to use for local font cache (for single equation processing)
    internalSpeechTitles: true,    // insert <title> tags with speech content
    titleID: 0                     // initial id number to use for aria-labeledby titles
}
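
As a rough model of why both attempts above fail, the sketch below assumes the output processor rejects any key that has no default, which is what the error message suggests. applyUserOptions and svgDefaults are hypothetical names written for this example, not MathJax's actual implementation, and svgDefaults is only a subset of the list above.

```javascript
// Subset of the SVG output defaults listed above, for illustration only.
const svgDefaults = { scale: 1, fontCache: "local" };

// Hypothetical validator: every user key must already exist in the defaults.
function applyUserOptions(defaults, userOptions = {}) {
  const result = { ...defaults };
  for (const [key, value] of Object.entries(userOptions)) {
    if (!(key in defaults)) {
      throw new Error(`Invalid option "${key}" (no default value).`);
    }
    result[key] = value;
  }
  return result;
}

// Returns "ok" or the error message, so the outcomes can be compared side by side.
function tryOptions(options) {
  try {
    applyUserOptions(svgDefaults, options);
    return "ok";
  } catch (error) {
    return error.message;
  }
}

// MathJax-style nesting: `tex` is not an SVG option, so it is rejected.
const nested = tryOptions({ tex: { tags: "ams" }, svg: { fontCache: "global" } });
// A TeX option at the top level is rejected for the same reason.
const texAtTopLevel = tryOptions({ inlineMath: [["$", "$"]] });
// Only flat SVG output options are accepted.
const flat = tryOptions({ fontCache: "global" });
```

Under that assumption, only the flat SVG options such as fontCache pass through; tex, svg, and inlineMath are all unknown keys from the output processor's point of view.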

Making the rehype-mathjax options match those of MathJax will require breaking changes. #54 introduces hacks to allow configuring the TeX input processor by extending the options type, but these changes have yet to be merged because they are unsightly and complicate the public interface.

I'm willing to open a PR with the breaking changes to make the options match those of MathJax.

@MartyO256 I see, thanks for doing this!