ken107/read-aloud

MathML Core support

Opened this issue · 10 comments

dginev commented

Hi!

I recently discovered read-aloud and have found it quite enjoyable as a Firefox user.

I maintain a web version of the arXiv preprint server called ar5iv, which contains ~2 million advanced STEM research articles encoded as HTML5+MathML.

Here is an example page:
https://ar5iv.labs.arxiv.org/html/1910.06709

MathML is generally not supported by TTS engines without some initial preprocessing. Dedicated math AT libraries, such as MathJax+SRE or MathCAT, are commonly used to prepare the MathML as plain-text content. Is there any path to integrating read-aloud with one of those tools by default, so that users who install the read-aloud extension have immediate support for MathML equations on any website?

Thanks for the great extension!

ken107 commented

Thanks so much, this is very helpful. I've tested SRE and was able to generate speech text from the outerHTML of a <math> element. So I'm thinking of doing it as follows.

Probably include sre.js in the extension package (rather than provide conversion as a web service). sre.js is a bit large, but probably okay after compression.

When getting the innerText of a DOM node, we hide any descendant <math> elements, and insert <span>s containing the corresponding speech text in their places.
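A minimal sketch of the hide-and-substitute step described above, not the extension's actual code. The names `readWithMathSpoken`, `toSpeech`, and `readText` are hypothetical: `toSpeech` stands in for whatever turns MathML markup into speech text (for example a wrapper around SRE), and `readText` for whatever reads the node's text (for example `el => el.innerText`). Restoring the page afterwards is an assumption of this sketch.

```javascript
// Sketch only: `toSpeech` and `readText` are hypothetical callbacks supplied by the caller.
//   toSpeech(mathml) -> speech text for one <math> element
//   readText(root)   -> text of the node, e.g. el => el.innerText
function readWithMathSpoken(root, toSpeech, readText) {
  const undo = [];
  for (const math of root.querySelectorAll("math")) {
    // Put a <span> holding the speech text where the <math> element sits.
    const span = root.ownerDocument.createElement("span");
    span.textContent = toSpeech(math.outerHTML);
    math.parentNode.insertBefore(span, math);

    // Hide the original markup so a rendering-aware read (innerText) skips it.
    const previousDisplay = math.style.display;
    math.style.display = "none";

    undo.push(() => {
      span.remove();
      math.style.display = previousDisplay;
    });
  }
  try {
    return readText(root);
  } finally {
    // Put the page back the way it was, even if reading throws.
    undo.forEach((restore) => restore());
  }
}
```

Passing the two callbacks in keeps the DOM juggling separate from the choice of speech library, so either side can be swapped without touching the other.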

ken107 commented

It looks like some sites use the MathJax library to display MathML, so the markup is different.

http://neuralnetworksanddeeplearning.com/chap1.html
http://www.cv.nrao.edu/~sransom/web/Ch1.html
https://openstax.org/books/university-physics-volume-1/pages/9-2-impulse-and-collisions

For these, we need to find <div class="MathJax"> and get the MathML from its data-mathml attribute.
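Roughly, gathering MathML from both kinds of page could look like the sketch below. The function name `collectMathML` and the `.MathJax[data-mathml]` selector are assumptions based on the markup described above (the container may be a span rather than a div depending on the MathJax version and display mode). Skipping `<math>` elements nested inside a MathJax container is also an assumption, meant to avoid counting MathJax's own assistive copy twice.

```javascript
// Sketch only: selectors are assumptions, not taken from the extension's source.
function collectMathML(root) {
  const found = [];

  // Native MathML: the element's own markup is the source.
  for (const math of root.querySelectorAll("math")) {
    // Ignore copies that MathJax may embed inside its own container.
    if (math.closest(".MathJax")) continue;
    found.push({ element: math, mathml: math.outerHTML });
  }

  // MathJax output: the original MathML is carried in the data-mathml attribute.
  for (const el of root.querySelectorAll(".MathJax[data-mathml]")) {
    found.push({ element: el, mathml: el.getAttribute("data-mathml") });
  }

  return found;
}
```

Each entry pairs the element to hide with the MathML string to convert, so the same replacement step can serve both markups.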

dginev commented

@ken107 thank you for acting so quickly on the issue! This is a very pleasant surprise.

Would you happen to have an estimate when I would be able to take the new support for a test drive? I assume I should wait for the next browser extension release?

ken107 commented

If you want to test it, you can clone the repo locally, go to "chrome://extensions", click the "Load unpacked" button, and choose the repo directory. That'll load a local version of the extension. Otherwise, I'm just waiting for a few more changes before releasing a new version, probably by the end of this week at the latest.

dginev commented

@ken107 this is great - I tried using it and the base SRE support is now active from the top-level play button when testing on an ar5iv page.

One aspect tripped me up while trying it out - I first attempted to select a small range with the mouse and right-click + read-aloud on the range, but I am pretty sure that is currently passed into a code path that does not have the Math preprocessing enabled. I tracked the code a little, and voicing a selection seems to use SimpleSource in document.js -- but if the selection had a <math> element, the source isn't as simple :) Maybe that is too advanced a request for the initial support release - but I wanted to report that I stumbled onto the issue.

Thanks again for making math readouts possible!

ken107 commented

Indeed when the user uses the right click menu, the browser hands us the selected text directly (hence SimpleSource), and we just read that aloud instead of going through the code path that processes the page.

The reason is that in many situations the selected text cannot be gathered through the page processing code path, such as when the selected text is inside an iframe, or comes from a non-HTML document like a PDF viewer.

I guess we could try processing the page and then only use the processed text if it appears to match the selected text the browser handed us.
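One way to picture that matching idea, as a sketch under stated assumptions rather than a plan for the actual code: compare the browser's text with the unprocessed text read from the page, ignoring whitespace differences, and fall back to the browser's text on any mismatch. All names here (`normalize`, `pickSelectionText`, and the three parameters) are hypothetical.

```javascript
// Collapse runs of whitespace so layout differences do not defeat the comparison.
function normalize(text) {
  return text.replace(/\s+/g, " ").trim();
}

// Sketch only: all three parameters are hypothetical.
//   browserText   - selection text handed over by the context menu
//   pageRawText   - the same selection read from the page before math processing,
//                   or null when the page path could not find it (iframe, PDF viewer)
//   pageProcessed - that selection after math was replaced by speech text
function pickSelectionText(browserText, pageRawText, pageProcessed) {
  const matches =
    pageRawText != null && normalize(pageRawText) === normalize(browserText);
  return matches ? pageProcessed : browserText;
}
```

The fallback keeps today's behavior for iframes and PDF viewers, where the page path has nothing to compare against. Whether the browser's selection text and the page's text agree closely enough around `<math>` elements would need checking in practice.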

The vast majority of read-aloud articles do not have math markup. This changeset already added a small, but negligible, mathml-detection overhead to every request. I'm hesitant to add further overhead to the page processing code path though.

dginev commented

> The vast majority of read-aloud articles do not have math markup. This changeset already added a small, but negligible, mathml-detection overhead to every request. I'm hesitant to add further overhead to the page processing code path though.

That is fair, math is bound to be a very niche use case from your perspective. I would have understood if you even gated the feature with a global setting from the "Options" menu, where math is only ever checked for when it has been explicitly turned on. In that case it would probably be easier to motivate also supporting selections. But the basic support is already more than I expected on this timeline, so this is just brainstorming, not a request.

ken107 commented

Gating is a fine idea. In this case, though, the performance overhead is small enough (2 document.querySelector calls) that it doesn't justify the UX cost of adding an option. The overhead for handling text selection is much larger, but also not that bad. Let me consider this for a while.
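For a sense of scale, a detection check of that size might look like the sketch below. The function name `pageHasMath` and both selectors are assumptions, not necessarily the ones the changeset uses.

```javascript
// Sketch only: the two selectors are assumptions.
// At most two querySelector calls per page; the second runs only if the first finds nothing.
function pageHasMath(doc) {
  return (
    doc.querySelector("math") !== null ||
    doc.querySelector(".MathJax[data-mathml]") !== null
  );
}
```

`querySelector` stops at the first match, so on pages that do contain math the check tends to return early, and on pages without math the cost is two selector lookups.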

By the way, after selecting text, you can also click the icon to read aloud, or use the shortcut keys. In those cases, we have to go through the processing step, and so MathML would be handled correctly.

dginev commented

Hey @ken107, just checking in - has math support been released in the public extension? I think you had it mostly ready for an initial release last I checked, and it would be nice to have for early testing.

ken107 commented

It should be working in the latest release (2.4.0). You can test out those links from above:

http://neuralnetworksanddeeplearning.com/chap1.html
http://www.cv.nrao.edu/~sransom/web/Ch1.html
https://openstax.org/books/university-physics-volume-1/pages/9-2-impulse-and-collisions

Some of these pages use the MathJax library, which processes the MathML only after the page has finished loading, so there might be a delay before the MathML becomes available. If the user clicks read aloud too early, the MathML may not have been processed yet. Unfortunately we cannot handle this in a consistently reliable way.