Math equation is broken
shizidushu opened this issue · 10 comments
### Linear Models and Least Squares
Given a vector of inputs $X^T=(X_1, X_2, \ldots, X_p)$, we predict output $Y$ via the model
$$
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^p X_j \hat{\beta}_j
$$
The term $\hat{\beta}_0$ is the intercept, also known as the *bias* in machine learning. Often it is convenient to include the constant variable 1 in $X$, include $\hat{\beta}_0$ in the vector of coefficients $\hat{\beta}$, and then write the linear model in vector form as an inner product
$$
\hat{Y} = X^T \hat{\beta}
$$
where $X^T$ denotes vector or matrix transpose ($X$ being a column vector). Here we are modeling a single output, so $\hat{Y}$ is a scalar; in general $\hat{Y}$ can be a $K$-vector, in which case $\beta$ would be a $p \times K$ matrix of coefficients. In the $(p+1)$-dimensional input-output space, $(X, \hat{Y})$ represents a hyperplane. If the constant is included in $X$, then the hyperplane includes the origin and is a subspace; if not, it is an affine set cutting the $Y$-axis at the point $(0, \hat{\beta}_0)$. From now on we assume that the intercept is included in $\hat{\beta}$.
with open('temp.md', "r", encoding="utf-8") as mdFile:
    newPage = page.children.add_new(PageBlock, title=mdFile.name)
    txt = mdFile.read()
    txt_list = re.split(pattern, txt)
    for i, string in enumerate(txt_list):
        if string == '':
            txt_list[i] = '\n'
    new_txt = ''.join(txt_list)
    rendered = convert(new_txt,addLatexExtension(NotionPyRenderer))
    for blockDescriptor in rendered:
        uploadBlock(blockDescriptor, newPage, mdFile.name)
Hmm, it looks like the `_0 ... m_` in the equation has been interpreted as Markdown italics by notion-py?
That's the only difference I see between your equation and the below:
\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^p X_j \hat{\beta}_j
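To illustrate how underscores inside LaTeX can be mistaken for emphasis markers, here is a minimal sketch. It uses a deliberately naive regular expression, not notion-py's or mistletoe's actual parser; `NAIVE_EMPHASIS` and `find_accidental_emphasis` are hypothetical names introduced only for this example.

```python
import re

# The equation from the issue, kept as a raw string so backslashes survive.
EQUATION = r"\hat{Y} = \hat{\beta}_0 + \sum_{j=1}^p X_j \hat{\beta}_j"

# Naive emphasis rule (an assumption for illustration): an underscore,
# the shortest possible run of text, then another underscore.
NAIVE_EMPHASIS = re.compile(r"_(.+?)_")


def find_accidental_emphasis(text):
    """Return the text runs that the naive rule would render as italics."""
    return [match.group(1) for match in NAIVE_EMPHASIS.finditer(text)]


spans = find_accidental_emphasis(EQUATION)
for span in spans:
    print("would be italicized:", span)
```

The first span found runs from `_0` to the underscore after `\sum`, which matches the `_0 ... m_` range mentioned above. Storing the LaTeX in a plain-text field avoids this kind of reinterpretation.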
This line should be setting `title_plaintext`, like `CodeBlock` does, instead of `title`. That should fix it.
@Cobertos Thanks, I got it working.
from mistletoe.block_token import BlockToken
from mistletoe.html_renderer import HTMLRenderer
from mistletoe import span_token
from mistletoe.block_token import tokenize
from md2notion.NotionPyRenderer import NotionPyRenderer
from notion.block import EquationBlock, field_map

class CustomEquationBlock(EquationBlock):
    latex = field_map(
        ["properties", "title_plaintext"],
        python_to_api=lambda x: [[x]],
        api_to_python=lambda x: x[0][0],
    )
    _type = "equation"

class CustomNotionPyRenderer(NotionPyRenderer):
    def render_block_equation(self, token):
        def blockFunc(blockStr):
            return {
                'type': CustomEquationBlock,
                'title_plaintext': blockStr #.replace('\\', '\\\\')
            }
        return self.renderMultipleToStringAndCombine(token.children, blockFunc)

import re
pattern = re.compile(r'( {0,3})((?:\$){2,}) *(\S*)')

class Document(BlockToken):
    def __init__(self, lines):
        if isinstance(lines, str):
            lines = lines.splitlines(keepends=True)
        else:
            txt = lines.read()
            txt_list = re.split(pattern, txt)
            for i, string in enumerate(txt_list):
                if string == '':
                    txt_list[i] = '\n'
            lines = ''.join(txt_list)
            lines = lines.splitlines(keepends=True)
        lines = [line if line.endswith('\n') else '{}\n'.format(line) for line in lines]
        self.footnotes = {}
        global _root_node
        _root_node = self
        span_token._root_node = self
        self.children = tokenize(lines)
        span_token._root_node = None
        _root_node = None

def markdown(iterable, renderer=HTMLRenderer):
    """
    Output HTML with default settings.
    Enables inline and block-level HTML tags.
    """
    with renderer() as renderer:
        return renderer.render(Document(iterable))

def convert(mdFile, notionPyRendererCls=NotionPyRenderer):
    """
    Converts a mdFile into an array of NotionBlock descriptors
    @param {file|string} mdFile The file handle to a markdown file, or a markdown string
    @param {NotionPyRenderer} notionPyRendererCls Class inheriting from the renderer
    in case you want to render the Markdown => Notion.so differently
    """
    return markdown(mdFile, notionPyRendererCls)
The InlineEquation has the same problem. @Cobertos Can you have a look?
I commented out this line: https://github.com/miyuchina/mistletoe/blob/2cfe7446b975685f98837f9e40aaabcc0e270a79/mistletoe/core_tokens.py#L63
After that, the InlineEquation works.
I'll leave it open until the fix gets in the library itself. Will need to do that soon.
As for the inline equations, notion-py is the one that actually handles uploading inline equations to Notion, added in this PR. This is because it does some special conversions to convert to Notion's expected format.
Looking at that PR, it looks like `notion-py`'s inline equations are formatted with double `$$`s, not single? That seems to differ from your example, so I'm not sure whether that is working for you.
In your case though, in md2notion, emphasis is handled by re-echoing out the specific markdown, as notion-py will handle that later. That's going to cause issues in your case, converting `_` to `*`. I will look into whether mistletoe allows the exact emphasis formatting marker to carry over. That should at least preserve your `_` to let notion-py handle the rest.
There is no problem related to the single `$`; it has been handled well somewhere.
Another problem worth mentioning is that if there is no blank line before the block equation, the block equation will be treated as part of a `TextBlock`.
I add `\n` before and after the double `$$` and then trim the equation block string to avoid this.
import itertools
new_lines = []
for (i, line) in enumerate(lines):
    new_line = [None, line, None]
    if i > 0 and i < len(lines) - 2:
        if line == '$$\n' and lines[i-1][0] != '\n':
            new_line[0] = '\n'
        if line == '$$\n' and lines[i+1][0] != '\n':
            new_line[2] = '\n'
    new_lines.append(new_line)
new_lines = list(itertools.chain(*new_lines))
new_lines = list(filter(lambda x: x is not None, new_lines))
new_lines = ''.join(new_lines)
lines = new_lines.splitlines(keepends=True)
lines = [line if line.endswith('\n') else '{}\n'.format(line) for line in lines]
I hope the package will handle this too, perhaps more intelligently.
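The padding workaround described above could also be wrapped in a small standalone function. The following is only a sketch under stated assumptions: `pad_block_equations` is a hypothetical helper, not part of md2notion or mistletoe, and it assumes that block equations are delimited by lines containing only `$$`. Unlike the snippet above, it tracks whether a fence opens or closes an equation, so blank lines are added only outside the equation and no trimming is needed afterwards.

```python
def pad_block_equations(text):
    """Put a blank line before each opening '$$' line and after each closing one.

    Assumes block equations are delimited by lines containing only '$$'.
    """
    lines = text.splitlines()
    padded = []
    inside_equation = False
    for i, line in enumerate(lines):
        if line.strip() != "$$":
            padded.append(line)
            continue
        if not inside_equation:
            # Opening fence: make sure the previous line is blank.
            if padded and padded[-1].strip():
                padded.append("")
            padded.append(line)
        else:
            # Closing fence: make sure the next line is blank.
            padded.append(line)
            if i + 1 < len(lines) and lines[i + 1].strip():
                padded.append("")
        inside_equation = not inside_equation
    return "\n".join(padded) + "\n"


sample = "Some text\n$$\na = b\n$$\nMore text\n"
padded_sample = pad_block_equations(sample)
print(padded_sample)
```

Running the function a second time on its own output should leave the text unchanged, since the blank lines are already in place.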
`title_plaintext` is now added to master. I also added two tests. I still need to push a package update.
To answer all the fixes/questions related to the current state of equation blocks:
> The InlineEquation has the same problem.
I added a test that now tests for this. What gets passed to `notion-py` should be well-formed. It looks like `notion-py` is parsing the inline Markdown again, so that is most likely where the issue arises.
I don't see an easy fix for `notion-py` on this though...
> There is no problem related to the single $, it has been handled well somewhere.
Woops, yes, I was mistaken. This works correctly. Single
> There is another problem worth mentioning: if there is no blank line before the block equation, the block equation will be treated as part of TextBlock.
Hmm, I am seeing this issue. Ideally we would support this sort of case because it's similar to how CommonMark's specification describes code fences. "A fenced code block may interrupt a paragraph, and does not require a blank line either before or after."
After some research, the issue lies with how `mistletoe`'s `Paragraph` block `read()` function works. It specifically looks for `CodeFence.start()` to break out of its `read()` loop. We would need to edit `Paragraph`'s `read()` function to add `BlockEquation.start()` in there too to fix this.
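The mechanism can be sketched with a toy paragraph reader. This is not mistletoe's code: `read_paragraph`, `is_code_fence_start`, and `is_block_equation_start` are hypothetical names, and the real `Paragraph.read()` is more involved. The sketch only shows why a block type must be checked inside the paragraph loop in order to interrupt a paragraph.

```python
# Three backticks, built this way so the example can sit inside a fenced block.
CODE_FENCE = "`" * 3


def is_code_fence_start(line):
    """Toy stand-in for a code fence start check."""
    return line.lstrip().startswith(CODE_FENCE)


def is_block_equation_start(line):
    """Toy stand-in for a block equation start check."""
    return line.strip().startswith("$$")


def read_paragraph(lines, interrupters):
    """Collect paragraph lines until a blank line or an interrupting block start."""
    paragraph = []
    for line in lines:
        if not line.strip():
            break
        # Only block types listed in `interrupters` can end the paragraph early.
        if paragraph and any(starts(line) for starts in interrupters):
            break
        paragraph.append(line)
    return paragraph


lines = ["Some text\n", "$$\n", "a = b\n", "$$\n"]

# With only the code fence check, the equation is swallowed by the paragraph.
swallowed = read_paragraph(lines, [is_code_fence_start])

# Adding the equation check lets the equation interrupt the paragraph.
interrupted = read_paragraph(lines, [is_code_fence_start, is_block_equation_start])
print(len(swallowed), len(interrupted))
```

In the first call the `$$` lines end up inside the paragraph, which mirrors the behavior reported above; in the second call the paragraph stops before the equation.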
Upstream tag for the inline equation issue. Open to ideas to fix the newline thing; I can't think of an easy way to integrate that.