physikerwelt/texvcinfo

remove operators from the identifier list

Closed this issue · 7 comments

for instance

curl -X POST --header 'Content-Type: application/x-www-form-urlencoded' --header 'Accept: application/json' -d 'q=%5Cunderbrace%7Bu_1(%5Cmathbf%7Bx%7D%2Cz_1)%3Dv_1%2B%5Cdot%7Bu%7D_x%7D_%7B%5Ctext%7BBy%20definition%20of%20%7Dv_1%7D%3D%5Coverbrace%7B-%5Cfrac%7B%5Cpartial%20V_x%7D%7B%5Cpartial%20%5Cmathbf%7Bx%7D%7Dg_x(%5Cmathbf%7Bx%7D)-k_1(%5Cunderbrace%7Bz_1-u_x(%5Cmathbf%7Bx%7D)%7D_%7Be_1%7D)%7D%5E%7Bv_1%7D%20%5C%2C%20%2B%20%5C%2C%20%5Coverbrace%7B%5Cfrac%7B%5Cpartial%20u_x%7D%7B%5Cpartial%20%5Cmathbf%7Bx%7D%7D(%5Cunderbrace%7Bf_x(%5Cmathbf%7Bx%7D)%2Bg_x(%5Cmathbf%7Bx%7D)z_1%7D_%7B%5Cdot%7B%5Cmathbf%7Bx%7D%7D%20%5Ctext%7B%20(i.e.%2C%20%7D%20%5Cfrac%7B%5Coperatorname%7Bd%7D%5Cmathbf%7Bx%7D%7D%7B%5Coperatorname%7Bd%7Dt%7D%20%5Ctext%7B)%7D%7D)%7D%5E%7B%5Cdot%7Bu%7D_x%20%5Ctext%7B%20(i.e.%2C%20%7D%20%5Cfrac%7B%20%5Coperatorname%7Bd%7Du_x%20%7D%7B%5Coperatorname%7Bd%7Dt%7D%20%5Ctext%7B)%7D%7D' 'https://en.wikipedia.org/api/rest_v1/media/math/check/tex'

The returned identifier list includes 'd', which comes from \operatorname {d}:

"checked": "\\underbrace {u_{1}(\\mathbf {x} ,z_{1})=v_{1}+{\\dot {u}}_{x}} _{{\\text{By definition of }}v_{1}}=\\overbrace {-{\\frac {\\partial V_{x}}{\\partial \\mathbf {x} }}g_{x}(\\mathbf {x} )-k_{1}(\\underbrace {z_{1}-u_{x}(\\mathbf {x} )} _{e_{1}})} ^{v_{1}}\\,+\\,\\overbrace {{\\frac {\\partial u_{x}}{\\partial \\mathbf {x} }}(\\underbrace {f_{x}(\\mathbf {x} )+g_{x}(\\mathbf {x} )z_{1}} _{{\\dot {\\mathbf {x} }}{\\text{ (i.e., }}{\\frac {\\operatorname {d} \\mathbf {x} }{\\operatorname {d} t}}{\\text{)}}})} ^{{\\dot {u}}_{x}{\\text{ (i.e., }}{\\frac {\\operatorname {d} u_{x}}{\\operatorname {d} t}}{\\text{)}}}",

possible test

{\frac {\operatorname {d} u_{x}}{\operatorname {d} t}}

should result in

["u_{x}","t"]
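A minimal sketch of this test, written in Python rather than the project's own code. The function `extract_identifiers` and the three regular expressions are hypothetical stand-ins for illustration only, not texvcinfo's API. The idea is simply to drop every `\operatorname {...}` group before collecting identifiers, so that the operator name `d` never reaches the list.

```python
import re

# Hypothetical stand-in for the real extractor; NOT texvcinfo's API.
OPERATOR = re.compile(r"\\operatorname\s*\{[^{}]*\}")
# Any remaining TeX command such as \frac.
COMMAND = re.compile(r"\\[A-Za-z]+")
# A letter, optionally followed by a braced or single-character subscript.
IDENTIFIER = re.compile(r"[A-Za-z](?:_(?:\{[^{}]*\}|[A-Za-z0-9]))?")


def extract_identifiers(tex):
    """Return identifiers in order of first appearance, without operators."""
    tex = OPERATOR.sub(" ", tex)  # remove operator names such as \operatorname {d}
    tex = COMMAND.sub(" ", tex)   # drop the remaining commands
    seen = []
    for match in IDENTIFIER.finditer(tex):
        if match.group(0) not in seen:
            seen.append(match.group(0))
    return seen


formula = r"{\frac {\operatorname {d} u_{x}}{\operatorname {d} t}}"
print(extract_identifiers(formula))  # ['u_{x}', 't']
```

This regex approach only covers simple formulae like the one above; a real fix would presumably work on the parsed syntax tree instead.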

see also

ffb11eb

credits

@leokraemer

@leokraemer Could you be so kind as to create a JSON file with the 100 test formulae, the actual identifiers, and the false positives? For instance:

[
  {
    "qID": "1",
    "math_inputtex": "W(2, k) > 2^k/k^\\varepsilon",
    "identifier": [
      "W",
      "k",
      "\\varepsilon"
    ]
  },
  {
    "qID": "36",
    "math_inputtex": "n = \\prod_{i=1}^r p_i^{a_i}",
    "identifier": [
      "n",
      "i",
      "r",
      "p_{i}",
      "a_{i}"
    ],
    "fp": [
      "p"
    ],
    "fn": [
      "p_{i}"
    ]
  }
]

That way, we can make sure that we don't break anything else while fixing the bug above.
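One way the `fp` and `fn` fields could be derived, assuming they mean "extracted but not in the gold list" and "in the gold list but not extracted". The `score` helper and the `extracted` list are hypothetical; the `gold` list is copied from the `qID` 36 entry above.

```python
def score(gold, extracted):
    """Return (false positives, false negatives), preserving input order."""
    fp = [x for x in extracted if x not in gold]
    fn = [x for x in gold if x not in extracted]
    return fp, fn


gold = ["n", "i", "r", "p_{i}", "a_{i}"]
# Assumed extractor output for illustration: "p" was found instead of "p_{i}".
extracted = ["n", "i", "r", "p", "a_{i}"]

fp, fn = score(gold, extracted)
print(fp, fn)  # ['p'] ['p_{i}']
```

Running this over all 100 entries before and after a change would show whether a fix introduces new false positives or false negatives.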

It seems that I already started working on this problem on the WIP branch:
d321963

The patch seems to work.
https://github.com/physikerwelt/texvcinfo/tree/wip
However, I'd like to get the new test into effect before publishing a new version.

Hope that suits you. The ordering of the entries is different, though; if that is a problem, tell me.
extraxredIdentifiers.txt

Just noticed that that throws the test off. I'll provide a pull-request with the functional test soon.
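If the order of identifiers is not meant to be significant, the comparison itself could be made order-insensitive. A small sketch, assuming that only the multiset of identifiers matters; `same_identifiers` is a hypothetical helper, not part of the existing test suite.

```python
from collections import Counter


def same_identifiers(expected, actual):
    """Compare two identifier lists while ignoring their order."""
    return Counter(expected) == Counter(actual)


expected = ["W", "k", "\\varepsilon"]
actual = ["k", "\\varepsilon", "W"]  # same entries, different order
print(same_identifiers(expected, actual))  # True
```

`Counter` is used instead of `set` so that a duplicated identifier in one list still counts as a mismatch.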

Added the testcase with the data as it was extracted. See #6
Also I had to adjust the ordering for two of the identifier entries to one that actually fits the formula. See commit
The test fails at one point, where it recognizes \infty as an identifier.

Edit: Also, a previous test specifically asks for \infty to be extracted as an identifier. So either the gold standard is wrong or the test case is.

@leokraemer thank you very much for your contribution. Really nice work.