misohena/el-easydraw

Comma/whitespace separators between arguments in SVG path descriptors are optional

Closed this issue · 5 comments

The SVG Paths specification states that the comma-wsp separators between the arguments of path commands are optional.

For an example, see the question mark in:

moveto-argument-sequence:
    coordinate-pair
    | coordinate-pair comma-wsp? lineto-argument-sequence

So the path descriptor

ZM10 20.1L.1-2e+1 20e1-5e-1

is legal and should be parsed as

((Z) (M 10.0 20.1) (L 0.1 -20.0 200.0 -0.5))
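To illustrate the optional separators, here is a minimal Emacs Lisp sketch of a tokenizer. It is not edraw's parser, and the names my-svg-number-re, my-svg-sep-re and my-svg-path-parse are hypothetical. It skips an optional comma-wsp, then reads either a command letter or a number that must start exactly at the current position. Because the number regexp is greedy, a sign or a second decimal point is enough to end one number and begin the next. Argument counts per command are not checked.

```elisp
(defconst my-svg-number-re
  "[+-]?\\(?:[0-9]+\\(?:\\.[0-9]*\\)?\\|\\.[0-9]+\\)\\(?:[eE][+-]?[0-9]+\\)?"
  "Hypothetical regexp for one SVG path number: sign? mantissa exponent?.")

(defconst my-svg-sep-re "[ \t\n\r]*,?[ \t\n\r]*"
  "Hypothetical optional comma-wsp; it also matches the empty string.")

(defun my-svg-path-parse (d)
  "Parse SVG path data D into a list of (COMMAND NUMBER...) lists.
Sketch only: argument counts are not validated."
  (let ((case-fold-search nil)
        (pos 0)
        (len (length d))
        result current)
    ;; Skip an optional separator, then stop once the string is exhausted.
    (while (progn (string-match my-svg-sep-re d pos) ; always matches at POS
                  (setq pos (match-end 0))
                  (< pos len))
      (cond
       ;; A command letter closes the previous command and starts a new one.
       ((eq (string-match "[MmZzLlHhVvCcSsQqTtAa]" d pos) pos)
        (when current (push (nreverse current) result))
        (setq current (list (intern (match-string 0 d)))
              pos (match-end 0)))
       ;; A number must start exactly at POS; the regexp match is greedy.
       ((and current (eq (string-match my-svg-number-re d pos) pos))
        (push (float (string-to-number (match-string 0 d))) current)
        (setq pos (match-end 0)))
       (t (error "Unexpected input at position %d in %S" pos d))))
    (when current (push (nreverse current) result))
    (nreverse result)))
```

Under these assumptions, (my-svg-path-parse "ZM10 20.1L.1-2e+1 20e1-5e-1") returns ((Z) (M 10.0 20.1) (L 0.1 -20.0 200.0 -0.5)). The sketch is deliberately lenient: it also accepts a comma directly before a command letter, as in "M0 0,L100 0 0 100", which the grammar does not allow.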

I came across this issue when loading SVG images generated by dvisvgm --no-fonts=1 into edraw.

(Translated Message)
Hello Tobias Zawada.

I knew that comma-wsp was optional, but I didn't know that numbers could be separated by their signs alone. I had misunderstood.
Looking below the BNF, I found the example "M 100-200" written right there! It made me laugh. Reading only the BNF and the code, and not the text written in English, is the same mistake I made in the previous issue.

I have looked at the pull request. I think the behavior is generally fine, but there are a few things I would like to confirm:

  1. The last edraw-path-d-wsp of edraw-path-d-command has been changed to edraw-path-d-comma-wsp. Is this intentional? Grammatically it is wsp*.
    (edraw-path-d-parse "M0 0,L100 0 0 100") => Error or ((M 0.0 0.0) (L 100.0 0.0 0.0 100.0))
  2. Isn't skipping comma-wsp in the edraw-path-d-split-numbers-str function unnecessary? string-match searches from POS and skips ahead to the part that matches /number/, and I think edraw-path-d-command already guarantees that nothing extraneous appears in the skipped region. In fact, it still seems to work after removing (when (eq pos (string-match edraw-path-d-comma-wsp numbers-str pos)) (setq pos (match-end 0))).
    (edraw-path-d-split-numbers-str ".1-2e+1 20e1-5e-1+100.0.5") => (".1" "-2e+1" "20e1" "-5e-1" "+100.0" ".5")
  3. While testing point 2, I noticed that something like 100.0.5 can also be written (the specification includes an example of this too!). The correct interpretation seems to be 100.0 0.5. edraw-path-d-split-numbers-str handles it fine, but edraw-path-d-command rejects it: besides "[+-] edraw-path-d-abs-number", a pattern starting with "." is also required. What do you think?
  • number:

    • [+-]? [0-9]+ : sign? integer-constant
    • [+-]? [0-9]* [.] [0-9]+ ([eE] [+-]? [0-9]+)? : sign? (digit-sequence? "." digit-sequence) exponent?
    • [+-]? [0-9]+ [.] ([eE] [+-]? [0-9]+)? : sign? (digit-sequence ".") exponent?
    • [+-]? [0-9]+ ([eE] [+-]? [0-9]+) : sign? digit-sequence exponent
  • Possible first characters: + - 0-9 .

  • Possible trailing characters: 0-9 .
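The four number productions listed above can be folded into a single Emacs Lisp regexp. The sketch below is only an illustration under assumed names: my-svg-number-re, my-svg-number-p and my-svg-split-numbers are hypothetical, not edraw's edraw-path-d-number or edraw-path-d-split-numbers-str. Like the approach discussed in point 2, the splitter simply searches forward for the next /number/ without skipping comma-wsp explicitly, and relies on the greedy match to split 100.0.5 into 100.0 and .5.

```elisp
(defconst my-svg-number-re
  (concat "[+-]?"                          ; sign?
          "\\(?:[0-9]+\\(?:\\.[0-9]*\\)?"  ; digit-sequence ("." digit-sequence?)?
          "\\|\\.[0-9]+\\)"                ; or "." digit-sequence
          "\\(?:[eE][+-]?[0-9]+\\)?")      ; exponent?
  "Hypothetical regexp covering the number productions of the SVG path grammar.")

(defun my-svg-number-p (str)
  "Return non-nil if the whole of STR is a single SVG path number."
  (let ((case-fold-search nil))
    (string-match-p (concat "\\`" my-svg-number-re "\\'") str)))

(defun my-svg-split-numbers (str)
  "Split STR into number strings by repeated greedy forward search."
  (let ((case-fold-search nil)
        (pos 0)
        numbers)
    (while (string-match my-svg-number-re str pos)
      (push (match-string 0 str) numbers)
      (setq pos (match-end 0)))
    (nreverse numbers)))
```

For example, (my-svg-split-numbers ".1-2e+1 20e1-5e-1+100.0.5") gives (".1" "-2e+1" "20e1" "-5e-1" "+100.0" ".5"), and my-svg-number-p accepts "1." and "+.5e-3" but rejects ".", "1e" and "100.0.5". A forward search like this silently skips anything that is not a number, so it is only safe when the string has already been validated, which is the assumption made in point 2.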

I will adopt the macro's debug declaration as it is. I have only recently started using Edebug more often, so I will learn from it.

Oops, regarding the following part:

   "\\(" edraw-path-d-number ;;(2) command arguments
   "\\(?:" edraw-path-d-comma-wsp edraw-path-d-number "\\|[+-]" edraw-path-d-abs-number "\\)*\\)" "\\)?"

If we simply write it like this instead:

   "\\(" edraw-path-d-number ;;(2) command arguments
   "\\(?:" edraw-path-d-comma-wsp "?" edraw-path-d-number "\\)*\\)" "\\)?"

Is there a problem?

(let ((str "10.20+30"))
  (when (string-match (format "\\(%s\\)%s?\\(%s\\)" edraw-path-d-number edraw-path-d-comma-wsp edraw-path-d-number) str)
    (list (match-string 1 str) (match-string 2 str))))
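The snippet above depends on edraw's own definitions. Below is a self-contained sketch of the same check, with hypothetical stand-ins my-number-re and my-comma-wsp-re for edraw-path-d-number and edraw-path-d-comma-wsp (their exact definitions in edraw are not reproduced here); my-match-two-numbers is likewise a made-up name. One thing worth checking with the simplified pattern: the "?" appended after the separator only makes the whole separator optional if the separator regexp is a single group, which is why the stand-in is wrapped in a shy group.

```elisp
(defconst my-number-re
  "[+-]?\\(?:[0-9]+\\(?:\\.[0-9]*\\)?\\|\\.[0-9]+\\)\\(?:[eE][+-]?[0-9]+\\)?"
  "Hypothetical stand-in for `edraw-path-d-number'.")

(defconst my-comma-wsp-re
  "\\(?:[ \t\n\r]+,?[ \t\n\r]*\\|,[ \t\n\r]*\\)"
  "Hypothetical stand-in for `edraw-path-d-comma-wsp'.
It is one shy group, so a trailing \"?\" applies to the whole separator.")

(defun my-match-two-numbers (str)
  "Return the first two numbers of STR as strings, allowing an optional separator."
  (let ((case-fold-search nil))
    (when (string-match (format "\\(%s\\)%s?\\(%s\\)"
                                my-number-re my-comma-wsp-re my-number-re)
                        str)
      (list (match-string 1 str) (match-string 2 str)))))
```

With these stand-ins, (my-match-two-numbers "10.20+30") returns ("10.20" "+30") and (my-match-two-numbers "0.6.5") returns ("0.6" ".5"). One caveat: because Emacs regexps backtrack, (my-match-two-numbers "1020") returns ("102" "0"), since the engine gives a digit back from the first number to satisfy the required second one. Whether this can happen inside edraw-path-d-command depends on what follows the argument group there.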

Dear Kouhei-san,
everything you say is right.
I intentionally checked the option that allows you to commit to the pull request, so in principle you could make the changes yourself.
Still, I'll try to adopt all your proposals in another commit to save you time.

The edraw-path-d-comma-wsp change was really a mistake. Good that you found it. Thanks.

I am just wondering whether the final edraw-path-d-comma-wsp in edraw-path-d-command is really necessary.

You wrote:

I noticed while testing 2 that it is also possible to write something like 100.0.5 (an example is also included in the specification!). It seems correct to be interpreted as 100.0 0.5. edraw-path-d-split-numbers-str is fine, but edraw-path-d-command does not match. Not only "[+-] edraw-path-d-abs-number" but also a pattern starting with "." is required. What do you think?

Just a note:

There is a potential ambiguity: is 10.5 read as 1 0.5 or as 10 .5? Currently, it is read as 10 .5 since the match for numbers is greedy. This is the correct way of parsing it according to the very last remark in the specification.

Citation:

The processing of the BNF must consume as much of a given BNF production as possible, stopping at the point when a character is encountered which no longer satisfies the production. Thus, in the string "M 100-200", the first coordinate for the "moveto" consumes the characters "100" and stops upon encountering the minus sign because the minus sign cannot follow a digit in the production of a "coordinate". The result is that the first coordinate will be "100" and the second coordinate will be "-200".

Similarly, for the string "M 0.6.5", the first coordinate of the "moveto" consumes the characters "0.6" and stops upon encountering the second decimal point because the production of a "coordinate" only allows one decimal point. The result is that the first coordinate will be "0.6" and the second coordinate will be ".5".
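The "consume as much as possible" rule amounts to matching one number at the current position with a greedy regexp and never shortening it afterwards. Here is a minimal sketch under assumed names (my-svg-number-re, my-svg-sep-re and my-svg-scan-arguments are hypothetical): the scanner only accepts a number that starts exactly where the previous token ended, and returns nil as soon as anything else is found.

```elisp
(defconst my-svg-number-re
  "[+-]?\\(?:[0-9]+\\(?:\\.[0-9]*\\)?\\|\\.[0-9]+\\)\\(?:[eE][+-]?[0-9]+\\)?"
  "Hypothetical regexp for one SVG path number.")

(defconst my-svg-sep-re "[ \t\n\r]*,?[ \t\n\r]*"
  "Hypothetical optional comma-wsp (may match the empty string).")

(defun my-svg-scan-arguments (str)
  "Split the argument text STR into number strings, or return nil if invalid.
Each number is matched greedily at the current position and never shortened."
  (let ((case-fold-search nil)
        (pos 0)
        (len (length str))
        (numbers nil)
        (ok t))
    (while (and ok
                (progn (string-match my-svg-sep-re str pos) ; skip separator
                       (setq pos (match-end 0))
                       (< pos len)))
      (if (eq (string-match my-svg-number-re str pos) pos)
          (progn (push (match-string 0 str) numbers)
                 (setq pos (match-end 0)))
        (setq ok nil)))                 ; something other than a number starts here
    (and ok (nreverse numbers))))
```

Applied to the argument parts of the two examples in the citation, (my-svg-scan-arguments " 100-200") returns ("100" "-200") and (my-svg-scan-arguments " 0.6.5") returns ("0.6" ".5"), while (my-svg-scan-arguments "100x200") returns nil.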

@TobiasZawada
(Translated Message)

since the match for numbers is greedy.

That's right. Since the match is greedy, it's fine.

I also thought of the possibility that the invalid string 11e22e33 could be interpreted as 11e2 2e33, but there are other cases where errors cannot be detected either, so that is a separate issue.
This area is more difficult than I expected, and I may fix it again someday.
Please let me know if you find anything else strange.
Thank you for letting me know and for fixing it.
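The 11e22e33 case can be reproduced in isolation. The sketch below uses hypothetical names (my-svg-number-re, my-svg-args-re, my-svg-split-numbers), not edraw's definitions. A validating regexp of the form number (comma-wsp? number)* accepts the string, because Emacs regexps backtrack and can give digits back from the first exponent (11e2 followed by 2e33), while a separate greedy forward search returns 11e22 and 33 and silently skips the e. The two steps disagree, and neither reports an error.

```elisp
(defconst my-svg-number-re
  "[+-]?\\(?:[0-9]+\\(?:\\.[0-9]*\\)?\\|\\.[0-9]+\\)\\(?:[eE][+-]?[0-9]+\\)?"
  "Hypothetical regexp for one SVG path number.")

(defconst my-svg-args-re
  (concat "\\`" my-svg-number-re
          "\\(?:\\(?:[ \t\n\r]*,?[ \t\n\r]*\\)" my-svg-number-re "\\)*\\'")
  "Hypothetical validating regexp for a whole argument sequence.")

(defun my-svg-split-numbers (str)
  "Collect number strings by repeated greedy forward search in STR."
  (let ((case-fold-search nil)
        (pos 0)
        numbers)
    (while (string-match my-svg-number-re str pos)
      (push (match-string 0 str) numbers)
      (setq pos (match-end 0)))
    (nreverse numbers)))
```

So (string-match-p my-svg-args-re "11e22e33") succeeds, yet (my-svg-split-numbers "11e22e33") returns ("11e22" "33"). For the well-formed "11e2 2e33" both agree on ("11e2" "2e33"). A scanner that requires each number to start exactly at the current position (checking that string-match returns POS) would stop at the e instead of skipping it.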
