mliebelt/pgn-parser

Problem with file from ChessBase

Closed this issue · 7 comments

I have this file:

[Event ""]
[White "Problem 1"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "3"]
[Setup "1"]
[FEN "4r1k1/1q3ppp/p7/8/Q3r3/8/P4PPP/R3R1K1 w - - 0 1"]

1. Qxe8+ {} Rxe8 2. Rxe8# *

[Event ""]
[White "Problem 2"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "3"]
[Setup "1"]
[FEN "7k/pbp1Q1p1/1p5p/8/4B3/1P6/2P4P/5qBK b - - 0 1"]

1... Qf3+ {} 2. Bxf3 Bxf3# *

[Event ""]
[White "Problem 3"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "3"]
[Setup "1"]
[FEN "5rkn/3R2p1/4r1qp/4B3/3P4/6R1/5PP1/6K1 w - - 0 1"]

1. Rxg7+ Qxg7 2. Rxg7# *

[Event ""]
[White "Problem 4"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "3"]
[Setup "1"]
[FEN "5rk1/7p/2N3p1/3P4/4Q2q/5R1n/6P1/4RK2 b - - 0 1"]

1... Qf2+ {} 2. Rxf2 Rxf2# *

[Event ""]
[White "Problem 5"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "3"]
[Setup "1"]
[FEN "2Rr3k/3q1ppp/p7/1p2P3/2p5/Q5PP/P4P1K/8 w - - 0 1"]

1. Qf8+ {} Rxf8 2. Rxf8# *

[Event ""]
[White "Problem 6"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "5"]
[Setup "1"]
[FEN "k7/1p1q1pp1/p2p4/P2P4/4R2r/7r/KPP1Q1R1/8 b - - 0 1"]

1... Qa4+ 2. Rxa4 (2. Kb1 Rh1+ 3. Qe1 Qxe4 )Rxa4+ 3. Kb1 Rh1+ *

[Event ""]
[White "Problem 7"]
[Black ""]
[Site ""]
[Round ""]
[Annotator ""]
[Result "*"]
[Date "2020.07.12"]
[PlyCount "5"]
[Setup "1"]
[FEN "6k1/2r2ppp/3p2b1/q7/1p1Pp1QP/4P1P1/1P3P1K/2R5 w - - 0 1"]

1. Qc8+ Rxc8 2. Rxc8+ Qd8 3. Rxd8# *

[Event ""]
[White "Эйве"]
[Black "Ломан"]
[Site ""]
[Round ""]
[Annotator "123"]
[Result "1-0"]
[Date ""]
[PlyCount "35"]

1. Nf3 d5 2. c4 d4 3. b4 g6 4. Bb2 Bg7 5. Na3 e5 6. Nc2 $1 {Piece
harmony. Knight on the edge of the board} Bg4 7. e3 $1 {Piece
development in the opening. Center} Ne7 8. exd4 exd4 9. h3 Bxf3 {
} 10. Qxf3 c6 11. h4 O-O {} 12. h5 $1 {Piece
harmony. Open file} Re8 {} 13. O-O-O $1 {
Piece development in the opening. Castling. King
safety} a5 {} 14. hxg6 hxg6 15. Qh3 $1 {
Battery} axb4 16. Nxd4 Bxd4 {} 17. Qh8+ $1 {
X-ray} Bxh8 18. Rxh8# 1-0

I am trying to parse it with pgn-parser, but I get this error:
`Expected "*", "0-1", "1-0", "1/2-1/2", ";", "O-O", "O-O-O", "[", "x", "{", [R,N,B,Q,K,P], [a-h], end of input, integer, or whitespace but "" found.`

After uploading it to Lichess as a study and downloading it from there again, pgn-parser parses it without problems.

Any ideas?

Not at the moment. I will compare the file you uploaded to Lichess with the one downloaded from there to see if I can spot the differences, and will comment here afterwards.

Have you tried parsing each game on its own, so that we know which one is failing?
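A minimal sketch of that approach, assuming each game in the export begins with an `[Event` tag at the start of a line (as in the file above). `splitGames` and `findFailingGames` are hypothetical helpers, and the `parse` callback only stands in for the actual parser call; it does not reproduce pgn-parser's API:

```typescript
// Hypothetical helper: split a multi-game PGN export into individual games.
// Assumption: every game starts with an "[Event " tag at the start of a line.
function splitGames(pgn: string): string[] {
  return pgn
    .split(/(?=^\[Event )/m) // zero-width split right before each "[Event " line
    .map((game) => game.trim())
    .filter((game) => game.length > 0);
}

// Hypothetical driver: run the supplied parse function on each game and
// collect the 1-based numbers of the games that throw.
function findFailingGames(pgn: string, parse: (game: string) => unknown): number[] {
  const failing: number[] = [];
  splitGames(pgn).forEach((game, index) => {
    try {
      parse(game);
    } catch {
      failing.push(index + 1);
    }
  });
  return failing;
}
```

Passing the parser in as a callback keeps the sketch independent of how a particular library exposes its parse function.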

OK, I found a pattern. When I paste the games one at a time into the PGN field of https://mliebelt.github.io/PgnViewerJS/config/config.html, the ones with {} (empty comments) fail.

The rule `innerComment` in the grammar expects something inside the comment, so an empty comment is not accepted. I would like the grammar to allow it, with the result that no comment is created at all.

Fixed by allowing empty comments in the grammar (they are simply ignored).
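The grammar change itself is not shown in this thread. As a hypothetical client-side stopgap for parser versions that still reject `{}`, whitespace-only comments could be removed before parsing. `dropEmptyComments` is an illustrative name, and the regex assumes that no braces appear inside tag values or `;` comments:

```typescript
// Hypothetical workaround: remove comments that contain only whitespace,
// such as "{}" or the "{\n}" seen in the ChessBase export above.
// PGN brace comments cannot nest, so "{", optional whitespace, "}" is a
// complete empty comment (assumption: no braces inside tag values or ";" comments).
function dropEmptyComments(movetext: string): string {
  return movetext.replace(/\{\s*\}/g, "");
}
```

Non-empty comments are left untouched, so `{Battery}` survives while `{}` disappears.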

Still not working. Maybe the problem is somewhere else?

"Still not working" is not very precise. What I do is:

So it seems that the empty comment was not the problem after all, but something else.


I checked it again: there seems to be an invisible character at the beginning of your first game. Copying the game and pasting it into a hex editor shows something strange. When I delete the first characters of your example and type them again, everything works. I have no idea what that character is, or why it is not displayed but still survives copy and paste. I have to find a way to filter out such characters.
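Instead of a hex editor, the start of the text can also be inspected programmatically. This sketch (the helper name `leadingCodePoints` is hypothetical) lists the first few code points so that hidden characters become visible:

```typescript
// Return the first `count` code points of a string as "U+XXXX" labels,
// similar to looking at the start of the text in a hex editor.
// Invisible characters such as U+FEFF show up here even though editors hide them.
function leadingCodePoints(text: string, count = 4): string[] {
  return Array.from(text) // iterates by code point, not by UTF-16 unit
    .slice(0, count)
    .map((ch) => "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"));
}
```

A PGN file normally starts with `U+005B` (`[`), so anything reported before that is a candidate for the invisible character.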

I found the reason. ChessBase seems to write the so-called BOM (Byte Order Mark) at the beginning of its PGN output. Some editors (e.g. Vim) show it as <feff>. The parser could not handle the BOM and reported an error. I will change the grammar to ignore a BOM at the beginning of the input for games, game, tags and pgn, which are the four allowed start rules.
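For parser versions without that grammar change, callers could remove the BOM themselves. A minimal sketch, assuming the file has already been decoded to a string (the name `stripBom` is illustrative and not part of pgn-parser):

```typescript
// Remove a single leading U+FEFF (how the BOM appears after decoding)
// before handing the text to the parser; other text is returned unchanged.
function stripBom(text: string): string {
  return text.charCodeAt(0) === 0xfeff ? text.slice(1) : text;
}
```

Whether the BOM is still present depends on how the file is read: Node's `fs.readFileSync(path, "utf8")` keeps it as U+FEFF, while `TextDecoder` removes a leading BOM by default (unless constructed with `ignoreBOM: true`).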

@shooter01 Could you check that your problem is solved?

For reference, https://unicode.org/faq/utf_bom.html#bom1 explains the BOM in detail.
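The BOM is encoded differently in each Unicode encoding form (UTF-8 writes it as the bytes EF BB BF, for example). As a sketch (the `detectBom` helper is hypothetical), a leading BOM can be recognized on the raw bytes before decoding:

```typescript
// Byte sequences of U+FEFF at the start of a file, per encoding form.
// UTF-32LE is checked before UTF-16LE because both begin with FF FE.
// Caveat: UTF-16LE text starting with U+0000 would be reported as UTF-32LE.
const BOMS: ReadonlyArray<[string, number[]]> = [
  ["UTF-8", [0xef, 0xbb, 0xbf]],
  ["UTF-32BE", [0x00, 0x00, 0xfe, 0xff]],
  ["UTF-32LE", [0xff, 0xfe, 0x00, 0x00]],
  ["UTF-16BE", [0xfe, 0xff]],
  ["UTF-16LE", [0xff, 0xfe]],
];

// Return the encoding form signalled by a leading BOM and the BOM's length
// in bytes, or null if the bytes do not start with a BOM.
function detectBom(bytes: Uint8Array): { encoding: string; length: number } | null {
  for (const [encoding, bom] of BOMS) {
    if (bytes.length >= bom.length && bom.every((b, i) => bytes[i] === b)) {
      return { encoding, length: bom.length };
    }
  }
  return null;
}
```

The returned `length` tells how many bytes to skip before decoding the rest of the file.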