niklasf/python-chess

`illegal san` error message when the first game in a database contains a move in the White or Black header

johndoknjas opened this issue · 4 comments

Take the following PGN file:

```
[Event "?"]
[Site "?"]
[Date "2024.04.25"]
[Round "?"]
[White "White vs 1...c5"]
[Black "?"]
[Result "*"]
[ECO "A00"]
[PlyCount "0"]
[SourceVersionDate "2024.04.25"]

 *

[Event "?"]
[Site "?"]
[Date "2024.04.25"]
[Round "?"]
[White "White vs 1...c5"]
[Black "?"]
[Result "*"]
[ECO "A00"]
[PlyCount "0"]
[SourceVersionDate "2024.04.25"]

 *
```

When parsing this file with `chess.pgn.read_game`, the following message is printed: "illegal san: 'c5' in...". It refers to the "c5" in the White header of game 1. However, the same error is not given for the second game, only for the first.

Hi. I can't reproduce the issue when copying the PGN. Which python-chess version are you using? Could there be invisible characters in the original file, or an issue with its encoding? Uploading a zip with the original might help reproduce this.
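One way to check for invisible characters is to scan the raw file for Unicode format and control characters. The helper below is a minimal sketch; `find_invisible_chars` is a hypothetical name, not part of python-chess. It decodes as plain UTF-8 (deliberately not `utf-8-sig`) so that a leading byte order mark appears as U+FEFF instead of being silently dropped, and the sample file is made up for illustration.

```python
import os
import tempfile
import unicodedata
from typing import List, Tuple


def find_invisible_chars(path: str) -> List[Tuple[int, int, str, str]]:
    """Return (line, column, code point, name) for suspicious characters.

    The file is decoded as plain UTF-8 (not "utf-8-sig"), so a leading
    byte order mark shows up as U+FEFF instead of being dropped.
    """
    hits = []
    # newline="" keeps "\r\n" intact so the skip below sees real line endings.
    with open(path, "r", encoding="utf-8", errors="replace", newline="") as handle:
        for lineno, line in enumerate(handle, start=1):
            for col, ch in enumerate(line, start=1):
                if ch in "\r\n\t":
                    continue  # ordinary whitespace, not interesting
                # Cf: format characters (BOM, zero-width space, ...);
                # Cc: control characters; U+FFFD: bytes that failed to decode.
                if unicodedata.category(ch) in ("Cf", "Cc") or ch == "\ufffd":
                    name = unicodedata.name(ch, "<unnamed>")
                    hits.append((lineno, col, f"U+{ord(ch):04X}", name))
    return hits


# Hypothetical sample: the first two tag lines of a PGN written with a BOM.
with tempfile.NamedTemporaryFile("wb", suffix=".pgn", delete=False) as tmp:
    tmp.write(b'\xef\xbb\xbf[Event "?"]\n[White "White vs 1...c5"]\n')
    sample_path = tmp.name

try:
    found = find_invisible_chars(sample_path)
finally:
    os.remove(sample_path)

print(found)  # [(1, 1, 'U+FEFF', 'ZERO WIDTH NO-BREAK SPACE')]
```

A copy-pasted PGN usually loses such characters, which would explain why a pasted version behaves differently from the original file.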

Hi @niklasf, I produced the PGN by saving a few games in ChessBase and then using "output to textfile", so maybe that is the cause? I just tested a PGN downloaded from lichess where the first game has a move in it, and there were no problems.

On another note, today I ran into problems running the following code on a few ChessBase PGNs:

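```python
import chess.pgn


def main() -> None:
    # No explicit encoding, so Python uses the platform's default encoding.
    with open("test.pgn", "r", errors="replace") as pgn:
        counter = 0
        while True:
            counter += 1
            # read_headers returns None once the end of the file is reached.
            headers = chess.pgn.read_headers(pgn)
            if headers is None:
                break
            print(f"{counter}: {headers}")


if __name__ == "__main__":
    main()
```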

This produced the output:

```
1: Headers()
2: Headers()
3: Headers(Event='?', Site='?', Date='2024.04.25', Round='?', White='White vs 1...c5', Black='?', Result='*', ECO='A00', PlyCount='0', SourceVersionDate='2024.04.25')
```

Entry 3 is the second game in my database. The first game's headers were not retrieved; instead, two empty `Headers()` were returned for some reason. I've attached the PGN here. With `read_headers` changed to `read_game` in the code above, it should also reproduce the issue from my previous comment.
test.zip

But again, doing this with a lichess PGN causes no issues, so it seems to be a problem with the way ChessBase writes its PGNs.

Thanks! It looks like that file is UTF-8 with a BOM (byte order mark), so `open(..., encoding="utf-8-sig")` would be appropriate.
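To illustrate why the encoding argument matters, the sketch below decodes the same BOM-prefixed bytes three ways and looks at the first line. The sample bytes and the `first_line` helper are hypothetical. `cp1252` stands in for a non-UTF-8 platform default (for example on Windows); that is an assumption about the reporter's setup, not something stated in the thread. The example only uses the standard library and makes no claim about python-chess internals.

```python
import codecs
import os
import tempfile

# Hypothetical sample: a UTF-8 BOM (EF BB BF) followed by PGN tag lines.
SAMPLE = codecs.BOM_UTF8 + b'[Event "?"]\n[White "White vs 1...c5"]\n'


def first_line(path: str, encoding: str) -> str:
    """Return the first line of the file at path, decoded with encoding."""
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        return handle.readline()


with tempfile.NamedTemporaryFile("wb", suffix=".pgn", delete=False) as tmp:
    tmp.write(SAMPLE)
    path = tmp.name

results = {}
try:
    # cp1252 stands in for a non-UTF-8 platform default (an assumption).
    for enc in ("cp1252", "utf-8", "utf-8-sig"):
        results[enc] = first_line(path, enc)
finally:
    os.remove(path)

for enc, line in results.items():
    # Only "utf-8-sig" strips the BOM, so only its line starts with "[".
    # !a escapes non-ASCII characters so the leftover BOM is visible.
    print(f"{enc:10} starts with '[': {line.startswith('[')!s:5}  {line!a}")
```

Under `cp1252` the BOM becomes the visible characters `ï»¿`, and under plain `utf-8` it survives as U+FEFF; in both cases the first tag line no longer begins with `[`. Because `utf-8-sig` only removes a BOM when one is present, files without a BOM decode exactly as they would with `utf-8`, so passing it to the `open` call in the loop above should be safe for both kinds of export.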

@niklasf Ah ok, it works fine for me after doing that - thx!