mdoege/nimTUROCHAMP

compare speed: Nim vs PyPy

Closed this issue · 8 comments

i used Nim to build my own simple chess engine by rewriting a Python script .. the reason is that Nim source can be compiled into a binary, which i expected to be (much) faster than Python. Btw. we already discussed this in the recent past in #2, but i want to elaborate on it here.

you stated that running a Python script can also be done with PyPy .. indeed, this runs much faster .. but my Nim version is a bit slower (!?) than the PyPy version, e.g. evaluating a certain position with the Nim binary (compiled with -d:danger and --passC:"-flto") takes about 300 seconds, compared to PyPy which does it in about 240 seconds .. i did not expect that .. i'm new to Nim, but i like it : syntax and functions are rather similar to Python .. to make the binary faster i also used your option --passC:"-flto", which i didn't know about .. without this option, my example took 330 seconds ..

my Nim script is a first version .. maybe i can boost performance by using variable types like int32 or int64 at certain points .. up to now i only use plain int .. same for floats, etc.

how can PyPy be faster than a compiled Nim version ? I thought all Nim code is compiled to C, which should be the fastest, shouldn't it ? Maybe you need to see my code to answer this question - i can send it to you.

[ i'm on Xubuntu 22.04 64-bit ]

mdoege commented

Yes, it's definitely possible that PyPy is faster than a native binary! Just-in-time (JIT) compilation watches the code as it runs and keeps optimizing it. So I suppose when you analyze a chess position for a longer time (e.g. minutes), the JIT effect really kicks in and speeds up the PyPy version! Whereas if you use something closer to the NTC defaults and only search maybe for a second or less, Nim and PyPy are pretty close in NPS speed, or Nim is slightly faster.

I couldn't quite believe it the first time I saw it, either! We always tend to think of C as the fastest language and of Nim as just a more convenient way to program in C, so it is surprising.

I don't really know much about further speeding up the Nim code. Perhaps int types or other C compiler switches might help, who knows. Maybe also try a different garbage collector in Nim? Nim's performance tricks are not very well documented AFAIK, so it's learning by doing…
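For example, something along these lines might be worth comparing (just a sketch: the exact flag names depend on your Nim version, and engine.nim is a placeholder for your source file):

# what you already use, as a reference point
nim c -d:danger --passC:"-flto" engine.nim

# try ARC/ORC memory management (Nim 1.6+; older versions use --gc: instead of --mm:)
nim c -d:danger --mm:orc --passC:"-flto" engine.nim
nim c -d:danger --mm:arc --passC:"-flto" engine.nim

# let the C compiler tune for your CPU
nim c -d:danger --passC:"-flto -march=native" engine.nim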

i'm glad i asked you this question, and thanks for the answer .. now i know the resulting evaluation times are "normal" and not due to some fault on my side .. indeed it's amazing that JIT gives such good results regarding speed .. i really appreciate your explanation and info!

btw. to speed things up, i also did a version in Julia, another language which resembles Python and promises high speed, but it seems hard to do proper code optimisation for best performance; Julia is rather complicated in that regard .. besides that, Julia scripts cannot be compiled into binaries AFAIK ..

mdoege commented

I think PyTuroChamp is almost the perfect demo for PyPy because it gets such a huge speedup that rivals native code. Sadly not all Python software is like this. PTC is computationally expensive, so PyPy works wonders, but for network-centric software that needs throughput and many simultaneous connections, PyPy is probably not remotely as effective. But if you write a computationally expensive program in Python that cannot be accelerated by NumPy or other means, PyPy is pretty magical indeed!

Yes, Julia. I have taken a look in the past. Supposedly they have been improving the long startup times (a.k.a. "time to first plot"), but I think part of the reason I don't like it is that the project seems to be too profit-driven. The creators clearly want to make money from their users, so I get bad flashbacks to the 1990s and commercial Fortran compilers like Absoft. I think Open Source programming languages should be completely free, because otherwise there is a risk that the creators will first lock in their users and then charge them. So for that reason I find Julia dubious. But also as a language, I don't think it is a huge improvement over Python/Fortran/C/C++/Octave and other truly Open Source options. So I am not too unhappy to see that it has mostly failed in the market…

great background info on Julia! i mainly did web coding, using JS and PHP .. later i taught myself Python, also combined with Tkinter to build GUIs .. i like to keep things simple, being old-school while following the newest tech like NodeJS and many more .. you mention NumPy, i never wrote any code with it, but i'm interested! it looks good for recursive chess engine programming .. did you ever use NumPy? Can / should it be compiled to reach the highest speeds?

there's another thing regarding speed : you write "PTC is computationally expensive" .. my own script is based on the ToyFish code, see https://github.com/maksimKorzh/toyfish , which i try to complete, all for learning .. up to now i managed to create many shortcomings as well as the functionality i had imagined .. a next step will be the implementation of the Turing rules, replacing the current eval based on PST tables and material count .. but i wonder : how good is the pruning ? I imagine that if the pruning is "strong", higher depths will be reached much faster - i see this happen with many decent engines .. only the simple ones reach up to about depth 7 when playing a (say 10 minute) game in CuteChess ..

Can you say something about pruning in relation to speed, especially when implementing the Turing rules ? It could help me decide on the approach for my next version.

mdoege commented

NumPy is mainly for scientific code; it's probably not useful for a chess engine. I was just using it as a general example of the ways to speed up Python code, which depend on what the code is doing. For chess, bitboards are the fastest way I think, but a bitboard implementation is about 20 kB in Python with lots of incomprehensible, magical constants, so I preferred not to use it for PTC.
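Just to illustrate the basic idea (a quick sketch, not code from PTC): a bitboard is a 64-bit integer with one bit per square, and Nim's std/bitops module has the usual helpers for it:

import std/bitops

# one bit per square; here i assume the convention a1 = bit 0 .. h8 = bit 63
var whitePawns: uint64 = 0
for sq in 8 .. 15:                 # put pawns on the second rank
  whitePawns.setBit(sq)

echo countSetBits(whitePawns)      # 8 -> material counting becomes a popcount
echo whitePawns.testBit(12)        # true -> is there a pawn on e2?

The speed comes from the fact that move generation and evaluation then turn into a handful of bitwise operations instead of loops over board squares.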

As for pruning, PTC/NTC do not use it, so I don't know much about it. I am sure the chess programming wiki has an article about it though.

And while we are on the topic of toy engines (even more of a toy than PTC already is), I contributed a Python toy engine to Rosetta Code a while ago: https://rosettacode.org/wiki/Chess_player#Library:_python-chess It only searches two plies deep, so that means you can use loops instead of recursion. It can defeat a random mover, so it's not totally useless! :-)

..the chess programming wiki has an article about it..

thanks, i will investigate further.

mdoege commented

https://www.chessprogramming.org/Pruning

According to the article, Alpha-Beta is considered pruning too, so I guess in that case, PTC and NTC do have pruning…

yes, i just wanted to mention something like that, but then i thought maybe it's better not to elaborate on pruning here, because it's beside the point of this issue - i'd like to discuss it a bit more though ..

to give you a picture of my concerns, here's a code snippet from my script, almost identical to the original ToyFish code :

proc rtEvaluate(): int =
  ## static evaluation: piece weights plus piece-square table (PST) values
  var
    score, ix, rt: int
    piece: string

  for square in 0 .. (g.Cmb-1):          # loop over all board squares
    piece = g.Gboard[square]
    if not (piece[0] in " .\n"):         # skip empty squares and board padding
      ix = g.aPCix.find(piece[0])
      score += g.weight[ix]              # piece weight from the lookup table
      if isLowerAscii(piece[0]):         # lowercase piece: subtract PST value
        score -= g.pst[square]
      if isUpperAscii(piece[0]):         # uppercase piece: add PST value
        score += g.pst[square]

  rt = if (g.Gside == 1): -score else: score   # flip sign for the side to move (negamax convention)
  return rt



proc rtSearch(depth: int): int =
  ## plain negamax search without pruning

  if depth == 0:
    return rtEvaluate()                  # leaf node: static evaluation

  var
    best_score: int = -10000
    best_source: int = -1
    best_target: int = -1
    move_list: seq[g.Tmove]
    score: int

  move_list.add(rtGenerateMoves())

  if len(move_list) == 0:                # no moves generated
    return 10000

  for dMove in move_list:
    fnMakeMove(dMove, false, true)
    score = -rtSearch(depth-1)           # <<<<<<<< RECURSIVE SEARCH (negate the child's score)
    fnTakeBack(dMove)
    if score > best_score:
      best_score = score
      best_source = dMove.source
      best_target = dMove.target
  g.Gbest_source = best_source           # remember the best move found at this node
  g.Gbest_target = best_target

  return best_score

i guess for you this is easy to understand : only piece weights and PSTs are used to evaluate a leaf position once the depth has counted down to zero during the search .. picturing this recursion combined with the well-known simple alpha-beta function, i'm wondering how many more variations/leaves will/can be cut off when the eval distinguishes "more strongly" between good/better and bad/worse moves .. in that light, i'm curious how my implementation of the Turing rules will work out - leaving the speed issue aside .. at least i want to prove that it plays the same games as the known ones, and then maybe adjust some parameters ?
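for reference, this is roughly how i imagine an alpha-beta version of rtSearch would look - just an untested sketch using the same helpers as above (rtGenerateMoves, fnMakeMove, fnTakeBack, rtEvaluate), called from the top as rtSearchAB(-10000, 10000, depth) :

proc rtSearchAB(alpha, beta, depth: int): int =
  ## negamax with alpha-beta cutoffs (sketch only, not tested)
  if depth == 0:
    return rtEvaluate()

  var
    a: int = alpha                       # best score found so far at this node
    best_source: int = -1
    best_target: int = -1
    move_list: seq[g.Tmove]
    score: int

  move_list.add(rtGenerateMoves())
  if len(move_list) == 0:
    return 10000

  for dMove in move_list:
    fnMakeMove(dMove, false, true)
    score = -rtSearchAB(-beta, -a, depth-1)   # swapped and negated search window
    fnTakeBack(dMove)
    if score > a:                        # better move found: raise the lower bound
      a = score
      best_source = dMove.source
      best_target = dMove.target
    if a >= beta:                        # beta cutoff: the opponent would avoid this line anyway
      break

  g.Gbest_source = best_source
  g.Gbest_target = best_target
  return a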

i hope you get the picture .. at this moment i'm a beginner at building a chess engine, so i might lack some basic understanding.