Implement the input tokenizer in Fortran

Question

certik opened this issue 2 years ago · 0 comments

Currently the input tokenizer is written in Python, taken from OpenAI's original implementation: https://github.com/certik/fastGPT/blob/01eb84b015d89a567245da0445c0abb7d53a8500/encode_input.py. We should implement it in Fortran. That will eliminate the need to call the Python script before running fastGPT.

We have to write tests that exercise each code path in the Python implementation to ensure our Fortran implementation is correct.
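
One possible way to get such tests is to generate "golden" cases from the Python side and have the Fortran tests compare against them. The sketch below is only an illustration under stated assumptions: `encode` is a placeholder for whatever callable the reference script exposes (its real name and signature are not assumed here), the input strings are hypothetical examples of code paths a GPT-2 style tokenizer typically has, and the output layout (integer lines with a count prefix) is an arbitrary choice made so the data can be read back as plain integers.

```python
# Sketch: build golden test cases for a tokenizer port.
# `encode` is a placeholder for the reference Python encoder (str -> list of int);
# its actual name and signature in encode_input.py are NOT assumed here.

# Hypothetical inputs, each meant to hit a different code path.
CASES = [
    "",                   # empty input
    "hello",              # single word
    " hello world",       # leading space attached to a word
    "I'm sure they've",   # contractions
    "abc 123",            # letters and digits
    "wait... what?!",     # punctuation runs
    "a  b\n\nc ",         # repeated and trailing whitespace
    "na\u00efve caf\u00e9",  # non-ASCII text (multi-byte UTF-8)
]


def golden_lines(encode, cases=CASES):
    """Return two text lines per case: the UTF-8 bytes, then the token IDs.

    Each line starts with a count so a reader knows how many values follow.
    Inputs are stored as byte values to avoid quoting and newline issues.
    """
    lines = []
    for text in cases:
        data = text.encode("utf-8")
        ids = encode(text)
        lines.append(" ".join(str(v) for v in [len(data), *data]))
        lines.append(" ".join(str(v) for v in [len(ids), *ids]))
    return lines


def write_golden(path, encode, cases=CASES):
    """Write the golden lines to a plain-text file, one line per record."""
    with open(path, "w", encoding="ascii") as f:
        f.write("\n".join(golden_lines(encode, cases)) + "\n")
```

A Fortran test could then read each pair of lines, run its own tokenizer on the bytes, and compare the resulting IDs with the stored ones. The list of cases would need to be checked against the actual branches in the Python script and extended where it misses any.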