rust-tokenize

Tokenize a string (in the NLP sense) using a Rust library called through an FFI.
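The exported Rust function itself isn't shown in this README, so the sketch below is an assumption: the function name `tokenize` and its behavior (whitespace tokenization, punctuation stripped, duplicates removed) are inferred from the demo output further down, not taken from the actual source.

```rust
use std::collections::HashSet;
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

// Hypothetical sketch of the exported tokenizer. Exposed with the C ABI so
// a C program can link against the dylib and call it directly.
#[no_mangle]
pub extern "C" fn tokenize(input: *const c_char) -> *mut c_char {
    // Convert the incoming C string to a Rust &str (empty on invalid UTF-8).
    let text = unsafe { CStr::from_ptr(input) }.to_str().unwrap_or("");

    // Split on whitespace, strip surrounding punctuation, and keep only the
    // first occurrence of each token.
    let mut seen = HashSet::new();
    let tokens: Vec<&str> = text
        .split_whitespace()
        .map(|t| t.trim_matches(|c: char| !c.is_alphanumeric()))
        .filter(|t| !t.is_empty() && seen.insert(t.to_string()))
        .collect();

    // Hand ownership of the result to the C caller; a matching free function
    // would be needed on the Rust side to reclaim this allocation.
    CString::new(tokens.join(" ")).unwrap().into_raw()
}
```

Returning an owned `*mut c_char` via `into_raw` keeps the example simple, but it leaks unless the library also exports a free function for the caller to invoke.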

Getting Started

Make sure you have Rust installed.

Use the provided Makefile to build the dylib and link it with a demo C program.

$ make

cargo build --release
   Compiling libc v0.2.36
   Compiling tokenize v0.1.0 (file:///Users/sam/src/tokenize)
    Finished release [optimized] target(s) in 0.72 secs
gcc -L"./target/release/" -ltokenize test.c -o tokenize_linked
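
For `gcc -ltokenize` to find the library, the crate has to be built as a dynamic library. The actual manifest isn't shown in this README, so this Cargo.toml is a sketch; the crate name and the `libc` dependency are inferred from the build output above.

```toml
[package]
name = "tokenize"
version = "0.1.0"

[dependencies]
libc = "0.2"

[lib]
name = "tokenize"
# Build a C-compatible dynamic library (libtokenize.dylib / libtokenize.so).
# Older setups used "dylib"; "cdylib" is the usual choice for C FFI today.
crate-type = ["cdylib"]
```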

Next, run the example C program to see the tokenizer's output returned through the FFI. Note that repeated words in the input appear only once in the output:

$ ./tokenize_linked "Hello Hello Hello this is a test test test of tokens."

Hello this is a test of tokens