rust-ml/linfa

Add Serde to CountVectorizer, to export and import it

Bastian1110 opened this issue · 6 comments

Hello again!

I'm working on a project that compiles a machine learning model to WASM so it can run in a browser behind a UI framework; you can see the repository here.
I think it would be very useful if exporting and importing already-trained models were easier (like joblib with Sklearn), since it would open up a world of possibilities to "embed" machine learning models anywhere!
I have been looking into how serde and ciborium work. I have managed to export a model or two, but it has been very difficult for me. I'd like to help linfa models support this, but my knowledge of Rust is minimal.
In particular, if someone could tell me how to export a CountVectorizer for my project, I would greatly appreciate it.

I don't think CountVectorizer has serde support but it should be pretty easy to add. Can you post your code snippet just to be sure?

Sure!
This is the code I'm using to "export" a model with ciborium. This method has worked successfully with linfa-svm.

// In the winequality SVM example, after fitting the SVM model
use ciborium::cbor;
use std::fs;
use std::path::Path;

// Convert the trained model to a ciborium value, then encode it as CBOR bytes
let model_value = cbor!(model).unwrap();
let mut vec_model = Vec::new();
ciborium::ser::into_writer(&model_value, &mut vec_model).unwrap();

// Export the bytes to a .cbor file
let path: &Path = Path::new("./model.cbor");
fs::write(path, vec_model).unwrap();

Then you can import the model and use it like this:

use ciborium::value::Value;
use linfa_svm::Svm;
use std::fs::File;
use std::io::Read;

// Read the .cbor file and convert it to a ciborium value
let mut file = File::open("./model.cbor").unwrap();
let mut data: Vec<u8> = Vec::new();
file.read_to_end(&mut data).unwrap();
let model_value = ciborium::de::from_reader::<Value, _>(&data[..]).unwrap();

// Recreate the model; it is already trained
let model: Svm<f64, bool> = model_value.deserialized().unwrap();
println!("{}", model);

This works with ease for the SVM model. However, the only way I have found to tell that a model doesn't support serde serialization is to pass it to the cbor! macro. When a model does not support serialization, the following error appears:

the trait bound `<MODELNAME>: serde::ser::Serialize` is not satisfied
the following other types implement trait `serde::ser::Serialize`:
  &'a T
  &'a mut T
  ()
  (T0, T1)
  . . .

I just cloned your repository with the addition of serde support, thank you very much!
I tried to test it inside the extra-serde branch. My test was the same as described in my earlier comment: I passed the CountVectorizer through the cbor! macro, but I get the same error:

// In the countvectorization.rs example of linfa-preprocessing (inside the extra-serde branch)

let vectorizer_value = cbor!(vectorizer).unwrap();

But the following error occurs:

the trait bound `CountVectorizer: serde::ser::Serialize` is not satisfied
the following other types implement trait `serde::ser::Serialize`:
  &'a T
  &'a mut T
  ()
  (T0, T1)

Maybe I'm testing it wrong? If so, is there another way to test it without merging to the master branch?

Did you enable the serde feature on the crate?

I just tested with the serde feature enabled and I asserted that CountVectorizer: Serialize holds. I'm going to merge the PR into master and you can test it from there.
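
For anyone hitting the same error, here is a minimal sketch of why the trait bound only holds once the feature is on. It assumes the serde derives are gated behind a Cargo feature named `serde` using the common `cfg_attr` pattern. `ToyVectorizer`, `assert_serialize`, `serde_enabled`, and `check_vectorizer` are hypothetical names made up for illustration; they are not linfa's API.

```rust
// Sketch only: `ToyVectorizer` is a hypothetical stand-in, not linfa's CountVectorizer.
// With the `serde` feature off, the cfg_attr line expands to nothing, so the
// type has no Serialize impl and this file builds without the serde crate.
#[cfg_attr(feature = "serde", derive(serde::Serialize, serde::Deserialize))]
#[derive(Debug, Clone, PartialEq)]
pub struct ToyVectorizer {
    pub vocabulary: Vec<String>,
}

/// Compile-time check: a call to this only type-checks when `T: Serialize` holds.
/// The function itself exists only when the feature is enabled.
#[cfg(feature = "serde")]
pub fn assert_serialize<T: serde::Serialize>() {}

/// Reports whether this build was compiled with the `serde` feature.
pub fn serde_enabled() -> bool {
    cfg!(feature = "serde")
}

/// Runs the compile-time check only in builds where the derive is active,
/// then returns whether the feature was enabled.
pub fn check_vectorizer() -> bool {
    #[cfg(feature = "serde")]
    assert_serialize::<ToyVectorizer>();
    serde_enabled()
}
```

Built without the feature, `serde_enabled()` returns `false` and the type has no `Serialize` impl, which is the situation that produces the trait-bound error. With `serde` listed in the dependency's features, both the derive and the check are compiled in.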

I just added serde to the features in Cargo.toml and it works!
Thanks a lot!