Deserialization 3.5x slower than Python pickle, 4x slower than serde_json
naktinis opened this issue · 1 comment
I set up a simple benchmark with a 67MB pickle and measured deserialization speed in 7 scenarios.
| library | time |
|---|---|
| Python pickle.load | 341 ms |
| Python json.load | 397 ms |
| serde_json from_str | 327 ms |
| bincode from_slice | 314 ms |
| py-marshal marshal_load | 691 ms |
| serde-pickle from_reader | 1250 ms |
| serde-pickle from_slice | 1310 ms |
Is this known behavior? Is there any hope of it improving in the foreseeable future? I'm sharing my setup below so you can point out any issues or things I missed.
Data
>>> import random, string
>>> data = [''.join(random.sample(string.ascii_letters, 32)) for _ in range(2_000_000)]
Python load
>>> import time, pickle, marshal, json
>>> marshal.dump(data, open('test.marshal', 'wb'))
>>> pickle.dump(data, open('test.pickle', 'wb'))
>>> json.dump(data, open('test.json', 'w'))
>>> t = time.time(); _ = pickle.load(open('test.pickle', 'rb')); print(f'{time.time() - t:.3f}s')
0.341s
>>> t = time.time(); _ = json.load(open('test.json', 'rb')); print(f'{time.time() - t:.3f}s')
0.397s
Rust load
use std::fs::File;
use std::io::{BufReader, Read};
use std::sync::{Arc, RwLock};
use std::time;
use py_marshal::{read, Obj};
use serde_pickle as pickle;
use serde_json as json;

pub fn load_pickle(path: &str) -> pickle::Value {
let file = BufReader::new(File::open(path).unwrap());
pickle::from_reader(file).expect("couldn't load pickle")
}
pub fn load_pickle_slice(path: &str) -> pickle::Value {
let mut bytes = Vec::new();
File::open(path).unwrap().read_to_end(&mut bytes).unwrap();
pickle::from_slice(&bytes).expect("couldn't load pickle")
}
pub fn load_marshal(path: &str) -> Result<Arc<RwLock<Vec<Obj>>>, &'static str> {
let file = BufReader::new(File::open(path).unwrap());
match read::marshal_load(file) {
Ok(obj) => Ok(obj.extract_list().unwrap()),
Err(_) => Err("error_load"),
}
}
pub fn load_json(path: &str) -> json::Value {
let mut s = String::new();
File::open(path).unwrap().read_to_string(&mut s).unwrap();
serde_json::from_str(&s).expect("couldn't load json")
}
pub fn load_bincode<T>(path: &str) -> T
where T: serde::de::DeserializeOwned
{
let file = BufReader::new(File::open(path).unwrap());
bincode::deserialize_from(file).unwrap()
}
fn main() {
println!("Loading pickle...");
let timer = time::Instant::now();
let data = load_pickle("test.pickle");
println!("Load completed in {:.2?}", timer.elapsed());
println!("Loading pickle slice...");
let timer = time::Instant::now();
let data = load_pickle_slice("test.pickle");
println!("Load completed in {:.2?}", timer.elapsed());
println!("Loading marshal...");
let timer = time::Instant::now();
let data = load_marshal("test.marshal").unwrap();
println!("Load completed in {:.2?}", timer.elapsed());
println!("Loading JSON...");
let timer = time::Instant::now();
let data = load_json("test.json");
println!("Load completed in {:.2?}", timer.elapsed());
println!("Loading Bincode...");
let timer = time::Instant::now();
let data: Vec<String> = load_bincode("test.bincode");
println!("Load completed in {:.2?}", timer.elapsed());
}
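Note that `main` reads `test.bincode`, but the setup above never writes it. A hypothetical converter (not part of the original benchmark) could produce it from `test.json` using the same crates listed in the dependencies, assuming bincode 1.x's default configuration:

```rust
use std::fs::File;
use std::io::BufReader;

// Hypothetical helper: read the list of strings from test.json and
// re-serialize it as test.bincode so the bincode benchmark has an input file.
fn make_bincode() {
    let file = BufReader::new(File::open("test.json").unwrap());
    // Deserialize into a concrete Vec<String> matching the generated data.
    let data: Vec<String> = serde_json::from_reader(file).unwrap();
    let out = File::create("test.bincode").unwrap();
    // bincode 1.x default config: fixed-width little-endian integers.
    bincode::serialize_into(out, &data).unwrap();
}
```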
Dependencies
[dependencies]
serde-pickle = "0.6"
bincode = "1.3"
serde_json = "1.0"
py-marshal = { git = "https://github.com/sollyucko/py-marshal" }
serde = { version = "1.0", features = ["derive"] }
Thanks for the report, I can more or less reproduce the results. (Please include all of the code next time though, it makes it much easier.)
This crate hasn't been optimized for speed (yet), so it's not surprising that it won't outperform Python's pickle module. As for a comparison between different formats, that is always a little more difficult to reason about.
In any case, I can't spend much time on this at present. PRs are welcome, and I expect there might be some easy wins achievable with basic profiling.
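One thing that may be worth measuring before profiling (an untested sketch, not a confirmed win): the benchmark deserializes into the dynamic `pickle::Value` tree, while serde-pickle's `from_slice` is generic over any `Deserialize` target. Going straight to `Vec<String>` skips building the intermediate `Value` and may (or may not) reduce allocation overhead:

```rust
use std::fs::File;
use std::io::Read;
use serde_pickle as pickle;

// Variant of load_pickle_slice that deserializes into a concrete type
// instead of the generic pickle::Value tree.
pub fn load_pickle_typed(path: &str) -> Vec<String> {
    let mut bytes = Vec::new();
    File::open(path).unwrap().read_to_end(&mut bytes).unwrap();
    pickle::from_slice(&bytes).expect("couldn't load pickle")
}
```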