Virtual vector

Overview

virtual_vec is a type compatible with std::vector<T>. The implementation targets large vectors that (1) are pushed to frequently and (2) the size is not known, (3) have a fixed memory upper bound. This guarantees O(1) for appending to the end. Do note the purpose of this implementation is experimentation, and it has not been tested in a production system.

Strategy

When the vector is instantiated, it reserves a large chunk of virtual memory addresses from the operating system. Memory from this address space is committed to our process as the vector grows. We never have to copy the entire vector to a new address space.

This implementation is hardcoded to use 4GB virtual address for each vector, so you can have thousands of these per process. The number was chosen arbitrarily and can be made bigger or smaller depending on your use case. For obvious reasons this is only applicable to 64-bit systems.

Should I use this or something like it?

This implementation is just for fun. Your mileage will vary platform to platform. If you are on a system that overcommits memory, like Linux, then you should use std::vector<T>::reserve(4ULL << 30) and you will get the best possible performance. If not, you should always run benchmarks, because you can incur large page fault penalties when bypassing the user space allocator (this is why reserve on Linux with std::vector is the best of both worlds).

Building

NOTE: Right now this only builds on Linux (would gladly accept patches to support other OSes). To build this you just need to include virtual_vec.h and virtual_vec.cpp and a c++17 (or newer) compiler.

Running tests

gtest must be installed.

To run the tests:


./excute test

To run the tests under valgrind:


./execute test-valgrind

Running benchmarks

Google benchmark must be installed.


./execute bench

Benchmarks

Below is a table of benchmarks comparing the performance of virtual_vec<int64_t> to std::vector<int64_t>. The test calls push_back until # elems / sizeof(int64_t) exceeds Storage size.

All benchmarks are performed on a Intel i7-8665U (8) @ 4.800GHz. Time is recorded in CPU time.

One thing to keep in mind is these benchmarks were done with -O2. With -O3 std::vector gets 10% faster.

Storage size	std::vector (ns)	virtual_vec (ns)	diff (%)
512B	1221	755	38
1KB	2248	1627	27
8KB	16635	12635	24
32KB	85878	38748	54
131KB	375485	234833	37
524KB	1534855	798632	47
1MB	2940128	1276827	56
4MB	13290644	6129833	53
8MB	29162281	12560318	56
10MB	39694352	13500909	65

keur/virtual_vec