Cannot use QuantizedGraph::quantize
lerouxrgd opened this issue · 21 comments
Hello @masajiro ,
I am adapting ngt-rs to the latest NGT 2.0 and I am having trouble with QuantizedGraph::quantize.
In versions 1.14.x I used to call quantize on a path containing a pre-built NGT index and it worked fine.
Now I get the error:
QuantizedGraph::quantize: Quantized graph is already existed.
It looks like the issue is at this test. However, if NGTQ_QBG is not defined we branch to the same test, but this time it will quantize the graph (same behavior as before 2.0, I guess).
Is there something I am not using correctly? When QBG is enabled, can we not use QuantizedGraph?
Note that if I print what's in my temporary index there is only:
/tmp/.tmpUZ7Hcq/tre
/tmp/.tmpUZ7Hcq/prf
/tmp/.tmpUZ7Hcq/grp
/tmp/.tmpUZ7Hcq/obj
Whereas the error message says:
QuantizedGraph::quantize: Quantized graph is already existed. /tmp/.tmpUZ7Hcq/qg
And clearly /tmp/.tmpUZ7Hcq/qg does not exist.
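The check described above (list what is in the index directory and see whether a `qg` entry is present) can be sketched in plain Rust. This is only an illustration: `inspect_index_dir` is a hypothetical helper, not part of ngt-rs or NGT, and the scratch directory in `main` stands in for the real temporary index path.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: returns the sorted entry names of an index directory
/// and whether an entry named `qg` exists inside it.
fn inspect_index_dir(index_dir: &Path) -> io::Result<(Vec<String>, bool)> {
    let mut names = Vec::new();
    for entry in fs::read_dir(index_dir)? {
        names.push(entry?.file_name().to_string_lossy().into_owned());
    }
    names.sort();
    let has_qg = index_dir.join("qg").exists();
    Ok((names, has_qg))
}

fn main() -> io::Result<()> {
    // Scratch directory standing in for the real index path (assumption).
    let dir = std::env::temp_dir().join(format!("ngt-inspect-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    for name in ["tre", "prf", "grp", "obj"] {
        fs::write(dir.join(name), b"")?;
    }

    let (names, has_qg) = inspect_index_dir(&dir)?;
    println!("entries: {:?}, qg present: {}", names, has_qg);

    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Running such a check right before calling quantize makes it easy to confirm that the "already existed" error is not caused by a leftover `qg` entry.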
Hi @lerouxrgd ,
QBG::Index::quantize() in 2.0 doesn't work now. Instead, several steps are needed to build the QG index in 2.0. However, since these steps are a little complicated, I am going to provide a new quantize function like the one in 1.0.
Finally, I have released v2.0.6, which includes the updated QBG::Index::quantize() that behaves the same as the function in v1.14.x.
Hi @masajiro ,
Thank you for updating the code. I have tried to use it as in v1.14.x with the following test, which used to work:
// Create an index for vectors of dimension 3
let prop = Properties::dimension(3)?;
let mut index = Index::create(dir.path(), prop)?;
// Insert two vectors and get their id
let vec1 = vec![1.0, 2.0, 3.0];
let vec2 = vec![4.0, 5.0, 6.0];
let id1 = index.insert(vec1.clone())?;
let _id2 = index.insert(vec2.clone())?;
// Build and persist the index
index.build(1)?;
index.persist()?;
let params = QGQuantizationParams::default();
let index = QGIndex::quantize(index, params)?;
Where QGIndex::quantize just calls ngtqg_quantize from the C API.
But then I get the following error:
build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.
Error: Error("Capi : ngtqg_quantize() : Error: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Optimizer.h:optimize:323: the vector is empty")
Do you have any idea what the issue is? Should I use ngtqg_quantize differently?
The number of inserted objects must be more than 16 to train the quantization.
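A caller could guard against this before invoking the quantization. The following is a minimal sketch, assuming the "more than 16" threshold stated above; `MIN_TRAINING_OBJECTS` and `check_training_set` are hypothetical names, not part of ngt-rs or NGT, and passing the check does not guarantee that quantization itself succeeds.

```rust
/// Threshold taken from the comment above (assumption: strictly more than 16).
const MIN_TRAINING_OBJECTS: usize = 16;

/// Hypothetical pre-check: rejects object counts too small to train on.
fn check_training_set(num_objects: usize) -> Result<(), String> {
    if num_objects > MIN_TRAINING_OBJECTS {
        Ok(())
    } else {
        Err(format!(
            "need more than {} objects to train the quantization, got {}",
            MIN_TRAINING_OBJECTS, num_objects
        ))
    }
}

fn main() {
    // The two-vector test from earlier in the thread would be rejected here.
    match check_training_set(2) {
        Ok(()) => println!("enough objects, quantization can be attempted"),
        Err(msg) => println!("skipping quantization: {}", msg),
    }
}
```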
Hmm, I have tried to insert many objects (up to 2k) of many sizes (up to 512 dimensions), but I always get the same error.
In my tests I use the format:
obj1 = [1, 2, 3]
obj2 = [4, 5, 6]
obj3 = [7, 8, 9]
...
I have also tried to fill up objects with random numbers between 0 and 1, but I still have the error.
Do you call NGT::Index::createIndex and NGT::Index::save before calling NGTQG::Index::quantize? The following is an example of building and searching the QG index. You should run this example from the root of NGT so that it can load the data.
#include "NGT/NGTQ/QuantizedGraph.h"

int
main(int argc, char **argv)
{
  string indexPath = "index";
  string objectFile = "./data/sift-dataset-5k.tsv";
  string queryFile = "./data/sift-query-3.tsv";

  // NGT index construction
  try {
    NGT::Property property;
    property.dimension = 128;
    property.objectType = NGT::ObjectSpace::ObjectType::Uint8;
    property.distanceType = NGT::Index::Property::DistanceType::DistanceTypeL2;
    std::cout << "creating the index framework..." << std::endl;
    NGT::Index::create(indexPath, property);
    NGT::Index index(indexPath);
    ifstream is(objectFile);
    string line;
    std::cout << "appending the objects..." << std::endl;
    while (getline(is, line)) {
      vector<float> obj;
      stringstream linestream(line);
      while (!linestream.eof()) {
        int value;
        linestream >> value;
        if (linestream.fail()) {
          obj.clear();
          break;
        }
        obj.push_back(value);
      }
      if (obj.empty()) {
        cerr << "An empty line or invalid value: " << line << endl;
        continue;
      }
      obj.resize(property.dimension); // cut off additional data in the file.
      index.insert(obj);
    }
    std::cout << "building the index..." << std::endl;
    index.createIndex(16);
    index.save();
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }

  // quantization
  size_t dimensionOfSubvector = 1;
  size_t maxNumberOfEdges = 50;
  try {
    std::cout << "quantizing the index..." << std::endl;
    NGTQG::Index::quantize(indexPath, dimensionOfSubvector, maxNumberOfEdges, true);
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }

  // nearest neighbor search
  try {
    NGT::Index index(indexPath);
    NGT::Property property;
    index.getProperty(property);
    ifstream is(queryFile);
    string line;
    std::cout << "searching the index..." << std::endl;
    while (getline(is, line)) {
      vector<uint8_t> query;
      {
        stringstream linestream(line);
        while (!linestream.eof()) {
          int value;
          linestream >> value;
          query.push_back(value);
        }
        query.resize(property.dimension);
        cout << "Query : ";
        for (size_t i = 0; i < 5; i++) {
          cout << static_cast<int>(query[i]) << " ";
        }
        cout << "...";
      }
      NGT::SearchQuery sc(query);
      NGT::ObjectDistances objects;
      sc.setResults(&objects);
      sc.setSize(10);
      sc.setEpsilon(0.1);
      index.search(sc);
      cout << endl << "Rank\tID\tDistance: Object" << std::showbase << endl;
      for (size_t i = 0; i < objects.size(); i++) {
        cout << i + 1 << "\t" << objects[i].id << "\t" << objects[i].distance << "\t: ";
        NGT::ObjectSpace &objectSpace = index.getObjectSpace();
        uint8_t *object = static_cast<uint8_t*>(objectSpace.getObject(objects[i].id));
        for (size_t idx = 0; idx < 5; idx++) {
          cout << static_cast<int>(object[idx]) << " ";
        }
        cout << "..." << endl;
      }
      cout << endl;
    }
  } catch (NGT::Exception &err) {
    cerr << "Error " << err.what() << endl;
    return 1;
  } catch (...) {
    cerr << "Error" << endl;
    return 1;
  }
  return 0;
}
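For readers working from the Rust side, the object-loading loop in the example above (parse one whitespace-separated line of integers, skip empty or invalid lines, then resize to the index dimension) can be sketched without any NGT dependency. `parse_object` is a hypothetical helper, not an ngt-rs function. Unlike the C++ loop, this version tolerates trailing whitespace, and like `resize` in the C++ code it zero-pads rows that are shorter than the dimension.

```rust
/// Hypothetical helper: parses one whitespace-separated line of integers into
/// a vector of exactly `dimension` floats.
/// Returns None for an empty line or a line containing a non-integer token.
fn parse_object(line: &str, dimension: usize) -> Option<Vec<f32>> {
    let mut obj = Vec::new();
    for token in line.split_whitespace() {
        let value: i32 = token.parse().ok()?;
        obj.push(value as f32);
    }
    if obj.is_empty() {
        return None;
    }
    // Cut off additional columns; shorter rows are padded with zeros.
    obj.resize(dimension, 0.0);
    Some(obj)
}

fn main() {
    let lines = ["1\t2\t3\t4", "", "7 x 9"];
    for line in lines {
        match parse_object(line, 3) {
            Some(obj) => println!("object: {:?}", obj),
            None => eprintln!("An empty line or invalid value: {:?}", line),
        }
    }
}
```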
I am creating the NGT index and saving it before quantization.
I have followed your example; here are the essential steps:
let ndims = 128;
let props = NgtProperties::dimension(ndims)?
    .object_type(NgtObject::Uint8)?
    .distance_type(NgtDistance::L2)?;
let dir = tempdir()?;
let mut index = NgtIndex::create(dir.path(), props)?;
// Insert some objects of float32 (more than 16) ...
// Build and persist the index
index.build(1)?;
index.persist()?;
let params = QgParams {
    dimension_of_subvector: 1.0,
    max_number_of_edges: 50,
};
let index = QgIndex::quantize(index, params)?;
I always get the message:
build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.
But if I run the test multiple times, I get different results. Sometimes the quantization process starts.
Sometimes it crashes with SIGABRT. I ran it under gdb and got the following backtrace:
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007ffff79716b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007ffff7921958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007ffff790b53d in __GI_abort () at abort.c:79
#4 0x00007ffff7e77ea3 in NGT::Clustering::kmeans (this=0x7fffe7ffeca0, vectors=std::vector of length 25, capacity 25 = {...}, numberOfClusters=16,
clusters=std::vector of length 16, capacity 16 = {...}) at /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/Clustering.h:994
#5 0x00007ffff7ef8455 in _ZN3QBG9Optimizer16optimizeRotationEmRSt6vectorIS1_IfSaIfEESaIS3_EER6MatrixIfES9_S9_RS1_IS1_IN3NGT10Clustering7ClusterESaISC_EESaISE_EENSB_14ClusteringTypeENSB_18InitializationModeEmmmmbfmRdRNSA_5TimerEfb._omp_fn.0(void) () at /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Optimizer.h:253
#6 0x00007ffff78bd406 in gomp_thread_start (xdata=<optimized out>) at /usr/src/debug/gcc/libgomp/team.c:129
#7 0x00007ffff796f8fd in start_thread (arg=<optimized out>) at pthread_create.c:442
#8 0x00007ffff79f1a60 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Note that:
$ c++filt _ZN3QBG9Optimizer16optimizeRotationEmRSt6vectorIS1_IfSaIfEESaIS3_EER6MatrixIfES9_S9_RS1_IS1_IN3NGT10Clustering7ClusterESaISC_EESaISE_EENSB_14ClusteringTypeENSB_18InitializationModeEmmmmbfmRdRNSA_5TimerEfb._omp_fn.0
QBG::Optimizer::optimizeRotation(unsigned long, std::vector<std::vector<float, std::allocator<float> >, std::allocator<std::vector<float, std::allocator<float> > > >&, Matrix<float>&, Matrix<float>&, Matrix<float>&, std::vector<std::vector<NGT::Clustering::Cluster, std::allocator<NGT::Clustering::Cluster> >, std::allocator<std::vector<NGT::Clustering::Cluster, std::allocator<NGT::Clustering::Cluster> > > >&, NGT::Clustering::ClusteringType, NGT::Clustering::InitializationMode, unsigned long, unsigned long, unsigned long, unsigned long, bool, float, unsigned long, double&, NGT::Timer&, float, bool) [clone ._omp_fn.0]
So it looks like the QBG optimizer is involved even though I am creating a QG index with ngtqg_quantize. I don't know whether this is an issue or not.
The message below is not related to this issue, so you can ignore it.
build-qg: Warning! None is unavailable for the global type. Zero is set to the global type.
Building a QG index calls the QBG functions because QG has been implemented with QBG since v2.0.
Unfortunately, I have no idea how to resolve this issue at the moment.
@lerouxrgd
This release should resolve this issue. Could you check it?
@dmyzk
Thank you for helping me resolve this issue.
@masajiro
Indeed, I can now correctly build a QGIndex by using the quantize function!
However, if I try to search for an object that I have inserted, sometimes it works, but sometimes I get this error:
ngt-8890cf0f1d61f324: /home/rgd/dev/projects/ngt-rs/ngt-sys/NGT/lib/NGT/NGTQ/Quantizer.h:1306: void NGTQ::QuantizedObjectDistance::createFloatL2DistanceLookup(void*, size_t, void*, DistanceLookupTableUint8&): Assertion `tmp >= 0 && tmp <= 255' failed.
This corresponds to this assertion failure.
Do you have any idea what causes it?
From the error, it seems that you built NGT on a machine with neither avx2 nor avx512. However, QG and QBG require avx2 or avx512.
It seems that I have at least avx2 when I check with: grep avx /proc/cpuinfo.
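The same check can also be made from Rust at runtime and at compile time. This is a minimal sketch using only the standard library; `cpu_has_avx2` and `compiled_with_avx2` are hypothetical helper names. Note the limits of what it shows: the runtime check reports what the CPU supports, and the compile-time check reports how the Rust crate itself was compiled. Neither one proves that the NGT C++ library was built with AVX2 enabled.

```rust
/// Runtime check: does the CPU running this process support AVX2?
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx2() -> bool {
    is_x86_feature_detected!("avx2")
}

/// On non-x86 targets AVX2 does not exist, so report false.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx2() -> bool {
    false
}

/// Compile-time check: was this Rust crate compiled with the avx2 target
/// feature enabled (for example via `-C target-cpu=native`)?
fn compiled_with_avx2() -> bool {
    cfg!(target_feature = "avx2")
}

fn main() {
    println!("CPU supports AVX2:       {}", cpu_has_avx2());
    println!("crate compiled with AVX2: {}", compiled_with_avx2());
}
```

If the first line prints true while the native library still behaves as if AVX2 were missing, the build configuration of the native library is the more likely place to look.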
Just to be sure, when you build NGT, can you see the message below?
#warning "AVX2 is available for NGTQG"
Yes I see this message when building NGT.
I have confirmed that this part is not compiled on CPUs with avx2, so this error is quite strange.
Another possibility is that, when it runs, your program loads another NGT library that was compiled without the avx2 or avx512 option.
To confirm that, could you insert a line that outputs a message, like the one below, before the line, and run it again?
std::cerr << "*** this line is reached." << std::endl;
If you don't see this message, your program is loading another NGT library.
@dmyzk found how to avoid this issue as well. Thanks!
Since the cargo build environment is a little different from ordinary build environments, you have to explicitly enable AVX2, even if the CPU has AVX2. Could you insert the line below at this line?
config.define("NGT_AVX2", "ON");
When you build it, please add --release.
cargo build --release
@masajiro Actually I have a quick follow up question:
Is it possible to build NGT 2.X with both NGT_SHARED_MEMORY_ALLOCATOR=ON and Q(B)G enabled?
I tried it, and some symbols related to QG are missing.
Whenever NGT is built with NGT_SHARED_MEMORY_ALLOCATOR=ON, QBG and QG are disabled.
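A build script could encode both constraints from this thread: quantization needs NGT_AVX2 set explicitly, and it cannot be combined with NGT_SHARED_MEMORY_ALLOCATOR. The sketch below is an assumption about how a wrapper crate might organise this, not the actual ngt-rs build script. `ngt_cmake_defines` and its two boolean switches are hypothetical, and only the two define names come from the discussion above.

```rust
/// Hypothetical build-script helper: maps two feature switches to the CMake
/// defines mentioned in this thread, rejecting the unsupported combination.
fn ngt_cmake_defines(
    shared_memory: bool,
    quantization: bool,
) -> Result<Vec<(&'static str, &'static str)>, String> {
    if shared_memory && quantization {
        return Err(
            "QG/QBG are disabled when NGT_SHARED_MEMORY_ALLOCATOR=ON; enable only one"
                .to_string(),
        );
    }
    let mut defines = Vec::new();
    if shared_memory {
        defines.push(("NGT_SHARED_MEMORY_ALLOCATOR", "ON"));
    }
    if quantization {
        // Set explicitly, since the cargo build environment does not enable it by default.
        defines.push(("NGT_AVX2", "ON"));
    }
    Ok(defines)
}

fn main() {
    match ngt_cmake_defines(false, true) {
        Ok(defines) => {
            for (key, value) in defines {
                // In a real build script each pair would be passed to config.define(key, value).
                println!("-D{}={}", key, value);
            }
        }
        Err(msg) => eprintln!("invalid feature combination: {}", msg),
    }
}
```

Failing early in the build script gives a clearer message than the missing-symbol errors that appear at link time.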
Thank you for the confirmation !