guokr/simbase

Is only the "int" vector id type supported?

bwlim opened this issue · 4 comments

I've read some of the simbase code (because you've requested an implementation of a score function using Euclidean distance, which preserves vector magnitude).
I noticed that the vector id type in simbase is the Java int type.
That means the vecid value space is limited to the 32-bit integer range.

How about supporting a "long" or "String" vector id type?
String type => can support any kind of id value, but probably with a much larger memory footprint.
long type => provides a huge integer id value space; uses more memory than a 32-bit int but much less than a String.

In some applications (my case :D), the int id type is not enough.
The database key that will be matched to a vector instance (by vecid) in the simbase DB is a 64-bit long.
(I'm using Titan Graph Database (it's awesome 👍), and its graph vertex ids are 64-bit longs.)
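
To illustrate the problem, here is a minimal sketch of what happens when a 64-bit key is forced into a 32-bit int id. The class name, the helper `fitsInInt`, and the sample vertex id are made up for illustration and are not part of simbase or Titan.

```java
public class VecIdRange {
    // Returns true when a 64-bit key can be stored in a 32-bit int without loss.
    static boolean fitsInInt(long key) {
        return key >= Integer.MIN_VALUE && key <= Integer.MAX_VALUE;
    }

    // Narrowing cast: keeps only the low 32 bits of the key.
    static int truncate(long key) {
        return (int) key;
    }

    public static void main(String[] args) {
        // Hypothetical 64-bit vertex id: 2^32 + 256.
        long vertexId = 4_294_967_552L;

        System.out.println(fitsInInt(vertexId)); // false
        System.out.println(truncate(vertexId));  // 256, which could collide with a real id 256
    }
}
```

So two different 64-bit keys can silently map to the same int vecid, which is why a simple cast is not a usable workaround.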

I agree with you on using the long type.

The refactor will be straightforward: just change the type without changing the logic. The only thing that needs to be considered is data migration between the two versions.
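
As a rough sketch of the migration concern, widening existing int ids to long is lossless in Java, so ids written by an int-based version keep their values. The class and method names below are hypothetical, and this says nothing about how simbase actually stores or migrates its data.

```java
import java.util.Arrays;

public class VecIdMigration {
    // Widen ids from an int-based version into long ids.
    // Widening preserves the value, including the sign of negative ids.
    static long[] widen(int[] oldIds) {
        return Arrays.stream(oldIds).asLongStream().toArray();
    }

    public static void main(String[] args) {
        int[] oldIds = {1, -7, Integer.MAX_VALUE};
        long[] newIds = widen(oldIds);
        System.out.println(Arrays.toString(newIds)); // [1, -7, 2147483647]
    }
}
```

The harder part would be any persisted format that encodes ids as 4 bytes, since a reader for the new version would need to know which width a given dump uses.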

We will do the refactor when we have time, which depends on the progress of our other projects in hand.

We have already fixed this on the develop branch; please check the latest commit, 93b70fd.

All tests pass, but please help us test it on more real-world cases.

@bwlim, please give us feedback on this issue. Thanks!

Thanks for adopting the long id type. It looks very good 👍

Also thanks for your advice.