Aquila-Network/aquila

Sample Code returns empty

pasa13142 opened this issue · 17 comments

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

sample = db.convertDocument([0.1,0.2,0.3,0.4], {"hello": "world"})

db.addDocuments([sample])
vector = db.convertMatrix([0.1,0.2,0.3,0.4])

k = 10
result = db.getNearest(vector, k)

This is the sample code from https://github.com/a-mma/AquilaDB/wiki/Get-started-with-AquilaDB . When I run it, it returns an empty list, something like:

status: true
documents: "[]"

Any idea?

At least vecount documents (as configured in DB_config.yml) must be indexed before the first call to getNearest(). Please let me know if you still get empty results after indexing vecount documents. To debug, you can also try the steps mentioned here: https://github.com/a-mma/AquilaDB/issues/45#issuecomment-569086001
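The precondition above can be sketched as follows. This is only an illustration: `index_batch` is a hypothetical helper, not part of the aquiladb client, and it assumes the client exposes convertDocument() and addDocuments() as in the sample code, with a `status` field on the response.

```python
def index_batch(db, vectors, vecount):
    """Add every vector and report whether the vecount threshold was reached.

    `db` is assumed to behave like the AquilaClient in the sample code;
    `vecount` should match the value configured in DB_config.yml.
    """
    added = 0
    for i, vec in enumerate(vectors):
        doc = db.convertDocument(vec, {"idx": str(i)})
        response = db.addDocuments([doc])
        if response.status:  # count only inserts the server accepted
            added += 1
    # getNearest() is only expected to return results once this is True
    return added, added >= vecount
```

For example, `added, ready = index_batch(db, vectors, 100)` would tell you whether enough documents were accepted before you start querying.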

No activity, closing. Please reopen if the issue persists.

The issue persists.
vecount is 100
After adding 250 documents, getNearest() still comes empty.

The logs show an uninitialized JS object:

1|peer_manager  |  TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34

No activity, closing. Please reopen if the issue persists.

Please reopen.
I cannot reopen it myself, can only create a new one.
AFAIK, only a repository collaborator can reopen this issue.

@NikolaiPohodenko could you please provide more details to reproduce the issue?

  1. which Docker image you are using
  2. the client code you used to index and query the data (so that I can run it myself)
  3. which operating system you are using
  4. more logs from vecdb and vecstore (not from peer_manager)
  5. any other information that will help me test it myself

If you change the config while the container is running, you need to restart the container:
docker restart <container id>

  1. docker image is "latest" from 07 Jan 2020.
Digest: sha256:29bb80d259e17d754a9ad283eb308fdf4e1e64cdde03c3142c9193c0c69ada25
Status: Downloaded newer image for ammaorg/aquiladb:latest
docker.io/ammaorg/aquiladb:latest
  2. the client code is simple.
    In one script I populate the db:
from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

v0 = [0.1, 0.2, 0.3, 0.4]
attempts = 300
success_count = 0

for i in range(attempts):
    v = [i+x for x in v0]
    s = db.convertDocument(v, {"idx": f"{i}"})
    r = db.addDocuments([s])
    
    if r.status:
        success_count += 1
        
print(f"success_count = {success_count} of {attempts}") # 300 of 300

In another script I make KNN requests:

from aquiladb import AquilaClient as acl
db = acl('localhost', 50051)

v0 = [0.11, 0.21, 0.31, 0.41]

success_count = 0
attempts = 300

for i in range(attempts):
    v = [i+x for x in v0]
    m = db.convertMatrix(v)
    r = db.getNearest(m, 10)
    
    if r.status:
        success_count += 1
        
print(f"success_count = {success_count} of {attempts}") # 0 of 300
  3. operating system is Windows 10 Enterprise
  4. more logs
(base) root@2fc5f5700917:/# pm2 logs
[TAILING] Tailing last 15 lines for [all] processes (change the value with --lines option)
/root/.pm2/pm2.log last 15 lines:
PM2        | 2020-01-10T11:34:20: PM2 log: PM2 PID file         : /root/.pm2/pm2.pid
PM2        | 2020-01-10T11:34:20: PM2 log: RPC socket file      : /root/.pm2/rpc.sock
PM2        | 2020-01-10T11:34:20: PM2 log: BUS socket file      : /root/.pm2/pub.sock
PM2        | 2020-01-10T11:34:20: PM2 log: Application log path : /root/.pm2/logs
PM2        | 2020-01-10T11:34:20: PM2 log: Worker Interval      : 30000
PM2        | 2020-01-10T11:34:20: PM2 log: Process dump file    : /root/.pm2/dump.pm2
PM2        | 2020-01-10T11:34:20: PM2 log: Concurrent actions   : 2
PM2        | 2020-01-10T11:34:20: PM2 log: SIGTERM timeout      : 1600
PM2        | 2020-01-10T11:34:20: PM2 log: ===============================================================================
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecdb:0] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecdb:0] online
PM2        | 2020-01-10T11:34:20: PM2 log: App [peer_manager:1] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [peer_manager:1] online
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecstore:2] starting in -fork mode-
PM2        | 2020-01-10T11:34:20: PM2 log: App [vecstore:2] online

/root/.pm2/logs/vecdb-error.log last 15 lines:
/root/.pm2/logs/vecstore-error.log last 15 lines:
/root/.pm2/logs/peer-manager-out.log last 15 lines:
1|peer_man | peer events subscription done
1|peer_man | Example app listening on port 50053!
1|peer_man | OpenError: IO error: /data/default_swarmdb: Invalid argument
1|peer_man |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
1|peer_man |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
1|peer_man |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
1|peer_man |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14

/root/.pm2/logs/vecstore-out.log last 15 lines:
2|vecstore | FAISS index loading failed Error in faiss::{anonymous}::FileIOReader::FileIOReader(const char*) at index_io.cpp:136: Error: 'f' failed: could not open /data/model_hf for reading: No such file or directory
2|vecstore | Annoy index loading failed
2|vecstore | Starting server. Listening on port 50052.

/root/.pm2/logs/vecdb-out.log last 15 lines:
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14

/root/.pm2/logs/peer-manager-error.log last 15 lines:
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_man | TypeError: Cannot read property 'rows' of undefined
1|peer_man |     at /AquilaDB/src/p2p/routing_table/index.js:157:34

1|peer_manager  | TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
1|peer_manager  | TypeError: Cannot read property 'rows' of undefined
1|peer_manager  |     at /AquilaDB/src/p2p/routing_table/index.js:157:34
  5. any other information that will help while testing it out
(base) root@2fc5f5700917:/AquilaDB/src# cat DB_config.yml
docs:
  vecount: 100 # minimum data required to start indexing
faiss:
  init:
    nlist: 1 # number of cells
    nprobe: 1 # number of cells that are visited to perform a search
    bpv: 8 # bytes per vector
    bpsv: 8 # bytes per sub vector
    vd: 784 # fixed vector dimension
annoy:
  init:
    vd: 784 # fixed vector dimension
    smetric: 'angular' # similarity metric to be used
    ntrees: 10 # no. of trees
couchDB:
  DBInstance: default # database namespace
  host: /data
  user: root
  password:
vectorID:
  sync_t: 5000
(base) root@2fc5f5700917:/AquilaDB/src#

@NikolaiPohodenko I was able to run your scripts successfully, and the kNN searches returned results.

But in your case, I can see from the logs that the vecdb module is crashing.

/root/.pm2/logs/vecdb-out.log last 15 lines:
0|vecdb    | OpenError: IO error: /data/default_docsdb: Invalid argument
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/lib/levelup.js:87:23
0|vecdb    |     at /AquilaDB/src/node_modules/abstract-leveldown/abstract-leveldown.js:30:14
0|vecdb    |     at /AquilaDB/src/node_modules/levelup/node_modules/deferred-leveldown/deferred-leveldown.js:20:21

So we need to identify the specific reason why it is crashing. Here is what we know about how your test environment differs from the AquilaDB automated test environment (Ubuntu AMD64):

  • your operating system is Windows 10 Enterprise

Could you please follow these steps and share more info?

  • What is your processor (CPU) model?
  • Please run docker run -d -i -p 50051:50051 -t ammaorg/aquiladb:bleeding (the latest bleeding image), check whether you get the same issue (the vecdb log mentioned above), and let me know.
  • If you still don't get kNN search results after the step above, please make sure the directory you mount with the -v parameter can be read and written from the docker container (only applicable if you mount a host system directory as a volume).
  • If the permissions are okay, please make sure that no more than one running AquilaDB container is mounted to the same host directory with the -v parameter.
  • Wait a few seconds between vector indexing and querying (in your case, run the second script a few seconds after the first one), because AquilaDB is an eventually consistent database.
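The last point, waiting for the index to catch up, could be handled with a small retry loop instead of a fixed pause. This is a rough sketch under assumptions: `get_nearest_with_retry` is a hypothetical helper, and it assumes the response carries a `status` flag and a `documents` field holding a JSON string, as the `documents: "[]"` output earlier in this thread suggests.

```python
import json
import time


def get_nearest_with_retry(db, matrix, k, retries=5, delay=1.0):
    """Call getNearest() until it returns at least one document or retries run out.

    Assumes `response.documents` is a JSON-encoded list (an assumption based
    on the empty result `documents: "[]"` shown in this issue).
    """
    for attempt in range(retries):
        response = db.getNearest(matrix, k)
        documents = json.loads(response.documents) if response.status else []
        if documents:
            return documents
        if attempt < retries - 1:
            time.sleep(delay)  # give the vector index time to catch up
    return []
```

If the list is still empty after all retries, the cause is more likely one of the other points above (for example the mounted volume) than indexing latency.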

I confirm that in my case the problem was the container's access to the mounted host directory, so it is of a different nature than the one in the issue description.

If I keep the data within the container, the example works ok.

@NikolaiPohodenko, thanks for the update. So the missing write permission prevented the document DB from accessing the mounted directory, which in turn blocked the change events generated by the document DB event listener and, with them, the updates to the vector DB. That's why you were getting empty results.
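One way to check for this condition is a quick write probe run inside the container against the data directory (/data in the config shown above). This is a minimal sketch: `can_write` is a hypothetical helper, and a passing probe only shows that plain file writes work. It cannot detect every limitation of a mounted filesystem that a database engine might hit.

```python
import os
import tempfile


def can_write(directory):
    """Return True if a file can be created, written and flushed in `directory`."""
    try:
        # The temporary file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())  # force the write through to the mount
        return True
    except OSError:
        # Covers a missing directory, denied permission, and I/O errors.
        return False
```

Inside the container this would be called as `can_write("/data")`.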

It would be great if you could figure out and share tips for Windows users who might face the same issue when mounting host directories.

And I'm going to keep this issue open for a while.

Hello everyone!

I'm facing the same issue as described here. I have Windows 10 with Docker and the AquilaDB image installed. If I don't mount the directory, the example works great. But if I try to mount it, the example stops working.

I've tried different paths, but none of them seems to work. I've also checked the Docker options for sharing my drives, and I've tried starting Docker with superuser permissions, but still nothing.

Can anyone help me with this? @NikolaiPohodenko, how did you solve it?

Thanks in advance.

@Mikel-a-esparza, I didn't get Aquila to store vectors on the Win-10 host file system.
I keep the data within the container. On Ubuntu, the external storage option works, though.

I didn't delve into the problem, since I'm planning to migrate away from Aql. My reasons:

  1. Aql hasn't yet implemented issues #25 and #61
  2. Pre-search filtering might eventually be required
  3. A new bug (not yet reported): fewer than k nearest neighbours are returned when identical vectors are stored in Aql
  4. The underlying FAISS supports neither #61 nor pre-search filtering, which implies Aql may never have these.

Hi @NikolaiPohodenko, we're going through whiteboard discussions and code refactoring of ADB, including changes to parts of the existing architecture. It will take some time until the next release. Unfortunately, features https://github.com/a-mma/AquilaDB/issues/25 and https://github.com/a-mma/AquilaDB/issues/61 will only be available with that release. We're sorry for the inconvenience. You can take a look at Elasticsearch, which has implemented vector search: https://www.elastic.co/blog/text-similarity-search-with-vectors-in-elasticsearch It's straightforward, and with Elasticsearch you can implement all the use cases you see in our documentation. We're very thankful for your support in testing out ADB and reporting multiple issues.

Hi @NikolaiPohodenko. First of all, thank you so much for taking the time to answer. I will do my testing keeping the data inside the container, and if its performance is good I will evaluate migrating the solution to an Ubuntu system.

Btw, which other DBs are you considering for this type of project? I'm building an engine for face similarity search, so a fast kNN search and optional pre-filtering would be great.

Thanks again.

@Mikel-a-esparza in the end FAISS is the king, but there is also PostgreSQL+Cube.
https://news.ycombinator.com/item?id=21461755

The code has been rewritten. This bug is no longer relevant and is covered. Closed.