milvus-io/milvus-sdk-java

MilvusClientV2 doesn't support JSON as a return type

Closed this issue · 5 comments

There is a collection in a LangChain format:

{'collection_name': 'test',
 'auto_id': False,
 'num_shards': 1,
 'description': '',
 'fields': [{'field_id': 100,
   'name': 'id',
   'description': '',
   'type': <DataType.VARCHAR: 21>,
   'params': {'max_length': 36},
   'is_primary': True},
  {'field_id': 101,
   'name': 'text',
   'description': '',
   'type': <DataType.VARCHAR: 21>,
   'params': {'max_length': 65535}},
  {'field_id': 102,
   'name': 'metadata',
   'description': '',
   'type': <DataType.JSON: 23>,
   'params': {}},
  {'field_id': 103,
   'name': 'vector',
   'description': '',
   'type': <DataType.FLOAT_VECTOR: 101>,
   'params': {'dim': 768}}],
 'aliases': [],
 'collection_id': 451819797554279738,
 'consistency_level': 0,
 'properties': {},
 'num_partitions': 1,
 'enable_dynamic_field': True}

When I try to retrieve the metadata field, MilvusClientV2 fails.

List<String> query_output_fields = Arrays.asList("id", "metadata");
QueryReq queryParam = QueryReq.builder()
  .collectionName("test")
  .consistencyLevel(ConsistencyLevel.STRONG)
  .ids(Arrays.asList("8c053dc3-f888-5867-e50a-e497ea5310ce"))
  .outputFields(query_output_fields)
  .offset(0L)
  .build();
QueryResp resp = client.query(queryParam);

The error:

java.lang.NoSuchMethodError: 'com.google.gson.JsonElement com.google.gson.JsonParser.parseString(java.lang.String)'
	at io.milvus.response.FieldDataWrapper.ParseJSONObject(FieldDataWrapper.java:377)
	at io.milvus.response.basic.RowRecordWrapper.buildRowRecord(RowRecordWrapper.java:82)
	at io.milvus.response.QueryResultsWrapper.buildRowRecord(QueryResultsWrapper.java:87)
	at io.milvus.response.QueryResultsWrapper.getRowRecords(QueryResultsWrapper.java:71)
	at io.milvus.v2.utils.ConvertUtils.getEntities(ConvertUtils.java:57)
	at io.milvus.v2.service.vector.VectorService.query(VectorService.java:147)
	at io.milvus.v2.client.MilvusClientV2.lambda$query$26(MilvusClientV2.java:480)
	at io.milvus.v2.client.MilvusClientV2.retry(MilvusClientV2.java:153)
	at io.milvus.v2.client.MilvusClientV2.query(MilvusClientV2.java:480)
	at .(#72:1)

As my final goal is to filter entities by metadata, could you also advise if filtering by JSON is supported? If so, what is the syntax? Specifically, I need to look up entities by two fields (the absolute_directory_path and file_name in my case).

'metadata': {'absolute_directory_path': 'c:\\Temp\\Test',
   'index': '19',
   'size': '20',
   'file_name': 'lorem.txt'}}
yhmo commented

"java.lang.NoSuchMethodError: 'com.google.gson.JsonElement com.google.gson.JsonParser.parseString(java.lang.String)'"

Could you use Maven to print the dependency tree?
mvn dependency:tree -Dverbose

The Java SDK requires Gson v2.10.1. Gson has a class named "JsonParser".
JsonParser has a static method parseString(), which was added in v2.8.6.
You will get this error if the Gson version is overridden by a version older than v2.8.6.

The Spring Boot framework might depend on v2.8.5.
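
The Maven tree shows what the build resolves, but not necessarily what the running JVM loads. As a complementary check, here is a minimal sketch that uses only standard reflection to print where com.google.gson.JsonParser was loaded from and whether it exposes parseString(String). The class name ClasspathProbe and its helper methods are made up for this illustration; they are not part of the Milvus SDK or Gson.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.security.CodeSource;

// Hypothetical helper for illustration only; not part of Milvus or Gson.
final class ClasspathProbe {

    /** Returns the jar/directory a class was loaded from, or a placeholder if unknown. */
    static String locationOf(Class<?> cls) {
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? "(bootstrap or unknown)" : String.valueOf(source.getLocation());
    }

    /** True if the class has a public static method with this name and parameter types. */
    static boolean hasPublicStaticMethod(Class<?> cls, String name, Class<?>... params) {
        try {
            Method m = cls.getMethod(name, params);
            return Modifier.isStatic(m.getModifiers());
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        try {
            // Looked up by name so this file compiles even without Gson on the classpath.
            Class<?> parser = Class.forName("com.google.gson.JsonParser");
            System.out.println("JsonParser loaded from: " + locationOf(parser));
            System.out.println("has parseString(String): "
                    + hasPublicStaticMethod(parser, "parseString", String.class));
        } catch (ClassNotFoundException e) {
            System.out.println("Gson is not on the runtime classpath");
        }
    }
}
```

If the second line prints false, the JVM is picking up a Gson older than v2.8.6 from the printed location, regardless of what the Maven tree reports.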

yhmo commented

Documentation on filtering by a JSON field: https://milvus.io/docs/use-json-fields.md#Basic-scalar-filtering

Example of filtering on a JSON field:

package io.milvus.v2;

import com.google.gson.*;
import io.milvus.v2.client.*;
import io.milvus.v2.common.ConsistencyLevel;
import io.milvus.v2.common.DataType;
import io.milvus.v2.common.IndexParam;
import io.milvus.v2.service.collection.request.AddFieldReq;
import io.milvus.v2.service.collection.request.CreateCollectionReq;
import io.milvus.v2.service.collection.request.DropCollectionReq;
import io.milvus.v2.service.vector.request.*;
import io.milvus.v2.service.vector.request.data.FloatVec;
import io.milvus.v2.service.vector.response.*;

import java.util.*;

public class SimpleExample {
    public static void main(String[] args) {

        ConnectConfig config = ConnectConfig.builder()
                .uri("http://localhost:19530")
                .build();
        MilvusClientV2 client = new MilvusClientV2(config);

        String collectionName = "java_sdk_example_simple_v2";
        int dim = 4;
        // drop collection if exists
        client.dropCollection(DropCollectionReq.builder()
                .collectionName(collectionName)
                .build());

        CreateCollectionReq.CollectionSchema collectionSchema = CreateCollectionReq.CollectionSchema.builder()
                .build();
        collectionSchema.addField(AddFieldReq.builder()
                .fieldName("id")
                .dataType(DataType.Int64)
                .isPrimaryKey(Boolean.TRUE)
                .build());
        collectionSchema.addField(AddFieldReq.builder()
                .fieldName("vector")
                .dataType(DataType.FloatVector)
                .dimension(dim)
                .build());
        collectionSchema.addField(AddFieldReq.builder()
                .fieldName("metadata")
                .dataType(DataType.JSON)
                .build());

        List<IndexParam> indexes = new ArrayList<>();
        indexes.add(IndexParam.builder()
                .fieldName("vector")
                .indexType(IndexParam.IndexType.FLAT)
                .metricType(IndexParam.MetricType.COSINE)
                .build());

        CreateCollectionReq requestCreate = CreateCollectionReq.builder()
                .collectionName(collectionName)
                .collectionSchema(collectionSchema)
                .indexParams(indexes)
                .consistencyLevel(ConsistencyLevel.BOUNDED)
                .enableDynamicField(Boolean.TRUE)
                .build();
        client.createCollection(requestCreate);
        System.out.printf("Collection '%s' created\n", collectionName);

        // insert some data
        List<JsonObject> rows = new ArrayList<>();
        Gson gson = new Gson();
        for (int i = 0; i < 100; i++) {
            JsonObject row = new JsonObject();
            row.addProperty("id", i);
            row.add("vector", gson.toJsonTree(new float[]{i, (float) i / 2, (float) i / 3, (float) i / 4}));
            row.add("metadata", gson.fromJson(
                    String.format("{'absolute_directory_path': 'path%d', 'size': %d}", i, i),
                    JsonObject.class));
            rows.add(row);
        }
        InsertResp insertR = client.insert(InsertReq.builder()
                .collectionName(collectionName)
                .data(rows)
                .build());
        System.out.printf("%d rows inserted\n", insertR.getInsertCnt());


        // retrieve
        List<String> query_output_fields = Arrays.asList("id", "metadata");
        String filter = "metadata[\"size\"] < 8";
        QueryReq queryParam = QueryReq.builder()
                .collectionName(collectionName)
                .consistencyLevel(ConsistencyLevel.STRONG)
                .filter(filter)
                .outputFields(query_output_fields)
                .build();
        QueryResp resp = client.query(queryParam);
        List<QueryResp.QueryResult> queryResults = resp.getQueryResults();
        for (QueryResp.QueryResult result: queryResults) {
            System.out.println(result.getEntity());
        }
    }
}
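
For the two-field lookup asked about in the question (absolute_directory_path and file_name), the same bracket syntax can be combined with "and". The sketch below only builds the filter string; JsonFilterSketch, quote and allEqual are hypothetical names for this illustration. It assumes that backslashes and double quotes inside a string literal must be escaped in the filter expression, which is worth verifying against the Milvus documentation for your server version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of the Milvus SDK.
final class JsonFilterSketch {

    /** Wraps a value in double quotes, escaping backslashes and double quotes (assumed requirement). */
    static String quote(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Builds one equality test per key, joined with "and", e.g. metadata["k"] == "v". */
    static String allEqual(String jsonField, Map<String, String> expected) {
        StringJoiner joiner = new StringJoiner(" and ");
        for (Map.Entry<String, String> e : expected.entrySet()) {
            joiner.add(jsonField + "[" + quote(e.getKey()) + "] == " + quote(e.getValue()));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the conditions in insertion order.
        Map<String, String> expected = new LinkedHashMap<>();
        expected.put("absolute_directory_path", "c:\\Temp\\Test");
        expected.put("file_name", "lorem.txt");
        System.out.println(allEqual("metadata", expected));
    }
}
```

The resulting string would then be passed to QueryReq.builder().filter(...) in place of the "size" filter used in the example above.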

@yhmo thank you for the detailed explanation!

I checked the dependencies, and the effective Gson version appears to be 2.10.1. All other versions, including 2.8.9, are marked as omitted.

...
[INFO] |  +- org.codehaus.plexus:plexus-utils:jar:3.0.24:compile
[INFO] |  +- com.google.code.gson:gson:jar:2.10.1:compile
[INFO] |  +- org.apache.parquet:parquet-avro:jar:1.13.1:compile
...
[INFO] +- dev.langchain4j:langchain4j-open-ai:jar:0.29.1:compile
[INFO] |  +- (dev.langchain4j:langchain4j-core:jar:0.29.1:compile - omitted for duplicate)
[INFO] |  +- dev.ai4j:openai4j:jar:0.17.0:compile
[INFO] |  |  +- (com.squareup.retrofit2:retrofit:jar:2.9.0:compile - omitted for duplicate)
[INFO] |  |  +- (com.squareup.retrofit2:converter-gson:jar:2.9.0:compile - omitted for duplicate)
[INFO] |  |  +- (com.google.code.gson:gson:jar:2.8.9:compile - omitted for conflict with 2.10.1)
...
[INFO] |  |  |  +- (com.google.protobuf:protobuf-java:jar:2.5.0:compile - omitted for conflict with 3.24.0)
[INFO] |  |  |  +- (com.google.code.gson:gson:jar:2.9.0:compile - omitted for conflict with 2.10.1)
[INFO] |  |  |  +- org.apache.hadoop:hadoop-auth:jar:3.3.6:compile
... 

Spring is not used for this app.

@yhmo This seems to be a JJava bug. It reproduces only when I run the code in a Jupyter Notebook. With the same dependencies, it works fine when started as a normal Java SE app.

This issue can be closed.