[Java][FlightRPC] Handle binding parameters when server can't provide the expected type
Describe the enhancement requested
Based on #156, we need to support the case where the server can't specify the expected type of a given parameter.
Not all servers can always provide an accurate type for bind parameters. What should we do there? Note that ADBC uses NA/NullType as a wildcard/placeholder type here. This isn't specified in Flight SQL itself, but perhaps we could adopt that convention as well.
Proposal:
- When the server doesn't know the type of any of the parameters, it can just set the parameter schema to `null` (or alternatively, an empty schema).
- When the server doesn't know the type of only some of the parameters, it can just set the respective `Field` to `NullType`.
- Before we start binding values, we transform the `preparedStatement.getParameterSchema()` based on the types of the given `TypedValue`s. If the `Schema` is empty/null, we create every `Field` for the schema based on the `TypedValue` type. Otherwise, we replace all `NullType` fields with a `Field` based on the `TypedValue`.
The only potential drawback I see with this approach is that `NullType` can no longer be declared as an actual parameter type, since it would always be read as "unknown". That said, I can't think of a reason why one would want to do that.
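The proposed transformation can be sketched in a few lines. This is a standalone illustration, not the driver's code: `ArrowKind` and `Param` are hypothetical stand-ins for Arrow's `ArrowType` and Avatica's `TypedValue`, and a schema is modeled as a plain list of types so the example runs without either library on the classpath.

```java
import java.util.ArrayList;
import java.util.List;

final class ParameterSchemaSketch {
    // Hypothetical stand-in for a few Arrow types; NULL plays the role of NullType.
    enum ArrowKind { NULL, BOOL, INT64, FLOAT64, UTF8, BINARY }

    // Hypothetical stand-in for an Avatica TypedValue: client-side type plus value.
    static final class Param {
        final ArrowKind kind;
        final Object value;

        Param(ArrowKind kind, Object value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Resolves the types to bind with, following the proposal:
    // - null or empty server schema: every type is taken from the bound values;
    // - otherwise: declared types are kept, and NULL entries act as wildcards
    //   replaced by the type of the value bound at the same position.
    static List<ArrowKind> resolveSchema(List<ArrowKind> serverSchema, List<Param> params) {
        List<ArrowKind> resolved = new ArrayList<>();
        if (serverSchema == null || serverSchema.isEmpty()) {
            for (Param p : params) {
                resolved.add(p.kind);
            }
            return resolved;
        }
        // Assumption (not part of the proposal): a count mismatch is an error.
        if (serverSchema.size() != params.size()) {
            throw new IllegalArgumentException(
                    "server declared " + serverSchema.size() + " parameters, got " + params.size());
        }
        for (int i = 0; i < serverSchema.size(); i++) {
            ArrowKind declared = serverSchema.get(i);
            resolved.add(declared == ArrowKind.NULL ? params.get(i).kind : declared);
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Param> params = List.of(
                new Param(ArrowKind.INT64, 42L), new Param(ArrowKind.UTF8, "abc"));
        // Server knows none of the types: prints [INT64, UTF8]
        System.out.println(resolveSchema(null, params));
        // Server knows only the first type: prints [FLOAT64, UTF8]
        System.out.println(resolveSchema(List.of(ArrowKind.FLOAT64, ArrowKind.NULL), params));
    }
}
```

The drawback is visible in the sketch: a `NULL` entry always means "unknown", so a server has no way to declare a parameter that genuinely is `NullType`.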
Component(s)
Java
I've just run into this while adding various language integration tests against a FlightSQL server implementation. I'll note that so far neither Go nor Python has had any issues without a parameter schema available (though this did finally motivate me to implement that bit of the spec).
> Before we start binding values, we transform the `preparedStatement.getParameterSchema()` based on the types of the given `TypedValue`s. If the `Schema` is empty/null, we create every `Field` for the schema based on the `TypedValue` type. Otherwise, we replace all `NullType` fields with a `Field` based on the `TypedValue`.
I'm confused why the Java client is attempting to type check anything at all. I would think it should be up to the server to decide whether a provided type is acceptable or not. One obvious case is whether type casting is allowed. Another is a query that accepts any type, such as `SELECT $1;`.
I did spend some time reading through the implementation to see if I could provide a patch easily enough, but I'm no Java expert and got super lost in the whole arrow vector package trying to figure out how it's intended to work. I was vaguely surprised by the comment in VectorSchemaRoot acknowledging that most (all?) other languages use a RecordBatch for sending parameters. Given that it's RecordBatches that are sent across the wire, I assume that's some sort of JVM optimization? No idea there.
Anywho, if anyone gets around to this, feel free to ping me, as I should be able to check any proposed patches for fixing this fairly quickly.
IIRC, there are a few things at play here:
- The usage of the Avatica JDBC client
- The statically typed nature of Java
- The general design of the driver
I'd have to dig into the code again as it's been a while, but from what I remember, when you create a PreparedStatement, the driver creates a VectorSchemaRoot based on the returned parameter schema. When you set a parameter, it attempts to set the given value on the VectorSchemaRoot. I'm not entirely sure why most of the Java implementation is built around VectorSchemaRoot instead of raw ArrowRecordBatches, but I imagine it's due to some sort of memory optimization (being able to reuse allocated memory within the JVM for pointers to raw Arrow data in the shared memory space). That being said, I don't think the use of VectorSchemaRoot changes anything here, as ultimately the data is indeed sent to the server in an ArrowRecordBatch.
I think it's possible to re-design how we bind parameters to do the following instead:
- Move the creation of the `VectorSchemaRoot` from the `AvaticaParameterBinder` constructor into `AvaticaParameterBinder.bind`
- Create the `VectorSchemaRoot` by introspecting the list of `typedValues` instead of using the `PreparedStatement.getParameterSchema`
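These two steps could look roughly like the sketch below. All classes are hypothetical stand-ins, not the driver's actual code: `Root` plays the role of `VectorSchemaRoot`, `Binder` of `AvaticaParameterBinder`, and `Param` of a `TypedValue`. As an assumption in the spirit of the memory-reuse guess above, the binder keeps its root between calls as long as the inferred types don't change.

```java
import java.util.ArrayList;
import java.util.List;

final class LazyBinderSketch {
    // Hypothetical stand-in for a few Arrow types.
    enum ArrowKind { NULL, BOOL, INT64, FLOAT64, UTF8, BINARY }

    // Hypothetical stand-in for an Avatica TypedValue.
    static final class Param {
        final ArrowKind kind;
        final Object value;

        Param(ArrowKind kind, Object value) {
            this.kind = kind;
            this.value = value;
        }
    }

    // Hypothetical stand-in for VectorSchemaRoot: a fixed schema plus one row of values.
    static final class Root {
        final List<ArrowKind> schema;
        final List<Object> row = new ArrayList<>();

        Root(List<ArrowKind> schema) {
            this.schema = List.copyOf(schema);
        }
    }

    // Hypothetical stand-in for AvaticaParameterBinder: the constructor no longer
    // takes the server's parameter schema; the root is built from the values in bind().
    static final class Binder {
        private Root root;
        int allocations = 0; // number of times a new root had to be created

        Root bind(List<Param> params) {
            List<ArrowKind> schema = new ArrayList<>();
            for (Param p : params) {
                schema.add(p.kind); // the type comes from the value, not the server
            }
            if (root == null || !root.schema.equals(schema)) {
                root = new Root(schema); // first bind, or the types changed
                allocations++;
            } else {
                root.row.clear(); // same types as last time: reuse the root
            }
            for (Param p : params) {
                root.row.add(p.value);
            }
            return root;
        }
    }

    public static void main(String[] args) {
        Binder binder = new Binder();
        binder.bind(List.of(new Param(ArrowKind.INT64, 1L)));
        binder.bind(List.of(new Param(ArrowKind.INT64, 2L))); // reuses the root
        Root last = binder.bind(List.of(new Param(ArrowKind.UTF8, "x"))); // new root
        System.out.println(binder.allocations + " " + last.schema); // prints 2 [UTF8]
    }
}
```

This sketch ignores the server's parameter schema entirely, so it does not settle whether the FlightSqlClient requires the two schemas to match.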
The only potential issue here is that I'm not entirely sure whether the FlightSqlClient requires the parameter schema to match the schema of the VectorSchemaRoot that we actually pass in.
I'm curious, how is this handled in other languages? Does the schema that is sent with parameters need to match the ParameterSchema?
Relevant parts of the code:
Ok, I dug into this a bit more. I think the change shouldn't be too difficult; the hardest part will be agreeing on the expected behavior.
I started a draft PR: #462
I would love to hear thoughts & opinions on:
- Should we ignore the `parameterSchema` returned by the server altogether, or should we respect all types except for Null vectors and treat those as "wild cards"?
- What do the other drivers do?
- How do we ensure all of the required Avatica `TypedValue`s are mapped to an `ArrowType`?
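For the last question, one option is to route every conversion through a single exhaustive `switch` expression (Java 14+), so the compiler flags any representation without a mapping. This is a sketch under assumptions: it supposes the binder dispatches on the `TypedValue`'s representation (`ColumnMetaData.Rep`); `Rep` below mirrors only a subset of that enum (the real one has more constants), `ArrowKind` stands in for Arrow's `ArrowType` classes, and the specific type choices are illustrative.

```java
final class RepToArrowSketch {
    // Hypothetical subset of Avatica's ColumnMetaData.Rep constants.
    enum Rep { PRIMITIVE_BOOLEAN, BOOLEAN, BYTE, SHORT, INTEGER, LONG, FLOAT, DOUBLE, STRING, BYTE_STRING, OBJECT }

    // Hypothetical stand-in for the corresponding Arrow types.
    enum ArrowKind { BOOL, INT8, INT16, INT32, INT64, FLOAT32, FLOAT64, UTF8, BINARY }

    static ArrowKind toArrow(Rep rep) {
        // A switch expression over an enum must cover every constant, so a Rep
        // constant without a case here is a compile error when this code is rebuilt.
        return switch (rep) {
            case PRIMITIVE_BOOLEAN, BOOLEAN -> ArrowKind.BOOL;
            case BYTE -> ArrowKind.INT8;
            case SHORT -> ArrowKind.INT16;
            case INTEGER -> ArrowKind.INT32;
            case LONG -> ArrowKind.INT64;
            case FLOAT -> ArrowKind.FLOAT32;
            case DOUBLE -> ArrowKind.FLOAT64;
            case STRING -> ArrowKind.UTF8;
            case BYTE_STRING -> ArrowKind.BINARY;
            // Explicitly unsupported rather than silently guessed.
            case OBJECT -> throw new UnsupportedOperationException(
                    "no Arrow type chosen for parameter representation " + rep);
        };
    }

    public static void main(String[] args) {
        for (Rep rep : Rep.values()) {
            try {
                System.out.println(rep + " -> " + toArrow(rep));
            } catch (UnsupportedOperationException e) {
                System.out.println(rep + " -> unsupported");
            }
        }
    }
}
```

Throwing for `OBJECT` surfaces unsupported parameters at bind time instead of guessing. An alternative, in line with the argument that type checking belongs to the server, would be to map such values to `NullType` and let the server decide.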