[Discussion] Check datatype of properties in the database.
Purvanshsingh opened this issue · 16 comments
@open-risk suggested having proper datatypes for the properties in the database, but hydrus generates the properties (columns) with the string datatype.
@Mec-iS @farazkhanfk7 Please share your views.
Yes, currently every property (attribute in a table) is made using Column(String) in db_models.py. To add such a feature to hydrus we'll first have to make these changes in hydra-python-core, where we can also define the preferred datatype of properties; hydrus will then look into the api_doc to create a column of the matching type in the tables.
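The limitation described above can be sketched without touching real hydrus code (the function and names below are illustrative, not the actual db_models.py API):

```python
# Minimal sketch of the current behaviour: every ApiDoc property becomes
# a string column, so numeric values end up stored as text. There is
# currently nowhere in the parsed ApiDoc to look up a preferred datatype.
def build_columns(property_names):
    # Column type is effectively hard-coded to String (VARCHAR).
    return {name: "String" for name in property_names}

print(build_columns(["TotalBalance", "AccountHolder"]))
# {'TotalBalance': 'String', 'AccountHolder': 'String'}
```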
Are you suggesting to add a property in our api_doc to specify which data type is to be used?
Property type in the API_Doc.
Suppose the "TotalBalance" property has
"@type": "xsd:decimal"
Then the column in the database should have its datatype as DECIMAL, whereas currently we only have columns with datatype VARCHAR.
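As a hedged illustration of the idea, a minimal ApiDoc-style fragment declaring a datatype could be read like this (the field layout is an assumption for illustration, not a verified Hydra structure):

```python
import json

# Hypothetical ApiDoc fragment: the property declares its datatype with
# an xsd entity, which a type-aware hydrus could inspect before building
# the table, instead of defaulting every column to VARCHAR.
api_doc_fragment = json.loads("""
{
  "property": {
    "@id": "vocab:TotalBalance",
    "@type": "xsd:decimal",
    "title": "TotalBalance"
  }
}
""")

declared_type = api_doc_fragment["property"]["@type"]
print(declared_type)  # xsd:decimal
```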
I think types can be inferred from JSON strings when they get deserialised. Every time a json.loads(...) happens, the parser takes a JSON string and casts it into a Python type (str, int, float, ...); once we have the JSON in the form of a Python dictionary, we can easily check the type with type or isinstance.
>>> import json
>>> json_string = '{"string": "test", "integer": 3, "float": 2.34}'
>>> obj_from_json = json.loads(json_string)
>>> obj_from_json
{'string': 'test', 'integer': 3, 'float': 2.34}
>>> type(obj_from_json["string"])
<class 'str'>
>>> isinstance(obj_from_json["integer"], int)
True
>>>
Then map the Python type to an SQL type using sqlalchemy.
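A minimal sketch of that inference-then-map step, with SQLAlchemy type names kept as plain strings so the example stays dependency-free (a real implementation would map to sqlalchemy.String, sqlalchemy.Integer, sqlalchemy.Float, and so on):

```python
import json

# Python type -> SQLAlchemy type name (shown as strings for illustration).
PY_TO_SQLA = {str: "String", int: "Integer", float: "Float", bool: "Boolean"}

def infer_column_types(json_string: str) -> dict:
    # json.loads casts JSON scalars to Python types; we then look each
    # value's type up in the mapping, defaulting to String.
    obj = json.loads(json_string)
    return {key: PY_TO_SQLA.get(type(value), "String") for key, value in obj.items()}

print(infer_column_types('{"string": "test", "integer": 3, "float": 2.34}'))
# {'string': 'String', 'integer': 'Integer', 'float': 'Float'}
```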
UPDATE: I noticed that the problem arises before any actual data is available.
@Purvanshsingh The use of entities from xsd is a good solution, but what is the suggested and spec-compliant way of defining data types in the ApiDoc? Have you searched the repo's issues for similar questions or examples?
If the right way is using a type specifier as a property like propertyOn, for sure we have to make the property parsed by hydra-python-core.
I don't think we can rely on the property field for extracting the datatype of the column, because there can be many properties being referred to which won't map to the standard sqlalchemy types.
This seems like a possible way to go forward. @Purvanshsingh @farazkhanfk7 can you look into this and tell if this can be done?
@sameshl
Sure, we will look into it and let you know if we can implement this in db_models.py.
The problem is that ApiDoc parsing happens before any instance is provided, so there is no Python dictionary on which to run inference; see my update above:
UPDATE: I noticed that the problem arises before any actual data is available.
@Purvanshsingh The use of entities from xsd is a good solution, but what is the suggested and spec-compliant way of defining data types in the ApiDoc? Have you searched the repo's issues for similar questions or examples?
If the right way is using a type specifier as a property like propertyOn, for sure we have to make the property parsed by hydra-python-core.
From the PoC perspective two different data types would be sufficient (numeric, string), as this would demonstrate the complete chain and can be expanded as and when needed. Detailed mapping of data types between RDF, XML/XSD, JSON(-LD), vanilla Python, standard Python data-science tools like numpy/pandas, and SQL backends, so that information is not lost in the translation, is quite a hairy issue (see how many types just Postgres supports).
@Mec-iS I took the reference from the NPL ontology provided by @open-risk.
I'll check the specification once more and confirm the right way.
@open-risk Good point, initially we should at least support db.String, db.Integer and db.Float.
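A minimal sketch of that three-type mapping, assuming xsd entities in the ApiDoc and falling back to db.String (the xsd keys and the fallback are assumptions; type names are shown as strings for illustration):

```python
# xsd entity -> Flask-SQLAlchemy-style column type, limited to the three
# types proposed above. Unknown types fall back to db.String, which
# preserves the current behaviour for anything unmapped.
XSD_TO_SQLA = {
    "xsd:string": "db.String",
    "xsd:integer": "db.Integer",
    "xsd:float": "db.Float",
}

def pick_column_type(xsd_type: str) -> str:
    return XSD_TO_SQLA.get(xsd_type, "db.String")

print(pick_column_type("xsd:integer"))   # db.Integer
print(pick_column_type("xsd:dateTime"))  # db.String (fallback)
```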
@Purvanshsingh that is an example of a JSON-LD vocabulary. We are looking for a Hydra ApiDoc example (which is itself a JSON-LD vocabulary, but for REST APIs).
I think we need to implement the reading and conversion of types. Please open an issue in hydra-python-core. As a side note, the OpenAPI parser also ignores types, so we need to open an issue there as well.
@Mec-iS I searched the specification's issues and found HydraCG/Specifications#192.
I am not sure what conclusion they reached, but the issue was raised for defining a data property's type with rdfs:range.
Can you please have a look at it?
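For illustration only, one hypothetical shape for declaring a data property's range in the spirit of that discussion (the exact spec-compliant layout was not settled in this thread):

```python
import json

# Hypothetical SupportedProperty fragment using a range field to carry
# the datatype. Field names and nesting are assumptions, not a confirmed
# outcome of HydraCG/Specifications#192.
supported_property = json.loads("""
{
  "@type": "SupportedProperty",
  "property": {
    "@id": "vocab:TotalBalance",
    "range": "xsd:decimal"
  }
}
""")

print(supported_property["property"]["range"])  # xsd:decimal
```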
A solution like this would only work when we get the actual data for a given property. But we cannot apply such an approach to the API Documentation, which is the only source of information we have when we build the database schema.
The issue now becomes how we identify datatypes using the @type field; a similar discussion was had in HydraCG/Specifications#192, and they concluded that enforcing vocabularies to specify datatypes is less beneficial.
See my comments on HTTP-APIs/hydrus#581 about how implementing something like this from the API Doc is also tricky.
Moved to hydrus.