[Discussion] Check datatype of properties in the database.
Purvanshsingh opened this issue · 16 comments
@open-risk suggested having proper datatypes for the properties in the database, but hydrus generates the properties (columns) with the string datatype.
@Mec-iS @farazkhanfk7 Please share your views.
Yes, currently every property (attribute in a table) is made using Column(String) in db_models.py. To add such a feature to hydrus we'll first have to make these changes in hydra-python-core, where we can also define the preferred datatype of properties; hydrus will then look into the api_doc to create a column of the matching type in the tables.
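The limitation described above can be sketched without touching real hydrus code (the function and names below are illustrative, not the actual db_models.py API):

```python
# Minimal sketch of the current behaviour: every ApiDoc property becomes
# a string column, so numeric values end up stored as text. There is
# currently nowhere in the parsed ApiDoc to look up a preferred datatype.
def build_columns(property_names):
    # Column type is effectively hard-coded to String (VARCHAR).
    return {name: "String" for name in property_names}

print(build_columns(["TotalBalance", "AccountHolder"]))
# {'TotalBalance': 'String', 'AccountHolder': 'String'}
```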
Are you suggesting to add a property in our api_doc to specify which data type is to be used?
Property type in the API_Doc.
Suppose the "TotalBalance" property has
"@type": "xsd:decimal"
Then the column in the database should have its datatype as DECIMAL, whereas currently we only have columns with datatype VARCHAR.
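As a hedged illustration of the idea, a minimal ApiDoc-style fragment declaring a datatype could be read like this (the field layout is an assumption for illustration, not a verified Hydra structure):

```python
import json

# Hypothetical ApiDoc fragment: the property declares its datatype with
# an xsd entity, which a type-aware hydrus could inspect before building
# the table, instead of defaulting every column to VARCHAR.
api_doc_fragment = json.loads("""
{
  "property": {
    "@id": "vocab:TotalBalance",
    "@type": "xsd:decimal",
    "title": "TotalBalance"
  }
}
""")

declared_type = api_doc_fragment["property"]["@type"]
print(declared_type)  # xsd:decimal
```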
I think types can be inferred from JSON strings when they get deserialised. Every time a json.loads(...) happens, the parser takes a JSON string and casts it into a Python type (str, int, float, ...); once we have the JSON in the form of a Python dictionary, we can easily check the type with type or isinstance.
>>> import json
>>> json_string = '{"string": "test", "integer": 3, "float": 2.34}'
>>> obj_from_json = json.loads(json_string)
>>> obj_from_json
{'string': 'test', 'integer': 3, 'float': 2.34}
>>> type(obj_from_json["string"])
<class 'str'>
>>> isinstance(obj_from_json["integer"], int)
True
>>>
Then map the Python type to an SQL type using sqlalchemy.
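A minimal sketch of that inference-then-map step, with SQLAlchemy type names kept as plain strings so the example stays dependency-free (a real implementation would map to sqlalchemy.String, sqlalchemy.Integer, sqlalchemy.Float, and so on):

```python
import json

# Python type -> SQLAlchemy type name (shown as strings for illustration).
PY_TO_SQLA = {str: "String", int: "Integer", float: "Float", bool: "Boolean"}

def infer_column_types(json_string: str) -> dict:
    # json.loads casts JSON scalars to Python types; we then look each
    # value's type up in the mapping, defaulting to String.
    obj = json.loads(json_string)
    return {key: PY_TO_SQLA.get(type(value), "String") for key, value in obj.items()}

print(infer_column_types('{"string": "test", "integer": 3, "float": 2.34}'))
# {'string': 'String', 'integer': 'Integer', 'float': 'Float'}
```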
UPDATE: I noticed that the problem arises before any actual data is available.
@Purvanshsingh The use of entities from xsd is a good solution, but what is the suggested and spec-compliant way of defining data types in the ApiDoc? Have you searched the repo's issues for similar questions or examples?
If the right way is using a type specifier as a property like propertyOn, for sure we have to make the property parsed by hydra-python-core.
I don't think we can rely on the property field for extracting the datatype of the column, because there can be many properties being referred to which won't map to the standard sqlalchemy types.
This seems like a possible way to go forward. @Purvanshsingh @farazkhanfk7 can you look into this and tell if this can be done?
@sameshl
Sure, we will look into it and let you know if we can implement this in db_models.py.
The problem is that ApiDoc parsing happens before any instance is provided, so there is no Python dictionary on which to run inference; see my update above:
UPDATE: I noticed that the problem arises before any actual data is available.
@Purvanshsingh The use of entities from xsd is a good solution, but what is the suggested and spec-compliant way of defining data types in the ApiDoc? Have you searched the repo's issues for similar questions or examples?
If the right way is using a type specifier as a property like propertyOn, for sure we have to make the property parsed by hydra-python-core.
From the PoC perspective two different data types would be sufficient (numeric, string), as this would demonstrate the complete chain and can be expanded as and when needed. Detailed mapping of data types between RDF, XML/XSD, JSON(-LD), vanilla Python, standard Python data-science tools like numpy/pandas, and SQL backends, so that information is not lost in the translation, is quite a hairy issue (see how many types just Postgres supports).
@Mec-iS I took the reference from the NPL ontology provided by @open-risk.
I'll check the specification once more and confirm the right way.
@open-risk Good point, initially we should at least support db.String, db.Integer and db.Float.
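A minimal sketch of that three-type mapping, assuming xsd entities in the ApiDoc and falling back to db.String (the xsd keys and the fallback are assumptions; type names are shown as strings for illustration):

```python
# xsd entity -> Flask-SQLAlchemy-style column type, limited to the three
# types proposed above. Unknown types fall back to db.String, which
# preserves the current behaviour for anything unmapped.
XSD_TO_SQLA = {
    "xsd:string": "db.String",
    "xsd:integer": "db.Integer",
    "xsd:float": "db.Float",
}

def pick_column_type(xsd_type: str) -> str:
    return XSD_TO_SQLA.get(xsd_type, "db.String")

print(pick_column_type("xsd:integer"))   # db.Integer
print(pick_column_type("xsd:dateTime"))  # db.String (fallback)
```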
@Purvanshsingh that is an example of a JSON-LD vocabulary. We are looking for a Hydra ApiDoc example (which is itself a JSON-LD vocabulary, but for REST APIs).
I think we need to implement the reading and conversion of types. Please open an issue in hydra-python-core. As a side note, the OpenAPI parser also ignores types, so we need to open an issue there as well.
@Mec-iS I searched the specification's issues and found HydraCG/Specifications#192.
I am not sure what conclusion they reached, but the issue was raised for defining a data property's type with rdfs:range.
Can you please have a look at it?
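For illustration only, one hypothetical shape for declaring a data property's range in the spirit of that discussion (the exact spec-compliant layout was not settled in this thread):

```python
import json

# Hypothetical SupportedProperty fragment using a range field to carry
# the datatype. Field names and nesting are assumptions, not a confirmed
# outcome of HydraCG/Specifications#192.
supported_property = json.loads("""
{
  "@type": "SupportedProperty",
  "property": {
    "@id": "vocab:TotalBalance",
    "range": "xsd:decimal"
  }
}
""")

print(supported_property["property"]["range"])  # xsd:decimal
```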
A solution like this would only work when we get the actual data for a given property. But we cannot apply such an approach to the API Documentation, which is the only source of information we have when we build the database schema.
The issue now becomes how we identify datatypes using the @type field; a similar discussion was had in HydraCG/Specifications#192, and they concluded that enforcing vocabularies to specify datatypes is less beneficial.
See my comments on HTTP-APIs/hydrus#581 about how implementing something like this from the API Doc is also tricky.
Moved to hydrus.