Provide a "class" row factory
deppe opened this issue · 14 comments
Is your feature request related to a problem? Please describe.
Currently, the comdb2 module provides dict and namedtuple row factories. It would also be nice to provide a "class" row factory, which would allow rows to be returned as client-defined dataclasses, pydantic models, etc.
Describe the solution you'd like
See the similar concept in psycopg's class_row factory: https://www.psycopg.org/psycopg3/docs/api/rows.html#psycopg.rows.class_row
This also allows nice static type checking: https://www.psycopg.org/psycopg3/docs/advanced/typing.html#example-returning-records-as-pydantic-models
One way to design the "class" row factory would be to implement it as a class, ClassRowFactory. It would look something like the following:
In comdb2/factories.py
from typing import Any, Callable, Sequence

class ClassRowFactory:
    def __init__(self, class_pointer: Any) -> None:
        self.class_pointer = class_pointer

    def __call__(self, col_names: Sequence[str]) -> Callable[[Sequence[Any]], Any]:
        _raise_on_duplicate_column_names(col_names)

        def class_row(col_values: Sequence[Any]) -> Any:
            # Build {column_name: value} and pass it to the class as kwargs
            key_value_mapping = dict(zip(col_names, col_values))
            return self.class_pointer(**key_value_mapping)

        return class_row
This fits into the existing code structure in the factories.py file. It would be used something like this:
Example:
>>> @dataclass
... class ABC:
...     x: int
...     y: int
>>> conn.row_factory = ClassRowFactory(ABC)
>>> row = conn.cursor().execute("select 1 as x, 2 as y").fetchone()
>>> print(row)
ABC(x=1, y=2)
>>> print(row.x)
1
key_value_mapping = dict(zip(col_names, col_values))
resulting_class = self.class_pointer(**key_value_mapping)
Is this the way that psycopg3 does it, with all columns names being passed to the class constructor as keyword arguments?
The column names aren't passed to the class constructor; a reference to the class itself is passed to ClassRowFactory's constructor. To make the instance behave like a function (matching the existing format of comdb2's factories) I used the __call__ method. But if you want to see how psycopg does it, it takes a similar approach:
def class_row(cls: Type[T]) -> BaseRowFactory[T]:
    r"""Generate a row factory to represent rows as instances of the class `!cls`.

    The class must support every output column name as a keyword parameter.

    :param cls: The class to return for each row. It must support the fields
        returned by the query as keyword arguments.
    :rtype: `!Callable[[Cursor],` `RowMaker`\[~T]]
    """
    def class_row_(cursor: "BaseCursor[Any, Any]") -> "RowMaker[T]":
        names = _get_names(cursor)
        if names is None:
            return no_result

        def class_row__(values: Sequence[Any]) -> T:
            return cls(**dict(zip(names, values)))  # <-- this is where the row is converted to a class instance and returned

        return class_row__

    return class_row_
I adopted a similar method; to make it easy to understand and read, I put it in a class. The __call__ method enables the class instance to be called like a function. So what is being assigned in conn.row_factory = ClassRowFactory(ABC) is a class instance that can also behave like a reference to a function.
The column names aren't passed to the class constructor; a reference to the class itself is passed to ClassRowFactory's constructor
Right, I understand that; I was asking about how that class is called. Does psycopg3 turn every column name into a keyword argument for that class constructor?
Does psycopg3 turn every column name into a keyword argument for that class constructor?
In its source code it gets the column names using a helper function and zips them with the values to form keyword arguments. It obtains the column names through functionality implemented in its cursor.
It's easy to convert a dictionary to a class object. If there is already a blueprint for building the dictionary within factories.py, then can't that dictionary factory be extended to just pass the dictionary into its respective class?
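To illustrate the idea above, here is a minimal sketch of layering a class row factory on top of a dict row factory. The dict_row_factory below is a stand-in for the logic already in comdb2/factories.py, and class_row_factory is hypothetical, not part of the library:

```python
from dataclasses import dataclass


def dict_row_factory(col_names):
    # Stand-in for comdb2.factories.dict_row_factory: maps names to values.
    def dict_row(col_values):
        return dict(zip(col_names, col_values))
    return dict_row


def class_row_factory(cls):
    # Hypothetical: wrap the dict factory and feed its output to `cls`.
    def factory(col_names):
        make_dict = dict_row_factory(col_names)

        def make_class(col_values):
            return cls(**make_dict(col_values))

        return make_class
    return factory


@dataclass
class Point:
    x: int
    y: int


make_row = class_row_factory(Point)(["x", "y"])
print(make_row((1, 2)))  # Point(x=1, y=2)
```

This reuses the dict-building step rather than duplicating it, at the cost of one extra function call per row.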
My concerns about a class row factory have never been about ease of implementation, they're about performance and about robustness.
On the performance side, this approach to a class row factory means that for each row a zip object gets created, and a dict (or, I think, actually 2 dicts under the covers, due to the way the interpreter implements **), and an instance of the user-provided class. That's a lot of work, but maybe it's justified for the convenience.
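The per-row work being described can be sketched with an illustrative micro-benchmark (not from the library; the absolute numbers vary by machine, only the relative shape of the cost matters):

```python
import timeit
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


col_names = ("x", "y")
col_values = (1, 2)


def tuple_row():
    # Baseline: hand back the raw value tuple.
    return col_values


def class_row():
    # Per row: one zip object, one dict, a ** expansion, one class instance.
    return Point(**dict(zip(col_names, col_values)))


t_tuple = timeit.timeit(tuple_row, number=100_000)
t_class = timeit.timeit(class_row, number=100_000)
print(f"tuple row: {t_tuple:.3f}s  class row: {t_class:.3f}s")
```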
On the robustness side, I'm wondering how useful it is to assume that column names will exactly match the parameter names of the __init__ of the class that was provided to the factory-factory's constructor.
If psycopg3 has done this the same way - with a dict being created for each row and used to provide keyword arguments to the class constructor based on the column names - that helps to set my mind at ease about both the performance and robustness aspect. That's why I'm inquiring about the approach they took.
On the robustness side, I'm wondering how useful it is to assume that column names will exactly match the parameter names of the __init__ of the class that was provided to the factory-factory's constructor.
The column names and values are returned as a dictionary:
{'col1': value1, 'col2': value2}
These will match the parameters of the specified class as long as the parameter names and the key names (in this case, the column names) are the same, so the class should look something like this:
@dataclass
class ABC:
    col1: data_type1
    col2: data_type2
Just as it is described in the original psycopg implementation.
One more important point: the data types of the dataclass attributes should match the data types of the corresponding dictionary values.
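A quick sketch of the name-matching behavior: when the dictionary keys match the __init__ parameters the call succeeds, and when a key has no matching parameter Python raises a TypeError (class and column names here are made up for illustration):

```python
from dataclasses import dataclass


@dataclass
class ABC:
    col1: int
    col2: int


row = {"col1": 1, "col2": 2}
print(ABC(**row))  # keys match __init__ parameters, so this works

bad = {"col1": 1, "colX": 2}
try:
    ABC(**bad)  # 'colX' is not a parameter of ABC.__init__
except TypeError as exc:
    print(f"TypeError: {exc}")
```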
I'm pretty sure I don't want to expose this feature, at least as things stand today. There's a few different things that make this make more sense for psycopg3 than it makes for Comdb2, but the biggest one is that I'm reluctant to add more code that makes static type checking difficult.
The psycopg3 Connection and Cursor classes are generic, so a type checker like mypy or pylance can catch a bug like this:
from dataclasses import dataclass

import psycopg
from psycopg.rows import class_row

@dataclass
class Point:
    x: int
    y: int

conn = psycopg.connect(row_factory=class_row(Point))
cur = conn.cursor()
cur.execute("select 1, 2")
for point in cur.fetchall():
    print(f"x={point.x} y={point.y} z={point.z}")
The type checker knows that Connection[Point].cursor().fetchall() returns a Sequence[Point], and it knows that a Point doesn't have a z attribute, so the last line fails type analysis because it tries to access point.z.
Compare that against this example that uses python-comdb2, using the interface proposed in #57:
from dataclasses import dataclass

from comdb2.dbapi2 import connect
from comdb2.factories import ClassRowFactory

@dataclass
class Point:
    x: int
    y: int

conn = connect("mattdb")
conn.row_factory = ClassRowFactory(Point)
cur = conn.cursor()
cur.execute("select 1, 2")
for point in cur.fetchall():
    print(f"x={point.x} y={point.y} z={point.z}")
In the case of python-comdb2, our Connection and Cursor classes aren't currently generic, and so the type checker wouldn't catch the same bug: cur.fetchall() is typed as returning Sequence[Any] in the comdb2 world, and accessing the z attribute of an object of type Any isn't a type error.
It seems a bit unfair to dislike this because of an unrelated feature that python-comdb2 is lacking, except that there's an alternative way to write this code where a type checker would catch the bug. If that Comdb2 example were rewritten to this:
from dataclasses import dataclass

from comdb2.dbapi2 import connect
from comdb2.factories import dict_row_factory

@dataclass
class Point:
    x: int
    y: int

conn = connect("mattdb")
conn.row_factory = dict_row_factory
cur = conn.cursor()
cur.execute("select 1, 2")
for row in cur.fetchall():
    point = Point(**row)
    print(f"x={point.x} y={point.y} z={point.z}")
then the type checker would catch the bug. That's only one more line of code, and it makes it so that type checkers can usefully check the processing of each row. If someone wants to convert each row to a class, I'd rather they do it this way so that the type checker understands what's happening.
That's not a "no" for doing this proposed change, but I think it is a "not yet". I don't think this change makes sense unless and until our Connection and Cursor types become generic.
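For concreteness, here is a rough sketch of what a generic Cursor could look like, so that a row factory parameterized on Point would let a type checker infer the element type of fetchall(). None of these names reflect the current python-comdb2 API; this is only an illustration of the Generic/TypeVar pattern:

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, TypeVar

Row = TypeVar("Row")


@dataclass
class Point:
    x: int
    y: int


class Cursor(Generic[Row]):
    """Toy cursor: holds raw value tuples and a row-construction callable."""

    def __init__(
        self,
        make_row: Callable[[Sequence[object]], Row],
        raw_rows: List[Sequence[object]],
    ) -> None:
        self._make_row = make_row
        self._raw_rows = raw_rows

    def fetchall(self) -> List[Row]:
        # A type checker now knows the element type is Row (e.g. Point),
        # so accessing a nonexistent attribute on a result is a type error.
        return [self._make_row(values) for values in self._raw_rows]


cur: Cursor[Point] = Cursor(lambda v: Point(*v), [(1, 2), (3, 4)])
points = cur.fetchall()
print(points)
```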
And sorry for taking so long to reply here. I was hoping I'd find time to try making our Connection and Cursor classes generic, and never managed to get around to it.