Why does iterrowmapmany convert each row to a Record instance?
jossefaz opened this issue · 2 comments
In this method:
Line 309 in 0be2735
each row is converted to a Record instance:
Line 314 in 0be2735
In my use case, my "rowgenerator" helper function needs a named tuple as input, not a plain row. Calling a named attribute is far more convenient than the unclear row[5] "index notation".
For that purpose I tried to use rowmapmany in this way:
etl.rowmapmany(etl.namedtuples(my_table), rowgenerator=mapper, header=headers)
I thought that using namedtuples would solve my issue: my rows have more than 100 columns, so indexes such as row[57] are hard to work with, whereas a named tuple would simply give me the convenience of row.my_target_attribute.
But because of this conversion to a Record instance, each namedtuple in the input is converted to a plain list of values. This is a bit frustrating, since it forces us to use index notation in the mapper function, which is very hard to read.
When I remove this line:
Line 314 in 0be2735
it works like a charm.
Why is this Record conversion important?
If it is not, could we remove it from the iterrowmapmany method?
Please help 🙏
Another reason not to convert to a Record: using a namedtuple as input for the row mapper frees us from any binding to column order, because the mapper accesses properties by name and not by position.
So no matter what the order of the fields in the input source is, the mapper will work as expected, even if the field order changes between two inputs that share the same output target.
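To make the order-independence point concrete, here is a minimal sketch using only the standard library's `collections.namedtuple`. The names `rows_as_namedtuples`, `mapper`, `table_a` and `table_b` are hypothetical and for illustration only; they are not part of petl.

```python
from collections import namedtuple


def rows_as_namedtuples(table):
    # Hypothetical helper: the first row is the header, the rest are data rows.
    header, *data = table
    Row = namedtuple('Row', header)
    for values in data:
        yield Row(*values)


def mapper(row):
    # Fields are read by name, so their position in the source does not matter.
    yield [row.id, 'age_months', row.age * 12]


# Same data, different column order.
table_a = [['id', 'sex', 'age'], [1, 'male', 16]]
table_b = [['age', 'id', 'sex'], [16, 1, 'male']]

out_a = [out for row in rows_as_namedtuples(table_a) for out in mapper(row)]
out_b = [out for row in rows_as_namedtuples(table_b) for out in mapper(row)]
```

Both `out_a` and `out_b` contain the same single output row, even though the two sources list their fields in a different order.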
https://petl.readthedocs.io/en/latest/util.html#petl.util.base.records
"a record is a hybrid object supporting all possible ways of accessing values."
The examples for rowmapmany demonstrate this:
https://petl.readthedocs.io/en/latest/transform.html#petl.transform.maps.rowmapmany
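```python
import petl as etl

# table1 is assumed to be the example table from the linked docs,
# with the fields id, sex, age, height and weight.
def rowgenerator(row):
    transmf = {'male': 'M', 'female': 'F'}
    # Each row is a Record: index, key and attribute access all work.
    yield [row[0], 'gender',
           transmf[row['sex']] if row['sex'] in transmf else None]
    yield [row[0], 'age_months', row.age * 12]
    yield [row[0], 'bmi', row.height / row.weight ** 2]

table2 = etl.rowmapmany(table1, rowgenerator,
                        header=['subject_id', 'variable', 'value'])
```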
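As a rough illustration of what "hybrid" access means, here is a simplified stand-in written with the standard library only. It is not petl's actual `Record` implementation; the class name `HybridRecord` and its internals are assumptions made for this sketch.

```python
class HybridRecord(tuple):
    """Simplified, hypothetical stand-in for a hybrid record (not petl's code)."""

    def __new__(cls, values, fields):
        obj = super().__new__(cls, values)
        obj._fields = tuple(fields)
        return obj

    def __getitem__(self, key):
        # A string key is looked up by field name; anything else by position.
        if isinstance(key, str):
            return super().__getitem__(self._fields.index(key))
        return super().__getitem__(key)

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        if name.startswith('_'):
            raise AttributeError(name)
        try:
            return self[self._fields.index(name)]
        except ValueError:
            raise AttributeError(name)


rec = HybridRecord([1, 'male', 16], ['id', 'sex', 'age'])
by_index = rec[0]      # positional access
by_key = rec['sex']    # access by field name as a key
by_attr = rec.age      # access by field name as an attribute
```

With an object like this, a mapper can use `row.my_target_attribute` directly, without wrapping the table in `namedtuples` first.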