petl todataframe() runs the lambda function in addfield multiple times
ikanashov opened this issue · 1 comments
ikanashov commented
Minimal, reproducible code sample, a copy-pastable example if possible
import petl
z = 0
def tost(sql):
    global z
    z += 1
    print('z=', z)
    return str(z)
table = [['md5', 'sql'], [1, 'select from *'], [2, 'select from tt'], [3, 'select from ddd']]
>>> petl.wrap(table)
+-----+-------------------+
| md5 | sql               |
+=====+===================+
|   1 | 'select from *'   |
+-----+-------------------+
|   2 | 'select from tt'  |
+-----+-------------------+
|   3 | 'select from ddd' |
+-----+-------------------+
>>> petl.wrap(table).addfield('tables', lambda row: tost(row['sql']))
z= 1
z= 2
z= 3
+-----+-------------------+--------+
| md5 | sql               | tables |
+=====+===================+========+
|   1 | 'select from *'   | '1'    |
+-----+-------------------+--------+
|   2 | 'select from tt'  | '2'    |
+-----+-------------------+--------+
|   3 | 'select from ddd' | '3'    |
+-----+-------------------+--------+
z = 0
>>> petl.wrap(table).addfield('tables', lambda row: tost(row['sql'])).todataframe()
z= 1
z= 2
z= 3
z= 4
z= 5
z= 6
z= 7
z= 8
z= 9
   md5              sql tables
0    1    select from *      7
1    2   select from tt      8
2    3  select from ddd      9
z = 0
>>> petl.wrap(table).addfield('tables', lambda row: tost(row['sql'])).tupleoftuples()
z= 1
z= 2
z= 3
(('md5', 'sql', 'tables'), (1, 'select from *', '1'), (2, 'select from tt', '2'), (3, 'select from ddd', '3'))
Problem description
When converting a petl.Table to a pandas.DataFrame with todataframe(), the lambda function passed to addfield runs three times for every row instead of once.
Version and installation information
- petl version: 1.7.4
- Version of Python interpreter: 3.9.9
- Operating system: Linux
- How petl was installed: poetry, in a Docker container
dnicolodi commented
The issue is caused by the implementation of todataframe() calling list() on the table. The list constructor in turn calls __len__() (twice, the second time indirectly through __length_hint__()), and the implementation of __len__() for petl objects iterates over the whole table to get its length. Each of those passes re-evaluates the addfield function, which is why it runs three times per row.
The issue can be solved by not calling list() from todataframe(), or by calling list(iter(table)) instead. I'll prepare a PR later.
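For illustration, the mechanism described above can be reproduced without petl or pandas. The sketch below uses a hypothetical LazyTable class (a stand-in written for this example, not petl's actual implementation) whose __len__() counts rows by iterating, and compares list(table) with list(iter(table)). The exact number of extra passes may depend on the Python interpreter version, so the example only reports the counts rather than assuming a fixed value.

```python
class LazyTable:
    """Hypothetical stand-in for a lazy table; NOT petl's real class."""

    def __init__(self, rows, func):
        self.rows = rows
        self.func = func
        self.passes = 0  # number of full iterations that were started

    def __iter__(self):
        # Every pass re-applies func to every row, as a lazy addfield would.
        self.passes += 1
        for row in self.rows:
            yield row + (self.func(row),)

    def __len__(self):
        # Assumption mirroring the comment above: length is found by iterating.
        return sum(1 for _ in self)


rows = [(1, 'select from *'), (2, 'select from tt'), (3, 'select from ddd')]

# list() on the object itself consults __len__() before consuming the iterator.
direct = LazyTable(rows, lambda row: row[1].upper())
list(direct)
passes_direct = direct.passes

# iter() returns a plain generator, which has no __len__(), so only one pass runs.
wrapped = LazyTable(rows, lambda row: row[1].upper())
result = list(iter(wrapped))
passes_wrapped = wrapped.passes

print('passes with list(table):      ', passes_direct)
print('passes with list(iter(table)):', passes_wrapped)
```

With list(iter(table)) the function is evaluated once per row, which matters whenever the addfield function is expensive or has side effects, as tost() does in the report.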