zopefoundation/DocumentTemplate

`dtml-in` batching accesses the complete list

Closed this issue · 6 comments

DocumentTemplate 4.0

"zopefoundation/Products.ZCatalog#155" reports that Products.ZCatalog's manage_catalogView takes excessive times for large catalogs.
"zopefoundation/Products.ZCatalog#155 (comment)" contains corresponding profile results. They suggest that dtml-in accesses each element in the list even when called with batching parameters.

The behavior is caused by the list conversion in "

sequence = list(sequence)
".
It is there to support iterators in addition to true sequences but of course it destroys lazy access to a sequence (highly wished for batching).

I do not yet know how to distinguish safely between a true sequence (not requiring a list conversion) and something requiring a list conversion (e.g. a general iterator).

@sauzher found out that the list conversion has been introduced by @hannosch in "5716eb8". Unfortunately, the prevents lazy batching.

For what I see, there are a lot of places where sequence gets unpacked: to-list conversions and key-lookup access:

List conversions:

Key lookups:

Key lookups I think are unnecessary because all the try-execeps blocks could be rewritten as simple if-then-else checks on start/end values.

List conversions perhaps could be refactored to convert in list only the firsts "batch_size * Page number" elements.

my 2 cent

I found out that dtml-in (as it is now) does not support arbitrary iterators:

uses not sequence and this will be False for general iterators even if the iterator does not give any values. The list conversion comes too late to remedy this.

The original implementation tried hard to avoid the computation of the sequence length (by e.g. using try: sequence[end] except: to check whether there is an element at end) because it wanted support for lazy batching and for some lazy sequences length computation is expensive. Of course, converting the sequence into a list destroys this aim (at it computes the length as a side effect).

I propose to retain the support for lazy batching as follows:
We check whether the sequence has __len__ and __getitem__ but lacks keys; this first indicates that it may be designed as a Sequence, the second that it likely is not designed as a Mapping. We use the sequence as is if the check succeeds; otherwise we replace it by its list conversion.
@hannosch What do you think?