zopefoundation/ZODB

Savepoint commit logging is misleading (and potentially expensive)

jamadden opened this issue · 3 comments

When savepoints are in use, ZODB.Connection logs something like this (notice that's 80GB):

DEBUG [ZODB.Connection]] Committing savepoints of size 86970105856

ZODB/src/ZODB/Connection.py

Lines 1026 to 1033 in 2f8cc67

def _commit_savepoint(self, transaction):
"""Commit all changes made in savepoints and begin 2-phase commit
"""
src = self._savepoint_storage
self._storage = self._normal_storage
self._savepoint_storage = None
try:
self._log.debug("Committing savepoints of size %s", src.getSize())

Where src is a TmpStore. TmpStore copies the getSize() method from the underlying original storage, so what's being logged isn't the size of the savepoint, it's the size of the entire database.

ZODB/src/ZODB/Connection.py

Lines 1131 to 1141 in 2f8cc67

class TmpStore(object):
"""A storage-like thing to support savepoints."""
def __init__(self, storage):
self._storage = storage
for method in (
'getName', 'new_oid', 'getSize', 'sortKey',
'isReadOnly'
):
setattr(self, method, getattr(storage, method))

On RelStorage, in order to provide a consistent, up-to-date answer (and be usable no matter what else the state of the storage is) this always opens a new connection to the database, which can be slow.

https://github.com/zodb/relstorage/blob/2c8997d6c862de7e6a4d9f2a8d368a5471e93ff7/src/relstorage/adapters/postgresql/stats.py#L45-L50

Perhaps what we want to log is len(src)? That's the number of entries in the TmpStore.

Why not src.position ?

BTW, IStorage defines both __len__ & getSize so it's not consistent that TmpStore redirects only 1 of them to the underlying storage.

src.position includes extra header data for each object and I think in the event of multiple savepoints that write the same OID more than once it also includes duplicates, but I haven't looked that closely. Either way seems like an improvement.