Unit and integration tests for the vacuum command
rtyler opened this issue · 2 comments
rtyler commented
Our preliminary support for vacuum would benefit from some more unit and integration tests to ensure that it's only deleting the right files whenever the vacuum is invoked.
danmx commented
Referencing @mrk-its comment from #97 so it's not lost.
@rtyler @fvaleye It looks like there are still two serious issues with the vacuum implementation:
* vacuum lists all files in the dataset using `StorageBackend.list_objs`. The problem is that this function returns all files (including those in subdirectories) on the `s3` and `gcs` backends (although I'm not sure about gcs), while on the `file` and `azure` backends it lists only first-level files (without recursing into subdirectories).
* vacuum ignores files that are not referenced by the delta log at all (that is, files that appear in neither the `DeltaTableState.files()` nor the `DeltaTableState.all_tombstones()` list).
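
To make the two points above concrete, here is a minimal, self-contained Python sketch of the kind of test they suggest. It is not the delta-rs implementation: `list_first_level`, `list_recursive`, `untracked_files`, and `demo` are hypothetical helpers, the table layout is made up, and plain sets of relative paths stand in for what `DeltaTableState.files()` and `DeltaTableState.all_tombstones()` would return.

```python
import tempfile
from pathlib import Path


def list_first_level(root):
    """Hypothetical non-recursive listing: only files directly under root."""
    return {p.name for p in root.iterdir() if p.is_file()}


def list_recursive(root):
    """Hypothetical recursive listing: every file under root, relative to root."""
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def untracked_files(all_files, active, tombstones):
    """Files on storage that the log neither references nor has tombstoned."""
    return {
        f
        for f in all_files
        if f not in active
        and f not in tombstones
        and not f.startswith("_delta_log/")  # never treat the log itself as data
    }


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        # Made-up table layout: one log entry, one partition directory.
        (root / "_delta_log").mkdir()
        (root / "_delta_log" / "00000000000000000000.json").write_text("{}")
        (root / "year=2021").mkdir()
        for name in (
            "part-0.parquet",
            "year=2021/part-1.parquet",
            "year=2021/orphan.parquet",
        ):
            (root / name).write_text("")

        flat = list_first_level(root)
        deep = list_recursive(root)
        # Stand-ins for the active-file and tombstone lists of the table state.
        orphans = untracked_files(
            deep,
            active={"part-0.parquet"},
            tombstones={"year=2021/part-1.parquet"},
        )
        return flat, deep, orphans


if __name__ == "__main__":
    flat, deep, orphans = demo()
    print(sorted(flat))
    print(sorted(deep))
    print(sorted(orphans))
```

In this sketch the first-level listing never sees anything under `year=2021/`, so a vacuum built on it could not consider partitioned files at all, and the orphaned file only shows up when the storage listing is compared against both the active and tombstoned sets.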
ion-elgreco commented
This should be well tested at this point. With the introduction of VACUUM START and VACUUM END, I also added a couple of tests on the Python side.