delta-io/delta

[Feature Request] Support returning RemoveFile actions from Snapshot.scan()

horizonzy opened this issue · 7 comments

Feature request

I want to read the RemoveFile actions (tombstones) so that I can delete the corresponding physical files.

Hi @horizonzy - are you referring to https://github.com/delta-io/connectors/blob/master/standalone/src/main/java/io/delta/standalone/Snapshot.java#L39 ?

There is no Snapshot::scan method that I can see in this delta-io/delta repo.

It seems like you want to perform a VACUUM-like command while using delta-standalone?

Just curious: Are you using any higher-level connector (e.g. Flink, Presto) or raw delta-standalone?

What system will you use to perform the deletes?

> It seems like you want to perform a VACUUM-like command while using delta-standalone?

Yes.

> Just curious: Are you using any higher-level connector (e.g. Flink, Presto) or raw delta-standalone?

raw delta-standalone

> What system will you use to perform the deletes?

Various kinds of cloud storage, like S3, Azure...

@horizonzy thanks for your response. I think there are two decisions we would need to make here:

a) Exactly what this API should look like, i.e. should it return all RemoveFiles, or only those that have expired?

But more importantly:

b) Who should be responsible for doing this VACUUM? I.e. should delta-standalone be the one to do it, rather than you, the user/connector?
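To make option (a) concrete, here is a minimal sketch of the two API shapes in plain Python. It does not use any delta-standalone API: the `RemoveFile` class, the function names, and the choice to treat a missing deletion timestamp as 0 (and therefore expired) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RemoveFile:
    """Hypothetical stand-in for a RemoveFile action (a tombstone)."""
    path: str
    deletion_timestamp_ms: Optional[int]  # when the file was logically removed


def all_tombstones(actions: Iterable[RemoveFile]) -> List[RemoveFile]:
    """Shape 1: return every tombstone and let the caller decide what is expired."""
    return list(actions)


def expired_tombstones(
    actions: Iterable[RemoveFile], now_ms: int, retention_ms: int
) -> List[RemoveFile]:
    """Shape 2: return only tombstones older than the retention period."""
    cutoff = now_ms - retention_ms
    # Assumption of this sketch: a missing timestamp counts as 0, i.e. expired.
    return [a for a in actions if (a.deletion_timestamp_ms or 0) < cutoff]
```

With the first shape the caller has to know the retention period; with the second, the library applies it on the caller's behalf.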

VACUUM is more than "remove all the expired tombstones". As you can see in delta-core VacuumCommand.scala, it also does a recursive listing of the delta table, and THEN compares it to the tombstones and addFiles in the latest table state.
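The comparison described above can be modelled roughly as follows. This is a simplified sketch, not the code in VacuumCommand.scala: the function name, the dict-based directory listing, and the rule of skipping only `_delta_log/` are assumptions, and the real command handles more cases.

```python
from typing import Dict, Iterable, Set


def files_to_vacuum(
    listed_files: Dict[str, int],
    add_paths: Iterable[str],
    unexpired_tombstone_paths: Iterable[str],
    now_ms: int,
    retention_ms: int,
) -> Set[str]:
    """Pick files that are safe to delete.

    listed_files maps each relative path found by a recursive listing of the
    table directory to its last-modified time in milliseconds.
    """
    cutoff = now_ms - retention_ms
    # Files the latest table state still needs: live data files plus files
    # whose tombstones have not yet passed the retention period.
    still_needed = set(add_paths) | set(unexpired_tombstone_paths)
    return {
        path
        for path, modified_ms in listed_files.items()
        if path not in still_needed
        and modified_ms < cutoff                 # old enough to delete
        and not path.startswith("_delta_log/")   # never touch the log itself
    }
```

Because the result comes from the listing rather than from the tombstones, it also picks up old files that no action in the log refers to, which a tombstone-only API would miss.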

Would you rather we add a vacuum API to delta-standalone?

> Would you rather we add a vacuum API to delta-standalone?

That would be the best way. We could then trigger vacuum in delta-standalone to delete both the files that are no longer referenced by the table and the files behind expired tombstones.

Any progress?

Hi @horizonzy, thanks for following up on this. Sorry, I was sick and then on vacation the past two weeks.

I've created this issue in the delta-connectors repository: delta-io/connectors#484.

Would you be interested in implementing this feature? You can use the VACUUM implementation in this delta-core repo as an example.

Closing due to inactivity, and also because I have created delta-io/connectors#484