a workflow to remove a repository after ingestion?
gasche opened this issue · 1 comments
It has happened to me several times now that I ingest a large set of repositories, I look at the data, and I notice oddities caused by a repository that should not have been there in the first place.
Is there a workflow to remove a repository from the database, and rerun the plotting?
Currently I don't know of such a workflow, so I manually remove the repository, delete the database, and restart ingestion from scratch. This is ok, but it can be annoying when ingestion is slow (several minutes on large repository sets).
I thought about running `sqlite` on the database and doing a `DELETE` operation on all `raw_commits` coming from this directory. However, if I understand correctly, the plotting data comes from the `authors` table, which I would need to update with new aggregates, and I don't know how to do that easily.
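For illustration, the manual `DELETE` could look roughly like the sketch below. It is only a sketch under assumptions: the thread does not show the schema, so the `repo_name` column on `raw_commits` is hypothetical, as is the helper name `remove_repo`. Check the real column names (for example with `.schema raw_commits` in the `sqlite3` shell) before running anything like this on a real database.

```python
import sqlite3


def remove_repo(conn: sqlite3.Connection, repo_name: str) -> int:
    """Delete all raw_commits rows for one repository; return the row count.

    ASSUMPTION: raw_commits has a text column called `repo_name` that
    identifies the source repository. The actual schema may differ.
    """
    # `with conn` commits on success and rolls back if the DELETE raises.
    with conn:
        cur = conn.execute(
            "DELETE FROM raw_commits WHERE repo_name = ?",
            (repo_name,),
        )
    return cur.rowcount
```

Usage would be something like `remove_repo(sqlite3.connect("foo.db"), "repo.git")`, working on a copy of the database first.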
Assuming this does not currently exist, my proposal would be to have a command `fornalder reanalyze foo.db` that would drop the current `authors` table and recompute it from the `raw_commits` table as it currently exists.
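To make the proposal concrete, here is a minimal sketch of what such a drop-and-recompute step might do. Everything about the schema is an assumption: the columns `author_name` and `author_time` and the aggregates stored in `authors` are hypothetical stand-ins, not fornalder's actual tables, and the function name `reanalyze` just mirrors the proposed command.

```python
import sqlite3


def reanalyze(conn: sqlite3.Connection) -> None:
    """Drop the derived authors table and rebuild it from raw_commits.

    ASSUMPTION: raw_commits has `author_name` and `author_time` columns,
    and authors holds simple per-author aggregates. Both are hypothetical.
    """
    conn.execute("DROP TABLE IF EXISTS authors")
    conn.execute(
        """
        CREATE TABLE authors AS
        SELECT author_name,
               MIN(author_time) AS first_time,
               MAX(author_time) AS last_time,
               COUNT(*)         AS commits
        FROM raw_commits
        GROUP BY author_name
        """
    )
    conn.commit()
```

Because the derived table is rebuilt from scratch, running this after any manual edit of `raw_commits` leaves `authors` consistent with whatever rows remain.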
(Another option, of course, would be to have a `fornalder repo-remove foo.db repo.git` command that removes a repository from a table, instead of adding it as `fornalder ingest foo.db repo.git` does. But that sounds like more work.)
The `authors` table gets derived from `raw_commits` every run, so it should be safe to poke around in the latter. See:
Lines 204 to 233 in 43f3d48
I intended to re-run `postprocess()` only if something changed (e.g. store a hash of the meta file provided, clear a flag whenever a `fornalder` command like `ingest` changes the database), but it wasn't too slow in practice, so I didn't feel the need to optimize it, at least not yet. I left a reminder here:
Line 217 in 43f3d48
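For reference, the "only re-run if something changed" idea could be sketched as below. This is not how fornalder works today. The `state` table, its keys, and the helper names are all hypothetical, and the flag is modelled here as a "dirty" marker that modifying commands set, which is one possible reading of the flag described above.

```python
import hashlib
import sqlite3


def _ensure_state(conn: sqlite3.Connection) -> None:
    # Hypothetical key/value table holding the bookkeeping for this sketch.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS state (key TEXT PRIMARY KEY, value TEXT)"
    )


def _get(conn: sqlite3.Connection, key: str):
    row = conn.execute("SELECT value FROM state WHERE key = ?", (key,)).fetchone()
    return None if row is None else row[0]


def _set(conn: sqlite3.Connection, key: str, value: str) -> None:
    with conn:
        conn.execute("INSERT OR REPLACE INTO state VALUES (?, ?)", (key, value))


def mark_dirty(conn: sqlite3.Connection) -> None:
    """Call from any command that modifies the database (e.g. ingest)."""
    _ensure_state(conn)
    _set(conn, "dirty", "1")


def mark_clean(conn: sqlite3.Connection, meta_bytes: bytes) -> None:
    """Call after postprocessing succeeds with the given meta file contents."""
    _ensure_state(conn)
    _set(conn, "meta_hash", hashlib.sha256(meta_bytes).hexdigest())
    _set(conn, "dirty", "0")


def needs_postprocess(conn: sqlite3.Connection, meta_bytes: bytes) -> bool:
    """True if the data changed, the meta file changed, or nothing is recorded."""
    _ensure_state(conn)
    digest = hashlib.sha256(meta_bytes).hexdigest()
    return _get(conn, "dirty") != "0" or _get(conn, "meta_hash") != digest
```

One caveat with a scheme like this: hand edits to `raw_commits` made outside the tool would not set the flag, so unconditional re-derivation is the safer default while manual editing is the recommended workflow.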
Anyway, the bottom line is that manually editing `raw_commits` is safe, for now.
I like the idea of having a CLI for common database editing (like removing a repo, or maybe a date range). Let's keep this issue open for `repo-remove` (or `remove-repo`?).