CyberShadow/btdu

Feature request: write file paths chosen from UI into a log

pwaller opened this issue · 23 comments

Hi!

First, this is an excellent implementation of an excellent concept. It has made browsing my old archive disks so much easier and more efficient, so thanks!

I note you have a feature to delete things on the fly from the UI. That's nice, however, I would rather batch up my deletes, and have a log of what I have deleted.

To that end, it would be really useful to have a key which 'marks' a file and records those paths into a log, so that they may be filtered and deleted through other tooling.

Another issue here is shared extents - what to do if some extents belong to multiple snapshots or reflink copies. It would be quite useful to me to know if there are files which need to be deleted together to liberate the space relating to them in practice.

Thanks for considering the above, all the best.

Hi & thank you,

This seems like a rather specific feature request, so I think it would make sense to collect more information about potential use cases and demand before going forward with an implementation.

For now I suggest patching this feature into your copy of btdu. You could try something like the following patch:

diff --git a/source/btdu/browser.d b/source/btdu/browser.d
index 730fecf..56f8a3b 100644
--- a/source/btdu/browser.d
+++ b/source/btdu/browser.d
@@ -1082,6 +1082,15 @@ struct Browser
 						}
 						mode = Mode.deleteConfirm;
 						break;
+					case 'l':
+						if (selection)
+						{
+							import std.stdio : File;
+							import std.path : expandTilde;
+							File("~/btdu-log.txt".expandTilde, "ab").writeln(getFullPath(selection));
+							showMessage("Logged selected item.");
+						}
+						break;
 					default:
 						// TODO: show message
 						break;

Makes sense, thanks for the patch! (Feel free to close this issue at any point for any reason, I won't be offended).

Some thoughts on this topic:

  • We can add the ability to mark or unmark nodes in the browser UI. The mark persists when traveling to other nodes.
  • btdu could show aggregated data about all marked nodes, such as shared usage, or combined unique usage (how much free space would be freed by deleting the marked nodes). Need to check if we can perform these calculations with the data we have on hand.
  • Deletion would act on the marked nodes if there are any.
  • We have an --export feature which saves a JSON file of collected data. We could bind this to a key in the UI.
  • The exported data would gain a field which would indicate if the node is marked or not.
  • A jq script could then be used to convert the exported JSON into a list of paths, like btdu-log.txt above.

There is some work-in-progress on the above here:

https://github.com/CyberShadow/btdu/commits/marks

@pwaller Hi, I realize this is a year late but if you're still interested could you please try https://github.com/CyberShadow/btdu/commits/marks (builds here). Workflow is as follows:

  • Run btdu as normal (though --expert mode recommended)
  • Mark some items with the space bar
  • Press M to view all marked items
  • Press O to export to a JSON file (marked nodes will now have a "mark": false or "mark": true field)
  • Or press D Y to delete all marked items (maybe after reviewing them in M).
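
As a rough illustration of turning such an export into a list of paths (the role the jq script was meant to play), here is a minimal Python sketch. The only thing taken from this thread is the boolean "mark" field; the "name" and "children" keys and the nesting are assumptions made for illustration and may not match the real export format, so adjust them to what your file actually contains.

```python
import json

def collect_marks(node, parent_path=""):
    """Walk a nested node tree and return (path, mark) for every node
    that carries an explicit "mark" field.

    Assumed (hypothetical) schema: each node is an object with a "name"
    string and an optional "children" list. Only "mark" is from the thread.
    """
    name = node.get("name", "")
    path = f"{parent_path}/{name}" if name else parent_path
    found = []
    if "mark" in node:
        found.append((path or "/", node["mark"]))
    for child in node.get("children", []):
        found.extend(collect_marks(child, path))
    return found

# Tiny made-up document standing in for an exported file.
sample = json.loads("""
{
  "name": "",
  "children": [
    {"name": "old-backups", "mark": true,
     "children": [{"name": "keep-me", "mark": false}]},
    {"name": "home"}
  ]
}
""")

marks = collect_marks(sample)
for path, marked in marks:
    # "mark": false entries are exclusions under a marked ancestor.
    print(("marked   " if marked else "unmarked ") + path)
```

Entries with "mark": false are kept in the output on purpose, since a deletion script built on top of this would need to skip them.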

This looks really neat.

I did get a segfault, running 6dfac85.

$ time sudo btdu --min-resolution=10M --headless -o test.btdu /mnt/luksroot-2305/
Collected 237678 samples (achieving a resolution of ~10.000 MiB) in 5 secs, 116 ms, 201 μs, and 6 hnsecs.
Exporting results...
Exported results to: test.btdu
Segmentation fault

I tried attaching GDB, but that ran for 2 minutes and then ran out of memory before even starting btdu, so I'm not sure how to get a stack trace for the segfault.

Regarding the marks, bugs:

  • For a moment it said +- nan GiB in the title where it would show the total amount marked.
  • The total on that same display seems to be bogus. I marked one directory and then unmarked a child directory, and the resulting value was much larger than my disk.

Features wanted:

  • It would be nice to sort the marks by size. Though I can see how additive and subtractive marks could complicate things.
  • It would be nice to know the total 'unmarked' amount, so I know how much I have left to go through which isn't accounted for.

Thanks!

I tried attaching GDB, but that ran for 2 minutes and then ran out of memory before even starting btdu, so I'm not sure how to get a stack trace for the segfault.

You can avoid this by running gdb with -iex 'set demangle-style none'. (The new UI is template-heavy and causes the demangled form of symbols to be very long.)

Aha, backtrace for you:

0x00000000004bfe8c in _D4btdu2ui7browser7Browser11__fieldDtorMFZv ()
(gdb) bt
#0  0x00000000004bfe8c in _D4btdu2ui7browser7Browser11__fieldDtorMFZv ()
#1  0x0000000000430416 in _D4btdu4main7programFS2ae5utils6funopt__T11_OptionImplVEQBiQBiQBf10OptionTypei2TAyaVQea45_5061746820746f2074686520726f6f74206f66207468652066696c6573797374656d20746f20616e616c797a65Vai0VQEbnVQEgnZQFySQGvQGvQGs__TQGoVQGei1TkVQFna84_4e756d626572206f662073616d706c696e672073756270726f6365737365730a202864656661756c74206973206e756d626572206f66206c6f676963616c204350557320666f7220746869732073797374656d29Vai106VQMna1_4eVQMwnZQOoSQPlQPlQPi__TQPeVQOui1TkVQOda34_52616e646f6d2073656564207573656420746f2063686f6f73652073616d706c6573Vai0VQRfnVQRknZQTcSQTzQTzQTw__TQTsVQTii0TbVQSra12_68696464656e4f7074696f6eVai0VQUbnVQUgnZQVySQWvQWvQWs__TQWoVQWei0TbVQVna44_4d65617375726520706879736963616c2073706163652028696e7374656164206f66206c6f676963616c292eVai112VQZlnVQZqnZQBBiSQBCgQBChQBCf__TQBCcVQBBti0TbVQBBda67_457870657274206d6f64653a20636f6c6c65637420616e642073686f77206164646974696f6e616c206d6574726963732e0a55736573206d6f7265206d656d6f72792eVai120VQBGwnVQBHcnZQBIvQPuSQBJwQBJxQBJv__TQBJsVQBJji1TQBIrVQBIwa66_536574205549207265667265736820696e74657276616c2e0a53706563696679203020746f2072656672657368206173206661737420617320706f737369626c652eVai105VQBOna8_4455524154494f4eVQBPla8_696e74657276616cZQBRwSQBSuQBSvQBSt__TQBSqVQBShi0TbVQBRra44_52756e20776974686f7574206c61756e6368696e672074686520726573756c742062726f777365722055492eVai0VQBVonVQBVunZQBXnSQBYlQBYmQBYk__TQBYhVQBXyi1TmVQBXia32_53746f7020616674657220636f6c6c656374696e67204e2073616d706c65732eVai110VQCAja1_4eVQCAtnZQCCmSQCDkQCDlQCDj__TQCDgVQCCxi1TQCCfVQCCka37_53746f702061667465722072756e6e696e6720666f722074686973206475726174696f6e2eVai0VQCFta8_4455524154494f4eVQCGrnZQCIkSQCJiQCJjQCJh__TQCJeVQCIvi1TQCIdVQCIia58_53746f7020616674657220616368696576696e672074686973207265736f6c7574696f6e2028652e672e2022314d4222206f722022312522292eVai0VQCNha4_53495a45VQCNxnZQCPqQBWpSQCQsQCQtQCQr__TQCQoVQCQfi1TQCPnVQCPsa56_4f6e20657869742c206578706f72742074686520636f6c6c656374656420726573756c747320746f2074686520676976656e2066696c652eVai111VQCUpa4_
50415448VQCVfa6_6578706f7274ZQCXmSQCYkQCYlQCYj__TQCYgVQCXxi0TbVQCXha77_4f6e20657869742c206578706f727420726570726573656e7465642073697a6520657374696d6174657320696e202764752720666f726d617420746f207374616e64617264206f75747075742eVai0VQDDsnVQDDynZQDFrSQDGpQDGqQDGo__TQDGlVQDGci0TbVQDFma105_496e7374656164206f6620616e616c797a696e6720612062747266732066696c6573797374656d2c20726561642070726576696f75736c7920636f6c6c656374656420726573756c74732073617665642077697468202d2d6578706f72742066726f6d20504154482eVai102VQDOenVQDOka6_696d706f7274ZQDQrZv ()
#2  0x000000000042f570 in _D2ae5utils6funopt__TQkS_D4btdu4main7programFSQBsQBsQBp__T11_OptionImplVEQCtQCtQCq10OptionTypei2TAyaVQea45_5061746820746f2074686520726f6f74206f66207468652066696c6573797374656d20746f20616e616c797a65Vai0VQEbnVQEgnZQFySQIgQIgQId__TQGoVQGei1TkVQFna84_4e756d626572206f662073616d706c696e672073756270726f6365737365730a202864656661756c74206973206e756d626572206f66206c6f676963616c204350557320666f7220746869732073797374656d29Vai106VQMna1_4eVQMwnZQOoSQQwQQwQQt__TQPeVQOui1TkVQOda34_52616e646f6d2073656564207573656420746f2063686f6f73652073616d706c6573Vai0VQRfnVQRknZQTcSQVkQVkQVh__TQTsVQTii0TbVQSra12_68696464656e4f7074696f6eVai0VQUbnVQUgnZQVySQYgQYgQYd__TQWoVQWei0TbVQVna44_4d65617375726520706879736963616c2073706163652028696e7374656164206f66206c6f676963616c292eVai112VQZlnVQZqnZQBBiSQBDrQBDsQBDq__TQBCcVQBBti0TbVQBBda67_457870657274206d6f64653a20636f6c6c65637420616e642073686f77206164646974696f6e616c206d6574726963732e0a55736573206d6f7265206d656d6f72792eVai120VQBGwnVQBHcnZQBIvQPuSQBLhQBLiQBLg__TQBJsVQBJji1TQBIrVQBIwa66_536574205549207265667265736820696e74657276616c2e0a53706563696679203020746f2072656672657368206173206661737420617320706f737369626c652eVai105VQBOna8_4455524154494f4eVQBPla8_696e74657276616cZQBRwSQBUfQBUgQBUe__TQBSqVQBShi0TbVQBRra44_52756e20776974686f7574206c61756e6368696e672074686520726573756c742062726f777365722055492eVai0VQBVonVQBVunZQBXnSQBZwQBZxQBZv__TQBYhVQBXyi1TmVQBXia32_53746f7020616674657220636f6c6c656374696e67204e2073616d706c65732eVai110VQCAja1_4eVQCAtnZQCCmSQCEvQCEwQCEu__TQCDgVQCCxi1TQCCfVQCCka37_53746f702061667465722072756e6e696e6720666f722074686973206475726174696f6e2eVai0VQCFta8_4455524154494f4eVQCGrnZQCIkSQCKtQCKuQCKs__TQCJeVQCIvi1TQCIdVQCIia58_53746f7020616674657220616368696576696e672074686973207265736f6c7574696f6e2028652e672e2022314d4222206f722022312522292eVai0VQCNha4_53495a45VQCNxnZQCPqQBWpSQCSdQCSeQCSc__TQCQoVQCQfi1TQCPnVQCPsa56_4f6e20657869742c206578706f72742074686520636f6c6c656374656420726573756c747320746f2074686520676976656e2066696c6
52eVai111VQCUpa4_50415448VQCVfa6_6578706f7274ZQCXmSQCZvQCZwQCZu__TQCYgVQCXxi0TbVQCXha77_4f6e20657869742c206578706f727420726570726573656e7465642073697a6520657374696d6174657320696e202764752720666f726d617420746f207374616e64617264206f75747075742eVai0VQDDsnVQDDynZQDFrSQDIaQDIbQDHz__TQDGlVQDGci0TbVQDFma105_496e7374656164206f6620616e616c797a696e6720612062747266732066696c6573797374656d2c20726561642070726576696f75736c7920636f6c6c656374656420726573756c74732073617665642077697468202d2d6578706f72742066726f6d20504154482eVai102VQDOenVQDOka6_696d706f7274ZQDQrZvVSQDTdQDTeQDTc12FunOptConfigS1nS_DQDTlQDTk8usageFunFQDRkZvZQDUzFAQDRxZv ()
#3  0x00000000004dc17c in _Dmain ()
#4  0x00000000005e6f5c in _D2rt6dmain212_d_run_main2UAAamPUQgZiZ6runAllMFZv ()

Thanks, should be fixed in d4c4d8b.

That was what, 10s? Fastest bugfix in recorded history ;-)

Confirmed no more segfault.

  • For a moment it said +- nan GiB in the title where it would show the total amount marked.

  • The total on that same display seems to be bogus. I marked one directory and then unmarked a child directory, and the resulting value was much larger than my disk.

Should be fixed, thanks for testing.
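
For reference, the arithmetic one would expect in the "mark a directory, then unmark a child" case can be sketched as below. This is not btdu's code: the Node type, the tri-state mark (None meaning "inherit from parent"), and the byte counts are all hypothetical, and shared extents are ignored entirely.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    name: str
    own_size: int = 0             # size attributed directly to this node
    mark: Optional[bool] = None   # None = inherit the parent's state
    children: list["Node"] = field(default_factory=list)

def marked_total(node, inherited=False):
    """Sum own_size over every node whose effective mark is True."""
    effective = inherited if node.mark is None else node.mark
    total = node.own_size if effective else 0
    for child in node.children:
        total += marked_total(child, effective)
    return total

# /photos is marked, /photos/thumbs is explicitly unmarked.
tree = Node("", children=[
    Node("photos", own_size=10, mark=True, children=[
        Node("raw", own_size=60),
        Node("thumbs", own_size=30, mark=False),
    ]),
    Node("home", own_size=500),
])

print(marked_total(tree))  # 10 + 60 = 70; thumbs and home are excluded
```

Under this model the total can never exceed the size of the marked parent, which is the sanity check the reported value failed.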

  • It would be nice to sort the marks by size. Though I can see how additive and subtractive marks could complicate things.

Marks screen now respects and allows changing the sort mode (but still sorts topologically first).

  • It would be nice to know the total 'unmarked' amount, so I know how much I have left to go through which isn't accounted for.

By this, do you mean...

  • Amount of represented space in nodes that are not marked?
  • Amount of space that is not covered exclusively by all nodes that are marked?
  • Amount of space that is covered exclusively by all nodes that are not marked?

I added a * key, which you can press at the root node or marks screen to recursively invert all marks (and it's symmetric, so it can be pressed again to go back). In a way that allows you to see stats for everything that was not marked, but maybe that's not what you actually need (it allows answering the first and third question).
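
To make the three readings above concrete, here is a toy Python model. It is only a sketch under stated assumptions: btdu estimates sizes by sampling, whereas this uses a made-up list of extents, each with a size, the set of paths referencing it, and a single "representative" path it is attributed to. The extents, paths, and the represented/exclusive helper names are invented for illustration.

```python
# Hypothetical extents: size, referencing paths, and representative path.
EXTENTS = [
    {"size": 10, "paths": {"a"},      "rep": "a"},
    {"size": 20, "paths": {"a", "b"}, "rep": "b"},
    {"size": 8,  "paths": {"a", "c"}, "rep": "a"},
    {"size": 5,  "paths": {"c"},      "rep": "c"},
]
ALL_PATHS = {"a", "b", "c"}

def represented(selected):
    """Space whose representative path is in the selected set."""
    return sum(e["size"] for e in EXTENTS if e["rep"] in selected)

def exclusive(selected):
    """Space referenced only by paths in the selected set."""
    return sum(e["size"] for e in EXTENTS if e["paths"] <= selected)

marked = {"a"}
unmarked = ALL_PATHS - marked          # what inverting the marks selects
total = sum(e["size"] for e in EXTENTS)

q1 = represented(unmarked)             # reading 1: 20 + 5 = 25
q2 = total - exclusive(marked)         # reading 2: 43 - 10 = 33
q3 = exclusive(unmarked)               # reading 3: 5

print(q1, q2, q3)
```

In this model, readings 1 and 3 are just the represented and exclusive figures for the inverted mark set, which is why inverting answers those two; reading 2 needs the total as well.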

I had a play with it, looking good, inverting the marks is a neat trick. I've not had to consider exclusivity too deeply though since I don't have a lot of reflinks on my testing filesystem.

That said, I think being able to tell how much is not covered exclusively by the marked set sounds also like a useful value to consider.

OK... and I'm guessing, you would like that to only show space used in files, not in unallocated or unused space?

That would be a good guess. I don't think I generally need an estimate of unallocated/unused space, for that the single value of df -h is probably useful enough, unless I'm missing something interesting.

So, currently btdu does not really meaningfully distinguish between space that is "in files", space used for other purposes, or space that's not used (of which there are three flavors, UNUSED/UNALLOCATED/SLACK). We'd have to look at every kind of allocation that btdu can identify and decide whether it's "in files", and therefore something that should be tallied up in the "remaining space".

Some of these are a bit hairy, e.g. what about unreachable extents?

Also, what happens if a node which is not "in files" is marked? There is no meaningful way to show the "remaining" space in this case, as it might be negative or just a meaningless value.

Due to such questions, I am probably going to have to decline that particular feature request...

(I don't mind the feature request being declined, sounds like reasonable concerns!)

A lot of interesting stuff to unpack / try and wrap my head around in what you're saying if you're happy to help and want to continue the discourse (you can assume I'm a bit naive here).

  • What's the issue with an "unreachable extent"? If it's unreachable is it not by definition "not in a file"?

  • Hmm, why would I want to mark a non-file, I ask myself. Maybe there are use cases I hadn't considered (such as possibly using this information to decide to rebalance or something?) but generally if I'm using btdu it's mostly because I want to know which files I need to deal with, so I might expect those to have some priority in the UI, and therefore maybe wouldn't expect to be able to mark a non-file. Currently though I don't know for sure which items we're talking about, I'm not sure I've yet encountered them (are we just talking about the handful of them at the top level, or something else?)

If querying the type requires a lot of extra syscalls then I can understand a hesitance to add the feature.

I wonder if there is a way to get the effect I'm after by considering something along the lines of "Total amount used in files reported by df -h minus the marked amount", which wouldn't require additional overhead. If what I'm asking for makes sense.

  • What's the issue with an "unreachable extent"? If it's unreachable is it not by definition "not in a file"?

It is still attributed to a specific file path.

  • Hmm, why would I want to mark a non-file, I ask myself.
  • To check space exclusively used by more than one non-file item.
  • If only some nodes are markable, then that makes the marking UX less intuitive overall.
  • Currently though I don't know for sure which items we're talking about, I'm not sure I've yet encountered them (are we just talking about the handful of them at the top level, or something else?)

Yes, for example, pressing * at the top level to invert everything would no longer be possible or would have to be done differently somehow.

Total amount used in files reported by df -h

IIRC, df -h includes metadata, not just data in files. It certainly includes unreachable extents as well.

👍

Except for the above, this is now in master; thanks for the feedback :)

Thanks for the feature, nice one! :)