darakian/ddh

Excessive CPU and memory use + thread panics

pkolaczk opened this issue · 11 comments

Tried this on my home directory with ~1.5 million files.

/usr/bin/time -v target/release/ddh ~
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
thread '<unnamed>' panicked at 'Error reading Symlink Metadata: Os { code: 2, kind: NotFound, message: "No such file or directory" }', src/lib.rs:285:25
...

There seems to be no way to turn off symlink following, so I guess it might have escaped from my home dir and tried to scan the whole filesystem. But anyways, the CPU and memory usage was extremely high. System time usage is quite unusal assuming this was I/O bound workload:

^CCommand terminated by signal 2
	Command being timed: "target/release/ddh /home/pkolaczk"
	User time (seconds): 50.08
	System time (seconds): 983.47
	Percent of CPU this job got: 776%
	Elapsed (wall clock) time (h:mm:ss or m:ss): 2:13.10
	Average shared text size (kbytes): 0
	Average unshared data size (kbytes): 0
	Average stack size (kbytes): 0
	Average total size (kbytes): 0
	Maximum resident set size (kbytes): 18724728
	Average resident set size (kbytes): 0
	Major (requiring I/O) page faults: 0
	Minor (reclaiming a frame) page faults: 4688422
	Voluntary context switches: 27301
	Involuntary context switches: 150856
	Swaps: 0
	File system inputs: 103592
	File system outputs: 0
	Socket messages sent: 0
	Socket messages received: 0
	Signals delivered: 0
	Page size (bytes): 4096

ddh consumed 18 GBs, while other programs like
fdupes, jdupes, rmfind, fclones need about 150-500 MB.

Interesting. So, ddh shouldn't be following symlinks at all. That error comes from this bit of code

let good_entries: Vec<_> = read_dir_results
                    .filter(|x| x.is_ok())
                    .map(|x| x.unwrap())
                    .collect();
let (files, dirs): (Vec<&DirEntry>, Vec<&DirEntry>) = good_entries.par_iter().partition(|&x|
                        x.path()
                        .as_path()
                        .symlink_metadata()
                        .expect("Error reading Symlink Metadata")
                        .file_type()
                        .is_file()
                        );

Where directory entries are sorted as files and dirs prior to hashing. The symlink term is used to map the rust file api. I can definitely make a patch to better handle that error, but would you be willing to test some changes on a branch? I'd like to dig into what the root cause is if possible.

The debug-thread-file-panics branch might resolve your issue due to a change in metadata access. If you can get a reproducible panic can you share what the type of filesystem entry it is?

Panics are not happening on the debug-thread-file-panics branch, but excessive memory use is still present. It consumes tens of gigabytes very quickly, before even printing anything. Is there any progress bar, to at least know how far it went?

Ah, I was wrong: there was a panic, but I needed to wait longer:

thread '<unnamed>' panicked at 'Error reading path length: Os { code: 40, kind: Other, message: "Too many levels of symbolic links" }', src/lib.rs:270:17
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

I was able to reproduce the error locally and made a new commit on debug-thread-file-panics which resolves the bug for me. Would you mind giving it another spin?

@pkolaczk any update?

I was busy, I'll try to look at this on weekend.

I've merged the debug-thread-file-panics branch into master as it corrects a bug. When you have a chance please let me know if that resolves your bug or if there is yet another :)

Any update @pkolaczk?

Closing. I believe the issue is resolved and the reporter has not followed up.