deadc0de6/catcli

find results only show last directory of file location

deathtrip opened this issue · 4 comments

Currently, when using the find command, the results show only the last directory of the path in which the file is located.
For example, if we have /var/log/pacman.log, the find results show only: log/pacman.log
This behaviour can make it difficult to locate files in databases with a complex directory structure.

I don't remember whether this is also the case for results that are directories, as my database is currently on another machine.

Well, it all depends on the level at which you index your directory. Indexing records the files under the directory you point it at; it doesn't store that directory's parent information. Also remember that this information can vary between hosts (for example, the same disk may be mounted on /media/mnt on one host and on /mnt/disk on another).

So if you index /var/log with the command catcli index logs /var/log, catcli stores the files under the log directory in a storage named logs, and searching for pacman.log therefore returns log/pacman.log.

$ catcli index logs /var/log
$ catcli find pacman.log
log/pacman.log [size:1.4M, storage:logs]
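The reason for not storing the parent can be sketched in a few lines of Python. This is not catcli's code: it only assumes that each file is recorded relative to the directory that was indexed. The file path is made up; the two mount points are the ones from the example above. Because the same disk yields the same relative path on both hosts, the catalog entry stays valid wherever the disk is mounted.

```python
import posixpath


def relative_to_storage(path: str, root: str) -> str:
    """Path of a file relative to the directory that was indexed."""
    return posixpath.relpath(path, root)


# The same (hypothetical) file on one disk, mounted in two places on two hosts
host_a = relative_to_storage("/media/mnt/photos/img.jpg", "/media/mnt")
host_b = relative_to_storage("/mnt/disk/photos/img.jpg", "/mnt/disk")

print(host_a)  # photos/img.jpg
print(host_b)  # photos/img.jpg
```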

If you want to keep information about the parent directory of the indexed storage, either put it in the storage name or add it as metadata:

# parent in name (information is displayed in storage)
$ catcli index var /var/log
$ catcli find pacman.log
log/pacman.log [size:1.4M, storage:var]

# parent in metadata
$ catcli index --meta="parent:/var" logs /var/log
$ catcli ls
top
- storage: logs (free:XXG, total:YYG, date:XX) (parent:/var)

Well, the /var/log example may not have been the best one to describe the issue.
Let's say I have the directory hierarchy /a/b/c/d/e/myfile, and I index everything starting at the "a" directory.
When I then search for myfile, the results only show "e/myfile", omitting the other indexed directories.

Ok this was actually a bug :-s Thanks for reporting it!

To speed things up, I store the relative path of each indexed file with respect to its storage, and I got that wrong: only the direct parent was ever stored instead of the entire hierarchy path (hence e/myfile instead of the expected b/c/d/e/myfile).
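To illustrate the difference between what was stored and what should have been stored, here is a minimal Python sketch using the /a/b/c/d/e/myfile example. Both function names are hypothetical and this is not catcli's actual implementation; it only reproduces the two outputs described above.

```python
import posixpath


def buggy_relpath(path: str, root: str) -> str:
    # Keeps only the direct parent directory and ignores root (the bug)
    parent = posixpath.basename(posixpath.dirname(path))
    return posixpath.join(parent, posixpath.basename(path))


def fixed_relpath(path: str, root: str) -> str:
    # Keeps the full hierarchy below the indexed root
    return posixpath.relpath(path, root)


print(buggy_relpath("/a/b/c/d/e/myfile", "/a"))  # e/myfile
print(fixed_relpath("/a/b/c/d/e/myfile", "/a"))  # b/c/d/e/myfile
```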

This is now fixed, but catalogs indexed before the fix still contain incorrect relative paths. Since this only affects find, I added a small -P/--parent switch to circumvent it: instead of displaying the relative path from the stored field, find re-calculates it.

$ mkdir -p /tmp/a/b/c/d/
$ touch /tmp/a/b/c/d/somefile
$ python3 -m catcli.catcli index test /tmp/a
$ python3 -m catcli.catcli find somefile -P
b/c/d/somefile [size:0, storage:test]
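One way to re-calculate the path, assuming the catalog keeps each file as a node linked to its parent, is to walk up from the file until the storage node is reached and join the names collected along the way. The Node class and recompute_relpath function below are hypothetical, not catcli's API; the tree mirrors the /tmp/a example above, indexed as storage "test".

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    name: str
    parent: Optional[Node] = None
    is_storage: bool = False


def recompute_relpath(node: Node) -> str:
    """Walk up to the storage node, collecting names on the way."""
    parts = []
    current = node
    while current is not None and not current.is_storage:
        parts.append(current.name)
        current = current.parent
    # Names were collected from the file upwards, so reverse them
    return "/".join(reversed(parts))


storage = Node("test", is_storage=True)
b = Node("b", storage)
c = Node("c", b)
d = Node("d", c)
somefile = Node("somefile", d)

print(recompute_relpath(somefile))  # b/c/d/somefile
```

Walking the tree costs a little more than reading a stored string, which is presumably why the relative path is stored in the first place.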

As said, the relative path is only used by find and won't affect anything else. If you want to fix your catalog, unfortunately you will have to reindex everything (an update won't do it, since it only reindexes files whose mtime has changed). Otherwise, simply use the -P switch.
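Why an update leaves old entries untouched can be sketched as follows, assuming the update compares a stored mtime against the file's current one. Entry and update are hypothetical names used only for illustration: an unchanged file keeps whatever relative path was stored, including a wrong one from before the fix.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    relpath: str
    mtime: float


def update(entry: Entry, current_mtime: float, fresh_relpath: str) -> Entry:
    if current_mtime == entry.mtime:
        # Unchanged file: the stored (possibly stale) relpath is kept
        return entry
    # Changed file: re-indexed, so the relpath is recomputed
    return Entry(fresh_relpath, current_mtime)


old = Entry("e/myfile", 1000.0)
print(update(old, 1000.0, "b/c/d/e/myfile").relpath)  # e/myfile
print(update(old, 2000.0, "b/c/d/e/myfile").relpath)  # b/c/d/e/myfile
```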

Sorry again for that and thanks for reporting it!

I have released the new version (0.5.7) on PyPI.
Thanks again for your help!