A file indexer and search software.
Simply run:
mvn clean install
This will create the CLI client jar.
Simply run HslServer.kt
.
From a terminal (won't work from IntelliJ):
java -jar haystacker-shell\target\haystacker-shell-1.0-SNAPSHOT.jar
Or in debug mode:
java.exe -agentlib:jdwp=transport=dt_socket,address=5005,server=y,suspend=n -jar haystacker-shell\target\haystacker-shell-1.0-SNAPSHOT.jar
Haystacker uses JNA's FileMonitor to receive notifications from the operating system about changes to the file system. It is possible for notifications to be lost in case many changes are performed in a short time, due to the operating system's own buffer overflowing, or due to other reasons.
To update the index manually in such case, this shell command can be used:
haystacker> add-to-index D:\\my-data
This way, the contents of D:\\my-data
will be re-indexed.
Then type help
for help. Example usage:
; create an empty index at C:\\test-index on the server
haystacker> create-index C:\\my-index
; set the current index to C:\\test-index (will be used by future commands)
haystacker> set-index C:\\my-index
; adds directory D:\\my-data to the index, to make it searchable
haystacker> add-to-index D:\\my-data
; searches the index using the provided HSL (Haystacker Search Language) query
haystacker> search "name = myfile.txt"
Total results: 1
Returned results: 1
Items:
Path: D:\my-data\myfile.txt
Took: 23 ms
Haystacker supports a simple but powerful language to search for indexed files.
Search for files with the given filename:
name = "file name with spaces.txt"
Quotes can be omitted if there are no spaces:
name = filename.txt
The key name
doesn't just look at the filename, but also the full path.
For example, if a file is indexed at C:\directory\data\archive\file.txt
, this query will match it and all other files in that directory:
name = archive
To return all indexed files larger than 100 megabytes:
size > 100mb
The key size
supports operators >
, >=
, <
, <=
and =
.
Supported data size suffixes are none or b
for bytes, kb
for kilobytes, mb
for megabytes, gb
for gigabytes or tb
for terabytes.
The keys that can be used are created
and last_modified
.
For example, to search for all files that were last modified before January 5th 2021 at 10:00:00 UTC:
last_modified < '2020-01-05T10:00:00Z'
Note that the single quotes around the value are allowed but not necessary.
The above specifies a date-time with UTC offset (Z
). Offsets can also be specified numerically, for example 2020-01-05T10:00:00+04:00
.
Additionally, it's possible to also just specify the date: 2020-01-05
. Note that this is interpreted as 2020-01-05T00:00:00Z
(start of day, in UTC).
Operators >
, >=
, <
, <=
and =
are supported.
The real power comes with being able to combine queries, using AND
and OR
. Note that AND
has precedence over OR
, but this can be overridden with parentheses.
Examples:
last_modified < 2020-01-05T10:00:00Z AND name = "file.txt"
last_modified < 2020-01-05T10:00:00Z OR name = "file.txt"
created < '2020-01-05' AND (last_modified > '2020-01-10' OR name = "file.txt")