Fuse-rust
What is Fuse?
Fuse is a super lightweight library which provides a simple way to do fuzzy searching.
Fuse-RS is a port of https://github.com/krisk/fuse-swift written purely in Rust.
Usage
Initializing
The first step is to create a Fuse object with the necessary parameters. Fuse::default() returns an instance with the following values:
Fuse::default() = Fuse{
location: 0, // Approx where to start looking for the pattern
distance: 100, // Maximum distance the score should scale to
threshold: 0.6, // A threshold for guess work
max_pattern_length: 32, // max valid pattern length
is_case_sensitive: false,
tokenize: false, // the input search text should be tokenized
}
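To change only some of these values, Rust's struct update syntax can be combined with the defaults. The sketch below uses a local stand-in struct that mirrors the fields listed above so that it compiles on its own. The field types, the helper name strict_fuse, and the assumption that the real Fuse fields are public are not confirmed by this README.

```rust
// Local stand-in mirroring the fields shown above (types are assumed);
// with the real crate you would use its Fuse type instead.
#[derive(Debug, Clone, PartialEq)]
struct Fuse {
    location: i32,
    distance: i32,
    threshold: f64,
    max_pattern_length: i32,
    is_case_sensitive: bool,
    tokenize: bool,
}

impl Default for Fuse {
    fn default() -> Self {
        Fuse {
            location: 0,
            distance: 100,
            threshold: 0.6,
            max_pattern_length: 32,
            is_case_sensitive: false,
            tokenize: false,
        }
    }
}

// Hypothetical helper: stricter matching, everything else left at its default.
fn strict_fuse() -> Fuse {
    Fuse {
        threshold: 0.3,
        is_case_sensitive: true,
        ..Fuse::default()
    }
}

fn main() {
    let fuse = strict_fuse();
    println!("{:?}", fuse);
}
```

Only threshold and is_case_sensitive are overridden here; the remaining fields keep the default values listed above.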
Example 1
Simple search.
cargo run --example simple-search
let fuse = Fuse::default();
let text = "Old Man's War";
let search_text = "od mn war";
let result = fuse.search_text_in_string(search_text, text);
assert_eq!(result, Some(ScoreResult{
score: 0.4444444444444444,
ranges: vec!((0..1), (2..7), (9..13)),
}), "Simple search returned incorrect results");
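The ranges in the result can be used to highlight the matched parts of the text. The following is a minimal sketch using only the standard library. The helper highlight is hypothetical, and it assumes the ranges are sorted, non-overlapping, half-open byte offsets, which holds for the ASCII text in this example.

```rust
use std::ops::Range;

// Wrap every matched range of `text` in square brackets.
// Assumes sorted, non-overlapping, half-open byte ranges that fall on
// character boundaries (always true for ASCII text).
fn highlight(text: &str, ranges: &[Range<usize>]) -> String {
    let mut out = String::new();
    let mut cursor = 0;
    for r in ranges {
        out.push_str(&text[cursor..r.start]); // unmatched text before the range
        out.push('[');
        out.push_str(&text[r.start..r.end]); // matched text
        out.push(']');
        cursor = r.end;
    }
    out.push_str(&text[cursor..]); // trailing unmatched text
    out
}

fn main() {
    // Ranges copied from the expected result of the simple search above.
    let ranges = vec![0..1, 2..7, 9..13];
    println!("{}", highlight("Old Man's War", &ranges));
    // prints "[O]l[d Man]'s[ War]"
}
```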
Example 2
Search over a string iterable.
cargo run --example iterable-search
let fuse = Fuse::default();
let books = [
"The Silmarillion",
"The Lock Artist",
"The Lost Symbol"
];
// Performance can be improved by creating the pattern beforehand;
// note that the call below passes the search text directly.
let search_pattern = fuse.create_pattern("Te silm");
let results = fuse.search_text_in_iterable("Te silm", books.iter());
assert_eq!(results, vec!(
SearchResult{
index: 0,
score: 0.14285714285714285,
ranges: vec!((0..1), (2..8), (10..14)),
},
SearchResult{
index: 2,
score: 0.49857142857142855,
ranges: vec!((0..1), (2..5), (6..10), (11..12), (14..15)),
},
SearchResult{
index: 1,
score: 0.5714285714285714,
ranges: vec!((0..1), (2..5), (8..9), (11..15)),
},
), "Iterable search returned incorrect results");
Example 3
Search over a list of items implementing the Fuseable trait.
cargo run --example fuseable-search
struct Book<'a> {
title: &'a str,
author: &'a str,
}
impl Fuseable for Book<'_>{
fn properties(&self) -> Vec<FuseProperty> {
return vec!(
FuseProperty{value: String::from("title"), weight: 0.3},
FuseProperty{value: String::from("author"), weight: 0.7},
)
}
fn lookup(&self, key: &str) -> Option<&str> {
return match key {
"title" => Some(self.title),
"author" => Some(self.author),
_ => None
}
}
}
fn main() {
let books = [
Book{author: "John X", title: "Old Man's War fiction"},
Book{author: "P.D. Mans", title: "Right Ho Jeeves"},
];
let fuse = Fuse::default();
let results = fuse.search_text_in_fuse_list("man", &books);
assert_eq!(results, vec!(
FuseableSearchResult{
index: 1,
score: 0.015000000000000003,
results: vec!(FResult{
value: String::from("author"),
score: 0.015000000000000003,
ranges: vec!((5..8)),
}),
},
FuseableSearchResult{
index: 0,
score: 0.027999999999999997,
results: vec!(FResult{
value: String::from("title"),
score: 0.027999999999999997,
ranges: vec!((4..7)),
})
}
), "Fuseable Search returned incorrect results");
}
Furthermore, you can add a chunk size to run this over multiple threads.
In the call below the chunk size is 1, so each single-item chunk is run on a separate thread.
fuse.search_text_in_fuse_list_with_chunk_size("man", &books, 1, |x: FuseableSearchResult| {
dbg!(x);
});
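The idea behind chunking can be sketched with the standard library alone. This is an illustration of the technique, not the crate's implementation: is_match is a hypothetical stand-in for the fuzzy matcher (a case-insensitive substring test), and chunked_search collects the matching indices and returns them instead of invoking a callback per chunk.

```rust
use std::thread;

// Hypothetical stand-in for the fuzzy matcher: case-insensitive substring test.
fn is_match(needle: &str, haystack: &str) -> bool {
    haystack.to_lowercase().contains(&needle.to_lowercase())
}

// Split `items` into chunks of `chunk_size`, search each chunk on its own
// scoped thread, and return the indices of matching items in ascending order.
fn chunked_search(needle: &str, items: &[&str], chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be at least 1");
    let mut found: Vec<usize> = thread::scope(|s| {
        // Spawn one thread per chunk before joining any of them.
        let handles: Vec<_> = items
            .chunks(chunk_size)
            .enumerate()
            .map(|(chunk_idx, chunk)| {
                s.spawn(move || {
                    chunk
                        .iter()
                        .enumerate()
                        .filter(|(_, item)| is_match(needle, item))
                        // Convert the position inside the chunk to a global index.
                        .map(|(i, _)| chunk_idx * chunk_size + i)
                        .collect::<Vec<usize>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    });
    found.sort_unstable();
    found
}

fn main() {
    let books = ["Old Man's War", "Right Ho Jeeves", "The Mandalorian", "Dune"];
    let hits = chunked_search("man", &books, 2);
    println!("{:?}", hits); // prints [0, 2]
}
```

A larger chunk size means fewer threads with more work each; a chunk size of 1 spawns one thread per item.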
Example 4
The source code for this example is in examples/chunk-search.rs, and it can be run with:
cargo run --example chunk-search
This searches for a text over a list of 100 items with a chunk size of 10.
Options
As shown above, Fuse takes the following options:
location: Approximately where in the text the pattern is expected to be found. Defaults to 0.
distance: Determines how close the match must be to the fuzzy location (specified above). An exact letter match which is distance characters away from the fuzzy location would score as a complete mismatch. A distance of 0 requires the match to be at the exact location specified; a distance of 1000 would require a perfect match to be within 800 characters of the fuzzy location to be found using a 0.8 threshold. Defaults to 100.
threshold: The point at which the match algorithm gives up. A threshold of 0.0 requires a perfect match (of both letters and location); a threshold of 1.0 would match anything. Defaults to 0.6.
max_pattern_length: The maximum valid pattern length. The longer the pattern, the more intensive the search operation will be. If the pattern exceeds max_pattern_length, the search operation will return None. Why is this important? Read this. Defaults to 32.
is_case_sensitive: Indicates whether comparisons should be case sensitive. Defaults to false.
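The interaction between location, distance and threshold can be made concrete with a small scoring sketch. The formula below is an assumption chosen to be consistent with the descriptions above (an error ratio plus the offset from the expected location divided by distance); it is not taken from the crate's source, and the function names are hypothetical.

```rust
// Score of a candidate match: 0.0 is perfect, 1.0 or more is a complete mismatch.
// Assumed formula:
//   errors / pattern_len + |match_location - expected_location| / distance
// `pattern_len` must be non-zero.
fn score(
    errors: usize,
    pattern_len: usize,
    match_location: i64,
    expected_location: i64,
    distance: i64,
) -> f64 {
    let accuracy = errors as f64 / pattern_len as f64;
    let proximity = (match_location - expected_location).abs();
    if distance == 0 {
        // With a distance of 0, any offset from the expected location is a mismatch.
        return if proximity == 0 { accuracy } else { 1.0 };
    }
    accuracy + proximity as f64 / distance as f64
}

// A candidate is kept only if its score does not exceed the threshold.
fn is_accepted(score: f64, threshold: f64) -> bool {
    score <= threshold
}

fn main() {
    // A perfect match 800 characters from the expected location, distance 1000.
    let far = score(0, 4, 800, 0, 1000);
    println!("accepted at 0.8: {}", is_accepted(far, 0.8));
    // The same match one character further away.
    let farther = score(0, 4, 801, 0, 1000);
    println!("accepted at 0.8: {}", is_accepted(farther, 0.8));
}
```

Under this assumed formula, a perfect match exactly distance characters away scores 1.0, which matches the "complete mismatch" description given for the distance option.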