ajitid/fzf-for-js

Great to see someone else working on this!

Closed this issue Β· 14 comments

Hi,

It's great to see that someone else is working on the same thing I was planning to spend some time on this summer. To me it seems that you're taking the way I was initially taking (building the whole thing in pure javascript), however since some initial fiddling (and despite the old README still being in place), I recently started looking in another direction: creating an fzf-lib in go (basically removing all non-core code from the original fzf) and then compiling this to WebAssembly, as well as using gopherjs to compile to pure javascript.

I'm about to do a write up on my blog on these efforts, and to compare performance of native go vs wasm vs gopherjs; would love to include fzf-for-js as pure written-in-js effort in there as well. Would you be comfortable with this? Let me know!

Heya, your work on extracting FZF to make it a general purpose library is super!

I too considered using gopherjs for FZF. But my initial finding showed that just compiling algo and normalize map results in a file with 27K lines. I had to drop that idea.

I would love for this lib to be mentioned in your blogpost, though I'll request you to not include it in performance benchmarks, at least not for now. This library is behind 1.x and is very young (only 3 weeks old). Also being a 1:1 port of its Go counterpart means that perf was not the primary focus while porting it, rather creating a stable port was. That being said, I'm already seeking ways to improve its performance.

I edited my comment above too many times, hope you read the last edit πŸ˜….

Closing this ticket for now. If you have anything further to discuss, feel free to drop back a comment here, we'll reopen it.

Thanks for alerting me to all the updates :).

Just wondering (I didn't dive into your source code yet), did you implement algo v1 or v2?

Yes, the wasm/gopher solutions result in huge files; even tinygo wasm is something like 200kb. Probably this is because they need to include large parts of the go standard library.

Also (as a sneak peek) I have to say that I'm not very impressed by the performance of either the wasm or the gopherjs code. Perhaps I missed some optimatisation flags somewhere, but we're talking a factor 100 to 1000 slower than normal compiled go code. It basically means that this is fine for a case when you're searching through a list of 1000 items, however completely useless when having a list of 100k items (which is exactly when fzf shines in my opinion).

(I have a hunch that it has a lot to do with memory allocation; if this is the case, then probably smart memory allocation tricks may speed the whole thing up. Having said that, I also believe that since the JS engines in browsers are optimised for a certain type of code -- normal javascript constructs -- they may not do optimally in the transpiled case. All in all, there may be a good argument to have a native js fzf.

If you need any help on your project, let me know. I'd be happy to lend a hand.

(seems that I don't have rights to reopen the issue; so I just hope that you will get a notification :))

Hope you don't mind, I did try a quick comparison to see how your code performs.

The good news is that it manages to search through a list of 65536 randomly generated sentences (see this gist) in 500ms. That's about 3 to 4 times as fast as the wasm code. However it did find only 10% of the matches that fzf --filter (or the wasm implementation) finds for the same list (I used the fzf-for-js on https://fzf.netlify.app/basic ; maybe there is a newer one I should be using?). It also reports a different score for the top hit (which is the same in both implementations). The second hit is not the same between implementations.

commandline code

curl https://gist.githubusercontent.com/reinhrst/1fcb49f0886621857f05c8e4969fc3b0/raw/da7b5a4d838ca8eddbc40377c7969d64aae6a1ce/data.json |  jq '.[]' | fzf --filter "hello world"  | wc -l

code run on https://fzf.netlify.app/basic :

init()
(wait...)
fetch("https://gist.githubusercontent.com/reinhrst/1fcb49f0886621857f05c8e4969fc3b0/raw/da7b5a4d838ca8eddbc40377c7969d64aae6a1ce/data.json")
    .then(response => response.json())
    .then(data => {
        myFzf = new fzf.Fzf(data)
            const start=new Date();
            const result = myFzf.find('hello world');
            const end=new Date();
            console.log(end-start, result.length)
        }
    )

Having said this, I fully understand (as you also say) that this is a new product; probably still work in progress. As such I'm hesitant to start posting bugs about this; let me know if you're interested in diving further into this now, and if so how would be the best way to approach this.

However it did find only 10% of the matches that fzf --filter (or the wasm implementation) finds for the same list

This is expected. By default FZF for JS does a basic match with v2 algorithm. This means if the string contains the words in query but they are out of order, then it won't be matched (for eg. "world hello" won't match), thus resulting in less number of matches. You can activate basic match in FZF CLI by using fzf +x to get the same behaviour.

It also reports a different score for the top hit

FZF CLI also uses a tiebreaker, which is length by default. Adding a tiebreaker can result in increased score for an item, thus resulting in this score discrepancy. Tiebreakers will be introduced in a newer FZF for JS version but they won't boost the score like FZF does.

So these two behaviours aren't bugs. I would be happy to explain further the decisions I made for the defaults. If you find any other issue, then sure do raise a ticket.

Now moving onto the test...

I used the latest dev and ran the same test as you did but with extended match option enabled (equivalent to fzf in CLI). This gave us 4694 matches which matches with other implementations. Here's the result:

image

And with the default options it gives 427 matches (equivalent to fzf +x in CLI):

image

In both cases any sort of caching of results is not done.

And as we have discussed already, mentioning this library is fine in your blogpost as long as it is not put up in benchmarks against its counterparts. I would also request for the images quoted in this comment to not be posted in the blogpost either. Hope that is okay with you. πŸ™‚

… did you implement algo v1 or v2?

Both version of algorithms have been implemented. Extended matching is also there. However, the latest release (v0.3.1) only exposes v2 algorithm + basic match to the user.

… I have to say that I'm not very impressed by the performance of either the wasm or the gopherjs code. Perhaps I missed some optimatisation flags somewhere…

I don't think you missed any flags there. Or more appropriately, passing more flags might not have helped much either. It probably has happened due to multiple reasons:

  • Compiled GopherJS code is very huge and probably not optimized enough (I just noticed you've already mentioned this)
  • JS on browser doesn't has access to threads. Offloading to web workers won't necessarily improve the performance either.
  • Most often, WASM performance is equivalent to JS counterpart (see this post). The compiled fzf-lib/fzf-js WASM code might not be using threads either.
  • As you mentioned, memory allocation could be a big reason too

If not already done, changing partitions to 1 here might help in making a fairer comparison.

If you need any help on your project, let me know. I'd be happy to lend a hand.

Thank you! I'll surely ask for help if needed. πŸ˜„

*Just updated both of the comments that I made above.

Ah that makes complete sense! I was also struggling to find "reasonable defaults" for different options. It may be useful to mention the defaults you chose in the readme to avoid confusion.

Thanks for your answers, they make sense and give me a better idea of the state of the project. Very impressive what you managed to do!

And as we have discussed already, mentioning this library is fine in your blogpost as long as it is not put up in benchmarks against its counterparts. I would also request for the images quoted in this comment to not be posted in the blogpost either. Hope that is okay with you. πŸ™‚

Of course, I will respect your wishes. I think it may make sense to explain a bit more what exactly I'm trying to do. First of all, any blog post would be just that, a post on my (unfortunately not very popular yet) blog; no professional publication or money making. My main goal is to make sure that the world gets a bit smarter as a whole, that people won't have to go through the same struggles to answer the questions that I have spent a long time finding the answers too...

My interest in it all started when I was looking for a fzf for javascript 2 months ago (seems that if I had waited a month, I would just have stumbled upon your project and be done with it :)). However as mentioned, I decided to tackle the same problem as you, but from another approach.

My initial hope was to just WASM-compile fzf-lib and have a very good performing fzf implementation I could use in the browser. Indeed I agree with all the reasons you mention why this may not be the case. Indeed the wasm code only uses a single thread (or at most loads my cpu to 120% compared to 700% I may get when running the cli).

The more I dove into this, the more I realised that the story is much more complex. For one, there is more than one way to get your code to run in the browser; go wasm target, tinygo wasm target, and gopherjs (and possibly more I haven't looked into yet). It's not entirely clear what the maturity of these projects is; all that can be found online is that they produce large (tinygo slightly less large) chunks of code. I couldn't find much about performance.

From the viewpoint of performance I think there are some interesting things to talk about:

  • Performance differences between different compilation/transpilation methods
  • Performance differences between different browsers
  • Performance difference compared to native go (from a theoretical point of view I agree that it's most fair to compare to single threaded go.... however from a practical end-user point of view, most people by now have 4 or more cores, so if this influences performance, this is obviously interesting to talk about too.
  • Performance compared to a javascript(or typescript)-first code-base doing the same work (I was actually considering making this myself, until I ran into this project)

An additional thing that may be important is speed consistency (I found that sometimes, randomly, the same wasm code is 20 times as slow as other times, which may mean that it's not practically usable in a web page), memory usage (especially safari seems to run out of memory, sometimes giving an error, other times just stop running the javascript without a message).

All in all, I would love to know more about all this for my own knowledge. I fully understand that your project is still in an early stage (although it's super impressive that you have everything working!). Especially considering that in your quick test you show that you get timings that are very much on par with what I got through the other methods, I'd be excited to see how it performs when I run the whole test-suite (and possibly make some small changes to make sure it returns the same data, so that the tests are fair comparison). Obviously I would make PR for any changes, in case you're interested in it.

Then a final step would be a writeup on my blog. If fzf-for-js indeed wins on some areas (it should at least on size, I expect, but as I said, I can very much imagine that it will do very well in other areas as well), I would certainly like to include it (in addition to just using it as an npm module in another small project I'm working on :)), including the disclaimer that this is a very young project and there may be room for further improvement. At least this could be useful for people looking for fzf in the browser, however I think it could also serve others who are wondering whether they should use the wasm/gopherjs route if they want to use a go library in the browser, or write it themselves in javascript.

Obviously if fzf-for-js performs bad (which I really don't think), I would rather see it as a challenge to submit some PRs for speedups -- or at least get to the bottom of why we see a certain performance.

How would you like the idea that I first run some benchmarks (which I now realise I want anyways just for myself -- and I expect that you are interested in this data too, just for your own knownledge), and based on the results we will see whether it makes sense to include them in any write-up.

Sure! Instead of the latest release, I'll suggest you to create a bundle from dev branch and use it for your benchmarks:

  • Clone this repo to your local and checkout dev branch
  • Run npm i to install the dependencies followed by npm run build
  • Use npm pack to pack the dist into an archive
  • cd to your project
  • npm i ../path/to/fzf-for-js/fzf-0.3.1.tgz to install this package into your JS project

To make this library behave like FZF CLI, use the following options:

import { Fzf, tiebreakers } from 'fzf'

const fzf = new Fzf(list, {
  extended: true,
  tiebreakers: [ tiebreakers.byLengthAsc ]
})

You can get a list of available options from this (WIP) doc.

I would rather see it as a challenge to submit some PRs for speedups

Yes, this is something that I really wish I get contributions for. πŸ™

Great, sounds good! Thanks for the example code, saves me time trying to figure this out myself :)

Your suggestion to disable dev tools while doing benchmarks is an interesting one, and one I didn't think about -- I assumed that as long as you're not writing too much to console, it doesn't affect anything, but it certainly won't hurt to try. I'm currently thinking of a new setup to do the tests, maybe even checking if I can restart the browser for each test run, so that the tests will not influence each other. Shouldn't be too hard I think :).

One more quick question. You mentioned that in your test you had caching switched off. Did you not port the caching parts of the fzf code yet, or did you explicitly switch off caching for this test?

Oh I wrote that?! I have misworded it then. Caching is not implemented yet.

Closing for now as further discussions were happened on the email.