v7.0 Plan

Question

v7.0 Plan

thecodrr opened this issue 3 years ago · 8 comments

fdir has reached maximum performance. I am not saying that to discourage others from trying to make a faster directory crawler. Not at all. But I have stripped fdir down to its bones and haven't noticed any significant increase in speed. The only space left for improvement is directly in NodeJS internals. So where does that leave us?

I had initially planned to freeze the API of fdir but fdir has no proper "API". What I would be freezing, most probably, would be more features. I can't do that though. Not yet, anyway. Why?

fdir is not feature complete. I don't want to make just the fastest directory crawler but also the best one.
I don't want any one using fdir to ever feel like it can't do X or Z. An impossible feat but...
I am not happy with the current state of the codebase. Even though an API freeze doesn't mean I can't change code, I want to be able to break things.

So what's the plan for v6.0?

Modernize the codebase — the whole codebase is ES3 or something. In my quest for performance, I went to the extreme. Too much, it seems. Maybe we can even use Typescript this time around (but I like keeping things simple).
Strip the core directory crawler down to the bare minimum — right now, everything flows in and out of fdir with a free pass. The result is that there is a lot going on at one time. It is time to simplify things...and that brings me to:
Make fdir pluggable — after some initial thought, this is the best way forward. The idea is to allow anyone to directly tap into the crawling process and control it.
Move all the extra features & flags to their own plugins — this will enable a more robust system. There will be an "official" set of plugins that will act as a foundation for others to play with the Plugin API.
Improve documentation — this doesn't even need explaining but yes! It's time to move to Docsaurus 😆

Does this mean that the Builder API is dying? Nope. I like the Builder API and it'll stay. It will act as a kind of a proxy for the plugins underneath. That means the Builder API will also need to be extensible.

The end goal is to make the following features possible:

Streams & Iterators
Globbing (the current one is a joke, I am talking about a proper globber)
Multiple filesystems (locking fdir to fs is a bad idea)
Deno support, maybe (not sure about this)
And other stuff...you guys can suggest.

In the end, the idea is to reduce complexity, increase flexibility, improve readability, and maintain the performance (and become the de-facto directory crawler in the Nodeverse).

In any case, that's the plan. I would love to hear what you guys think about this. Ideas, suggestions, possible ways to implement a plugin system, ideas for plugins, etc, etc, etc. is all welcome!

Answer 1 · 2021-10-25T13:00:14.000Z

Not a fan of the builder API, would prefer seeing options api become default. It's a much more common pattern across Node libs.

Multiple filesystems

Would be great to use with mem-fs. Some examples in the docs would be nice.

Globbing

I would like to be able to search for a dir, but also exclude everything under it. E.g. Find me all node_modules, but don't visit anything contained in node_modules. It's an edge case that I haven't found any globber able to handle.

Streams & Iterators

As I mentioned elsewhere, I'd like to be able to start processing first glob result as soon as its available, however continue walking in the background.

Idea: Watcher

Would be nice to be able to run a glob, and then set a flag to watch for changes. Currently, you have to use chokidar/watchman, and duplicate configuration. Many applications that need walking, also need watching. Could investigate if there is a way to plug fdir into chokidar, or build own.

Idea: Worker processing

Explore perf benefits of using workers for CPU ops. Can it speed up globbing? Maybe someone needs more complicated filtering logic. This starts to feel like it should be a stream operator separate to fdir though. But if we are using workers for globbing, maybe re-using them for processing is helpful.

Answer 2 · 2023-01-12T10:55:02.000Z

Hey, just a small idea.

Rather than completely suppressing errors, can they be returned in the result (maybe as an opt-in option)? In my case, it would be handy to display any errors that may have excluded files/folders.

I'm too happy to help in any way (when available 😉)

Answer 3 · 2023-02-08T18:29:08.000Z

@glenn2223 The only problem is how the API would look. We already have a withErrors method.

P.S. Sorry for the delay in responding. I just now finished releasing v6.0 which completely rewrites fdir from scratch in TypeScript. Contributing now should be much easier.

Answer 4 · 2023-02-08T22:29:22.000Z

The only problem is how the API would look. We already have a withErrors method.

So does this still return just with errors in the object? Sorry reading the docs, I just assumed (which one should never do 😜) it threw

P.S. Sorry for the delay in responding. I just now finished releasing v6.0 which completely rewrites fdir from scratch in TypeScript. Contributing now should be much easier.

Not a problem. A nicety to have typescript and we're all busy with our normal (non open source) lives

Answer 5 · 2023-02-09T04:43:58.000Z

So does this still return just with errors in the object? Sorry reading the docs, I just assumed (which one should never do stuck_out_tongue_winking_eye) it threw

Yes, it throws the errors instead of collecting them. The default behavior is suppressing & ignoring errors. It might be a good idea to collect the errors by default but it'll change the shape of the result so making it the default would break a lot of things.

If we make it opt-in, what should it be called when we already have withErrors. Or should we include an option in withErrors? Something along the lines of: withErrors({ collect: true }).

Answer 6 · 2023-02-09T11:53:50.000Z

Ahh, sorry I misunderstood. It wasn't a "no, we already have it"; more of just a how 🤦‍♂️

In my personal opinion, withErrors does read more as: include the errors in the returned object. However, as you said, this would be a breaking change. If this is the opted route, would it be that large of an issue in v7 - where it can be documented? For the original behaviour you could then go for withErrors({ throw: true }) (or throwInstead or something similar)

Just to throw a proverbial spanner in the works. Another option may be to utilise some logging functionality. You can then have a callback for the user to handle each error. This could also have the added benefit of terminating the running process.

Not only would this benefit the user it could also benefit you. By implementing some deep trace/debugging calls you can debug issues a bit easier. Doing this myself helped a lot with debugging issues, I simply ask users to send me a trace log and then I can walk through the logs/steps - and usually tell them it's not an issue with my extension 😜. You then have error, warning, info and trace logs which could be callbacks or included in the response.

It could then work as either of:

// # Option 1
.withLogging(options: {
    minimumLevel: LogLevel, // maybe to save the user doing throwaway 🤷‍♂️
    callback: (data: {
        level: LogLevel,
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean> // return type is to support cancellation
}): Builder<TReturnType>;


// Option 2
.withLogging(options: {
    onError: (data: {
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onWarning: (data: {
        message: string,
        error?: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onInformation: (data: {
        message: string,
    }) => boolean | Promise<boolean>, // return type is to support cancellation
    onTrace: (data: {
        message: string,
        error: Error // Possibly on or exceptions or multi-typed for custom errors
    }) => boolean | Promise<boolean>, // return type is to support cancellation
}): Builder<TReturnType>;

You could always add things like path/directory info, walker info, running time, etc. to the object too

Answer 7 · 2024-04-13T14:18:50.000Z

this package was sent my way recently and, so far, i'm loving it. great work

a few immediate thoughts:

an options API makes more sense as already mentioned, it is a common pattern in the ecosystem already
a plugin system does seem like a smarter idea than bloating this package, but i would avoid going too far (to the point where it slows you down), though i'm sure you'll keep a good handle on that
it would be useful to have the option to pass certain things in (picomatch and fs in particular, in case we want it to use an fs-like object which isn't node:fs

Another thing to mention is that stats are useful in many cases. Currently, fdir eats them up during crawling and the output is a straight list of paths. It would be useful to have the option to return the list of stats (which we must've done as part of the crawl presumably) rather than just the paths.

That would allow us to know which path is a directory, a file, etc.

Finally, if you need any help, shout up :D this package will be very useful as part of the ecosystem cleanup. so i'd be happy to help out wherever if i can.