Extract TCP Broadcaster into separate gem
e2 opened this issue · 22 comments
UPDATE/SUMMARY: TCP functionality shouldn't be in Listen (which aims at portability) - see comments. As a Guard plugin - maybe.
A built in TCP server within Listen makes little sense, because:
- the host/port setup for broadcast/recipient modes is confusing and complex
- the current TCP server mode is unreliable on Windows (see #246)
- it adds an otherwise unnecessary dependency on Celluloid::IO
- it makes contribution tougher (`listen` is already too complex as it is) - the message protocol should be versioned anyway
- the user should have more control over the TCP setup and which events to create and send and why (without needing changes in Listen), which files/dirs to ignore, etc.
I'd suggest something like `listen_tcp_server`, which would simply use Listen as any other app would.
Any suggestions regarding the name and the implementation would be appreciated. (I can't promise much support aside from moving the existing Celluloid::IO-based TCP server into a separate gem).
I've read on the events-forwarding quite a lot.
I don't have much experience in benchmarking network messaging, so please excuse me if the following is a stupid question:
Is TCP broadcasting the quickest way to deliver messages over the network? I mean, I really don't know whether TCP sockets are quicker than a messaging queue like NATS. Others: http://queues.io/
Another thing is that messaging queues usually already have client libraries in different languages, so integration would be easier.
Tell me if I'm fooling myself.
I am researching this in relation to a virtual dev-environment setup with Windows (host) and Linux (guest).
@antitoxic - listen's focus is mostly on supporting multiple platforms and "correctness" of events (through "best effort").
You can use Listen's diagnostic mode to check whether the network performance is sufficient for you (use the `LISTEN_GEM_DEBUGGING=2` environment variable, and you'll see how long things take).
You might find that Celluloid overhead is more significant than you think.
Also, correctness is pretty important, and Celluloid has specific delays to accumulate events, e.g. translating delete + create into a renaming of a file. (You might want to check the `wait_for_delay` option).
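For reference, a minimal sketch of enabling the debug output and tuning the delay (the watched path and the numbers here are just placeholders):

```ruby
# Run with: LISTEN_GEM_DEBUGGING=2 ruby watcher.rb
# (the env var makes Listen log adapter choice and timing information)
require "listen"

listener = Listen.to("/path/to/project",
                     wait_for_delay: 0.2, # how long events accumulate before the callback fires
                     latency: 0.1) do |modified, added, removed|
  puts "modified: #{modified.inspect}, added: #{added.inspect}, removed: #{removed.inspect}"
end
listener.start
sleep
```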
Overall, performance was never a goal in Listen - especially not network/Celluloid performance on Windows.
And, Listen shouldn't be doing any TCP broadcasting, because that's not related to filesystem events anyway.
As for messaging libraries - if TCP isn't enough, you can simply use Listen as a client library and use any messaging library you want in the callback - and simply setup a client for that messaging library on the other side.
I've tested Listen and it performs way better (cpu and memory wise) than similar (and not that cross-platform) libraries in other languages like watchdog(python) or chokidar (nodejs). That's regarding the listening itself.
You're right, Listen shouldn't do any TCP or network broadcasting. I'm thinking of making a script that uses Listen and then broadcasts over rabbitmq. It's mainly targeting the network drive scenario; I will let you know about the results.
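Something along these lines is what I have in mind (a sketch only - the broker URL, exchange name and watched path are placeholders, and it assumes the bunny gem as the RabbitMQ client):

```ruby
require "bunny"
require "listen"
require "json"

connection = Bunny.new("amqp://guest:guest@localhost") # placeholder broker URL
connection.start
channel  = connection.create_channel
exchange = channel.fanout("fs_events") # hypothetical exchange name

listener = Listen.to("/srv/shared") do |modified, added, removed|
  # Publish one JSON message per batch of coalesced events.
  exchange.publish(JSON.generate(modified: modified, added: added, removed: removed))
end
listener.start
sleep
```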
@antitoxic - I've heard that polling on Windows may be faster than WDM - and polling itself may be faster and more "correct" in some cases (especially if you properly tweak the ignore rules and reduce the latency and wait_delay), though the cost may become too big for larger projects.
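A rough sketch of what that tuning might look like (the path, interval and ignore patterns are placeholders):

```ruby
require "listen"

# Force the polling adapter instead of WDM and narrow down what gets scanned,
# so each polling pass stays cheap even on a larger project.
listener = Listen.to("C:/projects/app",
                     force_polling: true,
                     latency: 1.0,                      # seconds between polling passes
                     ignore: [%r{/tmp/}, %r{/log/}]) do |modified, added, removed|
  puts({ modified: modified, added: added, removed: removed }.inspect)
end
listener.start
sleep
```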
Using `LISTEN_GEM_DEBUGGING=2` on both sides should give you a good idea of where the bottleneck is.
You may have other options depending on your use case (e.g. if you're using an editor to make changes, you might have multiple FS operations per file saved).
I think it would be a major mistake to change the protocol. What is needed is broader support for this, not a new protocol. This is a huge unsolved problem that could be solved if there was broad agreement on how to best do this.
Already tools have sprung up that talk this protocol: watch-network, GoListen
I'd be happy to own the Ruby side of this, but I'd strongly suggest that we keep the protocol framing the same.
That said, if the protocol were to change, I'd be a strong advocate of HTTP as the basis. It seems like the perfect fit for Server-sent Events.
More thoughts here https://github.com/99designs/watchd
TL;DR - TCP functionality shouldn't be in Listen (which aims at portability). A Guard plugin - maybe.
@lox - good points. The only 2 things I'd change in the protocol are:
- have the broadcaster send a protocol version
- allow the listener to send a "configuration" (including expected/supported protocols)
Supporting various protocols shouldn't be a problem - as long as there's a sane way to detect the protocol.
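To make this concrete, here's a hypothetical sketch of a versioned, length-prefixed JSON framing (the version number and the "supported versions" check are inventions for illustration, not an agreed protocol; the length-prefix-plus-JSON layout is roughly in the spirit of the current broadcaster):

```ruby
require "json"

PROTOCOL_VERSION = 1 # hypothetical

# Broadcaster side: 4-byte big-endian length prefix followed by a JSON payload,
# with the protocol version embedded in every message.
def frame(type, changes)
  payload = JSON.generate("version" => PROTOCOL_VERSION, "type" => type, "changes" => changes)
  [payload.bytesize].pack("N") + payload
end

# Listener side: read the length, then the payload, and bail out early on
# versions it doesn't understand.
def read_message(socket, supported_versions: [1])
  length  = socket.read(4).unpack1("N")
  message = JSON.parse(socket.read(length))
  unless supported_versions.include?(message["version"])
    raise "unsupported protocol version: #{message['version']}"
  end
  message
end
```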
I have no personal interests in the TCP side ATM (this might change), so this issue is up for grabs (though, I'd be glad to review, answer questions, etc.).
Personally, I'd start by creating a Guard Listen client plugin to support multiple protocols - especially those from other projects.
I don't think TCP should be handled by Listen at all - creating "workarounds" for a lack of inotify/kqueue functionality results in a horribly complex implementation (one that is equally complex and inconvenient to set up).
In short, TCP support here is just a poor workaround for a lack of fundamental inotify/kqueue like support somewhere.
The ideal solution is implementing "inotify over the network" - which doesn't make sense unless the host and client are sharing the exact same filesystem. The alternative is an inotify-driven rsync.
Neither of those solutions "fit" into Listen. At best, Listen's Linux adapter could receive inotify events over the network through a gem with the same interface as rb-inotify. But if you're not hosting files on Linux, then the workload is on the host (optimizing events) and what you really need is a Guard plugin (which solves some threading challenges as well).
https://fsnotify.org/ seems like a reasonable high-level protocol for this. Hope this helps.
It doesn't look like it makes any mention of a network protocol for fs events?
Nope -- maybe "high-level protocol" was the wrong term. It's a layer of abstraction over multiple systems -- since there was discussion about inotify vs other OSes. With fsnotify in place on each system, using TCP to handle the connections would make sense.
I've also looked into moving this to some other layer. For me, and the particular problem I'm trying to solve, the proper place to put this would be support for file change events on VirtualBox shared folders. Though the Oracle contribution network is pretty overbearing and not very transparent -- compared to a modern community like GitHub.
I'm assuming the problem this discussion is trying to solve is much broader.
TL;DR - see summary below.
I'm assuming the problem this discussion is trying to solve is much broader.
I'm more and more convinced it's the "opposite".
The best approach is "case by case". It's too easy to just say: "No file monitoring on shared VM folders? Ok, let's just send events over TCP. Done!".
That doesn't work in practice. Doing such low level stuff over the network without kernel support is too complex. It's too complex to make everyone happy.
Besides, many people use OSX. And Windows. And those systems are NEVER going to have any decent file monitoring over the network.
Let's take an example of VMs: shared folders. In short, the host OS needs to cooperate with the VM. Everything else is a clumsy workaround.
Another example: low level filesystem sync over a network. BTRFS support on Windows anyone? Of course not.
Another example: bi-directional filesystem sync. Of course this is just ridden with problems.
No "broad solution" will fix everything.
So the problem is not "file system events over network" but "lack of cooperation between OSs, companies and technologies".
Sure, developers need to get stuff done. So now we're into workarounds.
And this is where solutions for all problems are mutually exclusive.
"Low-level" solutions make everyone "feel good". That's why it seems like "a good protocol" is the solution. It isn't. Something will break.
E.g. One "unknown feature" of Listen is: editor support. Editors move/copy/delete/rename/create/atom-delete files. With enough editor plugins, the delay between those events increases. Listen intelligently "reinterprets" those events into a single "file modified" event.
It's clever and quite reliable, no matter how the editor is configured.
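As a rough illustration (this is not Listen's actual implementation, just the shape of the idea): raw editor events for a path are buffered for a short window, and only then is the filesystem checked to decide what single event to report.

```ruby
# Hypothetical sketch: decide the final, high-level event for a path once the
# wait window has closed, based on whether the path existed before the burst
# of raw events and whether it exists now.
def coalesce(path, existed_before:)
  if File.exist?(path)
    existed_before ? :modified : :added
  elsif existed_before
    :removed
  end # created-then-deleted within the window collapses into nothing (nil)
end

# An editor that saves via "write temp file, then rename over the original"
# produces several raw events, but this still reports a single :modified.
```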
The problem is: this doesn't work over the network.
Why?
Because Listen needs to collect events over a period of time and physically check the files. Maybe the network protocol could be extended to stat files on the server? But that would create MAJOR delays between the event happening - and the response on the other end.
Or, editor handling would be broken. E.g. you edit a file, but only the "add" event is sent before the "delay" period expires - which means a modified file is reported as "added".
Issues are reported, etc., and we're back to square one.
I won't even mention how the OSX implementation is ridiculously unwieldy. One second resolution on the HFS filesystem? That's like an eternity for file monitoring tools. If the delays accumulate, users will be frustrated having to wait 4-5 seconds for an event every single time.
Summary
The solution is to fix things on app level.
E.g. if people are using Guard, then a guard plugin can open a socket and listen for events it recognizes.
Creating a Listen app to send those events isn't hard. No protocol needed - since the `Guardfile` is user-defined anyway. Guard has both an API for queuing custom events and facilities for managing processes and sockets. Meanwhile, Listen + Ruby's TCPSocket makes it easy enough to send whatever packet you want, wherever you want.
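A minimal sketch of the sending side, assuming a Guard-side socket is already listening (the host, port and watched path are placeholders):

```ruby
require "listen"
require "socket"
require "json"

GUARD_HOST = "192.168.50.1" # placeholder: wherever the Guard-side socket runs
GUARD_PORT = 4000           # placeholder port

listener = Listen.to("/srv/app") do |modified, added, removed|
  # One short-lived connection per batch of coalesced events.
  TCPSocket.open(GUARD_HOST, GUARD_PORT) do |socket|
    socket.puts JSON.generate(modified: modified, added: added, removed: removed)
  end
end
listener.start
sleep
```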
Why isn't this already available on a wiki or something?
- I don't have a personal need to implement a proof of concept (`guard-yield` may be sufficient)
- I need a single valid test case at least
The last point is important. Personally, I don't see a good technical reason to use VM's (or even Docker) for development. "We have to use Windows" is not a good technical reason, neither is "We use OSX". Those are political/business/legacy issues - which rarely have clean technical solutions.
Alright -- you could've just said it's outside the scope of what you're solving (or not solving?). Sorry I misunderstood the conversation thus far.
Don't apologize Dan, discussion is welcomed. This is a problem that a lot of us are battling with, so tempers flare!
The last point is important. Personally, I don't see a good technical reason to use VM's (or even Docker) for development. "We have to use Windows" is not a good technical reason, neither is "We use OSX". Those are political/business/legacy issues - which rarely have clean technical solutions.
This is a ridiculous statement @e2. The reason why people use virtual machines (and Docker) is to allow for closer parity between production environments and development environments. We have 7 teams working at 99designs across 10+ different primary codebases, we use Docker to allow the teams to move quickly, independently of each other, whilst still developing on environments that are configured much like the AWS environment they are deployed in, and the CI environments we test them in. Are you seriously proposing that this is a "political/business/legacy" issue? Almost every amazing tech company I've talked to has a similar setup. Not that this means it's the right setup, but it's some data points at least.
The idea of a repeatable environment is, IMO, key to developing quality software and high-functioning development teams. No matter how carefully you craft your Gemfile + package.json + bower.json + whatever else you use, there are always variables that bite you when you are trying to deploy and maintain code. Docker and to a lesser extent VM's provide this, it's just a matter of smoothing out some rough edges around making them easy to use with current developer workflows around file watching.
A lot of what you've said is reasonable, filesystem events are hard, and then trying to abstract them over a network across disparate underlying filesystems rapidly gets nightmarish. I completely agree that this needs to be solved further down the stack. Unfortunately, it's been years and years and this hasn't happened, especially on the Linux side. Inotify is terrible and hopelessly coupled to the VFS. We will never see network events plugging into that. NFSv3 has at least a mechanism for propagating these events, but all our tools couple to inotify on the Linux side.
I'm not sure how to fix this problem, at a system level honestly. I suspect the answer is getting a kernel developer interested and coming up with something to replace inotify and fanotify (see https://xkcd.com/927/), however given how slow progress on that front has been over the past 10 years, I'm not holding my breath.
Accordingly, I come back to the same conclusion as you @e2: the problem should be solved at a tooling level. The issue is that this problem crosses ecosystems and languages. I have golang, ruby and node that need watchers in my VM's to receive change events from the host. How do I do this with minimal complexity? Well, probably I run watch-network (or something similar) on my OSX machine, and it's responsible for providing a central point, either via pull or push, that notifies containers/VM's when something has changed. That same command would presumably start up the listening component in the VM/container and handle the mapping of paths. This is absolutely a tractable problem.
So I come back to my initial proposal of a standard protocol. Good news, there is one, and it's fine. It was in the Listen gem, which gave it a nice home in the Ruby world. It's since been copied into Node and Golang and various other places. It's not broken, it doesn't need fixing, let's just make it an adhoc standard. Leave the debouncing stuff out of it, just provide raw FS events over a stream and let the receiver worry about it.
(that said, agree with what you've said about splitting it out into a separate gem :) )
TL;DR - see summary
The reason why people use virtual machines (and Docker) is to allow for closer parity between production environments and development environments.
A virtual machine is overhead, always.
It may be a necessary overhead (for business sake) but it's still overhead. So using Listen here is just a workaround for an issue in an overhead.
Get rid of the overhead and there's no problem anymore.
Almost every amazing tech company I've talked to has a similar setup.
Which means they all have the same overhead. Not a good thing.
It's not a "feature" it's a "workaround".
Are you seriously proposing that this is a "political/business/legacy" issue?
Yes, because a VM is not a technical necessity. The root cause is politics/business/legacy.
Give a specific real-life example if you don't believe me.
there are always variables that bite you when you are trying to deploy and maintain code.
Deploying and maintaining is not development.
If it seems like development, someone "screwed up". So it's an effect of a broken development process.
Yes, I know that hosting provider X may not have Y while the development environment does. The fact that a VM helps identify problems is a crutch for a bad process. And you may not have the control you need to fix that process. That's why politics/business/legacy ...
A VM (or container) is for isolation. Just like unit testing provides isolation. So, a VM is basically just a framework for the sake of unit testing a hosting setup. Basically, just to replicate production problems so you can unit test (or manually test) your workarounds for it.
But there's no real need to use a VM and filesystem monitoring of source files at the same time.
If you're changing source files on such a mounted folder, you no longer have a production environment.
The moment you install Listen on the VM (along with dependencies), you no longer have a production environment. (Unless your app uses Listen itself.) So the VM is no longer doing its job of just "isolating".
So I see a valid reason for using a VM for running unit/integration tests. And a VM is useful for debugging production-related issues.
But neither of those are "development" (modifying sources).
So if you're using both a VM and "file monitoring for source files", it's philosophically a bad process.
If you "have to" do it, then it's best to minimize how this is used - instead of making it a permanent part of your whole development process.
Summary
A "cleaner" solution would be to just install nfs-server on the VM, mount that on the host and use polling on the host (lots of network activity for polling - but if it's slow, then you've found the REAL bottleneck).
If you're not installing file monitoring on a production server, why would you expect to need it in a VM?
A VM is really just another computer on the network. Spreading development over multiple workstations is overhead.
Before I even consider any development, what's the use case of using Listen over a network of computers? Filesystem sync? Backup? Listen is the wrong tool for both.
just provide raw FS events over a stream and let the receiver worry about it.
For the sake of editor support, the client would have to run `stat()` on a batch of FS events after a certain time window. This would mean a round trip.
That's why "high-level events" (modified, added, removed) are better - both the "delay" and the `stat` are done by the listener "server" (coalescing). But the file still has to be sent over the network after the delay, which can make the time between "save" and "action" quite noticeable.
You can reduce the `wait_for_delay` and/or `latency`, but at some point it can break editor support (if saving the file takes too long).
Editor support depends somewhat on the time it takes between the first event (e.g. renaming a file) and the last event (closing the newly saved file). If you send every event over the network, "sometimes" saving a file won't trigger an event, or it will trigger multiple events - extremely frustrating. Listen would be "broken".
Inotify is terrible and hopelessly coupled to the VFS.
Sending raw events without delay may seem to work, but rb-fsevent (OSX) actually has a built-in delay anyway. Ironically, in "file mode" (which currently uses a "half-polling" approach), rb-fsevent generates too many events, so it would clog up the network and tie up the VM.
From my point of view (Listen maintenance and development), it's OSX which is the problem. Both fsevent and HFS are terrible design decisions, causing all the headaches.
OSX is actually one of the reasons I'm not developing Listen much - I'm afraid of breaking something on OSX. (Happened before).
People like Mac, so they're basically slitting their wrists by running docker inside a VM - just to be able to "work on a Mac" while "hosting on Linux".
I'm not bashing OSX, just saying that Listen is just one huge workaround for OSX. (Otherwise we'd all be using rb-inotify and/or inotify-tools). And I'm not interested in OSX at all.
Meanwhile, I found that installing a full workstation setup inside Docker gives you both filesystem monitoring and full tools. For isolation I just use plain old UNIX user accounts - installing Ruby, RVM, tools, etc. in /home/dev. If I need graphical tools, I can just `ssh -X`.
It's kind of backwards, because I'm now using docker for "production-env-based workstation snapshots" instead of bare "production-env snapshots". I can pick an image with more or less dev tools installed - depending on the issue at hand.
There's another "solution" - have an editor plugin that just uploads the saved file into the VM. You probably have SSH anyway.
Folks who are watching this issue: I got some advice from @e2 and put together a workaround that turns change notifications into webhooks using Listen, and uses guard-yield to catch them.
Overview:
#246 (comment)
Gist:
https://gist.github.com/joshco/6c84fd30dfbe0118bc42570000eb268d
Closing this out as TCP support was dropped a while ago.