perwendel/spark

Providing Async method with Java 8 CompletableFuture

pprun opened this issue · 44 comments

The Spark API is currently very simple and blocking.
Since Java 8 introduced async-style methods via CompletableFuture,
we hope Spark can scale to very large-throughput applications.

If you're interested, we got pretty good turn-around with a simple async model. The essential structure is outlined here:

Response times really dropped. Unfortunately JAXB (XML) processing remains the real processor hog, especially when we timed both the data comms and the XML. HTTP GETs and PUTs will be a lot less intensive; it's a good fit.

o0x2a commented

It would be great if Spark offered non-blocking APIs. 😁

👍

+1 for non-blocking

suzel commented

👍

vrcca commented

+1

+1

krrg commented

+1

👍

ruurd commented

Actually, -1. Let spark be simple. And blocking.

Non-blocking introduces a lot of other complexities that would have to be handled also. I say NO.

krrg commented

@ruurd In the spirit of an open discussion, could you expand on this? What complexities did you have in mind?

ruurd commented

Lots of threading, for example? How many requests would you want to handle simultaneously as a process? What to do if you pass that threshold? What to do once you have passed it and the number of simultaneous requests drops below the threshold again? How can you simply and meaningfully configure this kind of stuff? Should the configuration be changeable on the fly? And and and...

Besides. If spark cannot process requests fast enough, it is simple enough to put it behind a load balancer.

krrg commented

Non-blocking != Lots of Threading. Although threads are one way of implementing a non-blocking server, it is not the only way. See https://docs.oracle.com/javase/8/docs/api/java/nio/channels/Selector.html and http://tutorials.jenkov.com/java-nio/selectors.html#why-use-a-selector, for instance.

The point here is that you can multiplex many requests on a single thread. Obviously this raises different questions of implementation, but lots of threading doesn't have to be one of them.
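To make the multiplexing idea concrete, here is a minimal, self-contained sketch of a `Selector`-based loop. This is not Spark code, just an illustration of many channels being serviced by one thread; a real server would also handle reads and writes inside the loop:

```java
import java.net.InetSocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

public class SelectorDemo {
    public static void main(String[] args) throws Exception {
        Selector selector = Selector.open();

        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0)); // ephemeral port
        server.configureBlocking(false);                    // required before register()
        server.register(selector, SelectionKey.OP_ACCEPT);

        // A real server would register many channels; one thread services them all.
        SocketChannel client = SocketChannel.open(server.getLocalAddress());

        selector.select();                                  // blocks until a channel is ready
        for (SelectionKey key : selector.selectedKeys()) {
            if (key.isAcceptable()) {
                SocketChannel accepted = server.accept();
                System.out.println("accepted: " + (accepted != null));
            }
        }
        client.close();
        server.close();
        selector.close();
    }
}
```

The key property: `select()` reports readiness for all registered channels at once, so one thread can interleave work across thousands of connections instead of parking one thread per connection.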

@ruurd: Actually, the blocking version causes more threading, as the only way to scale a blocking API is to feed it more threads, while the non-blocking variant scales quite nicely with a relatively low number of threads.

ruurd commented

Nice trick, zeroing in on the threading stuff :-) The main point is that non-blocking IO is going to make Spark bigger and more difficult to configure. I think Spark is small, lightweight, easy to get running, short time-to-market, microservice. If your problem does not fit, find another tool.

@ruurd stfu!
+1, big time! again!

ruurd commented

@yeshodhan stfu yourself!
-30000.

o0x2a commented

@ruurd Using nio instead of io will not make things hard for you buddy, so just chill.
Use your time to read on the topic instead.

ruurd commented

@Code-guru 1) I'm not your buddy 2) tell @yeshodhan to chill, he started this and 3) if you really want to use something that embraces async and experience the related difficulties, use node. Using an asynchronous IO paradigm will make Spark harder to use, harder to maintain, harder to debug, will increase the number of failure modes it has to deal with, and just plain does not fit in with what Spark wants to be: easy, small, lightweight.

tipsy commented

For the people who want this, just how large are your applications?

Making Spark async is not on the roadmap currently, mostly because of the reasons @ruurd just mentioned. We think that ease of use is the main selling point of Spark, so we're very wary of changing the current paradigm into something more complex.
We'll have a look at it for Spark 3, maybe we can find a way to make it extremely simple to use.

krrg commented

My service was ~2000 lines of code, servicing about 100,000 HTTP requests a day, usually within a 12 hour window.

We ended up using Vertx, since it supported async, and had the words "Lightweight" "Easy" "Fast" and "Simple" on its homepage.

tipsy commented

@krrg Thanks. Did you have performance issues with Spark, or was it a 'better safe than sorry' decision? Did you do a comparison test?

LeifW commented

A Scala version of this framework, Scalatra, added non-blocking IO support, using Servlet 3.0+. It's not in the core, but an add-on module.
Given current Spark syntax get("/hello", (req, res) -> "Hello World"), a version using Java 8 CompletableFuture might return a CompletableFuture<String> instead of simply a String, e.g. get("/hello", (req, res) -> CompletableFuture.completedFuture("Hello World")) or get("/hello", (req, res) -> someAsyncHttpRequestTo("http://google.com/?q=foo"))
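A minimal sketch of what that could look like. The future-returning route registration is hypothetical (Spark has no such API today), but the `CompletableFuture` plumbing underneath is standard Java 8:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncRouteSketch {

    // Hypothetical registration (NOT a real Spark API):
    //   get("/hello", (req, res) -> hello());
    // The framework would detect the CompletableFuture return type and
    // complete the HTTP response whenever the future completes.
    static CompletableFuture<String> hello() {
        // supplyAsync runs the supplier on the common ForkJoinPool,
        // freeing the caller's thread immediately
        return CompletableFuture.supplyAsync(() -> "Hello World");
    }

    public static void main(String[] args) {
        System.out.println(hello().join()); // prints "Hello World"
    }
}
```

The handler itself never blocks; composing a downstream async call would just mean returning that call's future (or chaining it with `thenApply`).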

In my opinion, an async version can be less work to configure, as I don't have to pick ahead of time the number of request threads in the servlet container pool (it usually just runs on one thread per CPU core).

Some other JVM web frameworks supporting async: Finagle, Netty, JAX-RS, Scalatra, Servlets, Spray, etc...

krrg commented

@tipsy It was more "better safe than sorry" approach. Unfortunately I don't have any performance results.

ruurd commented

@krrg and if you are a vertx user, what is your interest in turning Spark into vertx? And @LeifW I think that Scalatra is a Scala framework patterned after Sinatra.

ruurd commented

@LeifW not having to pick the number of request threads introduces unexpected behavior in that case. What if you have to use your server for additional tasks? How are those tasks going to deal with a program that just hogs all CPUs because it feels like it? So instead of configuring Spark you will need to configure something else NOT to hog your CPU. I'm a big believer in convention over configuration, but in this case it most probably will bite you in the proverbial behind the moment your service is used outside of a development environment. Having to configure the number of request threads forces you to plan ahead for the case where that number is insufficient.

@ruurd I must admit I have not had any reason to configure it, but from what I understand, the underlying fork-join API backing the async servlet stuff has some knobs for tuning the threading behavior.

As a consumer of the async API I really don't have to do anything too different. Basic async servlet examples make the difference clear and very easy:

  1. Get an AsyncContext from the request.
  2. Run your processing on a separate thread with the attached async context.
  3. Call asyncContext.complete() when done.
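The three steps above can be sketched as a plain Servlet 3.0 handler. This assumes the pre-Jakarta javax.servlet namespace and needs a 3.0+ container to actually run; the URL pattern and class name are illustrative:

```java
import java.io.IOException;
import javax.servlet.AsyncContext;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// asyncSupported must be declared, or startAsync() throws IllegalStateException
@WebServlet(urlPatterns = "/slow", asyncSupported = true)
public class AsyncServletSketch extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) {
        AsyncContext ctx = req.startAsync();   // 1. get the AsyncContext
        ctx.start(() -> {                      // 2. process off the container thread
            try {
                ctx.getResponse().getWriter().print("done");
            } catch (IOException ignored) {
            } finally {
                ctx.complete();                // 3. signal completion
            }
        });
    }
}
```

`ctx.start(Runnable)` here delegates to the container's own pool; as noted below, you could just as well hand the Runnable to any executor of your choosing.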

Servlet 3.0 api itself doesn't really impose any specific threading strategies on you.

Most of the very simple samples on the internet use simple thread executor to execute a long running task off the servlet request processing thread, which makes it very malleable to thread pool configuration and execution strategies.

As for the Spark API surface area, I imagine that if I register a handler for an endpoint that returns a CompletableFuture instead of a plain result, that should be enough to signal that I really want it to be run asynchronously. I imagine there's really no more complexity required.

ruurd commented

@luolong OK, the scenario I see before me is that you fork off a long-running process, then rip through the handler in no time flat and spend the rest of the time waiting for the forked process to return the end result. Where did my gains go? And how long is the requestor waiting for a result?
There is only one scenario in which I can imagine that this could make sense at all: the case where no one is waiting for a result at short notice (most websites have an NFR that specifies 3 seconds max waiting time for all top-level requests in the 99th percentile). Even long-running processes are hampered by the fact that the browser will close the connection after a given amount of time. So max runtime would be what? 30 seconds?
I think that microservices should be engineered to yield a result in something in the order of 100 ms tops. And that they should be engineered to run only a single task per request. Anything long-running should be handed down to a different process over a bus as a fire-and-forget. Synchronous IO makes it much easier to measure performance, is easier from a development and testing perspective, and the resulting services have more deterministic behavior, meaning that it is easier to derive how the service should be horizontally scaled. If you need to scale, then use something like Kong. That is specially made for managing microservices and allows you to keep microservices what they are: micro. simple. fast. synchronous :-)

vietj commented

@ruurd indeed, to fully benefit from non-blocking / async you need non-blocking services as well, otherwise there is no real gain, just more complexity. In the scenario you describe (async request / blocking service), you merely move the blocked thread from the IO layer to another thread (usually a worker pool). However, your users could use a non-blocking service like a Cassandra client. That being said, to me the fundamental problem is that servlet technology is blocking by nature, and the non-blocking programming model provided by the servlet spec is not trivial (frameworks should make it easier).

Don't get me wrong, I'm not advocating for supporting async in SparkJava; you are the boss. I'm just shedding some light on the benefits / drawbacks of async.

Well, @ruurd you can certainly do as you like with this framework. It seems that you have thoroughly thought about this issue and decided against it. I might not share your views, but I do respect them.

The reason I was interested in async support in Spark was that my use case was an intermediate service set up to translate web requests from an internal service API to an external service that had a very high probability of being slow. In addition, the internal API was heavily asynchronous.

Having async support in this situation was highly desirable. Anyway, that project is now long done and forgotten -- I ended up simply implementing bare bones Servlets and using the async support provided by Servlet spec instead.

tipsy commented

Just to clear things up, @ruurd is not a Spark maintainer.

vietj commented

@tipsy sorry for the misunderstanding, anyway I just gave my opinionated view whoever the boss is :-)

+1 for async APIs. But blocking APIs should remain as well; engineers should choose between them.

+1

PR submitted in this thread if anyone is interested in reviewing (it's a bit of a spike and not merge-ready yet): #549

+1

kran commented

-1 of course.

My five cents:
When discussing sync/async request handling, wouldn't it be reasonable for a framework to offer both options,
and leave the decision about what to use to the developer?
Whenever one has to implement some feature, it's great to have the technical capability to do the stuff without needing to leave the framework.