go-distributed/meritop

Short Term Roadmap


I will reiterate what Xiang said last time about what needs to be done in the short term.

  1. Get the framework and the regression test code up and running correctly on a single host, with each task running inside its own goroutine. There are two stages here:
    a. Be able to run the regression test end to end. This requires us to finish coding the framework, including message pulling between tasks over the network. We might need a dummy controller that starts all the tasks in goroutines and makes sure the entire taskgraph iterates through the intended epochs (controlled by the application). I will work on moving the dummy task into framework_regression to do this.
    b. Add fault tolerance support, and test it on a single host the same way: we randomly kill some task goroutines and check whether they recover.
  2. Then we move on to implementing the controller and its interaction with a resource management system (k8s, for example), so that we can actually run something meaningful on real hardware. We should make sure the Docker-based dummy regression task runs, and we can also implement one or more "real applications", assuming storage is taken care of (by AWS, for example). One candidate is Spark in Go, but we should also consider other possibilities. The main work in the framework will be handling the growing number of issues that come up in real-world usage.
  3. Once we have enough confidence in the framework, we should focus on how to push it out, both to developers and through other channels such as Cloudera. We should of course be doing this all along, but after we have really made sure the framework works in the previous two phases, this is the phase that will make or break the effort. I think this is a very fundamental piece, and we should play our cards carefully to really make it work.

The key message, I guess, is that we should really put in the effort and make sure the framework is truly usable; then we are half done. The other half is to spread the word and get adoption up. We still have some way to go, but I like the direction we are going.

My understanding is that we are still in phase 1a, so let's get it going.

@hongchao, we should sync up on WeChat to finalize how to do the dummy-task-based regression test using goroutines.

Xiaoyun

Tremendous write-up! Thanks @xiaoyunwu.