CodeNow/api

Proposed API Restructure

Opened this issue · 12 comments

API Next

There has been a good amount of discussion about restructuring (rewriting) the API for various reasons, but I don't think we have ever iterated all the reasons and discussed why we want them. This document will attempt to do just that and propose a new way to format our API code so it is more reliable, readable, and testable.

Issues

Various reasons have been proposed for why we need to update our API. I want to focus on a few specific cases:

  • no coverage reports available for routes
  • unit tests are lacking for models
  • conditional logic in routes requires functional testing of the route
  • our functional tests take a long time to run

Coverage, and Route Conditional Logic

Currently, coverage reports are tough to get for API's tests. Of our two test suits, only one of them has coverage enabled: the unit tests. Coverage is a good metric for ensuring code is tested, but when we don't have enough tests to cover all our code, we won't get good feedback from this metric.

Interestingly, the argument has been made that we need "coverage for our routes". Express itself makes this difficult because of how it loads its middleware. Getting coverage in our routes is virtually impossible because of this, and I don't believe that coverage for our routes is a very good metric for our testing (yet).

The reasoning behind the argument for "coverage for our routes" stems from the fact we have conditionals embedded in our route flows. This, I strongly believe, is erroneous behavior to have defined in a route. It makes testing very difficult because we have to create complicated functional tests to trigger routes at specific times with specific conditions. All of this is moot, however, because we cannot get coverage in the first place.

Increasing coverage in the appropriate places and removing conditional logic in routes would be very solid first steps to increasing the confidence in and the testability of our API.

Unit Tests

In the discussion of coverage, I mentioned that we only have it turned on in the unit tests. This, in my view, is the correct place for coverage. However, since it has not been the paradigm to write unit tests for models from the beginning, we are lacking unit tests for many files. Currently (as of 2015/09/14), the coverage of what we do have unit tests for is at 86.42%. While this is respectable, it's still a far cry from getting all our files covered. Our testing framework, Lab, doesn't include files in coverage that aren't run during the unit tests, so we aren't including any file in which we have 0% coverage.

Addressing this issue means we need to do two things. First, we need to make sure that un-tested files are included in the coverage reports. Lab does not do this, so we need to make it visible. Then we will be able to actually write the tests and ensure our models are covered.

Test Duration

As a side effect of our in-route conditions, we have had to write a good deal of functional/integration tests that make full-blown requests through our API to test features. By increasing the number of unit tests (which are by definition much smaller and faster) and reducing the number of functional/integration tests (calling all the way through the API), we should be able to speed up our tests dramatically. This will increase developer confidence and allow us to move faster and debug problems more rapidly.

Proposed Changes

There are a handful of changes that I would like to propose and rules for moving forward.

  • Redefine the roles of Routes, Middleware, and Models
  • Remove conditional logic from routes
  • Strengthen unit tests for Models, enabling coverage

Ultimately, I want to get closer to a "Model View Controller" methodology, but Express's names and how we've architected all our logic puts us pretty far removed from being able to define those three things clearly.

These tasks are very much related, but let's try to address them each individually.

Routes, Middleware, and Models

Express is a minimal and flexible Node.js web application framework that provides a robust set of features for web and mobile applications.

That's the way that Express's homepage defines its product. What this means to me, really, is that it is not opinionated when it comes to your usage. It provides the request/response/http server for you and ways to define behavior on routes. Unfortunately, this means that without really well defining how one is going to use Express before you start, it can become messy (which, I believe API has become).

Express's lack of opinion actually semi-infuriating at times. Even the examples directory contains vastly different ways of using Express (and I'll just ignore the fact that I can't seem to find any tests in any of these examples). Two that I found useful at all are route-separation and mvc.

a quick aside...

I think that we are using express in a very broken way. By having our routes handle a lot of conditions and logic, we're making it very, very difficult for ourselves to test our code. I am to blame for this as well; I remember talking to TJ about adding then() and else() to dat-middleware. At the time, when the API was young and we were moving very quickly, it seemed like a grand idea. However, as API has become more complex, these conditional routes that we created have become a major headache. In order to move away from this, however, we need to define how to do it and where to move our conditionals.

Middleware (and branching)

"Middleware" has been abused in our routes. Having Middleware to control flow is tricky to debug and is something that has bitten us. A better definition for Middleware in an MVC viewpoint is "a function that prepares a request for use by a Controller". This means that it doesn't make any changes to a model (there are one or two exceptions, like sessions). It simply gathers information and prepares information for the Controller.

A very clear example of this in Middleware is body-parser. It looks at the body it was handed and turns it into a JSON object that is more friendly to us and provides it as req.body. That value can then be used by a Controller or modified by another Middleware.

One case were I do see branching with Middleware to be useful is in validation. If Middleware ever 'branches' in some way, it should be for the sole purpose to fail and to fail quickly. For example, dat-middleware exposes functionality like mw.body('key').require(). This checks that the request body contains key and causes an error if it does not. This makes validation very easy and clear: this code either validates the body or returns an error. A good case for dat's modification methods is mw.body().pick('key'), which prunes body to just contain body.key and allows us to prevent extra values being present. We know that dat-middleware is very well tested and has been validated, so it is a very useful Middleware.

What we need to remove and avoid doing in the future is the following:

mw.body('key').require()
  .then(doSomething)
  .else(doSomethingElse)

This conditional is very simple and is one that you may think is 'easy to test', but this is the mindset we want to avoid. The overhead to test this condition is substantial: you have to create an entire request to the API, at least twice, to validate that both Middleware. This conditional branching Middleware is difficult to test and maintain. Where we move these conditionals to is forthcoming.

There is one reasonable use for .then(): it can be used to conditionally modify the request if a value is present. However, it would be best to hide this condition as best as possible. Let's say the following exists in a route:

mw.query('username').require()
  .then(
    someHelper.create(),
    someHelper.model.findUserId('query.username'),
    mw.query().set('userId', 'someHelper.result'),
    mw.query().unset('username'));

This is quite a handful to read and isn't instantly clear in a route. If this a defined Middleware name replaceQueryUsernameWithId(req, res, next) which was just invoked, it would be immediately clear what happened to the request. Additionally, while writing and validating this new Middleware, you could test it with a fake request and make sure it is modified correctly. Importantly, the entire route doesn't need to be called to test this Middleware, and a simple unit test suite for this Middleware prevents it from being changed without acknowledgment from the developer and possibly breaking other routes that use it.

(Note that I used someHelper.findUserId and never mentioned testing it specifically. If that helper and method were written on the model and then wrapped with library [e.g. middlewarize] to make it Middleware-friendly, there should be unit tests written for that Model method already. When the custom Middleware is written and tested, the Model can be mocked out or stubbed so the Middleware chain can be tested as a unit.)

Given this quick discussion, we can come up with two rules for any given middleware:

  1. Middleware only modifies the request object in order to provide better objects for the Controllers (we'll get to Controllers in a bit)
  2. Middleware never creates a conditional branch except to fail quickly (e.g. parameter validation)

Models

Models we have defined (more or less) as something that interacts with an external service. This includes things from the Instance Model (interfacing with MongoDB and Neo4j) to the Docker Model (which, unsurprisingly, deals with Docker). I believe we are using these very well. On occasion, we can use these Models and their methods directly in our Middleware chain definition for a route to perform actions. We do this through the use of middlewarize, mongooseware, or similar, that wraps the Model conveniently so the Models can be stored on the request and worked with in a Middleware fashion. Being able to say instances.findOne({_id: 'params.id'}) and put the Instance on the request is very powerful.

These Model methods should be fairly straight forward and simple, if we are to use them as Middleware. findOne is a great example: it takes an object as options, calls to the database, and calls back with the Instance. Like dat-middleware, we have to trust that any library wrapping this Model to be used as Middleware is behaving correctly. So long as the wrapper takes the called back data and puts it on the request in the correct place, all we have left to test is the Model call itself. A couple quick unit tests later, we have 100% confidence that this will work every time.

Model Methods as Middleware Tangent

One thing I do want to make clear is that we are taking Model methods and wrapping them to use them as Middleware. For example, take this Model method:

Walrus.prototype.eatFood(amount, callback)

We are able to call walrus.model.eatFood(4) in a route and the Walrus on our request gains energy (or whatever effect comes from eating food). What we do not want to do is the following:

// This is bad. Don't do this
Walrus.prototype.eatFood(req, res, next)

This means that the Model has knowledge of how it's being used. Using this function from another Model becomes very tricky as well because you have to build up a request and a response to be used, and you don't know if this method responds or simply continues with next (or conditionally does either - scary).

All of this I like to summarize as the following: using models as middleware (wrapped through some library, usually) is a powerful tool, but models should never provide middleware.

Routes and Controllers

Now that we've discussed some important details about both Middleware and Models, we need a way to connect them to a route. In the traditional MVC way, we want to place our Middleware at the beginning of our routes and then use Controllers to call our Models. However, Express has no formal notion of 'Controllers', because it is "minimal and flexible"...

Take a look at the MVC example in Express's repository. They have a interesting methodology they employ to define "Controllers". Specifically, the Users Controller. In it, there are 4 methods defined that directly call a function on res: list, edit, show, and update. These are the defined controllers for GET /users, GET /users/:user_id/edit, GET /users/:user_id, and PUT /users/:user_id, respectively. Now, these Controllers are very simplistic. They simply are rendering data or making a very simplistic change to the 'database'. You can use your imagination to see how models could fit into this paradigm.

There is a little bit of magic happening in lib/boot.js to map routes to these Controller functions. Importantly, here is the bit I want to highlight (omitting some logs):

if (obj.before) {
  app[method](path, obj.before, handler);
} else {
  app[method](path, handler);
}
// Note: app[method](path, ...) is similar to app.get('/users', ...)

I think this is very telling of how things should be structured. What I didn't mention about this example is the before method in the Users Controller. It is used here as Middleware before the handler is called. This Middleware before is defined in the Controller and if you look at it, you can see it doesn't do anything except put a user on the request object (there's some magic with next('route') in there, but we can ignore that for now). That function is what is run before anything that could or would deal with the Model (the Controller) would run. This matches precisely with our new definition of Middleware: simply modifying the request so a Controller has the correct data.

The Big Finish

Did you see what happened there? "Controllers" showed up! And here's where I think Express lead us astray. Everything that looks like function (req, res, next) is called 'middleware' in Express-land. Unfortunately, everything looks like that. If I were to ask you what are 'Controllers' are in the current API, you could only say "the routes", but that's not strictly true. This culmination of our conditional Middleware has put our code into a weird, tricky spot to test, and we can't do it without writing crazy requests to our API that have a high amount of overhead.

I don't necessarily like moving all the validation/before logic into Controllers like the MVC example because we have some very good, well tested tools that allow us to write Middleware cleanly and quickly. When we do write our own, we can write and expose it in such a way that it is able to be unit tested, improving our confidence in the code! We do need to rid our routes of conditional branches because it doesn't make sense to have the conditions in the routes. It makes it very difficult to test and to read, and conditions in the routes means that we can never guarantee coverage of it.

Functional Example

Let's bring all this discussion home with a relatively simple example. For this, we will use an API that provides access to Walrus models. Walrus models are capable of accepting Gifts (which can be looked up in the database) and having Energy. Some gifts are edible (Food) and some are not (Shoes). The Walrus will throw away inedible Gifts, expending Energy, but will eat edible Gifts, gaining Energy. The Walrus is given Gifts by doing a PUT request to /walrus/:id with a Gift in the body ({gift: 'shoe'}). Here's a simple example that demonstrates some of the complexity we currently have in the API (with some comments for clarity):

app.put('/walrus/:id',
  mw.params('id').required().string(), // validate and require `id`
  walrus.findOneById('params.id'), // find our walrus!
  mw.body().pick('gift'), // we only accept `gift` in the body of the request
  mw.body('gift').require().string(), // validate and require `gift`
  gift.findOneByName('body.gift'), // look up the gift in the database
  mw.body('gift.isEdible').validate(equals(true)) // if the gift is edible
    .then(
      walrus.model.throwGarbageAway('gift'), // throw away the inedible garbage
      walrus.model.mopeAndGroan()) // and now the walrus is sad
    .else( // otherwise the gift is edible
      walrus.model.consumeFood('gift')), // eat the food!
  mw.res.json('walrus')); // respond back with our walrus, who will have updated energy

If you've spent any amount of time in our API, then you know that this kind of pattern is pretty prevalent. What's tough about this is that in order to test that the Walrus does the correct thing with the Gift we have to spin up the entire API and make multiple requests with multiple Gifts and validate that all our Walruses have correct values at the end.

"But, you have two model methods here (throwGarbageAway and consumeFood) that you can test! That's good, isn't it?"

Yes, but no. This pattern of a route also leads people to simply mocking the Walrus and making sure the right model methods are called. We are doing that because we are trying to test the conditional logic in our route! This is what makes our tests so difficult and expansive, especially when we get nested conditions. The conditions here are also impossible to get coverage on (because of reasons eluded to above).

Here's the same route written with a different mindset:

// Middleware: for all /walrus/:id routes
app.all('/walrus/:id',
  mw.params('id').required().string(), // validate and require `id`
  walrus.findOneById('params.id')); // find our walrus, put it on the request

// Middleware Helper(s):
var validateGift = flow.series(
  mw.body().pick('gift'), // we only accept `gift` in the body of the request  
  mw.body('gift').require().string(), // validate and require `gift`
  gift.findOneByName('body.gift')); // look up the gift in the database

// Route(s):
app.put('/walrus/:id', // earlier middleware has provided the walrus on the request
  // Middleware:
  validateGift, // this is a route-specific middleware, so we put it here
  // Controller:
  walrus.model.processGift('gift'), // process the gift, Walrus!
  // View:
  mw.res.json('walrus')); // respond back with our walrus, who will have updated energy

There isn't much changed with this refactor. Importantly, the conditional logic is gone from the route! The route doesn't know any more than it needs. And notice how this reads (the comments in the routes for clarity): Middleware, Controller, View.

There is 'global' Middleware for all /walrus/:id routes at the top. Following that is a Middleware helper that is specific to this route which does some additional work to verify and gather information about the gift. This is all functionally exactly as before.

The Controller happens next, where the Walrus is now simply calling processGift. The route doesn't do any conditional checking, the Walrus is now responsible for that! processGift is probably a function that simply calls throwGarbageAway or consumeFood as before, but now the route doesn't care. The Model function for processGift now should be completely unit tested with all sorts of Food and Shoes, and it makes the appropriate calls as necessary. The best part is that the route never needs to be called to check that conditional.

Finally, the View takes our updated walrus and returns it to us. Pretty straight forward.

What's left to test here? Our unit tests take care of "consume or throw away the Gift based on if it is edible", so we don't have to make a bunch of requests to test that. Our helper Middleware validateGift can be unit tested by mocking out database calls and making sure it causes an error if the gift doesn't exist. All we functionally care about is that if we call this route with a valid food or shoe, the walrus behaves accordingly. No more truth-table-like validation of routes, we can simply make sure the route behaves correctly.

Express doesn't make MVC very easy, and that's why we need to define these better. By separating our Middleware from our Controllers (and our Views), we can define routes in better ways to utilize Models and Controllers in a way that's more logical and testable.

In Conclusion

More or less, the ultimate crux in our testing pattern is that we've built up a massive amount of debt to conditional logic in our routes and not enough Models with unit tests. If we are able to make Models more appropriately handle any given input, remove conditions from our routes, and clean up our Middleware, we should be able to have a much easier time moving forward with the API. This discussion is one that only focuses on the formatting of routes and the roles of Models and Middleware, but it is one step in the process of improving our lives as developers and maintainers.

Let's take one last look at the proposed changes, with a little more detail:

  • Redefine the roles of Routes, Middleware, and Models

    By using Middleware as something that only prepares the request for Controllers, we limit the scope of what it does and can easily separate out 'Controller' logic from the 'Middleware' logic (logic which is all Express 'Middleware').

  • Remove conditional logic from routes

    Bringing conditions out of the routes and into Models/Controllers allows us to test these conditionals much more easily, bringing down the number of required functional tests and increasing the number of unit tests. This speeds up our testing feedback loop considerably.

  • Strengthen unit tests for Models, enabling coverage

    As we move conditional logic to Models, we can ensure that they are covered much more easily with unit tests and get better feedback out of coverage reports.

Edit: formatting

Nice write up. Having this all written out helps as a reference to everyone as we move forward.
I have a few nits with some of what you've written above, but I think our conclusion is the same.

Here is the summary in my words with more action items and tips:

  1. Move conditional logic from routes to controllers and models
  • Stop using middleware-flow for conditionals in routes
  • Beware of overloading mongo models, and calling too many external models;
    to avoid this create a controller to function as the pipework between models
    used in a particular route.
  • I agree we should make our models heavy, but I would like our mongoose models' responsibility to be focused on database operations.
  • For our more complex models, we should consider creating overarching models that can be overloaded w/ complex behavior instead of bloating our database models.
    2) Strengthen Unit tests
  • Everything in our lib directory should have unit tests.
  • Unit tests should completely isolate functions, and mock out any external function calls and data.
  • For example: our unit tests need to be fixed to not touch the database.
  1. Define roles of Routes, Middleware, and Models in API
    ** We may have some slightly differing opinions here, so let's chat and come to some solid conclusions **

I never coded in my life, but I'm a strong proponent of OOP. It's ok if we overload the models, and treat them as objects with roles, properties and methods.

Controllers IMO should be very thin. Some code duplication is OK, people optimize too much for this.

I agree with Yash, I think current trends in MVC recommend making your models "heavy".
Let me be more clear about what I meant about mongo models: I think we should be careful about overloading our mongoose models, as they should be closely related to mongodb. I think it would be better to create an overarching model for our entities and overloading it instead.

Also, why does re-doing the api server matter if we are unbundling it?

In 1 year from now, the api server will just be an http server that writes stuff to a queue?

We are discussing patterns to be formalized as we move forward. Specifically, testing patterns are immediately important. Focusing on and bulking up our unit tests will ensure we can keep development on API fast. Currently, API has a lot of integration and functional tests which have become harder to maintain as API changes.

We have had a few meetings centered around this over the months, and Bryan has taken some initiative so that we can formalize and document our thoughts. The backend has been looped in on our direction in the meetings, but having a formal document will help us as we move forward and this is the start of it.

I would like our mongoose models' responsibility to be focused on database operations... For our more complex models, we should consider creating overarching models that can be overloaded w/ complex behavior... I think we should be careful about overloading our mongoose models, as they should be closely related to mongodb. I think it would be better to create an overarching model for our entities and overloading it instead. --@tjmehta

This is something I didn't specifically cover in depth because it can get complex. More or less, the cleanest solution in my mind would to define some class that allows us to handle complex operations. For example, an InstanceDeployment class could take an Instance and interact with all the other models it needs to deploy the Instance. This class then can be instantiated and used as part of the 'Controller' middleware, and it's easily testable because you can require the class as a unit and make sure model methods are called appropriately.

To make it concrete, instead of something like this:

app.post( //...
  instance.findOne(),
  sauron.allocateNetwork('instance'),
  mavis.getDock('instance'),
  docker.deployInstance('instance', 'sauronResult', 'weaveResult')

or something like this, where the model hides all the logic but talks to multiple other services

Instance.prototype.deploy = function (callback) {
  series([
    sauron.allocateNetwork,
    mavis.getDock,
    docker.deployInstance
  ], function () { callback(self); })
}
// and then having
app.post( // ...
  instance.findOne(),
  instance.deploy())

you could end up with something like:

InstanceDeployment.deployInstance = function (instance, callback) {
  series([
    sauron.allocateNetwork,
    mavis.getDock,
    docker.deployInstance,
    instance.updateWithStuff
  ], function () { callback(updatedInstance); })
}
// with a route like
app.post( // ...
  instance.findOne(),
  InstanceDeployment.deployInstance(instance))

InstanceDeployment is something that could be very easily unit tested and then you don't built up inter-model dependencies by having the other models buried in the Instance model.

All this to more or less say: yes, the models for the Instance should more or less only deal with the database. If we can limit models to that, it should make it easier to unit test the entire file. Defining the 'overarching models' is a little more tricky but with some thought, it can happen.

Also, why does re-doing the api server matter if we are unbundling it? In 1 year from now, the api server will just be an http server that writes stuff to a queue? --@ykumar6

First, OOP is awesome, but with that comes things like separation of concerns and other paradigms that make the design cleaner. We are just trying to define here the best ways to deal with our complexity moving forward.

As for why to re-do the API? We are spending a lot of time now dealing with the complexity of the API and it's become cumbersome to do anything in it. We need to fix that debt before we can really even consider doing more things to and with the API.

Brilliant Bryan, I really like what is proposed here. The basics are solid and I think we should move forward with them.

I agree with TJ where mongo models should be just for talking to DB. Then we can create models like InstanceMachine (name pending) which would do things like

InstanceObject.prototype.deploy = function (callback) {
  series([
    sauron.allocateNetwork,
    mavis.getDock,
    docker.deployInstance,
    instance.saveToDb
  ], function () { callback(self); })
}
// and then having
app.post( // ...
  instance.findOne(),
  InstanceMachine.deploy())

@bkendall great write up.
I agree with proposed solution and I think all of us are pretty much on the same page.
I think our previous way of doing things (middlewares) was great for the phase of fast development/prototyping (our was actually concise and readable) and our new way would be great (heavy models) for the current/next phase, where we are focusing on stability.

Also I want to mention that we are already moving this way by workerizing our code.
While creating workers we are moving conditionals from the routes into unit-testable workers.
Basically we are doing this right now in small steps/iterations which I think is the best way to tackle refactoring.

I just finished re-reading this now that I'm back from my week off. I'm in complete agreement with the analysis and the proposed initiatives and I'll try to support it going forward when refactoring or building new logic.
I also agree with Anton that some of these initiatives are currently being accomplished through moving various tasks into our worker-queue system ("workerizing"). For instance, the discussion of creating over-arching parent models to orchestrate a series of operations in the models - the workers themselves are assuming this role.