aws/aws-cdk-rfcs

Monolithic Packaging

eladb opened this issue · 10 comments

eladb commented
PR Champion
#122 @rix0rrr

Description

The CDK consists of 100+ packages, one for every AWS service and then some, with
complex interdependencies. For example, the aws-ecs package depends on core,
aws-iam, aws-ecr, aws-ec2 and more packages to do its work. In fact, it
depends on 23 other packages.

This means that when a user wishes to use the aws-ecs module, their package
manager needs to fetch all 23 dependencies.

This proposal suggests to bundle and release the AWS CDK as a single monolithic
module. This means that the way users consume the CDK will change in a breaking
way.

Progress

  • Tracking Issue Created
  • RFC PR Created
  • Core Team Member Assigned
  • Initial Approval / Final Comment Period
  • Ready For Implementation
    • implementation issue 1
  • Resolved
eladb commented

Based on a whitepaper by @rix0rrr.

The ultimate root cause of the issues we are encountering is a specific feature of the NPM package manager, one that both encourages a package ecosystem that “just works” with minimal maintenance and versioning problems, as well one that significantly simplifies the package manager:

Packages can appear multiple times in the dependency graph, at different versions.

It simplifies the package manager because the package manager does not have to resolve version conflicts to arrive at a single version that will work across the entire dependency tree, and report a usable error if fails to do so. In the presence of version ranges, this is an open research problem, which NPM conveniently sidesteps.

The Feature

As an example, a package tablify could depend on a package called leftpad at version 2.0, while the application using tablify could be using leftpad at an older and incompatible version 1.0. The following dependency graph is a valid NPM dependency graph:

app (1.0)
+-- tablify (1.0)
|   +-- leftpad (2.0)
+-- leftpad (1.0)

Loading dependencies is done in NPM by calling require("leftpad") or import { ... } from "leftpad". If this statement is executed in a source file of app, it will load leftpad@1.0, and if it is executed in a source file of tablify it will load leftpad@2.0.

The feature of multiple package copies that get loaded in a context-sensitive way is not a common one. The only other package manager that I know of that supports this is Rust’s package manager, Cargo. Most other package managers require all packages to occur a single time in the dependency closure. To do so, I believe most employ a conflict resolution mechanism that boils down to “version specification closest to the root wins” (feel free to add counterexamples to an appendix if I got this wrong).

The Problem

All is well with this strategy of keeping multiple copies of a package in the dependency tree, as long as no types “escape” from the boundary of the package’s mini-closure. In the typical case of tablify and leftpad, [one can reasonably assume that] all values going in and out of the libraries are types from the standard, shared runtime (in this case strings), and the only thing being depended upon is a specific behavior provided by the library.

However, let’s say that leftpad contains a class called Paddable that tablify accepts in its public interface, and that the interface of the class had undergone a breaking change between version 1 and 2 (in JavaScript):

// --------------- leftpad 2.0 ---------------------------
class Paddable {
  constructor(s) { ... }
  padLeft(n) { ... } 
}
// --------------- tablify 1.0 ---------------------------
// Expected: Array<Array<Paddable>>
function tablify(rows) {
  return rows.map(row => row.map(cell => cell.padLeft(10));
}
// --------------- leftpad 1.0 ---------------------------
class Paddable {
  constructor(s) { ... }
  
  // Oops, forgot to camelCase, rectified in version 2.0
  padleft(n) { ... } 
}
// --------------- app 1.0 ---------------------------
import { Paddable } from 'leftpad';
import { tablify } from 'tablify';
const p = new Paddable('text');
tablify([[p]]);

In this code sample, a Paddable@1 gets constructed and passed to a function which expects to be able to use it as a Paddable@2, and the code explodes.
Every individual package here was locally correct, but the combination of package versions didn’t work.

TypeScript to the rescue

The CDK uses TypeScript, and the TypeScript compiler actually protects against this issue. If we actually typed the tablify function as

import { Paddable } from 'leftpad';
function tablify(rows:* *Array<Array<Paddable>>) { 
  ...
}

The TypeScript compiler would correctly resolve that type to Array<ArrayPaddable@2>, and refuse to compile the calling of tablify with an argument of type Array<ArrayPaddable@1>, because of the detected incompatibility between the types.
TypeScript will refuse to compile if types of multiple copies of a package are mixed(*).

(*) This is true for classes, where it will do nominal typing. For interfaces, TypeScript will do structural typing, which means any object that has members matching an interface definition is considered to implement that interface, whether it has been declared to implement it or not, and so a class from one copy of a library may be considered to implement an interface from a different copy.
Here is an example for a compiler error caused by a dependency mix:

Unable to compile TypeScript: bin/app.ts(10,36): error TS2345:
Argument of type 'import("node_modules/@aws-cdk/core/lib/app").App' is not assignable to parameter of type 'import("node_modules/my-module/node_modules/@aws-cdk/core/lib/app").App'. 
Types have separate declarations of a private property '_assembly'.

Implications on CDK

This situation arises the following use cases:

  • User wants to use an older version than the latest published one (rollback).
  • User is using a 3rd-party construct library.
    Let’s look at these in turn.

User wants to use an old version

The first one, colloquially called “rollback” is easy to understand and seemingly has a simple solution, so let’s look at it first. This situation comes up when the user is in one of 2 situations:

  • CDK team has just (accidentally) released a buggy version and they want to roll back to an older version.
  • CDK team has just released a breaking change (valid in experimental modules) and the user is not prepared to invest the effort yet to migrate, so they want to stay at an older version.
    For an example, let’s say CDK has just released version 1.13.0 but the user wants to stay at 1.12.0. The only way they have to achieve this is to control the versions in their own app’s package.json, so they write:
"dependencies": {
  "@aws-cdk/aws-ecs": "1.12.0",
  "@aws-cdk/core": "1.12.0",
}

The lack of a caret makes it a fixed version, indicating they really want 1.12.0 and not “at least 1.12.0” (which would resolve to 1.13.0 in this situation). However, because of the transitive caret dependency the complete dependency graph would look like this:

app@1.0.0      -> aws-ecs==1.12.0, core==1.12.0
aws-ecs@1.12.0 -> core^1.12.0
=== { core@1.13.0 is available } ===> 
app
+-- core@1.12.0
+-- aws-ecs@1.12.0
    +-- core@1.13.0

The dependency tree ends up with 2 versions of core in it, and we end up in a broken state.
THE OBVIOUS SOLUTION to this is to get rid of the transitive caret dependency, and make intra-CDK dependencies fixed version (==1.12.0, ==1.13.0, etc).. This puts all control over the specific version that gets pulled in in the hands of the user:

app@1.0.0      -> aws-ecs==1.12.0, core==1.12.0
aws-ecs@1.12.0 -> core==1.12.0
=== { core@1.13.0 is available, but not selected }===> 
app
+-- core@1.12.0
+-- aws-ecs@1.12.0

This works fine for the CDK, but breaks in the face of 3rd party construct
libraries that still use a caret dependency (see the next section).

User is using 3rd party construct library

The previously discussed solution works fine for first-party libraries, because
they are all authored, versioned, tested and released together.
One of the stated goals of the CDK however is to foster an ecosystem of 3rd
party construct libraries. These construct libraries are going to depend on the
1st party CDK packages the same way the packages depend on each other, and so
they run into the same issues:

  • If a 3rd-party library uses a caret dependency, then effectively all consumers
    of the package are forced to always be on the latest CDK version (or the
    dependency tree will end up in a broken state).
  • If a 3rd-party library uses a fixed version dependency on CDK then:
    • The consumer must use the same fixed version dependency on CDK, because if
      they use a caret dependency the tree will end up in a broken state.
    • The 3rd-party library author is forced to release an update every time the
      CDK releases an update (which is every week, or even more often). The
      users of the 3rd party library will not be able to migrate to the new CDK
      version until the author does so.
      These conditions will make it exceedingly onerous to maintain or consume a 3rd
      party library, requiring manual action every week for authors or be locked into
      old versions for consumers if and when authors fail to update.
      If library writing is not a “pit of success” condition, I don’t see how any 3rd
      party library ecosystem will ever evolve.

      Given that according to this analysis, 3rd party library authors should neither
      use caret dependencies nor fixed version dependencies, the only remaining
      conclusion is that they shouldn’t use dependencies at all.

Just occurred to me that cx-api needs to remain its own module since it needs to be shared by both CLI tools and the framework.

I see another problem that I think a package reorganization would address:

Many people are mixing up L1 and L2 APIs

I see a lot of people using L2 classes, and then being disappointed when they can't plug them into L1 classes. This happens a lot in Python (where there's obviously no typechecker to tell you you're doing it wrong), but also happens in other languages where people call integration APIs between L2 classes to plug the result into an L1 (see recent changes to CloudWatch dashboards, mutliple people internally and externally were calling toJson() on GraphWidgets and plugging the result into a CfnDashboard, blissfully unaware that a special method needs to be called there to do a Token-aware JSON stringification).

Apparently our current communication methods of Foo and CfnFoo classes, plus notes that these are separate classes plus notes in every talk that we do aren't enough to dissuade people from doing this.

I would therefore like to propose we drop the L1-and-L2-in-a-single-package style of bundling, and publish 2 different packages, one with all L1s and one with all L2s.

Note that this needn't imply a peerDependency of L2-to-L1 library, for most reasonable use cases(*) a regular dependency of L2-to-L1 is sufficient (since by design we don't expose L1 types in the L2 API, and in fact jsii won't allow us to do that).

(*) The existence of construct.node.defaultChild breaks this assumption a little, it's important that the consumer app and construct library agree on the version of the CfnFoo class that is accidentally exposed this way. Still, I expect this to be a niche use case that will mostly "just work"

Originally posted by @rix0rrr in https://github.com/_render_node/MDIzOlB1bGxSZXF1ZXN0UmV2aWV3VGhyZWFkMjMxMjgzNzk0OnYy/pull_request_review_threads/discussion

eladb commented

@rix0rrr wrote:

publish 2 different packages, one with all L1s and one with all L2s.

Not sure about publishing two separate packages for L1s and L2s but we can totally put all L1s under their separate aws-xxx-cfn namespace within the big module.

For example, if we release two packages, where would the core modules go?

We are experiencing the 3rd-party dependency issue noted above - I'd love to see a resolution to that. It's worth noting that we're encountering it in the Python version of CDK, but the results are the same.

It's worth noting that we're encountering it in the Python version of CDK

What are the issues you are encountering in the Python version of the CDK specifically? Doesn't user version override library version always?

Not sure about publishing two separate packages for L1s and L2s but we can totally put all L1s under their separate aws-xxx-cfn namespace within the big module.

For example, if we release two packages, where would the core modules go?

I think it's reasonable to have a subpackage that's aws_cdk_core that includes the core libraries and all L1 constructs. I think introducing a secondary namespace in the same mono-package is just rebranding the problem: people will think that because they're in the same package all constructs should be interchangeable. Having distinct packages, one for the base resources and one for the "top-level, user-facing" constructs makes this much clearer.

Further expanding on this, although NPM does not support it natively (yet), Yarn does support a resolutions field that could potentially fix this issue for Yarn users. cdk init would then prompt users to either use Yarn (and then setup the resolutions field), or use NPM with this caveat in mind.

This of course is nowhere near ideal and does not solve all of the ergonomic issues, but critically can be done without a significant breaking change.

cdk update could be introduced as a command that updates the resolutions field and then does a dry run. If anything fails, it could revert the change with a warning.

We are experiencing the 3rd-party dependency issue noted above - I'd love to see a resolution to that. It's worth noting that we're encountering it in the Python version of CDK, but the results are the same.

Below is an example for Python.

Here is an excerpt from my library's setup.py file:

setup(
    install_requires=[
        'aws_cdk.aws_iam>=1.18.0',
        'aws_cdk.aws_s3_assets>=1.18.0',
        'aws_cdk.core>=1.18.0',
        'docker'
    ]
)

From the following pip install output, you can see that I've got conflict for jsii version requirement propagation from aws_cdk.aws_iam and constructs packages.

(.venv) cdk-chalice apulver$ pip install -e .
Obtaining file:///Users/apulver/Code/cdk-chalice
Collecting aws_cdk.aws_iam>=1.18.0 (from cdk-chalice==0.7.0)
...
Collecting jsii~=1.3.0 (from aws_cdk.aws_iam>=1.18.0->cdk-chalice==0.7.0)
...
constructs 3.0.1 has requirement jsii~=1.1.0, but you'll have jsii 1.3.1 which is incompatible.
...
Installing collected packages: typing-extensions, cattrs, jsii, publication, aws-cdk.cloud-assembly-schema, aws-cdk.cx-api, constructs, aws-cdk.core, aws-cdk.region-info, aws-cdk.aws-iam, aws-cdk.assets, aws-cdk.aws-events, aws-cdk.aws-kms, aws-cdk.aws-s3, aws-cdk.aws-s3-assets, websocket-client, docker, cdk-chalice
  Running setup.py develop for cdk-chalice
Successfully installed aws-cdk.assets-1.33.0 aws-cdk.aws-events-1.33.0 aws-cdk.aws-iam-1.33.0 aws-cdk.aws-kms-1.33.0 aws-cdk.aws-s3-1.33.0 aws-cdk.aws-s3-assets-1.33.0 aws-cdk.cloud-assembly-schema-1.33.0 aws-cdk.core-1.33.0 aws-cdk.cx-api-1.33.0 aws-cdk.region-info-1.33.0 cattrs-1.0.0 cdk-chalice constructs-3.0.1 docker-4.2.0 jsii-1.3.1 publication-0.0.3 typing-extensions-3.7.4.2 websocket-client-0.57.0

This literally bit me live in an AWS Gameday. I was on 1.40.0 and 1.40.1 hit. I did a new cdk init and boom type mismatch.