sciencefair-land/sciencefair

Modularisation

blahah opened this issue · 18 comments

The codebase for the v1 release of ScienceFair was monolithic. The ecosystem of tools planned for the next few releases requires a much more modular approach.

There's a project for managing this process. Issues related to modularisation should be added to the project and tracked through the process there.

There's also a new organisation, sciencefair-land, under which we can collect modules we want to maintain.

why modularise?

There are many good reasons in general for writing small modules (see substack and mafintosh's excellent explanations).

Specifically in the case of ScienceFair, we are building not just an app, but an ecosystem of tools. For example, we will have at least:

  • standalone tools for generating and managing datasources (by v1.1)
  • tools for developing packages (by v2)
  • sites for discovering datasources and packages

The most efficient way to build and maintain this ecosystem will be to abstract out units of shared logic, configuration, assets or data into standalone modules. These can then be reasoned about, tested and maintained in their own scope, and used across any number of tools.

When we start building the package system for v2, having all the parts of the app as atomic as possible will greatly ease the transition to being customisable with packages.
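To make the idea concrete, here is a sketch of what a broken-out module might look like. The class name, fields and methods are illustrative assumptions, not the actual ScienceFair API:

```javascript
// Hypothetical sketch of a standalone `sciencefair-paper` module.
// The shape of the metadata and the citation format are assumptions
// for illustration only.
class Paper {
  constructor (metadata) {
    // minimal metadata a paper might carry
    this.title = metadata.title
    this.authors = metadata.authors || []
    this.doi = metadata.doi || null
  }

  // a small piece of shared logic that every tool in the ecosystem
  // (the app, datasource tools, web tools) could reuse
  citation () {
    return `${this.authors.join(', ')} - ${this.title}`
  }
}

module.exports = Paper
```

Because the module is this small, it can be tested and versioned on its own, and any tool in the ecosystem can depend on it without pulling in the app.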

process

We have at least these kinds of modularisable units (i.e. things that should be separate npm packages) in the project:

  • general utilities not specific to sciencefair (e.g. this and this)
    • these can be broken out by anyone and maintained under any user
  • ScienceFair-specific classes (e.g. Paper and Datasource)
    • these should be broken out with scoped names like sciencefair-paper and maintained under the @sciencefair-land org
  • choo plugins:
    • some that are of general utility to choo apps (e.g. online)
      • these should be given choo-scoped names, e.g. choo-online and can be owned by anyone
    • some that are specific to ScienceFair (e.g. models mapping to classes like paper)
      • these should be choo and sciencefair scoped - e.g. choo-sciencefair-paper-model
  • choo views which can be broken out into nanocomponents - see #47
    • these should be choo and sciencefair scoped - e.g. choo-sciencefair-paper-view
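As a sketch of the naming conventions above, a broken-out module's package.json might look something like this (the name, version and fields are illustrative, not a published package):

```json
{
  "name": "choo-sciencefair-paper-model",
  "version": "1.0.0",
  "description": "choo model wrapping the sciencefair Paper class",
  "main": "index.js",
  "dependencies": {
    "sciencefair-paper": "^1.0.0"
  }
}
```

The name encodes both scopes described above: `choo-` because it is a choo plugin, `sciencefair-` because it is specific to ScienceFair.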

tasks

  • prioritise the different kinds of modules
  • list out the modules and add them to the tracker
  • break out the modules, one PR at a time

Just a suggestion, but I think the most important thing to break out might be placing a custom Lens build under the sciencefair GitHub or npm. I was going to take a crack at #61, but that basically means hacking a compiled file - Lens Starter makes it way easier to add panels and converters, which will be necessary as ScienceFair grows in scope.

I'd be happy to put together a proof-of-concept if you're interested.

@CAYdenberg agreed! I actually did this before, but some bugs became easier to fix in the compiled file vs recompiling and re-releasing all the time. It's not sustainable though, as we'll need to be able to incorporate new converters for new XML sources. Could you create a new issue where we can discuss? A proof of concept for the lens build system would be very welcome :)

#61 can (and should I think) be done inside the reader view, rather than lens. I'll comment there about how.

Hi,
I'm concerned about how this change will affect development. Is it the intention that the modules will be published on npm and required into the application? If so, it will make modifying the modules more difficult, because one would likely end up editing a package inside node_modules to run the application with changes. Given that you were describing atomic modules, I imagine this may be a common situation.

To contribute changes, developers may end up pulling multiple repositories from git into the same folder, i.e. all at equal depth with the sciencefair folder -- I think that's the approach I would take. From there, would it be advisable for the developer to symlink these application module folders with their counterparts in the sciencefair node_modules?

I can't help but wonder if you've considered a mono repo. ckeditor5 is an example of this type of architecture and they share their development workflow. I feel this is the direction you may be wanting to head. lerna and mgit2 are appropriate tools.

I do not think the substack post is entirely relevant to this issue, because I think they were writing about publishing utility modules to npm versus feature modules as in this issue -- ScienceFair specific classes (e.g. Paper and Datasource)

As more modules are published to npm I expect I won't need to write so many modules but there will always be room for new stuff.

I think either approach has its drawbacks and would benefit from documentation -- a developer guide?

@slmyers thanks for raising this.

Is it the intention that the modules will be published on npm and required into the application?

Yes

can't help but wonder if you've considered a mono repo.

I have thought about it (a lot!) - and it's useful to have this opportunity to record the reasoning.

My long experience of both approaches (hypermodular vs monorepo) across many projects is that monorepos are a nightmare to work with at even the scale of 10 or so packages, and very difficult to contribute small changes to from outside the project. Tools like lerna for managing them are flaky and have poor error handling (e.g. https://github.com/lerna/lerna/issues/524#issuecomment-299264115, not linked because it's not productive for them to be notified to read this) - there is essentially no good tooling for such systems (that I have found), and they waste a lot of time on admin. They also give the appearance of a controlled ecosystem which is philosophically the opposite of what we want to achieve.

The small modules approach is much better tooled and easier to handle in development imo - and when we get to that stage I will document my own preferred workflow but there are many. One simple flow is already built into npm and yarn - to work on a package you do:

cd /path/to/small-module
npm link
cd /path/to/app-dir
npm link small-module

Then you can work in /path/to/small-module and see the changes reflected in the app. When you are done, you do rm -rf node_modules/small-module && npm install in the app repo to return to the published module dependency. See atom for an example of a hugely successful project that follows this development workflow.

It's also much easier to use forks of parts of the system this way, or replace them altogether. With an isolated module (e.g. sciencefair-land/sciencefair-paper) you can just fork that repo, then npm install your-name/sciencefair-paper and you can depend on your own fork using github. Doing the same with a monorepo is not currently possible afaik, especially because monorepo tooling often templates out parts of the package.json.

I think the substack post still applies here. As pointed out in the why modularise? part of the issue, classes like Paper and Datasource will be used across many tools - not just ScienceFair the app but also tools for producing and managing datasources, web tools, and developer utilities. They are also intended to be swappable by v2. Even if they weren't, having them isolated makes contributing easier (based on my experience and feedback from contributors on other projects), and makes isolated bug fixing and testing faster to do and easier to reason about.

would benefit from documentation -- developer guide?

I agree with this part completely. Good developer documentation, when we get to that stage, will be a priority. Simple developer tooling will also help. For example, switching a dependency for a local directory and back again should be one command each (scripts wrapping the series of npm or yarn commands). We should start planning these things soon.
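For example, the one-command switching described above could be wrapped as npm scripts in the app's package.json. This is only a sketch, using the hypothetical sciencefair-paper module as the dependency being swapped:

```json
{
  "scripts": {
    "dev:link": "npm link sciencefair-paper",
    "dev:unlink": "rm -rf node_modules/sciencefair-paper && npm install"
  }
}
```

Then `npm run dev:link` swaps in a local copy (assuming `npm link` was run in its checkout first), and `npm run dev:unlink` returns to the published dependency.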

Another trick with a lot of modules is to do development in a folder named node_modules, e.g. you can set up paths like this:

~/code/node_modules/science-fair
~/code/node_modules/small-module

This allows you to use small-module from science-fair without npm link! npm looks up the tree for any node_modules folder and takes the first module it finds.

We wrote about this workflow a bit in the Dat contributing guide, but let me know if we need to clarify anything!

@joehand omg, I've only ever had a node_modules in a parent folder be a source of bugs because it was there accidentally. Can't believe I never thought of this, great shout! Thanks :)

Also that's a good point - I think there will be considerable overlap in the dat and sciencefair contributing materials so we should look into a shared resource where possible (cc @maxogden)

@blahah To echo @joehand, a manually managed node_modules folder placed in a subdirectory is an effective solution. We use it extensively in stdlib with support for module decomposition to great success, thus allowing us to enjoy the benefits of both a monorepo and hypermodularity.

@kgryte interesting - this is actually the inverse of what @joehand was suggesting I think - in stdlib the node_modules is a subdirectory - in dat development it's a parent directory:

dat

node_modules/
├── dat
│   └── node_modules
│       └── module_3
├── module_1
└── module_2

stdlib

stdlib/
└── lib
    └── node_modules
        ├── module_1
        ├── module_2
        └── module_3

Is this interpretation correct? If so, I'm not sure I understand how this is different from a monorepo (for example, I can't npm install a fork of a specific module from stdlib without publishing it on npm).

I am not clear on the dat approach. I have read and reread the contributing guide, and I can read it both ways.

What I can say for stdlib is that you can install a fork. The way we have set up the project is that we are able to decompose (using custom tooling) the entire project into individual packages (resolving all dependencies in a manner similar to browserify) and push each package to its own separate GitHub repository. From each separate repository, we are able to publish a package independently of the rest of the project. In which case, consumers can effectively build their own "stdlib" from the individual components. This is what I meant by having both worlds: we develop in a monorepo, allowing centralized development, while publishing as individual packages and repositories, allowing people to fork, clone, and combine individual components as needed.

And now rereading this thread, the one use case not supported by the stdlib approach is where you want to fork and then recombine in the same project. Meaning, I cannot (easily) modify a local copy of stdlib to use a forked version of an internal project package. However, in this case, I would say to simply create a new branch for the main repo containing the modified code.

Note: decomposing the project into separate repos is not live. We have proven the concept, but not flipped the switch, as our namespace is still in flux.

@kgryte thanks, it was the separate repos part that was missing from my understanding, which makes sense if it's not live! Will you achieve it using git submodules?

p.s. @joehand perhaps the dat guide needs some clarification if it can be read both ways? 👆

@blahah For stdlib, separate repositories are for consumption, not development. Meaning, we publish separate repositories so consumers can fork and modify individual aspects of the project without needing to setup the entire development environment and pull down the entire codebase. If you want to contribute to stdlib, you need to contribute to the monorepo.

From the project's standpoint, publishing a repository is similar to publishing a package to npm: a repository is an end product, not a development feature.

@kgryte understood, but I am trying to understand how you will incorporate the repositories for each module into the main repo. Or is it that you do all development on the code inside the monorepo, then have a script that takes each module in lib and pushes it to its own repo as well?

@blahah Do all development inside monorepo, and we have a script which builds the repository for each individual package.

@kgryte thanks for explaining :)

So, I think the long and short of all this is that:

  • sciencefair should pursue the hypermodular approach
  • we should implement it gradually and document the workflow carefully as we go
  • minimise custom tooling or dependence on non-standard tooling (i.e. outside of standard node, electron and npm behaviour)
  • we now have some nice examples of other projects we can look to for examples

@blahah Yeah, sorry if I muddied the water to begin with. What you outline seems reasonable, and I will be curious to see how things progress.

Not at all @kgryte - it's non-trivial, so having a back-and-forth to get the detail is useful. I have never seen a project, afaik, that has forkable modules as part of a monorepo, so it's very cool to have the stdlib example explained.

mono repos

@slmyers @blahah on mono repos:
In dat-ecosystem/dat#824 (comment) I presented vert.x
Have a look at vert-x3: one github org containing all official modules
I would argue vert.x has all the benefits of the mono repo, without most of the hassle.

modularisation approach

I think you are taking too technical a mindset wrt modularisation, as @slmyers rightly states:

I do not think the substack post is entirely relevant to this issue, because I think they were writing about publishing utility modules to npm versus feature modules as in this issue -- ScienceFair specific classes (e.g. Paper and Datasource)

True, you will elicit modules according to technical + application lines. But don't lose sight of your domain.

For example, you could have modules for actors (e.g. Scientist, Publisher), entity types (e.g. Paper) or processes (e.g. Peer Review), etc.

In fact, I would start with the domain: create some domain models and process flows, figure out a semantically meaningful and intuitive break-up into (domain) modules, gauge the architecture impact, and only then determine the technical module categories you have.