Shopify/packwerk

Make packwerk work with in-repo engines using load_path aliases

AndrewSwerlick opened this issue · 13 comments

Description
This is similar to #47 , but there are some distinctions because we're discussing engines, which are autoloaded, vs gems that are not.

The context is that we have a mono-repo setup like this

root_dir
|- engine_1
|- engine_2
|- app_1
|- app_2

where multiple apps will use the same engines. Each engine is symlinked under an apps vendor directory like
root_dir/app_1/vendor/gems/engine_1 -> root_dir/engine_1`

We attempted to setup packwerk for a given app with a root_dir/app_1/packwerk.yml, with a root package at root_dir/app_1/package.yml and a package inside an engine like root_dir/engine_1/app/models/users/package.yml.

We found that we could properly setup the packwerk.yml include and package_paths to follow the symlinks inside of vendor and scan all the files both in the app folder, and in the engine folders, but it would never flag any violations.

In digging in, we found the problem was with the constant resolution workflow. Although the autoload paths for the engines were available in Rails.autoloaders, the path found there was root_dir/engine_1/** instead of root_dir/app_1/vendor/gems/engine_1. As a result it was being filtered out of the paths provided to the ConstantResolver by ApplicationLoadPaths.filter_relevant_paths

We experimented with monkey patching filter_relevant_paths to transform the path names for all these in-repo engines to use the full symlink path. When we did that, everything worked as expected.

I wanted to propose the idea that we upstream our patch by adding a new key to the packwerk.yml configuration file named something like load_path_aliases. This would take a hash where the keys would be the load_path as discovered in Rails.autoloaders and the value would be an alias that the path should be transformed to. Then we'd modify ApplicationLoadPaths to go through these aliases, and transform all the paths before passing them into ConstantResolver

Does this seem like a reasonable approach? If folks are open to it, I can put together a PR

To Reproduce

Setup a mono repo as described above. If folks would like a clearer example, I can setup a sample application.

Expected Behaviour

Under this proposal, a developer could a define a packwerk.yml with this new load_path_aliases something like this

load_path_aliases:
  '../engine_1/': 'vendor/gems/engine_1/'
  '../engine_2/': 'vendor/gems/engine_2'/

Then any paths that started with ../engine_1/ would be replaced with an equivalent starting with vendor/gems/engine_1

Version Information

  • Packwerk: 2.2.0
  • Ruby 2.75

I'm all to solve this problem, but not to add new configurations. Maybe we should always deal with absolute, normalize paths? Any reason why you want to use the symlink and not the absolute, non-symlinked path?

@rafaelfranca thanks for the reply.

Any reason why you want to use the symlink and not the absolute, non-symlinked path?

Since the non symlinked versions live outside of the application directory, I thought that would be an easier path to go. I figured it would be pretty hard for packwerk to discover and resolve paths to code files that were not "below" packwerk.yml. But if you see a path forward in that direction, I'm definitely interested.

Yeah, looking further into this approach, it seems like it would be pretty difficult to make the rest of packwerk play nice with paths that aren't below packwerk.yml. From what I'm seeing there are alot of things that rely on treating the directory that packwerk.yml is in as a root, and only monitoring files below that.

If we want to do this without new config options, I wonder if we can modifyApplicationLoadPaths.extract_application_autoload_paths to somehow figure out when it's dealing with an engine loaded via symlink and transform the path. I'll do some research into that.

Alright, so I might have a path forward. Here's the basic approach I'm considering

  1. Assume that any gem symlinks will be direct children in vendor/gems
  2. Loop through all the children in this folder and see if symlink? true for any of them
  3. If symlink? is true, add it to a list of path aliases.
  4. Then proceed similarly to my original proposal

How does this approach sound?

I don't think we can assume packages will be inside vendor/gems (vendor is for 3rd-party code BTW, which doesn't seems to be the case in your application since the code is yours).

Packages can be anywhere.

It probably will require a lot of changes, but I feel like it would be better to always be dealing with absolute (real) paths instead of relative paths, and by doing that the problem you are describing would not exist.

Let me see if I can understand the right way to frame this problem. What I think I'm hearing is this

  • packwerk.yml should continue to live at the root of a rails application, even in a mono repo with multiple applications like ours
  • The configuration api should stay the same, so in a monorepo like ours, we would still need to setup symlinks so packwerk.yml is still only scanning items "below" it
  • But we'll want to change how package resolution and scanned file resolution works so it can tell when it's dealing with a symlinked path, and instead get back to the actual realpath
  • We might also need to make minor changes to the ApplicationLoadPaths to avoid filtering out loads paths that aren't in the project root before passing them into the ConstantResolver

Does that sound right?

Yes. I think they key here are the changes to ApplicationLoadPaths.

We have the same setup, monorepo, with multiple engines. We do have a number of symlinks that allow these engines to share stuff, like the fixtures directory.

Would love to see support for this setup added, as it would make adopting packwerk much simpler.

@rafaelfranca Cool, this is a concrete enough start for me to dig into it.

I'm out for the next two weeks, so if there's anyone else invested enough in this problem to get started, feel free, but I'll pick it up when I come back if not.

Picking this back up now, and I might have a different approach after discussing some things with a few team members.

Rather than trying to solve the problems related to symlinks, I'm actually considering an approach which instead makes packwerk scan for packages outside of the root directory. The idea would be to use something like Rails.application.railties.select {|r|r.kind_of? Rails::Engine }.map{|r| r.root} to find all the engines active in the application, and have PackageSet scan them for packages as well. Additionally, we'll modify the application load paths to also include paths out of the app directory, so we can resolve constants from within those engines.

This means that a packwerk check in the rails application directory will scan all files in the in the rails app (but not files in engines). For each file, they'll be able to resolve constants defined in the engines, map them to packages, and check for violations.

If we want to check that inside the engine we're respecting or package boundaries, we'll have to execute a another packwerk check in each engine directory, which seems like a reasonable way of organizing things.

What do folks think about this new approach? @pboling I'm not sure if this would work for your setup or not

@AndrewSwerlick

I'm having a little trouble understanding the problem and wondering if you're available sometime (maybe next week) to jump on a quick zoom so I can understand the layout of your application and the behavior you're hoping for. Also free today 3:45 EST. If you don't have time, I can read this through again a bit more closely and try to ask some clarifying questions.

@alexevanczuk Sure I can do 3:45 today. Feel free to shoot me an email with an invite at the address on my profile, or DM me in the rubymod slack

I've gone ahead and opened a PR (#216) with this approach. It's passing all tests, but I still need to run a test against our repo. @alexevanczuk Helped and ran it against an existing repo using a traditional structure, and we know it doesn't create in regressions there.