artsy/README

RFC: Decommission Volt 2 and roll code back into Volt 1

⚠️ Original issue with discussion is located in Volt 2 repo: https://github.com/artsy/volt-v2/issues/1225. Moving to README for posterity as V2 is about to be archived.

Resolution

Accepted

Level of Support

3: Majority acceptance, with conflicting feedback.

Additional Context:

We were able to suss out many pros and cons via the Decision Matrix.

Next Steps

Documented here, but in short:

  • 1 PR migrating existing JS to legacy folder
  • 1 PR adding in the new router
  • 1 PR migrating Demand + tests
  • Rest of the apps follow

Context and Background

Artsy’s Partner CMS (Volt) is now approaching 10 years old. There are almost 15k commits, 7k PRs, and many layers to the stack: some new, some old, and some outdated. Without diligent maintenance, working in an app of this size can be a slow process, and making large-scale changes nearly impossible. Poor DX impacts (and often puts a hard pause on) feature development, iteration, developer motivation, and innovation.

Volt suffered from all of these issues. Some were worse than others, but the aggregate impact was an environment that didn’t quite inspire. Additionally, the manner in which we build apps has changed over the years. Our current model emphasizes GraphQL (Metaphysics) over REST, with API interactions taking place via mutations dispatched to various micro-services. Front-ends then query Metaphysics via Relay and render the responses with React and Palette. This is different from a Rails app like Volt, which renders its own UI.

Our work building out Forque in Next.js (another Admin app) showed much promise in terms of this new JS-centric model; engineers could move fast without needing to worry too much about the underlying framework details, while also having fun. Given a number of previously-failed and / or incomplete attempts to modernize Volt 1, we started to think about how the successes with Forque could be applied to a new Volt, one that rolls up all of our previous best practices into a fresh greenfield app that interacts with legacy code, yet is independent from it in the ways that matter.

With that, Volt 2 was born. And it’s been working really well! We’ve been able to (rapidly) build out a number of new apps in the new framework with very little overhead. If you know React and Relay and Palette (as most of the team does), it “just works”.

Why Now?

In short, much of the DX in legacy Volt has been fixed. With the team reorg, the newly spun up Amber team was able to provide the space to look deeply at the existing issues in Volt and address them. Volt has a new compiler, instant Hot Reloading, a new and modern way of interpolating new React code into old parts of the page (which makes incremental migration much more actionable), all primary FE libraries upgraded to latest, and more.

Additionally, because all of the work done in V2 is Metaphysics-backed, it’s portable in a (close to) copy-pasteable way. So all of the incredible momentum that was unlocked with V2 is not lost. And pages that were migrated to V2 from V1 were modernized. This is an important point: V2 served as a motivating factor, and it unlocked productivity. It got large new partner features moving. And now that things are moving, and the DX in V1 has been fixed, we’re faced with a crossroads of sorts that could ultimately simplify our technical ecosystem, bring what we’ve learned from V2 back to V1, and also make future page migrations easier.

We’ve also got a pretty nice app framework in place in V2 (independent of Next.js — which is a web framework), which can be migrated over to V1 wholesale.

An additional side concern involves the overall direction of Next.js. Will the pages router be supported long term, or are they going all-in on the new RSC (React Server Components) app router, which the community is currently rebelling against and which, after some experimentation here at Artsy, we decided not to support? There are open questions here.

What This Solves

A number of things come to mind:

  • No more redis-backed data bridge
  • No more complex authentication handshakes and implementations
  • No more redirect issue
  • A single repo instead of two
  • A single deploy instead of two
  • No more ingress routing magic
  • No more duplicate work between two repos (double nav, access controls, integrations, etc)
  • Modernization efforts stay as close as possible to legacy code without the need to compromise, via our ability to “stitch” parts of the pages together in a way that can be (easily) re-composed into something wholly new (think: Force modernization effort, and how we used artsy/stitch)

What Still Needs to Improve

There are two key areas that are ripe for improvement, assuming V2 is decommissioned.

CI Cycle Times

One of the joys of working in V2 is the incredibly fast cycle times. One can go from a PR in review to a fully deployed feature in ~15 minutes. Not only does this make developing in V2 highly desirable for the team, but it also tightens up the product development feedback loop by getting code in front of users in almost real time, which in turn (and at scale) leads to more actionable items and thus a better product (to say nothing of bug fixing and the like).

If we were to move V2 back into V1, it would be imperative that we rethink how we execute CI functions. V2 code, being MP-backed, isn’t a 10-year-old Rails app with thousands of completely unrelated Ruby tests, many of which are slow. It’s a modern JS app with a test suite that should run in ~1-2 minutes. For the most part, this new section of the codebase will continue to communicate only with MP, touching the Rails side only when strictly necessary.

My suggestion here would be to create a v2 segmented folder that only executes CI functions related to v2, unless other parts of the app are touched in the git diff. This would bring our CI time down considerably, to almost V2-like speeds.
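To make the idea concrete, here is a minimal sketch of the decision CI would need to make from a git diff. The folder name `src/v2/`, the suite names, and the function `suitesForDiff` are all hypothetical and only illustrate the shape of the check; the real layout would be decided during the migration.

```typescript
// Hypothetical folder holding the migrated V2 code (an assumption, not a path named in this RFC).
const V2_ROOT = "src/v2/";

// Hypothetical suite names: the fast JS suite and the slower legacy Rails suite.
type CiSuite = "v2-js" | "legacy-rails";

/**
 * Decide which CI suites to run from a list of changed file paths,
 * for example the output of `git diff --name-only origin/main...HEAD`.
 */
function suitesForDiff(changedFiles: string[]): CiSuite[] {
  const touchesV2 = changedFiles.some((file) => file.startsWith(V2_ROOT));
  const touchesLegacy = changedFiles.some((file) => !file.startsWith(V2_ROOT));

  const suites: CiSuite[] = [];
  if (touchesV2) suites.push("v2-js");
  // Anything outside the V2 folder (Ruby code, shared config, lockfiles)
  // conservatively triggers the full legacy suite as well.
  if (touchesLegacy) suites.push("legacy-rails");
  return suites;
}
```

A PR that only touches files under the V2 folder would run just the fast JS suite, while a PR that also touches other parts of the app would run both.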

User Experience

V2, being a Next.js app, operates in a Single Page App (SPA) mode. What this means is that when a user clicks a link, there’s no slow, hard jump between pages where everything is reevaluated, executed and initialized. Instead, only code related to the page being rendered executes, while page shell code remains stable from page to page, and the user experience remains smooth. Users expect this kind of polish in 2024.

If we were to bring V2 code back over to V1, we would want to set up a similar JS-based router. New v2 apps mounted in this router become SPA-backed, and don’t rely on Rails for routing. Following patterns in Force (and Next) will make this fairly trivial.

(Note that 👆isn’t required, and we could move apps over wholesale into V1 without this step, but I would argue that we shouldn’t compromise our UX here, given the relative ease with which we could set this up.)
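As a rough illustration of what "mounted in this router" could mean, the sketch below matches a pathname against a small route table. The route patterns, the `Route` and `Match` types, and `matchRoute` are hypothetical and not the actual router design; a `null` result stands for "not a v2 app, fall through to a normal Rails page load".

```typescript
// Hypothetical route table: path patterns owned by the client-side router.
type Route = { pattern: string; app: string };

const routes: Route[] = [
  { pattern: "/demand", app: "Demand" },
  { pattern: "/conversations/:id", app: "Conversations" },
];

type Match = { app: string; params: Record<string, string> };

/**
 * Return the app (and any `:param` values) that owns the pathname,
 * or null when no client-side route matches.
 */
function matchRoute(pathname: string, table: Route[] = routes): Match | null {
  const pathSegments = pathname.split("/").filter(Boolean);

  for (const route of table) {
    const patternSegments = route.pattern.split("/").filter(Boolean);
    if (patternSegments.length !== pathSegments.length) continue;

    const params: Record<string, string> = {};
    const matched = patternSegments.every((segment, i) => {
      if (segment.startsWith(":")) {
        // Dynamic segment: capture the value under the parameter name.
        params[segment.slice(1)] = pathSegments[i];
        return true;
      }
      // Static segment: must match exactly.
      return segment === pathSegments[i];
    });

    if (matched) return { app: route.app, params };
  }
  return null;
}
```

In this sketch a link click would call `matchRoute` first and only trigger a hard navigation when it returns `null`, so the page shell stays stable between v2 apps.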

Migration Strategy

There’s currently a good amount of in-flight work happening in V2, but this shouldn’t impact migration efforts:

  • Set up Router in V1 with a sample dummy app
  • Identify a “quiet” app in V2
  • Copy / paste entirety of app into V1
  • Update import paths
  • Update Next-specific APIs to point at the new V1 router. There are only a few (like useRouter). APIs ideally remain exactly the same, so little refactoring is required.
  • Delete staging ingress entry from Volt Hokusai config
  • QA between Staging / Prod
  • Delete production ingress entry
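The "update import paths" and "update Next-specific APIs" steps above could be largely mechanical. The sketch below assumes the V1 router exports the same names as `next/router` (such as `useRouter`) from a module at `v2/system/router`; that module path and the helper `rewriteNextRouterImports` are hypothetical.

```typescript
// Hypothetical module path for the new V1 router (an assumption, not named in this RFC).
const V1_ROUTER_MODULE = "v2/system/router";

/**
 * Rewrite `next/router` import specifiers to the V1 router module,
 * leaving the imported names (e.g. useRouter) and the quote style untouched.
 */
function rewriteNextRouterImports(source: string): string {
  return source.replace(
    /(from\s+)(["'])next\/router\2/g,
    (_match: string, from: string, quote: string) =>
      `${from}${quote}${V1_ROUTER_MODULE}${quote}`
  );
}
```

Because only the module specifier changes, call sites such as `const router = useRouter()` would stay as they are, provided the V1 router really does keep the same API.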

Timeline

A rough timeline of effort is as follows:

  • V1 Router setup: 1-3 days
  • CI segmentation: 2 days
  • Migration of Convos: 2-4 days
  • Migration of Demand: 1-2 days
  • Migration of Make Offer and everything else: < 1 day
  • Gotchas and other unexpected factors: ~1 week

Curious what others think about this!

Closing this as already adopted and applied.