facebook/hhvm

Automatically update first-party dependencies

Atry opened this issue · 12 comments

Atry commented

Is your feature request related to a problem? Please describe.
Currently HHVM dependency versions are hard coded in the CMake files like this:

SET_HHVM_THIRD_PARTY_SOURCE_ARGS(
FB_MYSQL_DOWNLOAD_ARGS
SOURCE_URL
"https://github.com/facebook/mysql-5.6/archive/refs/tags/fb-prod8-202101.tar.gz"
SOURCE_HASH
"SHA512=4e07ae4e6628792ec5d77af7e524bddc2e9ac361dff4b93060f9fb5804d72a7144824ac84138487a3b4dcac350453cd5f17afd9a951b9d8248c292bf378e1e78"
)

However, HHVM co-evolves with other OSS projects maintained by Meta Platforms, including folly, fbthrift, fb-mysql, etc. When an upstream change is made in a first-party dependency, we have to wait for the next release of that dependency and then modify the call site in HHVM.
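
As a rough illustration of the manual step this involves, the CMake script sketch below downloads a tarball and prints its digest in the form that SOURCE_HASH expects. The file name print_sha512.cmake, the URL variable, and the output file name are assumptions made for this example; none of them are part of HHVM's build.

```cmake
# print_sha512.cmake (hypothetical helper, not part of HHVM)
# Usage: cmake -DURL=<tarball url> -P print_sha512.cmake
if(NOT DEFINED URL)
  message(FATAL_ERROR "Pass the tarball location with -DURL=<tarball url>")
endif()

# In script mode this resolves to the directory cmake was invoked from.
set(TARBALL "${CMAKE_CURRENT_BINARY_DIR}/dependency.tar.gz")

# Fetch the tarball and stop if the download did not succeed.
file(DOWNLOAD "${URL}" "${TARBALL}" STATUS download_status)
list(GET download_status 0 status_code)
list(GET download_status 1 status_text)
if(NOT status_code EQUAL 0)
  message(FATAL_ERROR "Download failed: ${status_text}")
endif()

# Compute the digest and print it in "SHA512=<hex>" form.
file(SHA512 "${TARBALL}" digest)
message("SHA512=${digest}")
```

The printed line could then be pasted next to the new SOURCE_URL in the call site shown above.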

Describe the solution you'd like
We need a way to automatically update first-party dependencies so that first-party changes become available to HHVM as soon as possible.

Describe alternatives you've considered
Otherwise we can keep the current approach of manually updating dependencies, but it would slow down HHVM development and is not compatible with the mono repo practice in Meta Platforms.

Atry commented

Hi @fredemmott,
Previously the first-party dependencies used git submodules, which are easier to update. You changed them to CMake settings last year, which are harder and slower to update. I understand the CMake approach makes sense when a dependency is optional and could instead be provided by the OS, but I don't understand the purpose of the CMake approach for first-party dependencies, given that we should always want the head revision of first-party dependencies.

@fredemmott, do you know why we download first-party dependencies from CMake?

  • Git submodules are much more painful to use; when they were submodule based, a standard part of debugging build failures was rm -rf third-party; rm -rf build/third-party; git checkout third-party; git submodule update --init --recursive; make
  • similarly, shallow clones have usability problems for updating
  • non-shallow clones are slow and huge
  • unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure
  • downloading tarballs from CMake greatly improves build speed and reliability on internal build systems, given they can be - and are - cached

given that we should always want the head revision of first-party dependencies.

Builds need to be reproducible; HHVM today is unlikely to be buildable with folly in 6 months' time

Additionally, head is often not in sync between folly/thrift/... - the tags are.

and is not compatible with the mono repo practice in Meta Platforms.

No reference approach is both good externally and good for a mono repo - one is a mono repo with cross-project atomic commits, one isn't, and they have different requirements. In public, you do not have atomic commits, and pretending to have them across multiple github projects will break the ability to use git bisect for issues in public builds.


It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. They're formulaic, and it would be relatively straightforward to change them to be more formulaic.

For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned

e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits

fmtlib

FYI, this isn't an FB-owned or FB-source-of-truth project; if FB has an internal version that you want to use instead of the public version, directly publishing that to third-party/fmt is probably the way to go

Atry commented

unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure

Thank you for the information! Do you know if there is any URL about the issue?

I never experienced the issue and it is surprising to me. If GitHub indeed stops serving source files by sha, it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.

Atry commented

It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. They're formulaic, and it would be relatively straightforward to change them to be more formulaic.

Do you mean the current externalproject_add approach also supports source tarballs from a revision sha instead of a tag?

Not seeing a super obvious resource

it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.

the usual approach is to clone the branch or a tag then reset back to the specific commit, not to clone by sha; this does mean fetching more data though. I don’t know the specific method submodules use nowadays

Do you mean the current externalproject_add approach also supports source tarballs from a revision sha instead of a tag?

Take a look at the docs - there’s built in git support, and you can provide arbitrary commands for all the steps to do whatever you want
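
For illustration only, here is a minimal sketch of what pinning to a commit could look like with ExternalProject_Add. The target names, the all-zero commit placeholder, and the choice of folly are assumptions for this example, and it has not been checked against HHVM's actual build files. The archive URL pattern is also an assumption about how GitHub serves tarballs for a commit.

```cmake
cmake_minimum_required(VERSION 3.14)
project(pin_by_commit NONE)

include(ExternalProject)

# Hypothetical placeholder: substitute a real 40-character commit SHA.
set(FOLLY_COMMIT "0000000000000000000000000000000000000000")

# Option 1: built-in git support. GIT_TAG accepts a branch name, a tag,
# or a commit hash. GIT_SHALLOW is left off because a shallow clone may
# not contain an arbitrary commit.
ExternalProject_Add(
  folly_from_git
  GIT_REPOSITORY "https://github.com/facebook/folly.git"
  GIT_TAG "${FOLLY_COMMIT}"
  # Only fetch the sources in this sketch; skip the other steps.
  CONFIGURE_COMMAND ""
  BUILD_COMMAND ""
  INSTALL_COMMAND ""
)

# Option 2: a source tarball for the same commit. For a real pin, add
# URL_HASH with the SHA512 of the tarball so the download is verified.
ExternalProject_Add(
  folly_from_tarball
  URL "https://github.com/facebook/folly/archive/${FOLLY_COMMIT}.tar.gz"
  CONFIGURE_COMMAND ""
  BUILD_COMMAND ""
  INSTALL_COMMAND ""
)
```

Option 2 keeps the small, cacheable download of the current tarball approach, while Option 1 fetches the repository history and is therefore slower.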

Atry commented

That's a ton of helpful information about the previously made decisions! Thank you, @fredemmott!

For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned

e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits

Just want to highlight this: if you want faster/autoupdating first-party dependencies, I strongly recommend making shipit directly copy them - the .cpp and .h files - directly into the facebook/hhvm repo. Completely get rid of all cross-repo stuff. That gets you live updates, working internal CI, bisectability, and atomicity.

Atry commented

Sounds reasonable! For comparison, fbthrift is using the bot to update submodules, e.g. facebook/fbthrift@3fe8c7c

Atry commented

Fixed in #9181, #9144 and #9164