Automatically update first-party dependencies
Atry opened this issue · 12 comments
Is your feature request related to a problem? Please describe.
Currently, HHVM dependency versions are hard-coded in the CMake files like this:
hhvm/third-party/fb-mysql/CMakeLists.txt (lines 13 to 19 at commit 3c1feb0)
However, HHVM co-evolves with other OSS projects maintained by Meta Platforms, including folly, fbthrift, fb-mysql, etc. When an upstream change is made in a first-party dependency, we have to wait for the next release of that dependency and then modify the call site in HHVM.
Describe the solution you'd like
We need a way to automatically update first-party dependencies so that first-party changes become available to HHVM as soon as possible.
Describe alternatives you've considered
Otherwise, we can keep the current approach of manually updating dependencies, but it would slow down HHVM development and is not compatible with the mono repo practice in Meta Platforms.
Hi @fredemmott,
Previously the first-party dependencies used git submodules, which are easier to update. You changed them to CMake settings last year, which are harder and slower to update. I understand the CMake approach makes sense when a dependency is optional and could be provided by the OS, but I don't understand the purpose of the CMake approach for first-party dependencies, given that we should always want the head revision of first-party dependencies.
@fredemmott, do you know why we download first-party dependencies via CMake?
- Git submodules are much more painful to use; when dependencies were submodule-based, a standard part of debugging build failures was:
rm -rf third-party; rm -rf build/third-party; git checkout third-party; git submodule update --init --recursive; make
- similarly, shallow clones have usability problems when updating
- non-shallow clones are slow and huge
- unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure
- it greatly improves build speed and reliability on internal build systems, given that the downloads can be (and are) cached
> given that we should always want the head revision of first-party dependencies.
Builds need to be reproducible; HHVM as of today is unlikely to be buildable with folly as it will be in 6 months' time.
Additionally, head is often not in sync across folly/thrift/...; the tags are.
> and is not compatible with the mono repo practice in Meta Platforms.
No single approach is both good externally and good for a mono repo: one is a mono repo with cross-project atomic commits and one isn't, and they have different requirements. In public, you do not have atomic commits, and pretending to have them across multiple GitHub projects will break the ability to use git bisect for issues in public builds.
It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. The version pins are already formulaic, and it would be relatively straightforward to make them even more so.
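To illustrate why auto-updating does not depend on the fetch mechanism, here is a hypothetical sketch of the rewrite step a bot could run on a formulaic pin. It assumes pins written as `set(FOLLY_GIT_TAG "...")`; that variable name, the `bump_pin` helper, and the tag values are all made up for illustration and are not taken from HHVM's actual CMake files.

```shell
#!/bin/sh
# Hypothetical sketch of the rewrite step an auto-update bot could run on a
# formulaic version pin such as:  set(FOLLY_GIT_TAG "v2022.01.01.00")
# The variable names and tag values are assumptions, not HHVM's actual files.
set -eu

# bump_pin FILE VAR NEW_VALUE: replace the quoted value of `set(VAR "...")`.
# VAR is assumed to be a plain identifier (no regex metacharacters).
bump_pin() {
  file=$1 var=$2 new=$3
  # Write to a temp file and move it into place; `sed -i` is not portable
  # between GNU and BSD sed.
  sed "s|^\(set(${var} \"\)[^\"]*\(\")\)|\1${new}\2|" "$file" > "$file.tmp"
  mv "$file.tmp" "$file"
}

# Usage on a throwaway file with two pins; only FOLLY_GIT_TAG changes.
demo=$(mktemp)
trap 'rm -f "$demo"' EXIT
printf '%s\n' 'set(FOLLY_GIT_TAG "v2022.01.01.00")' \
  'set(FBTHRIFT_GIT_TAG "v2022.01.01.00")' > "$demo"
bump_pin "$demo" FOLLY_GIT_TAG v2022.05.16.00
cat "$demo"
```

A real bot would still have to choose NEW_VALUE (for example, the newest tag reported by `git ls-remote --tags --refs <repo-url>`) and open a PR so CI can validate the bump. Bumping to tags rather than head keeps folly/thrift/... consistent with each other, per the point about tags above.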
For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned
e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits
> fmtlib
FYI, this isn't an FB-owned or FB-source-of-truth project; if FB has an internal version that you want to use instead of the public version, directly publishing that to third-party/fmt is probably the way to go
> unless pointed at a tag, they're unreliable and non-reproducible after a while: GitHub stops serving requests for submodules by sha after a variable amount of time/number of commits/not sure
Thank you for the information! Do you know if there is a URL documenting the issue?
I have never experienced the issue, and it is surprising to me. If GitHub indeed stops serving source files by sha, it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.
> It's also important to note: auto-updating is entirely independent of submodules vs externalproject_add. They're formulaic, and it would be relatively straightforward to change them to be more formulaic.
Do you mean the current externalproject_add approach also supports source tarballs from a revision sha instead of a tag?
I'm not seeing a super obvious resource.
> it would affect NPM, Composer, Bundler, Nix and many other package managers because they all include sha in their lock files to reproduce a build with dependencies to git branches.
The usual approach is to clone a branch or tag and then reset back to the specific commit, rather than cloning by sha; this does mean fetching more data, though. I don't know the specific method submodules use nowadays.
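The clone-then-reset approach can be sketched in shell as follows. This is a minimal sketch under assumptions: it runs against a throwaway local repository (the names `upstream`, `dep`, and branch `main` are hypothetical stand-ins) so it works offline; for a real dependency you would substitute its repository URL and the sha recorded in a lock file.

```shell
#!/bin/sh
# Sketch of "clone a branch, then reset to the pinned commit", run against a
# throwaway local repo so it works offline. Repo and branch names are
# hypothetical stand-ins for a real dependency such as folly.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Throwaway "upstream" repository on branch main.
git init -q "$work/upstream"
git -C "$work/upstream" symbolic-ref HEAD refs/heads/main
commit() {
  git -C "$work/upstream" -c user.name=demo -c user.email=demo@example.com \
    -c commit.gpgsign=false commit -q --allow-empty -m "$1"
}

commit "first"
pinned=$(git -C "$work/upstream" rev-parse HEAD)  # sha a lock file would record
commit "second"                                    # upstream moves past the pin

# Clone the branch (fetches its history, i.e. more data than one commit)...
git clone -q --branch main --single-branch "$work/upstream" "$work/dep"
# ...then reset back to the pinned commit instead of fetching it by sha.
git -C "$work/dep" reset -q --hard "$pinned"

echo "pinned: $pinned"
echo "dep:    $(git -C "$work/dep" rev-parse HEAD)"
```

The trade-off is the one described above: only branch and tag refs are requested from the server, so the fetch does not rely on the host serving arbitrary shas, at the cost of downloading the branch history.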
> Do you mean the current externalproject_add also supports source tarballs from a revision sha instead of a tag?
Take a look at the docs: there's built-in git support, and you can provide arbitrary commands for all the steps to do whatever you want.
That's a ton of helpful information about the previously made decisions! Thank you, @fredemmott!
> For the first-party stuff, a better way to auto-update would be to actually commit them to the HHVM github repo, similar to how flow includes hack - i.e. turn facebook/hhvm into a monorepo as far as fb deps are concerned
> e.g. map fbcode/folly to third-party/folly/ - no submodules or CMake fetching, atomic commits
Just want to highlight this: if you want faster/auto-updating first-party dependencies, I strongly recommend making shipit copy them (the .cpp and .h files) directly into the facebook/hhvm repo. Completely get rid of all cross-repo stuff. That gets you live updates, working internal CI, bisectability, and atomicity.
Sounds reasonable! For comparison, fbthrift uses a bot to update submodules, e.g. facebook/fbthrift@3fe8c7c