rust-lang/cargo

Timeouts with latest curl-sys on Linux

ehuss opened this issue · 10 comments

ehuss commented

Problem
Building Cargo with the latest curl-sys (0.4.28+curl-7.69.0) causes Cargo's network operations to time out on Linux. (I can't repro on mac.) Cargo enters a tight loop here using 100% CPU until it eventually times out.

Seen on CI (7965 7973) where the build-std job uses the newly built Cargo to download a few crates.

@alexcrichton do you see anything immediately obvious what could cause that? I'm wondering if it is due to the way Cargo is using libcurl, or if it is a problem in curl-sys itself (or the bindings). Let me know if you can't repro, and I can dig more.

Steps

  1. Build cargo with latest curl-sys.
  2. Create an empty CARGO_HOME, and run cargo fetch of a decently-sized project (like cargo itself).
  3. Notice it pauses using 100% CPU, and eventually times out.

I'm having this same issue. I decided to try to install something under strace. When the CPU is being maxed out, here's what I see:

poll([{fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}, {fd=11, events=POLLIN}], 17, 123180) = 17 ([{fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}, {fd=11, revents=POLLIN}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
poll([{fd=11, events=POLLIN|POLLPRI|POLLRDNORM|POLLRDBAND}], 1, 0) = 1 ([{fd=11, revents=POLLIN|POLLRDNORM}])
❯ cargo --version
cargo 1.41.0
❯ cat /etc/os-release
NAME="Arch Linux"
PRETTY_NAME="Arch Linux"
ID=arch
BUILD_ID=rolling
ANSI_COLOR="0;36"
HOME_URL="https://www.archlinux.org/"
DOCUMENTATION_URL="https://wiki.archlinux.org/"
SUPPORT_URL="https://bbs.archlinux.org/"
BUG_REPORT_URL="https://bugs.archlinux.org/"
LOGO=archlinux

I don't think this is related to curl-sys, but to the 7.69.0 libcurl update causing issues with http2: curl/curl#5044

Yeah, I just downgraded to 7.68.0 and the problem went away.

Confirmed that applying curl/curl@8aa04e9 fixes it. Not a cargo/rust-side bug.

If you're on Arch Linux, then curl-7.69.0-3 (specifically pkgrel=3) seems to do the trick here - at the time of this comment, it's available in the testing repository (has now been promoted from testing).

Applying the patch mentioned above does not seem to resolve the issue for Homebrew on Linux, we are still getting those timeouts even after rebuilding cargo:

==> cargo install --locked --root /home/linuxbrew/.linuxbrew/Cellar/git-trim/0.2.3 --path .
  Installing git-trim v0.2.3 (/tmp/git-trim-20200308-4655-sslvv/git-trim-0.2.3)
    Updating crates.io index
warning: package `quote v1.0.2` in Cargo.lock is yanked in registry `crates.io`, consider running without --locked
warning: package `structopt v0.3.9` in Cargo.lock is yanked in registry `crates.io`, consider running without --locked
 Downloading crates ...
warning: spurious network error (2 tries remaining): [28] Timeout was reached (download of `dialoguer v0.5.0` failed to transfer more than 10 bytes in 30s)
warning: spurious network error (2 tries remaining): [28] Timeout was reached (failed to download any data for `derive_deref v1.1.0` within 30s)
warning: spurious network error (2 tries remaining): [28] Timeout was reached (download of `vergen v3.0.4` failed to transfer more than 10 bytes in 30s)
warning: spurious network error (2 tries remaining): [28] Timeout was reached (failed to download any data for `structopt v0.3.9` within 30s)
  Downloaded anyhow v1.0.26
  Downloaded env_logger v0.7.1
warning: spurious network error (2 tries remaining): [28] Timeout was reached (download of `glob v0.3.0` failed to transfer more than 10 bytes in 30s)
  Downloaded rayon v1.3.0
  Downloaded log v0.4.8
warning: spurious network error (2 tries remaining): [28] Timeout was reached (failed to download any data for `git2 v0.10.2` within 30s)
  Downloaded structopt v0.3.9
.
.
.

@dawidd6 the latest release of curl/libcurl is broken unfortunately, archlinux and alpine are shipping curl with custom patches to mitigate this.

est31 commented

There seem to be three ways to solve/work around it:

  1. fixing curl-sys to the prior versions (this is what we did in cargo-udeps est31/cargo-udeps@b267d1d)
  2. filing PR to https://github.com/alexcrichton/curl-rust that backports the revert patch curl/curl@8aa04e9 and then getting a curl-sys point release. Requires forking the curl git repo and pointing the submodule's URL to that new fork
  3. disabling HTTP/2.0 in the curl API somehow (it only affects HTTP2 according to curl/curl#5044)

All three solutions would be temporary until we get curl 7.70.0.

Thanks for the help everyone in debugging this!

The curl update was mostly routine, there wasn't anything motivating it. I yanked the curl-sys version from crates.io that bundles 7.69.0 because of this bug and I've republished curl-sys with 7.68.0. I'll wait for the next release of curl to update the submodule again.

Hopefull that should fix everything in the meantime!

ehuss commented

Just to clarify for anyone following, whether or not curl-sys uses the bundled libcurl or the system one is complicated. macOS will always use the bundled one with Cargo (because of the http2 requirement). Otherwise it will use pkg-config (or vcpkg on windows) to use the system one. It looks like some system packagers (like homebrew mentioned above) are using unpatched 7.69.

If you want to force a static build with Cargo, you'll need to set the curl-sys/static-curl feature. The rust-lang binary distribution of Cargo does this by default.