dynup/kpatch

kpatch module build could fail when kernel 5.19+ contains dynamic symbols

sumanthkorikkar opened this issue · 23 comments

Hi All,

When building a kpatch module for a 5.19+ kernel with -ffunction-sections,
the vmlinux build can fail during the link stage.

Reason:
The s390 kernel is built with -fPIE and, for kpatch purposes, with ARCH_KCFLAGS "-ffunction-sections -fdata-sections".

Output:
ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280)
ld: final link failed: nonrepresentable section on output
BTF .btf.vmlinux.bin.o

In this scenario:

  1. gABI doesn't support dynamic symbols in output sections beyond 64k.
    Ref: binutils : check_dynsym (bfd *abfd, Elf_Internal_Sym *sym)
    https://sourceware.org/git/?p=binutils-gdb.git;a=blob;f=bfd/elflink.c;h=2b1450fa4e146936ba4fd6d02691a863f26a88b6;hb=HEAD#l10183

  2. s390 kernel
    readelf --dyn-syms vmlinux | wc -l
    1556

  3. The x86 kernel doesn't seem to have dynamic symbols and hence does not hit this problem.
    readelf --dyn-syms vmlinux | wc -l
    0
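
The 65280 in the linker message is 0xff00, the ELF SHN_LORESERVE value at which section indices stop fitting the ordinary 16-bit fields. A minimal Python sketch for checking how close a given 64-bit ELF image is to that limit could look like this; the helper names `section_count` and `over_limit` are made up for illustration and are not part of kpatch or binutils.

```python
import struct

SHN_LORESERVE = 0xFF00  # 65280, first reserved ELF section index


def section_count(elf: bytes) -> int:
    """Return the number of section headers in a 64-bit ELF image."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2:
        raise ValueError("not a 64-bit ELF image")
    # e_ident[EI_DATA]: 1 = little-endian, 2 = big-endian (s390x is big-endian)
    endian = "<" if elf[5] == 1 else ">"
    # Elf64_Ehdr layout; index 6 is e_shoff, index 12 is e_shnum
    fields = struct.unpack_from(endian + "16sHHIQQQIHHHHHH", elf, 0)
    e_shoff, e_shnum = fields[6], fields[12]
    if e_shnum == 0 and e_shoff != 0:
        # Extended numbering: the real count is stored in sh_size
        # (byte offset 32) of section header 0.
        (e_shnum,) = struct.unpack_from(endian + "Q", elf, e_shoff + 32)
    return e_shnum


def over_limit(count: int) -> bool:
    """True when the section count reaches the reserved index range."""
    return count >= SHN_LORESERVE
```

Reading a vmlinux file in binary mode and passing its bytes to `section_count` gives the number to compare against the limit; only the header and, for extended numbering, the first section header are inspected.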

Possible fix:

  1. Provide explicit TARGETS, e.g.:
    TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch

  2. Change the linker script, e.g.:

diff --git a/arch/s390/kernel/vmlinux.lds.S b/arch/s390/kernel/vmlinux.lds.S
index 2e526f11b91e..1d3d2d878acb 100644
--- a/arch/s390/kernel/vmlinux.lds.S
+++ b/arch/s390/kernel/vmlinux.lds.S
@@ -48,7 +48,7 @@ SECTIONS
                IRQENTRY_TEXT
                SOFTIRQENTRY_TEXT
                FTRACE_HOTPATCH_TRAMPOLINES_TEXT
-               *(.text.*_indirect_*)
+               *(.text.*)
                *(.gnu.warning)
                . = ALIGN(PAGE_SIZE);
                _etext = .;             /* End of text section */
  3. Create a custom target in the kernel's top-level Makefile. This target would build only kernel objects without linking the vmlinux target.

Question:
Could you please suggest how this could be handled better in kpatch?

  • without generating vmlinux in (original build) and (patched build).

Thank you

Best Regards
Sumanth

Hi @sumanthkorikkar, thanks for the detailed report.

Do you happen to know why the s390 arch uses dynamic symbols while x86 does not?

Also, I thought (at one time) that kernel LTO efforts leveraged -ffunction-sections as well. I wonder if that project would eventually hit this limitation as well, assuming they are working with an arch that uses dynamic symbols. Perhaps they would be fellow travelers in this space and interested in supporting dynamic symbols in output sections beyond 64k.

As for possible workarounds, a few questions and ideas on proposed solutions:

> 1. Provide explicit TARGETS, e.g.:
>     TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch

This doesn't seem ideal, as the user may not know exactly which target directories need to be rebuilt (i.e., kpatch-build is doing that work for us).

> 2. Change linker script [ ... filter *(.text.*_indirect_*) sections ... ]

I assume this would help because we recently added the external expoline requirement for kpatch? And does it buy us only a few fewer dynamic symbols?

> 3. Create a custom target in the kernel's top-level Makefile. This target would build only kernel objects without linking the vmlinux target.

Well, we already slightly modify link-vmlinux.sh and Makefile.modfinal so this idea is not without precedent.

Maybe by modifying a recent top level Makefile like (untested):

diff --git a/Makefile b/Makefile
index 00fd80c5dd6e..6a9afdc3ee73 100644
--- a/Makefile
+++ b/Makefile
@@ -1844,6 +1844,9 @@ $(build-dirs): prepare
        single-build=$(if $(filter-out $@/, $(filter $@/%, $(KBUILD_SINGLE_TARGETS))),1) \
        need-builtin=1 need-modorder=1
 
+.PHONY: kpatch
+kpatch: $(build-dirs)
+
 clean-dirs := $(addprefix _clean_, $(clean-dirs))
 PHONY += $(clean-dirs) clean
 $(clean-dirs):

though adding anything as specific as that runs into code drift maintenance. (I can already see that build-dirs is relatively new and missing from older kernels.) Alternatively, I think the same is achievable by building the kernel with make */

In any case, we'd lose the ability to specify the targets on the kpatch-build command line.

> Do you happen to know why the s390 arch uses dynamic symbols while x86 does not?

I have the same question. There will probably be other features in the future which rely on -ffunction-sections, so if there's some way for the s390 kernel to avoid using dynamic symbols then that might be the best way to "fix" the issue.

Hi Joe, Josh,

> Do you happen to know why the s390 arch uses dynamic symbols while x86 does not?

Discussed this with the compiler team.

x86 kernel:

  • The decompressor is compiled with -fPIC flag and it is made relocatable.
  • When CONFIG_RELOCATABLE is selected, LDFLAGS_vmlinux is set to --emit-relocs. This ensures that the relocs stay in the binary, so the x86 kernel can adjust for the final load address.
  • Ref:
    • a02150610776 ("x86, relocs: Move ELF relocation handling to C")
    • 968de4f02621 ("[PATCH] i386: Relocatable kernel support")

s390 kernel:

  • The kernel is linked as PIE and contains dynamic relocations so that they can be processed during boot.
    (The kernel is treated much like a shared library.)
  • Ref:
    • 805bc0bc238f ("s390/kernel: build a relocatable kernel")
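
To illustrate what processing dynamic relocations during boot means for a PIE-linked image, here is a simplified Python model of relative relocations: each entry carries an offset and an addend, and startup code stores load base plus addend at that offset. This is a sketch of the general technique, not the s390 kernel's actual code; `apply_relative_relocs` and the (offset, addend) tuple layout are assumptions made for illustration.

```python
import struct


def apply_relative_relocs(image: bytearray, relocs, load_base: int) -> None:
    """Patch 64-bit big-endian slots in place: slot = load_base + addend."""
    for offset, addend in relocs:
        # Wrap to 64 bits, as a real 64-bit store would.
        value = (load_base + addend) & 0xFFFFFFFFFFFFFFFF
        struct.pack_into(">Q", image, offset, value)


# Toy image with two pointer-sized slots that need fixing up.
image = bytearray(16)
apply_relative_relocs(image, [(0, 0x100), (8, 0x200)], load_base=0x10000)
first, second = struct.unpack(">QQ", image)
```

In this model, `first` and `second` end up as the link-time addends shifted by the load base. The x86 approach described above arrives at a comparable fix-up list from the relocations kept by --emit-relocs rather than from dynamic relocations.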

> As for possible workarounds, a few questions and ideas on proposed solutions:
>
> 1. Provide explicit TARGETS, e.g.:
>     TARGETS="fs/proc/" KPATCHBUILD_OPTS="-v $vmlinux -s $linux_src -d" ./kpatch-test rhel-9.0/data-new.patch
>
> This doesn't seem ideal, as the user may not know exactly which target directories need to be rebuilt (i.e., kpatch-build is doing that work for us).

Yes, agreed. Specifying TARGETS would be just a temporary workaround.

> 2. Change linker script [ ... filter *(.text.*_indirect_*) sections ... ]
>
> I assume this would help because we recently added the external expoline requirement for kpatch? And does it buy us only a few fewer dynamic symbols?

With -ffunction-sections, each function gets its own .text section. However, as I understand it, the
vmlinux that is created during the kpatch build process does not matter. Individual object files would
still have a separate text section for each function, and kpatch-build deals only with those.
Hence, combining all the .text sections at the link stage eliminates the "ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280)" error altogether. This could be one possible approach (a quick fix).

Let me know your thoughts.
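
The effect of the linker-script change can be modelled roughly as follows. In this Python sketch, input sections matching a pattern are folded into .text, while unmatched ones keep their own output section, which is how GNU ld generally places orphan sections. The function name and the sample section names are invented for illustration.

```python
from fnmatch import fnmatchcase


def output_sections(input_sections, text_patterns):
    """Return the set of output section names for a toy linker script."""
    out = set()
    for name in input_sections:
        if any(fnmatchcase(name, pat) for pat in text_patterns):
            out.add(".text")  # merged into the single .text output section
        else:
            out.add(name)     # orphan: stays a separate output section
    return out


# Sample per-function input sections as -ffunction-sections would emit them.
inputs = [".text", ".text.foo", ".text.bar", ".text.sample_indirect_thunk"]
before = output_sections(inputs, [".text", ".text.*_indirect_*"])
after = output_sections(inputs, [".text", ".text.*"])
```

With the narrower pattern, every per-function section except the `_indirect_` one survives as its own output section; with `.text.*` they all collapse into .text, so the output section count no longer grows with the number of functions.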

> 3. Create a custom target in the kernel's top-level Makefile. This target would build only kernel objects without linking the vmlinux target.
>
> Well, we already slightly modify link-vmlinux.sh and Makefile.modfinal so this idea is not without precedent.
>
> Maybe by modifying a recent top level Makefile like (untested):
>
> diff --git a/Makefile b/Makefile
> index 00fd80c5dd6e..6a9afdc3ee73 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1844,6 +1844,9 @@ $(build-dirs): prepare
>         single-build=$(if $(filter-out $@/, $(filter $@/%, $(KBUILD_SINGLE_TARGETS))),1) \
>         need-builtin=1 need-modorder=1
>  
> +.PHONY: kpatch
> +kpatch: $(build-dirs)
> +
>  clean-dirs := $(addprefix _clean_, $(clean-dirs))
>  PHONY += $(clean-dirs) clean
>  $(clean-dirs):
>
> though adding anything as specific as that runs into code drift maintenance. (I can already see that build-dirs is relatively new and missing from older kernels.) Alternatively, I think the same is achievable by building the kernel with make */
>
> In any case, we'd lose the ability to specify the targets on the kpatch-build command line.

I tried this patch and it works in the normal scenario. However, module.patch failed because it couldn't identify nfsd/export.o (a module) and only identified kpatch_string (in af_netlink.o) as a new function. I will check further.

Thanks

> With -ffunction-sections, each function gets its own .text section. However, as I understand it, the vmlinux that is created during the kpatch build process does not matter. Individual object files would still have a separate text section for each function, and kpatch-build deals only with those. Hence, combining all the .text sections at the link stage eliminates the "ld: .tmp_vmlinux.btf: too many sections: 65614 (>= 65280)" error altogether. This could be one possible approach (a quick fix).
>
> Let me know your thoughts.

This may not be a good long term solution. The kernel is moving towards enabling LTO, in which case kpatch-build will have to analyze vmlinux.o rather than individual translation units.

x86 also has recently added IBT, for which kpatch-build might also need to analyze vmlinux.o (not sure about this one yet).

Also, there are other features which use -ffunction-sections (fgkaslr, as one example).

So the s390 kernel needs to figure out a way to support >64k sections.

Would it be possible for s390 to use --emit-relocs?

Hi Josh, Joe

Thank you for the inputs.

Agreed. We would definitely like to have --emit-relocs or similar support for the s390 kernel in the long term, but
this might take a while given the complexities.

As a short-term solution for s390 kpatch, it would therefore be necessary to either provide
explicit TARGETS or make this change in the linker script.

@sumanthkorikkar

After looking at how x86 does it, converting s390 to --emit-relocs actually seems pretty straightforward. I made the following patch; it booted successfully with CONFIG_RANDOMIZE_BASE. I'll try to give it some more testing and post it upstream.

s390-reloc.patch.txt

Hm, I just spotted an obvious bug in handle_relocs(), not sure how it's booting ;-)

EDIT: oops, accidentally tested the wrong kernel! Anyway the patch is rough, but you get the idea.

Here's a working version of the patch. I haven't tested it with 64k+ symbols and kpatch yet.

s390-reloc.patch.txt

Hi Josh, thank you for the patch.

A few things:

  • Option -mno-pic-data-is-text-relative would generate R_390_GOTENT. This should be handled in do_reloc().
  • Greater than 64k output sections works, as no dynamic symbols are present.
  • ARCH_KCFLAGS+="-fPIC" should be added to the s390 kpatch tools, as -mno-pic-data-is-text-relative can be used only with -fPIC.
    kpatch seems to work with these.

I have yet to understand whether other rela types (other than R_390_64) need any offset adjustment.

Also, I will be on vacation for the next 4 weeks.

> • Option -mno-pic-data-is-text-relative would generate R_390_GOTENT. This should be handled in do_reloc().

But that option is only used for the livepatch, for which do_reloc() doesn't run. Instead, the module relocation code runs (apply_relocate_add() in arch/s390/kernel/module.c). So I don't see the need for R_390_GOTENT in do_reloc().

> • Greater than 64k output sections works, as no dynamic symbols are present.
>
> • ARCH_KCFLAGS+="-fPIC" should be added to the s390 kpatch tools, as -mno-pic-data-is-text-relative can be used only with -fPIC.
>   kpatch seems to work with these.

Yes, I discovered that as well. In kpatch-build, ARCH_KCFLAGS needs -fPIC added (along with the existing -mno-pic-data-is-text-relative) to force the use of R_390_GOTENT for text accesses to global data.

Hi @jpoimboe

I rebased your changes and tested them on v6.2. They look promising to me. Could you please
send these changes to the s390 mailing list for maintainer review?

Thanks a lot.

Hi @jpoimboe, @sumanthkorikkar, we just hit this while rebasing the integration tests to v6.3. Shall we retry with the patch from Josh's Aug 25 comment, or have there been any alternate solutions explored on the s390 mailing list? Thanks.

Hi Joe, Josh,

I tried Josh Poimboeuf's patch series on the latest branch and added a minor fixup on top of it.
It is currently under internal review. I will send the rebased patch series to you both soon for your valuable feedback. Thank you Josh, Joe.

Hi @sumanthkorikkar, if you have a WIP, rebased version of the patch for 6.4, would you mind attaching it here? We can throw it into our internal tests to at least give it some runtime and maybe find subsequent kpatch-build issues for s390x. Thanks.

Hi Joe, Josh,

Attached is the rebased Josh Poimboeuf patch series (rebased onto master) with a fixup.
It was rebased to master from the following source: https://git.kernel.org/pub/scm/linux/kernel/git/jpoimboe/linux.git/log/?h=s390
It seems to work with gcc.
clang has a few concerns, which are under discussion.

Let me know if this patch works for you.
Josh-Poimboeuf_series_emit_relocs_rebase_fixup.patch.txt

Thank you Joe & Josh

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

In progress.

This issue has been open for 30 days with no activity and no assignee. It will be closed in 7 days unless a comment is added.

@sumanthkorikkar sorry for not being communicative on this issue, it has been a busy time. I will also be out for another two weeks, feel free to keep pinging me after that :-)

ok, Thanks Josh. Will do so.

I think this one can be closed. @jpoimboe @sumanthkorikkar anything else to do here?

I believe this was merged upstream, closing.