Linaro/uadk

rsa async tasks fail

Closed this issue · 5 comments

board: 201

kernel: https://github.com/Linaro/linux-kernel-uadk/tree/uacce-devel-5.11
make defconfig
make menuconfig
Device Drivers --->
    -*- IOMMU Hardware Support --->
        [*] Shared Virtual Addressing support for the ARM SMMUv3
    Misc devices --->
        <*> Accelerator Framework for User Land
-*- Cryptographic API --->
    [*] Hardware crypto devices --->
        <*> Support for HiSilicon SEC2 crypto block cipher accelerator
        <*> Support for HiSilicon ZIP accelerator
        <*> Support for HISI HPRE accelerator
[*] Tracers --->
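To double-check the resulting .config before building, a small POSIX sh sketch like the one below could be used. The CONFIG_* symbol names are my mapping of the menuconfig prompts above (ARM_SMMU_V3_SVA, UACCE, CRYPTO_DEV_HISI_SEC2/ZIP/HPRE, FTRACE) and should be verified against the tree; the helper name check_uadk_config is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: report which of the options listed above are not
# built in (=y) or built as a module (=m) in a kernel .config.
# The symbol names are assumed from the menuconfig prompts; verify them.
check_uadk_config() {
    cfg=${1:-.config}
    missing=0
    for sym in ARM_SMMU_V3_SVA UACCE CRYPTO_DEV_HISI_SEC2 \
               CRYPTO_DEV_HISI_ZIP CRYPTO_DEV_HISI_HPRE FTRACE; do
        if ! grep -Eq "^CONFIG_${sym}=(y|m)\$" "$cfg"; then
            echo "missing: CONFIG_${sym}"
            missing=$((missing + 1))
        fi
    done
    # Exit status is the number of missing options (0 means all present).
    return "$missing"
}
```

For example, `check_uadk_config .config && echo "config OK"` run from the kernel tree after `make menuconfig`.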

uadk: master (commit "version: update to 2.2.14")

openssl-uadk: master: a96fbbf ras: work with nginx via both async and sync mode
https://github.com/Linaro/openssl-uadk
https://github.com/Linaro/openssl-uadk/blob/master/INSTALL.md

Test:
for((i=0; i<10; i++))
do
    echo $i
    openssl speed -elapsed -engine uadk -async_jobs 36 rsa2048
done
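Since the failure mode is a hang (poll->recv never returns new data) rather than an error exit, a plain loop simply stops making progress. A sketch, assuming GNU coreutils `timeout`, that bounds each iteration and reports which one hung; the helper name run_iterations and the time limit are illustrative, not part of the original test.

```sh
#!/bin/sh
# Hypothetical helper: run a command N times, each under GNU coreutils
# `timeout`, so a hang is reported instead of blocking the loop forever.
# Usage: run_iterations N SECONDS command [args...]
run_iterations() {
    n=$1
    limit=$2
    shift 2
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i"
        rc=0
        # -k 10: send SIGKILL if the command survives SIGTERM for 10 more seconds.
        timeout -k 10 "$limit" "$@" || rc=$?
        # 124 = timed out; 137 = killed by SIGKILL (e.g. via -k).
        if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
            echo "iteration $i: no completion after ${limit}s (hang?)"
            return 124
        elif [ "$rc" -ne 0 ]; then
            echo "iteration $i: failed with status $rc"
            return "$rc"
        fi
        i=$((i + 1))
    done
    return 0
}
```

For example: `run_iterations 10 300 openssl speed -elapsed -engine uadk -async_jobs 36 rsa2048`, picking a limit well above the duration of a normal run.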

The test hangs because poll->recv cannot get new data.

Sometimes a RAS error is reported first, and then the test hangs:
NOTICE: [RasEriInterrupt]:[141L]NodeTYP0Status = 0x0
NOTICE: [RasEriInterrupt]:[157L]NodeTYP1Status = 0x2
NOTICE: [NimbusHpreNodeType1]:[2405L]This is hpre, Base = 0x208000000
NOTICE: [NimbusHpreHandle]:[2348L] HpreHacIntSt = 0x33e7c8
NOTICE: [NimbusHpreHandle]:[2349L] HpreQmIntStatus = 0x22
NOTICE: [PrintSecurityType]:[389L] SecurityType is RECOVERABLE!
NOTICE: [RasErrorDataPcieOemProcessor]:[1748L]BDF[0x79:0x0:0x0]
NOTICE: [HestGhesV2SetGenericErrorData]:[188L] Fill in HEST TABLE ,AckRegister=44010000
NOTICE: [HestNotifiedOS]:[37L]
NOTICE: [RasEriInterrupt]:[173L]NodeTYP2Status = 0x0
[ 496.016498] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 496.024727] {2}[Hardware Error]: event severity: recoverable
[ 496.030361] {2}[Hardware Error]: Error 0, type: recoverable
[ 496.035994] {2}[Hardware Error]: section_type: PCIe error
[ 496.041539] {2}[Hardware Error]: version: 4.0
[ 496.046047] {2}[Hardware Error]: command: 0x0006, status: 0x0010
[ 496.052199] {2}[Hardware Error]: device_id: 0000:79:00.0
[ 496.057659] {2}[Hardware Error]: slot: 0
[ 496.061735] {2}[Hardware Error]: secondary_bus: 0x00
[ 496.066849] {2}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa258
[ 496.073431] {2}[Hardware Error]: class_code: 100000

root@ubuntu:/sys/kernel/debug/hisi_hpre/0000:79:00.0/qm# cat regs
QM_ECC_1BIT_CNT = 0x00000000
QM_ECC_MBIT_CNT = 0x00000000
QM_DFX_MB_CNT = 0x0000248e
QM_DFX_DB_CNT = 0x2ce1cada
QM_DFX_SQE_CNT = 0x08971cfc
QM_DFX_CQE_CNT = 0x08971cfc
QM_DFX_SEND_SQE_TO_ACC_CNT = 0x1670ef9d
QM_DFX_WB_SQE_FROM_ACC_CNT = 0x00000000
QM_DFX_ACC_FINISH_CNT = 0x1670ef9b
QM_DFX_CQE_ERR_CNT = 0x00000206
QM_DFX_FUNS_ACTIVE_ST = 0x00000002
QM_ECC_1BIT_INF = 0x00010045
QM_ECC_MBIT_INF = 0x000600c6
QM_DFX_ACC_RDY_VLD0 = 0x01000000
QM_DFX_ACC_RDY_VLD1 = 0x0000ff00
QM_DFX_AXI_RDY_VLD = 0x00001802
QM_DFX_FF_ST0 = 0x00000ff4
QM_DFX_FF_ST1 = 0x0208c000
QM_DFX_FF_ST2 = 0x01d73fff
QM_DFX_FF_ST3 = 0x00000000
QM_DFX_FF_ST4 = 0x0fffffff
QM_DFX_FF_ST5 = 0x00ff00ff
QM_DFX_FF_ST6 = 0x00078081
QM_IN_IDLE_ST = 0x00000000
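Reading these counters by hand is error-prone, so the sketch below pulls a few of them out of a saved copy of the regs file and does the arithmetic. With the values above it gives a send/finish gap of 2 (0x1670ef9d - 0x1670ef9b) and 518 error CQEs (0x206). Interpreting the gap as tasks handed to the accelerator but never completed is my assumption from the register names, not something confirmed in this thread; the helper names are hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: extract one "NAME = 0xVALUE" register from a saved
# copy of the qm/regs debugfs output.
qm_reg() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*\(0x[0-9a-fA-F]*\)[[:space:]]*\$/\1/p" "$2"
}

# Print the send/finish gap and the CQE error count in decimal.
# Assumption: the counters are 32-bit and may wrap, hence the mask.
qm_summary() {
    sent=$(qm_reg QM_DFX_SEND_SQE_TO_ACC_CNT "$1")
    finished=$(qm_reg QM_DFX_ACC_FINISH_CNT "$1")
    err=$(qm_reg QM_DFX_CQE_ERR_CNT "$1")
    echo "sent_not_finished=$(( ($sent - $finished) & 0xffffffff ))"
    echo "cqe_err=$(( $err ))"
}
```

For example: `cat /sys/kernel/debug/hisi_hpre/0000:79:00.0/qm/regs > regs.txt; qm_summary regs.txt`.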

The uadk test on master cannot reproduce the issue, since it only supports -t 2:
test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2

With -t 3, it reports "failed to send: retry exit!"

  1. Disable THP: no such issue.
    echo never > /sys/kernel/mm/transparent_hugepage/enabled

  2. I'm able to reproduce much more reliably by setting
    /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to 10

  3. Add TLBI to the command queue (cmdq): no such issue.
    kernel:
    https://github.com/Linaro/linux-kernel-uadk/tree/5.11-pthread
    Linaro/linux-kernel-uadk@4a9f36f

echo Y > /sys/kernel/debug/sva_force_inval

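The THP experiments in steps 1 and 2 can be scripted so the original settings are put back afterwards. A minimal sketch; the helper names and the SYSFS_ROOT override (added only so the functions can be tried on a scratch directory) are hypothetical, and writing to the real /sys requires root.

```sh
#!/bin/sh
# Hypothetical helpers for the THP experiments above. SYSFS_ROOT is an added
# assumption for testing; leave it unset to act on the real /sys.
thp_dir() {
    echo "${SYSFS_ROOT:-/sys}/kernel/mm/transparent_hugepage"
}

# Remember the current settings. "enabled" reads like "always [madvise] never";
# the bracketed word is the active mode.
thp_save() {
    saved_mode=$(sed -n 's/.*\[\([a-z]*\)\].*/\1/p' "$(thp_dir)/enabled")
    saved_sleep=$(cat "$(thp_dir)/khugepaged/scan_sleep_millisecs")
}

# Step 1: disable THP (the issue went away).
thp_disable() {
    echo never > "$(thp_dir)/enabled"
}

# Step 2: make khugepaged scan every 10 ms (reproduces more reliably).
thp_scan_fast() {
    echo 10 > "$(thp_dir)/khugepaged/scan_sleep_millisecs"
}

# Put back whatever thp_save recorded.
thp_restore() {
    echo "$saved_mode" > "$(thp_dir)/enabled"
    echo "$saved_sleep" > "$(thp_dir)/khugepaged/scan_sleep_millisecs"
}
```

A typical run would be `thp_save; thp_scan_fast`, then the test loop, then `thp_restore`.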
Jean:
And when forcing TLB invalidations on the command queue (in addition to
DVM, by setting sva_force_inval to Y with the attached patch) the problem
disappears. So I think either the command queue TLBI adds so much
overhead that it masks whatever race causes the issue, or the hardware
doesn't handle the TLBI from khugepaged properly (it should be a TLBI
ASIDE1IS, since we go through __flush_tlb_range() with a huge page, which
goes to flush_tlb_mm()).
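A worked check of the "huge page goes to flush_tlb_mm()" step, based on my reading of arm64's tlbflush.h around v5.11 (not something stated in this thread): flush_tlb_range() calls __flush_tlb_range() with stride = PAGE_SIZE, and on CPUs that do not implement the ARMv8.4 TLB range instructions it falls back to flush_tlb_mm(), i.e. a single TLBI ASIDE1IS, once end - start >= MAX_TLBI_OPS * stride, where MAX_TLBI_OPS = PTRS_PER_PTE. Assuming 4 KiB pages, the 2 MiB range flushed when khugepaged collapses one PMD-sized huge page hits that threshold exactly:

```latex
\text{end} - \text{start} = 2\,\text{MiB} = 2^{21}\,\text{B},
\qquad
\texttt{MAX\_TLBI\_OPS}\cdot\text{stride} = 512 \cdot 4\,\text{KiB} = 2^{9}\cdot 2^{12}\,\text{B} = 2^{21}\,\text{B}
\;\Longrightarrow\;
\text{end} - \text{start} \ge \texttt{MAX\_TLBI\_OPS}\cdot\text{stride}
```

If that reading is right, the SMMU only drops its stale entries for this ASID if it receives the broadcast TLBI over DVM, or an explicit command-queue TLBI such as the one sva_force_inval adds.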

Another thing: when building a kernel in parallel with the openssl command,
I see a lot of "internal compiler error" failures in the build, which look
like memory corruption. This seems to confirm the stale TLB hypothesis: because
the SMMU doesn't invalidate its TLB properly, the device's DMA writes go to old
pages that have been reallocated for the build.

Test: run the uadk test and a kernel build at the same time
kernel:
make clean; make -j4

uadk:
for((i=0; i<100; i++))
do
    test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2
    echo $i
done
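The two commands above can be driven from one script so the build really overlaps the uadk loop and the compiler crashes are counted rather than spotted by eye. A sketch under that assumption; the helper names, the log file, and the argument order are hypothetical.

```sh
#!/bin/sh
# Hypothetical helpers for the "uadk + kernel build" test above.

# Print how many "internal compiler error" lines a build log contains.
# grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
count_ice() {
    grep -c 'internal compiler error' "$1" || true
}

# Usage: stress_with_build LOG N BUILD_CMD TEST_CMD
# Starts BUILD_CMD in the background (output to LOG), runs TEST_CMD N times
# in the foreground, waits for the build, then prints the ICE count last.
stress_with_build() {
    log=$1 n=$2 build_cmd=$3 test_cmd=$4
    sh -c "$build_cmd" > "$log" 2>&1 &
    build_pid=$!
    i=0
    while [ "$i" -lt "$n" ]; do
        echo "$i"
        sh -c "$test_cmd" || echo "iteration $i: test command failed"
        i=$((i + 1))
    done
    wait "$build_pid" || echo "build exited with status $?"
    count_ice "$log"
}
```

For example, from the kernel tree: `stress_with_build build.log 100 "make clean; make -j4" "test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2"`.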

build kernel:
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Segmentation fault
./arch/arm64/include/asm/rwonce.h:72:0: internal compiler error: Aborted
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
gcc: internal compiler error: Segmentation fault (program as)
Please submit a full bug report,
with preprocessed source if appropriate.
See file:///usr/share/doc/gcc-5/README.Bugs for instructions.
scripts/Makefile.build:279: recipe for target 'kernel/rcu/update.o' failed
make[2]: *** [kernel/rcu/update.o] Error 4
make[2]: *** Deleting file 'kernel/rcu/update.o'
make[2]: *** Waiting for unfinished jobs....
CC kernel/irq/generic-chip.o
kernel/irq/devres.c:283:1: internal compiler error: Segmentation fault

uadk:
performance test did not verify output!
*** Error in `test_hisi_hpre': double free or corruption (!prev): 0x0000ffff7c021fd0 ***
./uadk.sh: line 16: 12579 Aborted (core dumped) test_hisi_hpre rsa-sgn --mode=crt --perf --trd_mode=async --seconds=10 -t 2

Shameer reported the same issue on a board without DVM on Jul 25, 2020.
Currently the issue happens on a board with DVM, but requires a multi-threaded test.
The phenomenon is the same, and the same workaround applies.

Copied from Shameer's earlier email:
Issue:

While running test_sva_perf on a D06 board, the zip device reports
random "Hardware Error"s, which results in the app/zip hanging.

Kernel: https://github.com/Linaro/linux-kernel-warpdrive.git uacce-devel-5.8
warpdrive: master

Test Script:

#!/bin/sh
a=0
evt=0x80
while [ "$a" -lt 14 ]
do
echo $evt
./perf stat -e smmuv3_pmcg_140020/event=$evt/ ./test_sva_perf -s 2048000 -l 75000 -c 50 -v
a=$(($a+1))
evt=$(($evt+0x1))
done

[ 273.447475] {2}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[ 273.464868] {2}[Hardware Error]: event severity: recoverable
[ 273.476773] {2}[Hardware Error]: Error 0, type: recoverable
[ 273.488670] {2}[Hardware Error]: section_type: PCIe error
[ 273.500387] {2}[Hardware Error]: version: 4.0
[ 273.509915] {2}[Hardware Error]: command: 0x0006, status: 0x0010
[ 273.522907] {2}[Hardware Error]: device_id: 0000:75:00.0
[ 273.534437] {2}[Hardware Error]: slot: 0
[ 273.543046] {2}[Hardware Error]: secondary_bus: 0x00
[ 273.553845] {2}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa250
[ 273.567751] {2}[Hardware Error]: class_code: 120000
[ 252.733975] hisi_zip 0000:75:00.0: AER: aer_status: 0x00000000, aer_mask: 0x00000000
[ 252.753282] hisi_zip 0000:75:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 252.777791] hisi_zip 0000:75:00.0: AER: aer_uncor_severity: 0x00000000
[ 252.825195] hisi_zip 0000:75:00.0: qm_acc_wb_not_ready_timeout [error status=0x40] found
[ 252.842205] hisi_zip 0000:75:00.0: zip_pre_in_data_err [error status=0x80] found
[ 252.857767] hisi_zip 0000:75:00.0: zip_com_inf_err [error status=0x100] found
[ 252.857769] hisi_zip 0000:75:00.0: zip_enc_inf_err [error status=0x200] found
[ 252.887741] hisi_zip 0000:75:00.0: zip_pre_out_err [error status=0x400] found

Debugging shows that this behaviour correlates with a large number of I/O page faults (IOPFs).
Normally the above test reports IOPFs in the range of hundreds, but when this error
happens the count goes up to millions.

Also, this was never reproduced on another D06 board running a BIOS that
enables DVM (Distributed Virtual Memory). This suggested that the
issue is probably related to SMMU TLB invalidations.

Further debugging and code review revealed that the current SMMUv3 SVA code
only supports the SVA feature if the SMMUv3 has the BTM (Broadcast TLB
Maintenance) feature. The assumption seems to be that BTM support means DVM
is also enabled (this needs to be verified as always true). But on our D06
board, even though the SMMU reports BTM support, DVM is only enabled with a
special BIOS.
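Whether an SMMUv3 advertises BTM can be read from SMMU_IDR0, where bit 5 is BTM per the SMMUv3 architecture specification (the Linux driver defines IDR0_BTM as 1 << 5). As noted above, that bit alone does not say whether the platform actually delivers DVM broadcasts to the SMMU. The sketch only decodes a raw IDR0 value; how to read it on a given board (for example with devmem at the SMMU's base address, offset 0x0) is platform-specific, and the helper name is hypothetical.

```sh
#!/bin/sh
# Hypothetical helper: decode the BTM bit (bit 5) from a raw SMMU_IDR0 value.
# A set bit means the SMMU *supports* broadcast TLB maintenance; it does not
# prove that the BIOS/interconnect actually delivers DVM messages to it.
idr0_has_btm() {
    [ $(( ($1 >> 5) & 1 )) -eq 1 ]
}
```

For example: `idr0_has_btm "$(busybox devmem <SMMU_BASE> 32)" && echo "BTM advertised"`, where <SMMU_BASE> is a placeholder for the board's SMMU register base.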

Based on the above criterion (i.e., BTM means DVM is enabled for SVA), the mm
notifier's invalidate_range() code path currently does only ATC invalidations
and no SMMU TLB invalidations. This breaks on non-DVM platforms, where an
explicit TLBI is needed here.

With the quick fix below, I no longer see any Hardware Error
(completed around 100 iterations of test_sva_perf) on my setup with
the above test script.

@@ -3697,6 +3699,10 @@ static void arm_smmu_mm_invalidate_range(struct mmu_notifier *mn,
 {
 	struct arm_smmu_mmu_notifier *smmu_mn = mn_to_smmu(mn);
 
+	arm_smmu_tlb_inv_range(start, end - start + 1,
+			       PAGE_SIZE, false, smmu_mn->domain);
+
 	arm_smmu_atc_inv_domain(smmu_mn->domain, mm->pasid, start,
 				end - start + 1);
 	trace_smmu_mm_invalidate(mm->pasid, start, end);

Test with a lightweight job: no I/O page faults, but the data is not correct.

  1. Make khugepaged scan more frequently:
    echo 10 > /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs

linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
[sudo] password for linaro:
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3191 2048 bits private RSA's in 0.50s
Doing 2048 bits public rsa's for 1s: RSA verify failure
281473395044352:error:0407008A:rsa routines:RSA_padding_check_PKCS1_type_1:invalid padding:crypto/rsa/rsa_pk1.c:67:
-1 2048 bits public RSA's in 0.29s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000157s -0.290000s 6382.0 -3.4
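When looping this lightweight test, flagging bad runs automatically is easier than reading every report. The sketch below (helper name hypothetical) treats a log as bad if it contains the failure lines seen above: "RSA verify failure", the "invalid padding" error, or a negative operation count such as "-1 2048 bits public RSA's".

```sh
#!/bin/sh
# Hypothetical helper: return 1 if an `openssl speed` log shows the
# wrong-result symptom above, 0 otherwise.
speed_log_ok() {
    if grep -Eq 'RSA verify failure|invalid padding|^-[0-9]+ [0-9]+ bits' "$1"; then
        return 1
    fi
    return 0
}
```

For example, inside a loop: `sudo openssl speed -engine uadk -seconds 1 rsa2048 > speed.log 2>&1; speed_log_ok speed.log || echo "wrong result"`.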

  2. With the workaround added, the data is correct:
    root@ubuntu:/sys/kernel/debug# echo 1 > sva_force_inval

linaro@ubuntu:~$ sudo openssl speed -engine uadk -seconds 1 rsa2048
engine "uadk" set.
hisi sec init Kunpeng920!
Doing 2048 bits private rsa's for 1s: 3186 2048 bits private RSA's in 0.49s
Doing 2048 bits public rsa's for 1s: 45560 2048 bits public RSA's in 0.91s
OpenSSL 1.1.1a 20 Nov 2018
built on: Fri Mar 5 06:08:16 2021 UTC
options:bn(64,64) rc4(char) des(int) aes(partial) idea(int) blowfish(ptr)
compiler: gcc -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DOPENSSL_USE_NODELETE -DOPENSSL_PIC -DOPENSSL_CPUID_OBJ -DOPENSSL_BN_ASM_MONT -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DKECCAK1600_ASM -DVPAES_ASM -DECP_NISTZ256_ASM -DPOLY1305_ASM -DNDEBUG
sign verify sign/s verify/s
rsa 2048 bits 0.000154s 0.000020s 6502.0 50065.9

Found the reason: the HPRE is connected to a different SMMU, whose DVM is
not enabled by the BIOS :(

sudo busybox devmem 0x2001c0030 32
0x1 // wrong value
0x9 // correct value
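The two readings above differ only in bit 3 (0x1 vs 0x9, i.e. 0x8). Assuming, from those two samples alone, that bit 3 is what the updated BIOS sets (its actual name and meaning are not stated in this thread), a quick check might look like this; the helper name is hypothetical and 0x2001c0030 is the address used above for this board.

```sh
#!/bin/sh
# Hypothetical check for the register read above on this board:
#   0x1 -> wrong, 0x9 -> correct; the two differ only in bit 3 (0x8).
# Treating bit 3 as the deciding bit is an assumption from those two samples.
bios_dvm_bit_set() {
    [ $(( $1 & 0x8 )) -ne 0 ]
}
```

For example: `bios_dvm_bit_set "$(sudo busybox devmem 0x2001c0030 32)" || echo "BIOS update needed"`.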

With the updated BIOS, the stress test passes.