falcosecurity/falcosidekick

Sidekick Crashes After Triggering the Same Rule Multiple Times in a Short Window with Falco 0.38.2

Opened this issue · 3 comments

Describe the bug

After running Aqua Security’s kube-bench, the Sidekick service crashes. The crash occurs when the same Falco rule is triggered more than 15 times within a very short time window. The service terminates instead of handling the load gracefully.

How to reproduce it

1. Run Aqua Security’s kube-bench to perform security checks.
2. Ensure that a specific Falco rule is triggered more than 15 times in a very short window.

Expected behaviour

The Sidekick service should handle multiple rule triggers without crashing. It should remain stable and not be terminated.

Screenshots
No screenshots available.

Environment

  • Falco version: 0.38.2
  • OS: Talos 1.6.5
  • Kernel: 6.6.32-talos
  • Installation method: Helm

Additional context

The rule triggered:

   # Note that runsv is both in protected_shell_spawner and the
   # exclusions by pname. This means that runsv can itself spawn shells
   # (the ./run and ./finish scripts), but the processes runsv can not
   # spawn shells.
   #
   # Also, trivy uses this for vulnerability scanning and kyverno uses it to clean ephemeral reports
   # And we exclude the incom user
   - rule: Incom Run shell untrusted
     desc: > 
       An attempt to spawn a shell below a non-shell application. The non-shell applications that are monitored are 
       defined in the protected_shell_spawner macro, with protected_shell_spawning_binaries being the list you can 
       easily customize. For Java parent processes, please note that Java often has a custom process name. Therefore, 
       rely more on proc.exe to define Java applications. This rule can be noisier, as you can see in the exhaustive 
       existing tuning. However, given it is very behavior-driven and broad, it is universally relevant to catch 
       general Remote Code Execution (RCE). Allocate time to tune this rule for your use cases and reduce noise. 
       Tuning suggestions include looking at the duration of the parent process (proc.ppid.duration) to define your 
       long-running app processes. Checking for newer fields such as proc.vpgid.name and proc.vpgid.exe instead of the 
       direct parent process being a non-shell application could make the rule more robust.
     condition: >
       spawned_process
       and shell_procs
       and proc.pname exists
       and not (k8s.ns.name = trivy)
       and not (k8s.ns.name = kyverno)
       and not serf_script
       and not check_process_status
       and not (container.image.repository in (incom_network_images))
       and not (user.name = incom)
       and not (proc.pexe = /bin/containerd-shim-runc-v2)
     output: Shell spawned by untrusted binary (parent_exe=%proc.pexe parent_exepath=%proc.pexepath pcmdline=%proc.pcmdline gparent=%proc.aname[2] ggparent=%proc.aname[3] aname[4]=%proc.aname[4] aname[5]=%proc.aname[5] aname[6]=%proc.aname[6] aname[7]=%proc.aname[7] evt_type=%evt.type user=%user.name user_uid=%user.uid user_loginuid=%user.loginuid process=%proc.name proc_exepath=%proc.exepath parent=%proc.pname command=%proc.cmdline terminal=%proc.tty exe_flags=%evt.arg.flags %container.info)
     priority: ERROR
     tags: [maturity_stable, host, container, process, shell, mitre_execution, T1059.004]

The error message from the failed pod:

2024/09/23 17:48:45 [INFO]  : Slack - POST OK (200)
2024/09/23 17:48:45 [INFO]  : Pagerduty - Create Incident OK
2024/09/28 09:25:13 [INFO]  : Slack - POST OK (200)
fatal error: concurrent map iteration and map write
goroutine 502012 [running]:
github.com/falcosecurity/falcosidekick/outputs.getSortedStringKeys(0xc00089e1e0?)
   /home/runner/work/falcosidekick/falcosidekick/outputs/utils.go:12 +0x6b
github.com/falcosecurity/falcosidekick/outputs.newSlackPayload({{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, 0x0}, ...}, ...)
   /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:75 +0x62c
github.com/falcosecurity/falcosidekick/outputs.(*Client).SlackPost(0xc0008e1d00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
   /home/runner/work/falcosidekick/falcosidekick/outputs/slack.go:152 +0x78
created by main.forwardEvent in goroutine 502010
   /home/runner/work/falcosidekick/falcosidekick/handlers.go:235 +0x148
goroutine 1 [IO wait]:
internal/poll.runtime_pollWait(0x7fce1861fed0, 0x72)
   $GOROOT/src/runtime/netpoll.go:345 +0x85
internal/poll.(*pollDesc).wait(0x3?, 0x1?, 0x0)
   $GOROOT/src/internal/poll/fd_poll_runtime.go:84 +0x27
internal/poll.(*pollDesc).waitRead(...)
   $GOROOT/src/internal/poll/fd_poll_runtime.go:89
internal/poll.(*FD).Accept(0xc0009dd100)
   $GOROOT/src/internal/poll/fd_unix.go:611 +0x2ac
net.(*netFD).accept(0xc0009dd100)
   $GOROOT/src/net/fd_unix.go:172 +0x29
net.(*TCPListener).accept(0xc0009c95e0)
   $GOROOT/src/net/tcpsock_posix.go:159 +0x1e
net.(*TCPListener).Accept(0xc0009c95e0)
   $GOROOT/src/net/tcpsock.go:327 +0x30
net/http.(*Server).Serve(0xc000568690, {0x3079fb0, 0xc0009c95e0})
   $GOROOT/src/net/http/server.go:3255 +0x33e
net/http.(*Server).ListenAndServe(0xc000568690)
   $GOROOT/src/net/http/server.go:3184 +0x71
main.main()
   /home/runner/work/falcosidekick/falcosidekick/main.go:934 +0x1287
goroutine 13 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc000143680)
   pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:292 +0x9f
created by go.opencensus.io/stats/view.init.0 in goroutine 1
   pkg/mod/go.opencensus.io@v0.24.0/stats/view/worker.go:34 +0x8d
goroutine 502011 [runnable]:
net.(*OpError).Timeout(0xc0000cf400?)
   $GOROOT/src/net/net.go:507 +0x133
net/http.(*connReader).backgroundRead(0xc00067d290)
   $GOROOT/src/net/http/server.go:708 +0xa9
created by net/http.(*connReader).startBackgroundRead in goroutine 502010
   $GOROOT/src/net/http/server.go:677 +0xba
goroutine 502013 [runnable]:
bytes.(*Buffer).WriteByte(0xc000ce8980?, 0x7b?)
   $GOROOT/src/bytes/buffer.go:285 +0x9c
encoding/json.mapEncoder.encode({0xc000b16538?}, 0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x2426d60?}, {0x14?, 0x0?})
   $GOROOT/src/encoding/json/encode.go:737 +0x215
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x2426d60?, 0xc00067d3b0?, 0x7c9779?}, {0x40?, 0xde?})
   $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.interfaceEncoder(0xc000ce8980, {0x23dde40?, 0xc0008c66f0?, 0x6f8345?}, {0x60?, 0xa6?})
   $GOROOT/src/encoding/json/encode.go:658 +0xba
encoding/json.structEncoder.encode({{{0xc00033e488, 0x8, 0x8}, 0xc000652a80, 0xc000652ab0}}, 0xc000ce8980, {0x273f520?, 0xc0008c6680?, 0xc0000f8f20?}, {0x0, ...})
   $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.ptrEncoder.encode({0xc0000f8f20?}, 0xc000ce8980, {0x2275700?, 0xc0000f8f20?, 0xc0000f8f20?}, {0xa?, 0x0?})
   $GOROOT/src/encoding/json/encode.go:876 +0x23c
encoding/json.structEncoder.encode({{{0xc00033e008, 0x8, 0x8}, 0xc000652b40, 0xc000652ba0}}, 0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0xc000b16950?}, {0x0, ...})
   $GOROOT/src/encoding/json/encode.go:704 +0x21e
encoding/json.(*encodeState).reflectValue(0xc000ce8980, {0x273f640?, 0xc0000f8ea0?, 0x4?}, {0x60?, 0x24?})
   $GOROOT/src/encoding/json/encode.go:321 +0x73
encoding/json.(*encodeState).marshal(0x411ce5?, {0x273f640?, 0xc0000f8ea0?}, {0xc8?, 0xa5?})
   $GOROOT/src/encoding/json/encode.go:297 +0xc5
encoding/json.Marshal({0x273f640, 0xc0000f8ea0})
   $GOROOT/src/encoding/json/encode.go:163 +0xd0
github.com/PagerDuty/go-pagerduty.ManageEventWithContext({0x3089ca0, 0x46aa1a0}, {{0xc000064015, 0x20}, {0x289802d, 0x7}, {0x0, 0x0}, {0x0, 0x0, ...}, ...})
   pkg/mod/github.com/!pager!duty/go-pagerduty@v1.8.0/event_v2.go:175 +0x74
github.com/falcosecurity/falcosidekick/outputs.(*Client).PagerdutyPost(0xc0008e1e00, {{0xc00005e8a0, 0x24}, {0xc000aaaa00, 0x266}, 0x5, {0xc000114080, 0x19}, {0xb860900, 0xede8d3286, ...}, ...})
   /home/runner/work/falcosidekick/falcosidekick/outputs/pagerduty.go:34 +0x1ac
created by main.forwardEvent in goroutine 502010
   /home/runner/work/falcosidekick/falcosidekick/handlers.go:375 +0x2d28
goroutine 502010 [sync.Cond.Wait]:
sync.runtime_notifyListWait(0xc000ce8690, 0x0)
   $GOROOT/src/runtime/sema.go:569 +0x159
sync.(*Cond).Wait(0xc00067d290?)
   $GOROOT/src/sync/cond.go:70 +0x85
net/http.(*connReader).abortPendingRead(0xc00067d290)
   $GOROOT/src/net/http/server.go:729 +0xa6
net/http.(*response).finishRequest(0xc000578b60)
   $GOROOT/src/net/http/server.go:1671 +0x87
net/http.(*conn).serve(0xc000897560, {0x3089e60, 0xc00066de90})
   $GOROOT/src/net/http/server.go:2045 +0x62b
created by net/http.(*Server).Serve in goroutine 1
   $GOROOT/src/net/http/server.go:3285 +0x4b4

There is another issue about this "bug" that I wasn't able to reproduce until now: falcosecurity/charts#746

Which version of Falcosidekick are you running: 2.29.0 or the latest (== master)?

Issif commented

Are you still facing the issue?