<POD MEM LOAD> failed, and actually no chaos_os in Pod

Question

Opened this issue 2 years ago · 4 comments

ddd1123 commented 2 years ago

## Issue Description

Type: bug report

### Describe what happened (or what feature you want)

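In chaosblade-box, I obtained the K8s cluster information through the agent and ran a POD MEM LOAD experiment.
On the box platform, the experiment step is reported as successful, but the recovery step fails (reason: destory experiment failed, cannot get the chaos_os program).
Entering the pod and watching the experiment with the top command shows that the chaos_os process appears only briefly when the experiment starts and then disappears, and the pod's mem load does not change.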
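The observation above (watching for chaos_os with top and checking the memory load) could also be captured with a small script run inside the target container, which makes the timing easier to attach to a report. This is only a sketch under assumptions that are not part of the original report: that Python 3 is available in the container, that the process name reported by the kernel is exactly `chaos_os`, and that memory usage is exposed at one of the two common cgroup paths listed below. The helper names `find_pids`, `read_memory_usage`, and `watch` are hypothetical.

```python
import os
import time

# Assumed locations of the container's memory usage counter.
CGROUP_USAGE_FILES = (
    "/sys/fs/cgroup/memory/memory.usage_in_bytes",  # cgroup v1
    "/sys/fs/cgroup/memory.current",                # cgroup v2
)


def find_pids(name, proc_root="/proc"):
    """Return sorted PIDs whose /proc/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            # The process exited between listdir() and open(), or has no comm file.
            continue
    return sorted(pids)


def read_memory_usage(paths=CGROUP_USAGE_FILES):
    """Return memory usage in bytes from the first readable path, else None."""
    for path in paths:
        try:
            with open(path) as f:
                return int(f.read().strip())
        except (OSError, ValueError):
            continue
    return None


def watch(name="chaos_os", interval=1.0, samples=30):
    """Print one line per sample: time, matching PIDs, memory usage in bytes."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"),
              "pids:", find_pids(name),
              "mem bytes:", read_memory_usage())
        time.sleep(interval)
```

Calling `watch()` just before starting the experiment would show how long chaos_os stays alive and whether the usage counter moves at all during that window.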

### Describe what you expected to happen

### How to reproduce it (as minimally and precisely as possible)

### Tell us your environment

K8s: v1.18.18
chaosblade-box: v1.0.1
chaos-agent: v1.0.0
chaos-operator: v1.6.0
chaos-tool: v1.6.0

### Anything else we need to know?

Answer 1 · 2022-08-09T06:51:20.000Z

Operator logs:
While the experiment is running (excerpt):
time="2022-08-09T03:29:12Z" level=info msg="Exec command in pod" command="[/opt/chaosblade/blade create cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker]" container=chaosblade-tool podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:12Z" level=info msg="get output message" command="[/opt/chaosblade/blade create cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker]" container=chaosblade-tool err= out="{"code":200,"success":true,"result":"ebba539c7840d9d0"}" podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:12Z" level=info msg="exec output: {"code":200,"success":true,"result":"ebba539c7840d9d0"}\n" location=github.com/chaosblade-io/chaosblade-spec-go/util.Infof uid=

Recovery (excerpt):
time="2022-08-09T03:29:33Z" level=info msg="execute identifier: {ContainerObjectMeta:{Id:ebba539c7840d9d0 ContainerRuntime:docker ContainerId:f180423f7d0e ContainerName:tc-image PodName:tc-demo-7c9875798c-lhsr6 NodeName:192.168.0.4 Namespace:tc-demo} Command:/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0 Error: Code:0 ChaosBladePodName:chaosblade-tool-ljpcv ChaosBladeNamespace:chaosblade ChaosBladeContainerName:chaosblade-tool}" experiment=1ec6141fda19ee67
time="2022-08-09T03:29:33Z" level=info msg="Exec command in pod" command="[/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0]" container=chaosblade-tool podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:33Z" level=info msg="get err message" command="[/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0]" container=chaosblade-tool err="{"code":63063,"success":false,"error":"destory experiment failed, cannot get the chaos_os program"}" out= podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:33Z" level=error msg="pods/exec: k8s exec failed, err: {"code":63063,"success":false,"error":"destory experiment failed, cannot get the chaos_os program"}\n" location=github.com/chaosblade-io/chaosblade-spec-go/util.Errorf uid=ebba539c7840d9d0
time="2022-08-09T03:29:33Z" level=info msg="success: false, statuses: [{Id:ebba539c7840d9d0 State:Error Code:63063 Error:destory experiment failed, cannot get the chaos_os program Success:false Kind:pod Identifier:tc-demo/192.168.0.4/tc-demo-7c9875798c-lhsr6/tc-image/f180423f7d0e/docker}]" experiment=1ec6141fda19ee67
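When comparing the create and destroy results in logs like the excerpts above, it can help to pull out just the JSON payload that blade returned. The following is a minimal sketch, assuming the payload always starts with `{"code":` and contains no nested braces, as in the lines quoted here. The function name `extract_blade_result` is hypothetical and not part of chaosblade.

```python
import json
import re

# Matches the first flat JSON object that starts with {"code":<number>
BLADE_RESULT = re.compile(r'\{"code":\d+.*?\}')


def extract_blade_result(line):
    """Return the blade JSON result embedded in an operator log line, or None."""
    # Some log writers escape inner quotes; normalise them before matching.
    normalised = line.replace('\\"', '"')
    match = BLADE_RESULT.search(normalised)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except ValueError:
        return None
```

Applied to the "get output message" line, this yields a dictionary with `code` 200 and the experiment uid in `result`; applied to the "get err message" line, it yields `code` 63063 together with the error text.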

Answer 2 · 2022-08-17T09:01:26.000Z

A question:
When creating a "Pod内内存负载" (in-Pod memory load) experiment in chaosblade-box, what value should be entered for the avoid-being-killed item under Fault Configuration in order to avoid being killed?

Answer 3 · 2022-08-18T01:57:16.000Z

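> A question: when creating a "Pod内内存负载" (in-Pod memory load) experiment in chaosblade-box, what value should be entered for the avoid-being-killed item under Fault Configuration in order to avoid being killed?

Setting it to true enables this option. You can set the parameter to true and test whether that resolves the problem.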

Answer 4 · 2022-08-18T07:08:09.000Z

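> > A question: when creating a "Pod内内存负载" (in-Pod memory load) experiment in chaosblade-box, what value should be entered for the avoid-being-killed item under Fault Configuration in order to avoid being killed?
>
> Setting it to true enables this option. You can set the parameter to true and test whether that resolves the problem.

I have set the parameter to true, but the behavior is the same and the problem is still not resolved.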