chaosblade-io/chaosblade-operator

<POD MEM LOAD> failed, and actually no chaos_os in Pod

Opened this issue · 4 comments

## Issue Description

Type: bug report

### Describe what happened (or what feature you want)

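  1. In chaosblade-box, obtain the K8s cluster information through the agent and run a POD MEM LOAD experiment.
  2. On the box platform, the experiment step is reported as successful, but the recovery step fails (reason: destory experiment failed, cannot get the chaos_os program).
  3. Entering the pod and watching with the top command shows that the chaos_os process appears only briefly at the start of the experiment and then disappears, and the pod's memory load does not change.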

### Describe what you expected to happen

### How to reproduce it (as minimally and precisely as possible)

### Tell us your environment

K8s:v1.18.18
chaosblade-box:v1.0.1
chaos-agent:v1.0.0
chaos-operator:v1.6.0
chaos-tool:v1.6.0

### Anything else we need to know?

Operator logs:
Experiment in progress (excerpt):
time="2022-08-09T03:29:12Z" level=info msg="Exec command in pod" command="[/opt/chaosblade/blade create cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker]" container=chaosblade-tool podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:12Z" level=info msg="get output message" command="[/opt/chaosblade/blade create cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker]" container=chaosblade-tool err= out="{"code":200,"success":true,"result":"ebba539c7840d9d0"}" podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:12Z" level=info msg="exec output: {"code":200,"success":true,"result":"ebba539c7840d9d0"}\n" location=github.com/chaosblade-io/chaosblade-spec-go/util.Infof uid=

Recovery (excerpt):
time="2022-08-09T03:29:33Z" level=info msg="execute identifier: {ContainerObjectMeta:{Id:ebba539c7840d9d0 ContainerRuntime:docker ContainerId:f180423f7d0e ContainerName:tc-image PodName:tc-demo-7c9875798c-lhsr6 NodeName:192.168.0.4 Namespace:tc-demo} Command:/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0 Error: Code:0 ChaosBladePodName:chaosblade-tool-ljpcv ChaosBladeNamespace:chaosblade ChaosBladeContainerName:chaosblade-tool}" experiment=1ec6141fda19ee67
time="2022-08-09T03:29:33Z" level=info msg="Exec command in pod" command="[/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0]" container=chaosblade-tool podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:33Z" level=info msg="get err message" command="[/opt/chaosblade/blade destroy cri mem load --reserve=100 --timeout=185 --container-id f180423f7d0e --container-runtime docker --uid ebba539c7840d9d0]" container=chaosblade-tool err="{"code":63063,"success":false,"error":"destory experiment failed, cannot get the chaos_os program"}" out= podName=chaosblade-tool-ljpcv podNamespace=chaosblade
time="2022-08-09T03:29:33Z" level=error msg="pods/exec: k8s exec failed, err: {"code":63063,"success":false,"error":"destory experiment failed, cannot get the chaos_os program"}\n" location=github.com/chaosblade-io/chaosblade-spec-go/util.Errorf uid=ebba539c7840d9d0
time="2022-08-09T03:29:33Z" level=info msg="success: false, statuses: [{Id:ebba539c7840d9d0 State:Error Code:63063 Error:destory experiment failed, cannot get the chaos_os program Success:false Kind:pod Identifier:tc-demo/192.168.0.4/tc-demo-7c9875798c-lhsr6/tc-image/f180423f7d0e/docker}]" experiment=1ec6141fda19ee67
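
For triage, the blade responses buried in operator logs like these can be pulled out with a short script. The following is a minimal Python sketch, not part of chaosblade: the helper names `blade_responses` and `failed_responses` are hypothetical, the sample lines are shortened from the excerpts above, and the regular expression assumes the JSON payload is flat (no nested braces), as it is in these logs.

```python
import json
import re

# Matches the flat JSON object that blade prints, e.g. {"code":200,...}.
# Assumption: the payload never contains nested braces, as in the excerpts.
BLADE_JSON = re.compile(r'\{"code":.*?\}')


def blade_responses(log_text):
    """Yield every blade JSON response found in the operator log text."""
    for match in BLADE_JSON.finditer(log_text):
        try:
            yield json.loads(match.group(0))
        except json.JSONDecodeError:
            # Skip anything that only looks like a blade response.
            continue


def failed_responses(log_text):
    """Return only the responses whose "success" field is false or missing."""
    return [r for r in blade_responses(log_text) if not r.get("success")]


# Two log lines shortened from the excerpts above (command fields omitted).
sample = '''time="2022-08-09T03:29:12Z" level=info msg="get output message" err= out="{"code":200,"success":true,"result":"ebba539c7840d9d0"}" podName=chaosblade-tool-ljpcv
time="2022-08-09T03:29:33Z" level=info msg="get err message" err="{"code":63063,"success":false,"error":"destory experiment failed, cannot get the chaos_os program"}" out= podName=chaosblade-tool-ljpcv'''

for response in failed_responses(sample):
    print(response["code"], response["error"])
```

Run against the sample, this reports only the destroy call (code 63063), which matches what the logs show: the create call returned success even though the chaos_os process did not stay alive.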

A question:
When creating the "memory load inside a Pod" experiment in chaosblade-box, what value should the Fault Configuration item avoid-being-killed be given so that the process is not killed?

Setting it to true enables this option. Try setting the parameter to true and check whether that resolves the problem.

I have set the parameter to true, but the behavior is the same and the problem is still not resolved.