SIGSEGV when using QemuForkExecutor with the "arm" feature, and "Unknown error: Unix error: ECHILD"
The issue is present in the current main branch:
$ git log | head -n 1
commit dfd5609c10da85f32e0dec74a72a432acd85310a
Describe the issue
I am doing some fuzzing practice on a Tenda VC 15 router's httpd, which is a 32-bit ARM binary. I use a QemuForkExecutor, but I get an error when loading the initial inputs:
Failed to load initial corpus at ["./seed/"]
I printed the error:
if state.must_load_initial_inputs() {
state
.load_initial_inputs(&mut fuzzer, &mut executor, &mut mgr, &intial_dirs)
.unwrap_or_else(|a| {
println!("{}", a);
println!("Failed to load initial corpus at {:?}", &intial_dirs);
process::exit(0);
});
println!("We imported {} inputs from disk.", state.corpus().count());
}
and it says:
Unknown error: Unix error: ECHILD
I debugged the fuzzer and found that it receives a SIGSEGV in trace_edge_hitcount_ptr:
715 pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
716 unsafe {
717 let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add(id as usize);
► 718 *ptr = (*ptr).wrapping_add(1);
719 }
720 }
pwndbg> p ptr
$1 = (*mut u8) 0x4d55bbdb022cd456
pwndbg> p *ptr
Cannot access memory at address 0x4d55bbdb022cd456
It seems that the value of ptr cannot be dereferenced. I know that this function is used to record coverage, but I don't know what "id" or "ptr" mean, so I read the related instrumentation code in qemu-libafl-bridge:
//$ git log | head -n 1
//commit 805b14ffc44999952562e8f219d81c21a4fa50b9
// in accel/tcg/cpu_exec.c, cpu_exec_loop
//// --- Begin LibAFL code ---
bool libafl_edge_generated = false;
TranslationBlock *edge;
/* See if we can patch the calling TB. */
if (last_tb) {
// tb_add_jump(last_tb, tb_exit, tb);
if (last_tb->jmp_reset_offset[1] != TB_JMP_OFFSET_INVALID) {
mmap_lock();
edge = libafl_gen_edge(cpu, last_tb->pc, pc, tb_exit, cs_base, flags, cflags);
mmap_unlock();
if (edge) {
tb_add_jump(last_tb, tb_exit, edge);
tb_add_jump(edge, 0, tb);
libafl_edge_generated = true;
} else {
tb_add_jump(last_tb, tb_exit, tb);
}
} else {
tb_add_jump(last_tb, tb_exit, tb);
}
}
if (libafl_edge_generated) {
// execute the edge to make sure to log it the first execution
// the edge will then jump to the translated block
cpu_loop_exec_tb(cpu, edge, pc, &last_tb, &tb_exit);
} else {
cpu_loop_exec_tb(cpu, tb, pc, &last_tb, &tb_exit);
}
//// --- End LibAFL code ---
My understanding is: if a new edge translation block is generated by libafl_gen_edge, it is executed first, and the edge is then recorded in the coverage map by jumping to trace_edge_hitcount_ptr through the hook. (I use StdEdgeCoverageChildModule, which I believe uses the edge-type hook.)
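In code form, my mental model of what the hook should be doing is roughly the following (a sketch with placeholder names and sizes, not LibAFL's actual source):

use core::ptr::addr_of_mut;

// Sketch of my mental model: the edges map is a flat array of byte
// counters, and the edge hook bumps one counter per edge id.
const MAP_SIZE: usize = 0x10000; // placeholder size
static mut EDGES_MAP: [u8; MAP_SIZE] = [0; MAP_SIZE];

/// Called once per executed edge; `id` selects the counter to bump.
/// This is only sound if `id < MAP_SIZE`.
unsafe fn record_edge(id: usize) {
    let ptr = addr_of_mut!(EDGES_MAP).cast::<u8>().add(id);
    *ptr = (*ptr).wrapping_add(1);
}

So everything hinges on id staying below the map size.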
I also debugged this part of the code. Going by the layout of the TranslationBlock structure, I found the specific contents of the edge variable:
// edge->tc.ptr
pwndbg> p/x *itb
$7 = {
pc = 0x40a23030,
cs_base = 0x480,
flags = 0x0,
cflags = 0x800010,
size = 0x1,
icount = 0x1,
tc = {
ptr = 0x710ee4e00740,
size = 0x38
},
itree = {
rb = {
rb_parent_color = 0xfec7058d4840804b,
rb_right = 0x48fffff959e9ffff,
rb_left = 0x4de9fffffebd058d
},
start = 0x40a23030,
last = 0xffffffffffffffff,
subtree_last = 0x0
},
jmp_lock = {
value = 0x0
},
jmp_reset_offset = {0x20, 0xffff},
jmp_insn_offset = {0x1c, 0xffff},
jmp_target_addr = {0x710ee4e00500, 0x0},
jmp_list_head = 0x710ee4e002c0,
jmp_list_next = {0x0, 0x0},
jmp_dest = {0x710ee4e00440, 0x0}
}
pwndbg> x/16x 0x710ee4e00740
0x710ee4e00740 <code_gen_buffer+1811>: 0x3456be48 0x43f7dbc5 0xbf484d55 0x7f076fa0
Note the value of tc.ptr here: it is <code_gen_buffer+1811>. The machine code it points to, read as a little-endian 64-bit word, is 0x43f7dbc53456be48, which gdb disassembles as movabs rsi, 0x4d5543f7dbc53456.
While tracing the code flow further, I found that the fuzzer jumps to a small stub that prepares the arguments (moving the hook data into rdi and the id into rsi) and then calls trace_edge_hitcount_ptr:
0x5a457a1901df <cpu_exec_loop.isra+783> mov r12, qword ptr [r8 + 0x20]
0x5a457a1901e3 <cpu_exec_loop.isra+787> test eax, 0x120
0x5a457a1901e8 <cpu_exec_loop.isra+792> jne cpu_exec_loop.isra+1720 <cpu_exec_loop.isra+1720>
0x5a457a1901ee <cpu_exec_loop.isra+798> lea rax, [rip + 0x3d2c0cb] RAX => 0x5a457debc2c0 (tcg_qemu_tb_exec) —▸ 0x710ee4e00000 ◂— push rbp /* 0x5641554154415355 */
// R12 is 0x710ee4e00740 (code_gen_buffer+1811) ◂— movabs rsi, 0x4d5543f7dbc53456 /* 0x43f7dbc53456be48 */
0x710ee4e00000 push rbp
0x710ee4e00001 push rbx
0x710ee4e00002 push r12
0x710ee4e00004 push r13
0x710ee4e00006 push r14
0x710ee4e00008 push r15
0x710ee4e0000a mov rbp, rdi RBP => 0x5a457f038920 ◂— 0x123fb400000000
0x710ee4e0000d add rsp, -0x488 RSP => 0x7ffffcc93560 (0x7ffffcc939e8 + -0x488)
0x710ee4e00014 jmp rsi <code_gen_buffer+1811>
↓
0x710ee4e00740 <code_gen_buffer+1811> movabs rsi, 0x4d5543f7dbc53456 RSI => 0x4d5543f7dbc53456
0x710ee4e0074a <code_gen_buffer+1821> movabs rdi, 0x5a457f076fa0 RDI => 0x5a457f076fa0 ◂— 0
► 0x710ee4e00754 <code_gen_buffer+1831> call qword ptr [rip + 0x16] <libafl_qemu::modules::edges::trace_edge_hitcount_ptr>
rdi: 0x5a457f076fa0 ◂— 0
rsi: 0x4d5543f7dbc53456
This seems to indicate that the immediate following movabs rsi, becomes the id. But the value I have here doesn't look right.
My questions now are as follows:
- What does id actually represent?
- How is it calculated?
- How can I solve this problem?
- Do I need to provide any additional information?
Thank you very much!
Hi, I already know that id is generated through libafl_qemu_hook_edge_gen -> create_gen_wrapper -> gen_hashed_edge_ids (in StdEdgeCoverageChildModule). Now I am debugging this part of the code...
I traced the process of calculating id and its intermediate values. The calculated id is indeed 0x4d5543f7dbc53456. Do you think there is a problem?
// src is 0x40a23030, dest is 0x40a23058
*RAX 0x4a265dc83567e8c8 hash_me(src)
*RAX 0x7731e3feea2dc9e hash_me(dest)
► 0x5af412b8763e <libafl_qemu::modules::edges::gen_hashed_edge_ids+174> xor rax, rcx RAX => 0x4d5543f7dbc53456 (0x4a265dc83567e8c8 ^ 0x7731e3feea2dc9e)
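So the id computation appears to boil down to the following (my reconstruction from the debugger; hash_me is only a placeholder for libafl_qemu's internal hash, which I have not reproduced):

/// Placeholder mixer standing in for libafl_qemu's internal hash_me;
/// only the xor combination below was actually observed in gdb.
fn hash_me(x: u64) -> u64 {
    x.wrapping_mul(0x9E37_79B9_7F4A_7C15).rotate_left(31)
}

/// Observed: id = hash_me(src) ^ hash_me(dest)
/// (0x4a265dc83567e8c8 ^ 0x07731e3feea2dc9e = 0x4d5543f7dbc53456).
/// To be a safe index into the edges map, this value would still have
/// to be reduced, e.g. masked with LIBAFL_QEMU_EDGES_MAP_MASK_MAX.
fn gen_edge_id(src: u64, dest: u64) -> u64 {
    hash_me(src) ^ hash_me(dest)
}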
Considering that LIBAFL_QEMU_EDGES_MAP_PTR is 0x761cc2856000, maybe the SIGSEGV happens because the pointer ends up far outside the map's range after the addition?
715 pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
716 unsafe {
717 let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add(id as usize);
► 718 *ptr = (*ptr).wrapping_add(1);
719 }
720 }
$25 = 0x4d5543f7dbc53456
pwndbg> p/x LIBAFL_QEMU_EDGES_MAP_PTR
$26 = 0x761cc2856000
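To make the arithmetic concrete (using the two values printed above):

// Crash-site arithmetic: offsetting the map base by the unmasked 64-bit
// id lands far beyond any plausible mapping, hence the SIGSEGV.
fn main() {
    let base: usize = 0x761c_c285_6000; // LIBAFL_QEMU_EDGES_MAP_PTR
    let id: usize = 0x4d55_43f7_dbc5_3456; // the unmasked edge id
    println!("{:#x}", base.wrapping_add(id)); // 0x4d55ba149e4a9456
}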
I tried changing id as usize to (id as u16).try_into().unwrap(), and this part is fine for now. (This is just a temporary workaround.)
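Concretely, the patched function looks like this:

// Temporary workaround: truncate the id to 16 bits before indexing, so
// the offset can never exceed 0xffff. Not a real fix, just a stopgap.
pub unsafe extern "C" fn trace_edge_hitcount_ptr(_: *const (), id: u64) {
    unsafe {
        let ptr = LIBAFL_QEMU_EDGES_MAP_PTR.add((id as u16).try_into().unwrap());
        *ptr = (*ptr).wrapping_add(1);
    }
}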
But when I continued, the error Unknown error: Unix error: ECHILD still occurred. This seems to be because, in the run_target method of GenericInProcessForkExecutorInner, the parent process does not correctly observe the exit of the child process. I will debug further.
I found that in the parent method of GenericInProcessForkExecutorInner, waitpid returns this error. The error, Unknown error: Unix error: ECHILD, means there is no child process:
pub(super) fn parent(&mut self, child: Pid) -> Result<ExitKind, Error> {
let res = waitpid(child, None)?;
//...
}
The waitpid in nix calls libc::waitpid. I searched on search engines, and only this question seems to be somewhat related.
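To double-check what ECHILD means in practice, here is a minimal standalone example (assuming the nix crate, which LibAFL uses): a second waitpid on an already-reaped child fails with exactly this error.

use nix::sys::wait::waitpid;
use nix::unistd::{fork, ForkResult};

fn main() {
    match unsafe { fork() }.expect("fork failed") {
        ForkResult::Child => std::process::exit(0),
        ForkResult::Parent { child } => {
            // The first waitpid reaps the child normally...
            println!("first:  {:?}", waitpid(child, None));
            // ...so the second one finds no such child: Err(ECHILD).
            println!("second: {:?}", waitpid(child, None));
        }
    }
}

(POSIX also specifies ECHILD for wait calls when SIGCHLD is set to SIG_IGN, because children are then reaped automatically.)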
The first fork happens in SimpleRestartingEventManager::launch. The second fork happens in the run_target function of the Executor, called inside load_initial_inputs.
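// abridged excerpt from SimpleRestartingEventManager::launch;
// ctr and staterestorer are defined in the elided parts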
pub fn launch(mut monitor: MT, shmem_provider: &mut SP) -> Result<(Option<S>, Self), Error>
where
S: DeserializeOwned + Serialize + HasCorpus + HasSolutions,
MT: Debug,
{
// Client->parent loop
loop {
log::info!("Spawning next client (id {ctr})");
// On Unix, we fork
#[cfg(all(unix, feature = "fork"))]
let child_status = {
shmem_provider.pre_fork()?;
match unsafe { fork() }? {
ForkResult::Parent(handle) => {
unsafe {
libc::signal(libc::SIGINT, libc::SIG_IGN);
}
shmem_provider.post_fork(false)?;
handle.status()
}
ForkResult::Child => {
shmem_provider.post_fork(true)?;
break staterestorer;
}
}
};
}
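// abridged excerpt of the fork executor's run_target; the real method
// also receives self, fuzzer, state, mgr and input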
fn run_target() -> Result<ExitKind, Error> {
*state.executions_mut() += 1;
unsafe {
self.inner.shmem_provider.pre_fork()?;
match fork() {
Ok(ForkResult::Child) => {
// Child
self.inner.pre_run_target_child(fuzzer, state, mgr, input)?;
(self.harness_fn)(input, &mut self.exposed_executor_state);
self.inner.post_run_target_child(fuzzer, state, mgr, input);
Ok(ExitKind::Ok)
}
Ok(ForkResult::Parent { child }) => {
// Parent
self.inner.parent(child)
}
Err(e) => Err(Error::from(e)),
}
}
}
I am still somewhat confused as to why this happened...
Thank you for the detailed report.
About the first bug: could you print the value of LIBAFL_QEMU_EDGES_MAP_MASK_MAX once in the gen_hashed_edge_ids function?
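Something like this, temporarily added inside gen_hashed_edge_ids in the libafl_qemu sources, would do (an illustration, assuming the value is directly readable there):

// temporary debug print; wrap in unsafe if the value is a static mut
eprintln!("LIBAFL_QEMU_EDGES_MAP_MASK_MAX = {:#x}", LIBAFL_QEMU_EDGES_MAP_MASK_MAX);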
About the second bug: maybe it's about the order in which processes die?
@rmalmain Thank you for your reply.
For the first question: my LIBAFL_QEMU_EDGES_MAP_MASK_MAX is 0xffff.
About the second one: I wrote this fuzzer for an httpd program, and there are many places in the program that use fork to process commands. Maybe I should find a more appropriate way to write the fuzzer.