Proposal: add generic parameters to types to distinguish different data structures on different architectures

Question

Proposal: add generic parameters to types to distinguish different data structures on different architectures

Opened this issue 3 years ago · 3 comments

Motivation

The link between core data types (such as Capstone, Insn, InsnDetail, etc.) and architecture-specific data types (the data types under crate::arch) is not that "explicit" on the typing. For example, to get the instruction ID on the x86 arch, we have to code:

let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
    let insn_id = unsafe {
        // Neither typing nor documentation mention this!
        std::mem::transmute::<_, X86Insn>(i.id().0)
    };
    // do something with insn_id
}

which is not easier for beginners to catch.

We can simply resolve the problem above by adding a method arch_insn_id that returns the corresponding instruction ID enum variants just like the InsnDetail::arch_detail method:

pub enum ArchInsnId {
    X86InsnId(X86Insn),
    // other architectures
}

impl<'a> Insn<'a> {
    pub fn arch_insn_id(&self) -> ArchInsnId {
        // ... code
    }
}

This leads to another slightly disturbing problem. We have to match against the return value of arch_insn_id to extract the x86-specific instruction ID, given that we are already confident about the architecture. This problem also arises when we call the InsnDetail::arch_detail method or other methods with a similarly-typed return value.

The Proposal

The proposal posted here is only an (too-)early draft and more details may be missing for further considerations and discussions.

First of all, we can add a new trait that abstracts a specific architecture:

pub trait Arch {
    type InsnId;
    type InsnDetail;
    // ... other stuff
}

Then, we add a generic parameter to Capstone, Insn and InsnDetail that represents the architecture:

pub struct Capstone<A: Arch> {
    // ... fields
}

pub struct Insn<'a, A: Arch> {
    // ... fields
}

pub struct InsnDetail<'a, A: Arch> {
    // ... fields
}

Then, the methods mentioned in the motivation section can be typed in a more straight-forward way:

impl<A: Arch> Capstone<A> {
    pub fn insn_detail<'s, 'i: 's>(
        &'s self, 
        insn: &'i Insn<'_, A>
    ) -> CsResult<InsnDetail<'i, A>> {
        // ... code
    }
}

impl<'a, A: Arch> Insn<'a, A> {
    pub fn id(&self) -> A::InsnId {
        // ... code
    }
}

impl<'a, A: Arch> InsnDetail<'a, A> {
    pub fn arch_detail(&self) -> A::InsnDetail {
        // ... code
    }
}

No more matches, as long as we're targeting a specific architecture. Also, beginners can find corresponding architecture-specific implementations just by looking at the typings. The instruction ID problem can be resolved accordingly.

Unresolved Problems

When the target architecture cannot be determined during compile-time (when the disassembler is created by the Capstone::new_raw method), the generic parameters cannot be set to represent specific architecture. To resolve this problem, maybe we need to introduce a DynamicArch that implements Arch and represents the target architecture is determined during runtime.

This proposal may be pre-mature, but I do think that it reveals some (possibly minor) problems.

Answer 1 · 2021-10-30T05:33:33.000Z

Thanks for filing the issue! I have thought about doing something like this before. I did not do this originally because I wanted to support "dynamic architecture". I like your idea of a DynamicArch.

I do not have the bandwidth to implement this myself, but feel free to start a PR (even if it's just a work-in-progress). I am happy to help without pointers/review.

Answer 2 · 2021-10-30T06:16:48.000Z

Thanks for your reply. I'm glad to try to help implement this draft.

I think I can work on a new sub-module named experimental where all the new stuff goes. When everything is ready, we merge the sub-module into the root module. Is that sounds OK?

Answer 3 · 2021-10-30T07:16:02.000Z

There's not need to make a separate submodule--just put new code in whatever module makes sense. It's always easy to move it around anyway.