Proposal: add generic parameters to types to distinguish different data structures on different architectures
Opened this issue · 3 comments
Motivation
The link between core data types (such as Capstone
, Insn
, InsnDetail
, etc.) and architecture-specific data types (the data types under crate::arch
) is not that "explicit" on the typing. For example, to get the instruction ID on the x86 arch, we have to code:
let cs = Capstone::new().x86().build().unwrap();
let insns = cs.disasm_all(CODE, 0x1000).unwrap();
for i in insns.as_ref() {
let insn_id = unsafe {
// Neither typing nor documentation mention this!
std::mem::transmute::<_, X86Insn>(i.id().0)
};
// do something with insn_id
}
which is not easier for beginners to catch.
We can simply resolve the problem above by adding a method arch_insn_id
that returns the corresponding instruction ID enum variants just like the InsnDetail::arch_detail
method:
pub enum ArchInsnId {
X86InsnId(X86Insn),
// other architectures
}
impl<'a> Insn<'a> {
pub fn arch_insn_id(&self) -> ArchInsnId {
// ... code
}
}
This leads to another slightly disturbing problem. We have to match
against the return value of arch_insn_id
to extract the x86-specific instruction ID, given that we are already confident about the architecture. This problem also arises when we call the InsnDetail::arch_detail
method or other methods with a similarly-typed return value.
The Proposal
The proposal posted here is only an (too-)early draft and more details may be missing for further considerations and discussions.
First of all, we can add a new trait that abstracts a specific architecture:
pub trait Arch {
type InsnId;
type InsnDetail;
// ... other stuff
}
Then, we add a generic parameter to Capstone
, Insn
and InsnDetail
that represents the architecture:
pub struct Capstone<A: Arch> {
// ... fields
}
pub struct Insn<'a, A: Arch> {
// ... fields
}
pub struct InsnDetail<'a, A: Arch> {
// ... fields
}
Then, the methods mentioned in the motivation section can be typed in a more straight-forward way:
impl<A: Arch> Capstone<A> {
pub fn insn_detail<'s, 'i: 's>(
&'s self,
insn: &'i Insn<'_, A>
) -> CsResult<InsnDetail<'i, A>> {
// ... code
}
}
impl<'a, A: Arch> Insn<'a, A> {
pub fn id(&self) -> A::InsnId {
// ... code
}
}
impl<'a, A: Arch> InsnDetail<'a, A> {
pub fn arch_detail(&self) -> A::InsnDetail {
// ... code
}
}
No more match
es, as long as we're targeting a specific architecture. Also, beginners can find corresponding architecture-specific implementations just by looking at the typings. The instruction ID problem can be resolved accordingly.
Unresolved Problems
When the target architecture cannot be determined during compile-time (when the disassembler is created by the Capstone::new_raw
method), the generic parameters cannot be set to represent specific architecture. To resolve this problem, maybe we need to introduce a DynamicArch
that implements Arch
and represents the target architecture is determined during runtime.
This proposal may be pre-mature, but I do think that it reveals some (possibly minor) problems.
Thanks for filing the issue! I have thought about doing something like this before. I did not do this originally because I wanted to support "dynamic architecture". I like your idea of a DynamicArch
.
I do not have the bandwidth to implement this myself, but feel free to start a PR (even if it's just a work-in-progress). I am happy to help without pointers/review.
Thanks for your reply. I'm glad to try to help implement this draft.
I think I can work on a new sub-module named experimental
where all the new stuff goes. When everything is ready, we merge the sub-module into the root module. Is that sounds OK?
There's not need to make a separate submodule--just put new code in whatever module makes sense. It's always easy to move it around anyway.