m4b/faerie

Compiler API tracking issue

Closed this issue · 6 comments

m4b commented

This is the current, completely experimental API that will likely have to change:

pub trait Artifact {
    fn new(target: Target, name: Option<String>) -> Self;
    fn add_code(&mut self, name: String, code: Code);
    fn add_data(&mut self, name: String, data: Data);
    fn import(&mut self, import: String);
    fn link_import(&mut self, caller: &str, import: &str, offset: usize);
    fn link(&mut self, to: &str, from: &str, offset: usize);
    fn write<T: Write + Seek + ::std::fmt::Debug>(self, file: T) -> error::Result<()>;
}

Things I like about it, and would desperately like to keep if possible:

  1. simple
  2. elegant
  3. simple
  4. abstract

But I don't know what a semi-complicated compiler backend will demand: weak symbols? Complicated relocations (probably not; most object files use only about 4-5 unique ELF x86_64 or ARM relocations, for example)? Anything else?

It probably needs to be lived in a bit, is my guess, hence this tracking issue.

m4b commented

Suggestions from @ubsan on IRC: the file type should not be required immediately; instead you pass in functions, imports, relocations, etc., and then, perhaps in write (I also like the name emit; it sounds cooler), you'd do:

obj.write::<Elf>(name)?

It also just struck me: if write borrowed the artifact (&self) instead of taking self by value, we could also emit several formats at once:

obj.write::<Elf>(name)?;
obj.write::<Mach>(name)?;
obj.write::<PE>(name)?;

etc., which is cool

m4b commented

In hindsight, Artifact should not be a trait, but rather a simple struct with an API like the one above:

pub struct Artifact { /* fields omitted */ }

// Signatures only; bodies are omitted in this sketch.
impl Artifact {
    fn new(target: Target, name: Option<String>) -> Self;
    fn add_code(&mut self, name: String, code: Code);
    fn add_data(&mut self, name: String, data: Data);
    fn import(&mut self, import: String);
    fn link_import(&mut self, caller: &str, import: &str, offset: usize);
    fn link(&mut self, to: &str, from: &str, offset: usize);
    fn emit<O: Object>(&self) -> error::Result<Vec<u8>>;
    fn write<O: Object>(&self, name: &str) -> error::Result<()>;
}

Where Object is a trait:

pub trait Object {
  fn to_object(artifact: &Artifact) -> Vec<u8>;
}

Then, Elf, MachO, and Pe will impl Object, and given an artifact, or some intermediate form with code and data, can marshal this into a vector of bytes, which the downstream consumer can then write to disk, etc.

E.g.:

  let artifact = Artifact::new(/* bla bla */);
  // bla bla add code data bla bla
  let elf_bytes = artifact.emit::<Elf>()?;
  // and then we can write, or have artifact do it
  let pe_bytes = artifact.emit::<Pe>()?;
  // etc.
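To make the Object idea concrete, here is a minimal, self-contained sketch. It is not faerie's implementation: the Artifact below only stores named code blobs, emit returns a plain Vec<u8> instead of error::Result, and the "object files" produced are just a format magic followed by the raw bytes. The emit_both helper is a hypothetical name added for illustration.

```rust
/// Toy artifact: only named code blobs, no data, imports, or relocations.
pub struct Artifact {
    code: Vec<(String, Vec<u8>)>,
}

impl Artifact {
    pub fn new() -> Self {
        Artifact { code: Vec::new() }
    }

    pub fn add_code(&mut self, name: &str, bytes: Vec<u8>) {
        self.code.push((name.to_string(), bytes));
    }

    /// Borrows `self`, so the same artifact can be emitted more than once.
    pub fn emit<O: Object>(&self) -> Vec<u8> {
        O::to_object(self)
    }
}

/// Each container format marshals an artifact into bytes.
pub trait Object {
    fn to_object(artifact: &Artifact) -> Vec<u8>;
}

pub struct Elf;
pub struct Pe;

impl Object for Elf {
    fn to_object(artifact: &Artifact) -> Vec<u8> {
        // Real ELF files start with 0x7f 'E' 'L' 'F'. Everything after the
        // magic is a placeholder here, not a valid ELF layout.
        let mut out = vec![0x7f, b'E', b'L', b'F'];
        for (_name, bytes) in &artifact.code {
            out.extend_from_slice(bytes);
        }
        out
    }
}

impl Object for Pe {
    fn to_object(artifact: &Artifact) -> Vec<u8> {
        // COFF object files start with a 2-byte machine field; 0x8664
        // (little-endian) is x86-64. The rest is a placeholder.
        let mut out = vec![0x64, 0x86];
        for (_name, bytes) in &artifact.code {
            out.extend_from_slice(bytes);
        }
        out
    }
}

/// Hypothetical helper: emit two formats from one borrowed artifact.
pub fn emit_both(artifact: &Artifact) -> (Vec<u8>, Vec<u8>) {
    (artifact.emit::<Elf>(), artifact.emit::<Pe>())
}
```

Because emit takes &self, nothing is consumed, and the format is chosen only at the call site through the type parameter.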

Is emitting several formats at once useful? Beyond each object format having its own relocations, they also have their own GOT/PLT/etc. schemes, which require different instructions. On top of that, there are platform-specific ABI variations; for example, Windows has a different calling convention from Darwin and ELF platforms on x64.

At first glance, it seems like there'd be too many complications to make use of this flexibility in practice, so it shouldn't be something to design the API around. But I'd be happy to learn otherwise :-).

m4b commented

Just briefly; I can respond in more detail later, but first, so it's clear: this repo is a cross-platform object file generator, not a cross-platform assembler or cross-platform linker. I have toyed with a cross-platform linker, but that's not here right now and is (alas) out of scope. One day.

Consequently, the code you have to dump is unrelocated raw bytes (I can't assemble asm for you), and the files it outputs are the platform's version of an object file, so they are unlinked.

Afaik no platform's object files contain a GOT as an actual structure in the file; the linker generates it at link time depending on the code and relocations (e.g. if there is a GOTPC32 relocation or something).

There definitely isn't a PLT; that is for sure generated by the linker at link time, so these two concerns are less important. You and I and whoever don't really need to worry about the PLT, as the system linker constructs it, generally according to whether there are unresolved symbols.

This latter part is the idea behind specifying an import, symbolically, and then this repo takes care of generating the correct platform relocation.
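As a rough illustration of that idea (not faerie's actual code), the sketch below records a call to an import symbolically and lowers it to a per-format relocation kind. The Format, ImportLink, and Reloc types and the lower function are hypothetical names. The relocation names are the usual x86-64 call relocations for each format, and the sketch also assumes a C-style import, which gets a leading underscore on Mach-O.

```rust
/// Container formats this sketch knows about (hypothetical enum).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Format {
    Elf,
    MachO,
    Coff,
}

/// A call to an imported symbol, recorded without any format knowledge.
#[derive(Debug, PartialEq)]
pub struct ImportLink {
    pub caller: String,
    pub import: String,
    pub offset: usize,
}

/// A format-specific relocation record (kind kept as a name for clarity).
#[derive(Debug, PartialEq)]
pub struct Reloc {
    pub kind: &'static str,
    pub symbol: String,
    pub offset: usize,
}

/// Pick the x86-64 relocation typically used for a call to an
/// undefined symbol in each format.
pub fn call_reloc_kind(format: Format) -> &'static str {
    match format {
        Format::Elf => "R_X86_64_PLT32",
        Format::MachO => "X86_64_RELOC_BRANCH",
        Format::Coff => "IMAGE_REL_AMD64_REL32",
    }
}

/// Lower a symbolic import link into a relocation for one format.
pub fn lower(link: &ImportLink, format: Format) -> Reloc {
    let symbol = match format {
        // Assumption: the import is a C symbol, which Mach-O toolchains
        // spell with a leading underscore.
        Format::MachO => format!("_{}", link.import),
        Format::Elf | Format::Coff => link.import.clone(),
    };
    Reloc {
        kind: call_reloc_kind(format),
        symbol,
        offset: link.offset,
    }
}
```

The caller only ever says "main calls printf at offset 7"; which relocation that becomes is decided when a format is chosen.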

As for calling conventions, yes, of course they are different, but again, that's a detail of the bytes you're sending in, and not my responsibility; faerie knows nothing about the semantics of your bytes, only that they are code, data, a string, an import, and whatever else we deem necessary to type. It could be that this simplification is too brittle / not comprehensive enough and explicit platform methods are necessary. That's fine; I'm expecting the API to be driven by organic uses, and I don't have a general "philosophy" for what this crate is supposed to do, other than get yo dang bytes out to disk!

Lastly you will always have the platform object at your disposal to manipulate and push bytes, relocations, etc., into directly, if that is your desire.

The idea of the Object trait was a sort of generic, uniform, simple, "let's get started outputting basic functions and strings" kind of backend option.

I also do think it's possible to design a generic backend like this that does 90% of what you want for all the container formats, and that this would be cool, but it likely won't be a driving force behind the entire library's API, which is primarily there to get your bytes out into the world in any modern container format you want.

I hope that clears some things up? Lemme know what you are thinking

:)

Makes sense. I don't expect this repo to provide GOT/PLT/etc. or calling convention abstractions. My observation is just that since there are all these ABI differences between platforms, and the use cases I'm imagining will need to know about them, those users will be able to pick the container format they need up front.

I don't have a strong need either way; it just seems that telling the API the container format up front might provide the implementation some extra flexibility.

m4b commented

This issue isn't really useful anymore; API is becoming essentially stabilized around import/declare/define/link api afaics.

Thanks to everyone for helping dogfood the initial version to arrive at a better API all around :)
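For readers who have not seen it, the import/declare/define/link shape mentioned above can be sketched roughly as follows. This is a toy with hypothetical signatures and a hypothetical ArtifactError type, not the crate's actual API: it only tracks names in memory and checks that a symbol is declared before it is defined or linked.

```rust
use std::collections::HashMap;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
pub enum ArtifactError {
    Undeclared(String),
    AlreadyDefined(String),
}

/// Toy artifact that tracks names only; it never produces an object file.
#[derive(Default)]
pub struct Artifact {
    imports: Vec<String>,
    // Declared name -> Some(bytes) once defined.
    definitions: HashMap<String, Option<Vec<u8>>>,
    // (from, to, offset into `from`)
    links: Vec<(String, String, usize)>,
}

impl Artifact {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record a symbol that some other object will provide.
    pub fn import(&mut self, name: &str) {
        self.imports.push(name.to_string());
    }

    /// Announce a local symbol before its bytes are available.
    pub fn declare(&mut self, name: &str) {
        self.definitions.entry(name.to_string()).or_insert(None);
    }

    /// Attach bytes to a previously declared symbol, exactly once.
    pub fn define(&mut self, name: &str, bytes: Vec<u8>) -> Result<(), ArtifactError> {
        let slot = self
            .definitions
            .get_mut(name)
            .ok_or_else(|| ArtifactError::Undeclared(name.to_string()))?;
        if slot.is_some() {
            return Err(ArtifactError::AlreadyDefined(name.to_string()));
        }
        *slot = Some(bytes);
        Ok(())
    }

    /// Record a reference from a declared symbol to a declared or imported one.
    pub fn link(&mut self, from: &str, to: &str, offset: usize) -> Result<(), ArtifactError> {
        if !self.definitions.contains_key(from) {
            return Err(ArtifactError::Undeclared(from.to_string()));
        }
        let known = self.definitions.contains_key(to) || self.imports.iter().any(|i| i == to);
        if !known {
            return Err(ArtifactError::Undeclared(to.to_string()));
        }
        self.links.push((from.to_string(), to.to_string(), offset));
        Ok(())
    }

    pub fn link_count(&self) -> usize {
        self.links.len()
    }
}
```

Splitting declare from define lets a frontend reference symbols before their bytes exist, while link stays purely symbolic until a container format is chosen.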