libbpf/blazesym

Support remote symbolization

danielocfb opened this issue · 1 comments

Symbolization is potentially resource intensive, and it may not be feasible to perform it on the very system where addresses are recorded. Embedded devices, for example, have limited disk space and CPU capacity and cannot afford to symbolize on the device itself. Debug information can be prohibitively large and so is unlikely to be stored on the device, and the process of symbolization is likely to impact other running applications negatively, take an excessive amount of time, or both.

For that and other reasons, we'd like to support remote (or off-device) symbolization. The (preliminary) API proposal below fleshes out the idea somewhat.

The local side normalizes a list of addresses using the normalize_addresses function:

pub type Address = usize;

mod address_meta {
    use super::*;
    use std::path::PathBuf;

    /// A GNU build ID.
    type BuildId = String;


    /// Meta information about a Linux kernel address.
    #[derive(Clone, Debug)]
    pub struct Kernel {
        /// The kernel's release string (i.e., roughly what `uname -r` reports).
        ///
        /// This is a free-form string.
        pub release: String,
        /// The kernel binary's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }


    /// Meta information about a Linux kernel module address.
    #[derive(Clone, Debug)]
    pub struct KernelModule {
        /// The name of the kernel module.
        pub name: String,
        /// The kernel module's version string.
        ///
        /// This is a free-form string. It may resemble bits of `modinfo`'s
        /// `vermagic` field.
        pub version: String,
        /// The kernel's release string (i.e., roughly what `uname -r` reports).
        ///
        /// This is a free-form string.
        pub kernel_release: String,
        /// The kernel module's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }

    /// Meta information about a user space binary (executable or shared object).
    #[derive(Clone, Debug)]
    pub struct Binary {
        /// The canonical absolute path to the binary, including its name.
        pub path: PathBuf,
        /// The binary's build ID, if available.
        pub build_id: Option<BuildId>,
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }

    /// Meta information about an address that could not be determined to be
    /// belonging to a specific component.
    #[derive(Clone, Debug)]
    pub struct Unknown {
        /// The struct is non-exhaustive and open to extension.
        #[doc(hidden)]
        pub _non_exhaustive: (),
    }
}


/// Meta information for an address.
#[derive(Clone, Debug)]
#[non_exhaustive]
pub enum AddressMeta {
    Kernel(address_meta::Kernel),
    KernelModule(address_meta::KernelModule),
    Binary(address_meta::Binary),
    Unknown(address_meta::Unknown),
}


/// A type capturing normalized addresses along with captured meta data.
#[derive(Clone, Debug)]
pub struct NormalizedAddresses {
    /// Normalized addresses along with an index into `meta` for retrieval of
    /// the corresponding [`AddressMeta`] information.
    addresses: Vec<(Address, usize)>,
    /// Meta information about the normalized addresses.
    meta: Vec<AddressMeta>,
}


/// Normalize `addresses` belonging to either a process or the kernel.
///
/// If the provided addresses belong to a process, its PID should be provided in
/// `pid`. For kernel addresses, `pid` may be `None`.
///
/// Normalized addresses are reported in the exact same order in which the
/// non-normalized ones were provided.
pub fn normalize_addresses<A>(addresses: A, pid: Option<u32>) -> Result<NormalizedAddresses, Error>
where
    A: IntoIterator<Item = Address>,
{
    // ...
}
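To make the layout of `NormalizedAddresses` more concrete, here is a minimal sketch of the idea. It is not blazesym's implementation. It assumes that normalizing means translating a virtual address into an offset within the owning binary, and it uses hypothetical stand-ins throughout: `Mapping` (roughly one line of a process' memory map, supplied by the caller instead of being read via `pid`), `Meta` (a reduced `AddressMeta`), `Normalized`, and `normalize`. Sharing one meta entry between addresses of the same binary is likewise an assumption, suggested by the index stored next to each address.

```rust
/// An address, as in the proposal.
pub type Address = usize;

/// Hypothetical stand-in for one file-backed mapping of a process.
#[derive(Clone, Debug)]
pub struct Mapping {
    pub start: Address,
    pub end: Address,
    /// Offset of the mapping within the backing file.
    pub file_offset: usize,
    pub path: String,
}

/// Reduced stand-in for the proposal's `AddressMeta`.
#[derive(Clone, Debug, PartialEq)]
pub enum Meta {
    Binary { path: String },
    Unknown,
}

/// Reduced stand-in for the proposal's `NormalizedAddresses`.
#[derive(Clone, Debug, Default)]
pub struct Normalized {
    /// (normalized address, index into `meta`), in input order.
    pub addresses: Vec<(Address, usize)>,
    /// Meta information, each distinct entry stored once.
    pub meta: Vec<Meta>,
}

/// Normalize `addresses` against `maps`, preserving the input order.
pub fn normalize(addresses: &[Address], maps: &[Mapping]) -> Normalized {
    let mut out = Normalized::default();
    for &addr in addresses {
        let hit = maps.iter().find(|m| m.start <= addr && addr < m.end);
        let (normalized, meta) = match hit {
            // Translate the virtual address into an offset in the file.
            Some(m) => (
                addr - m.start + m.file_offset,
                Meta::Binary { path: m.path.clone() },
            ),
            // No owner found: report the address unchanged.
            None => (addr, Meta::Unknown),
        };
        // Reuse an existing meta entry if this owner was seen before.
        let idx = match out.meta.iter().position(|m| *m == meta) {
            Some(idx) => idx,
            None => {
                out.meta.push(meta);
                out.meta.len() - 1
            }
        };
        out.addresses.push((normalized, idx));
    }
    out
}
```

With this layout, a stack trace with many frames in the same binary carries that binary's meta information only once, which keeps the payload sent to the remote small.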

The resulting normalized addresses, together with information about their owners, have to be conveyed to the remote for the actual symbolization to happen. The transfer of this information is outside of blazesym's purview and is the responsibility of the user. For Rust users, we will provide serde derives for convenient serialization & deserialization.

On the remote system, blazesym's existing BlazeSymbolizer can be used to perform the symbolization using the newly added symbolize_normalized method:

/// A trait for resolving meta information for an address to a [`SymResolver`] to
/// use for the actual symbolization.
pub trait AddressMetaResolver {
    /// The type of [symbol resolver](SymResolver) returned by the
    /// `resolve_address_meta` method.
    type Resolver: SymResolver;

    /// Resolve the provided [`AddressMeta`] to a [symbol resolver](SymResolver) to use.
    fn resolve_address_meta(&self, address_meta: &AddressMeta) -> Result<Self::Resolver, Error>;
}

/// BlazeSymbolizer provides an interface to symbolize addresses with
/// a list of symbol sources.
pub struct BlazeSymbolizer {
    // ...
}

impl BlazeSymbolizer {
    // ...

    /// Symbolize a list of normalized addresses with associated meta
    /// information.
    ///
    /// Please refer to [`normalize_addresses`] for information on how to
    /// normalize addresses.
    ///
    /// The function returns one `Vec<SymbolizedResult>` for each address passed
    /// in, in the order they were passed in. Multiple `SymbolizedResult`
    /// candidates may be present in case an address is ambiguous owing to
    /// compiler optimizations.
    pub fn symbolize_normalized<R>(
        &self,
        addresses: &NormalizedAddresses,
        address_meta_resolver: R,
    ) -> Result<Vec<Vec<SymbolizedResult>>, Error>
    where
        R: AddressMetaResolver,
    {
        // ...
    }
}
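The interplay between the resolver trait and symbolization could look roughly like the following sketch. All types here are simplified, hypothetical stand-ins and not blazesym's actual definitions: `Meta` carries only a build ID, `SymResolver` and `MetaResolver` are reduced to a single method each, `SymTab` and `BuildIdIndex` are toy implementations, and the result is a plain `Option<String>` per address rather than a `Vec<SymbolizedResult>`.

```rust
use std::collections::HashMap;

pub type Address = usize;

/// Reduced stand-in for `AddressMeta`, identified by build ID only.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub struct Meta {
    pub build_id: String,
}

/// Reduced stand-in for `SymResolver`: maps an address to a symbol name.
pub trait SymResolver {
    fn find_symbol(&self, addr: Address) -> Option<String>;
}

/// Reduced stand-in for `AddressMetaResolver`.
pub trait MetaResolver {
    type Resolver: SymResolver;
    fn resolve_meta(&self, meta: &Meta) -> Result<Self::Resolver, String>;
}

/// A toy symbol table of (start address, size, name) entries.
#[derive(Clone, Debug)]
pub struct SymTab(pub Vec<(Address, usize, String)>);

impl SymResolver for SymTab {
    fn find_symbol(&self, addr: Address) -> Option<String> {
        self.0
            .iter()
            .find(|(start, size, _)| *start <= addr && addr < *start + *size)
            .map(|(_, _, name)| name.clone())
    }
}

/// Resolves meta information through an in-memory build ID index.
pub struct BuildIdIndex(pub HashMap<String, SymTab>);

impl MetaResolver for BuildIdIndex {
    type Resolver = SymTab;

    fn resolve_meta(&self, meta: &Meta) -> Result<SymTab, String> {
        self.0
            .get(&meta.build_id)
            .cloned()
            .ok_or_else(|| format!("no symbols for build ID {}", meta.build_id))
    }
}

/// Symbolize (address, meta index) pairs, resolving each meta entry once.
pub fn symbolize<R: MetaResolver>(
    addresses: &[(Address, usize)],
    meta: &[Meta],
    resolver: &R,
) -> Result<Vec<Option<String>>, String> {
    // One symbol resolver per meta entry, shared by all its addresses.
    let resolvers = meta
        .iter()
        .map(|m| resolver.resolve_meta(m))
        .collect::<Result<Vec<_>, _>>()?;

    let mut out = Vec::with_capacity(addresses.len());
    for &(addr, idx) in addresses {
        let r = resolvers
            .get(idx)
            .ok_or_else(|| format!("meta index {} out of range", idx))?;
        out.push(r.find_symbol(addr));
    }
    Ok(out)
}
```

Keeping the lookup of debug information behind a trait means the symbolizer itself does not need to know whether symbols come from a local directory, a cache, or a network service.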

The API should also allow us to support debuginfod: an implementor of AddressMetaResolver could speak the corresponding protocol and fetch debug information from a debuginfod service.
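As a small illustration of what such a resolver would need to do first, the sketch below builds the request URL for a binary's debug information. The `/buildid/<BUILDID>/debuginfo` path follows debuginfod's documented web API; everything else is an assumption made for illustration: the function name `debuginfo_url` is hypothetical, the build ID is taken to be a hex string (the proposal only says `String`), and the actual HTTP request and error handling are left out.

```rust
/// Build the debuginfod URL for the debug information of `build_id`.
///
/// Returns `None` if `build_id` is empty or not a hex string. The base URL
/// of the server is supplied by the caller.
pub fn debuginfo_url(base_url: &str, build_id: &str) -> Option<String> {
    let is_hex = !build_id.is_empty() && build_id.chars().all(|c| c.is_ascii_hexdigit());
    if !is_hex {
        return None;
    }
    Some(format!(
        "{}/buildid/{}/debuginfo",
        // Avoid a double slash if the base URL ends with one.
        base_url.trim_end_matches('/'),
        build_id.to_ascii_lowercase()
    ))
}
```

A resolver built on this would only be usable for addresses whose meta information carries a build ID, which is one reason the `build_id` fields above matter.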

First set of changes enabling address normalization: #114