Author: Steven Mugisha Mizero, smugisha@uoguelph.ca
A said
(Self-Addressing Identifier) - a special type of content-addressable identifier based on encoded cryptographic digest that is self-referential.
- SAIDs facilitates immutably referenced data serialization for applications such as Verifiable Credentials or Ricardian Contracts.
saids
are encoded with CESR (Composable Event Stream Representation) CESR which includes cryptographic algorithms used to generate the digest.
- The CESR encoding used is a Blake3-256 (32 bytes) binary digest which has 44 Base-64 URL-safe characters - the first character is E representing Blake3-256.
Step 1: Fill the field to hold the said
with 44 of # characters length.
field0______field1______________________________________field2______
field0______############################################field2______
Step 2: Compute the Blake3-256 digest on the above serialized string and encoded in CESR format.
Results in the below said
:
E8wYuBjhslETYaLZcxMkWrhVbMcA8RS1pKYl7nJ77ntA
Step 3: Replace the dummy #
with the produced said
- the final serialization will include an embedded said
as follows:
field0______E8wYuBjhslETYaLZcxMkWrhVbMcA8RS1pKYl7nJ77ntA______
[!Warning] Order-Preserving Data Structures The crucial consideration in
said
generation is reproducibility.
said
generation to be reproducibly requires the ordering and sizing of fields to be fixed.- The essential feature needed for reproducible serialization of mappings (dictionaries or hash tables) is that mapping preserve the ordering of its fields on any round trip to/from a serialization.
- The natural canonical order of data structures like a hash table or dictionary is the insertion order of the fields allowing them to appear in a present order independent of the lexicographic ordering of the labels.
- A way to describe a predefined order preserving serialization is canonicalization or canonical ordering. This affects the functional perspective of ordering fields as they are sorted by labels prior to serialization.
- Hash tables such
defaultdict
inpython
, the newECMAScript
introducingReflect.ownPropertyKeys()
, and the version 1.9 ofRuby
withHash class
do preserve insertion order of the fields. Thus there is no need of any canonical serialization, one can rely on natural insertion order of the mappings to preserve the functional logic or can always opt for lexicographic ordering to create the insertion order.
- Make a copy of the embedded CESR encoded SAID string included in the serialization.
- Replace the SAID field value in the serialization with a dummy string of the same length. The dummy character is #, that is, ASCII 35 decimal (23 hex).
- Compute the digest of the serialization that includes the dummy value for the SAID field. Use the digest algorithm specified by the CESR derivation code of the copied SAID.
- Encode the computed digest with CESR to create the final derived and encoded SAID of the same total length as the dummy string and the copied embedded SAID.
- Compare the copied SAID with the recomputed SAID. If they are identical then the verification is successful; otherwise, unsuccessful.
- Initial
dict
before serialization.
{
"said": "",
"first": "Sue",
"last": "Smith",
"role": "Founder"
}
said
key will contain an encodedCESR Blake3-256
digest of 44 characters. Initialized with a total of 44 #.
{
"said": "############################################",
"first": "Sue",
"last": "Smith",
"role": "Founder"
}
-
The
dict
serialized intojson
with no extra whitespace: `{"said":"############################################","first":"Sue","last":"Smith","role":"Founder"}. -
The Blake3-256 digest is then computed on that serialized
json
:
said
: EnKa0ALimLL8eQdZGzglJG_SxvncxkmvwFDhIyLFchUk
- The final serialization with the
said
embedded as follows: `{"said":"EnKa0ALimLL8eQdZGzglJG_SxvncxkmvwFDhIyLFchUk","first":"Sue","last":"Smith","role":"Founder"}
[!Note] Preserving Order The generation steps may be reversed to verify the embedded
said
following the verification protocol defined above. However, to achieve consistencysaid
generation and verification protocol for mapping assumes that the fields in a mapping serialization such asjson
are ordered in a stable, and reproducible order, i.e., canonical.
The below section summarizes the work done by HCF to compute saids
. The emphasis is to understand the SAD
macro that contains the under hood functionalities allowing the manipulation of RUST AST of an object (i.e., a struct
) to place the computed digest inside the struct
it was calculated on. But again we want the clarity of what happens when you write an OCAFILE and get an OCA bundle that has a said
and each overlay containing individual said
.
click link to see the image. oca-image.png
In summary when a user writes an oca file
instructions such as add
, remove
or from
they dictate what the oca ast
represents. From the above picture only one attribute test
is added. oca-bundle
builds from the generated oca ast
putting the AST parts together to form a defined overlay is there is. In the example only the capture_base
is built. Each overlay same as the capture base are structs that implements SAD
macro that will be discussed in the next sections. After the digest is computed for an overlay the capture base
said
is added to it and the digest of the whole bundle is the computed on top and inserted.
The following highlights a high level description of what you achieve with crates developed b the HCF to calculate the digest.
The following example shows a digest calculated for the string literal hello there
.
let data = "hello there";
let code: HashFunction = HashFunctionCode::Blake3_256.into();
let sai = code.derive(data.as_bytes());
assert_eq!(format!("{}", sai), "ENmwqnqVxonf_bNZ0hMipOJJY25dxlC8eSY5BbyMCfLJ");
assert!(sai.verify_binding(data.as_bytes()));
assert!(!sai.verify_binding("wrong data".as_bytes()));
-
Module
sad
provides traitSAD
that has functions:compute_digest
: computes thesaid
of a data structure, places it in a chosen field.derivation_data
: returns data that are used forsaid
computation.
-
To use the
SAD
macros it has to be enabled, and the user chooses the filed to be replaced by the computed digest.SAD
only works for structures that implementSerialize
using#[derive(Serialize)]
attribute, rather than a custom implementation. -
Attributes that
SAD
macro uses:-
version
: adds the version field while computing derivation data. Version string contains compact representation of field map, serialization format, and size of a serialized message body. Attribute let user specify protocol code and its major and minor version. When attribute is used, the structure automatically implements theEncode
trait, which provides theencode
function for serializing the element according to the chosen serialization format. -
said
- this attribute allows users to choose the hash function for computing Self Addressing Identifier and serialization format. The hash function can be specified using the derivation code from the table above. The available serialization formats arejson
,CBOR
, andMGPK
. By default, JSON and Blake3-256 are used.
-
-
Fields attributes:
said
- marks field that should be replaced by computed digest duringcompute_digest
.
Example of a struct
that implements SAD
Marco:
#[derive(SAD, Serialize)]
#[version(protocol = "KERI", major = 1, minor = 0)]
struct Something {
pub text: String,
#[said]
pub d: Option<SelfAddressingIdentifier>,
}
- It is mandatory to annotate the field that will hold the computed digest with
#[said]
.
A test example demonstrating said
computation of the above struct
.
#[cfg(feature = "macros")]
mod tests {
use said::derivation::HashFunctionCode;
use said::sad::SAD;
use said::version::format::SerializationFormats;
use said::SelfAddressingIdentifier;
use serde::Serialize;
#[test]
pub fn basic_derive_test() {
#[derive(SAD, Serialize)]
struct Something {
pub text: String,
#[said]
pub d: Option<SelfAddressingIdentifier>,
}
let mut something = Something {
text: "Hello world".to_string(),
d: None,
};
let code = HashFunctionCode::Blake3_256;
let format = SerializationFormats::JSON;
something.compute_digest(&code, &format);
let computed_digest = something.d.as_ref();
let derivation_data = something.derivation_data(&code, &format);
assert_eq!(
format!(
r#"{{"text":"Hello world","d":"{}"}}"#,
"############################################"
),
String::from_utf8(derivation_data.clone()).unwrap()
);
assert_eq!(
computed_digest,
Some(
&"EF-7wdNGXqgO4aoVxRpdWELCx_MkMMjx7aKg9sqzjKwI"
.parse()
.unwrap()
)
);
assert!(something
.d
.as_ref()
.unwrap()
.verify_binding(&something.derivation_data(&code, &format)));
}
}```
Part I:
- [Derivation crate](https://github.com/THCLab/cesrox/tree/master/said/src/derivation) - `use said::derivation::HashFunctionCode`, wraps the possible hash functions supported by [CESROX](https://github.com/THCLab/cesrox/tree/master/cesr). i.e.; in the above code snippet `Blake3_256` is used.
- [Version crate](https://github.com/THCLab/cesrox/tree/master/said/src/version) `use said::version::format::SerializationFormats`, this crate allow a users to retrieve the serialization format. i.e.; in the above code snippet `JSON` is used.
- [SAD crate](https://github.com/THCLab/cesrox/tree/master/said/src/sad) `use said::sad::SAD` defines the `trait` with two functions `compute_digest` and `derivation_data` which are implemented using **Procedural Macros**.
- [SelfAddressingIdentifier](https://github.com/THCLab/cesrox/blob/master/said/src/lib.rs) `use said::SelfAddressingIdentifier` defines a struct with `derivation: HashFunction` and `digest: Vec<u8>` as the fields.
- `use serde::Serialize` a standard Serialize crate.
Part II:
```Rust
let mut something = Something {
text: "Hello world".to_string(),
d: None,
};
-
A struct
with only two fieldstext
(which can be anything, it can also be a struct) andd
(the computed said will be held here). -
The below five lines of code computes the digest and places it inside
something
struct where it is assigned to the fieldd
.
let code = HashFunctionCode::Blake3_256;
let format = SerializationFormats::JSON;
something.compute_digest(&code, &format);
let computed_digest = something.d.as_ref();
let derivation_data = something.derivation_data(&code, &format);
- Annotating
Something
struct with#[derive(SAD, Serialize)]
allows to use thecomputed_digest
function defined in theSAD
crate and implemented underSAD
Macros to be applied on it and compute the digest, and does all the background process of replacingNone
with the hash. let derivation_data = something.derivation_data(&code, &format)
this line serializes the struct and replaces the field to hold the hash with #.
Part III:
assert_eq!
and assert!
are used to the output of the functions.
This section goes one level deeper and explains how we achieve the placement of the computed digest inside the struct
it was initially computed on.
How does Rust allow manipulating a struct
:
- Rust provides Procedural macros feature that allows to manipulate the a struct's Rust AST leveraging the power of accepting some code as an input, operate on that code, and produce some code as an output.
- With this concept we can define a function that takes a struct's Rust AST and do some changes with it (replacing # with a computed digest) and then return an other struct's Rust AST with the
said
embedded.
For more in-depth about Procedural Macros take a look at a simple practical example here or refer to this documentation here for more beginner friendly introduction to Procedural Macros concepts.
This is how the HCF computes said
using Procedural Macros:
[!Note] why
SAD
macro? Starting with thesad
trait which implements two functions:compute_digest
andderivation_data
. ASAD
macro is created to avoid structs to individually implementing these functions but rather depend on theSAD
macro abstraction. For usability theSAD
macro the structs have to be annotated with#[derive](SAD)
, and implementSerialize
trait.
The below code snippet shows the SAD
trait which can also be found here:
// importing dependency crates
use crate::derivation::HashFunctionCode;
pub use crate::version::format::SerializationFormats;
pub use cesrox::derivation_code::DerivationCode;
#[cfg(feature = "macros")]
pub use sad_macros::SAD;
pub trait SAD {
fn compute_digest(
&mut self,
derivation: &HashFunctionCode,
format: &SerializationFormats
);
fn derivation_data(
&self,
derivation: &HashFunctionCode,
format: &SerializationFormats,
) -> Vec<u8>;
}
[!Warning] SAD Macro
Similar to other Procedural Macros its start by defining the function that takes in a TokenStream
and returns TokenStream
. The TokenStream
is generated by the Rust compiler and it is essentially a serialized version of the syntax tree of the annotated item, representing Rust code in a way that can be manipulated programmatically.
// importing dependecy crates
use field::TransField;
use proc_macro::TokenStream;
use quote::quote;
use syn::{self};
mod field;
mod version;
use version::parse_version_args;
#[proc_macro_derive(SAD, attributes(said, version))]
pub fn compute_digest_derive(input: TokenStream) -> TokenStream {
let ast = syn::parse(input).unwrap();
impl_compute_digest(&ast)
}
A simple example of using the SAD
macro:
// Notice that we have added Serialize attribute to be able to use the macro
#[derive(SAD, Serialize)]
struct FirstStruct {
#[said]
d: None
}
- Here Rust compiler will call the Procedural Macro function
compute_digest_derive
passing theTokenStream
representingstruct FirstStruct
as the argument. - Inside
compute_digest_derive
function, theTokenStream
is parsed into a more usable AST using thesyn
crate allowing the inspection and manipulation. - The
impl_computed_digest
function takes in a reference to Rust AST and returns Rust code as aTokenStream
representing theFirstStruct
with the computeddigest
placed in (i.e., assigned to ). - Finally the compiler takes the returned
TokenStream
and integrates it back into the compilation process, replacing the annotatedFirstStruct
with theFirstStruct
generated by theSAD
macro (i.e., the variabled
holding the digest).
As an example: the digest could be something like E8wYuBjhslETYaLZcxMkWrhVbMcA8RS1pKYl7nJ77ntA
. Thus, the final struct might look like the below:
struct FirstStruct {
d: E8wYuBjhslETYaLZcxMkWrhVbMcA8RS1pKYl7nJ77ntA
}
Implementation of impl_computed_digest
:
fn impl_compute_digest(ast: &syn::DeriveInput) -> TokenStream { }
This function hides all the manipulations of placing the computed digest inside a struct when annotated with #[derive(SAD)]
, thus the name said
.
syn
is an important library that parses a stream of Rust tokens into a syntax tree (AST) of Rust source code, it contains a few APIs that are generally useful, andDerive
is the API particularly useful to derive Procedural Macros thus theimpl_compute_digest
function takes in a reference of&syn::DeriveInput
(look at last page of this document to see how aDeriveInput
AST look like). To explore more about Rust AST use AST Explorer and select Rust language.
The full code of the imp_compute_digest
can be found here. The below is the dissection of the code to help understand how each part play a role in manipulating the structs annotated with #[derive(SAD)]
.
N.B: There is less wording as the code are self explanatory the comments that were provided in the code provide the general idea of what the below chunk of codes does.
Part I:
let name = &ast.ident;
let fname = format!("{}TMP", name);
let varname = syn::Ident::new(&fname, name.span());
let generics = &ast.generics;
let (impl_generics, ty_generics, where_clause) = generics.split_for_impl();
// Check if versioned attribute is added.
let version = ast
.attrs
.iter()
.find(|attr| attr.path().is_ident("version"))
.map(parse_version_args);
let fields = match &ast.data {
syn::Data::Struct(s) => s.fields.clone(),
_ => panic!("Not a struct"),
}
.into_iter()
.map(TransField::from_ast);
The first part of the function includes the following:
- Grabs the name field the
struct
e.gFirstStruct
and formats it and stores it undervarname
. let (impl_generics, ty_generics, where_clause) = generics.split_for_impl()
this lines gets the fields to manipulate in case thestruct
incudes generic types. To read more about generics in Rust click here- It is important to note that the
attrs
field under theDeriveInput
AST includes other annotations if any provided by thestruct
(i.e., other traits that thestruct
implements). Thus if a struct is annotated with#[version(protocol = "KERI", major = 1, minor = 0)]
,parse_version_args
function is used to return a well formatted version of the fields. - The
data
field inDeriveInput
AST contains all the fields defined in the a crate. Fields here are referred to asstructs
but they could also beenums
that is why the code panic if no struct if found in the whole crate.- The code snippet above uses a match statement to find all the structs and map the fields into
TransField
defined here . In short the functionfrom_ast
ofTransField
struct inputs a type ofsyn::Field
(which is part of theDeriveInput
) as input and returns a transformedTransField
struct. In other words, each struct is transformed to match the logic defined in theTransField
implementation.
- The code snippet above uses a match statement to find all the structs and map the fields into
Part II:
// Generate body of newly created struct fields.
// Replace field type with String if it is tagged as said.
let body = fields.clone().map(|field| {
if !field.said {
let original = field.original;
quote! {#original}
} else {
let name = &field.name;
let attrs = field.attributes;
quote! {
#(#attrs)*
#name: String
}
}
});
// Set fields tagged as said to computed digest string, depending on
// digest set in `dig_length` variable. Needed for generation of From
// implementation.
let concrete = fields.clone().map(|field| {
let name = &field.name;
if field.said {
quote! {#name: "#".repeat(dig_length).to_string()}
} else {
quote! {#name: value.#name.clone()}
}
});
// Set fields tagged as said to computed SAID set in `digest` variable.
let out = fields.map(|field| {
let name = &field.name;
if field.said {
quote! {self.#name = digest.clone();}
} else {
quote! {}
}
});
// Adding version field logic.
let version_field = if version.is_some() {
quote! {
#[serde(rename = "v")]
version: SerializationInfo,
}
} else {
quote! {}
};
// If version was set, implement Encode trait
let encode = if let Some((prot, major, minor)) = version.as_ref() {
quote! {
#[derive(Serialize)]
struct Version<D> {
v: SerializationInfo,
#[serde(flatten)]
d: D
}
use said::version::Encode;
impl #impl_generics Encode for #name #ty_generics #where_clause {
fn encode(&self, code: &HashFunctionCode,
format: &SerializationFormats) -> Result<Vec<u8>,
said::version::error::Error> {
let size = self.derivation_data(code, format).len();
let v = SerializationInfo::new(#prot.to_string(),
#major, #minor, format.clone(), size);
let versioned = Version {v, d: self.clone()};
Ok(format.encode(&versioned).unwrap())
}
}
}
} else {
quote!()
};
-
N.B: the macro
quote!
is used to turning Rust AST data structures into tokens of source code. -
The Encode trait is found here, also its implementation is performed under
SAD
macro when the version is provided.
pub trait Encode {
fn encode(
&self,
code: &HashFunctionCode,
serialization_format: &SerializationFormats,
) -> Result<Vec<u8>, Error>;
}
Part III:
// Creates a temporarily struct that has a well formatted version variable and copying the concrete part of the struct defined in part II.
let tmp_struct = if let Some((prot, major, minor)) = version {
quote! {
let mut tmp_self = Self {
version: SerializationInfo::new_empty(#prot.to_string(),
#major, #minor, SerializationFormats::JSON),
#(#concrete,)*
};
let enc = tmp_self.version.serialize(&tmp_self).unwrap();
tmp_self.version.size = enc.len();
tmp_self
}
} else {
quote! {Self {
#(#concrete,)*
}}
};
let gen = quote! {
// Create temporary, serializable struct and implements the compute_digest and derivation_data on the newly generate struct.
#[derive(Serialize)]
struct #varname #ty_generics #where_clause {
#version_field
#(#body,)*
}
#encode
impl #impl_generics From<(&#name #ty_generics, usize)> for #varname #ty_generics #where_clause {
fn from(value: (&#name #ty_generics, usize)) -> Self {
let dig_length = value.1;
let value = value.0;
#tmp_struct
}
}
impl #impl_generics SAD for #name #ty_generics #where_clause {
fn compute_digest(&mut self, code: &HashFunctionCode,
format: &SerializationFormats ) {
use said::derivation::{HashFunctionCode, HashFunction};
let serialized = self.derivation_data(code, format);
let digest = Some(HashFunction::from(code.clone()).derive(&serialized));
#(#out;)*
}
fn derivation_data(&self, code: &HashFunctionCode,
serialization_format: &SerializationFormats) -> Vec<u8> {
use said::derivation::HashFunctionCode;
use said::sad::DerivationCode;
let tmp: #varname #ty_generics = (self, code.full_size()).into();
serialization_format.encode(&tmp).unwrap()
}
};
};
gen.into()
- https://www.ietf.org/archive/id/draft-ssmith-said-03.html#name-appendix-embedding-saids-in
- https://github.com/THCLab/cesrox/tree/master/said
- https://snyk.io/advisor/npm-package/self-addressing-identifier
This is the DeriveInput
AST for the below struct:
#[derive(SAD, Serialize)]
struct Something {
pub text: String,
#[said]
pub d: Option<SelfAddressingIdentifier>,
}
DeriveInput {
attrs: [],
vis: Visibility::Inherited,
ident: Ident {
ident: "Something",
span: #0 bytes(321..330),
},
generics: Generics {
lt_token: None,
params: [],
gt_token: None,
where_clause: None,
},
data: Data::Struct {
struct_token: Struct,
fields: Fields::Named {
brace_token: Brace,
named: [
Field {
attrs: [],
vis: Visibility::Public(
Pub,
),
mutability: FieldMutability::None,
ident: Some(
Ident {
ident: "text",
span: #0 bytes(349..353),
},
),
colon_token: Some(
Colon,
),
ty: Type::Path {
qself: None,
path: Path {
leading_colon: None,
segments: [
PathSegment {
ident: Ident {
ident: "String",
span: #0 bytes(355..361),
},
arguments: PathArguments::None,
},
],
},
},
},
Comma,
Field {
attrs: [
Attribute {
pound_token: Pound,
style: AttrStyle::Outer,
bracket_token: Bracket,
meta: Meta::Path {
leading_colon: None,
segments: [
PathSegment {
ident: Ident {
ident: "said",
span: #0 bytes(377..381),
},
arguments: PathArguments::None,
},
],
},
},
],
vis: Visibility::Public(
Pub,
),
mutability: FieldMutability::None,
ident: Some(
Ident {
ident: "d",
span: #0 bytes(399..400),
},
),
colon_token: Some(
Colon,
),
ty: Type::Path {
qself: None,
path: Path {
leading_colon: None,
segments: [
PathSegment {
ident: Ident {
ident: "Option",
span: #0 bytes(402..408),
},
arguments: PathArguments::AngleBracketed {
colon2_token: None,
lt_token: Lt,
args: [
GenericArgument::Type(
Type::Path {
qself: None,
path: Path {
leading_colon: None,
segments: [
PathSegment {
ident: Ident {
ident:
"SelfAddressingIdentifier",
span: #0
bytes(409..433),
},
arguments:
PathArguments::None,
},
],
},
},
),
],
gt_token: Gt,
},
},
],
},
},
},
Comma,
],
},
semi_token: None,
},
}