/recursiveHashing

some prototyping and thinking

Primary LanguageHaskell

This is experimental and tries to understand how to hash tree structures like JSON or YAML

Basics:

We will define a function render, that takes a JSON and generates a String from it, that we will use for hashing.

Lets have the function hashString, which

  • takes a string
  • converts that string into byte representation, using utf8 encoding
  • hashes these bytes with the sha256 algorithm and
  • returns the result as a base64 encoded string

This function satisfies the following:

hashString("") = tbd.
hashString("example") = tbd.

With these two functions can we define hash as their combination: hash(json) = hashString(render(json))

hashes of base types

Define the render function for base types as follows:

render( null ) = "null"
render( true ) = "true"
render( false ) = "false"

Hashing number

Hashing of numbers is just defined for integrals. Exponents are removed and the result is written as an integer of arbitrary size. So it should satisfy the following examples:

render( 0 ) = "0"
render( 1234 ) = "1234"
render( -1234 ) = "-1234"
render( 1234e3 ) = "1234000"
render( 1234E3 ) = "1234000"

Hashing string

Strings are rendered by just wrapping them in quotes

render( "" ) = "\"\""
render( "abc def" ) = "\"abc def\""

hashes of recursive types

Recursive types are solved by recursion.

Hashing array

Arrays are hashed, by hashing each element, and then composing them

render( [] ) = "[]"
render( "[123,"456"]" ) = "[" + hash(123) + "," + hash("456") +"]"

Hashing object

An object consists of members. Let us assume, that keys are unique, otherwise the hash is undefined.

Hashing object members

A member consists of a key (string) and a value. The rendered string is the key seperated from hash( value ) by a ::

render( member ) = member.key + ":" + hash (member.value)

Hashing a whole object

first hash every member, sort the hashes alphabetically, and composing them similarly to arrays, but wrapped with {..}

render( {} ) = "{}"

# let $object be an object containing of the three members $member1, $member2, $member3
$memberHashes = [ hash( $member1 ), hash( $member2 ), hash( $member3 ) ]
$sortedHashes = sort( memberHashes )
render( $object ) = "{" + $sortedHashes[0] + "," + $sortedHashes[1] + "," + $sortedHashes[2] + "}"

There is no overlap:

Rendered results are of the following forms:

null
true
false
$INTEGER
"$STRING"
[]
[$HASH1,$HASH2,..]
{}
{$HASH1,$HASH2,..}

It is obvious, that there is no overlap bewteen renderings (up to hash collitions).

TODOs:

why recursive hashing instead of recursive rendering and hashing at the end?