This is experimental and tries to understand how to hash tree structures like JSON or YAML
Basics:
We will define a function render
, that takes a JSON and generates a String from it, that we will use for hashing.
Lets have the function hashString
, which
- takes a string
- converts that string into byte representation, using utf8 encoding
- hashes these bytes with the sha256 algorithm and
- returns the result as a base64 encoded string
This function satisfies the following:
hashString("") = tbd. hashString("example") = tbd.
With these two functions can we define hash
as their combination: hash(json) = hashString(render(json))
hashes of base types
Define the render
function for base types as follows:
render( null ) = "null" render( true ) = "true" render( false ) = "false"
Hashing number
Hashing of numbers is just defined for integrals. Exponents are removed and the result is written as an integer of arbitrary size. So it should satisfy the following examples:
render( 0 ) = "0" render( 1234 ) = "1234" render( -1234 ) = "-1234" render( 1234e3 ) = "1234000" render( 1234E3 ) = "1234000"
Hashing string
Strings are rendered by just wrapping them in quotes
render( "" ) = "\"\"" render( "abc def" ) = "\"abc def\""
hashes of recursive types
Recursive types are solved by recursion.
Hashing array
Arrays are hashed, by hashing each element, and then composing them
render( [] ) = "[]" render( "[123,"456"]" ) = "[" + hash(123) + "," + hash("456") +"]"
Hashing object
An object consists of members. Let us assume, that keys are unique, otherwise the hash is undefined.
Hashing object members
A member consists of a key
(string) and a value
. The rendered string is the key
seperated from hash( value )
by a :
:
render( member ) = member.key + ":" + hash (member.value)
Hashing a whole object
first hash every member, sort the hashes alphabetically, and composing them similarly to arrays, but wrapped with {..}
render( {} ) = "{}" # let $object be an object containing of the three members $member1, $member2, $member3 $memberHashes = [ hash( $member1 ), hash( $member2 ), hash( $member3 ) ] $sortedHashes = sort( memberHashes ) render( $object ) = "{" + $sortedHashes[0] + "," + $sortedHashes[1] + "," + $sortedHashes[2] + "}"
There is no overlap:
Rendered results are of the following forms:
null true false $INTEGER "$STRING" [] [$HASH1,$HASH2,..] {} {$HASH1,$HASH2,..}
It is obvious, that there is no overlap bewteen renderings (up to hash collitions).