/lean-map

Lean Hash Array Mapped Trie implementation in ClojureScript

Primary LanguageClojureEclipse Public License 1.0EPL-1.0

Lean Hash Array Mapped Trie (Lean Map)

A ClojureScript implementation of Lean Hash Array Mapped Tries. See reference.

Leiningen

[lean-map "0.4.0"]

Technical Terms

  • HAMT is Hash Array Mapped Trie

Primary Goals

  • See if Lean Hash Array Mapped Tries performs better then the current HAMT implementation in terms of code size, memory usage, or operation speed
  • Have a Property Test Suite that tests the current HAMT implementation and can be used to test the Lean HAMT
  • Have an benchmark suite testing maps of sizes (100, 10000, and 100000) for hash, assoc, dissoc, assoc!, and dissoc! operations
  • Be a nearly drop in replacement for the current HAMT implementation

Secondary Goals

  • Be a reference implementation of how to implement a Lean HAMT
  • Have detailed documentation about how the Lean HAMT differs from the current HAMT implementation

Current Status

All standard ClojureScript map functionality (assoc, dissoc, get, seq, reduce-kv, persistent!, and transient) have been implemented and passed Collection Check tests. The improved iteration has been implemented, the improved equality checking is the last feature that needs to be completed.

Performance

Here are the performance gains over the reference ClojureScript HAMT implementation

  • 2x for scanning over sequences (except in Firefox which has a 2x slowdown)
  • 2 - 4x for hashing maps
  • One order of magnitude for equality checking in the worst case (no structural sharing) and two orders of magnitude in the best case (structural sharing)

The other operations are comparable to ~25% slower then the reference implementation.

Use script/bench.sh the performance benchmarks JavaScript (creates benchmarks inresources/bench/app.js). The benchmarks are meant to run how ClojureScript runs tests

Main Ideas of the Paper

Consolidation of key values pairs and HAMT nodes

The primary idea behind the paper is changing the current HAMT node implementation from this

[key1, value1, null, <HAMT Node 1>, key2, value2, key3, value3, null, <HAMTNode 2>]

to this

[key1, value1, key2, value2, key3, value3, <HAMT Node 1>, <HAMT Node 2>]

consolidating the key value pairs and HAMT nodes instead of having them interspersed.

This leads to memory savings from not having to have a marker null in front of HAMT node.

Compaction on delete

The secondary idea is to have delete always return the most compact HAMT. For example deleting D from this HAMT

[A, <HAMT Node 1>] -> [<HAMT Node 2>, D] -> [B, C]

returns this HAMT

[A, <HAMT Node 1>] -> [<HAMT Node 2>] -> [B, C]

that has an unnecessary HAMT node (HAMT Node 2). Ideally the delete operation should return

[A, <HAMT Node 1>] -> [B, C]

removing the superfluous HAMT node

According to the paper this leads to an 80 to 100 percent speedup for iteration and equality checks and comparable to better performance on insertion, deletion, and lookup.

Usage

Lean HAMT's are implemented in the cljs.lean-map.core namespace. User functions live in the cljs.lean-map.util namespace. Here is documentation of their usage

  • set-maps-to-lean-map!

    Modifies the literal maps {} so that they become lean-maps instead of standard ClojureScript maps. Note this does not change the hash-map function to output lean-maps.

    Usage: (set-maps-to-lean-map!)

  • set-maps-to-cljs-map!

    Modifies the literal maps {} back to the default ClojureScript maps.

    Usage: (set-maps-to-cljs-map!)

  • using-lean-maps?

    Usage: (using-lean-maps?)

  • lean-map?

    Check if a map is a lean-map

    Usage: (lean-map? {0 0 1 1 2 2 3 3 4 4 5 5})

  • lean-map-seq?

    Check if a seq is a seq of a lean-map

    Usage: (lean-map? {0 0 1 1 2 2 3 3 4 4 5 5})

  • hash-map

    Copy of the ClojureScript hash-map function but returns a lean-map instead of a ClojureScript map.

    Usage: (hash-map 0 0 1 1 2 2 3 3 4 4 5 5)

  • empty

    An empty lean-map for use in place of {}

    Usage: (assoc empty :foo :bar)

Testing

To test on node use script/node-test.sh. For phantomjs use script/phantom-test. If you want to run tests with a custom engine use:

lein with-profile test doo <js environment> <name of test build e.g. "test">

For a custom test build add it to :cljsbuild in project.clj

Thanks

  • Bendyworks for letting me work on this

  • Michael J. Steindorfer and Jurgen J. Vinju for the Lean HAMT Paper

  • Use The Source for their reference implementation

  • Zach Tellman for writing Collection Check

  • Martin Klepsch for porting Collection Check to ClojureScript and Nicolás Berger for helping me get it setup

  • David Nolen for perf and profiling suggestions

License

Copyright © 2015 Peter Schuck

Distributed under the Eclipse Public License either version 1.0 or (at your option) any later version.