Built on top of Google Guava and Apache Commons, this includes core infrastructure classes used in the Rowboat Advisors Portfolio Optimizer and Tax-Aware Backtester.
However, most code is not domain-specific, such as our immutable collections.
Built since March 2016, As of April 2024, there are 115,000+ lines of source code; 65,000+ of them are tests. So this is heavily tested.
- Immutable maps and sets (compile-time, not just at runtime)
- Performance-optimized collections for classes exposing numeric long keys (utilizes GNU Trove)
- Contiguous
RangeMap
; e.g. [2, 3) maps to "a", [3, 8) maps to "b", etc. - Scala-like
Either
,EitherOrBoth
,EitherOrNeither
- Indexable immutable arrays: compact array storage but map-like key-based lookup
- Java-generic partitions with weights summing to 100% + support to splice in & remove items
- Fluent extensions of Google GSON
- JSON converters for all data classes above - clearer than GSON default conversions.
- Support for arithmetic & geometric progressions and numeric sequences.
- Fluent extensions of matrices (CERN's Colt library)
- Instrument master
- BigDecimal-based typesafe wrappers (e.g.
SignedMoney
,BuyQuantity
) - Daily time series for contiguous market-open days
- Quick-and-dirty IO
- Extensions on Guava preconditions
- Unit-testing functionality
- Null avoidance via helper methods on Optional.
Follow the detailed Javadoc documentation (linked above) for more.
However, we will call out some highlights below, in decreasing order of how useful we think a general developer (i.e. without our specific business focus) would find these.
In a previous life, we have seen catastrophic bugs due to mutable classes. We went to great lengths to avoid that here.
Java is not great when it comes to immutable classes. final
is not as expressive as C++,
so it only prevents you from changing the reference (pointer) to an object, but it will still let
you modify its contents if methods. Java 14 (and later) Records address this somewhat, but not perfectly.
Let's take the widely used Map
interface. Google's Guava library has ImmutableMap
, which
throws a runtime exception if you try to change its contents via its put()
method.
However, an even safer approch is not to have a put()
method at all. It gives you compile-time safety.
Another issue with Map
is that its semantics rely on null, which is frowned upon by modern software
development practices (for good reason!). For example, get()
returns null if an item is not present,
which means the caller has to check explicitly.
Our solution is to implement two classes, MutableRBMap
and RBMap
.
MutableRBMap
is clearly mutable; it's in the name. In fact, our data classes are immutable across the board
consistently, so our naming pattern has to explicitly call out cases of immutability. Moreover, unlike RBMap
,
it has several variants of the Map#put()
method with cleaner semantics:
putAssumingAbsent
throws an exception if you are trying to overwrite a key, which is rarely the desired behavior.putAssumingAbsentAllowingNullValue
allows nulls, which is uncommon.putAssumingNoChange
takes in a predicate that compares the old and new value being added under an existing key; this is useful for epsilon comparisons.putAssumingPresent
throws an exception if the key does not already have a value.putIfAbsent
only adds a key/value pair, but does nothing if the key already maps to a value.putOrModifyExisting
avoids the 'if empty then modify object, else add object' type of logic, and lets you pass in a lambda.
You could say 'if this is so good, why doesn't Java already do it?'. Well, our codebase optimizes for safety, even at a slight cost to performance. That's not practical to do with language-standard map classes, because some users want the extra performance.
This repo has several general collection classes that follow this MutableXYZ / XYZ pattern. We almost never pass a MutableXYZ as an argument, except possibly inside a private method - and even that is rare. Instead, there is often a single method that instantiates a MutableXYZ, fills in its contents, and returns ar XYZ. From that point on, there's no way to change XYZ's contents.
The main collection classes that implement this are RBMap
/ RBSet
and IidMap
/ IidSet
(see below).
For performance, certain data are effectively a Java long. We have created wrappers around GNU Trove, a set of collections (maps and sets, mostly) optimized for memory and performance for the cases where the key is a primitive.
A 2-d array is indexable by two indices that start with 0. It is sometimes convenient to index by some other unique keys. For example, you may want to store stock returns by unique numeric instrument ID (roughly equivalent to a stock ticker) and date. You could use a map of maps, but that is not guaranteed to be 'rectangular' like a 2d array, and it would also incur the overhead of having multiple maps, one per row (or column). Because our data is almost always immutable, this can be modeled by a 2d array (to store the actual data in a compacted form) and two maps of (date) -> (row index) and (instrument id) -> (column index). This results in space savings. Plus, the semantics are clearer, because the data type tells you that you are dealing with rectangular 2d data.
The type for the map of key -> (array index) is ArrayIndexMapping
. An additional advantage is that it
can be shared across different maps, not just across all rows of the same map.
A common scenario is having one item per consecutive day when the market is open. This concept of 'market' is general,
and can include all weekdays, all calendar days, etc. It uses a shared ArrayIndexMapping
so that
all different daily time series objects are effectively arrays (sometimes even arrays of primitives),
which results in big space savings.
Our collections have many 'eager' transformation methods (in the software sense, i.e. as opposed to 'lazy'). These are similar to e.g. Guava Maps.transform, except:
- eagerness eliminates an entire class of bugs where the underlying object is changed after the lazy transformation has been created. This is very unlikely in our code in the first place, because we use immutable classes throughout, but it's a belt-and-suspenders extra safe approach.
- they apply to our own data structures - e.g.
RBMap
.
Again, the focus is on clarity and reducing bugs over performance.
Almost all our classes are in one of two categories:
- Plain-old Java objects (POJOs) that store data. We also call these 'noun classes' and 'data classes'.
- 'Verb classes' which have code that operates on the data. Verb classes are injected into other verb classes, using Google Guice dependency injection. This greatly aids testing.
A common need when unit testing is 'instantiate a dummy object for class XYZ'. Almost all of our unit test for
data classes extend RBTestMatcher
. It forces you to implement 3 instantiation methods: a trivial object
(e.g. for a collection class, that would be an empty collection), a non-trivial one, and another non-trivial one that is
epsilon-different (when applicable) than the non-trivial one. You also have to create a TypeSafeMatcher
.
This is a more general notion than hashCode / equals; for example, it allows for comparisons that take
an epsilon as an argument.
Then, RBTestMatcher
gives you some unit tests 'for free': it tests that your TypeSafeMatcher
works
correctly:
- each one of the (trivial, non-trivial, matching non-trivial) objects must match itself
- the trivial one must not match the non-trivial ones
- the non-trivial and matching non-trivial must match each other.
By forcing you to implement TypeSafeMatcher
s throughout, it is possible to nest matchers in a very legible way.
Example:
public static TypeSafeMatcher<Orders> ordersMatcher(Orders expected, MatcherEpsilons e) {
return makeMatcher(expected,
match(v -> v.getBuyOrders(), f -> buyOrdersMatcher(f, e)),
match(v -> v.getSellOrders(), f -> sellOrdersMatcher(f, e)));
}
This utilizes some additional syntactic sugar we have created (makeMatcher
, match
) to create a succinct
way of saying: "for two Orders objects to match subject to epsilons, both the buy orders and sell orders must match
using those same epsilons".
That, in turn, greatly facilitates adding single-statement tests. Here is example, although not from the codebase. It says 'regardless of starting portfolio, if trading is disabled, do not generate any orders':
assertThat(
makeTestObject() // this constructs a verb class that makes the trading decision
.generateOrders(
new PortfolioTest().makeDummyObject(), // a hint to the reader that the value doesn't matter here
TradingStatus.DISABLED),
ordersMatcher(
emptyOrders());
Another example. Imagine an integration test that calculates orders based on specific inputs, and you expect it to buy 1.1 shares of AAPL and 2.2 shares of GOOG:
assertThat(
generateActualOrders(...), // some code that is expected to generate orders for an initial investment
ordersMatcher(
orders(
testBuyOrders(
buyOrder(I_AAPL, buyQuantity(1.1), price(11.1), LocalDateTime.of(1974, 4, 4, 9, 30, 0))
buyOrder(I_GOOG, buyQuantity(2.2), price(22.2), LocalDateTime.of(1974, 4, 4, 9, 30, 0))),
emptySellOrders())));
This is better than assertEquals()
, because it allows for epsilon-based 'almost-equality'.
This test is more succinct than comparing results one item at a time (e.g. first compare buy orders, then sell orders).
It's also more robust: if more data get added to the Orders
class later, you just update
OrdersTest#ordersMatcher
, and then all the tests that use ordersMatcher
will automatically perform the
new, more expanded checks.
There is a lot more documentation in the Javadoc.