MatrixChainMultiply

The cost of multiplying a chain of matrices can vary significantly (e.g., by 1-2 orders of magnitude in real problems) depending on the order in which the multiplication steps are applied. The Matrix chain multiplication algorithm applied here (described in Introduction to Algorithms, 3rd Edition by Cormen et al.) finds the optimal multiplication sequence.

With Julia, it is easy to allow for specialization on the matrix types. For example, the types given in the ArrayFire.jl package or matrices of matrices (block matrices) should work, too. To be clear, the *(x, y) that applies to a specific types of x and y is automatically used with Julia's multiple-dispatch system after the operation order is optimized.

Installing this Package

Since this package is not registered, you must install it by cloning. To add this package, use:

Pkg.clone("https://github.com/AustinPrivett/MatrixChainMultiply.jl")

Using this Package

The only function most users should need is matrixchainmultiply, added to the scope by writing:

using MatrixChainMultiply
using BenchmarkTools

a = rand(1000,1000)
b = rand(1000,100)
c = rand(100, 500)
d = rand(500)

@benchmark result1 = matrixchainmultiply(a,b,c,d)
@benchmark result2 = *(a,b,c,d)

which gives

julia> @benchmark result1 = matrixchainmultiply(a,b,c,d)
BenchmarkTools.Trial:
  memory estimate:  18.45 KiB
  allocs estimate:  42
  --------------
  minimum time:     265.715 μs (0.00% GC)
  median time:      299.071 μs (0.00% GC)
  mean time:        329.042 μs (0.16% GC)
  maximum time:     4.162 ms (74.04% GC)
  --------------
  samples:          10000
  evals/sample:     1

julia> @benchmark result2 = *(a,b,c,d)
BenchmarkTools.Trial:
  memory estimate:  4.59 MiB
  allocs estimate:  5
  --------------
  minimum time:     4.178 ms (0.00% GC)
  median time:      4.765 ms (0.00% GC)
  mean time:        5.281 ms (8.05% GC)
  maximum time:     11.310 ms (18.85% GC)
  --------------
  samples:          944
  evals/sample:     1

You can see the optimal ordering and the cost of the operation

matrixchainmultiply(a',b,c,d; printout=true)

Note that the optional keyword argument printout=true prints out the optimal multiplication order and the number of FLOPS in case you want to investigate. It is false by default.

When to use this

It is certainly possible to see speedups of one or two orders of magnitude through the use of the optimal operation order generated here.

If all matrices are the same size, the overhead of the optimization is not worth the (non-existing) benefit.

Future improvements

Recognize situations that don't need the analysis to eliminate the overhead (allowing for a more ubiquitous use of matrixchainmultiply(...) instead of *(...)).
For repeated uses, given some matrix size sequence (so at run time), produce the function a single time at run time so that the analysis isn't done unnecessarily.
Faster or more general algorithms (i.e., those that apply to the general tensor problem) are certainly desirable if you would like to add them.

Similar Package

It has come to my attention that another package implementing this is DynMultiply.jl.