Nixtla/hierarchicalforecast

[Methods] Undesired conversion to a dense matrix

Opened this issue · 3 comments

What happened + What you expected to happen

The current implementation of the BottomUpSparse reconciliation method causes an undesired conversion to a dense matrix, potentially leading to an OOM error on large datasets. The lack of speedup had already been observed, as noted in makoren's comment in the class definition.

The culprit is the idx_bottom indexed assignment:

P[idx_bottom] = S[idx_bottom]

Locally, I managed to fix it by replacing this line with the following:

for idx in idx_bottom:
    P[idx] = S.getrow(idx)
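For reference, here is a minimal standalone sketch of this row-by-row workaround on a toy summing matrix (the `S` and `idx_bottom` values are illustrative, not taken from the library):

```python
import numpy as np
from scipy import sparse

# Toy hierarchy: row 0 is the total, rows 1 and 2 are the bottom series
S = sparse.csr_matrix(np.array([[1, 1],
                                [1, 0],
                                [0, 1]], dtype=np.float32))
idx_bottom = [1, 2]

# Copy only the bottom rows, one at a time, so P stays sparse
# instead of being converted to a dense matrix
P = sparse.lil_matrix(S.shape, dtype=np.float32)
for idx in idx_bottom:
    P[idx] = S.getrow(idx)

P = P.tocsr()
print(P.toarray())
```

The loop avoids the fancy-indexed assignment that triggers densification, at the cost of one Python-level iteration per bottom series.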

The performance of this workaround is still quite suboptimal, but it's sufficient for my use case, hence I would rather raise this issue than create a PR :)

Versions / Dependencies

hierarchicalforecast==0.4.1

Reproduction script

Using an S matrix with n > 200k leads to an OOM with BottomUpSparse reconciliation.

Issue Severity

Medium: It is a significant difficulty but I can work around it.

Eventually, I managed to achieve respectable performance with:

    def _get_PW_matrices(self, S, idx_bottom):
        n_hiers, n_bottom = S.shape
        # Build P directly in CSR form from (row, col) coordinates,
        # so no dense intermediate is ever materialized
        rows = np.asarray(idx_bottom)
        cols = np.arange(n_bottom)
        data = np.ones_like(rows, dtype=np.float32)
        P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T
        W = sparse.identity(n_hiers, dtype=np.float32)
        return P, W
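To illustrate what this constructor produces, here is a hedged standalone sketch on a toy hierarchy (the function name and data are illustrative, mirroring the method body above outside its class):

```python
import numpy as np
from scipy import sparse

def get_pw_matrices(S, idx_bottom):
    # One 1-entry per bottom series: P[j, idx_bottom[j]] = 1,
    # built straight in sparse coordinate form
    n_hiers, n_bottom = S.shape
    rows = np.asarray(idx_bottom)
    cols = np.arange(n_bottom)
    data = np.ones_like(rows, dtype=np.float32)
    P = sparse.csr_matrix((data, (rows, cols)), shape=(n_hiers, n_bottom)).T
    W = sparse.identity(n_hiers, dtype=np.float32)
    return P, W

# Toy hierarchy: total = A + B, bottom series at rows 1 and 2
S = sparse.csr_matrix(np.array([[1, 1],
                                [1, 0],
                                [0, 1]], dtype=np.float32))
P, W = get_pw_matrices(S, idx_bottom=[1, 2])

# P @ y_hat selects the bottom-level forecasts, as bottom-up requires
y_hat = np.array([10.0, 4.0, 6.0])
print(P @ y_hat)  # → [4. 6.]
```

Since `P` is constructed from `(data, (rows, cols))` triplets, memory usage scales with the number of bottom series rather than with the full dense shape.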

However, I noticed a note about possibly deprecating the idx_bottom argument in the future. Should I still open a PR?

@nulam Thanks for using this! Why don't you open a PR? I haven't worked on this recently, but I'll have a look (I now have a 200k dataset as well), and hopefully someone from Nixtla will find the time to check whether idx_bottom or anything else poses a problem.

hey @nulam and @mcsqr! currently, the efforts to remove the idx_bottom are paused. would you like to open a PR with your improvements, @nulam?❤️