NatLibFi/Annif

Support batch suggest in STWFSA backend

osma opened this issue · 2 comments

osma commented

PR #663 is going to bring support for batch suggest operations.

The STWFSA backend could benefit from implementing _suggest_batch instead of _suggest: it could then process a whole batch of texts using parallel and/or vectorized operations.
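For concreteness, here is a minimal sketch of the batching idea. It assumes that stwfsapy's suggest_proba accepts a list of texts and returns, per text, a list of (concept, score) pairs; the suggest_batch function name and the sort/limit handling are illustrative, not the API proposed in #663.

```python
from stwfsapy.predictor import StwfsapyPredictor

def suggest_batch(model: StwfsapyPredictor, texts: list[str], limit: int):
    """Score a whole batch of texts with a single stwfsapy call.

    Assumed API: suggest_proba(texts) returns one list of
    (concept, score) pairs per input text.
    """
    batch = model.suggest_proba(texts)
    # Keep only the top-`limit` suggestions per text, highest score first.
    return [
        sorted(concepts, key=lambda pair: pair[1], reverse=True)[:limit]
        for concepts in batch
    ]
```

The potential win is that a single call gives stwfsapy the chance to use parallel or vectorized operations internally, instead of paying per-call overhead once per document.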

osma commented

I tried this; the code is on the branch issue666-suggest-batch-stwfsa.

Unfortunately, the results were not very encouraging. Batched suggest done this way seems to be slower than the original. Maybe switching to a new representation for suggestion results (see #678) could help.

I also tried using the predict_proba method of stwfsapy, which returns its results as a sparse matrix. The problem there is that stwfsapy internally uses different numeric IDs for concepts than Annif does, so there would have to be an ID mapping mechanism to convert the results into something that Annif can use.
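A sketch of what such a mapping mechanism could look like, under two assumptions: that stwfsapy's concepts can be obtained in the matrix's column order (the concept_uris argument below is hypothetical), and that Annif's subject index can resolve a concept URI to its own numeric subject id (here via a by_uri lookup). The permutation array would only need to be built once per loaded model.

```python
import numpy as np
from scipy.sparse import csr_matrix

def build_column_mapping(concept_uris, subject_index):
    """Build a stwfsapy-column -> Annif-subject-id permutation array.

    concept_uris: stwfsapy's concepts in matrix column order
    (hypothetical input). subject_index: resolves a URI to Annif's
    numeric subject id; all URIs are assumed to resolve.
    """
    return np.array([subject_index.by_uri(uri) for uri in concept_uris])

def remap_scores(scores, mapping, n_subjects):
    """Rewrite the columns of a (documents x concepts) sparse score
    matrix so that column j holds the score of Annif subject j."""
    remapped = csr_matrix(
        (scores.data, mapping[scores.indices], scores.indptr),
        shape=(scores.shape[0], n_subjects),
    )
    remapped.sort_indices()
    return remapped
```

Since only the column indices are rewritten, the remapping itself should be cheap; whether that makes predict_proba worthwhile overall is exactly what would need benchmarking.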

osma commented

Here are the main test results, evaluating a YSO STWFSA English model on jyu-theses/eng-test on my 4-core laptop.

| Version | Jobs | User time (s) | Wall clock time (m:ss) |
|---|---|---|---|
| Before (master) | 1 | 201.96 | 3:23.56 |
| Before (master) | 4 | 288.02 | 2:19.72 |
| After (issue666-suggest-batch-stwfsa) | 1 | 181.12 | 3:02.69 |
| After (issue666-suggest-batch-stwfsa) | 4 | 322.29 | 2:27.98 |

Summary

With the batch implementation, evaluation was faster when using just 1 job, but slower with 4 jobs. I didn't include memory usage figures, but memory use was basically unchanged.