Odd "ValueError: buffer source array is read-only" error with dtaidistance inside ipyparallel
Closed this issue · 3 comments
I'm trying to compute the DTW distances between various queries and many other time series. For each query I run subsequence_search with an array of time series (i.e. an array of arrays):
def getQuerySeriesResults(q, series, additions, minMatchCount, maxMatchCount):
    sa = subsequence_search(q, series, dists_options={'use_c': True})
    best = sa.kbest_matches_fast(k=maxMatchCount)
    return getSearchParameters(best, sa.distances, additions, minMatchCount)
Since I have many queries to compare against the time series, this is a good use case for parallelization, so I set up a cluster with ipyparallel:
import ipyparallel as ipp
clusterProcessesCount = 4
cluster = ipp.Cluster(n = clusterProcessesCount)
cluster.start_cluster_sync()
rc = cluster.connect_client_sync()
rc.wait_for_engines(clusterProcessesCount)
lview = rc.load_balanced_view()
dview = rc[:]
It works fine when running arbitrary python code:
import numpy as np

tests = np.array([1, 2, 3])

def f(test):
    return test

for result in lview.imap(f, tests, ordered=False, max_outstanding='auto'):
    print(result)
However, as soon as I invoke sa.kbest_matches_fast inside one of the child processes, I get this exception:
exception calling callback for <AsyncResult(<ipyparallel.serialize.serialize.PrePickled object at 0x7f7e2a170f70>): failed>
Traceback (most recent call last):
File "/Users/tommedema/opt/anaconda3/lib/python3.9/site-packages/ipyparallel/client/asyncresult.py", line 528, in _resolve_result
raise r
ipyparallel.error.RemoteError: [0:apply] ValueError: buffer source array is read-only
The entire log can be found here
I could ask in the ipyparallel repo too, though the issue seems to occur only with dtaidistance when setting use_c to True, so I am wondering if you have an idea what is going on here?
I just found that when I disable C ("use_c": False) this error does not occur. However, that defeats the purpose of my performance optimization with parallelization.
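One possible workaround, sketched here under assumptions this thread does not confirm: if the arrays reach the engine as read-only buffers (which can happen when NumPy arrays are transferred between processes without copying), compiled code that expects a writable buffer may reject them. Copying the inputs on the engine before calling the search makes them writable again, at the cost of extra memory. The helpers as_writeable and prepare_inputs are hypothetical names, not part of dtaidistance or ipyparallel, and the read-only arrays are simulated with setflags.

```python
import numpy as np

def as_writeable(arr):
    """Return arr unchanged if it is writable, otherwise a writable copy."""
    arr = np.asarray(arr)
    return arr if arr.flags.writeable else arr.copy()

def prepare_inputs(q, series):
    """Hypothetical helper: make the query and every series writable."""
    return as_writeable(q), [as_writeable(s) for s in series]

# Simulate inputs that arrived read-only on an engine (assumption).
q = np.array([1.0, 2.0, 3.0])
q.setflags(write=False)
series = [np.arange(5, dtype=float), np.arange(4, dtype=float)]
series[0].setflags(write=False)

q_w, series_w = prepare_inputs(q, series)
print(q_w.flags.writeable, [s.flags.writeable for s in series_w])
```

In getQuerySeriesResults, the call to prepare_inputs would go on the first line, before subsequence_search, so that the copy is made inside the child process rather than on the client.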
If it helps, I can create a minimal reproducible example using Google Colab (though I'm not sure if I can run child processes there). Please let me know. I really appreciate any help on this.
A minimal reproducible example would be helpful, yes. I do not seem to be able to reproduce the error when using ipyparallel, and I cannot immediately see where a non-writable view would be created.
You could also test the inputs (query and series) to the subsequence_alignment function with the following test to check if this is triggered by an operation before calling dtaidistance:
# An assert on a non-empty string always passes, so assert on the flag itself.
assert query.flags.writeable, 'query is not writeable'
assert series.flags.writeable, 'series is not writeable'
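The same check can be wrapped in a small function that reports which inputs are read-only instead of stopping at the first one, which may be easier to read in a remote traceback. This is only a sketch: find_readonly is a hypothetical helper, and the read-only array below is simulated with setflags rather than produced by ipyparallel.

```python
import numpy as np

def find_readonly(**arrays):
    """Return the names of the arguments whose NumPy buffer is not writable."""
    return [name for name, arr in arrays.items()
            if not np.asarray(arr).flags.writeable]

query = np.array([0.0, 1.0, 2.0])
series = np.array([0.0, 1.0, 2.0, 3.0])
series.setflags(write=False)  # simulate a read-only input

readonly_names = find_readonly(query=query, series=series)
print(readonly_names)  # ['series']
```

Note that np.asarray builds a new, writable array from a plain Python list, so when series is a list of arrays each element needs to be checked individually.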