twmacro/pyyeti

Speed up reading op4

Closed this issue · 9 comments

First of all, pyYeti is very useful.
But I'd like to speed up reading op4 files.

I have a 250 MB op4 file that takes about 20 seconds to read with pyYeti;
a 600 MB file takes about 50 seconds.

@xper0418, that is something I'd like to see as well! And it's something I think about every so often. I'm just not sure how to tackle this problem. One solution would be to rewrite these routines in C, but that doesn't sound fun for a number of reasons. Another idea I've had is to parallelize the reading. That sounds more fun, but I'm uncertain how practical/general that solution would be. My solution so far for big op4 files is to read once and then use flammkuchen (https://pypi.org/project/flammkuchen/) to save/load after that ... not elegant, but it works.
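For anyone reading along, the read-once-then-cache workaround looks roughly like the sketch below. I'm using NumPy's `.npz` format instead of flammkuchen so the snippet has no extra dependencies; the `cached_load` name and the `slow_reader` callable are hypothetical stand-ins for however you call the op4 reader:

```python
import os
import numpy as np

def cached_load(op4_path, cache_path, slow_reader):
    """Parse op4_path with slow_reader once, then reuse a fast binary cache.

    slow_reader is any callable returning {name: 2-D ndarray}, e.g. a thin
    wrapper around pyYeti's op4 reader (hypothetical; adapt to the real API).
    """
    if os.path.exists(cache_path):
        # Fast path: load the matrices saved on a previous run
        with np.load(cache_path) as npz:
            return {name: npz[name] for name in npz.files}
    # Slow path: parse the op4 file, then cache the result for next time
    mats = slow_reader(op4_path)
    np.savez(cache_path, **mats)
    return mats
```

The second and later loads then pay only the cost of a raw binary read, which is where most of the speedup comes from.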

Thanks for your input! It gives me the impetus to look into this again, with more urgency. Before doing anything major, I'll run a profiler to see where the biggest bottlenecks are.

Do you have any other ideas on how to speed it up?

Thank you for your response.
I've seen an article saying that when pyNastran reads op2 files, it reaches speeds of up to 500 MB/s (when using an SSD). I think it is worth looking into.
I also found that NumPy is much faster than struct.unpack. See the link below:
https://stackoverflow.com/questions/54679949/unpacking-binary-file-using-struct-unpack-vs-np-frombuffer-vs-np-ndarray-vs-np-f
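For reference, here is a tiny comparison showing that the two approaches decode identical values. On large buffers `np.frombuffer` is typically much faster because it produces a zero-copy array view instead of one Python float object per value (exact timings will vary by machine, so none are claimed here):

```python
import struct
import numpy as np

# A buffer of one million little-endian float64 values
data = np.arange(1_000_000, dtype="<f8").tobytes()

# struct.unpack: builds a Python float for every value
vals_struct = struct.unpack(f"<{len(data) // 8}d", data)

# np.frombuffer: zero-copy view onto the same bytes
vals_np = np.frombuffer(data, dtype="<f8")

assert np.array_equal(np.array(vals_struct), vals_np)
```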

@twmacro Hi. Any updates on this?

Hi @xper0418, sorry for the delay in responding. I have spent some time on this, mainly profiling to see the bottlenecks, but I also experimented some with different ideas. I tried numba, numexpr, threading, using different buffer sizes and other stuff I'm sure. Unfortunately, I don't have any quick fixes for this. Output4 files are not the most efficient format to read/write. My conclusion so far is that these routines might have to be written in C to get significantly better performance (which I can't see myself doing anytime soon). I still have a couple experiments I plan to try, but I don't have high hopes.

From the profiling, I concluded that these two lines inside the loop take the lion's share of the time:

```python
Y = np.fromfile(fp, numform2, nwords)  # read nwords values from the file
X[r : r + len(Y), c] = Y               # copy them into column c of X
```

Is there a simple speedup for that code, I wonder? I think it would speed things up a bit if np.fromfile could store the values directly into X, but I don't know if that's easily doable.
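One possible trick (just a sketch, not tested against the real op4 reader): if X is Fortran-ordered, the slice `X[r : r + n, c]` is a contiguous 1-D view, so the file object's `readinto()` can decode straight into X and skip the intermediate Y array entirely. Here the file is faked with `io.BytesIO`:

```python
import io
import numpy as np

nrows, ncols = 5, 3
X = np.zeros((nrows, ncols), order="F")  # Fortran order: columns are contiguous

# Fake "file" holding 4 float64 words destined for column 1, starting at row 0
words = np.array([1.0, 2.0, 3.0, 4.0])
fp = io.BytesIO(words.tobytes())

r, c, nwords = 0, 1, 4
view = X[r : r + nwords, c]          # contiguous 1-D view because X is F-ordered
assert view.flags["C_CONTIGUOUS"]    # readinto() requires a contiguous buffer
nbytes = fp.readinto(view)           # fill X in place, no intermediate Y copy

assert nbytes == words.nbytes
assert np.array_equal(X[:, c], [1.0, 2.0, 3.0, 4.0, 0.0])
```

Whether this actually helps would depend on how large the typical `nwords` chunks are; for tiny chunks the per-call overhead may dominate either way.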

Hey there @twmacro, I read the above and I'm wondering: I don't think the np.fromfile() call can be sped up without a drastic rewrite, but regarding the setting of data in X with the Y values, is that taking a significant portion of the total run time? What order is X, "C" or "F"? If the indexing itself is taking a while, it might be improved by changing the order of X. Worth a try?

Or, I wonder if storing a big list of 1D vectors then stacking at the very end could possibly be faster than repeatedly indexing the 2D array. Unlikely, but just a thought.
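That idea would look something like the sketch below. Whether it beats preallocation depends on matrix sizes, but `np.column_stack` does one big copy at the end instead of many small indexed writes (the toy sizes here are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
nrows, ncols = 6, 4
cols = [rng.standard_normal(nrows) for _ in range(ncols)]

# Variant 1: preallocate and assign column by column
# (roughly what the reader's inner loop does now)
X = np.empty((nrows, ncols), order="F")
for c, y in enumerate(cols):
    X[:, c] = y

# Variant 2: collect 1-D vectors in a list, stack once at the end
X2 = np.column_stack(cols)

assert np.array_equal(X, X2)
```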

Hello @jeremypriest! Excellent thoughts! It's been quite a long time, but I recall switching from "C" to "F" order on these matrices to enhance speed. It makes sense to me that "F" would be faster (since that matches the order in the .op4 file), but I haven't experimented recently with this. I also like your other idea of using 1D vectors. I'll add these ideas to the "to-do" list! :)
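For the curious, the order question comes down to memory layout: with order="F" a column write touches one contiguous block of memory, while with order="C" it strides across rows. A quick way to see the two layouts agree on values while differing in strides (no timings claimed, since those depend on the machine):

```python
import numpy as np

nrows, ncols = 1000, 200
col = np.arange(nrows, dtype=float)

Xc = np.empty((nrows, ncols), order="C")
Xf = np.empty((nrows, ncols), order="F")
Xc[:, 0] = col
Xf[:, 0] = col

# In F order, consecutive rows of one column are 8 bytes apart (contiguous);
# in C order they are ncols * 8 bytes apart, so filling a column hops through
# memory, which is typically slower for column-by-column readers.
assert Xf[:, 0].strides == (8,)
assert Xc[:, 0].strides == (ncols * 8,)
assert np.array_equal(Xc[:, 0], Xf[:, 0])
```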

Hey Tim - I was wondering if the pyNastran package reads op4 files any faster than pyyeti. I use pyyeti only, but I notice that pyNastran has (limited, unlike pyyeti) support for op4 reading. But if it does read certain files faster, it's open source and may provide some insight into how they do it.

Following up on my comment above, I did some benchmarking and found that pyYeti is, in my opinion, pretty fast at loading large matrices from op4 files (tested using a 1.1 GB binary file).

It looks like assembling the matrices as sparse generally causes a large performance hit, but otherwise, I think pyYeti is competitive in op4 load speeds with pyNastran.

Outside of porting the Python code used to read the op4 data over to something more performant and low-level, I don't think there's too much to gain here.

I would support closing this issue, or renaming it to something like "TODO: port op4 reading to XYZ language" if @twmacro prefers that.

Here is the data I collected:

| Module | Sparse setting | Speed (sec) |
| --- | --- | --- |
| pyYeti | `sparse=False` | 2 |
| pyYeti | `sparse=None` | 58 |
| pyNastran | Not changeable, but similar to `sparse=None` in pyYeti | 55 |

Thank you so much for your input @jeremypriest! I'll close this issue now since I don't have any good way that I know of to speed this up. I will however keep an open mind for any good ideas and perhaps experiment from time to time.