govert/SQLiteForExcel

far slower reading data from cursor in Office 365 due to character encoding

Opened this issue · 12 comments

My company recently updated to Office 365 and all my tools got vastly slower using this. Specifically, running the DLL API functions to step through the cursor, read the data type, and convert the values is many times slower. It's slightly less noticeable when the data is only numeric, but I believe it's still slower. I believe this problem is related to the text encoding.

IOW, the SQLite db itself works just as fast, but when a query returns a lot of data, it takes a lot longer to read it into an array while stepping through the cursor (so to speak, using the step function). I'm not sure what is going on. I haven't found a fix yet, but it's sad because you can write such amazing solutions with this library. I read all Excel tables into an in-memory database automatically as they're added - you'd never even know you basically have an Excel db in RAM - and I wrote my own SQL editor with VBA forms.
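To give an idea of the pattern, the loop is roughly like this (a simplified sketch, not my exact code - it assumes the wrapper functions from the SQLite3 module such as SQLite3PrepareV2, SQLite3Step and SQLite3ColumnText, and a VBA7/LongPtr build):

```vb
' Simplified sketch of the read loop - assumes the SQLite3 module's
' wrapper functions and constants, and a VBA7 / LongPtr build.
Public Function FetchRows(ByVal db As LongPtr, ByVal sql As String) As Collection
    Dim stmt As LongPtr
    Dim nCols As Long, col As Long
    Dim rec() As Variant
    Dim rows As New Collection

    If SQLite3PrepareV2(db, sql, stmt) <> SQLITE_OK Then Exit Function
    nCols = SQLite3ColumnCount(stmt)

    ' Step through the cursor one row at a time
    Do While SQLite3Step(stmt) = SQLITE_ROW
        ReDim rec(0 To nCols - 1)
        For col = 0 To nCols - 1
            ' Check the column type, then convert; the text case is where
            ' the UTF-8 to VBA string conversion happens
            Select Case SQLite3ColumnType(stmt, col)
                Case SQLITE_INTEGER: rec(col) = SQLite3ColumnInt32(stmt, col)
                Case SQLITE_FLOAT:   rec(col) = SQLite3ColumnDouble(stmt, col)
                Case SQLITE_TEXT:    rec(col) = SQLite3ColumnText(stmt, col)
                Case Else:           rec(col) = Null
            End Select
        Next col
        rows.Add rec
    Loop

    SQLite3Finalize stmt
    Set FetchRows = rows
End Function
```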

I think the part that is slower in newer Excel versions is some of the COM object model interactions with Excel. So if you are reading or writing cell-by-cell, that might be slower than before, and some interactions have become much slower. See this answer to a similar question Excel-DNA/ExcelDna#281 (comment) where I said:

There were quite a few performance-related issues in the Excel 2010 -> Excel 2013 time frame, and I think you'll find some COM interop scenarios became much slower.

So I don't think this is related to SQLiteForExcel directly, or to the VBA side. Maybe you can do some tests that make no use of the Excel COM object model, just VBA code. I'm also not sure whether debug output in VBA has consistent performance across versions.
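For example, something like this would time just the fetch-and-convert work without touching the Excel object model (a sketch only - FetchRows here is a placeholder for whatever routine steps the cursor into a Collection or array, and the Debug.Print happens outside the timed section):

```vb
' Sketch of a timing test that never touches the Excel object model.
' FetchRows is a placeholder for the routine that steps the cursor into
' a Collection; the report is printed only after the timed work is done.
Public Sub TimeFetchWithoutCom(ByVal db As LongPtr, ByVal sql As String)
    Dim t0 As Single
    Dim elapsed As Single
    Dim rows As Collection

    t0 = Timer
    Set rows = FetchRows(db, sql)      ' pure VBA + sqlite3.dll, no COM calls
    elapsed = Timer - t0

    Debug.Print "Rows: " & rows.Count & ", fetch time: " & _
                Format$(elapsed, "0.000") & " s"
End Sub
```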

Have you perhaps moved from 32-bit to 64-bit?
Maybe that accounts for such a difference.

I don't know anything about the character encoding in the SQLite db - this SQLiteForExcel code assumes it's always UTF-8.

I can't really believe the VBA side is the slow part of the call - it's just converting the UTF-8 bytes into a VBA string and storing it into an array. And the SQLite side didn't change at all, so it doesn't really make sense to me.

Storing the data as UTF-16 might bloat your SQLite DB, so it's not obvious that there would be much of a performance improvement.

The only Windows API function called for text columns is MultiByteToWideChar, and I haven't yet found anything indicating that it might have changed recently.
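For context, the conversion is roughly this shape - one call to measure the required length and one call to convert (a sketch, not the exact module code; it assumes a VBA7/PtrSafe build):

```vb
Private Declare PtrSafe Function MultiByteToWideChar Lib "kernel32" ( _
    ByVal CodePage As Long, ByVal dwFlags As Long, _
    ByVal lpMultiByteStr As LongPtr, ByVal cbMultiByte As Long, _
    ByVal lpWideCharStr As LongPtr, ByVal cchWideChar As Long) As Long

Private Const CP_UTF8 As Long = 65001

' Roughly what the UTF-8 to VBA string conversion has to do:
' one API call to measure, one API call to convert.
Public Function Utf8PtrToVbaString(ByVal pUtf8 As LongPtr, ByVal lenBytes As Long) As String
    Dim lenChars As Long
    Dim buffer As String
    If lenBytes = 0 Then Exit Function
    ' First call returns the number of UTF-16 characters needed
    lenChars = MultiByteToWideChar(CP_UTF8, 0, pUtf8, lenBytes, 0, 0)
    buffer = String$(lenChars, vbNullChar)
    ' Second call writes directly into the VBA string's buffer
    MultiByteToWideChar CP_UTF8, 0, pUtf8, lenBytes, StrPtr(buffer), lenChars
    Utf8PtrToVbaString = buffer
End Function
```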

You might try to test the performance of the StringToUtf8Bytes and Utf8PtrToString pair to get some sense of how fast the text encoding and decoding is. I'd suspect it is super fast compared to getting the bytes from SQLite.
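A round-trip timing test could look something like this (just a sketch - it assumes StringToUtf8Bytes returns a Byte array and Utf8PtrToString takes a pointer plus a byte length, so adjust it to match the signatures in your copy of the module):

```vb
' Sketch of a round-trip timing test for the module's text helpers.
Public Sub TimeUtf8RoundTrip()
    Const LOOPS As Long = 100000
    Dim sample As String, back As String
    Dim bytes() As Byte
    Dim i As Long, t0 As Single

    sample = String$(50, "x") & ChrW$(233)   ' mostly ASCII plus one accented char
    t0 = Timer
    For i = 1 To LOOPS
        bytes = StringToUtf8Bytes(sample)
        back = Utf8PtrToString(VarPtr(bytes(0)), UBound(bytes) - LBound(bytes) + 1)
    Next i
    Debug.Print LOOPS & " round trips in " & Format$(Timer - t0, "0.000") & " s"
End Sub
```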

Thank you @gburns53 for posting back - that's very interesting and surprising.

The AMSI stuff is interesting - I had not heard of this before.

Are you suggesting the two calls to the Win32 API MultiByteToWideChar for every string are where the slowdown happens? I guess you could check this by making your own implementation of Utf8PtrToString which just copies the UTF-8 bytes into every second byte of the target string. That would not be correct for strings containing non-ANSI characters, but it would be fast. You could even check whether the byte sequence contains an interesting Unicode character, and only call the Windows API in that case - or implement the whole thing in VBA.
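Such a fast path might look roughly like this (just a sketch, assuming a VBA7 build; it falls back to the module's Utf8PtrToString for anything that isn't plain ASCII, and the CopyMemory declare is added here for the example):

```vb
Private Declare PtrSafe Sub CopyMemory Lib "kernel32" Alias "RtlMoveMemory" ( _
    ByRef Destination As Any, ByRef Source As Any, ByVal Length As LongPtr)

' Sketch of the ASCII fast path: if every byte is below 128 the UTF-8 data
' is plain ASCII, so each byte can be widened directly in VBA; otherwise
' fall back to the normal MultiByteToWideChar-based conversion.
Public Function Utf8PtrToStringFast(ByVal pUtf8 As LongPtr, ByVal lenBytes As Long) As String
    Dim bytes() As Byte
    Dim utf16() As Byte
    Dim i As Long

    If lenBytes = 0 Then Exit Function
    ReDim bytes(0 To lenBytes - 1)
    CopyMemory bytes(0), ByVal pUtf8, lenBytes

    ' Any byte >= 128 marks a multi-byte UTF-8 sequence: take the slow, correct path
    For i = 0 To lenBytes - 1
        If bytes(i) > 127 Then
            Utf8PtrToStringFast = Utf8PtrToString(pUtf8, lenBytes)
            Exit Function
        End If
    Next i

    ' Pure ASCII: copy each byte into every second byte of a UTF-16 buffer
    ReDim utf16(0 To lenBytes * 2 - 1)       ' high bytes stay 0
    For i = 0 To lenBytes - 1
        utf16(i * 2) = bytes(i)
    Next i
    Utf8PtrToStringFast = utf16              ' byte array assigns as a UTF-16 string
End Function
```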