Embedded usage of muparser size vs speed
Closed this issue · 8 comments
Hello!
I have an important request/question regarding the use of muparser in an embedded environment. I am making a datalogger with built-in conversion and storage of data. Muparser is awesome and the best solution I have found so far. I really like that I can specify my own functions to control my device internally.
However, because of the number of data types and sources, I ran into a problem: it is either speed or size.
If I use a single globally defined muparser, then after each evaluation I need to clear the values (and define the variables and call SetExpr again every time). This takes time, usually from 5ms up to 25ms on an ESP32. Considering that I run about 15 different expressions and they are triggered at different times (because of how data comes back in callbacks), this isn't ideal.
Another option was to create separate muparser instances, and then evaluation is basically instant (below 1ms). The problem is that a muparser instance itself takes around 324 bytes (and then adds a lot on top once the maps and other bits start working), which isn't great in the embedded world and is causing some memory issues.
My question is then how to solve this?
Is it possible to "extract" just the configuration for a given expression (the bytecode), store it separately and load it again later?
Or are there flags that deactivate features I don't need?
All parsers are configured exactly the same, using the same defined functions. The only things that differ are the expressions and the pointers to the actual values, so I bet it is possible to "merge" them somehow, for example by using some static members?
Thanks!
Not sure about this. You have a GetByteCode function that will return the bytecode from the parser. You do not have a function to bring it back in but that should be fairly easy to add.
I guess you could set up 15 different equations and for each one you could save the bytecode. As long as the referenced variables remain valid the bytecode should remain valid. Once a computation request arrives you could look up the proper version of the bytecode and restore it.
I may be overlooking something but i would simply add a SetByteCode function that reinserts a previously extracted bytecode. You will also have to set "m_pParseFormula = &ParserBase::ParseCmdCode;".
Check the master branch and look at the SetByteCode function and example1.cpp lines 478-484 for saving bytecode and lines 503-512 for restoring bytecode.
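The save-and-restore idea described above can be sketched with a toy engine. This is only an illustration under stated assumptions: `ToyEngine`, `Program` and `ProgramCache` are hypothetical stand-ins and not muparser types, `program()` and `load()` merely play the roles of GetByteCode and SetByteCode, and "compiling" here just turns characters into numbers.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a parser's bytecode: a flat list of opcodes.
using Program = std::vector<int>;

// Hypothetical stand-in for a parser that compiles once and then evaluates.
class ToyEngine {
public:
    // The "expensive" step: one opcode per character of the expression.
    void compile(const std::string& expr) {
        ++compile_count_;
        current_.assign(expr.begin(), expr.end());
    }
    // Plays the role of GetByteCode: a reference to the internal program.
    const Program& program() const { return current_; }
    // Plays the role of SetByteCode: reinstall a previously saved program.
    void load(const Program& p) { current_ = p; }
    // Toy evaluation: the sum of all opcodes.
    int run() const {
        int sum = 0;
        for (int op : current_) sum += op;
        return sum;
    }
    int compile_count() const { return compile_count_; }

private:
    Program current_;
    int compile_count_ = 0;
};

// One shared engine plus one saved program per expression id.
class ProgramCache {
public:
    explicit ProgramCache(ToyEngine& engine) : engine_(engine) {}

    int eval(int id, const std::string& expr) {
        auto it = cache_.find(id);
        if (it == cache_.end()) {
            engine_.compile(expr);                      // first use: compile
            cache_.emplace(id, engine_.program());      // store a deep copy
        } else {
            engine_.load(it->second);                   // later uses: restore
        }
        return engine_.run();
    }

private:
    ToyEngine& engine_;
    std::map<int, Program> cache_;
};
```

With this layout each expression is compiled only on first use. The trade-off is that every evaluation copies a saved program back into the engine, so the gain depends on how cheap that copy is compared to a full re-parse.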
You are gold man!
I did some tests trying to optimise it (on the older 2.3.2 version that I have used for a long time): I set float as the value type and made the following members static:
funmap_type ParserBase::m_FunDef;
funmap_type ParserBase::m_PostOprtDef;
funmap_type ParserBase::m_InfixOprtDef;
funmap_type ParserBase::m_OprtDef;
valmap_type ParserBase::m_ConstDef;
strmap_type ParserBase::m_StrVarDef;
string_type ParserBase::m_sNameChars;
string_type ParserBase::m_sOprtChars;
string_type ParserBase::m_sInfixOprtChars;
Of course after properly checking everything. It worked well and parser size was reduced, meaning I basically got what I wanted and it worked well.
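To show why moving such members to static storage shrinks each instance, here is a minimal sketch. The two structs are hypothetical and much simpler than ParserBase; they only assume a function map and a character-set string as the shared configuration and one pointer as the per-instance state.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical function table type, loosely modelled on a name -> callback map.
using FunMap = std::map<std::string, double (*)(double)>;

// Every instance carries its own copy of the configuration.
struct PerInstanceParser {
    FunMap functions;
    std::string name_chars;
    double* variable = nullptr;   // the only truly per-instance state
};

// The configuration lives once in static storage and is shared by all instances.
struct SharedConfigParser {
    static FunMap functions;
    static std::string name_chars;
    double* variable = nullptr;
};

// Out-of-class definitions of the shared members (one copy for the program).
FunMap SharedConfigParser::functions;
std::string SharedConfigParser::name_chars;

// Static members do not count towards sizeof, so only the pointer remains.
constexpr std::size_t kPerInstanceSize = sizeof(PerInstanceParser);
constexpr std::size_t kSharedSize = sizeof(SharedConfigParser);
```

The cost of this design is that all instances see the same configuration, so any change to the shared maps affects every parser and has to be synchronised if several tasks use them.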
Your approach seems much better, but the problem is that I get random errors (random in that they happen at different expressions).
Access to the parser and the values is protected with a mutex too (including during init).
Anyway, here is the backtrace. I will try to figure this out; if I can't, I will go back to my original method.
[ 5478][E][ExpressionFunctions.hpp:258] updateVal(): Expr: 2.5*(filter("Analog 1", 10)-0.5)
abort() was called at PC 0x401bfc57 on core 1
Backtrace: 0x40083d15:0x3fff8c80 0x4008f181:0x3fff8ca0 0x40095295:0x3fff8cc0 0x401bfc57:0x3fff8d40 0x401bfc9e:0x3fff8d60 0x401bfbff:0x3fff8d80 0x401181af:0x3fff8da0 0x40118dd2:0x3fff8dc0 0x40119521:0x3fff8f10 0x40203ff9:0x3fff8f30 0x400dbf09:0x3fff8f50 0x400ee5b1:0x3fff8f70
#0 0x40083d15:0x3fff8c80 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:408
#1 0x4008f181:0x3fff8ca0 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:137
#2 0x40095295:0x3fff8cc0 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
#3 0x401bfc57:0x3fff8d40 in __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x401bfc9e:0x3fff8d60 in std::terminate() at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x401bfbff:0x3fff8d80 in __cxa_throw at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:95
#6 0x401181af:0x3fff8da0 in mu::ParserBase::Error(mu::EErrorCodes, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const at lib/muparser/src/muParserBase.cpp:1573 (discriminator 4)
#7 0x40118dd2:0x3fff8dc0 in mu::ParserBase::ParseCmdCodeBulk(int, int) const at lib/muparser/src/muParserBase.cpp:1221 (discriminator 3)
#8 0x40119521:0x3fff8f10 in mu::ParserBase::ParseCmdCode() const at lib/muparser/src/muParserBase.cpp:1044
#9 0x40203ff9:0x3fff8f30 in mu::ParserBase::Eval() const at lib/muparser/src/muParserBase.cpp:1826 (discriminator 4)
#10 0x400dbf09:0x3fff8f50 in ExpressionFunc::updateVal(float&, bool) at src/processing/expressions/ExpressionFunctions.hpp:292
#11 0x400ee5b1:0x3fff8f70 in DataEntryNumeric::updateExtra(float&, bool, bool) at src/processing/DataEntryNumeric.hpp:292
(inlined by) processTask(void*) at src/Gauge-s.hpp:368
PS: Is there any way to donate you for this awesome library?
PS2: I assume this type of assignment is ok?
p.SetExpr(expression);
p.Eval();
bc = p.GetByteCode();
PS3: I think I found the culprit.
On each ReInit, m_vStringBuf is getting cleared. The same goes for the token reader, although I am not sure whether that would cause any issues.
I guess we need to get m_vStringBuf together with the TokenReader for it to work? I think only string formulas are affected, which are part of the expressions in my case.
Let me think about how I can get around it on my end, especially since I cannot ask users to change this approach for new stuff, and it's really useful to have a larger number of functions like this.
No need for donations but i appreciate the offer.
This line is most likely not ok:
bc = p.GetByteCode();
The reason is that GetByteCode is returning a const reference to the internally held bytecode. This will always return the same reference since the bytecode is reused and never recreated.
You have to create a deep copy of the bytecode. This is done either by creating a new bytecode object and initializing it with the parser bytecode:
ParserByteCode bytecode1(parser.GetByteCode());
or by using the Assign function on an existing object:
bytecode1.Assign(p.GetByteCode())
The reason is that i chose not to change the existing API of GetByteCode, which should rather return a deep copy of the bytecode as a new object and not a reference to the internally used object. However, if i change this i will break existing programs. I haven't made up my mind regarding the final API; i may still decide to change it.
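The difference between holding the returned reference and holding a deep copy can be shown with a small self-contained sketch. `ToyParser` and `Code` are hypothetical stand-ins, not muparser classes. Note that in C++ whether `bc = p.GetByteCode();` copies depends on how `bc` is declared: assigning to an object copies, while binding a `const` reference only aliases the internal buffer.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for bytecode.
using Code = std::vector<int>;

// Hypothetical parser that reuses one internal buffer for every expression.
class ToyParser {
public:
    void compile(const std::string& expr) {
        code_.assign(expr.begin(), expr.end());
    }
    // Returns a reference to the internal buffer, never a fresh object.
    const Code& code() const { return code_; }

private:
    Code code_;
};

struct AliasVsCopy {
    std::size_t alias_size;  // what the reference sees after a recompile
    std::size_t copy_size;   // what the deep copy still holds
};

inline AliasVsCopy demo_alias_vs_copy() {
    ToyParser parser;
    parser.compile("abc");

    const Code& alias = parser.code();  // aliases the parser's buffer
    Code copy = parser.code();          // independent deep copy

    parser.compile("a");                // the buffer is overwritten in place

    return {alias.size(), copy.size()};
}
```

In this sketch the alias follows the parser to the new one-element program, while the copy keeps the original three elements.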
PS3: I think I found the culprit. On each ReInit, m_vStringBuf is getting cleared. The same goes for the token reader, although I am not sure whether that would cause any issues. I guess we need to get m_vStringBuf together with the TokenReader for it to work? I think only string formulas are affected, which are part of the expressions in my case. Let me think about how I can get around it on my end, especially since I cannot ask users to change this approach for new stuff, and it's really useful to have a larger number of functions like this.
@beltoforion thanks, but .Assign() didn't help (also, ParserByteCode has an = operator which basically does the same thing), so nothing was gained from this and I still believe the above to be the culprit.
Any ideas how to get around it? I could potentially do a manual "extraction" of a parameter such as \"Analog 4\", but then should I keep it as a std::string and just define a var with a pointer to it? I think that still doesn't solve the problem if a user wants to use two different string expressions, because I would have to create separate string vectors or something like that.
might work now.
Don't mind those. I am not good at using GitHub..
Anyway.. It's working now! It's really awesome, and I bet some people would see performance gains (although on a PC I guess you can just create separate parsers and be good to go). I really like how it works now.
I gained a little in terms of memory allocations (heap fragmentation), which is important. Memory usage is similar to my static approach, but this one looks much better and is easier to manage.
#0 0x40083d15:0x3fff7b00 in panic_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/panic.c:408
#1 0x4008f181:0x3fff7b20 in esp_system_abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/esp_system/esp_system.c:137
#2 0x40095295:0x3fff7b40 in abort at /Users/ficeto/Desktop/ESP32/ESP32S2/esp-idf-public/components/newlib/abort.c:46
#3 0x401c02ff:0x3fff7bc0 in __cxxabiv1::__terminate(void (*)()) at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:47
#4 0x401c0346:0x3fff7be0 in std::terminate() at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_terminate.cc:57
#5 0x401c02f9:0x3fff7c00 in __cxa_rethrow at /builds/idf/crosstool-NG/.build/HOST-x86_64-w64-mingw32/xtensa-esp32-elf/src/gcc/libstdc++-v3/libsupc++/eh_throw.cc:133
#6 0x4011f051:0x3fff7c20 in std::_Deque_base<int, std::allocator<int> >::_M_initialize_map(unsigned int) at c:\users\sorek\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_deque.h:708
#7 0x401266f1:0x3fff7c50 in std::_Deque_base<int, std::allocator<int> >::_Deque_base() at c:\users\sorek\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_deque.h:493
(inlined by) std::deque<int, std::allocator<int> >::deque() at c:\users\sorek\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_deque.h:898
(inlined by) std::stack<int, std::deque<int, std::allocator<int> > >::stack<std::deque<int, std::allocator<int> >, void>() at c:\users\sorek\.platformio\packages\toolchain-xtensa-esp32@8.4.0+2021r2-patch5\xtensa-esp32-elf\include\c++\8.4.0\bits/stl_stack.h:149
(inlined by) mu::ParserTokenReader::ReInit() at lib/muparser/src/muParserTokenReader.cpp:256
#8 0x401268be:0x3fff7d00 in mu::ParserTokenReader::SetFormula(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) at lib/muparser/src/muParserTokenReader.cpp:226
#9 0x4011c17a:0x3fff7d20 in mu::ParserBase::SetByteCode(mu::ParserByteCode const&) at lib/muparser/src/muParserBase.cpp:285
#10 0x400dbfe5:0x3fff7db0 in ExpressionFunc::updateVal(float&, bool) at src/processing/expressions/ExpressionFunctions.hpp:290
#11 0x400de8e1:0x3fff7dd0 in DataEntryNumeric::updateExtra(float&, bool, bool) at src/processing/DataEntryNumeric.hpp:292
(inlined by) DataEntryNumeric::addVal(float, bool) at src/processing/DataEntryNumeric.hpp:311
#12 0x400ebe97:0x3fff7e00 in DataEntryCAN::addValCAN(CAN_frame_t&) at src/processing/DataEntryCAN.hpp:286
(inlined by) DataEntryCanBase::processCAN(CAN_frame_t&) at src/processing/DataEntryCAN.hpp:303
#13 0x400ebf94:0x3fff7e50 in DataEntryCanBase::receiveCAN() at src/processing/DataEntryCAN.hpp:95
(inlined by) DataProcessor::receiveCAN() at src/processing/DataProcessor.hpp:213
(inlined by) canTask(void*) at src/Gauge-s.hpp:318
I am still getting problems with the heap, though, since there are lots of allocs and deallocs. I would have asked whether you have any ideas about using some static allocations (especially for the stack), but I don't want to drag you into it.
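The backtrace above ends in the construction of a std::stack<int> (backed by std::deque) inside ParserTokenReader::ReInit, which allocates from the heap. One common way to avoid that kind of allocation is a fixed-capacity stack with inline storage. The following is only a sketch of that general technique: `FixedStack` is a hypothetical helper, not part of muparser, and the capacity has to be chosen by the user for the deepest expected nesting.

```cpp
#include <array>
#include <cstddef>

// Fixed-capacity stack with inline storage: no heap allocation at any point.
template <typename T, std::size_t Capacity>
class FixedStack {
public:
    // Returns false instead of allocating when the stack is full.
    bool push(const T& value) {
        if (size_ == Capacity) return false;
        data_[size_++] = value;
        return true;
    }
    // Returns false when the stack is empty; otherwise writes the top to out.
    bool pop(T& out) {
        if (size_ == 0) return false;
        out = data_[--size_];
        return true;
    }
    // Resetting is just a counter write, so a re-init cannot fragment the heap.
    void clear() { size_ = 0; }
    std::size_t size() const { return size_; }
    bool empty() const { return size_ == 0; }

private:
    std::array<T, Capacity> data_{};
    std::size_t size_ = 0;
};
```

The design choice is a hard upper bound instead of dynamic growth: a push beyond the capacity is reported to the caller, which then has to treat it as an error.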
You helped me a lot. Thank you for all of that!
Just in case you are interested, I tried both methods, yours and mine (static allocation of some maps and strings). You can look it up on my fork in static-old.
All values are in microseconds (unless otherwise stated). There are two expressions on the CAN bus. From the process task we only care about heap usage and "Time process expr".
Here is your approach:
receiveCAN(): Can id 0x329 | Len 8 | Hz 201.13 | Took (us): 165 | Waiting (ms): 2037 | Queue 0.00000
|----processTask-----:------------|
| Time accel + analog: 1216       |
| Time GPS           : 387        |
| Time SD Logging    : 42         |
| Time add new data  : 55         |
| Time process expr  : 346        |
| Time task delayed  : 7924       |
| Time task took time: 2049       |
| Time free heap (KB): 65.27      |
|--------------------:------------|
Here is static alloc:
receiveCAN(): Can id 0x329 | Len 8 | Hz 200.54 | Took (us): 35 | Waiting (ms): 2357 | Queue 0.00000
|----processTask-----:------------|
| Time accel + analog: 1039       |
| Time GPS           : 521        |
| Time SD Logging    : 38         |
| Time add new data  : 43         |
| Time process expr  : 136        |
| Time task delayed  : 8196       |
| Time task took time: 1780       |
| Time free heap (KB): 57.00      |
|--------------------:------------|
Note that with a single parser we need mutexes to protect the data, which I think isn't ideal either. For systems with more memory, creating more parsers is easy, but I still think a "single init" of the functions followed by defining the vars looks better? Even so, the static muparser now takes only 108 bytes per instance (as opposed to 360 bytes), while after the change ParserByteCode grew from 24 to 60, which basically means it makes no sense to use it now. I believe the constant re-initialising and the heap allocations on swaps are making it much slower.
I would consider doing something like I did: a #define that enables static allocation of those common parser members, but that's totally up to you. I can keep it in my fork like that, and you really don't have to worry. I will work on further optimisations of this.
Thank you for your work and help. Your code is really inspiring to me! I learned a lot.