Tclx profile support for 8.6
Opened this issue · 14 comments
Draft implementation in AndroWish (and undroidwish), see this check-in
https://www.androwish.org/index.html/info/c29648d90f7440aa
Somewhat tested on Android x86 and Linux x86 where it didn't crash anymore.
Further simplified the changes for Tcl 8.6 in check-in
This is the patch for tclXprofile.c, no Tcl core changes seem necessary
https://www.androwish.org/index.html/fdiff?v1=a6f5cf9f8b21bf4d&v2=eaeaabfa2e4ac909&patch
Stupid me, I was on a wrong old TclX checkout. Seems some of the work was done 2 years ago.
Following the ticket http://core.tcl.tk/tcl/tktview/cd82cec7ce46a55af099 I now understand the problem with NRE better. So execution traces are the only way to perform proper profiling with 8.6 and it looks not impossible to implement the entire TclX profile facility in Tcl only.
An unoptimal draft implementation for TclX's profile command using "trace add|remove execution ..." internally is in
https://www.androwish.org/index.html/info/c6070d8891e3d4a0
It is rudimentary tested with a somewhat fused profile.test and may still have memory leaks. As I have no more time to spend on this topic, if anybody finds it useful, it could serve as a starting point.
Thanks for the effort. The bounty is still up for grabs.
I ran valgrind full leak check on a small example and did not see any leaks.
I also noticed that profiling is 2 orders of magnitude slower than un-profiled code. I don't think that's really usable. I've never used Tclx profile before, so I don't know if the old mechanism had less overhead.
By Tcl_Obj'fying parts of TraceExecutionProc() of tclTrace.c the overhead can be reduced to about 60% which is still slow compared to unprofiled execution, see attached patch. But this is a core change and as usual unfinished, untested, and untipped ;-)
Here's the slightly improved patch:
tclTrace.diff.txt
And this script (including some recorded output) was used to get an idea on profile's timing impact:
perftest.tcl.txt
Conclusion: profiled execution is 100 to 250 times slower on 8.6.6 (core-8-6-branch).
For some reason unknown to me, the Tcl_FindCommand() call does not find the "yield" when called from the Tclx profiler. If the profile code ignores/works-around this error it appears to collect profile data for coroutines without crashing.
Here is a patch to ignore the issue with "yield" as proof. It should do until the issue with Tcl_FindCommand() is understood and/or fixed.
profile_yield.patch.txt
I don't think the Tcl_Obj-ing of the Tcl trace code requires a TIP. A Core maintainer should be able to preview and apply the patch.
The "crash" is fixed and the profile results seem reasonable. I think the next step is for FA to try the profiler on some real code to see if the performance and results are expectable.
Regarding the "yield" effect/crash please would you post a test case? I've tried this snippet (w/o the yield patch) and had no crashes (on 32bit i386 with core-8-6-branch plus Tcl_Obj'fication):
package require Tclx
profile on
proc changed_c known_val {
set new_val [ yield ]
while 1 {
if { $new_val != $known_val } {
set known_val $new_val
set new_val [ yield 1 ]
} else {
set new_val [ yield 0 ]
}
}
}
coroutine changed changed_c 0 ;# initial state == 0
puts [ changed 1 ]; # new state 1 ( a change 0 -> 1 )
puts [ changed 1 ]; # state 1 ( no change 1 -> 1 )
puts [ changed 1 ]; # state 1 ( no change 1 -> 1 )
puts [ changed 2 ]; # new state 2 ( change 1 -> 2 )
puts [ changed 1 ]; # new state 1 ( change 2 -> 1 )
puts [ changed 3 ]; # new state 3 ( change 1 -> 3 )
puts [ changed 1 ]; # new state 1 ( change 3 -> 1 )
puts [ changed 1 ]; # state 1 ( no change 1 -> 1 )
profile off PROF
profrep PROF calls
profrep PROF real
Modify coroutine.test to load package Tclx and turn profile on.
I see this failure:
==== coroutine-7.6 Early yield crashes FAILED
==== Contents of test case:
proc foo args {}
trace add execution foo enter {catch yield}
coroutine demo foo
rename foo {}
---- Test generated error; Return code was: 1
---- Return code should have been one of: 0 2
---- errorInfo: invalid command name "catch yield"
while executing
"{catch yield} foo enter"
(enter trace on "foo")
invoked from within
"coroutine demo foo"
("uplevel" body line 4)
invoked from within
"uplevel 1 $script"
---- errorCode: TCL LOOKUP COMMAND {catch yield}
==== coroutine-7.6 FAILED
Test file error: TclX profile bug id = 1000
This panic only occurs in tcltest and is due to Tcl_FindCommand returning NULL.
I may have gone off the deep end with this. It looks like the Tcl_Obj-ifying of the TclTrace code is causing failures and a hang in the test suite. I'll work on getting a complete list of failures and post the results here.
There are 11 failures, including event-5.3 which hangs so it has to be skipped in order to run the full suite.
trace_patch_failed_tests.txt
Looking into the problem I see that the patch is trying to build a list for the callback command. The basic problem is that although a list is a valid command, a command is not necessarily a valid list. Many of the failing cases have callback commands that are not valid lists, or when made into a list, end up not being the intended command.
I conclude that the Tcl_Obj-ification of the tclTrace code is not possible. It must be a string that is subsequently parsed by Tcl.
An alternative would be to add profiling directly into the core. The trace infrastructure could be used without having to bother with callbacks.
Apparently, the first version of Tcl_Obj'fication wasn't a big hit. Here's the second version which uses an array of parsed Tcl_Obj's attached to the TraceCommandInfo struct:
tclTrace-v2.diff.txt
This patch is better, it has only 3 regressions in the test suite:
info.test
==== info-40.1 location information not confused by literal sharing, bug 2933089 FAILED
==== Contents of test case:
test_info_frame;
join $result \n
---- Result was:
type source line 1852 file info.test cmd {if "$x != } proc ::test_info_frame level 1
type source line 1857 file info.test cmd {if "$$y != } proc ::test_info_frame level 1
---- Result should have been (exact matching):
type source line 1854 file info.test cmd print_one proc ::test_info_frame level 1
type source line 1859 file info.test cmd print_one proc ::test_info_frame level 1
==== info-40.1 FAILED
trace.test
==== trace-21.11 trace execution and alias FAILED
==== Contents of test case:
lappend res [namespace eval ::a y]
trace add execution ::x enter {
rename ::x {}
proc ::x {} {return ::}
#}
lappend res [namespace eval ::a y]
---- Test generated error; Return code was: 1
---- Return code should have been one of: 0 2
---- errorInfo: wrong # args: should be "rename oldName newName"
while executing
"rename ::x {} proc ::x {} {return ::}"
(enter trace on "x")
(in namespace eval "::a" script line 1)
invoked from within
"namespace eval ::a y"
("uplevel" body line 7)
invoked from within
"uplevel 1 $script"
---- errorCode: TCL WRONGARGS
==== trace-21.11 FAILED
==== trace-40.1 execution trace errors become command errors FAILED
==== Contents of test case:
proc foo args {}
trace add execution foo enter {rename foo {}; error bar;#}
catch foo m
return -level 0 $m[unset m]
---- Result was:
wrong # args: should be "rename oldName newName"
---- Result should have been (exact matching):
bar
==== trace-40.1 FAILED
Tests ended at Wed Dec 28 20:40:53 PST 2016
all.tcl: Total 31259 Passed 30181 Skipped 1075 Failed 3
Turns out that the latest patch doesn't honour this important sentence from "man Tcl_EvalObjv":
"Tcl_EvalObjv executes a single pre-parsed command instead of a script."
Thus, the new strategy is to build a pre-parsed Tcl_Obj array only for simple (single command) cases in order to speed up execution tracing, and to fall back to the old way of building a string to be Tcl_EvalEx'ed otherwise.
See this patch: tclTrace-v3.diff.txt
This does not fix the failure of "info-40.1" since in the optimized case the Tcl "info frame" layout is different and must be "info frame -2" instead of "info frame -3" for that test to succeed.
As some centuries ago in Denmark, the fine question now is to claim or not to claim ;-) What are FA's expectations regarding impacts of profiling on overall timing, how compares this approach to Tcl 8.5?