[Bug] Performance bug: SampleInitPopulation
Closed this issue · 6 comments
I am encountering a performance issue in EvolutionaryNode::SampleInitPopulation. Some perf data:
- SampleInitPopulation: ~26s
- EvolveWithCostModel: ~14s
- Build & Measure: ~66s
It is not reasonable for SampleInitPopulation to be so much slower than EvolveWithCostModel. In theory it should be roughly 5-10x faster than EvolveWithCostModel.
Glancing at htop, I noticed that only 8 threads are active while SampleInitPopulation executes; it is supposed to use 32 threads on my AMD 3950X (16C/32T).
There is only one lock in the code, and it is very unlikely to affect performance, because it is acquired only 2048 times during these 26s.
I am opening this thread so that I don't forget about it. Will dig a bit deeper later.
going to work on it
The post processor VerifyGPU is the cause of the issue
The pass VerifyGPUCode is the root cause
The root cause is that the try-catch exception handling around the pass is really slow. Using the helper provided in tir analysis, we can reduce the time from ~26s to ~8s
Amazing that try-catch can make such a huge impact!
@MasterJH5574 Yeah the error message generation is a bit time consuming: https://github.com/Hzfengsy/tvm-tensorir/blob/master/src/tir/analysis/verify_gpu_code.cc#L305-L313
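For anyone landing here later, the general pattern discussed in this thread can be sketched roughly as below. This is not the TVM code: Candidate, the two limits, and all four function names are hypothetical and exist only for illustration. The sketch assumes many sampled candidates fail verification, so the exception-based path formats an error message and unwinds the stack for every failure, while the boolean path only compares integers.

```cpp
#include <cstddef>
#include <sstream>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a sampled schedule; not a TVM type.
struct Candidate {
  int threads_per_block;
  int shared_memory_bytes;
};

// Hypothetical hardware limits, chosen only for illustration.
constexpr int kMaxThreadsPerBlock = 1024;
constexpr int kMaxSharedMemoryBytes = 48 * 1024;

// Cheap check: returns a boolean and never formats a message.
bool IsValid(const Candidate& c) {
  return c.threads_per_block <= kMaxThreadsPerBlock &&
         c.shared_memory_bytes <= kMaxSharedMemoryBytes;
}

// Expensive check: builds a detailed message and throws on failure.
void VerifyOrThrow(const Candidate& c) {
  if (!IsValid(c)) {
    std::ostringstream msg;
    msg << "Invalid candidate: threads_per_block=" << c.threads_per_block
        << " (limit " << kMaxThreadsPerBlock << "), shared_memory_bytes="
        << c.shared_memory_bytes << " (limit " << kMaxSharedMemoryBytes << ")";
    throw std::runtime_error(msg.str());
  }
}

// Filtering via try-catch: every rejected candidate pays for message
// formatting plus throwing and catching an exception.
std::size_t CountValidWithExceptions(const std::vector<Candidate>& candidates) {
  std::size_t valid = 0;
  for (const Candidate& c : candidates) {
    try {
      VerifyOrThrow(c);
      ++valid;
    } catch (const std::runtime_error&) {
      // Rejected; the message is discarded, so building it was wasted work.
    }
  }
  return valid;
}

// Filtering via the boolean helper: same result, no exceptions involved.
std::size_t CountValidWithBoolean(const std::vector<Candidate>& candidates) {
  std::size_t valid = 0;
  for (const Candidate& c : candidates) {
    if (IsValid(c)) ++valid;
  }
  return valid;
}
```

Both counting functions accept the same candidates; the only difference is how much work is spent on each rejected one. How much that matters in practice depends on the rejection rate and on how costly the message is to build.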