Serious performance regression in v1.1.1
KieranP opened this issue · 11 comments
I upgrade rails_autolink the other day to 1.1.3, and suddenly, pages using the autolink functionality went from loading in 2s to never loading, consuming 100% CPU power, and steadily increasing in memory. I didn't catch it before it went into production, and suddenly 15 passenger processes are using 100% CPU over 4 cores and memory going up to 3GB into swap.
I downgraded to version 1.1.0 and the issue disappeared. I upgraded to v1.1.1 and the problem reappeared.
Now there is only really one commit between the release of 1.1.0 and the release of 1.1.1: 8dd7191 , so I'm fairly certain this new REGEXP is the cause of the speed decrease...
I'm happy to stay at 1.1.0 for now, but I'd recommend reverting that change since it is pretty crippling.
@xuanxu @andresbravog Any progress on this? The new regexp is causing some serious performance issues.
@KieranP can you give me more feedback:
- Ruby version
- Type of pages are you working with
Are this problem appearing just in your production env?
Have your application unitary testing for this feature so you can measure regression time?
I did some benchmarks on ruby 2.0 and couldn't find any significant difference
- Ruby version ruby-2.0.0-p247
- Rails version rails-3.2.14
- rails_autolink version 1.1.1+
- Environment - development and production
- Project page with up to 100's of comments on the page, each that runs through rails_autolink
Testing in our app didn't pick up this problem, but probably because tests only had 1 or 2 comments on the project, not 100's. It seems the more comments and the more variety in those comments the longer autolink takes. Sometimes (not on every project, but consistently on the same projects) it just locks up, 100% CPU, and memory climbs steadily. Is it possible for a regexp to recurse?
If no regression comes up with singular test sounds like a ruby regexp malfunction with some texts like http://bugs.ruby-lang.org/issues/5149 , maybe you can find the exactly text failing and report it to ruby core.
Without knowing anything about your application, shouldn't you store those comments after running rails_autolinks so you don't have to process it in each page load?
cheers
I can provide more info on this, as I just got bit by it. Here's an example text which takes nearly 0.5s on my machine when passed through auto_link
:
'x@..............................'
The key seems to be the dots in the domain part. Here are more test results:
x@.............................. 0.470000 0.000000 0.470000 ( 0.471416)
x@............................... 0.750000 0.010000 0.760000 ( 0.761386)
x@................................ 1.230000 0.000000 1.230000 ( 1.237689)
x@................................. 1.980000 0.000000 1.980000 ( 2.007114)
x@.................................. 3.210000 0.000000 3.210000 ( 3.238421)
x@................................... 5.190000 0.000000 5.190000 ( 5.230534)
x@.................................... 8.490000 0.030000 8.520000 ( 9.219805)
x@..................................... 13.630000 0.040000 13.670000 ( 14.814795)
x@...................................... 22.020000 0.050000 22.070000 ( 23.397128)
x@....................................... 35.620000 0.060000 35.680000 ( 37.329822)
x@........................................ 57.660000 0.140000 57.800000 ( 61.572949)
I'm testing this in a Rails 4.0 console, where I've done include ActionView::Helpers::TextHelper
. The test code is:
Benchmark.bm do |bm|
30.upto(40).each do |dots|
email = "x@#{'.' * dots}"
bm.report(email) { auto_link email }
end
end
I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 2.1.0-preview1.
@mitio thx for your valuable information here
i'm now able to reproduce the error, it's something related on how ruby processes the email regex:
AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/
as we can see with the script
AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/
require 'benchmark'
Benchmark.bm do |bm|
30.upto(40).each do |dots|
email = "x@#{'.' * dots}"
bm.report(email) { email.gsub(AUTO_EMAIL_RE, ' ') }
end
end
giving the output:
x@.............................. 0.420000 0.000000 0.420000 ( 0.429706)
x@............................... 0.690000 0.000000 0.690000 ( 0.686012)
x@................................ 1.090000 0.000000 1.090000 ( 1.095430)
x@................................. 1.780000 0.000000 1.780000 ( 1.786115)
x@.................................. 2.870000 0.010000 2.880000 ( 2.881852)
x@................................... 4.690000 0.000000 4.690000 ( 4.696742)
x@.................................... 7.680000 0.020000 7.700000 ( 7.719964)
x@..................................... 12.290000 0.010000 12.300000 ( 12.319188)
x@...................................... 19.880000 0.020000 19.900000 ( 19.928903)
x@....................................... 32.350000 0.050000 32.400000 ( 32.483979)
x@........................................ 51.900000 0.080000 51.980000 ( 52.099437)
i'm working on finding a better 'ruby friendly' regex
I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 1.9.3
I think I fixed the issue, with the regex:
AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]#{AUTO_EMAIL_LOCAL_RE}{0,*}@[\w-]+(?:\.[\w-]+)+/
getting better results:
user system total real
x@.............................. 0.000000 0.000000 0.000000 ( 0.000171)
x@............................... 0.000000 0.000000 0.000000 ( 0.000008)
x@................................ 0.000000 0.000000 0.000000 ( 0.000012)
x@................................. 0.000000 0.000000 0.000000 ( 0.000007)
x@.................................. 0.000000 0.000000 0.000000 ( 0.000004)
x@................................... 0.000000 0.000000 0.000000 ( 0.000004)
x@.................................... 0.000000 0.000000 0.000000 ( 0.000004)
x@..................................... 0.000000 0.000000 0.000000 ( 0.000007)
x@...................................... 0.000000 0.000000 0.000000 ( 0.000007)
x@....................................... 0.000000 0.000000 0.000000 ( 0.000006)
x@........................................ 0.000000 0.000000 0.000000 ( 0.000004)
I'll create a pull request
@andresbravog I just sent a PR with the change plus a timeout test for these inputs.
I based the value of AUTO_EMAIL_RE
on your suggestion but changed it a bit so that tests pass. Is there any particular reason you added {0,*}
as a quantifier of AUTO_EMAIL_LOCAL_RE
instead of just *
?
haha you was faster than me, just working on testing it into different ruby versions. It's 👍 for me
Thank you both @andresbravog & @mitio!