tenderlove/rails_autolink

Serious performance regression in v1.1.1

KieranP opened this issue · 11 comments

I upgrade rails_autolink the other day to 1.1.3, and suddenly, pages using the autolink functionality went from loading in 2s to never loading, consuming 100% CPU power, and steadily increasing in memory. I didn't catch it before it went into production, and suddenly 15 passenger processes are using 100% CPU over 4 cores and memory going up to 3GB into swap.

I downgraded to version 1.1.0 and the issue disappeared. I upgraded to v1.1.1 and the problem reappeared.

Now there is only really one commit between the release of 1.1.0 and the release of 1.1.1: 8dd7191 , so I'm fairly certain this new REGEXP is the cause of the speed decrease...

I'm happy to stay at 1.1.0 for now, but I'd recommend reverting that change since it is pretty crippling.

@xuanxu @andresbravog Any progress on this? The new regexp is causing some serious performance issues.

@KieranP can you give me more feedback:

  • Ruby version
  • Type of pages are you working with

Are this problem appearing just in your production env?
Have your application unitary testing for this feature so you can measure regression time?

I did some benchmarks on ruby 2.0 and couldn't find any significant difference

  • Ruby version ruby-2.0.0-p247
  • Rails version rails-3.2.14
  • rails_autolink version 1.1.1+
  • Environment - development and production
  • Project page with up to 100's of comments on the page, each that runs through rails_autolink

Testing in our app didn't pick up this problem, but probably because tests only had 1 or 2 comments on the project, not 100's. It seems the more comments and the more variety in those comments the longer autolink takes. Sometimes (not on every project, but consistently on the same projects) it just locks up, 100% CPU, and memory climbs steadily. Is it possible for a regexp to recurse?

If no regression comes up with singular test sounds like a ruby regexp malfunction with some texts like http://bugs.ruby-lang.org/issues/5149 , maybe you can find the exactly text failing and report it to ruby core.

Without knowing anything about your application, shouldn't you store those comments after running rails_autolinks so you don't have to process it in each page load?

cheers

I can provide more info on this, as I just got bit by it. Here's an example text which takes nearly 0.5s on my machine when passed through auto_link:

'x@..............................'

The key seems to be the dots in the domain part. Here are more test results:

x@..............................  0.470000   0.000000   0.470000 (  0.471416)
x@...............................  0.750000   0.010000   0.760000 (  0.761386)
x@................................  1.230000   0.000000   1.230000 (  1.237689)
x@.................................  1.980000   0.000000   1.980000 (  2.007114)
x@..................................  3.210000   0.000000   3.210000 (  3.238421)
x@...................................  5.190000   0.000000   5.190000 (  5.230534)
x@....................................  8.490000   0.030000   8.520000 (  9.219805)
x@..................................... 13.630000   0.040000  13.670000 ( 14.814795)
x@...................................... 22.020000   0.050000  22.070000 ( 23.397128)
x@....................................... 35.620000   0.060000  35.680000 ( 37.329822)
x@........................................ 57.660000   0.140000  57.800000 ( 61.572949)

I'm testing this in a Rails 4.0 console, where I've done include ActionView::Helpers::TextHelper. The test code is:

Benchmark.bm do |bm|
  30.upto(40).each do |dots|
    email = "x@#{'.' * dots}"
    bm.report(email) { auto_link email }
  end
end

I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 2.1.0-preview1.

@mitio thx for your valuable information here

i'm now able to reproduce the error, it's something related on how ruby processes the email regex:

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/

as we can see with the script

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]\.?(?:#{AUTO_EMAIL_LOCAL_RE}+\.)*#{AUTO_EMAIL_LOCAL_RE}*@[\w-]+(?:\.[\w-]+)+/

require 'benchmark'

Benchmark.bm do |bm|
  30.upto(40).each do |dots|
    email = "x@#{'.' * dots}"
    bm.report(email) { email.gsub(AUTO_EMAIL_RE, ' ') }
  end
end

giving the output:

x@..............................  0.420000   0.000000   0.420000 (  0.429706)
x@...............................  0.690000   0.000000   0.690000 (  0.686012)
x@................................  1.090000   0.000000   1.090000 (  1.095430)
x@.................................  1.780000   0.000000   1.780000 (  1.786115)
x@..................................  2.870000   0.010000   2.880000 (  2.881852)
x@...................................  4.690000   0.000000   4.690000 (  4.696742)
x@....................................  7.680000   0.020000   7.700000 (  7.719964)
x@..................................... 12.290000   0.010000  12.300000 ( 12.319188)
x@...................................... 19.880000   0.020000  19.900000 ( 19.928903)
x@....................................... 32.350000   0.050000  32.400000 ( 32.483979)
x@........................................ 51.900000   0.080000  51.980000 ( 52.099437)

i'm working on finding a better 'ruby friendly' regex

I've tested v1.1.4 on Ruby 2.0.0-p247 and Ruby 1.9.3

I think I fixed the issue, with the regex:

AUTO_EMAIL_LOCAL_RE = /[\w.!#\$%&'*\/=?^`{|}~+-]/
AUTO_EMAIL_RE = /[\w.!#\$%+-]#{AUTO_EMAIL_LOCAL_RE}{0,*}@[\w-]+(?:\.[\w-]+)+/

getting better results:

       user     system      total        real
x@..............................  0.000000   0.000000   0.000000 (  0.000171)
x@...............................  0.000000   0.000000   0.000000 (  0.000008)
x@................................  0.000000   0.000000   0.000000 (  0.000012)
x@.................................  0.000000   0.000000   0.000000 (  0.000007)
x@..................................  0.000000   0.000000   0.000000 (  0.000004)
x@...................................  0.000000   0.000000   0.000000 (  0.000004)
x@....................................  0.000000   0.000000   0.000000 (  0.000004)
x@.....................................  0.000000   0.000000   0.000000 (  0.000007)
x@......................................  0.000000   0.000000   0.000000 (  0.000007)
x@.......................................  0.000000   0.000000   0.000000 (  0.000006)
x@........................................  0.000000   0.000000   0.000000 (  0.000004)

I'll create a pull request

@andresbravog I just sent a PR with the change plus a timeout test for these inputs.

I based the value of AUTO_EMAIL_RE on your suggestion but changed it a bit so that tests pass. Is there any particular reason you added {0,*} as a quantifier of AUTO_EMAIL_LOCAL_RE instead of just *?

haha you was faster than me, just working on testing it into different ruby versions. It's 👍 for me

Thank you both @andresbravog & @mitio!