/despamilator

spam detection in text, made specifically for web forms

Primary LanguageRuby

Despamilator

DESCRIPTION:

Despamilator is a plugin based spam detector designed for use on your web forms borne out of two annoyances: Spam being submitted in my web forms and CAPTCHAS being intrusive. Despamilator will apply some commonly used heuristics from the world of anti-spam to help you decide whether your users are human or machine.

FEATURES/PROBLEMS:

  • Moved Rails-esque validation gem to the despamilator-rails gem.

SYNOPSIS:

Using Despamilator:

require 'despamilator'

# some time later...

dspam = Despamilator.new('some text with an <h2> tag qthhg')

dspam.score #=> the total score for this string (1 is considered high)
dspam.matches #=> array of matching filters
first_match = dspam.matches.first #=> first matching filter
first_match[:filter].name #=> some string with the name of the filter
first_match[:filter].description #=> some string to describe
first_match[:score] #=> the individual score assigned by this matching filter

FILTERING:

As stated, this is a heuristic scanner so its up to the user to decide the thresholds of the scanner. I usually say “it’s spam” if the score reaches 1.

The score will be added to incrementally by each matching filter. So if there is some HTML in there, it will be added to the score. If there is also a script tag of some sort, that will add more.

Each filter decides how much of a score it assigns. For example, detecting a number next to a letter (the numbers_an_words filter) is only a mild hint compared with a script tag (detected by the script_tag filter).

NEW FILTERS:

I absolutely welcome new filters and experiments. New filters should be put in the ‘lib/despamilator/filter/’ directory. The core filtering code will detect and use what is in there so you only need to drop the code in. Filters should be simple, no classes etc wrapped around them and should try to perform one simple task. They should always supply the following methods:

* name #=> the name of your filter.
* description #=> what your filter will look for.
* parse(subject) #=> the method that will be called when parsing. A copy of the message is passed in.

The subject of the detection (including text) is passed into the “parse” method. It provides the following methods…

* text #=> The text of the message. This is immutable but you can alter a duplicate (using "dup").
* register_match! #=> Method that receives the instance of the filter and the assigned score if matched.

Take a look at the “prices” code and tests in “spec/filters/prices.rb”.

Example Filter:

This example is to detect the letter “a”. Put the code in lib/despamilator/filter/detect_letter_a.rb:

require 'despamilator/filter_base'

module DespamilatorFilter

  class DetectLetterA < Despamilator::Filter

    def name
      'Detecting the letter A'
    end

    def description
      'Detects the letter "a" in a string for no reason other than a demo'
    end

    def parse subject
      if subject.text.downcase.scan(/a/)
        # add 0.1 to the score of the text
        subject.register_match!({:score => 0.1, :filter => self})
    end
  end
end

As previously stated, ensure you put a spec test together as well!

REQUIREMENTS:

  • domainatrix

INSTALL:

$ sudo gem install despamilator

LICENSE:

Copyright © 2011 Stephen Hardisty

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ‘Software’), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED ‘AS IS’, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.