Regular Expressions: "Regex"

Student Learning Objectives:

  • By the end of this lesson you should be able to write regular expressions to:

    1) test a string to see whether it matches a pattern.

    2) extract from a string the sections that match all or part of a pattern.

    3) change a string by replacing the parts that match a pattern.

Why learn Regular Expressions:

  • Regular expressions are supported by many programming languages and tools: Ruby, JavaScript, Perl, Python, Java, and Unix shell tools such as grep.

  • Regular expressions are used for pattern matching and for find-and-replace tasks.

  • They are useful for model validations: checking for a valid email format, phone number format, SSN format, and so on.

  • Regex can look cryptic. You often have to decipher a pattern piece by piece to understand it.
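To make the validation use case concrete, here is a minimal sketch of format checks for a phone number and an SSN. The constant names, the helper methods, and the exact formats (555-123-4567 and 123-45-6789) are hypothetical and simplified for illustration. The patterns use \d and {n}, which are covered in section 2, plus \A and \z, which anchor to the start and end of the whole string (unlike ^ and $, which anchor to a line).

```ruby
# Hypothetical, simplified formats for illustration only
PHONE_FORMAT = /\A\d{3}-\d{3}-\d{4}\z/ # e.g. 555-123-4567
SSN_FORMAT   = /\A\d{3}-\d{2}-\d{4}\z/ # e.g. 123-45-6789

# \A and \z anchor the pattern to the start and end of the whole string,
# so any extra text before or after the number makes the check fail
def valid_phone?(str)
  str.match?(PHONE_FORMAT)
end

def valid_ssn?(str)
  str.match?(SSN_FORMAT)
end

puts valid_phone?("555-123-4567")      # true
puts valid_phone?("call 555-123-4567") # false
puts valid_phone?("555-1234")          # false
puts valid_ssn?("123-45-6789")         # true
```

In a Rails model, a constant like this could be passed to `validates :phone, format: { with: PHONE_FORMAT }`, where `phone` is a hypothetical attribute name.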

1) Match a pattern in a string

  • The strategy is to match only what you want, and nothing else.

  • Each character in a regex is either a metacharacter with special meaning, or a regular character with its literal meaning.

  • Exact pattern matches

    * Place the exact string between two forward slashes: /cat/ matches "cat"

    * However, /cat/ also matches "catastrophe" and "scat"

  • Use metacharacters to be more specific on what you match.
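A quick way to see the difference between an exact pattern and a more specific one is to test both against a few words. This sketch assumes Ruby 2.4 or later for String#match?, and uses the ^ and $ anchors from the metacharacter list below.

```ruby
# Words to test against the literal pattern /cat/
words = ["cat", "catastrophe", "scat", "dog"]

# /cat/ matches "cat" anywhere inside a string
loose = words.select { |w| w.match?(/cat/) }

# /^cat$/ requires "cat" to be the whole line: start (^) and end ($)
exact = words.select { |w| w.match?(/^cat$/) }

puts loose.inspect # ["cat", "catastrophe", "scat"]
puts exact.inspect # ["cat"]
```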

Whiteboard:

  • Type this sentence into Rubular (http://www.rubular.com/) and follow along.

  • Dogs are not dogmatic about dog things unlike Madog the bad dog.

Ruby Matcher: .match

  • Syntax: my_string.match(/pattern/) returns a MatchData object for the first match, or nil if there is no match.

  • m = "I love my cat.".match(/cat/)

  • m[0] => "cat"

    string = "I love my cat."

    if string.match(/cat/)
      puts "I found a match."
    end
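The object returned by .match carries more than the matched text. This sketch shows a few MatchData methods from Ruby's standard library (pre_match and post_match give the text before and after the match) and the nil return value that lets .match work directly in an if.

```ruby
string = "I love my cat."

m = string.match(/cat/) # a MatchData object
puts m[0]               # cat
puts m.pre_match        # "I love my " (text before the match)
puts m.post_match       # "." (text after the match)

# .match returns nil when the pattern is absent, which is falsy in an if
puts string.match(/dog/).nil? # true
```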

Metacharacters:

  • . - wildcard, matches any single character except a newline

  • \ - escape character, turns a metacharacter into a literal character. To match a literal period, use \.

  • | - logical or, /cat|dog/ matches "cat" or "dog"

  • \s - matches any whitespace

  • \S - matches any non-whitespace

  • ^ - start of line

  • $ - end of line

  • /i - case-insensitive modifier: /cAt/i matches Cat, cAt, CAT, CaT, cat ...
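Applying these metacharacters to the whiteboard sentence shows how each one narrows or widens what matches. This is a sketch using String#scan, which returns every match as an array; the counts in the comments were worked out by hand for this one sentence.

```ruby
sentence = "Dogs are not dogmatic about dog things unlike Madog the bad dog."

# Plain /dog/ is case sensitive and also matches inside other words
plain = sentence.scan(/dog/)      # dogmatic, dog, Madog, dog. => 4 matches

# /i ignores case, so the "Dog" in "Dogs" is matched too
any_case = sentence.scan(/dog/i)  # 5 matches

# \s on both sides keeps only a "dog" surrounded by whitespace
spaced = sentence.scan(/\sdog\s/) # [" dog "]

# \. is a literal period and $ anchors to the end of the line
at_end = sentence.match?(/dog\.$/) # true

puts plain.length
puts any_case.length
puts spaced.inspect
puts at_end
```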

Exercise 1:

  • Open "match_string.rb" in Sublime and follow the instructions inside.

  • Run it with: $ ruby match_string.rb string_data

2) Select sections out of a string

Whiteboard:

Metacharacters:

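  • (...) - capture group: the text matched inside the parentheses is saved in $1, $2, ... (numbered left to right) and can be reused

  • [...] - character class, matches any one character listed in the brackets; use a hyphen for a range: [0-9] or [a-z]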

  • \d - any digit, same as [0-9]

  • \D - any non-digit

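  • {n} - the preceding character or group repeated exactly n times: /a{2}/ matches "aa"

  • + - one or more of the preceding character or group

  • * - zero or more of the preceding character or group

  • ? - zero or one of the preceding character or group (makes it optional)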
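Capture groups and quantifiers together are what let you pull pieces out of a string. The sample line below is hypothetical; the real time_data file used in the exercise may have a different format. After =~ succeeds, $1 and $2 hold the first and second groups, and String#scan with groups returns one array of captures per match.

```ruby
# Hypothetical sample line, not taken from the exercise data
line = "Meeting starts at 09:30 and ends at 11:15."

# \d{2} is exactly two digits; each (...) is a capture group
if line =~ /(\d{2}):(\d{2})/
  hours   = $1 # "09" (first group)
  minutes = $2 # "30" (second group)
end

# scan with groups returns the captures for every match
times = line.scan(/(\d{2}):(\d{2})/) # [["09", "30"], ["11", "15"]]

# ? makes the "u" optional; + takes one or more digits
spellings = ["color", "colour", "colr"].select { |w| w.match?(/colou?r/) }
order_no  = "Order 12345 shipped".match(/\d+/)[0] # "12345"

puts hours, minutes
puts times.inspect
puts spellings.inspect # ["color", "colour"]
puts order_no
```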

Exercise 2:

  • Open "real_time_data.rb" in Sublime and follow the instructions inside.

  • Run it with: $ ruby read_time_data.rb time_data

3) Find and replace parts of a string

  • Methods: .sub and .gsub

  • .sub replaces only the first match; .gsub replaces every match ("g" for global)

  • Syntax: my_string.sub(/pattern to match/, "replacement string")

  • "That is a cute dog.".sub(/dog/, "cat") => "That is a cute cat."

  • Can be chained together:

    * "Red is my favorite color, I just love red".sub(/red/, "blue").sub(/Red/, "Blue") => "Blue is my favorite color, I just love blue"
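The sketch below contrasts .sub with .gsub and shows two things they can do beyond plain text replacement: reusing a capture group in the replacement as \1, \2, ... (written in single quotes so the backslashes reach the regex engine), and passing a block that computes each replacement. The date string is a hypothetical example.

```ruby
text = "dog eat dog"

first_only = text.sub(/dog/, "cat")  # "cat eat dog" (first match only)
every      = text.gsub(/dog/, "cat") # "cat eat cat" (every match)

# Reorder the captured parts of a hypothetical YYYY-MM-DD date
us_date = "2024-01-31".sub(/(\d{4})-(\d{2})-(\d{2})/, '\2/\3/\1') # "01/31/2024"

# One gsub with /i handles both "Red" and "red"; the block keeps the capital
recolored = "Red is my favorite color, I just love red".gsub(/red/i) do |m|
  m[0] == "R" ? "Blue" : "blue"
end

puts first_only, every, us_date, recolored
```

Compared with chaining two .sub calls, the block version catches every occurrence in either case with a single pass.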

Exercise 3:

  • Open "find_replace.rb" in Sublime and follow the instructions inside.

  • Run it with: $ ruby find_replace.rb find_replace_data


Resources

Rubular: regular expression tester: http://www.rubular.com/

Regex Crosswords: http://regexcrossword.com/