Ruby's regular expression syntax is largely Perl-compatible (Ruby 1.9 uses the Oniguruma engine, later versions its fork Onigmo), so if you're familiar with the PCRE-based preg_*
functions in PHP, learning regular expressions in Ruby will be easy.
Build a regular expression
Regular expressions in Ruby can be created using different syntaxes.
The most common is to enclose the pattern in forward slashes.
The %r{...} syntax is usually used when the pattern contains many forward slashes (such as a file path), because they then do not need to be escaped.
Regular expressions can also be instantiated explicitly with the Regexp class (Regexp.new).
[codesyntax lang="rails"]
/[a-z0-9]+\s/mi                # literal syntax
%r{/path/to/gif\.gif}mi        # %r syntax: "/" needs no escaping
Regexp.new('[a-z0-9]+\s', Regexp::IGNORECASE | Regexp::MULTILINE)  # explicit constructor
[/codesyntax]
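A minimal sketch, assuming Ruby 1.9 or later, that checks the three forms above: the literal and the Regexp.new call should compare equal, and %r{...} matches a path without escaping its slashes. It also shows why the Regexp.new pattern is best written in single quotes: inside double quotes Ruby turns "\s" into a plain space before the regexp engine ever sees it. The sample strings are made up.

```ruby
# The literal and the explicit constructor build equal Regexp objects.
literal = /[a-z0-9]+\s/mi
built   = Regexp.new('[a-z0-9]+\s', Regexp::IGNORECASE | Regexp::MULTILINE)
p literal == built                 # same source, options and encoding

# %r{...} lets the pattern contain "/" without escaping.
gif = %r{/path/to/gif\.gif}mi
p "GET /PATH/TO/GIF.GIF" =~ gif    # 4: /i makes the match case-insensitive

# Pitfall: in double quotes "\s" is a literal space, not the \s class.
broken = Regexp.new("[a-z0-9]+\s")
p broken.source                    # the pattern now ends in a space
p "abc\tdef" =~ built              # 0: \s matches the tab
p "abc\tdef" =~ broken             # nil: a literal space does not
```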
Regexp.new(pattern [, options [, lang]]) constructs a new regular expression from pattern, which can be either a String or a Regexp (in which case that regexp's options are propagated, and new options may not be specified; a change as of Ruby 1.8).
[codesyntax lang="rails"]
r1 = Regexp.new('^a-z+:\\s+\w+')            #=> /^a-z+:\s+\w+/
r2 = Regexp.new('cat', true)                #=> /cat/i
r3 = Regexp.new('dog', Regexp::EXTENDED)    #=> /dog/x
r4 = Regexp.new(r2)                         #=> /cat/i
[/codesyntax]
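The r4 line relies on passing an existing Regexp to Regexp.new. A short sketch that checks the copy keeps r2's case-insensitive flag, and that #source and #options can rebuild an equal pattern (the variable name copy is only for illustration):

```ruby
r2 = Regexp.new('cat', true)   # true turns on case-insensitive matching
r4 = Regexp.new(r2)            # copies r2, including its options

p r4                           # /cat/i
p r4 =~ 'CAT'                  # 0: the /i flag came along

# The same pattern rebuilt from its parts.
copy = Regexp.new(r2.source, r2.options)
p copy == r2                   # true
```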
If options is an Integer (a Fixnum in older Ruby versions), it should be one or more of the constants:
Regexp::EXTENDED – /x – extended mode – whitespace is ignored
Regexp::IGNORECASE – /i – case insensitive
Regexp::MULTILINE – /m – multiline mode – ‘.’ will match newline
or-ed together.
Otherwise, if options is neither nil nor false, the regexp will be case insensitive.
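To see how the option constants combine, here is a sketch (the sample strings are made up) that ORs flags together, reads them back with Regexp#options, and checks the effect of /m, /x and a false options argument:

```ruby
flags = Regexp::IGNORECASE | Regexp::MULTILINE   # 1 | 4
re = Regexp.new('a.b', flags)

p re.options == flags                             # true: flags are stored on the object
p "A\nB" =~ re                                    # 0: /i ignores case, /m lets "." match "\n"
p "A\nB" =~ Regexp.new('a.b', Regexp::IGNORECASE) # nil without /m

# /x ignores literal whitespace and allows comments inside the pattern.
ext = Regexp.new('\d+   # area code
                  -     # separator
                  \d+', Regexp::EXTENDED)
p 'call 555-1234' =~ ext                          # 5

p Regexp.new('cat', false).casefold?              # false: nil or false adds no flags
```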
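The lang parameter enables multibyte support for the regexp:
'n', 'N' = none,
'e', 'E' = EUC,
's', 'S' = SJIS,
'u', 'U' = UTF-8.
This list comes from Ruby 1.8; from Ruby 1.9 on, only 'n' or 'N' has an effect and other values are ignored with a warning.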
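A sketch of how the 'n' setting shows up on a Regexp object, assuming Ruby 1.9 or later, where the same flag is exposed as the Regexp::NOENCODING constant and can be ORed into the options integer like the other constants:

```ruby
lit = /abc/n                                     # literal with the n flag
p((lit.options & Regexp::NOENCODING) != 0)       # true

# The same flag passed in the options integer instead of a lang argument.
built = Regexp.new('abc', Regexp::IGNORECASE | Regexp::NOENCODING)
p((built.options & Regexp::NOENCODING) != 0)     # true
p built.casefold?                                # true
```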
Use variables in regular expressions
[codesyntax lang="rails"]
foo = '[\.\d]+' # a pattern fragment held in a variable
pattern = "referer:#{foo}"
reg1 = Regexp.new(pattern, Regexp::IGNORECASE | Regexp::MULTILINE)
reg2 = /referer:#{foo}/mi
reg3 = /referer:[\.\d]+/mi
[/codesyntax]
All three evaluate to the same regular expression, /referer:[\.\d]+/mi.
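A quick sketch confirming that the three constructions agree. Interpolating foo splices its contents in as regexp syntax, so its [\.\d]+ acts as a character class rather than literal text (the sample input is made up):

```ruby
foo = '[\.\d]+'                 # pattern fragment held in a variable
pattern = "referer:#{foo}"
flags = Regexp::IGNORECASE | Regexp::MULTILINE

reg1 = Regexp.new(pattern, flags)
reg2 = /referer:#{foo}/mi
reg3 = /referer:[\.\d]+/mi

# All three share the same source and options.
p [reg1, reg2, reg3].map(&:source).uniq     # a single entry
p [reg1, reg2, reg3].map(&:options).uniq    # a single entry
p 'Referer:10.0.0.1'[reg2]                  # "Referer:10.0.0.1"
```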
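If the variable holds literal text rather than a pattern, escape it with Regexp.escape so that characters such as "." lose their special meaning:
[codesyntax lang="rails"]
foo = '192.168.1.5' # literal text held in a variable
reg1 = /referer:#{Regexp.escape(foo)}/mi
[/codesyntax]
This evaluates to /referer:192\.168\.1\.5/mi.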
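To see why the escaping matters, a sketch comparing the unescaped and escaped forms against a made-up input in which the dots have been replaced by other characters:

```ruby
foo = '192.168.1.5'                       # literal text, not a pattern

raw     = /referer:#{foo}/mi              # each "." is a wildcard here
escaped = /referer:#{Regexp.escape(foo)}/mi

line = 'referer:192x168x1x5'              # hypothetical input that should not match
p line =~ raw                             # 0: unescaped dots match "x"
p line =~ escaped                         # nil: the dots must be literal
p escaped.source                          # the dots are now escaped
```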
Use the regular expression
[codesyntax lang="rails"]
string = "Here is some text referer:"
[/codesyntax]
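The example above stops after assigning the string. A sketch of the usual ways to apply such a pattern in Ruby (=~, String#match, String#[], String#scan and String#sub); the sample text, including the address after "referer:", and the capture group are assumptions for illustration:

```ruby
string = 'Here is some text referer:192.168.1.5'   # hypothetical sample text
reg = /referer:([\.\d]+)/mi                          # group 1 captures the address

p string =~ reg              # 18: index where the match starts (nil if none)
if (m = string.match(reg))
  p m[0]                     # the whole match
  p m[1]                     # just the captured address
end
p string[reg, 1]             # shortcut for the first capture group
p string.scan(/\d+/)         # every run of digits
p string.sub(reg, 'referer:[hidden]')
```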
Regular Expressions and UTF-8 strings
Working with multibyte strings in regular expressions using \uNNNN:
[codesyntax lang="rails"]
pattern = '[\u0000-\u002F]+'
reg = /#{pattern}/
# or
reg = Regexp.new(pattern, nil)
# or with options
reg = Regexp.new(pattern, Regexp::IGNORECASE | Regexp::MULTILINE)
[/codesyntax]
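The range above covers only ASCII code points. A sketch that also uses a range above U+007F (the Latin-1 Supplement, chosen only for illustration) shows \u escapes matching multibyte characters in a UTF-8 string:

```ruby
ascii = Regexp.new('[\u0000-\u002F]+')    # the pattern from the example above
p 'a/b-c'.scan(ascii)                      # ["/", "-"]

# Single quotes pass "\u00C0" through to the regexp engine unchanged.
latin = Regexp.new('[\u00C0-\u00FF]+')
p latin.encoding                           # UTF-8: non-ASCII escapes fix the encoding
p 'café crème'.scan(latin)                 # ["é", "è"]
```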
Both the pattern string and the string you are searching in should be UTF-8 encoded.
Otherwise you may get errors like
'incompatible encoding regexp match (ASCII-8BIT regexp with UTF-8 string)'.
To ensure this, you can call the String#encode method:
[codesyntax lang="rails"]
pattern = '[\u0000-\u002F]+'.encode('UTF-8')
reg = /#{pattern}/
# or
reg = Regexp.new(pattern.encode('UTF-8'), nil)
# or with options
reg = Regexp.new(pattern.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE)
puts reg.encoding # UTF-8
[/codesyntax]
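A sketch that reproduces this kind of error and then fixes it with encode. It assumes a regexp whose encoding is fixed to UTF-8 (here because its \u range goes above U+007F) and a made-up string in ISO-8859-1:

```ruby
reg = Regexp.new('[\u00C0-\u00FF]+')       # encoding fixed to UTF-8
latin1 = 'café'.encode('ISO-8859-1')       # same text, different encoding

error = begin
  latin1 =~ reg
  nil
rescue Encoding::CompatibilityError => e
  e
end
puts error.message if error                # names both encodings

# Converting the string to UTF-8 makes the encodings compatible.
p latin1.encode('UTF-8')[reg]              # "é"
```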
If you pass the 'n' option, the Regexp object gets ASCII-8BIT or US-ASCII encoding even if the pattern string is in UTF-8. It should still work for searching in UTF-8 strings, though.
[codesyntax lang="rails"]
# 1
reg = Regexp.new('[a-zA-Z]+'.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE, 'n')
puts reg.encoding # US-ASCII
# 2
reg = Regexp.new('[\x80-\xFF]+'.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE, 'n')
puts reg.encoding # ASCII-8BIT
# 3
reg = Regexp.new('[\u0000-\u002F]+'.encode('UTF-8'), Regexp::IGNORECASE | Regexp::MULTILINE, 'n')
puts reg.encoding # US-ASCII
[/codesyntax]
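The three-argument form above is deprecated in recent Ruby releases. A sketch of cases #1 and #2 written with the Regexp::NOENCODING flag in the options integer instead (assuming Ruby 1.9 or later), checking that the ASCII-only pattern still matches inside a UTF-8 string and that the byte-range pattern works on binary data; the sample strings are made up:

```ruby
flags = Regexp::IGNORECASE | Regexp::MULTILINE | Regexp::NOENCODING

# 1: ASCII-only pattern, used on UTF-8 text
letters = Regexp.new('[a-zA-Z]+', flags)
p 'Hello, world'.encode('UTF-8')[letters]   # "Hello"

# 2: raw byte range, meant for binary (ASCII-8BIT) data
high = Regexp.new('[\x80-\xFF]+', flags)
p high.encoding                             # compare with case #2 above
bytes = 'café'.b                            # the UTF-8 bytes viewed as binary
p bytes =~ high                             # 3: "é" starts at byte offset 3
```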
Working with multibyte strings in regular expressions using \xNN:
If you use \xNN sequences for bytes above \x7F in a regular expression built from a UTF-8 string, Ruby 1.9.x raises an error such as "invalid multibyte escape", because a single byte in that range is not a complete UTF-8 character.
For example, the following regular expression raises this error:
[codesyntax lang="rails"]
pattern = '[\x80-\xFF]+'
reg1 = Regexp.new(pattern, Regexp::IGNORECASE | Regexp::MULTILINE)
# error: invalid multibyte escape
[/codesyntax]
To avoid this problem, create the Regexp object with the 'n' (no encoding) option:
[codesyntax lang="rails"]
pattern = '[\x80-\xFF]+'
reg1 = Regexp.new(pattern, nil, 'n') # raw bytes, ASCII-8BIT regexp
[/codesyntax]
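A sketch that reproduces the error. It also covers the common case where the goal is to find non-ASCII characters in UTF-8 text rather than raw bytes; there, a character-based alternative (a negated ASCII class, swapped in here for illustration) needs no special option. The sample text is made up:

```ruby
pattern = '[\x80-\xFF]+'

# Compiled as UTF-8, a lone byte above \x7F is not a valid character.
error = begin
  Regexp.new(pattern)
  nil
rescue RegexpError => e
  e
end
puts error.message if error

# Character-based alternative for UTF-8 text: anything outside ASCII.
non_ascii = /[^\x00-\x7F]+/
p 'naïve café'.scan(non_ascii)             # ["ï", "é"]
```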