One of Perl's most useful features is its powerful string manipulation.
At the heart of this is the regular expression (RE), which is shared by many other
UNIX utilities.
Regular expressions
A regular expression is written between slashes, and matching is done with the
=~ operator. The following expression is true only if the string "the" appears in the variable $sntnce.
$sntnce =~ /the/
The RE is case sensitive, so if
$sntnce = "The quick brown fox";
then the above match will be false. The operator !~ is used for
spotting a non-match. In the above example
$sntnce !~ /the/
is true because the string "the" does not appear in $sntnce.
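The two operators can be tried out in a short script. This is only a minimal sketch using the sample sentence from above; the variable names $has_the and $lacks_the are made up for illustration.

```perl
use strict;
use warnings;

my $sntnce = "The quick brown fox";

# =~ is true when the pattern is found. The match is case sensitive,
# so /the/ does not match the capitalised "The".
my $has_the = ($sntnce =~ /the/) ? 1 : 0;

# !~ is the negation: true when the pattern is NOT found.
my $lacks_the = ($sntnce !~ /the/) ? 1 : 0;

print "=~ /the/ gives $has_the\n";
print "!~ /the/ gives $lacks_the\n";
```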
We can use a match in a conditional as follows:
if ($sntnce =~ /under/)
{
print "We're talking about VYOM\n";
}
which would print out a message if we had either of the following
$sntnce = "Up and under";
$sntnce = "Best winkles in Sunderland";
It is often easier to assign the sentence to the special
variable $_, which is a scalar. If we do this, we can
leave out the =~ operator, and the above code can be
written simply as:
if (/under/)
{
print "We're talking about VYOM\n";
}
The variable $_ is default for many Perl operations and tends to
be used very heavily.
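As a rough illustration of how $_ is used by default, the sketch below relies on the fact that a foreach loop with no loop variable places each element in $_. The array names @sentences and @hits are invented for this example.

```perl
use strict;
use warnings;

my @sentences = (
    "Up and under",
    "Best winkles in Sunderland",
    "The quick brown fox",
);
my @hits;

# With no loop variable, foreach assigns each element to $_ in turn,
# so /under/ is tested against $_ without writing =~.
foreach (@sentences) {
    push @hits, $_ if /under/;
}

# The statement modifier form of foreach also sets $_.
print "$_\n" foreach @hits;
```

Only the first two sentences contain "under", so only they are collected.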
More about REs
There are many special characters in an RE, and it is these which
both give REs their power and make them look very complicated. It is
better to build up your use of REs slowly; creating them can be something
of an art form.
Here are some of the RE special characters and their meanings:
.       # Any single character except a newline
^       # The beginning of the line or string
$       # The end of the line or string
*       # Zero or more of the last character
+       # One or more of the last character
?       # Zero or one of the last character
Here are some example matches. Remember that each one must be enclosed in
/.../ slashes to be used.
t.e     # t followed by anything followed by e
        # This will match the
        #                 tre
        #                 tle
        # but not te
        #         tale
^f      # f at the beginning of a line
^ftp    # ftp at the beginning of a line
e$      # e at the end of a line
tle$    # tle at the end of a line
und*    # un followed by zero or more d characters
        # This will match un
        #                 und
        #                 undd
        #                 unddd (etc)
.*      # Any string without a newline. This is because
        # the . matches anything except a newline and
        # the * means zero or more of these.
^$      # A line with nothing in it.
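The example matches above can be checked with a few lines of Perl. This is a sketch only: matches() is a hypothetical helper written for this illustration, and the patterns are passed to it with qr//, Perl's standard operator for building a pattern as a value.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $string matches $pattern, else 0.
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ $pattern ? 1 : 0;
}

print matches("the",    qr/t.e/),  "\n";  # one character between t and e
print matches("te",     qr/t.e/),  "\n";  # nothing between t and e
print matches("tale",   qr/t.e/),  "\n";  # two characters between t and e
print matches("ftp",    qr/^ftp/), "\n";  # ftp at the beginning
print matches("little", qr/tle$/), "\n";  # tle at the end
print matches("un",     qr/und*/), "\n";  # un followed by zero d characters
print matches("",       qr/^$/),   "\n";  # an empty line
```

The second and third calls print 0 and the others print 1, in line with the comments in the table.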
Square brackets are used to match any one of the characters inside them. Inside the square
brackets a - means "between" and a ^ at the beginning means "not":
[qjk]       # Either q or j or k
[^qjk]      # Neither q nor j nor k
[a-z]       # Anything from a to z inclusive
[^a-z]      # Any character that is not a lower case letter
[a-zA-Z]    # Any letter
[a-z]+      # Any non-zero sequence of lower case letters
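A small sketch of character classes in use follows. It filters a list with Perl's built-in grep, which tests each element through $_; the word list and array names are made up for the example.

```perl
use strict;
use warnings;

my @words = ("quick", "brown", "Fox", "42");

# Words containing at least one of q, j or k.
my @has_qjk = grep { /[qjk]/ } @words;

# Words made up entirely of lower case letters (anchored at both ends).
my @lower_only = grep { /^[a-z]+$/ } @words;

# Words containing at least one character that is not a lower case letter.
my @has_other = grep { /[^a-z]/ } @words;

print "q, j or k:      @has_qjk\n";
print "lower case only: @lower_only\n";
print "something else:  @has_other\n";
```

Note that [^a-z] matches a single character that is not a lower case letter, so /[^a-z]/ succeeds on "Fox" because of the capital F.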
"or" is represented by the vertical bar "|" and parentheses (...) are used
to group things together:
jelly|cream # Either jelly or cream
(eg|le)gs   # Either eggs or legs
(da)+       # Either da or dada or dadada or...
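Alternation and grouping can be tried in the same way. In this sketch the word list is invented, and the ^ and $ anchors are added so that each pattern has to match the whole word.

```perl
use strict;
use warnings;

my @words = ("eggs", "legs", "pegs", "dada", "jelly");

# Either jelly or cream, anywhere in the word.
my @dessert = grep { /jelly|cream/ } @words;

# The group (eg|le) must be followed by gs.
my @eggs_or_legs = grep { /^(eg|le)gs$/ } @words;

# One or more repetitions of the group da.
my @da_repeats = grep { /^(da)+$/ } @words;

print "dessert:      @dessert\n";
print "eggs or legs: @eggs_or_legs\n";
print "da repeated:  @da_repeats\n";
```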
Here are some more special characters:
\n      # A newline
\t      # A tab
\w      # Any alphanumeric (word) character.
        # The same as [a-zA-Z0-9_]
\W      # Any non-word character.
        # The same as [^a-zA-Z0-9_]
\d      # Any digit. The same as [0-9]
\D      # Any non-digit. The same as [^0-9]
\s      # Any whitespace character: space,
        # tab, newline, etc
\S      # Any non-whitespace character
\b      # A word boundary, outside [] only
\B      # No word boundary
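The sketch below shows a few of these escapes at work, in particular the difference between \b (a word boundary) and \B (a position that is not a word boundary). The sample strings and variable names are assumptions chosen for illustration.

```perl
use strict;
use warnings;

# \d finds a digit anywhere in the string.
my $has_digit = ("Order 66 shipped" =~ /\d/) ? 1 : 0;

# \s finds whitespace; this string contains a tab.
my $has_space = ("a\tb" =~ /\s/) ? 1 : 0;

# \bcat\b needs a word boundary on both sides of cat.
my $whole_word  = ("the cat sat" =~ /\bcat\b/) ? 1 : 0;
my $inside_word = ("concatenate" =~ /\bcat\b/) ? 1 : 0;

# \Bcat\B needs cat to sit inside a longer word.
my $embedded = ("concatenate" =~ /\Bcat\B/) ? 1 : 0;

print "digit: $has_digit, whitespace: $has_space\n";
print "whole word: $whole_word, inside word: $inside_word, embedded: $embedded\n";
```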
Characters such as $, |, [, ), \, / and so on have special meanings
in regular expressions. If you want to match any of these literally, you
should precede it with a backslash, as shown below.
\|      # Vertical bar
\[      # An open square bracket
\)      # A closing parenthesis
\*      # An asterisk
\^      # A caret symbol
\/      # A slash
\\      # A backslash
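As a quick check of the escapes, the following sketch matches a few literal special characters. The sample strings are invented; the Windows-style path is written in single quotes so that Perl keeps its backslash as a literal character.

```perl
use strict;
use warnings;

my $expr = "cost: 5*3 [approx]";

# Without the backslash, * and [ would be treated as RE special characters.
my $has_star    = ($expr =~ /\*/) ? 1 : 0;
my $has_bracket = ($expr =~ /\[/) ? 1 : 0;

# A slash must be escaped because it also ends the /.../ pattern.
my $has_slash = ("usr/local" =~ /\//) ? 1 : 0;

# A literal backslash is written as two backslashes in the pattern.
my $has_backslash = ('C:\temp' =~ /\\/) ? 1 : 0;

print "star: $has_star, bracket: $has_bracket, ";
print "slash: $has_slash, backslash: $has_backslash\n";
```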
Some RE examples
As mentioned earlier, it is probably best to build up your use of regular expressions
slowly. Here are some examples.
[01]            # Either "0" or "1"
\/0             # A division by zero: "/0"
\/ 0            # A division by zero with a space: "/ 0"
\/\s0           # A division by zero with a whitespace:
                # "/ 0" where the space may be a tab etc.
\/ *0           # A division by zero with possibly some
                # spaces: "/0" or "/ 0" or "/  0" etc.
\/\s*0          # A division by zero with possibly some
                # whitespace.
\/\s*0\.0*      # As the previous one, but with decimal
                # point and maybe some 0s after it. Accepts
                # "/0." and "/0.0" and "/0.00" etc and
                # "/ 0." and "/ 0.0" and "/ 0.00" etc.
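To see the last two patterns at work, here is a small sketch that scans some invented lines of text for a division by zero. The sample lines and array names are assumptions made for the example.

```perl
use strict;
use warnings;

my @lines = (
    'x = y/0;',
    'x = y / 0.00;',
    'x = y/ 10;',
    'x = y / 2;',
);

# A slash, optional whitespace, then a 0.
my @div_zero = grep { /\/\s*0/ } @lines;

# As above, but the 0 must be followed by a decimal point
# and then zero or more further 0s.
my @div_zero_decimal = grep { /\/\s*0\.0*/ } @lines;

print "possible division by zero:\n";
print "  $_\n" foreach @div_zero;
print "with a decimal point:\n";
print "  $_\n" foreach @div_zero_decimal;
```

These patterns only look at the text, so /\/\s*0/ would also flag a harmless line such as "y / 0.5"; a stricter pattern would be needed to rule that out.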