String matching with regular expression in PHP
- A regular expression is a sequence of characters that describes a search pattern in a text string.
- Used in programs to match loosely defined patterns, i.e., text that follows a general shape rather than one exact value.
- Helps in fetching the required strings based on a pattern definition.
- Performs operations on strings such as searching for one string inside another, replacing one string with another, and splitting a string into multiple chunks.
- Uses operators such as +, -, and ^ to build complex expressions.
- By default, regular expressions are case-sensitive.
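The case-sensitivity point can be checked with a minimal sketch. The sample text is hypothetical; preg_match() and the i modifier are the ones covered later in this article.

```php
<?php
// Hypothetical sample text, chosen only for illustration.
$text = "Learning PHP is fun";

// Case-sensitive by default: lowercase "php" does not match "PHP".
$sensitive = preg_match('/php/', $text);

// The i modifier after the closing delimiter makes the search case-insensitive.
$insensitive = preg_match('/php/i', $text);

echo $sensitive, "\n";   // prints 0
echo $insensitive, "\n"; // prints 1
```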
Advantages and uses of Regular expressions:
Regular expressions are widely used in application programming.
- Help in the validation of text strings that are of interest to the programmer.
- A powerful tool for analyzing, searching a pattern, and modifying the text data.
- Helps in searching specific string pattern and extracting matching results in a flexible manner.
- Helps in searching text files for a defined sequence of characters for further analysis or data manipulation.
- The built-in regex functions offer easy and simple solutions for identifying patterns.
- Saves a lot of development time in search of a specific string pattern.
- Helps in validating important user information such as email addresses, phone numbers, and IP addresses.
- Helps in highlighting special keywords in a file based on search results or input.
- Helps in identifying specific template tags and replacing those data with the actual data as per the requirement.
- Very useful for the creation of an HTML template system recognizing tags.
- Mostly used for browser detection, spam filtration, checking password strength, and form validations.
- Meta-characters allow us to create more complex patterns.
Operators in Regular Expression
Operator | Description |
^ | It indicates the start of the string. |
$ | It indicates the end of the string. |
. | It indicates any single character (except a newline, by default). |
() | It indicates a group of expressions. |
[] | It finds any one of the listed characters, e.g., [abc] means a, b, or c. |
[^] | It finds any character that is not listed, e.g., [^xyz] means NOT x, y, or z. |
- | Inside brackets, it defines a range between two elements, e.g., [a-z] means a through z. |
| (vertical bar) | It is a logical OR operator, which is used between the elements, e.g., a|b means either a OR b. |
? | It indicates zero or one of the preceding character or element range. |
* | It indicates zero or more of the preceding character or element range. |
+ | It indicates one or more of the preceding character or element range. |
{n} | It denotes exactly n times of the preceding character or element range, e.g., n{3} means exactly 3 of n. |
{n,} | It denotes at least n times of the preceding character or element range, e.g., n{2,} means 2 or more of n. |
{n,m} | It indicates at least n, but not more than m times, e.g., n{3,6} means 3 to 6 of n. |
\ | It denotes the escape character. |
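To see several of these operators in action, here is a small sketch that runs one pattern per operator through preg_match() (described later in this article). The patterns and test strings are hypothetical examples.

```php
<?php
// Each entry pairs a pattern with a hypothetical test string.
$checks = [
    ['/^cat/',      'catalog'], // ^ anchors the match at the start
    ['/cat$/',      'bobcat'],  // $ anchors the match at the end
    ['/c.t/',       'cut'],     // . stands for any single character
    ['/gr(a|e)y/',  'grey'],    // () groups, | chooses between alternatives
    ['/[^0-9]/',    '123'],     // [^...] needs a non-digit, so this fails
    ['/^colou?r$/', 'color'],   // ? means zero or one "u"
    ['/^go+gle$/',  'ggle'],    // + needs at least one "o", so this fails
    ['/^a{2,3}$/',  'aaa'],     // {n,m} allows two or three "a"
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $results[] = preg_match($pattern, $subject);
}

echo implode(' ', $results), "\n"; // prints 1 1 1 1 0 1 0 1
```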
Special Character | Description |
\n | It indicates a new line. |
\r | It indicates a carriage return. |
\t | It represents a tab. |
\v | It represents a vertical tab. |
\f | It represents a form feed. |
\xxx | It represents the character with octal code xxx. |
\xhh | It denotes the character with hexadecimal code hh. |
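The following sketch shows a few of these escapes inside patterns. The sample text is hypothetical. Note one assumption about quoting: the subject uses double quotes so that PHP itself turns \t, \r, and \n into real characters, while the patterns use single quotes so that the regex engine interprets the escapes.

```php
<?php
// Double quotes: PHP converts \t, \r and \n into real control characters.
$text = "name\tvalue\r\nnext line";

// Single-quoted patterns pass the backslash sequences on to the regex engine.
$hasTab  = preg_match('/\t/', $text);     // tab
$hasCrLf = preg_match('/\r\n/', $text);   // carriage return + newline
$hasA    = preg_match('/\x41/', 'ABC');   // \x41 is the hex code for "A"

// Collapse every run of tabs and line breaks into a single space.
$flat = preg_replace('/[\r\n\t]+/', ' ', $text);

echo $hasTab, $hasCrLf, $hasA, "\n"; // prints 111
echo $flat, "\n";                    // prints name value next line
```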
PHP has provided two sets of regular expression functions:
- POSIX Regular Expression
- PERL Style Regular Expression
1. POSIX (The Portable Operating System Interface) Regular Expression
The structure of POSIX regular expression is similar to the typical arithmetic expression: several operators/elements are combined together to form more complex expressions.
The simplest regular expression is one that matches a single character inside the string.
Brackets [] are used to match any one character from the range or set placed inside them.
Expression | Description |
[0-9] | It matches any decimal digit 0 to 9. |
[a-z] | It matches any lowercase character from a to z. |
[A-Z] | It matches any uppercase character from A to Z. |
[a-zA-Z] | It matches any letter, lowercase a to z or uppercase A to Z. (A range written as [a-Z] is invalid because a comes after Z in ASCII order.) |
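As an illustration of bracket expressions, the sketch below uses them with the Perl-style functions covered later in this article (the POSIX functions are no longer available in current PHP). The phone number and sentence are made-up sample data.

```php
<?php
// Keep only the digits of a hypothetical phone number: [^0-9] is "anything but a digit".
$digits = preg_replace('/[^0-9]/', '', '(555) 123-4567');

// Count uppercase letters with [A-Z].
$upperCount = preg_match_all('/[A-Z]/', 'Hello World From PHP');

// [a-z] alone rejects the capital "R"; [a-zA-Z] accepts both cases.
$lowerOnly = preg_match('/^[a-z]+$/', 'Regex');
$anyLetter = preg_match('/^[a-zA-Z]+$/', 'Regex');

echo $digits, "\n";     // prints 5551234567
echo $upperCount, "\n"; // prints 6
echo $lowerOnly, "\n";  // prints 0
echo $anyLetter, "\n";  // prints 1
```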
Quantifiers
A quantifier is a special character that specifies how many times the preceding single character or bracketed character sequence may occur.
Every special character has a specific meaning.
The symbols +, *, ?, and {int range} follow the character sequence they apply to. The anchor $ also follows a sequence, while ^ comes before it.
Expression | Description |
p+ | It matches any string that contains at least one p. |
p* | It matches any string that contains zero or more p's. |
p? | It matches any string that has zero or one p's. |
p{N} | It matches any string that has a sequence of N p's. |
p{2,3} | It matches any string that has a sequence of two or three p's. |
p{2,} | It matches any string that contains a sequence of at least two p's. |
p$ | It matches any string that contains p at the end of it. |
^p | It matches any string that has p at the start of it. |
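A short sketch can show what each quantifier actually captures. It uses preg_match() with its optional third argument, which receives the matched text; the sample words are hypothetical.

```php
<?php
// p+ grabs the longest run of one or more p's.
preg_match('/p+/', 'apple', $m);
$plus = $m[0];

// p{2,3} takes at most three p's, even when more are available.
preg_match('/p{2,3}/', 'happpppy', $m);
$range = $m[0];

// p* is satisfied by zero p's, so it matches the empty string at position 0.
$starResult = preg_match('/p*/', 'apple', $m);
$star = $m[0];

// Anchors: "apple" does not start with p, but it does end with e.
$startsWithP = preg_match('/^p/', 'apple');
$endsWithE   = preg_match('/e$/', 'apple');

echo $plus, "\n";                    // prints pp
echo $range, "\n";                   // prints ppp
echo $starResult, "\n";              // prints 1
echo $startsWithP, $endsWithE, "\n"; // prints 01
```

The p* case is worth noting: because zero occurrences are allowed, the pattern succeeds on every string, which is rarely what a validation check intends.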
PHP Regexp POSIX Function
PHP provides seven functions to search strings using POSIX-style regular expression -
Function | Description |
ereg() | It searches a string pattern inside another string and returns true if the pattern matches otherwise return false. |
ereg_replace() | It searches a string pattern inside the other string and replaces the matching text with the replacement string. |
eregi() | It searches for a pattern inside the other string and returns the length of matched string if found otherwise returns false. It is a case insensitive function. |
eregi_replace() | This function works same as ereg_replace() function. The only difference is that the search for pattern of this function is case insensitive. |
split() | The split() function divides the string into an array by a regular expression. |
spliti() | It is similar to the split() function as it also divides a string into an array by regular expression, but the match is case insensitive. |
sql_regcase() | It creates a regular expression for case insensitive match and returns a valid regular expression that will match string. |
Note: The above functions were deprecated in PHP 5.3.0 and removed in PHP 7.0.0. Use the Perl-style preg functions described below instead.
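Since the POSIX functions no longer exist in current PHP, old code has to be moved to the Perl-style functions. The following is a rough migration sketch, not an exhaustive guide: the old calls appear only in comments, and the sample inputs are hypothetical. The main change is that the pattern gains delimiters, and case-insensitive variants become the i modifier.

```php
<?php
// Old: ereg('^[0-9]+$', $input)
// New: wrap the pattern in delimiters and call preg_match().
$input = '12345';
$isNumeric = preg_match('/^[0-9]+$/', $input) === 1;

// Old: eregi('^php', $title)
// New: the i modifier replaces the separate case-insensitive function.
$startsWithPhp = preg_match('/^php/i', 'PHP rocks') === 1;

// Old: ereg_replace('[0-9]+', '#', $label)
$masked = preg_replace('/[0-9]+/', '#', 'Room 101, floor 3');

// Old: split('[,;]', $list)
$parts = preg_split('/[,;]/', 'a,b;c');

var_dump($isNumeric);      // bool(true)
var_dump($startsWithPhp);  // bool(true)
echo $masked, "\n";        // prints Room #, floor #
echo implode('|', $parts), "\n"; // prints a|b|c
```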
PERL Style Regular Expression
Perl-style regular expressions are similar to POSIX ones.
Most POSIX syntax, including the quantifiers introduced in the POSIX section, can also be used in PERL-style regular expressions. The main difference is that a PERL-style pattern must be enclosed in delimiters, usually forward slashes (/).
Meta-characters
A meta-character is an alphabetical character preceded by a backslash, which gives a special meaning to the combination.
For example, the '\d' meta-character can be used to search for large money sums: /([\d]+)000/. Here \d matches any numeric character.
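The money-sum pattern can be tried out with a short sketch. The sentences are hypothetical sample data; the third argument of preg_match() receives the full match in index 0 and the bracketed group in index 1.

```php
<?php
// Digits followed by three zeros: the group captures the part before "000".
$found = preg_match('/(\d+)000/', 'The jackpot is 25000 dollars', $m);
$whole = $m[0];     // the complete match
$thousands = $m[1]; // the digits captured by the group

// A sum without three trailing zeros does not match.
$small = preg_match('/(\d+)000/', 'Only 750 dollars');

echo $found, "\n";     // prints 1
echo $whole, "\n";     // prints 25000
echo $thousands, "\n"; // prints 25
echo $small, "\n";     // prints 0
```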
The list of meta-characters that can be used in PERL Style Regular Expressions:
Character | Description |
. | Matches a single character. |
\s | It matches a whitespace character like space, newline, tab. |
\S | Matches a non-whitespace character. |
\d | It matches any digit from 0 to 9. |
\D | Matches a non-digit character. |
\w | Matches a word character such as a-z, A-Z, 0-9, _ |
\W | Matches a non-word character. |
[aeiou] | It matches any single character in the given set. |
[^aeiou] | It matches any single character except the given set. |
(foo|baz|bar) | Matches any of the alternatives specified. |
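A few of the meta-characters from the table can be exercised as follows. This is only a sketch with made-up sample strings.

```php
<?php
// \s+ matches any run of spaces, tabs and newlines.
$clean = preg_replace('/\s+/', ' ', "too   many\t\nspaces");

// \w+ matches runs of letters, digits and underscores.
preg_match_all('/\w+/', 'Hello, World_1!', $found);
$words = $found[0];

// [aeiou] matches any one vowel from the set.
$masked = preg_replace('/[aeiou]/', '*', 'regular');

// (foo|baz|bar) accepts exactly one of the listed alternatives.
$isKnown = preg_match('/^(foo|baz|bar)$/', 'baz');

echo $clean, "\n";               // prints too many spaces
echo implode(',', $words), "\n"; // prints Hello,World_1
echo $masked, "\n";              // prints r*g*l*r
echo $isKnown, "\n";             // prints 1
```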
Modifiers
There are several modifiers available, which make working with regular expressions much easier.
The list of modifiers used in PERL Style Regular Expressions:
Modifier | Description |
i | Makes the search case insensitive. |
m | It specifies that if a string has carriage return or newline characters, the $ and ^ operators will match against a newline boundary rather than a string boundary. |
o | Evaluates the expression only once. (Perl only; PHP's preg functions do not accept it.) |
s | It allows the use of . (dot) to match a newline character. |
x | This modifier allows us to use whitespace in the expression for clarity. |
g | It globally searches all matches. (Perl only; in PHP use preg_match_all() instead.) |
cg | It allows the search to continue even after the global match fails. (Perl only; PHP's preg functions do not accept it.) |
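The effect of the m, s, and x modifiers is easiest to see by running the same pattern with and without them. The sketch below uses hypothetical sample strings and only modifiers that PHP's preg functions accept.

```php
<?php
$lines = "one\ntwo\nthree";

// Without m, ^ matches only at the very start of the string.
$withoutM = preg_match_all('/^\w+/', $lines);
// With m, ^ also matches right after each newline.
$withM = preg_match_all('/^\w+/m', $lines);

// Without s, the dot refuses to match a newline; with s it does.
$withoutS = preg_match('/a.b/', "a\nb");
$withS = preg_match('/a.b/s', "a\nb");

// With x, whitespace inside the pattern is ignored, so it can be spaced out.
$withX = preg_match('/ \d{3} - \d{4} /x', '555-1234');

echo $withoutM, ' ', $withM, "\n"; // prints 1 3
echo $withoutS, ' ', $withS, "\n"; // prints 0 1
echo $withX, "\n";                 // prints 1
```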
PHP Regexp PERL Compatible Functions (Pattern matching in PHP, Replacing text, and Splitting a string)
PHP currently provides the following functions to search strings using Perl-compatible regular expressions -
Function | Description |
preg_match() | Pattern matching in PHP: this function searches for the pattern inside the string and returns 1 if the pattern exists, otherwise 0 (or false if an error occurs). |
preg_match_all() | This function matches all the occurrences of the pattern in the string. |
preg_replace() | Replacing text: this function works like ereg_replace(), but takes a Perl-style pattern, and the replacement string may refer to captured groups (e.g., $1). |
preg_split() | Splitting a string: this function works like split(), except that it accepts a Perl-style regular expression as the pattern. It divides the string by a regular expression. |
preg_grep() | The preg_grep() function searches all the elements of the input array and returns the elements that match the regular expression pattern. |
preg_quote() | Quotes the regular expression characters. (Takes str and puts a backslash in front of every character that is part of the regular expression syntax.) |
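Besides reporting whether a pattern matched, preg_match() can hand back the matched text through its optional third argument. Here is a minimal sketch with a hypothetical date string.

```php
<?php
// Three bracketed groups capture year, month and day.
$pattern = '/(\d{4})-(\d{2})-(\d{2})/';
$result = preg_match($pattern, 'Released on 2015-12-03', $m);

// $m[0] holds the whole match; $m[1], $m[2], $m[3] hold the groups in order.
echo $result, "\n"; // prints 1
echo $m[0], "\n";   // prints 2015-12-03
echo $m[1], "\n";   // prints 2015
echo $m[2], "\n";   // prints 12
echo $m[3], "\n";   // prints 03
```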
Example:-
Using preg_match()
<?php
$str = "Visit SJKpgm";
$pattern = "/sjkpgm/i";
echo preg_match($pattern, $str);
?>
// Outputs
1
Using preg_match_all()
<?php
$str = "The rain in SPAIN falls mainly on the plains.";
$pattern = "/ain/i";
echo preg_match_all($pattern, $str);
?>
// Outputs
4
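The count is not the only result available. As a sketch using the same sentence, passing a third argument to preg_match_all() collects the matched substrings themselves.

```php
<?php
$str = "The rain in SPAIN falls mainly on the plains.";

// The third argument is filled with the matches; index 0 lists the full matches.
$count = preg_match_all("/ain/i", $str, $matches);

echo $count, "\n";                     // prints 4
echo implode(',', $matches[0]), "\n";  // prints ain,AIN,ain,ain
```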
Using preg_replace()
<?php
$str = "Visit Microsoft!";
$pattern = "/microsoft/i";
echo preg_replace($pattern, "Sjkpgm", $str);
?>
// Outputs
Visit Sjkpgm!
<?php
$pattern = "/\s/";
$replacement = "-";
$text = "Earth revolves around\nthe\tSun";
// Replace spaces, newlines and tabs
echo preg_replace($pattern, $replacement, $text);
echo "<br>";
// Replace only spaces
echo str_replace(" ", "-", $text);
?>
Output:-
Earth-revolves-around-the-Sun
Earth-revolves-around the Sun
Using preg_split()
<?php
$pattern = "/[\s,]+/";
$text = "My favourite colors are red, green and blue";
$parts = preg_split($pattern, $text);
// Loop through parts array and display substrings
foreach($parts as $part){
echo $part . "<br>";
}
?>
Output:-
My
favourite
colors
are
red
green
and
blue
Position Anchors (preg_grep())
Sometimes a pattern should match only at the beginning or end of a line, word, or string.
Anchors are used for this.
Two common anchors are the caret (^), which represents the start of the string, and the dollar sign ($), which represents the end of the string.
RegExp | What it Does |
^p | Matches the letter p at the beginning of a line. |
p$ | Matches the letter p at the end of a line. |
PHP preg_grep() function:
<?php
$pattern = "/^J/";
$names = array("Jhon Carter", "Clark Kent", "John Rambo");
$matches = preg_grep($pattern, $names);
// Loop through matches array and display matched names
foreach($matches as $match){
echo $match . "<br>";
}
?>
Output:-
Jhon Carter
John Rambo
Using preg_quote()
<?php
$keywords = '$40 for a g3/400';
$keywords = preg_quote($keywords, '/');
echo $keywords;
?>
Output:-
\$40 for a g3\/400
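The usual reason to call preg_quote() is to search for user-supplied text literally. The sketch below compares an unquoted and a quoted search; the keyword and sentence are hypothetical.

```php
<?php
$keyword = '$40 (approx.)';
$text = 'It costs $40 (approx.) today';

// Unquoted: $ is read as the end-of-string anchor, so nothing can match.
$raw = preg_match('/' . $keyword . '/', $text);

// Quoted: every special character is escaped and matched literally.
$quoted = preg_quote($keyword, '/');
$safe = preg_match('/' . $quoted . '/', $text);

echo $raw, "\n";    // prints 0
echo $quoted, "\n"; // prints \$40 \(approx\.\)
echo $safe, "\n";   // prints 1
```

The second argument of preg_quote() names the delimiter, so that a slash inside the keyword is escaped as well.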