String matching with regular expression in PHP

  • A regular expression is a sequence of characters that describes a search pattern for text strings.
  • It is used in programs to match loosely defined patterns, that is, text that follows a rule rather than one fixed, literal string.
  • Helps in fetching the required strings based on a pattern definition.
  • Supports operations such as searching for a string inside another string, replacing one string with another, and splitting a string into multiple chunks.
  • Operators such as +, -, and ^ are combined to create complex expressions, much as arithmetic operators are, although their meanings differ.
  • By default, regular expressions are case-sensitive.

Advantages and uses of Regular expressions:

Regular expressions are widely used in application programming.
  • Help in the validation of text strings of interest to the programmer.
  • A powerful tool for analyzing, searching, and modifying text data.
  • Help in searching for a specific string pattern and extracting the matching results in a flexible manner.
  • Help in searching text files for a defined sequence of characters for further analysis or data manipulation.
  • PHP's built-in regex functions provide easy and simple solutions for identifying patterns.
  • Saves a lot of development time in search of a specific string pattern.
  • Helps in important user information validations like email address, phone numbers, and IP address.
  • Helps in highlighting special keywords in a file based on search results or input.
  • Helps in identifying specific template tags and replacing those data with the actual data as per the requirement.
  • Very useful for the creation of an HTML template system recognizing tags.
  • Mostly used for browser detection, spam filtration, checking password strength, and form validations.
  • Meta-characters allow us to create more complex patterns.

Operators in Regular Expression

Operator    Description
^           Indicates the start of the string.
$           Indicates the end of the string.
.           Matches any single character.
()          Groups expressions.
[]          Matches any one character from a set, e.g., [abc] means a, b, or c.
[^]         Matches any one character not in the set, e.g., [^xyz] means NOT x, y, or z.
-           Defines a range inside brackets, e.g., [a-z] means a through z.
|           Logical OR between elements, e.g., a|b means either a OR b.
?           Zero or one of the preceding character or element.
*           Zero or more of the preceding character or element.
+           One or more of the preceding character or element.
{n}         Exactly n of the preceding character or element, e.g., n{3} means exactly three n's.
{n,}        At least n of the preceding character or element, e.g., n{2,} means two or more n's.
{n,m}       At least n but not more than m of the preceding character or element, e.g., n{3,6} means 3 to 6 of n.
\           The escape character.
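To make the operators above concrete, here is a minimal sketch that tries a few of them with preg_match(). The helper name pattern_matches() and the sample strings are made up for illustration and are not part of PHP.

```php
<?php
// Hypothetical helper: true when $pattern matches $subject.
function pattern_matches(string $pattern, string $subject): bool
{
    return preg_match($pattern, $subject) === 1;
}

var_dump(pattern_matches('/^cat$/', 'cat'));        // true: the whole string is "cat"
var_dump(pattern_matches('/^c.t$/', 'cot'));        // true: . stands for any single character
var_dump(pattern_matches('/^(ab)+$/', 'ababab'));   // true: + means one or more of the group
var_dump(pattern_matches('/^colou?r$/', 'color'));  // true: ? makes the "u" optional
var_dump(pattern_matches('/^\d{3}$/', '1234'));     // false: {3} means exactly three digits
var_dump(pattern_matches('/^\d{2,}$/', '1234'));    // true: {2,} means two or more digits
var_dump(pattern_matches('/^[^xyz]$/', 'x'));       // false: [^xyz] excludes x, y and z
```

The patterns are written in single quotes so that PHP passes the backslashes through to the regex engine unchanged.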

Special character class in Regular Expression

Special Character    Description
\n                   A new line.
\r                   A carriage return.
\t                   A tab.
\v                   A vertical tab.
\f                   A form feed.
\xxx                 The character with octal code xxx.
\xhh                 The character with hexadecimal code hh.
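As a small illustration of these escape sequences, the sketch below looks for a tab, a newline, a hexadecimal character code, and a carriage return in a sample string. The sample text is invented for this example.

```php
<?php
// Double quotes: PHP itself inserts a real tab and a real newline here.
$text = "name\tvalue\nA";

// Single-quoted patterns hand the backslash sequences to the regex engine.
var_dump(preg_match('/\t/', $text));    // int(1): the text contains a tab
var_dump(preg_match('/\n/', $text));    // int(1): the text contains a newline
var_dump(preg_match('/\x41/', $text));  // int(1): \x41 is the hexadecimal code for "A"
var_dump(preg_match('/\r/', $text));    // int(0): there is no carriage return
```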

PHP has offered two sets of regular expression functions:

  1. POSIX Regular Expression
  2. PERL Style Regular Expression

1. POSIX (Portable Operating System Interface) Regular Expression

The structure of a POSIX regular expression is similar to that of a typical arithmetic expression: several operators and elements are combined to form more complex expressions.

The simplest regular expression is one that matches a single character inside the string.

Brackets [] are used to match any one character from the range written inside them.

Expression    Description
[0-9]         Matches any decimal digit from 0 to 9.
[a-z]         Matches any lowercase character from a to z.
[A-Z]         Matches any uppercase character from A to Z.
[a-zA-Z]      Matches any character from lowercase a to uppercase Z.
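The bracket ranges can be tried out with a short sketch like the one below. It uses preg_match(), since the POSIX functions are no longer available in current PHP, and classify_char() is a hypothetical helper written only for this example.

```php
<?php
// Hypothetical helper that names the bracket range a single character falls into.
function classify_char(string $char): string
{
    if (preg_match('/^[0-9]$/', $char)) {
        return 'digit';
    }
    if (preg_match('/^[a-z]$/', $char)) {
        return 'lowercase letter';
    }
    if (preg_match('/^[A-Z]$/', $char)) {
        return 'uppercase letter';
    }
    return 'other';
}

// Print one line per sample character.
foreach (['7', 'k', 'K', '#'] as $char) {
    echo $char . ' is ' . classify_char($char) . "\n";
}
```

Two ranges can also be combined inside one pair of brackets, so [a-zA-Z] accepts a letter of either case.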

Quantifiers

A special character can denote the frequency or position of bracketed character sequences and single characters.

Each of these special characters has a specific meaning.

The symbols +, *, ?, {int range}, and $ all follow a character sequence, while ^ precedes it.

Expression    Description
p+            Matches any string that contains at least one p.
p*            Matches any string that contains zero or more p's.
p?            Matches any string that contains zero or one p.
p{N}          Matches any string that contains a sequence of N p's.
p{2,3}        Matches any string that contains a sequence of two or three p's.
p{2,}         Matches any string that contains a sequence of at least two p's.
p$            Matches any string with p at the end of it.
^p            Matches any string with p at the start of it.
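One way to see what each quantifier picks up is to collect the matches from a sample string. The sketch below does this with preg_match_all(); the sample text and variable names are invented for illustration.

```php
<?php
$text = 'a p pp ppp';

// p+ : every run of one or more p's
preg_match_all('/p+/', $text, $oneOrMore);
print_r($oneOrMore[0]);     // p, pp, ppp

// p{2,3} : runs of two or three p's
preg_match_all('/p{2,3}/', $text, $twoOrThree);
print_r($twoOrThree[0]);    // pp, ppp

// p{2,} : runs of at least two p's
preg_match_all('/p{2,}/', $text, $atLeastTwo);
print_r($atLeastTwo[0]);    // pp, ppp

var_dump(preg_match('/^p/', $text));  // int(0): the text starts with "a"
var_dump(preg_match('/p$/', $text));  // int(1): the text ends with "p"
var_dump(preg_match('/p*/', 'xyz'));  // int(1): p* is satisfied by zero p's
```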

PHP Regexp POSIX Function

PHP provided seven functions to search strings using POSIX-style regular expressions -

Function           Description
ereg()             Searches for a pattern inside another string and returns true if the pattern matches, otherwise false.
ereg_replace()     Searches for a pattern inside another string and replaces the matching text with the replacement string.
eregi()            Searches for a pattern inside another string and returns the length of the matched string if found, otherwise false. The search is case insensitive.
eregi_replace()    Works the same as ereg_replace(), except that the pattern search is case insensitive.
split()            Divides a string into an array by a regular expression.
spliti()           Works like split(), dividing a string into an array by a regular expression, but ignores case.
sql_regcase()      Takes a string and returns a valid regular expression that matches that string case insensitively.

Note: The above functions were deprecated in PHP 5.3.0 and removed in PHP 7.0.0.
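Because the POSIX functions no longer exist in current PHP, older code has to be moved to the preg_ family. The following is a minimal sketch of commonly used replacements, assuming simple patterns that need no escaping; the sample text is invented, and the old calls appear only in comments.

```php
<?php
$text = 'Visit SJKpgm today';

// ereg('SJK', $text) can be written as:
$found = preg_match('/SJK/', $text) === 1;

// eregi('sjk', $text) becomes preg_match() with the i modifier:
$foundAnyCase = preg_match('/sjk/i', $text) === 1;

// ereg_replace('today', 'now', $text) becomes:
$replaced = preg_replace('/today/', 'now', $text);

// split(' ', $text) becomes:
$words = preg_split('/ /', $text);

var_dump($found, $foundAnyCase);  // bool(true) bool(true)
echo $replaced . "\n";            // Visit SJKpgm now
print_r($words);                  // Visit, SJKpgm, today
```

The main change is that every preg_ pattern needs delimiters, here forward slashes, around it.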

PERL Style Regular Expression

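Perl-style regular expressions are very similar to their POSIX counterparts.

POSIX syntax can be used almost interchangeably with the Perl-style regular expression functions, and the quantifiers introduced in the POSIX section also work in Perl-style regular expressions. The main visible difference is that a Perl-style pattern must be enclosed in delimiters, usually forward slashes, as in /pattern/.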

Meta-characters

A meta-character is an alphabetical character preceded by a backslash, which gives the combination a special meaning.

For example, the '\d' meta-character can be used to search for large money sums: /([\d]+)000/. Here \d matches a string of numeric characters.
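The money-sum pattern can be tried out as follows. This is only a sketch; the sample sentence is invented, and the brackets around \d are dropped because they are optional around a single meta-character.

```php
<?php
$sentence = 'The prize is 25000 dollars, the fee is 150 dollars.';

// (\d+) captures the digits that come before a literal "000".
if (preg_match('/(\d+)000/', $sentence, $match)) {
    echo $match[0] . "\n";  // 25000, the whole match
    echo $match[1] . "\n";  // 25, the captured digits
}
```

The number 150 is skipped because it is not followed by three zeros.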

The list of meta-characters that can be used in PERL Style Regular Expressions –

Character        Description
.                Matches a single character.
\s               Matches a whitespace character such as a space, newline, or tab.
\S               Matches a non-whitespace character.
\d               Matches any digit from 0 to 9.
\D               Matches a non-digit character.
\w               Matches a word character: a-z, A-Z, 0-9, or _.
\W               Matches a non-word character.
[aeiou]          Matches any single character in the given set.
[^aeiou]         Matches any single character except those in the given set.
(foo|baz|bar)    Matches any of the alternatives specified.
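A few of these meta-characters in action, as a rough sketch. The phone number and the other sample strings are made up for this example.

```php
<?php
// \D matches every non-digit, so removing those leaves only the digits.
$digits = preg_replace('/\D/', '', '+1 (555) 010-9999');
echo $digits . "\n";      // 15550109999

// \s+ matches a run of whitespace; each run is collapsed into one space.
$tidy = preg_replace('/\s+/', ' ', "too   many\t\tspaces\nhere");
echo $tidy . "\n";        // too many spaces here

// \w+ matches runs of word characters (letters, digits, underscore).
preg_match_all('/\w+/', 'user_1 logged-in', $words);
print_r($words[0]);       // user_1, logged, in

// Alternation: the whole string must be one of the three words.
var_dump(preg_match('/^(foo|baz|bar)$/', 'baz'));  // int(1)
```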

Modifiers

Several modifiers are available that make working with regular expressions much easier.

The list of modifiers used in PERL Style Regular Expressions –

Modifier    Description
i           Makes the search case insensitive.
m           If the string contains newline or carriage return characters, ^ and $ match at line boundaries rather than only at the string boundaries.
o           Evaluates the expression only once (Perl only; PHP's preg functions do not accept it).
s           Allows the . (dot) to match a newline character.
x           Allows whitespace in the expression for clarity.
g           Globally finds all matches (Perl only; in PHP, use preg_match_all() instead).
cg          Allows the search to continue even after a global match fails (Perl only; PHP's preg functions do not accept it).
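The sketch below shows the i, m, s, and x modifiers with PHP's preg functions. PHP has no g modifier; preg_match_all() plays that role. The sample strings are invented for illustration.

```php
<?php
$lines = "one\ntwo\nthree";

// i ignores case.
var_dump(preg_match('/php/i', 'PHP'));        // int(1)

// Without m, ^ matches only at the very start of the string.
var_dump(preg_match_all('/^\w+/', $lines));   // int(1)
// With m, ^ also matches after each newline.
var_dump(preg_match_all('/^\w+/m', $lines));  // int(3)

// The dot does not match a newline unless s is set.
var_dump(preg_match('/one.two/', $lines));    // int(0)
var_dump(preg_match('/one.two/s', $lines));   // int(1)

// x lets the pattern contain whitespace for readability.
var_dump(preg_match('/ (\d{4}) - (\d{2}) /x', '2024-05', $date));  // int(1)
echo $date[1] . ' ' . $date[2] . "\n";        // 2024 05
```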

PHP Regexp PERL Compatible Functions (Pattern matching in PHP, Replacing text, and Splitting a string)

PHP currently provides the following functions to search strings using Perl-compatible regular expressions -

Function            Description
preg_match()        Pattern matching in PHP. Searches for the pattern inside the string and returns 1 if the pattern matches, otherwise 0.
preg_match_all()    Matches all the occurrences of the pattern in the string.
preg_replace()      Replacing text. Searches a string for a pattern and replaces the matching text; it is the Perl-compatible counterpart of ereg_replace().
preg_split()        Splitting a string. Works like split(), dividing the string by a regular expression, except that the pattern is a Perl-compatible regular expression.
preg_grep()         Searches all the elements of the input array and returns the elements that match the regular expression pattern.
preg_quote()        Quotes regular expression characters: takes a string and puts a backslash in front of every character that is part of the regular expression syntax.

Example:-

Using preg_match()

<?php
$str = "Visit SJKpgm";
$pattern = "/sjkpgm/i";
echo preg_match($pattern, $str); 
?>
Output:-
1

Using preg_match_all()
<?php
$str = "The rain in SPAIN falls mainly on the plains.";
$pattern = "/ain/i";
echo preg_match_all($pattern, $str); 
?>
Output:-
4
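The call above only echoes how many matches were found. To see the matched text itself, a third argument can be passed to preg_match_all(); the variable names $count and $matches below are chosen only for this sketch.

```php
<?php
$str = "The rain in SPAIN falls mainly on the plains.";

// The third argument receives the matches; index 0 holds the full matches.
$count = preg_match_all('/ain/i', $str, $matches);

echo $count . "\n";    // 4
print_r($matches[0]);  // ain, AIN, ain, ain
```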

Using preg_replace()

<?php
$str = "Visit Microsoft!";
$pattern = "/microsoft/i";
echo preg_replace($pattern, "Sjkpgm", $str); 
?>
Output:-
Visit Sjkpgm!

<?php
$pattern = "/\s/";
$replacement = "-";
$text = "Earth revolves around\nthe\tSun";
// Replace spaces, newlines and tabs
echo preg_replace($pattern, $replacement, $text);
echo "<br>";
// Replace only spaces
echo str_replace(" ", "-", $text);
?>
Output (as rendered in a browser):-
Earth-revolves-around-the-Sun
Earth-revolves-around the Sun

Using preg_split()

<?php
$pattern = "/[\s,]+/";
$text = "My favourite colors are red, green and blue";
$parts = preg_split($pattern, $text);
// Loop through parts array and display substrings
foreach($parts as $part){
    echo $part . "<br>";
}
?>
Output:-
My
favourite
colors
are
red
green
and
blue
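When the input starts or ends with a delimiter, preg_split() returns empty strings at the edges of the array. A possible way to drop them is the PREG_SPLIT_NO_EMPTY flag, sketched below with an invented sample string.

```php
<?php
$text = ' red, green ,blue ';

// Default behaviour: the leading and trailing delimiters produce empty pieces.
$plain = preg_split('/[\s,]+/', $text);
print_r($plain);  // "", red, green, blue, ""

// -1 means "no limit"; PREG_SPLIT_NO_EMPTY discards the empty pieces.
$clean = preg_split('/[\s,]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
print_r($clean);  // red, green, blue
```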

Position Anchors (preg_grep())

Sometimes a pattern should match only at the beginning or end of a line, word, or string.

Anchors are used for this.

Two common anchors are the caret (^), which represents the start of the string, and the dollar sign ($), which represents the end of the string.


RegExp    What it Does
^p        Matches the letter p at the beginning of a line.
p$        Matches the letter p at the end of a line.

PHP preg_grep() function:

<?php
$pattern = "/^J/";
$names = array("Jhon Carter", "Clark Kent", "John Rambo");
$matches = preg_grep($pattern, $names);
// Loop through matches array and display matched names
foreach($matches as $match){
    echo $match . "<br>";
}
?>
Output:-
Jhon Carter
John Rambo
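The same array can be filtered with the end anchor, or the filter can be inverted with the PREG_GREP_INVERT flag. In this short sketch the variable names are illustrative; note that preg_grep() keeps the original array keys.

```php
<?php
$names = array("Jhon Carter", "Clark Kent", "John Rambo");

// $ anchors the match to the end of each element.
$endsWithO = preg_grep('/o$/', $names);
print_r($endsWithO);  // [2] => John Rambo

// PREG_GREP_INVERT returns the elements that do NOT match.
$notStartingWithJ = preg_grep('/^J/', $names, PREG_GREP_INVERT);
print_r($notStartingWithJ);  // [1] => Clark Kent
```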

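Using preg_quote()

<?php
$keywords = '$40 for a g3/400';
// Escape every character that is special in a regular expression,
// plus the "/" delimiter passed as the second argument
$keywords = preg_quote($keywords, '/');
echo $keywords;
?>
Output:-
\$40 for a g3\/400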
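preg_quote() is mostly useful when literal text has to be placed inside a pattern. The sketch below compares an unquoted and a quoted version of the same text; the variable names and the sample sentence are assumptions made for illustration.

```php
<?php
$needle = '1.5';

// Unquoted: the dot still means "any character".
$loose = '/^' . $needle . '$/';
// Quoted: the dot is escaped and matches only a literal dot.
$strict = '/^' . preg_quote($needle, '/') . '$/';

var_dump(preg_match($loose, '1x5'));   // int(1)
var_dump(preg_match($strict, '1x5'));  // int(0)
var_dump(preg_match($strict, '1.5'));  // int(1)

// The quoted keywords from the example above can be searched for literally.
$price = '$40 for a g3/400';
$pattern = '/' . preg_quote($price, '/') . '/';
var_dump(preg_match($pattern, 'Offer: $40 for a g3/400 today'));  // int(1)
```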

