regular expressions

Retrieve email using regex

This horrendous regular expression will parse a string and return a valid email address from it.

$email = "<'Freddy'>";
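// The single quote and the forward slashes inside the pattern are escaped so that
// the PHP string and the / delimiters stay intact.
preg_match('/[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&\'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/', $email, $match);
echo $match[0];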

This will return:

Basically, if you pass a variable as the third parameter of the preg_match function, it will be filled with an array of the results. The first item of the array is the full matching string, and if you use capturing groups, their matches follow it. Read more about preg_match in the PHP manual.
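To make the layout of the results array concrete, here is a minimal sketch. The input string, the address in it, and the simplified two-group pattern are all assumptions made for illustration; this is not the long expression from the post.

```php
<?php
// Hypothetical input: the address is a placeholder, not taken from the post.
$text = 'Contact Freddy at freddy@example.com for details.';

// Simplified illustrative pattern with two capturing groups:
// group 1 is the local part, group 2 is the domain.
$pattern = '/([a-z0-9._%+-]+)@([a-z0-9.-]+\.[a-z]{2,})/i';

// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match($pattern, $text, $m) === 1) {
    echo $m[0], "\n"; // full match: freddy@example.com
    echo $m[1], "\n"; // first group: freddy
    echo $m[2], "\n"; // second group: example.com
}
```

Checking the return value before reading $m[0] avoids an undefined-index warning when the string contains no address.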

I am told that this expression will match 99.9% of valid email addresses in the wild.

Regex – “The” searching

Say you have a list of movie titles, and you want to either sort them, or search through them, and some of them have “The ” at the start, for example:

  • The Simpsons
  • Simpson Street

When doing a MySQL search:

SELECT * FROM movies WHERE title LIKE "The Simp%";

This would only return the first row. But if you are working in a company where there is no standard set, the movie title could be formatted as “Simpsons, The”, and then it won’t be found.

To solve this, you could rewrite the condition so that the leading “The ” is dropped from the search phrase and stripped from the field contents during the query:

$str_query = preg_replace('/(title like "(the )(.*)%")/i',
    'REPLACE(LOWER(title), "the ", "") LIKE ("$3%")',
    $str_query);

This will change:

title LIKE "The Simp%"

to:

REPLACE(LOWER(title), "the ", "") LIKE ("Simp%")

The (the ) group in the pattern tells PHP to only make the replacement if the search phrase starts with “The ”, and the i modifier makes the match case-insensitive.
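As a quick check of that behaviour, the sketch below wraps the replacement in a function and runs it on two sample conditions. The function name stripLeadingThe and the sample strings are assumptions made for this example.

```php
<?php
// Hypothetical helper name: applies the rewrite described above.
function stripLeadingThe(string $where): string
{
    return preg_replace(
        '/(title like "(the )(.*)%")/i',
        'REPLACE(LOWER(title), "the ", "") LIKE ("$3%")',
        $where
    );
}

// The search phrase starts with "The ", so the condition is rewritten.
$a = stripLeadingThe('title LIKE "The Simp%"');
echo $a, "\n"; // REPLACE(LOWER(title), "the ", "") LIKE ("Simp%")

// No leading "The ", so the condition is returned unchanged.
$b = stripLeadingThe('title LIKE "Simp%"');
echo $b, "\n"; // title LIKE "Simp%"
```

Single-quoted PHP strings are used so that the double quotes of the SQL fragment and the $3 backreference need no escaping.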

However, what if you want to search for “the” itself (not sure why you would…)?

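You need a negative lookahead, which tells the expression to only carry on if the search phrase does not start with “the”:

if (preg_match('/(title like "(?!the)(.*)%")/i', $str_query)) {

The (?!the) is the negative lookahead. As written, it rejects any phrase that begins with “the”, so a search for “theatre” is rejected too; (?!the%") would exclude only the exact phrase “the”.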
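The effect of the lookahead can be seen on a few sample conditions. This is only a sketch, and the three query strings are assumptions made for illustration.

```php
<?php
$pattern = '/(title like "(?!the)(.*)%")/i';

// Hypothetical sample conditions.
$queries = [
    'title LIKE "Simp%"',     // does not start with "the": matches
    'title LIKE "The Simp%"', // starts with "The": lookahead fails
    'title LIKE "theatre%"',  // also starts with "the": lookahead fails
];

$results = [];
foreach ($queries as $q) {
    $results[$q] = preg_match($pattern, $q) === 1;
    echo $q, ' => ', $results[$q] ? 'match' : 'no match', "\n";
}
```

The third case shows that (?!the) tests only the next three characters, so it also rules out longer words such as “theatre”.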

(.*) matches any string, but it is greedy, and you have to be careful that it doesn’t just swallow everything to the end of $str_query. It is okay in our case, because the pattern goes on to look for %" (the LIKE wildcard and the closing quote), so the match backtracks to the last %" in the string.
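To see where greediness could bite, the sketch below compares (.*) with its lazy form (.*?) on a WHERE clause containing two LIKE conditions. The clause itself is a made-up example.

```php
<?php
// Hypothetical WHERE clause with two LIKE conditions.
$where = 'title LIKE "Simp%" OR title LIKE "Fut%"';

// Greedy: (.*) runs to the end, then backtracks to the LAST %" in the string.
preg_match('/title like "(.*)%"/i', $where, $greedy);

// Lazy: (.*?) stops at the FIRST %" it can reach.
preg_match('/title like "(.*?)%"/i', $where, $lazy);

echo $greedy[1], "\n"; // Simp%" OR title LIKE "Fut
echo $lazy[1], "\n";   // Simp
```

With a single condition both forms capture the same text, which is why the greedy version is safe in the simple case.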

After all this, we can run:

SELECT * FROM movies WHERE $str_query;

But what about sorting? All the titles beginning with “The” will appear in the T section, whereas really we want The Simpsons to appear in the S section.

Add an easy ORDER BY clause here:

SELECT * FROM movies WHERE $str_query ORDER BY REPLACE(LOWER(title), "the ", "") ASC;
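If the titles are already in a PHP array, the same ordering can be sketched without SQL. The helper name sortKey and the title list are assumptions made for this example. Unlike the REPLACE call in the query, the pattern here is anchored with ^, so it strips only a leading “The ”.

```php
<?php
// Hypothetical helper: build a lowercase sort key without a leading "The ".
function sortKey(string $title): string
{
    return strtolower(preg_replace('/^the\s+/i', '', $title));
}

// Hypothetical sample titles.
$titles = ['The Simpsons', 'Simpson Street', 'Futurama'];

// Compare titles by their sort keys, leaving the displayed titles untouched.
usort($titles, function ($a, $b) {
    return strcmp(sortKey($a), sortKey($b));
});

print_r($titles); // Futurama, Simpson Street, The Simpsons
```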


REGEX – Remove Letters from string

Removing letters from a string using Regular Expressions.

Very simple, but brain-bending: all I wanted to do was remove a prefix from a string.

The prefix was always letters, and I only wanted the numerical suffix returned, so I thought preg_replace was my best bet.

echo preg_replace('/[a-zA-Z]+/', '', '12345MystrING67890');

This returns :