Working with regular expressions (preg_) and UTF-8 strings in PHP

The shorthand character classes \w, \d, and \s may not work as expected for non-Latin letters in a UTF-8 string when used with the preg_ functions (such as preg_match, preg_split, and preg_replace).

First of all, you must use the /u modifier so that both the pattern and the subject string are treated as UTF-8.
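
As a minimal sketch of what the /u modifier changes, the snippet below splits the same string with and without it. The variable names are illustrative only, and it assumes the source file itself is saved as UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below is 12 bytes long.
$word = 'привет'; // 6 characters, 2 bytes each

// Without /u the empty pattern matches between bytes, so the string is cut into bytes.
$bytes = preg_split('//', $word, -1, PREG_SPLIT_NO_EMPTY);

// With /u the empty pattern matches between code points, so each piece is a whole character.
$chars = preg_split('//u', $word, -1, PREG_SPLIT_NO_EMPTY);

echo count($bytes), "\n"; // 12
echo count($chars), "\n"; // 6
```

The pieces in $bytes are not valid UTF-8 on their own, which is why byte-oriented patterns tend to corrupt non-Latin text.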

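One of the best solutions for common tasks is to use the Unicode property escapes: \p{...} matches a character that has the given property, \P{...} matches a character that does not have it, and \X matches a whole grapheme cluster (a base character together with any combining marks).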
