Regular Expression to find words in Unicode (PHP, preg_)

November 13, 2011 · Posted in Development 

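PHP's regular expressions have a special pattern, \w, that matches word characters (letters, digits, and the underscore). Without the ‘u’ modifier, however, it works on single bytes and generally recognizes only ASCII letters, so it misses words in languages other than English.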

Use the following regular expression to find words in different languages in a Unicode (UTF-8) string:

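$text = "your text here";

// \p{L} matches any Unicode letter. With the 'u' modifier, a range such as
// \x80-\xFF would cover only the code points U+0080 to U+00FF, not all non-ASCII letters.
$res = preg_match_all('/\p{L}+/u', $text, $matches);

if ($res) print_r($matches);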
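
As a minimal sketch, the same idea can be wrapped in a small helper. The function name find_words is hypothetical and the sample sentence is made up. The pattern uses the Unicode property classes \p{L} (letters) and \p{M} (combining marks), and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: returns every run of Unicode letters found in $text.
// Assumes $text is valid UTF-8.
function find_words(string $text): array
{
    // \p{L} = any letter, \p{M} = combining marks (accents stored as
    // separate code points). The 'u' modifier makes PCRE read the
    // pattern and the input as UTF-8.
    $count = preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches);

    if ($count === false) {
        // preg_match_all() returns false on failure, e.g. malformed UTF-8.
        throw new InvalidArgumentException('Matching failed; is the input valid UTF-8?');
    }

    // $matches[0] holds the full matches, one entry per word.
    return $matches[0];
}

print_r(find_words("Hello, мир! Καλημέρα"));
```

Digits and underscores are deliberately left out here. If they should count as part of a word, the character class would need to be extended, for example with \p{N} for numbers.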

The ‘u’ modifier is required for Unicode text: it makes PCRE treat both the pattern and the subject string as UTF-8 instead of as a sequence of single bytes.
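
To illustrate what the modifier changes, the sketch below runs the same single-character pattern with and without ‘u’. The sample strings are made up for the illustration, and the bytes are written as hex escapes so the result does not depend on the encoding of the source file.

```php
<?php
// "é" encoded in UTF-8 is two bytes: 0xC3 0xA9.
$e_acute = "\xC3\xA9";

// Without 'u' the dot matches one byte at a time.
$bytes = preg_match_all('/./', $e_acute);

// With 'u' the dot matches one whole code point.
$chars = preg_match_all('/./u', $e_acute);

echo "without u: $bytes, with u: $chars\n"; // without u: 2, with u: 1

// With 'u', malformed UTF-8 makes the call fail instead of matching.
$result = preg_match_all('/./u', "\xFF");
if ($result === false && preg_last_error() === PREG_BAD_UTF8_ERROR) {
    echo "invalid UTF-8 input\n";
}
```

Because a failed call returns false rather than 0, it is worth comparing the return value with === when the input may not be clean UTF-8.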
