Regular Expression to find words in Unicode (PHP, preg_)

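PHP's preg_ functions have a special escape, \w, that matches word characters (letters, digits, and the underscore). By default, however, it recognizes only ASCII characters, so it does not find words in languages other than English.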

Use the following regular expression to find words in any language in a Unicode (UTF-8) string:

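[codesyntax lang="php"]

$text = "your text here";

// \p{L} matches a letter in any script; with the u modifier a range such as
// \x80-\xFF means code points U+0080..U+00FF only, which misses e.g. Cyrillic.
$res = preg_match_all('/\p{L}+/u', $text, $matches);

if ($res) print_r($matches);

[/codesyntax]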

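The ‘u’ modifier is required for Unicode searches: it makes the engine treat both the pattern and the subject string as UTF-8, so a multibyte character is matched as one character rather than as separate bytes.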
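
One detail worth handling is that, with the ‘u’ modifier, preg_match_all returns false (not 0) when the subject is not valid UTF-8. The sketch below wraps the search in a small helper that separates "no words found" from "bad input". The function name extract_words and the choice to also accept combining marks (\p{M}) are my own assumptions for illustration, and the type hints assume PHP 7 or later.

```php
<?php
// Hypothetical helper (name is illustrative): returns every run of
// Unicode letters found in $text.
function extract_words(string $text): array
{
    // \p{L} = any letter, \p{M} = combining marks (accents stored as
    // separate code points). The u modifier switches PCRE to UTF-8 mode.
    $count = preg_match_all('/[\p{L}\p{M}]+/u', $text, $matches);

    if ($count === false) {
        // In UTF-8 mode, malformed input makes the call fail instead of
        // silently matching nothing.
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // $matches[0] holds the full matches; it is an empty array if none.
    return $matches[0];
}

print_r(extract_words('Hello, Привет und Grüße!'));
```

For the sample string this should list Hello, Привет, und and Grüße. Digits and underscores are deliberately left out; add \p{N} or _ to the character class if they should count as part of a word.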
