
Regular expression not to replace unicode chars

Regular expressions are not a favorite subject for most developers. Anyone who needs one usually just goes to Google and finds some ready-made code 🙂 Even I used to hate this area.

So here I am explaining how to keep Unicode characters when cleaning up a string. We always want our slugs to be clean, so we use a regular expression like “/[^a-zA-Z0-9]/” and replace every match with an empty string. That removes all non-alphanumeric characters.
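A minimal sketch of this ASCII-only cleanup, assuming PHP and its `preg_replace` function (the post does not name the language, and the helper name is hypothetical). The character class is negated with `^` so that it matches the characters to remove:

```php
<?php
// Hypothetical helper, not from the original post: keeps only ASCII letters and digits.
function strip_non_alphanumeric(string $text): string
{
    // [^a-zA-Z0-9] matches every character that is NOT an ASCII letter or digit,
    // so replacing each match with '' removes it.
    return preg_replace('/[^a-zA-Z0-9]/', '', $text);
}

echo strip_non_alphanumeric('Hello, World! 2024'), PHP_EOL; // HelloWorld2024
```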

But what about a word like “Уточнение”? An ASCII-only pattern strips every letter of it. Any idea how to keep those characters with our regular expression?

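Here we go: use “/[^\p{Ll}\p{Lu}0-9]/u” instead. \p{Ll} and \p{Lu} match lowercase and uppercase letters in any script, and the u modifier makes the engine treat the pattern and the input as UTF-8, so letters from other languages are not replaced at all.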
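A hedged sketch of how this pattern might be applied, again assuming PHP's `preg_replace`. The helper name and the empty-string fallback are assumptions for illustration, not part of the original solution:

```php
<?php
// Hypothetical helper, not from the original post.
function strip_non_letters_or_digits(string $text): string
{
    // \p{Ll} = lowercase letter, \p{Lu} = uppercase letter, in any script.
    // The trailing "u" modifier makes PCRE treat pattern and subject as UTF-8.
    // preg_replace() returns null when the subject is not valid UTF-8, so this
    // sketch falls back to an empty string in that case (an assumption).
    return preg_replace('/[^\p{Ll}\p{Lu}0-9]/u', '', $text) ?? '';
}

echo strip_non_letters_or_digits(' Уточнение '), PHP_EOL;        // Уточнение
echo strip_non_letters_or_digits('Hello, World! 2024'), PHP_EOL; // HelloWorld2024
```

Note that \p{Ll} and \p{Lu} only cover cased letters. Letters from scripts without case fall under \p{Lo}, so \p{L} may be a better fit if those need to survive as well.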

Also remember to set your database column's character set and collation to UTF-8. With a Latin collation, the text will not be saved properly.

Listed below are a few useful links where my client and I found this solution.

regex – php preg_replace: unicode modifier for ascii strings – Stack Overflow

I need to handle strings in my php script using regular expressions. But there is a problem – different strings have different encodings. If string contains just ascii symbols, function returns ‘ASCII’. But if string contains russian symbols, for example, returns ‘UTF-8’. It’s not good idea to check encoding of each string manually, I suppose. So the question is – is it correct to use preg_replace (with unicode modifier) for ascii strings? Is it right to write such code for both ascii and utf-8 …

ruby – How do you specify a regex character range that will work in European languages other than English? – Stack Overflow

I’m working with Ruby’s regex engine. I need to write a regex that does this