International characters in PHP

I must have done this a hundred times, yet I always forget the exact syntax needed to convert UTF-8 characters into ISO-8859-1 (Latin-1) or HTML entities.

Not a biggie for US developers, but here in Europe, and I’d bet in Asia as well, you end up with pages filled with characters that look like this: Ã¨.

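This happens because the web form, or maybe the RSS feed or the SQL database the text comes from, uses UTF-8 encoding, while the page displays it as ISO-8859-1 (Latin-1). Each accented character is stored as two bytes in UTF-8, and each byte gets shown as a separate character.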
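To see where the two stray characters come from, here is a minimal sketch that uses only core PHP functions. The helper read_as_latin1() is a made-up name for illustration: it mimics what a Latin-1 page does by treating every byte as its own character.

```php
<?php
// "\xC3\xA8" is the two-byte UTF-8 encoding of 'è'.
$utf8 = "\xC3\xA8";

// Hypothetical helper: treat every input byte as one ISO-8859-1 character
// and re-encode the result as UTF-8, so the misreading becomes visible.
function read_as_latin1(string $bytes): string
{
    $out = '';
    foreach (str_split($bytes) as $char) {
        $code = ord($char);
        // ASCII bytes stay as they are; bytes 0x80-0xFF become two-byte UTF-8.
        $out .= $code < 0x80
            ? $char
            : chr(0xC0 | ($code >> 6)) . chr(0x80 | ($code & 0x3F));
    }
    return $out;
}

echo strlen($utf8), "\n";         // 2
echo bin2hex($utf8), "\n";        // c3a8
echo read_as_latin1($utf8), "\n"; // Ã¨ (on a UTF-8 terminal)
```

The byte 0xC3 is ‘Ã’ in Latin-1 and 0xA8 is ‘¨’, which is exactly the pair that shows up on the broken page.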

Quick solution:

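$string_to_clean = 'è';
echo utf8_decode($string_to_clean);

This will properly print ‘è’ on a page served as ISO-8859-1 (Latin-1). Note that utf8_decode() is deprecated as of PHP 8.2; mb_convert_encoding($string_to_clean, 'ISO-8859-1', 'UTF-8') does the same job. We can even go further and write: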

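htmlentities(utf8_decode($string_to_clean), ENT_QUOTES, 'ISO-8859-1');

That will return &egrave; instead of ‘è’, and is also a wise security measure to harden our forms against HTML injection. The third argument matters: since PHP 5.4, htmlentities() assumes UTF-8 input unless told otherwise, and the decoded string is Latin-1. Just remember to do this before you add your own HTML tags.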
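If the text is known to be valid UTF-8 (an assumption in this sketch), one possible shortcut is to tell htmlentities() the input encoding and skip the decoding step altogether. This is a variation on the approach above, not a replacement for it in every setup:

```php
<?php
// Assumption: the input is valid UTF-8. "\xC3\xA8" is 'è' in UTF-8.
$string_to_clean = "\xC3\xA8";

// Pass the encoding explicitly instead of decoding to Latin-1 first.
$escaped = htmlentities($string_to_clean, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n"; // &egrave;

// The same call neutralizes markup typed into a form field.
$markup = htmlentities('<b>bold</b>', ENT_QUOTES, 'UTF-8');
echo $markup, "\n"; // &lt;b&gt;bold&lt;/b&gt;
```

Because the output is plain ASCII, it displays correctly whatever charset the page declares.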

For example, assuming we are building a list from, say, an RSS feed of tweets:

$text .= '<li>' . htmlentities(utf8_decode($string_to_clean), ENT_QUOTES, 'ISO-8859-1') . '</li>';

If we were to clean the resulting $text variable instead, we would lose all the ‘<li>’ tags built in the loop, because htmlentities() would escape them along with everything else.
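The ordering can be sketched as a small loop. The function name build_list_items() and the sample data are invented for illustration, and the sketch assumes UTF-8 input passed straight to htmlentities() rather than going through utf8_decode() first:

```php
<?php
// Hypothetical helper: escape each item first, then wrap it in our own <li>.
function build_list_items(array $items): string
{
    $text = '';
    foreach ($items as $item) {
        $text .= '<li>' . htmlentities($item, ENT_QUOTES, 'UTF-8') . '</li>';
    }
    return $text;
}

// Made-up sample data: an accented word and an injection attempt.
$tweets = ["caff\xC3\xA8", '<script>alert(1)</script>'];

$html = build_list_items($tweets);
echo $html, "\n";
// <li>caff&egrave;</li><li>&lt;script&gt;alert(1)&lt;/script&gt;</li>

// Escaping after the tags are added destroys our own markup too.
$too_late = htmlentities('<li>ok</li>', ENT_QUOTES, 'UTF-8');
echo $too_late, "\n"; // &lt;li&gt;ok&lt;/li&gt;
```

The tags we add ourselves survive, while anything that arrived in the feed is rendered as text.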

Of course this will slow down execution a bit, because both functions are called every time an item is added.
