Problem: You have a string encoded in UTF-8 and you need to convert it to a two-byte UTF encoded string. For example, you might want to convert the string to CP1251 (Windows-1251) encoding using standard encoding tables.
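To make the target format concrete, the short sketch below prints the bytes of one sample string in both forms. It assumes that "two byte UTF" means big-endian UTF-16 without a BOM (identical to UCS-2 for characters in the Basic Multilingual Plane), and it uses PHP's iconv extension only to produce a reference result. The sample string and variable names are arbitrary.

```php
<?php
// Sample input: Cyrillic "Ж" (U+0416) followed by ASCII "a" (U+0061).
$utf8 = "\xD0\x96a";

// Reference conversion; assumes the iconv extension is enabled and that
// big-endian UTF-16 without BOM is the two-byte form we are after.
$utf16 = iconv('UTF-8', 'UTF-16BE', $utf8);

echo bin2hex($utf8), "\n";   // d09661   (2 bytes for "Ж", 1 byte for "a")
echo bin2hex($utf16), "\n";  // 04160061 (2 bytes per character)
```

The second line of output is what a hand-written decoder is expected to produce for the same input.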
Solution:
Here is a simple procedure in PHP that converts a UTF-8 string to the corresponding two-byte UTF representation. The function was written to solve a specific problem and is not complete. It supports only 1, 2 and 3 octet (byte) UTF-8 sequences. It also doesn't support a BOM.
/**
 * Decode a UTF-8 string (without BOM) to a two-byte, big-endian UTF string.
 *
 * Named utf8_to_two_byte() because PHP already defines a built-in
 * utf8_decode(), so the original name cannot be redeclared.
 *
 * @param string $string       Original UTF-8 encoded string we need to decode.
 * @param bool   $strip_zeroes Omit a zero high byte from converted code points. Default is false.
 * @return string UTF representation of the original UTF-8 encoded string.
 *
 * @todo Add support for four byte characters.
 * @author Ivan Georgiev
 */
function utf8_to_two_byte($string, $strip_zeroes = false)
{
    $pos = 0;
    $len = strlen($string);
    $result = '';
    while ($pos < $len) {
        $code1 = ord($string[$pos++]);
        if ($code1 < 0x80) {
            // One byte (ASCII): 0xxxxxxx, the high byte is always zero
            $high = 0;
            $low = $code1;
        } elseif ($code1 < 0xE0) {
            // Two byte: 110xxxxx 10yyyyyy
            $code1 = 0x1F & $code1;
            $code2 = 0x3F & ord($string[$pos++]);
            $high = $code1 >> 2;
            $low = (($code1 << 6) | $code2) & 0xFF;
        } elseif ($code1 < 0xF0) {
            // Three byte: 1110xxxx 10yyyyyy 10zzzzzz
            $code1 = 0x0F & $code1;
            $code2 = 0x3F & ord($string[$pos++]);
            $code3 = 0x3F & ord($string[$pos++]);
            $high = ($code1 << 4) | ($code2 >> 2);
            $low = (($code2 << 6) | $code3) & 0xFF;
        } else {
            // Four byte sequences are not supported yet (see @todo)
            continue;
        }
        // Emit the high byte unless it is zero and stripping was requested
        if ($high > 0 || !$strip_zeroes) {
            $result .= chr($high);
        }
        $result .= chr($low);
    }
    return $result;
}
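Once the string is in two-byte form, the CP1251 step mentioned in the problem is a table lookup on each code unit. The following is a minimal sketch of that step, not a complete encoding table: the helper name `ucs2be_to_cp1251` is hypothetical, the input is assumed to be big-endian two-byte units with no zeroes stripped, and only ASCII, the basic Cyrillic letters U+0410 to U+044F, and Ё/ё are handled. Everything else is replaced with "?".

```php
<?php
/**
 * Map big-endian two-byte code units to Windows-1251 bytes.
 * Hypothetical helper for illustration; covers only part of the table.
 */
function ucs2be_to_cp1251(string $ucs2): string
{
    $out = '';
    $len = strlen($ucs2) - (strlen($ucs2) % 2); // ignore a dangling odd byte
    for ($i = 0; $i < $len; $i += 2) {
        // Rebuild the code point from its high and low bytes.
        $cp = (ord($ucs2[$i]) << 8) | ord($ucs2[$i + 1]);
        if ($cp < 0x80) {
            $out .= chr($cp);            // ASCII is unchanged
        } elseif ($cp >= 0x0410 && $cp <= 0x044F) {
            $out .= chr($cp - 0x0350);   // U+0410..U+044F -> 0xC0..0xFF
        } elseif ($cp === 0x0401) {
            $out .= "\xA8";              // Ё
        } elseif ($cp === 0x0451) {
            $out .= "\xB8";              // ё
        } else {
            $out .= '?';                 // not covered by this sketch
        }
    }
    return $out;
}

// "Жa" as two-byte units: U+0416, U+0061.
echo bin2hex(ucs2be_to_cp1251("\x04\x16\x00\x61")), "\n"; // c661
```

Where the iconv or mbstring extension is available, it can usually perform the whole UTF-8 to Windows-1251 conversion directly; the manual mapping is mainly useful when neither extension can be relied on.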
See also: Generic File Input Stream in PHP with UTF-8 Support
