If you have a UTF-8 string in which every byte is correct but each byte sits in its own UTF-16 char ('Ö' -> [195, 0], [150, 0]), you can use the following:
using System.Collections.Generic;
using System.Text;

public static string Utf8ToUtf16(string utf8String)
{
    /***************************************************************
     * Every .NET string stores its text as UTF-16, known as
     * Encoding.Unicode. Text in other encodings exists either as a
     * byte array or incorrectly stored inside a UTF-16 string.
     *
     * UTF-8 = 1 to 4 bytes per char
     *   ["100" for the ASCII 'd']
     *   ["206" and "186" for the Greek 'κ']
     *
     * UTF-16 (little-endian) = 2 or 4 bytes per char
     *   ["100, 0" for the ASCII 'd']
     *   ["186, 3" for the Greek 'κ']
     *
     * UTF-8 inside UTF-16
     *   ["100, 0" for the ASCII 'd']
     *   ["206, 0" and "186, 0" for the Greek 'κ']
     *
     * First we need to get the UTF-8 byte array back and remove
     * every 0 byte (binary 0) while doing so.
     *
     * In a null-terminated UTF-8 string a single binary 0 ends the
     * string, while a null-terminated UTF-16 string ends only at
     * two consecutive binary 0. .NET strings carry their length,
     * so we don't have to handle this.
     *
     * With the filler removed, Encoding.UTF8.GetString turns the
     * byte array into a proper UTF-16 string.
     ***************************************************************/

    // Get UTF-8 bytes: the cast keeps only the low byte of each char
    List<byte> utf8Bytes = new List<byte>(utf8String.Length);
    foreach (char utf8Char in utf8String)
    {
        byte utf8Byte = unchecked((byte)utf8Char);

        // Remove binary 0 bytes (filler)
        if (utf8Byte > 0)
        {
            utf8Bytes.Add(utf8Byte);
        }
    }

    // Convert UTF-8 bytes to UTF-16 string
    return Encoding.UTF8.GetString(utf8Bytes.ToArray());
}
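To make this first case concrete, here is a small self-contained sketch that builds such a string by hand and repairs it. The class name `WidenedUtf8Demo` and the helper `Decode` are hypothetical names chosen for illustration; the helper follows the same idea as `Utf8ToUtf16` above, just written compactly.

```csharp
using System;
using System.Text;

public static class WidenedUtf8Demo
{
    // Hypothetical helper: take the low byte of each char, skip binary 0,
    // and decode what remains as UTF-8.
    public static string Decode(string widened)
    {
        byte[] utf8Bytes = new byte[widened.Length];
        int count = 0;
        foreach (char c in widened)
        {
            byte b = unchecked((byte)c); // the low byte carries the UTF-8 byte
            if (b > 0)
            {
                utf8Bytes[count++] = b;
            }
        }
        return Encoding.UTF8.GetString(utf8Bytes, 0, count);
    }

    public static void Main()
    {
        // 'Ö' is U+00D6; its UTF-8 encoding is the two bytes 195 and 150.
        string widened = new string(new[] { (char)195, (char)150 });
        string repaired = Decode(widened);

        Console.WriteLine(repaired.Length);  // prints 1
        Console.WriteLine((int)repaired[0]); // prints 214 (U+00D6)
    }
}
```

Two chars go in and one comes out, because the two UTF-8 bytes together encode a single character.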
In my case the DLL result is a UTF-8 string too, but unfortunately its bytes were read with the ANSI code page before being stored as UTF-16 ('Ö' -> [195, 0], [19, 32]). So the ANSI '–', which is 150, was converted to the UTF-16 '–', which is 8211. If you have this case too, you can use the following instead:
using System.Text;

public static string Utf8ToUtf16(string utf8String)
{
    // Get UTF-8 bytes by reading each char with the ANSI encoding
    // (Encoding.Default is the system ANSI code page on .NET Framework)
    byte[] utf8Bytes = Encoding.Default.GetBytes(utf8String);

    // Convert UTF-8 bytes to UTF-16 bytes
    byte[] utf16Bytes = Encoding.Convert(Encoding.UTF8, Encoding.Unicode, utf8Bytes);

    // Return UTF-16 bytes as UTF-16 string
    return Encoding.Unicode.GetString(utf16Bytes);
}
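One caveat: on .NET Core and .NET 5+, `Encoding.Default` is always UTF-8, so the method above no longer round-trips through the ANSI code page there. The sketch below names the code page explicitly instead. It assumes the bytes were originally misread as Windows-1252 (adjust the number if your system used a different ANSI code page) and that you are on .NET Core 3.0 or later, where `CodePagesEncodingProvider` ships with the framework; on older targets it comes from the System.Text.Encoding.CodePages package. `MojibakeRepair` and `FromWindows1252` are hypothetical names.

```csharp
using System;
using System.Text;

public static class MojibakeRepair
{
    // Hypothetical helper; assumes the garbling went through Windows-1252.
    public static string FromWindows1252(string garbled)
    {
        // Code page encodings must be registered before first use on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        Encoding ansi = Encoding.GetEncoding(1252);

        // Map each char back to the byte it was read from, then decode as UTF-8.
        byte[] utf8Bytes = ansi.GetBytes(garbled);
        return Encoding.UTF8.GetString(utf8Bytes);
    }

    public static void Main()
    {
        // U+00C3 ('Ã') and U+2013 ('–') are what Windows-1252 makes of bytes 195 and 150.
        string garbled = "\u00C3\u2013";
        string repaired = FromWindows1252(garbled);

        Console.WriteLine((int)repaired[0]); // prints 214 (U+00D6, 'Ö')
    }
}
```

This only works while every garbled char still maps back to a single byte in the chosen code page; chars that were already replaced with '?' during the original misreading cannot be recovered.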
Or the native method:
using System;
using System.Runtime.InteropServices;
using System.Text;

[DllImport("kernel32.dll")]
private static extern Int32 MultiByteToWideChar(UInt32 CodePage, UInt32 dwFlags, [MarshalAs(UnmanagedType.LPStr)] String lpMultiByteStr, Int32 cbMultiByte, [Out, MarshalAs(UnmanagedType.LPWStr)] StringBuilder lpWideCharStr, Int32 cchWideChar);

public static string Utf8ToUtf16(string utf8String)
{
    // First call: ask for the required length in chars, including the terminating null
    Int32 iNewDataLen = MultiByteToWideChar(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf8String, -1, null, 0);
    if (iNewDataLen > 1)
    {
        // Second call: convert into a buffer of that size
        StringBuilder utf16String = new StringBuilder(iNewDataLen);
        MultiByteToWideChar(Convert.ToUInt32(Encoding.UTF8.CodePage), 0, utf8String, -1, utf16String, utf16String.Capacity);
        return utf16String.ToString();
    }
    else
    {
        // Nothing but the terminating null, or the call failed
        return String.Empty;
    }
}
If you need it the other way around, see Utf16ToUtf8. Hope I could be of help.
That's not a UTF8 string. That's a corrupted string that has been badly converted from bytes using the wrong encoding. – spender
UTF-8 *is* Unicode. –
The source string is not valid UTF-8. – alexn