file_get_contents() breaks up UTF-8 characters

I am loading HTML from an external server. The HTML markup is UTF-8 encoded and contains characters such as ľ, š, č, ť, ž etc. When I load the HTML with file_get_contents() like this:

$html = file_get_contents('http://example.com/foreign.html');

it messes up the UTF-8 characters and loads Å, ¾, ¤ and similar nonsense instead of the proper UTF-8 characters.

How can I solve this?

UPDATE:

I have tried both saving the HTML to a file and outputting it with UTF-8 encoding. Neither works, which means file_get_contents() is already returning broken HTML.
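Before converting anything, it can help to confirm whether the bytes returned by file_get_contents() are really broken. A minimal sketch, assuming the mbstring extension is available; the helper name inspect_utf8() is hypothetical and the sample bytes are chosen for illustration:

```php
<?php
// Hypothetical helper: reports whether a string is valid UTF-8 and shows its raw bytes.
function inspect_utf8(string $s): array
{
    return [
        'valid_utf8' => mb_check_encoding($s, 'UTF-8'),
        // Raw bytes, independent of how a terminal or browser renders them.
        'hex'        => bin2hex($s),
    ];
}

// "š" encoded as UTF-8 is the two bytes C5 A1.
print_r(inspect_utf8("\xC5\xA1"));

// The single byte E9 ("é" in ISO-8859-1) is not valid UTF-8 on its own.
print_r(inspect_utf8("\xE9"));
```

If the fetched bytes turn out to be valid UTF-8, the damage is probably happening later, for example when the string is escaped, parsed, or echoed.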

UPDATE 2:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd"> 
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="sk" lang="sk"> 
<head> 

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<meta http-equiv="Content-Style-Type" content="text/css" /> 
<meta http-equiv="Content-Language" content="sk" /> 
<title>Test</title> 

</head> 
<body> 


<?php 

$html = file_get_contents('http://example.com'); 
echo htmlentities($html); 

?> 

</body> 
</html>
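One thing worth checking in the test page above is the htmlentities() call: before PHP 5.4 its default charset was ISO-8859-1, so each byte of a multi-byte UTF-8 character was escaped on its own. The following is only a sketch of that effect, not a confirmed diagnosis of the page above; the sample character is an assumption.

```php
<?php
$utf8 = "\xC5\xA1"; // "š" as UTF-8 bytes

// Treating the bytes as ISO-8859-1 escapes them one by one.
$wrong = htmlentities($utf8, ENT_QUOTES, 'ISO-8859-1');

// Stating the charset explicitly keeps the character intact.
$right = htmlspecialchars($utf8, ENT_QUOTES, 'UTF-8');

echo $wrong, "\n";
echo $right, "\n";
```

Passing the charset explicitly makes the behaviour independent of the PHP version and its configured defaults.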


2010-02-10 Richard Knop

Are you outputting them as UTF-8?

Where are you viewing the loaded HTML?

I'm not outputting it. I save it to a file and then read it. But that's irrelevant, because I tried outputting it as UTF-8 and it is still broken.

All right. I found out that file_get_contents() is not causing this problem. There is a different reason, which I discuss in another question. Silly me.

See this question: Why Does DOM Change Encoding?


2010-02-10 13:05:31

file_get_contents() IS causing the problem. I had a JSON file that I was opening with file_get_contents(), but when doing a print_r() after loading the JSON, the Unicode characters were there, but not in the JSON. Running mb_convert_encoding() on the result of file_get_contents() fixed the problem. – Reado

$string = mb_convert_encoding($string, 'HTML-ENTITIES', "UTF-8"); solved it for me. – WEBjuju

function file_get_contents_utf8($fn) {
    $content = file_get_contents($fn);
    // Detect the source charset (UTF-8 or ISO-8859-1, strict mode) and convert to UTF-8.
    return mb_convert_encoding($content, 'UTF-8',
        mb_detect_encoding($content, 'UTF-8, ISO-8859-1', true));
}

You could also try your luck with http://php.net/manual/en/function.mb-internal-encoding.php
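Detection with mb_detect_encoding() is a guess whenever the bytes are valid in more than one candidate charset. A more deterministic variant, sketched here under the assumption that anything that is not valid UTF-8 is ISO-8859-1, checks validity first and converts only when needed. The function name file_get_contents_as_utf8() and the fallback charset are assumptions, not part of the answer above.

```php
<?php
// Hypothetical variant: leave valid UTF-8 untouched, otherwise convert
// from an assumed fallback charset.
function file_get_contents_as_utf8(string $fn, string $fallback = 'ISO-8859-1'): string
{
    $content = file_get_contents($fn);
    if ($content === false) {
        throw new RuntimeException("Could not read $fn");
    }
    if (mb_check_encoding($content, 'UTF-8')) {
        return $content; // already valid UTF-8; converting again would corrupt it
    }
    return mb_convert_encoding($content, 'UTF-8', $fallback);
}

// Usage with a temporary file holding "café" in ISO-8859-1 (E9 is "é").
$tmp = tempnam(sys_get_temp_dir(), 'enc');
file_put_contents($tmp, "caf\xE9");
echo bin2hex(file_get_contents_as_utf8($tmp)), "\n"; // 636166c3a9
unlink($tmp);
```

The validity check is what protects already-correct input from being converted a second time.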


2010-02-10 12:26:46 Gordon

This solution is great, thanks! – brentonstrine

This should be marked as the best answer. Thanks Gordon. – helpse

I think you simply have a double conversion of the character encoding there :D

That may be because you are opening an HTML document inside an HTML document, so you end up with something like this:

<!DOCTYPE html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title></title> 
</head> 
<body> 
<!DOCTYPE html> 
<head> 
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> 
<title>Test</title>.......

Using mb_detect_encoding may therefore lead you to other problems.
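To illustrate what a double conversion looks like, here is a small sketch, assuming the text is already UTF-8 and is then converted from ISO-8859-1 a second time. The sample character is chosen for illustration.

```php
<?php
$once = "\xC5\xA1"; // "š" correctly encoded as UTF-8

// Converting again, as if the bytes were ISO-8859-1, turns each byte
// into a character of its own.
$twice = mb_convert_encoding($once, 'UTF-8', 'ISO-8859-1');
echo bin2hex($twice), "\n"; // c385c2a1, which renders as "Å¡"

// The damage is reversible as long as no bytes were lost along the way.
$repaired = mb_convert_encoding($twice, 'ISO-8859-1', 'UTF-8');
echo bin2hex($repaired), "\n"; // c5a1
```

The "Å" followed by a symbol is the same kind of output described in the question, which is why a double conversion is a plausible suspect.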


2012-11-10 18:59:00

I had a similar problem with the Polish language.

I tried:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'UTF-8', mb_detect_encoding($fileEndEnd, 'UTF-8', true));

I tried:

$fileEndEnd = utf8_encode($fileEndEnd);

I tried:

$fileEndEnd = iconv("UTF-8", "UTF-8", $fileEndEnd);

And then:

$fileEndEnd = mb_convert_encoding($fileEndEnd, 'HTML-ENTITIES', "UTF-8");

This last one worked perfectly!!!
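The 'HTML-ENTITIES' conversion helps because it turns non-ASCII characters into HTML entities, so the output is plain ASCII and no longer depends on which charset the browser or parser assumes. Note that passing 'HTML-ENTITIES' to mb_convert_encoding() is deprecated as of PHP 8.2. A rough equivalent using mb_encode_numericentity() is sketched below; the sample string is an assumption, and this variant emits numeric entities only.

```php
<?php
$text = "Ko\xC5\xA1ice"; // "Košice" as UTF-8

// Map: encode every code point from U+0080 to U+10FFFF, no offset, full mask.
$map = [0x80, 0x10FFFF, 0, 0x1FFFFF];
$ascii = mb_encode_numericentity($text, $map, 'UTF-8');

echo $ascii, "\n"; // Ko&#353;ice
```

ASCII characters fall outside the map, so existing markup such as tags and quotes is left untouched.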


2013-03-03 08:20:40 ugniesdebesys

+1 for 'HTML-ENTITIES' – Raptor

Awesome, this solved it for me.

You made my day. – vikingmaster

Try this too:

$url = 'http://www.domain.com/';
$html = file_get_contents($url);

// Convert from UTF-8 to ISO-8859-1, transliterating characters that ISO-8859-1 cannot represent
$html = iconv('UTF-8', 'ISO-8859-1//TRANSLIT', $html);
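Note the argument order of iconv(): the first argument is the source charset and the second is the target. A minimal sketch of the opposite direction, ISO-8859-1 input converted to UTF-8, using a made-up sample string:

```php
<?php
$latin1 = "caf\xE9"; // "café" in ISO-8859-1

// iconv(from, to, string)
$utf8 = iconv('ISO-8859-1', 'UTF-8', $latin1);

echo bin2hex($utf8), "\n"; // 636166c3a9
```

Which direction is needed depends on what the remote server actually sends and what the page that displays it declares.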


2014-11-19 13:55:28 Mohamm6d

For Turkish, mb_convert_encoding and other charset conversions did not work.

urlencode did not work either, because it converts the space character to +. It must be %20 for percent-encoding.

This worked!

$url = rawurlencode($url);
$url = str_replace("%3A", ":", $url);
$url = str_replace("%2F", "/", $url);

$data = file_get_contents($url);
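The difference between the two functions, and a slightly more targeted variant that encodes each path segment separately, can be sketched as follows. The helper name encode_path() and the sample URL are assumptions.

```php
<?php
echo urlencode('a b'), "\n";    // a+b
echo rawurlencode('a b'), "\n"; // a%20b

// Hypothetical helper: percent-encode each path segment but keep the slashes.
function encode_path(string $path): string
{
    return implode('/', array_map('rawurlencode', explode('/', $path)));
}

// "ş" (U+015F) is the bytes C5 9F in UTF-8.
echo 'http://example.com' . encode_path("/dosya/ye\xC5\x9Fil ev.txt"), "\n";
// http://example.com/dosya/ye%C5%9Fil%20ev.txt
```

Encoding only the path avoids having to undo the escaping of ":" and "/" in the scheme and host afterwards.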


2016-10-26 08:24:31

I am working with 35,000 lines of data.

$f = fopen("veri1.txt", "r");
// Loop until fgets() returns false, so nothing is converted after the end of the file.
while (($line = fgets($f)) !== false) {
    echo mb_convert_encoding($line, 'HTML-ENTITIES', "UTF-8");
}
fclose($f);

This code converts my strange characters into normal ones.


2017-11-15 10:49:54 matasoy
