2010-04-16 5 views

Respuesta

11

Según el user contributed notes en el manual, html_entity_decode() parece que solo convierte unas 100 entidades en lugar de las más de 200 que existen en la especificación.

Hay un script llamado html_entity_decode_full que parece proporcionar decodificación para todas las entidades, pero el sitio original del autor está inactivo. Here es un enlace a un repositorio de Drupal que contiene la función. Esto convertirá todas las entidades:

/* Test call */ 
echo decode_entities_full("– THIS IS A DASH", ENT_COMPAT, "utf-8"); 

/** 
* Helper function for drupal_html_to_text(). 
* 
* Calls helper function for HTML 4 entity decoding. 
* Per: http://www.lazycat.org/software/html_entity_decode_full.phps 
*/ 
function decode_entities_full($string, $quotes = ENT_COMPAT, $charset = 'ISO-8859-1') { 
    return html_entity_decode(preg_replace_callback('/&([a-zA-Z][a-zA-Z0-9]+);/', 'convert_entity', $string), $quotes, $charset); 
} 

/** 
* Helper function for decode_entities_full(). 
* 
* This contains the full HTML 4 Recommendation listing of entities, so the default to discard 
* entities not in the table is generally good. Pass false to the second argument to return 
* the faulty entity unmodified, if you're ill or something. 
* Per: http://www.lazycat.org/software/html_entity_decode_full.phps 
*/ 
function convert_entity($matches, $destroy = true) { 
    static $table = array('quot' => '"','amp' => '&','lt' => '<','gt' => '>','OElig' => 'Œ','oelig' => 'œ','Scaron' => 'Š','scaron' => 'š','Yuml' => 'Ÿ','circ' => 'ˆ','tilde' => '˜','ensp' => ' ','emsp' => ' ','thinsp' => ' ','zwnj' => '‌','zwj' => '‍','lrm' => '‎','rlm' => '‏','ndash' => '–','mdash' => '—','lsquo' => '‘','rsquo' => '’','sbquo' => '‚','ldquo' => '“','rdquo' => '”','bdquo' => '„','dagger' => '†','Dagger' => '‡','permil' => '‰','lsaquo' => '‹','rsaquo' => '›','euro' => '€','fnof' => 'ƒ','Alpha' => 'Α','Beta' => 'Β','Gamma' => 'Γ','Delta' => 'Δ','Epsilon' => 'Ε','Zeta' => 'Ζ','Eta' => 'Η','Theta' => 'Θ','Iota' => 'Ι','Kappa' => 'Κ','Lambda' => 'Λ','Mu' => 'Μ','Nu' => 'Ν','Xi' => 'Ξ','Omicron' => 'Ο','Pi' => 'Π','Rho' => 'Ρ','Sigma' => 'Σ','Tau' => 'Τ','Upsilon' => 'Υ','Phi' => 'Φ','Chi' => 'Χ','Psi' => 'Ψ','Omega' => 'Ω','alpha' => 'α','beta' => 'β','gamma' => 'γ','delta' => 'δ','epsilon' => 'ε','zeta' => 'ζ','eta' => 'η','theta' => 'θ','iota' => 'ι','kappa' => 'κ','lambda' => 'λ','mu' => 'μ','nu' => 'ν','xi' => 'ξ','omicron' => 'ο','pi' => 'π','rho' => 'ρ','sigmaf' => 'ς','sigma' => 'σ','tau' => 'τ','upsilon' => 'υ','phi' => 'φ','chi' => 'χ','psi' => 'ψ','omega' => 'ω','thetasym' => 'ϑ','upsih' => 'ϒ','piv' => 'ϖ','bull' => '•','hellip' => '…','prime' => '′','Prime' => '″','oline' => '‾','frasl' => '⁄','weierp' => '℘','image' => 'ℑ','real' => 'ℜ','trade' => '™','alefsym' => 'ℵ','larr' => '←','uarr' => '↑','rarr' => '→','darr' => '↓','harr' => '↔','crarr' => '↵','lArr' => '⇐','uArr' => '⇑','rArr' => '⇒','dArr' => '⇓','hArr' => '⇔','forall' => '∀','part' => '∂','exist' => '∃','empty' => '∅','nabla' => '∇','isin' => '∈','notin' => '∉','ni' => '∋','prod' => '∏','sum' => '∑','minus' => '−','lowast' => '∗','radic' => '√','prop' => '∝','infin' => '∞','ang' => '∠','and' => '∧','or' => '∨','cap' => '∩','cup' => '∪','int' => '∫','there4' => '∴','sim' => '∼','cong' => '≅','asymp' => '≈','ne' => '≠','equiv' => '≡','le' => '≤','ge' => '≥','sub' => '⊂','sup' => '⊃','nsub' => '⊄','sube' => '⊆','supe' => '⊇','oplus' => '⊕','otimes' => '⊗','perp' => '⊥','sdot' => '⋅','lceil' => '⌈','rceil' => '⌉','lfloor' => '⌊','rfloor' => '⌋','lang' => '〈','rang' => '〉','loz' => '◊','spades' => '♠','clubs' => '♣','hearts' => '♥','diams' => '♦','nbsp' => ' ','iexcl' => '¡','cent' => '¢','pound' => '£','curren' => '¤','yen' => '¥','brvbar' => '¦','sect' => '§','uml' => '¨','copy' => '©','ordf' => 'ª','laquo' => '«','not' => '¬','shy' => '­','reg' => '®','macr' => '¯','deg' => '°','plusmn' => '±','sup2' => '²','sup3' => '³','acute' => '´','micro' => 'µ','para' => '¶','middot' => '·','cedil' => '¸','sup1' => '¹','ordm' => 'º','raquo' => '»','frac14' => '¼','frac12' => '½','frac34' => '¾','iquest' => '¿','Agrave' => 'À','Aacute' => 'Á','Acirc' => 'Â','Atilde' => 'Ã','Auml' => 'Ä','Aring' => 'Å','AElig' => 'Æ','Ccedil' => 'Ç','Egrave' => 'È','Eacute' => 'É','Ecirc' => 'Ê','Euml' => 'Ë','Igrave' => 'Ì','Iacute' => 'Í','Icirc' => 'Î','Iuml' => 'Ï','ETH' => 'Ð','Ntilde' => 'Ñ','Ograve' => 'Ò','Oacute' => 'Ó','Ocirc' => 'Ô','Otilde' => 'Õ','Ouml' => 'Ö','times' => '×','Oslash' => 'Ø','Ugrave' => 'Ù','Uacute' => 'Ú','Ucirc' => 'Û','Uuml' => 'Ü','Yacute' => 'Ý','THORN' => 'Þ','szlig' => 'ß','agrave' => 'à','aacute' => 'á','acirc' => 'â','atilde' => 'ã','auml' => 'ä','aring' => 'å','aelig' => 'æ','ccedil' => 'ç','egrave' => 'è','eacute' => 'é','ecirc' => 'ê','euml' => 'ë','igrave' => 'ì','iacute' => 'í','icirc' => 'î','iuml' => 'ï','eth' => 'ð','ntilde' => 'ñ','ograve' => 'ò','oacute' => 'ó','ocirc' => 'ô','otilde' => 'õ','ouml' => 'ö','divide' => '÷','oslash' => 'ø','ugrave' => 'ù','uacute' => 'ú','ucirc' => 'û','uuml' => 'ü','yacute' => 'ý','thorn' => 'þ','yuml' => 'ÿ' 
         ); 
    if (isset($table[$matches[1]])) return $table[$matches[1]]; 
    // else 
    return $destroy ? '' : $matches[0]; 
} 
+0

eso es genial, muchas gracias! – martin

9

O simplemente puede utilizar:

echo html_entity_decode('&', ENT_NOQUOTES, 'UTF-8'); 
4

escribí originalmente the code that's pasted into Pekka's answer - es casi completamente innecesario salvo en casos específicos (sobre todo para la eliminación de entidades inexistentes, lo que html_entity_decode() no toca, o para convertir a entidades numéricas si está haciendo XML que no está codificado en UTF-8).

TinyMCE tiene una opción para devolver la salida en bruto (no codificada en HTML), por lo que ese debería ser su primer punto de llamada. Mira esto entry in the TinyMCE manual para más información sobre eso.

+0

sí, 'entity_encoding' es lo que estaba buscando gracias :). – martin

Cuestiones relacionadas