Unable to parse HTML using the lxml XPath parser

I am trying to parse a review from this page: http://www.amazon.co.uk/product-reviews/B00143ZBHY

I am using the following approach:

Code

# html: a variable that contains the exact HTML served at the page above
from lxml import etree 
tree = etree.HTML(html) 
r = tree.xpath(".//*[@id='productReviews']/tbody/tr/td[1]/div[9]/text()[4]") 
print len(r) 
print r[0].tag

Output

0 
Traceback (most recent call last): 
    File "c.py", line 37, in <module> 
    print r[0].tag 
IndexError: list index out of range

P.S.: When I use the same XPath in the XPath Checker add-on for Firefox, it works without any trouble. Here, though, I get no results. Please help!

Source

2012-07-12 codersofthedark

I don't know why Chrome showed tbody in the XPath :( – codersofthedark

It is generated automatically – fedosov

Try removing /tbody from the XPath: there is no <tbody> in the HTML.

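import urllib2
from lxml import etree

# Fetch the raw HTML exactly as the server sends it (Python 2, urllib2).
html = urllib2.urlopen("http://www.amazon.co.uk/product-reviews/B00143ZBHY").read()
tree = etree.HTML(html)
# Same XPath as in the question, minus the /tbody step.
r = tree.xpath(".//*[@id='productReviews']/tr/td[1]/div[9]/text()[4]")
print r[0]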

Output:

bought this as replacement for the original cover which came with my greenhouse and which ripped in the wind. so far this seems a good replacement although for some reason it seems slightly too small for my greenhouse so that i cant zip both sides of the front at the same time. seems sturdier and thicker than the cover i had before so hoping it lasts a bit longer!
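
To see why the /tbody step matters, here is a minimal Python 3 sketch. The SAMPLE_HTML string is a made-up stand-in for the reviews table, not the real Amazon markup, and the variable names are only illustrative. It suggests that lxml keeps the table as written, so an XPath copied from a browser (which may add a tbody element to its own DOM) finds nothing. It also shows that text() results are strings, so they have no .tag attribute; the parent element is reachable through getparent().

```python
from lxml import etree

# Hypothetical miniature of the reviews table. Note: no <tbody> in the markup.
SAMPLE_HTML = """
<html><body>
<table id="productReviews">
  <tr><td><div>first review</div><div>second review</div></td></tr>
</table>
</body></html>
"""

tree = etree.HTML(SAMPLE_HTML)

# XPath as a browser tool might report it, with a /tbody step.
with_tbody = tree.xpath("//*[@id='productReviews']/tbody/tr/td[1]/div[2]/text()")

# Same XPath without /tbody, matching the markup lxml actually parsed.
without_tbody = tree.xpath("//*[@id='productReviews']/tr/td[1]/div[2]/text()")

print(len(with_tbody))   # 0
print(without_tbody)     # ['second review']

# text() yields string results, not elements, so ask the parent for its tag.
parent_tag = without_tbody[0].getparent().tag
print(parent_tag)        # div
```

Checking len(r) before indexing r[0], as the question already does with its first print, is a cheap way to catch an XPath that matches nothing.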

Source

2012-07-12 19:14:24 fedosov

lol, silly mistake, it works now, thanks :) – codersofthedark

I can accept the answer only 15 minutes after posting the question; hold on, I will do it in 3 minutes – codersofthedark

@dragosrsupercool It is not a silly mistake, read here: http://stackoverflow.com/a/5586627/1167879 –
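
Since the tbody element may or may not be present depending on where the HTML comes from, one option is an XPath that accepts both shapes. The sketch below is Python 3; the helper name review_rows and the two sample strings are hypothetical and exist only for illustration. It uses an XPath union plus lxml's XPath variables (passed as keyword arguments) rather than anything specific to the Amazon page.

```python
from lxml import etree


def review_rows(tree, table_id):
    """Return the table's <tr> elements whether or not a <tbody> wraps them."""
    return tree.xpath(
        "//table[@id=$tid]/tr | //table[@id=$tid]/tbody/tr",
        tid=table_id,
    )


# Hypothetical markup: RAW has no <tbody>, BROWSER_LIKE includes one.
RAW = '<table id="t"><tr><td>a</td></tr><tr><td>b</td></tr></table>'
BROWSER_LIKE = (
    '<table id="t"><tbody><tr><td>a</td></tr><tr><td>b</td></tr></tbody></table>'
)

for markup in (RAW, BROWSER_LIKE):
    rows = review_rows(etree.HTML(markup), "t")
    print([row.findtext("td") for row in rows])  # ['a', 'b'] both times
```

The union keeps the path explicit about the table structure; a looser alternative would be the descendant axis (//table[@id=$tid]//tr), which would also pick up rows of nested tables.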
