Finding <title> with Python

I want to retrieve the title of a web page that I open using urllib2. What is the best way to do this: parse the HTML and find what I need (for now only the <title> tag, but I might need more in the future)?

Is there a good parsing library for this purpose?

Source

2009-11-02 xintron

Yes, I would recommend BeautifulSoup.

If all you need is the title, it is simply:

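from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3, the Python 2 package

soup = BeautifulSoup(html)
myTitle = soup.html.head.title  # the <title> tag, reached by walking the tree

or, searching the whole document:

myTitle = soup('title')  # shorthand for soup.findAll('title'); returns a list of matches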

Taken from the documentation.

It is very robust and will parse the HTML no matter how messy it is.

Source

2009-11-02 09:55:11 RobbR

Use Beautiful Soup.

import urllib2
from BeautifulSoup import BeautifulSoup

html = urllib2.urlopen("...").read()
soup = BeautifulSoup(html)
print soup.title.string  # text inside the <title> tag

Source

2009-11-02 09:54:09 orip

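Try Beautiful Soup:

import urllib2
from BeautifulSoup import BeautifulSoup

url = 'http://www.example.com'
response = urllib2.urlopen(url)
html = response.read()

soup = BeautifulSoup(html)
title = soup.html.head.title
print title.contents  # list of the tag's children, normally just the title text

Source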

2009-11-02 09:55:06

Why is everyone importing an additional library for a single task? No regular expressions? Wasn't the question about urllib, not bs4 or mech, which are third-party? To do it with the standard library, go through the HTML, match the string, and then split on the '>' and '<' with re or whatever.

for a in html.splitlines():  # iterate over the lines of the page source
    if '<title>' in a:
        Title = str(a)  # the whole line containing <title>, tags included
        break

That's Python 2, I think; you can strip the tags from the result.
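For readers who want to see the standard-library route spelled out, here is a minimal sketch in Python 3 (an assumption; the thread itself is Python 2). It shows two options: a regular expression, as the answer suggests, and the built-in html.parser module. The names title_with_regex, TitleParser, and title_with_parser are hypothetical and chosen only for illustration.

```python
import re
from html.parser import HTMLParser


def title_with_regex(html):
    """Return the text of the first <title> element, or None if absent."""
    # Case-insensitive, and DOTALL so the title may span several lines.
    match = re.search(r"<title[^>]*>(.*?)</title>", html,
                      re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


class TitleParser(HTMLParser):
    """Collects the text found inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False  # True while we are between <title> and </title>
        self.done = False      # True once the first title has been closed
        self.parts = []        # chunks of text seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def title_with_parser(html):
    """Return the first title via html.parser, or None if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip() or None
```

Both functions expect the page as a str. In Python 3 the bytes returned by urllib.request.urlopen(url).read() would need decoding first, for example with .decode("utf-8", errors="replace"), which assumes the page is UTF-8. The regular expression is fine for a quick lookup but can be fooled by unusual markup, which is the usual argument for a real parser.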

Source

2014-12-01 13:58:17 foofum

Love the answer. That was my question: why would I add an entirely different dependency for ONE call? Thanks for your wisdom :) – raTM
