Python urllib: downloading the contents of an online directory

I'm trying to write a program that opens a directory listing, uses regular expressions to get the names of the PowerPoint files, then creates those files locally and copies their contents into them. When I run it, it seems to work, but when I try to open the files I keep getting a message that the version is incorrect.

from urllib.request import urlopen 
import re 

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/') 
string = urlpath.read().decode('utf-8') 

pattern = re.compile('ch[0-9]*.ppt') #the pattern actually creates duplicates in the list 

filelist = pattern.findall(string) 
print(filelist) 

for filename in filelist: 
    remotefile = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/' + filename) 
    localfile = open(filename,'wb') 
    localfile.write(remotefile.read()) 
    localfile.close() 
    remotefile.close()

Source

2012-06-04 davelupt

You should **never** parse HTML with regex, see http://stackoverflow.com/a/1732454/851737. Use an HTML parsing library such as lxml or BeautifulSoup. – schlamar

BeautifulSoup it is. Thanks for the recommendation. – davelupt
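To illustrate the comment above, here is a minimal sketch of pulling the `.ppt` links out of the page with an HTML parser instead of a regex. It uses the standard library's `html.parser.HTMLParser` rather than lxml or BeautifulSoup, only so that it runs without installing anything. The class name `PptLinkParser` and the sample HTML are made up for the example; the real directory listing may be laid out differently.

```python
from html.parser import HTMLParser


class PptLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags ending in .ppt."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag != 'a':
            return
        for name, value in attrs:
            if name == 'href' and value and value.lower().endswith('.ppt'):
                self.links.append(value)


# Made-up stand-in for the directory listing returned by urlopen(...).read().
sample_html = '''
<html><body>
<a href="ch01.ppt">ch01.ppt</a>
<a href="ch02.ppt">ch02.ppt</a>
<a href="notes.txt">notes.txt</a>
</body></html>
'''

parser = PptLinkParser()
parser.feed(sample_html)
print(parser.links)  # ['ch01.ppt', 'ch02.ppt']
```

Because only the `href` attribute is inspected, the link text is never counted, so each file shows up once.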

This code worked for me. I modified it a little because yours was duplicating each ppt file.

from urllib.request import urlopen  # Python 3; on Python 2 use: from urllib2 import urlopen
import re

urlpath = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/')
string = urlpath.read().decode('utf-8')

# The trailing quote limits matches to quoted occurrences of each name
# (e.g. inside an href attribute), so each file is listed only once.
pattern = re.compile(r'ch[0-9]*\.ppt"')

filelist = pattern.findall(string)
print(filelist)

for filename in filelist:
    filename = filename[:-1]  # strip the trailing quote captured by the pattern
    remotefile = urlopen('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/' + filename)
    localfile = open(filename, 'wb')
    localfile.write(remotefile.read())
    localfile.close()
    remotefile.close()

Source

2012-06-04 01:10:43 apple16

Thanks, you're a champ. – davelupt

See my comment [above](http://stackoverflow.com/questions/10875215/python-urllib-downloading-contents-of-an-online-directory#comment14174956_10875215) for the reason for the downvote. – schlamar

This is awesome, thanks – Anuj
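For anyone who wants to stay with the regex approach, here is a rough sketch of another way to avoid the duplicates: keep the original pattern (without the trailing quote) and drop repeated names while preserving order. The function names `unique_ppt_names` and `download_all` are made up for this example, the sample HTML is invented, and the pattern is tightened slightly (escaped dot, at least one digit), which is my assumption about the file naming, not something confirmed by the page.

```python
import re
import shutil
from urllib.parse import urljoin
from urllib.request import urlopen

# Assumption: files are named ch<digits>.ppt; the dot is escaped so it only matches a literal '.'.
PPT_PATTERN = re.compile(r'ch[0-9]+\.ppt')


def unique_ppt_names(html):
    """Return matching file names in first-seen order, without duplicates."""
    return list(dict.fromkeys(PPT_PATTERN.findall(html)))


def download_all(base_url, names):
    """Fetch each name relative to base_url and save it in the current directory."""
    for name in names:
        # 'with' closes both the HTTP response and the local file, even on errors.
        with urlopen(urljoin(base_url, name)) as remote, open(name, 'wb') as local:
            shutil.copyfileobj(remote, local)


# Made-up stand-in for the decoded directory listing.
sample_html = '<a href="ch01.ppt">ch01.ppt</a> <a href="ch02.ppt">ch02.ppt</a>'
print(unique_ppt_names(sample_html))  # ['ch01.ppt', 'ch02.ppt']
```

`download_all` is not called here because it needs network access; with the URL from the question it would be called as `download_all('http://www.divms.uiowa.edu/~jni/courses/ProgrammignInCobol/presentation/', names)`.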
