A small issue in Url2Html.py

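At line 61, in the download_img method, if the image already exists, the replacement image URL that gets returned is the full path. On the first replacement, however, the path is built by joining the basename. This means the first and second uses of the same image get different paths, so later uses of an image from the same source end up with the wrong path. The two could be made consistent.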

wechat_articles_spider/wechatarticles/Url2Html.py

Lines 55 to 67 in 3d399c6

    
    name = "{}.{}".format(url.split("/")[-2], url.split("/")[3].split("_")[-1])
    imgpath = os.path.join(self.img_path, name)
    # If the image has already been downloaded, skip the download and return the path directly
    if os.path.isfile(imgpath):
        with open(imgpath, "rb") as f:
            img = f.read()
        return imgpath, img
    response = requests.get(url, proxies=self.proxies)
    img = response.content
    with open(imgpath, "wb") as f:
        f.write(img)
    return imgpath, img

Thanks for the heads-up. I don't remember why I added that basename in the first place either. It has been removed.

	name = "{}.{}".format(url.split("/")[-2], url.split("/")[3].split("_")[-1])
	imgpath = os.path.join(self.img_path, name)
	# If the image has already been downloaded, skip the download and return the path directly
	if os.path.isfile(imgpath):
		with open(imgpath, "rb") as f:
			img = f.read()
		return imgpath, img

	response = requests.get(url, proxies=self.proxies)
	img = response.content
	with open(imgpath, "wb") as f:
		f.write(img)
	return imgpath, img
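
For anyone who wants to check the behavior locally, here is a minimal standalone sketch of the same logic, not the code in the repository. The function name `download_img_consistent`, the `img_dir` argument, and the injected `fetch` callable (standing in for `requests.get(url, proxies=...).content`) are assumptions made for illustration so the sketch runs without network access.

```python
import os
from typing import Callable, Tuple


def download_img_consistent(
    url: str, img_dir: str, fetch: Callable[[str], bytes]
) -> Tuple[str, bytes]:
    """Return (path, bytes) for url; the path is the same on a cache hit or miss."""
    # Same naming rule as the snippet above: second-to-last URL segment
    # plus the suffix of the fourth segment (e.g. "mmbiz_png" -> "png").
    parts = url.split("/")
    name = "{}.{}".format(parts[-2], parts[3].split("_")[-1])
    imgpath = os.path.join(img_dir, name)

    # Cache hit: read the file and return the same joined path.
    if os.path.isfile(imgpath):
        with open(imgpath, "rb") as f:
            return imgpath, f.read()

    # Cache miss: fetch, store, and return the identical path.
    img = fetch(url)  # hypothetical stand-in for the HTTP request
    os.makedirs(img_dir, exist_ok=True)
    with open(imgpath, "wb") as f:
        f.write(img)
    return imgpath, img
```

Calling it twice with the same URL should return the same path both times, with `fetch` invoked only on the first call. The naming rule assumes the URL has at least four `/`-separated segments, as in the original snippet.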