首先准备实验材料
需要批量下载的文件的列表。 PyCharm编辑器 一台搭建了Python开发环境的电脑。我推荐使用 miniconda 进行python开发环境的搭建。
单线程VS多线程批量下载的表现
上篇文章我们讲述了python单线程下载文件代码的实现,时间隔得有些久了,大家如果忘记了,可以查看这个专题的上一篇文章。今天咱们废话不多说,继续来看Python多线程下载文件的实现。
单线程批量下载文件的表现
D:\ProgramFiles\miniconda3\envs\py38\python.exe D:/code/learnPython/start.py
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/5a2e7b79-89ae-42ee-8215-e84de7dc7dd1.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/1ad7a090-3394-4c85-9616-537ec28ccef0.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/b5334f2e-699a-4423-87d4-4e8ab21ea6e4.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/fdf399dd-a2af-4515-a44a-84e53b6450a3.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/03f21acf-f1c7-40f0-9fc9-6619009c1869.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/a769165b-5323-44a6-8f81-e32bf837f9a7.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/476f641e-e70f-4023-90c9-974cbf086153.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/905e82a0-8050-4a87-b165-b193f51c1b9b.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/ab36a676-d9fd-4efe-bab9-f1ab4611cecf.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/c0f2040b-7459-4866-8abd-75b25168c47d.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/2dfd2def-62cb-4754-96cc-63531d4443be.mp3
下载用时 0.851069s
共耗时 0.853999s
Process finished with exit code 0多线程批量下载文件的表现
D:\ProgramFiles\miniconda3\envs\py38\python.exe D:/code/learnPython/start.py
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/5a2e7b79-89ae-42ee-8215-e84de7dc7dd1.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/1ad7a090-3394-4c85-9616-537ec28ccef0.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/b5334f2e-699a-4423-87d4-4e8ab21ea6e4.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/fdf399dd-a2af-4515-a44a-84e53b6450a3.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/03f21acf-f1c7-40f0-9fc9-6619009c1869.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/a769165b-5323-44a6-8f81-e32bf837f9a7.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/476f641e-e70f-4023-90c9-974cbf086153.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/905e82a0-8050-4a87-b165-b193f51c1b9b.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/ab36a676-d9fd-4efe-bab9-f1ab4611cecf.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/c0f2040b-7459-4866-8abd-75b25168c47d.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/2dfd2def-62cb-4754-96cc-63531d4443be.mp3
下载用时 0.156159s
共耗时 0.159087s
Process finished with exit code 0
可以明显的看出多线程在文件批量下载方面比单线程表现优秀很多。那接下来,小编就带着大家,编写一个多线程批量下载文件的DEMO
ThreadPoolExecutor实现多线程批量下载
完整的代码包可以关注公众号 猿部落 发送 211014 获取
这里我就简单的贴一下核心代码块
import requests
from concurrent.futures import ThreadPoolExecutor
class Mp3Downloader:
def __init__(self):
pass
def _process(self, ts_url, process_type='_multi_thread'): # 实际执行的例程
print(f'{process_type} downloading {ts_url}')
return requests.get(ts_url).content
def single_thread(self, ts_url_list): # 单线程下载方法
response_content_list = []
for ts_url in ts_url_list:
response_content_list.append(self._process(ts_url, '_single_thread'))
return response_content_list
def multi_thread(self, ts_url_list): # 多线程下载方法
tp = ThreadPoolExecutor(10)
response_content_list = []
task_list = []
for ts_url in ts_url_list: # 将任务批量提交
task = tp.submit(self._process, ts_url, '_multi_thread')
task_list.append(task)
for task in task_list: # 批量获取任务执行的结果
content = task.result()
response_content_list.append(content)
return response_content_list
# 批量提交、执行任务,然后批量获取任务执行的结果进行下一步处理是多线程处理IO密集型任务的核心思路
ThreadPoolExecutor
ThreadPoolExecutor继承自concurrent.futures.Executor类,后者是一个抽象类。其中这个抽象类提供了3个异步执行调用方法。其中submit是最常用的。
*submit(fn, *args, *kwargs)
安排可调用的fn,被当作fn(*args, **kwargs)执行,然后返回一个Future对象代表fn的执行结果。Future封装了异步调用的结果。其中result方法返回调用执行的结果。如果调用没有执行完成,那么此方法将会一直等待,直到达到超时的状态。
*map(func, *iterables, *timeout=None, chunksize=1)
这个方法我个人认为是submit的升级版,可以简代码,我们可以将我们实现的multi_thread做如下更改,效果是一样的
def multi_thread(self, ts_url_list):
tp = ThreadPoolExecutor(10)
response_content_list = []
for result in tp.map(self._process, ts_url_list):
response_content_list.append(result)
return response_content_listshutdown(wait=True)
通知执行者,当当前挂起的futures执行完成后,它应该释放所有的资源。在执行完shutdown方法后,再调用
Executor.submit()
和Executor.map()
会导致抛出 RuntimeError异常。
文章转载自直截了当,如果涉嫌侵权,请发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。