Multithreaded Batch File Download in Python

直截了当 2021-10-14


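First, prepare the materials for the experiment

A list of the files to download in bulk.
The PyCharm editor.
A computer with a Python development environment set up. I recommend using miniconda to set up the Python environment.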

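Single-threaded vs. multithreaded batch download performance

The previous article covered a single-threaded Python file downloader. That was a while ago, so if you have forgotten the details, see the previous article in this series. Without further ado, let's look at how to download files with multiple threads in Python.

Single-threaded batch download performance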

D:\ProgramFiles\miniconda3\envs\py38\python.exe D:/code/learnPython/start.py
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/5a2e7b79-89ae-42ee-8215-e84de7dc7dd1.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/1ad7a090-3394-4c85-9616-537ec28ccef0.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/b5334f2e-699a-4423-87d4-4e8ab21ea6e4.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/fdf399dd-a2af-4515-a44a-84e53b6450a3.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/03f21acf-f1c7-40f0-9fc9-6619009c1869.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/a769165b-5323-44a6-8f81-e32bf837f9a7.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/476f641e-e70f-4023-90c9-974cbf086153.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/905e82a0-8050-4a87-b165-b193f51c1b9b.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/ab36a676-d9fd-4efe-bab9-f1ab4611cecf.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/c0f2040b-7459-4866-8abd-75b25168c47d.mp3
_single_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/2dfd2def-62cb-4754-96cc-63531d4443be.mp3
Download time: 0.851069s
Total time: 0.853999s

Process finished with exit code 0



Multithreaded batch download performance

D:\ProgramFiles\miniconda3\envs\py38\python.exe D:/code/learnPython/start.py
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/5a2e7b79-89ae-42ee-8215-e84de7dc7dd1.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/1ad7a090-3394-4c85-9616-537ec28ccef0.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/b5334f2e-699a-4423-87d4-4e8ab21ea6e4.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/fdf399dd-a2af-4515-a44a-84e53b6450a3.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/03f21acf-f1c7-40f0-9fc9-6619009c1869.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/a769165b-5323-44a6-8f81-e32bf837f9a7.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/476f641e-e70f-4023-90c9-974cbf086153.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/905e82a0-8050-4a87-b165-b193f51c1b9b.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/ab36a676-d9fd-4efe-bab9-f1ab4611cecf.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/c0f2040b-7459-4866-8abd-75b25168c47d.mp3
_multi_thread downloading http://www.snhm.org.cn/museum/uploadFiles/exhibit/2dfd2def-62cb-4754-96cc-63531d4443be.mp3
Download time: 0.156159s
Total time: 0.159087s

Process finished with exit code 0



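In this run the multithreaded version finished the same 11 downloads in about 0.16 s, versus about 0.85 s for the single-threaded version, roughly five times faster. Next, let's write a demo that downloads files in bulk with multiple threads.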
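The timing code behind the two runs is not shown in this article, so the following is only a minimal sketch of how such a comparison could be measured. It assumes a stand-in task, fake_download, that sleeps instead of making a network request, and the helper names timed, run_serial and run_threaded are hypothetical. The example.invalid URLs are placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(url, delay=0.05):
    """Stand-in for a network request (assumption): sleeps instead of fetching."""
    time.sleep(delay)
    return url.encode()


def timed(label, func, *args):
    """Run func(*args), print the elapsed wall-clock time, and return both."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f'{label}: {elapsed:.6f}s')
    return result, elapsed


def run_serial(urls):
    # One task after another: total time is roughly the sum of all delays
    return [fake_download(u) for u in urls]


def run_threaded(urls, workers=10):
    # Tasks wait concurrently: total time is roughly the longest single delay
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, urls))


urls = [f'http://example.invalid/{i}.mp3' for i in range(10)]
serial_result, serial_time = timed('single thread', run_serial, urls)
threaded_result, threaded_time = timed('multi thread', run_threaded, urls)
```

Because the tasks spend their time waiting rather than computing, the threaded run overlaps the waits, which is the same effect seen in the two logs above.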

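Multithreaded batch download with ThreadPoolExecutor

The complete code package is available by following the WeChat official account 猿部落 and sending 211014.

Here I will just paste the core code.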

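import requests
from concurrent.futures import ThreadPoolExecutor


class Mp3Downloader:
    def _process(self, ts_url, process_type='_multi_thread'):
        # The routine that actually performs one download
        print(f'{process_type} downloading {ts_url}')
        # A timeout keeps a stalled server from blocking a worker forever (30 s is an arbitrary choice)
        response = requests.get(ts_url, timeout=30)
        response.raise_for_status()  # raise on HTTP errors instead of returning an error page
        return response.content

    def single_thread(self, ts_url_list):
        # Single-threaded download: fetch the URLs one after another
        response_content_list = []
        for ts_url in ts_url_list:
            response_content_list.append(self._process(ts_url, '_single_thread'))
        return response_content_list

    def multi_thread(self, ts_url_list):
        # Multithreaded download using a pool of 10 worker threads
        response_content_list = []
        task_list = []
        with ThreadPoolExecutor(max_workers=10) as tp:  # the pool is shut down on exit
            for ts_url in ts_url_list:  # submit all tasks in one batch
                task = tp.submit(self._process, ts_url, '_multi_thread')
                task_list.append(task)
            for task in task_list:  # collect the results in submission order
                content = task.result()
                response_content_list.append(content)
        return response_content_list

# Submitting the tasks in a batch, letting them run, and then collecting the results in a batch
# for further processing is the core idea of handling IO-bound work with multiple threads.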
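As one possible "next step" after the results are collected, the sketch below writes each payload to disk. It relies on the fact that multi_thread returns the contents in the same order as the URL list, so the two can be paired with zip. The helper save_all, the example.invalid URLs and the byte strings are all hypothetical stand-ins, not part of the original code package.

```python
import os
import tempfile
from urllib.parse import urlparse


def save_all(url_list, content_list, target_dir):
    """Write each payload to target_dir, named after the last path segment of its URL."""
    saved = []
    for url, content in zip(url_list, content_list):
        name = os.path.basename(urlparse(url).path)
        path = os.path.join(target_dir, name)
        with open(path, 'wb') as f:
            f.write(content)
        saved.append(path)
    return saved


urls = ['http://example.invalid/a.mp3', 'http://example.invalid/b.mp3']
contents = [b'first', b'second']  # stand-ins for the bytes returned by multi_thread

with tempfile.TemporaryDirectory() as tmp:
    paths = save_all(urls, contents, tmp)
    names = [os.path.basename(p) for p in paths]
    sizes = [os.path.getsize(p) for p in paths]
```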


ThreadPoolExecutor

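ThreadPoolExecutor inherits from concurrent.futures.Executor, which is an abstract class. Executor defines three methods: submit, map, and shutdown. Of the two that schedule calls asynchronously (submit and map), submit is the most commonly used.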

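submit(fn, *args, **kwargs)
Schedules the callable fn to be executed as fn(*args, **kwargs) and returns a Future object representing the execution of fn. The Future wraps the result of the asynchronous call, and its result(timeout=None) method returns the value the call returned. If the call has not finished yet, result waits: indefinitely when timeout is None, otherwise for up to timeout seconds, after which it raises TimeoutError. If the call raised an exception, result re-raises it.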
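A small sketch of how a Future returned by submit behaves may help here. The functions slow and broken are hypothetical tasks made up for illustration. The example shows a result call that times out, one that re-raises the task's exception, and one that blocks until the value is ready.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def slow(seconds):
    # Hypothetical task that takes a while to finish
    time.sleep(seconds)
    return seconds


def broken():
    # Hypothetical task that fails
    raise ValueError('download failed')


with ThreadPoolExecutor(max_workers=2) as pool:
    slow_future = pool.submit(slow, 0.3)
    bad_future = pool.submit(broken)

    try:
        slow_future.result(timeout=0.05)  # the task needs 0.3 s, so this wait gives up
        timed_out = False
    except FutureTimeout:
        timed_out = True

    try:
        bad_future.result()  # re-raises the exception raised inside the task
        error_message = None
    except ValueError as exc:
        error_message = str(exc)

    final_value = slow_future.result()  # no timeout: blocks until the task is done
```

A timeout in result only stops the waiting; the task itself keeps running in its worker thread, which is why the last call still gets the value.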

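map(func, *iterables, timeout=None, chunksize=1)

I think of this method as a more convenient form of submit that can simplify the code. It calls func asynchronously on the items of the iterables and yields the results in input order; chunksize only matters for ProcessPoolExecutor and has no effect with ThreadPoolExecutor. We can rewrite our multi_thread method as follows, and it returns the same results.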

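def multi_thread(self, ts_url_list):
    response_content_list = []
    with ThreadPoolExecutor(max_workers=10) as tp:  # the pool is shut down on exit
        # map yields results in the same order as ts_url_list
        for result in tp.map(self._process, ts_url_list):
            response_content_list.append(result)
    return response_content_list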
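The map version above relies on the default value of process_type. If extra arguments need to be passed, map accepts several iterables and takes one item from each per call. The sketch below uses a hypothetical process function with an artificial delay to show both this and the fact that results come back in input order even when later tasks finish first.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from itertools import repeat


def process(url, process_type, delay):
    # Hypothetical task: the delay simulates downloads of different sizes
    time.sleep(delay)
    return f'{process_type}:{url}'


urls = ['u0', 'u1', 'u2']
delays = [0.15, 0.1, 0.01]  # the first task finishes last

with ThreadPoolExecutor(max_workers=3) as pool:
    # repeat() supplies the same process_type for every call;
    # map stops when the shortest iterable is exhausted
    results = list(pool.map(process, urls, repeat('_multi_thread'), delays))
```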


shutdown(wait=True)
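Signals the executor that it should free all of its resources once the currently pending futures have finished executing. With wait=True, shutdown does not return until those futures are done. Calls to Executor.submit() and Executor.map() made after shutdown raise RuntimeError.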
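A minimal sketch of this behaviour, using a hypothetical square function as the task: a future submitted before shutdown still completes, a submit after shutdown is rejected, and a with block performs the shutdown automatically on exit.

```python
from concurrent.futures import ThreadPoolExecutor


def square(x):
    # Hypothetical task used only for illustration
    return x * x


pool = ThreadPoolExecutor(max_workers=2)
future = pool.submit(square, 7)
pool.shutdown(wait=True)  # returns once the pending future has finished
value = future.result()   # the result is still available after shutdown

try:
    pool.submit(square, 8)  # not allowed once shutdown has been called
    rejected = False
except RuntimeError:
    rejected = True

# Used as a context manager, the executor calls shutdown(wait=True) on exit
with ThreadPoolExecutor(max_workers=2) as managed:
    managed_value = managed.submit(square, 3).result()
```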


Reposted from 直截了当.
