Whale: A Unified Distributed Training Framework
Ang Wang
wangang.wa@alibaba-inc.com
PAI, Alibaba Cloud
15/12/2020
Motivation
[Figure: model size growth vs. GPU memory growth. Model parameters (M): 25, 117, 340, 1,500, 11,000, 175,000. GPU memory (GB): P4 8, P100 16, V100 32, A100 80.] [1]
Models are getting larger and more complex.
Larger models lead to better results with lower validation perplexities.
Model size is growing far faster than hardware memory is upgraded.
[1] https://developer.nvidia.com/blog/training-bert-with-gpus/
Models are getting larger
Data parallelism (DP) is widely used in distributed training because it is simple and easy to implement (see the sketch below).
DP is not always optimal for every distributed training workload.
It is necessary to find an efficient parallel strategy that makes full use of the resources and speeds up training.
Distribute the training workload with data parallelism.
Data parallelism becomes less optimal for many distributed workloads.
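To make the data-parallel baseline concrete, here is a minimal sketch of synchronous data parallelism using TensorFlow's tf.distribute.MirroredStrategy. This is a generic illustration rather than Whale's own API; the MNIST model, layer sizes, and batch size are placeholder choices for brevity.

import tensorflow as tf

# Replicate the model across all visible local GPUs (falls back to a
# single replica on CPU-only machines); gradients are all-reduced
# across replicas after every step.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Each replica keeps a full copy of the model parameters.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# The global batch of 256 is split evenly among the replicas.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
model.fit(x_train, y_train, batch_size=256, epochs=1)

Because every replica must hold a full copy of the model, this approach assumes the model fits on a single device, which is one reason data parallelism alone becomes less optimal as models keep growing.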