Whale: A Unified Distributed Training Framework
Ang Wang
wangang.wa@alibaba-inc.com
PAI, Alibaba Cloud
15/12/2020
Motivation
[Figure: model size growth vs. GPU memory growth. Model parameters (M): 25, 117, 340, 1,500, 11,000, 175,000. GPU memory (GB): P4 8, P100 16, V100 32, A100 80.] [1]
Models are getting larger and more complex.
Larger models lead to better results with lower validation perplexities.
Model size is growing far faster than hardware memory is upgraded.
[1] https://developer.nvidia.com/blog/training-bert-with-gpus/
Models are getting larger
Data parallelism (DP) is widely used in distributed training because it is simple and easy to implement (see the sketch below).
DP is not always optimal for every distributed training workload.
It is necessary to find an efficient parallel strategy that makes full use of the resources and speeds up training.
Distribute the training workload with data parallelism.
Data parallelism becomes less optimal for many distributed workloads.
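To make the data-parallel baseline concrete, here is a minimal sketch of synchronous data parallelism using TensorFlow's tf.distribute.MirroredStrategy. This is a generic illustration rather than Whale's own API; the MNIST model, layer sizes, and batch size are placeholder choices for brevity.

import tensorflow as tf

# Replicate the model across all visible local GPUs (falls back to a
# single replica on CPU-only machines); gradients are all-reduced
# across replicas after every step.
strategy = tf.distribute.MirroredStrategy()
print("Number of replicas:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Each replica keeps a full copy of the model parameters.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# The global batch of 256 is split evenly among the replicas.
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
model.fit(x_train, y_train, batch_size=256, epochs=1)

Because every replica must hold a full copy of the model, this approach assumes the model fits on a single device, which is one reason data parallelism alone becomes less optimal as models keep growing.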