Motivation
• Data Parallelism (DP) is widely used in distributed training because it is simple and easy to implement (see the sketch below).
• However, DP is not always optimal for every distributed training workload.
• It is therefore necessary to find an efficient parallel strategy that makes full use of the available resources and speeds up training.
[Figure: distributing the training workload with data parallelism; data parallelism becomes less optimal for many distributed workloads]
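To make the "simple and easy to implement" point concrete, here is a minimal sketch of data parallelism using PyTorch's DistributedDataParallel. The toy model, random data, and hyperparameters are illustrative assumptions, not taken from the original; the slide itself does not prescribe any particular framework.

```python
# A minimal data-parallel training sketch with PyTorch DistributedDataParallel.
# The linear model and random data are placeholders. Launch with, e.g.:
#   torchrun --nproc_per_node=4 dp_sketch.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


def main():
    # torchrun sets RANK / LOCAL_RANK / WORLD_SIZE in the environment.
    dist.init_process_group(backend="gloo")  # "nccl" on GPU clusters
    rank = dist.get_rank()

    # Each rank holds a full replica of the model; DDP keeps replicas in sync.
    model = nn.Linear(32, 2)
    ddp_model = DDP(model)
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # Each rank trains on its own shard of the data; gradients are
    # averaged across all ranks automatically during backward().
    for step in range(10):
        inputs = torch.randn(16, 32)
        labels = torch.randint(0, 2, (16,))
        optimizer.zero_grad()
        loss = loss_fn(ddp_model(inputs), labels)
        loss.backward()  # gradient all-reduce happens here
        optimizer.step()

    if rank == 0:
        print(f"final loss: {loss.item():.4f}")
    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The simplicity is the appeal: wrapping the model in DDP is nearly the only change from single-device training. The limitation motivating this work is that every rank must hold a full model replica and communicate full gradients, which is exactly where DP becomes less optimal for many workloads.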