
Evaluating the Performance of Erasure Codes
Ping Lu
12*
, Zhihao Huang
3
, Shengmei Luo
2
, Hui Li
3
, Jun Chen
3
1
The School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
2
The Cloud Computing and IT Institute of ZTE Corporation, Nanjing, 210012, China
3
Shenzhen Graduate School, Peking University, Shenzhen, 518055, China
Email: lu.ping@zte.com.cn
Abstract—Recently, erasure codes such as Reed-Solomon (RS)
code and Cauchy Reed-Solomon (CRS) code have been widely
used in distributed file system to reduce the large storage
overhead incurred by replication scheme. Now, there is a new
erasure code called Binary Reed-Solomon (BRS) code that can
achieve better performance than that of RS code, CRS code and
is available to be deployed in distributed file system. In this paper,
we compare the encoding and decoding performance of BRS
code with CRS code and RS code practically. Moreover, we
evaluate how the performance of these codes is affected by cache
size and multi-core from modern computer architecture
perspective. The experimental results show that it is possible to
replace RS code or CRS code with BRS code in distributed file
system. These works let us have a better understanding of BRS
code, CRS code and RS code and help us to make full use of
modern computer to get best performance of these codes from
modern computer architecture perspective.
Keywords—evaluate, performance, erasure codes
I. INTRODUCTION
Nowadays, cloud storage has been increasingly popular in
our daily life. Distributed file system with a large number of
inexpensive and unreliable storages nodes has been designed to
store a large amount of data, which is key to the success of
cloud storage service. Nevertheless, node failures in distributed
file systems are “norm rather than exception” [1]. If one node
in the distributed storage system breaks down, the data on that
node will be lost. Therefore, many commercial distributed file
systems, such as the Google File System [1] and the Hadoop
File System [2], use triple replication as the default storage
policy to ensure the reliability and availability of their system.
However, triple replication incurs a storage overhead of 200%,
which is very costly.
Erasure codes can be used in distributed file system to
protect data from losing at any failed node and reduce storage
overhead. Maximum Distance Separable (MDS) [3] code is a
desirable erasure code that make full use of the parity while
protecting data from multiple nodes failure. Encoded by MDS
code, a file is divided into k source blocks, and generates m
parity blocks from the k source blocks. It can get =+
blocks in the end. Any file encoded by MDS code can be
recovered from any k available blocks in all n blocks generated
from that file.
Reed-Solomon (RS) [4] code is a famous class of MDS
codes that can tolerate multiple blocks failed. However, the
encoding and decoding complexities of traditional RS code is
very high because traditional RS code requires a large finite
Galois field. Compared with the RS code, Cauchy Reed-
Solomon (CRS) [5] code converts the finite Galois field
arithmetic into XOR operation. The performance of CRS code
is much better than that of RS code. Therefore, CRS code has
been widely used in commercial distributed file system.
However, the encoding and decoding complexities of CRS
code are still higher than that of the optimal MDS code. In this
case, MDS codes with optimal encoding complexity were
proposed. For example, MDS array code is one family of such
codes. One of the most important property of MDS array code
is that it uses only XOR operation in both encoding and
decoding process [6]. Thus, MDS array code is more
computationally efficient in practical implementation. EVEN-
ODD code, first presented in 1995, was the first MDS array
code for two nodes failure. Most of the parity-based codes use
row-diagonal parity in different forms. The EVENODD code
uses row-diagonal parity for a ( −1) × ( + 2) array,
where p is a prime [7]. RDP code can recover any two nodes
failure and shows a ( − 1) × array with row-diagonal
parity method, where p is a prime [8]. Moreover, STAR code
[9] was proposed for dealing with three or more parity blocks.
When two or more nodes break down, however, for both
RS code and traditional MDS array code, there is a severe
problem that a complex matrix must be solved first before
decoding the original data. As the matrix becomes harder to be
solved with the increase of the number of missing data blocks,
the decoding complexity is very high. To solve this problem
and get a better decoding complexity, Binary Reed-Solomon
(BRS) code is proposed [10]. The necessary and sufficient
MDS property of BRS code is given in [10]. BRS code only
requires a fast operation over GF(2) for both encoding and
decoding, so it is supposed to take place of traditional MDS
codes. The biggest disadvantage of BRS code is that the length
of the parity block is slightly larger with o bits overhead, where
=
(
−1
)
∗(−1), but the encoding and decoding
complexities of BRS code are much better than that of CRS
code. Hence, BRS code can get a much better performance at
the cost of a slight storage overhead.
In this paper, firstly, we implement programs of BRS code
and RS code in C programming language, including encoding
and decoding method. Moreover, we evaluate the performance
of these codes in practical computation. Then, we explore how
cache size and multi-core affect the performance of these codes
from modern computer architecture perspective. The
experimental results show that for both encoding and decoding
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
Authorized licensed use limited to: ZTE CORPORATION. Downloaded on July 18,2023 at 08:10:42 UTC from IEEE Xplore. Restrictions apply.
评论