暂无图片
暂无图片
暂无图片
暂无图片
暂无图片
Evaluating_the_Performance_of_Erasure_Codes.pdf
112
6页
3次
2023-07-18
免费下载
Evaluating the Performance of Erasure Codes
Ping Lu
12*
, Zhihao Huang
3
, Shengmei Luo
2
, Hui Li
3
, Jun Chen
3
1
The School of Information Science and Engineering, Southeast University, Nanjing, 210096, China
2
The Cloud Computing and IT Institute of ZTE Corporation, Nanjing, 210012, China
3
Shenzhen Graduate School, Peking University, Shenzhen, 518055, China
Email: lu.ping@zte.com.cn
AbstractRecently, erasure codes such as Reed-Solomon (RS)
code and Cauchy Reed-Solomon (CRS) code have been widely
used in distributed file system to reduce the large storage
overhead incurred by replication scheme. Now, there is a new
erasure code called Binary Reed-Solomon (BRS) code that can
achieve better performance than that of RS code, CRS code and
is available to be deployed in distributed file system. In this paper,
we compare the encoding and decoding performance of BRS
code with CRS code and RS code practically. Moreover, we
evaluate how the performance of these codes is affected by cache
size and multi-core from modern computer architecture
perspective. The experimental results show that it is possible to
replace RS code or CRS code with BRS code in distributed file
system. These works let us have a better understanding of BRS
code, CRS code and RS code and help us to make full use of
modern computer to get best performance of these codes from
modern computer architecture perspective.
Keywordsevaluate, performance, erasure codes
I. INTRODUCTION
Nowadays, cloud storage has been increasingly popular in
our daily life. Distributed file system with a large number of
inexpensive and unreliable storages nodes has been designed to
store a large amount of data, which is key to the success of
cloud storage service. Nevertheless, node failures in distributed
file systems are norm rather than exception [1]. If one node
in the distributed storage system breaks down, the data on that
node will be lost. Therefore, many commercial distributed file
systems, such as the Google File System [1] and the Hadoop
File System [2], use triple replication as the default storage
policy to ensure the reliability and availability of their system.
However, triple replication incurs a storage overhead of 200%,
which is very costly.
Erasure codes can be used in distributed file system to
protect data from losing at any failed node and reduce storage
overhead. Maximum Distance Separable (MDS) [3] code is a
desirable erasure code that make full use of the parity while
protecting data from multiple nodes failure. Encoded by MDS
code, a file is divided into k source blocks, and generates m
parity blocks from the k source blocks. It can get =+
blocks in the end. Any file encoded by MDS code can be
recovered from any k available blocks in all n blocks generated
from that file.
Reed-Solomon (RS) [4] code is a famous class of MDS
codes that can tolerate multiple blocks failed. However, the
encoding and decoding complexities of traditional RS code is
very high because traditional RS code requires a large finite
Galois field. Compared with the RS code, Cauchy Reed-
Solomon (CRS) [5] code converts the finite Galois field
arithmetic into XOR operation. The performance of CRS code
is much better than that of RS code. Therefore, CRS code has
been widely used in commercial distributed file system.
However, the encoding and decoding complexities of CRS
code are still higher than that of the optimal MDS code. In this
case, MDS codes with optimal encoding complexity were
proposed. For example, MDS array code is one family of such
codes. One of the most important property of MDS array code
is that it uses only XOR operation in both encoding and
decoding process [6]. Thus, MDS array code is more
computationally efficient in practical implementation. EVEN-
ODD code, first presented in 1995, was the first MDS array
code for two nodes failure. Most of the parity-based codes use
row-diagonal parity in different forms. The EVENODD code
uses row-diagonal parity for a ( −1) × ( + 2) array,
where p is a prime [7]. RDP code can recover any two nodes
failure and shows a ( − 1) ×  array with row-diagonal
parity method, where p is a prime [8]. Moreover, STAR code
[9] was proposed for dealing with three or more parity blocks.
When two or more nodes break down, however, for both
RS code and traditional MDS array code, there is a severe
problem that a complex matrix must be solved first before
decoding the original data. As the matrix becomes harder to be
solved with the increase of the number of missing data blocks,
the decoding complexity is very high. To solve this problem
and get a better decoding complexity, Binary Reed-Solomon
(BRS) code is proposed [10]. The necessary and sufficient
MDS property of BRS code is given in [10]. BRS code only
requires a fast operation over GF(2) for both encoding and
decoding, so it is supposed to take place of traditional MDS
codes. The biggest disadvantage of BRS code is that the length
of the parity block is slightly larger with o bits overhead, where
=
(
−1
)
∗(−1), but the encoding and decoding
complexities of BRS code are much better than that of CRS
code. Hence, BRS code can get a much better performance at
the cost of a slight storage overhead.
In this paper, firstly, we implement programs of BRS code
and RS code in C programming language, including encoding
and decoding method. Moreover, we evaluate the performance
of these codes in practical computation. Then, we explore how
cache size and multi-core affect the performance of these codes
from modern computer architecture perspective. The
experimental results show that for both encoding and decoding
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
2015 Third International Conference on Advanced Cloud and Big Data
978-1-4673-8537-4/15 $31.00 © 2015 IEEE
DOI 10.1109/CBD.2015.42
213
Authorized licensed use limited to: ZTE CORPORATION. Downloaded on July 18,2023 at 08:10:42 UTC from IEEE Xplore. Restrictions apply.
speed , BRS > CRS > RS. Also, the cache size has significant
impact on the encoding and decoding speed of these codes.
Using multithreaded program to encode and decode can
achieve almost 2x improvement in throughput, comparing with
the performance of the program running 1 thread.
The rest of this paper is organized as follows. Section II
gives a brief introduction to RS code and CRS code, then
describes the encoding and decoding method of BRS code.
Section III describes the testing environment and give the test
results. We also analyze the performance of these codes.
Section IV shows how cache size and multi-core influence the
performance of these codes. Section V summarizes the whole
paper.
II. B
ACKGROUND
In this section, we give a brief introduction to RS code and
CRS code first. Then, we mainly describe the basic knowledge
about BRS code, including encoding and decoding method of
BRS code.
A. RS code
From [4], we can know that RS code is defined over the
finite field GF(q). In the construction of RS code, it generates
the parity blocks from a square Vandermonde matrix using
multiplication operations over the finite field GF(q). In
deconstruction of RS code, it mainly uses Gaussian
elimination to decode with multiplication operations over the
finite field GF(q).
B. CRS code
From [5], we can know that CRS code converts the finite
Galois field arithmetic into XOR operation. In the
construction of CRS code, it generates the parity blocks from
a Cauchy matrix instead of a Vandermonde matrix. Also, CRS
code uses XOR operations instead of multiplication operations
in both encoding and decoding process. These differences
improve the encoding and decoding of CRS code comparing
with RS code.
C. Encoding of BRS code
In the construction of BRS code, the data file is split into k
source blocks
,1. Each original data block consists
of L bits and can be represented by the polynomial as follow
(
)
=
,


(1)
where
,
is the ( + 1)-th bit of
.
Then m parity blocks are generated by the source blocks.
The i-th parity blocks (1≤≤) is given as follow
(
)
=
,
(
)

(2)
Thus, the corresponding matrix form of both original data
blocks and parity blocks can be given as follow
()
()
=
()
 
(
)
=:
(
)
() (3)
where
is a × identify matrix, () is a ×
Vandermonde matrix. The (,)-th element of () is
,
=
()()
(4)
For example, let =3 and =3, the encoding matrix
is as follow
G
(
)
=
100
010
001
111
1
1
(5)
Here,
(
)
means that right shift
(
)
with offset a.
From above, we can see that in the construction of BRS
decodable code, there is at most
=
(
−1
)
∗(−1) (6)
bits overhead in the parity blocks. The encoding complexity of
BRS code is O(k) per parity bit.
The encoding process of this example is given as follow:
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
=s
s
s
,
,
,
,
,
,
,
,
,
,
,
,
,
,
=s
s
s
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
,
=s
s
s
From above, we can see the encoding process of BRS
code clearly.
214214214
Authorized licensed use limited to: ZTE CORPORATION. Downloaded on July 18,2023 at 08:10:42 UTC from IEEE Xplore. Restrictions apply.
of 6
免费下载
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文档的来源(墨天轮),文档链接,文档作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论

关注
最新上传
暂无内容,敬请期待...
下载排行榜
Top250 周榜 月榜