Table of Contents
概述
总所周知,在数据库的实际应用中,容灾的建设对于一个企业很重要,这对于数据库保护性和可用性有关键作用。
得知国产数据库YashanDB个人版发布,兴致冲冲地体验一下YashanDB高可用架构的搭建、切换和相关使用。
容灾原理
YashanDB主备的原理是:主库发送日志到从库,备机通过重演主机发送过来的日志来恢复数据页面,以达到和主机的同步。
日志传送有SYNC和ASYNC两种方式,SYNC是直接读取log cache里边的数据发送到备库,性能比较高。而ASYNC是从redo里边读取数据发送到备库,适用于异地备库场景。日志发送采用的模式是由系统自动选择的。
主备搭建
环境准备
主机IP | 主机配置 | 说明 |
---|---|---|
192.168.56.101 | 4C 8G 100G | 用于部署主库 |
192.168.56.102 | 4C 8G 100G | 用于部署备库 |
系统配置
注:主备库均需配置
1. 修改内核参数 echo "vm.swappiness = 0">> /etc/sysctl.conf echo "net.ipv4.ip_local_port_range = 32768 60999" >> /etc/sysctl.conf echo "vm.max_map_count=2000000" >> /etc/sysctl.conf swapoff -a sysctl -p 2. 调整使用限制 echo " * soft nofile 65536 * hard nofile 65536 * soft nproc 65536 * hard nproc 65536 * soft rss unlimited * hard rss unlimited * soft nproc 8192 * hard nproc 8192 * soft core unlimited * hard core unlimited " >> /etc/security/limits.conf 3. 关闭透明大页 echo never > /sys/kernel/mm/transparent_hugepage/enabled echo never > /sys/kernel/mm/transparent_hugepage/defrag 4. 关闭防火墙 systemctl stop firewalld systemctl disable firewalld
复制
数据库搭建
1. 创建用户(所有节点) [root@yashan1-test ~]# useradd yashan [root@yashan1-test ~]# group add YASDBA [root@yashan1-test ~]# usermod -a -G YASDBA yashan [root@yashan1-test ~]# echo "yashan" | passwd --stdin yashan 2. 主节点解压软件包 [root@yashan1-test ~]# su - yashan [yashan@yashan1-test ~]$ mkdir -p /home/yashan/install ##上传安装包到/home/yashan/install下 [yashan@yashan1-test ~]$ cd /home/yashan/install [yashan@yashan1-test install]$ tar xzf yashandb-personal-23.1.1.100-linux-x86_64.tar.gz 3. 生成主备部署文件和安装软件 [yashan@yashan1 install]$ ./bin/yasboot package se gen --cluster yashandb -u yashan -p yashan --ip 192.168.56.101,192.168.56.102 --port 22 --install-path /data/yashan/yasdb_home --data-path /data/yashan/yasdb_data --begin-port 1688 --node 2 hostid | group | node_type | node_name | listen_addr | replication_addr | data_path ---------------------------------------------------------------------------------------------------------------- host0001 | dbg1 | db | 1-1 | 192.168.56.101:1688 | 192.168.56.101:1689 | /data/yashan/yasdb_data ----------+-------+-----------+-----------+---------------------+---------------------+------------------------- host0002 | dbg1 | db | 1-2 | 192.168.56.102:1688 | 192.168.56.102:1689 | /data/yashan/yasdb_data ----------+-------+-----------+-----------+---------------------+---------------------+------------------------- Generate config success [yashan@yashan1 install]$ ./bin/yasboot package install -t hosts.toml -i yashandb-personal-23.1.1.100-linux-x86_64.tar.gz checking install package... install version: yashandb 23.1.1.100 host0001 100% [====================================================================] 3s host0002 100% [====================================================================] 3s update host to yasom... 3. 数据库部署和配置环境变量 [yashan@yashan1 install]$ ./bin/yasboot cluster deploy -t yashandb.toml type | uuid | name | hostid | index | status | return_code | progress | cost ------------------------------------------------------------------------------------------------------------ task | 94c181a52b4d2d23 | DeployYasdbCluster | - | yashandb | SUCCESS | 0 | 100 | 27 ------+------------------+--------------------+--------+----------+---------+-------------+----------+------ task completed, status: SUCCESS ##所有节点使用yashan用户运行 [yashan@yashan2 install]$ cat /data/yashan/yasdb_home/yashandb/23.1.1.100/conf/yashandb.bashrc >> ~/.bashrc [yashan@yashan2 install]$ source ~/.bashrc 4. 设置sys用户口令 [yashan@yashan1 install]$ ./bin/yasboot cluster password set -n yashan666! -c yashandb type | uuid | name | hostid | index | status | return_code | progress | cost ---------------------------------------------------------------------------------------------------------- task | 9f9af07bb64ecb51 | YasdbPasswordSet | - | yashandb | SUCCESS | 0 | 100 | 3 ------+------------------+------------------+--------+----------+---------+-------------+----------+------ task completed, status: SUCCESS 5. 验证数据库是否安装成功 [yashan@yashan1 install]$ ./bin/yasboot cluster status -c yashandb -d hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path ------------------------------------------------------------------------------------------------------------------------------------------------- host0001 | db | 1-1:1 | 4894 | open | normal | primary | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1 ----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+-------------------------------- host0002 | db | 1-2:2 | 5606 | open | normal | standby | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2 ----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
复制
主备库参数说明
参数 | 值 | 说明 |
---|---|---|
LISTEN_ADDR | 本机数据库的IP与监听端口 | 该参数在安装时已配置 |
NODE_ID | 各节点编号 | 该参数在创建节点时已配置,且不能重复,主要用于主备机自动选举 |
REPLICATION_ADDR | 本机数据库的IP与同步复制端口 | 可不配置,则与LISTEN_ADDR共享一个端口 |
ARCHIVE_DEST_1 | 本机数据库的IP与监听端口 | 必须配置 |
DB_FILE_NAME_CONVERT | 对端到本机的数据文件转换路径,支持递归转换 | 主备数据文件路径不同时必须配置 |
REDO_FILE_NAME_CONVERT | 对端到本机的日志文件转换路径,支持递归转换 | 主备日志文件路径不同时必须配置 |
DB_BUCKET_NAME_CONVERT | 对端到本机的LSC表的bucket转换路径,支持递归转换 | 主备bucket路径不同时必须配置 |
HA_SSL_ENABLE | 主备间SSL加密传输开关 | content3 |
主备切换
Switchover切换
主备切换即是主库降级为备库,备库升级为主库。执行主备切换时候要求主备的网络连接正常,并且主备实例都处于OPEN状态。具体的切换过程如下:
检查主备机同步情况
1 主库查看同步情况 SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS; DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED ------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ 1 CONNECTED NORMAL OPEN 2189 2189 YES 1 row fetched. 2 备库查看同步情况 SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS; CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG ----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ CONNECTED NORMAL 192.168.56.101:1689 0 0 1 row fetched.
复制
备库执行切换
SQL> ALTER DATABASE SWITCHOVER;
复制
检查新主从状态
1 查看新主库情况 SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS; DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED ------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ 1 CONNECTED NORMAL OPEN 2205 2205 YES 1 row fetched. 2 查看新备库同步情况 SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag FROM V$REPLICATION_STATUS; CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG ----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ CONNECTED NORMAL 192.168.56.102:1689 0 0 1 row fetched.
复制
Failover切换
Failover切换一般是在主库发生故障,将备库强制升级为主库的一种切换方式,该方式会破坏原有的主从关系。具体的切换过程如下:
选择备机升主
在紧急状态下进行的故障切换可能会丢失数据,为尽可能降低损失,在多台备机中,应该选择日志量最多的一个备机执行Failover。
检查V$REPLICATION_STATUS视图中的received_lfn和gap_seq#字段:
- 如果gap_seq#都为0,选择received_lfn值最大的备机。
- 如果某台备机gap_seq#不为0,说明该备机缺少一些归档,升主后将丢失一部分数据,此时需要查看所有备机上的该视图,选择gap_seq#值最大,并且received_lfn值最大的备机。
1 确认主备机的角色,为主机或备机 SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE; DATABASE_ID DATABASE_NAME LOG_MODE OPEN_MODE DATABASE_ROLE --------------------- ---------------------------------------------------------------- ----------------- ----------------- ----------------- 3102204658 yashandb ARCHIVELOG READ_ONLY STANDBY 1 row fetched. 2 确认备机连接已经断连,并且RECEIVED_LFN是剩余备机里最大的,GAP_SEQ#为0 SQL> SELECT connection,status,received_lfn,gap_seq# FROM V$REPLICATION_STATUS; CONNECTION STATUS RECEIVED_LFN GAP_SEQ# ----------------- ----------------- --------------------- ------------ DISCONNECTED NORMAL 2251 0
复制
执行Failover,激活备库
SQL> ALTER DATABASE FAILOVER;
复制
查看新主库状态
SQL> SELECT database_id,database_name,log_mode,open_mode,database_role FROM V$DATABASE; DATABASE_ID DATABASE_NAME LOG_MODE OPEN_MODE DATABASE_ROLE --------------------- ---------------------------------------------------------------- ----------------- ----------------- ----------------- 3102204658 yashandb ARCHIVELOG READ_WRITE PRIMARY 1 row fetched.
复制
日常维护
主备库同步状态日常维护中常用到的视图有:
主库:v$archive_dest_status,这个视图展示了所有备机的统计信息。
从库:v$replication_status和V$RECOVERY_PROGRESS视图。v$replication_status 这个视图显示集群中所有节点的备机redo传输汇总信息。V$RECOVERY_PROGRESS视图 这个视图显示日志回放进度汇总信息。
v$archive_dest_status 视图:
v$replication_status 视图:
V$RECOVERY_PROGRESS视图:
主备同步检查
1 查看数据库状态 SQL> SELECT database_name,log_mode,open_mode,protection_mode,database_role,block_size,status FROM V$DATABASE; DATABASE_NAME LOG_MODE OPEN_MODE PROTECTION_MODE DATABASE_ROLE BLOCK_SIZE STATUS ---------------------------------------------------------------- ----------------- ----------------- --------------------------------- ----------------- ------------ --------------------------------- yashandb ARCHIVELOG READ_WRITE MAXIMUM AVAILABILITY PRIMARY 8192 NORMAL 1 row fetched. 2 查看归档日志情况 SQL> SELECT name, SEQUENCE#, blocks, block_size, completion_time FROM V$ARCHIVED_LOG; NAME SEQUENCE# BLOCKS BLOCK_SIZE COMPLETION_TIME ---------------------------------------------------------------- ------------ ------------ ---------- -------------------------------- /data/yashan/yasdb_data/db-1-1/archive/arch_0_1.ARC 1 9 4096 2023-12-02 1 row fetched. 3 主库查看同步情况 SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized FROM V$ARCHIVE_DEST_STATUS; DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED ------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ 1 CONNECTED NORMAL OPEN 2189 2189 YES 1 row fetched. 4 备库查看同步情况 SQL> SELECT connection,status,peer_addr,transport_lag,apply_lag FROM V$REPLICATION_STATUS; CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG ----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ CONNECTED NORMAL 192.168.56.101:1689 0 0 1 row fetched. 5 yasboot工具监控 SQL> SELECT * FROM V$RECOVERY_PROGRESS; ITEM UNITS VALUE --------------------------------- ----------------- --------------------- Active Apply Rate KB/sec 4000000 Average Apply Rate KB/sec 117870 Maximum Apply Rate KB/sec 232000000 Redo Applied MB 0 Redo Remain MB 0 First Applied Redo LFN 2168 Last Applied Redo LFN 2189 Active Time Seconds 0 Remain Time Seconds 0 Non Physical Log Count Number 3 Parallelism Threads 16 Latest Load Size KB 4 Latest Load Time Millisecond 0 Latest Analysis Time Millisecond 0 Wait Previous Apply Time Millisecond 0 Latest Apply Time Millisecond 0 16 rows fetched. 6 yasboot工具监控 [yashan@yashan1 install]$ yasboot cluster status -c yashandb -d hostid | node_type | nodeid | pid | instance_status | database_status | database_role | listen_address | data_path ------------------------------------------------------------------------------------------------------------------------------------------------- host0001 | db | 1-1:1 | 4894 | open | normal | primary | 192.168.56.101:1688 | /data/yashan/yasdb_data/db-1-1 ----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+-------------------------------- host0002 | db | 1-2:2 | 5606 | open | normal | standby | 192.168.56.102:1688 | /data/yashan/yasdb_data/db-1-2 ----------+-----------+--------+------+-----------------+-----------------+---------------+---------------------+--------------------------------
复制
保护模式切换
保护模式有三种:最大性能、最大可用、最大保护模式
模式 | 说明 |
---|---|
最大性能模式 | 主机事务提交无需等待备机复制结果,提交后即向客户端返回事务提交成功。该模式的优点为,主机性能较高,且备机断网或故障不会影响主机业务;缺点为,如果主机故障,备机升主后可能发生数据丢失。 |
最大可用模式 | 当任意同步备机连接正常时,采取最大保护模式,当所有同步备机断网或故障时,采取最大性能模式。最大可用模式提供尽可能的数据保护,和主数据库的可用性,但也存在数据丢失风险。 |
最大保护模式 | 主机事务提交后,必须等待同步备机将对应日志刷盘后,才能向客户端返回事务提交成功。该模式的优点为,如果主机故障,同步备机升主后可以保证数据不丢失;缺点为,如果同步备机断网或故障,将阻塞主机业务。 |
## 查看数据库保护模式 SQL> SELECT protection_mode, protection_level FROM V$DATABASE; PROTECTION_MODE PROTECTION_LEVEL --------------------------------- --------------------------------- MAXIMUM AVAILABILITY MAXIMUM AVAILABILITY 1 row fetched. ## 切换成最大性能模式(数据库在OPEN或者MOUNT状态下操作) SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE; SQL> SELECT protection_mode, protection_level FROM V$DATABASE; PROTECTION_MODE PROTECTION_LEVEL --------------------------------- --------------------------------- MAXIMUM PERFORMANCE MAXIMUM PERFORMANCE 1 row fetched. ## 切换成最大保护模式(数据库在OPEN或者MOUNT状态下操作,建议先切换到最大可用,等待redo同步,再切换到最大保护模式) SQL> ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PROTECTION; SQL> SELECT protection_mode, protection_level FROM V$DATABASE; PROTECTION_MODE PROTECTION_LEVEL --------------------------------- --------------------------------- MAXIMUM PROTECTION MAXIMUM PROTECTION 1 row fetched.
复制
GAP修复
模拟环境
1)关闭从库 yasql / as sysdba -c "shutdown immediate;" -- 主库可看到日志发送失败记录 tail -100f $YASDB_DATA/log/alert/alert.log 2023-12-02 21:13:26.864|22|StandbyDisconnect|0|0||redo sender failed to connect with standby 2023-12-02 21:17:27.446|22|StandbyDisconnect|0|1| 2023-12-02 21:17:57.959|26|Incident|0|0||the incident was controlled by flood, problem id=1, last incident id=5 2) 手动生成归档文件 SQL> alter system switch logfile; SQL> select name from v$archived_log; NAME ---------------------------------------------------------------- /data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC 3)主库新增数据和归档 SQL> create table t1 (id number); SQL> insert into t1 values (1); SQL> insert into t1 values (2); SQL> commit; 4)手动生成归档文件 SQL> alter system switch logfile; SQL> alter system switch logfile; SQL> select name from v$archived_log; NAME ---------------------------------------------------------------- /data/yashan/yasdb_data/db-1-2/archive/arch_0_2.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_3.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_4.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_5.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC /data/yashan/yasdb_data/db-1-2/archive/arch_0_7.ARC 5)模拟主库归档丢失 mv /data/yashan/yasdb_data/db-1-2/archive/arch_0_6.ARC /data/yashan/yasdb_data/db-1-2/archive/BAK_arch_0_6.ARC 6) 启动从库 yasboot cluster start -c yashandb -d
复制
查看主从同步情况
1)主库查看同步情况 SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS; DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED GAP_STATUS ------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ ----------------- 1 CONNECTED NORMAL OPEN 2224 2217 NO HAS GAP 1 row fetched. 2) 备库查看同步情况 SQL> SELECT connection, status, peer_addr, transport_lag, apply_lag, GAP_SEQ# FROM V$REPLICATION_STATUS; CONNECTION STATUS PEER_ADDR TRANSPORT_LAG APPLY_LAG GAP_SEQ# ----------------- ----------------- ---------------------------------------------------------------- ------------- ------------ ------------ CONNECTED NORMAL 192.168.56.102:1689 0 1800163 6 1 row fetched.
复制
上面主库查询中可以看到,GAP_STATUS的状态为HAS GAP,表示从库库已经出现了GAP,需要修复。从库查询中可以看到GAP_SEQ#为6,表示从库开始出现GAP的归档SEQUENCE为第6个。
主库执行命令修复GAP
1 查看备机standby_name SQL> select dest_id, dest_name, service, db_unique_name from V$ARCHIVE_DEST; DEST_ID DEST_NAME SERVICE DB_UNIQUE_NAME ------- ----------------- ---------------------------------------------------------------- --------------------------------- 1 ARCHIVE_DEST_1 192.168.56.101:1689 standby1 1 row fetched. 2 修复备机GAP SQL> BUILD DATABASE INCREMENTAL TO STANDBY standby1;
复制
主库查看同步情况
SQL> SELECT dest_id,connection,status,database_mode,received_lfn,applied_lfn,synchronized, GAP_STATUS FROM V$ARCHIVE_DEST_STATUS; DEST_ID CONNECTION STATUS DATABASE_MODE RECEIVED_LFN APPLIED_LFN SYNCHRONIZED GAP_STATUS ------- ----------------- ----------------- ----------------- --------------------- --------------------- ------------ ----------------- 1 CONNECTED NORMAL OPEN 2234 2234 YES NO GAP 1 row fetched.
复制
个人感受
- YashanDB的主备搭建简单,无需手工介入配置主从同步参数,yasboot工具可以直接实现主备架构自动化搭建,避免人工操作出错。
- YashanDB的主备切换步骤相较于Oracle简化了不少,简化了运维成本。
- YashanDB针对从库GAP的情况在主库上集成了命令可以处理,比较方便。
- 整个主备搭建和切换演练过程中,数据库日志记录较少,只记录了主从同步中归档发送失败的日志。