
[Looking Ahead] Implementing Tomcat High Availability for the Futures CTP Platform

恒生DBA公社 2021-04-21

Introduction

Given the scenarios in which Tomcat is used, many customers are looking at how to make Tomcat highly available in order to guarantee business continuity.

Tomcat is a free, open-source web application server; the futures CTP history query service, for example, is implemented on Tomcat. The business application becomes accessible simply by deploying static business code onto the Tomcat service. Given Tomcat's characteristics, high availability on its own requires neither data synchronization and consistency nor session sharing behind a load balancer; all we need to guarantee is that the software keeps running. Classic options include Linux RHCS or AIX HACMP/PowerHA, but Linux RHCS failures are hard to analyze because very little log material is available, and running on AIX is comparatively expensive. Tomcat of course also has its own clustering, and Nginx can be placed in front for load balancing and port-based health checks. In this article we discuss a Tomcat high-availability design based on Oracle 12c Flex Cluster.




Flex Cluster Architecture

As shown in the architecture diagram, a Flex Cluster differs from a traditional RAC cluster in that it adds leaf nodes; the nodes of a traditional RAC are referred to as hub nodes. A hub node, just like a pre-12c RAC node, connects to the private interconnect for heartbeats and to shared storage over storage links. A leaf node, by contrast, does not need to be connected to the storage; it only needs to be on the same network as the hub nodes, and it runs the Oracle 12.2 Clusterware software. There are no heartbeat links between leaf nodes either; leaf nodes are independent of one another, and one or more leaf nodes attach to a hub node. When a leaf node fails, its resources fail over to another leaf node; when the hub node it is attached to fails, the leaf node reattaches to a surviving hub node. This loosely coupled architecture makes it easy to scale a Flex Cluster out horizontally.
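
Before placing application resources on leaf nodes, it is worth confirming that the cluster really runs in Flex mode and which role each node holds. A minimal check, assuming the 12.2 Grid Infrastructure binaries are in the PATH:

# confirm the cluster mode and each node's role (hub or leaf)
crsctl get cluster mode status        # expect: Cluster is running in "flex" mode
crsctl get node role config -all      # configured role of every node
crsctl get node role status -all      # active role of every node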

We can also attach leaf nodes to the shared storage and run read-only Oracle instances on them; leaf nodes used this way are called reader nodes. This makes a nice read/write splitting solution with no data synchronization involved, because all instances access the same storage. If the read bottleneck of the current architecture is CPU, the load can be spread by scaling out the leaf nodes.




Leaf Node Applications and Reader Nodes

The defining characteristic of leaf nodes is their loose coupling. Because there is no heartbeat between leaf nodes, far fewer network links are needed than in a traditional RAC with the same number of nodes. Leaf nodes can run large query and parallel workloads and can be regarded as pure compute nodes; pooling the resources of several compute nodes in parallel improves the efficiency of data analysis. Third-party applications such as Hadoop HDFS can also be deployed on these compute nodes, as shown in the figure.





Running the Tomcat Service on Flex Cluster Leaf Nodes

Oracle RAC (Real Application Clusters) is, as the name says, a cluster for real applications. Oracle Clusterware can also manage non-Oracle resources and fail them over between nodes, giving us high availability at the software layer. We need to configure a tomcatvip resource in the cluster; once started, it activates a VIP address that can move between nodes just like an Oracle RAC VIP. We also need to register a resource for Tomcat itself, so that Oracle Clusterware controls its start, stop, and monitoring. The tomcatvip resource is a dependency of the Tomcat resource: the Tomcat cluster resource can only start if tomcatvip starts successfully, and if tomcatvip fails, both resources fail over together to another leaf node.





Deploying Tomcat

[root@we2flexdb3 soft]# mkdir -p /u01/local/
[root@we2flexdb3 soft]# tar -zxvf apache-tomcat-7.0.42.tar.gz -C /u01/local/
[root@we2flexdb3 soft]# cd /u01/local/
[root@we2flexdb3 soft]# mv apache-tomcat-7.0.42 tomcat
[root@we2flexdb3 soft]# cd /u01/local/tomcat/bin/
[root@we2flexdb3 bin]# chmod 777 *.sh
[root@we2flexdb3 bin]# sh startup.sh
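
Not part of the original transcript, but a quick hedged way to confirm the instance actually came up is to probe the default connector (assuming curl is installed):

# 200 (or 404 for an empty ROOT context) means the connector is answering;
# "000" plus a non-zero exit code means Tomcat did not start
curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/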



Configuring the Tomcat Virtual IP Resource in Oracle Clusterware

In the test environment, flexdb3 and flexdb4 are the leaf nodes of the Flex Cluster. On flexdb3 we start by configuring the VIP address 192.168.123.200 that the Tomcat service will use.


[root@we2flexdb3 ~]# appvipcfg create -h

Using configuration parameter file:  /u01/app/12.2.0/grid/crs/install/crsconfig_params

The log of current session can be found at:

   /u01/app/grid/crsdata/we2flexdb3/scripts/appvipcfg.log

  Usage: appvipcfg create  -network=<network_number> -ip=<ip_address>  -vipname=<vipname>

                           -user=<user_name>[-group=<group_name>]                              

                           [-resource_group=<rg_name>] [-failback=0 | 1]  

                   delete  -vipname=<vipname> [-force]

                   modify  -vipname=<vipname> [-network=<network_number>]  [-ip=<ipaddress>]

[root@we2flexdb3 ~]# srvctl add network -netnum 2 -subnet  192.168.123.0/255.255.255.0/ens33 -leaf

[root@we2flexdb3 ~]# appvipcfg create -network=2 -ip=192.168.123.200  -vipname=tomcatvip -user=root

   

[root@we2flexdb3 ~]# crsctl stat res tomcatvip

NAME=tomcatvip

TYPE=app.appviptypex2.type

TARGET=OFFLINE

STATE=OFFLINE

 

[root@we2flexdb3 ~]# crsctl start res tomcatvip

CRS-2672: Attempting to start 'ora.net2.network' on 'we2flexdb3'

CRS-2676: Start of 'ora.net2.network' on 'we2flexdb3' succeeded

CRS-2672: Attempting to start 'tomcatvip' on 'we2flexdb3'

CRS-2676: Start of 'tomcatvip' on 'we2flexdb3' succeeded

 

[root@we2flexdb3 ~]# ping -c 3 192.168.123.200

PING 192.168.123.200 (192.168.123.200) 56(84) bytes of data.

64 bytes from 192.168.123.200: icmp_seq=1 ttl=64 time=0.019 ms

64 bytes from 192.168.123.200: icmp_seq=2 ttl=64 time=0.041 ms

64 bytes from 192.168.123.200: icmp_seq=3 ttl=64 time=0.035 ms
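
Beyond the ping test, it can be useful to confirm on which node the VIP is online and that it is actually plumbed on the interface; a small sketch (interface name ens33 as used above):

# show the hosting node of the VIP resource in tabular form
crsctl stat res tomcatvip -t
# confirm the address is configured on the leaf node's NIC
ip addr show ens33 | grep 192.168.123.200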



Modify the Tomcat Configuration to Listen on the VIP Address (the part highlighted in red in the original is the second Connector below)

[root@we2flexdb3 ~]# cd  /u01/local/tomcat/conf/

[root@we2flexdb3 conf]#  cat server.xml | more

<Connector port="8080" protocol="HTTP/1.1"
           connectionTimeout="20000"
           redirectPort="8443" />

<Connector port="8080" protocol="HTTP/1.1"
           address="192.168.123.200"
           maxPostSize="2147483647"
           connectionTimeout="20000"
           redirectPort="8443" />
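
After restarting Tomcat with the modified server.xml, a quick hedged check that a listener is really bound to the VIP (assuming the ss utility is available):

# expect an entry such as LISTEN ... 192.168.123.200:8080
ss -lnt | grep 8080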




Editing the Cluster Action Script

The action script is responsible for starting, stopping, checking, and cleaning up Tomcat; Oracle Clusterware then invokes it to coordinate the resource and perform high-availability switchovers. The check works by fetching a test file on port 8080 of the VIP to verify that Tomcat is alive on the current server, and the script's exit status (echo $?) is used to judge whether the action succeeded.


[root@we2flexdb3 public]# cat tomcat_action.sh
#!/bin/bash

# URL used by the "check" action to verify that Tomcat answers on the VIP
WEBPAGECHECK=http://192.168.123.200:8080/icons/check.alive

case $1 in
'start')
    # start Tomcat, then give it a few seconds to open its connectors
    /u01/local/tomcat/bin/startup.sh
    RET=$?
    sleep 10
;;
'stop')
    /u01/local/tomcat/bin/shutdown.sh
    RET=$?
;;
'clean')
    /u01/local/tomcat/bin/shutdown.sh
    RET=$?
;;
'check')
    # fetch the probe page quietly and discard it; wget's exit code tells us
    # whether Tomcat is alive
    /usr/bin/wget -q --delete-after $WEBPAGECHECK
    RET=$?
;;
*)
    RET=0
;;
esac

# 0: success; 1: error
if [ $RET -eq 0 ]; then
    exit 0
else
    exit 1
fi
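
The zero-byte check.alive file referenced by WEBPAGECHECK must exist under a deployed context before the check action can succeed; it was presumably created along these lines (a sketch, not part of the original transcript):

# create the probe target served by the auto-deployed "icons" context
mkdir -p /u01/local/tomcat/webapps/icons
touch /u01/local/tomcat/webapps/icons/check.alive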



Testing the Action Script

[root@we2flexdb3 public]# ls -l /u01/local/tomcat/webapps/icons/check.alive
-rw-r--r-- 1 root root 0 Apr 20 17:45 /u01/local/tomcat/webapps/icons/check.alive

 

[root@we2flexdb3  public]# chmod 777 tomcat_action.sh

[root@we2flexdb3  public]# sh tomcat_action.sh start

Using CATALINA_BASE:   /u01/local/tomcat

Using CATALINA_HOME:   /u01/local/tomcat

Using CATALINA_TMPDIR:  /u01/local/tomcat/temp

Using JRE_HOME:        /

Using CLASSPATH:        /u01/local/tomcat/bin/bootstrap.jar:/u01/local/tomcat/bin/tomcat-juli.jar

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# echo $?

0

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# sh tomcat_action.sh stop

Using CATALINA_BASE:   /u01/local/tomcat

Using  CATALINA_HOME:   /u01/local/tomcat

Using CATALINA_TMPDIR:  /u01/local/tomcat/temp

Using JRE_HOME:        /

Using CLASSPATH:        /u01/local/tomcat/bin/bootstrap.jar:/u01/local/tomcat/bin/tomcat-juli.jar

[root@we2flexdb3 public]#  

[root@we2flexdb3  public]# echo $?

0

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# sh tomcat_action.sh start

Using  CATALINA_BASE:   /u01/local/tomcat

Using  CATALINA_HOME:   /u01/local/tomcat

Using CATALINA_TMPDIR:  /u01/local/tomcat/temp

Using JRE_HOME:        /

Using CLASSPATH:        /u01/local/tomcat/bin/bootstrap.jar:/u01/local/tomcat/bin/tomcat-juli.jar

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# sh tomcat_action.sh  check

[root@we2flexdb3  public]# echo $?

0

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# sh tomcat_action.sh  clean

Using  CATALINA_BASE:   /u01/local/tomcat

Using  CATALINA_HOME:   /u01/local/tomcat

Using CATALINA_TMPDIR:  /u01/local/tomcat/temp

Using JRE_HOME:        /

Using CLASSPATH:       /u01/local/tomcat/bin/bootstrap.jar:/u01/local/tomcat/bin/tomcat-juli.jar

[root@we2flexdb3  public]#

[root@we2flexdb3  public]# echo $?

0



Adding the Tomcat Resource to Oracle Clusterware

crsctl add resource tomcatha -type cluster_resource \
-attr "ACTION_SCRIPT=/u01/app/12.2.0/grid/crs/public/tomcat_action.sh,AUTO_START=always,CHECK_INTERVAL=10,OFFLINE_CHECK_INTERVAL=10,RELOCATE_BY_DEPENDENCY=1,SCRIPT_TIMEOUT=60,RESTART_ATTEMPTS=10,START_DEPENDENCIES=hard(tomcatvip),STOP_DEPENDENCIES=hard(tomcatvip),HOSTING_MEMBERS=we2flexdb3 we2flexdb4,PLACEMENT=restricted"

 

ACTION_SCRIPT: the start/stop/check/clean script invoked by the cluster.

AUTO_START: start the tomcatha resource automatically after a node reboot.

CHECK_INTERVAL / OFFLINE_CHECK_INTERVAL: the intervals at which the resource state is checked.

Detailed descriptions of these attributes can be found in the official documentation:

https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cwadd/making-applications-highly-available-using-oracle-clusterware.html#GUID-30DC08E5-72CE-4E93-B6D6-E209DEADF83E
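
Once the resource is registered it can be brought online and its placement verified with the usual crsctl commands (a short sketch following on from the steps above):

crsctl start resource tomcatha
crsctl stat res tomcatha -t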




Failover Testing
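
The original post illustrated this step with screenshots that are not reproduced here. A typical manual test, assuming tomcatha is currently online on we2flexdb3, is to relocate it and watch it come up on the other leaf node:

# -f also relocates the resources tomcatha depends on (tomcatvip)
crsctl relocate resource tomcatha -f
crsctl stat res tomcatha tomcatvip -t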





Log Analysis


The script agent log lives on each leaf node at:

/u01/app/grid/diag/crs/we2flexdb4/crs/trace/crsd_scriptagent_root.trc

The crsd_scriptagent_root.trc files on the two leaf nodes record, respectively, the resource going offline on the original node and coming online on the new node.


2018-04-23 14:30:50.790 :CLSDYNAM:2277496576:  [tomcatha]{101:17472:1060} [check] Executing action script:  /u01/app/12.2.0/grid/crs/public/tomcat_action.sh[check]

2018-04-23 14:30:50.790 :     AGFW:2273294080: {101:17472:1060} Agent sending reply for:  RESOURCE_STOP[tomcatha 1 1] ID 4099:4770

2018-04-23  14:30:51.944 :    AGFW:2273294080:  {101:17472:1060} tomcatha 1 1 state changed from: STOPPING to: OFFLINE

2018-04-23 14:30:51.944 :     AGFW:2273294080: {101:17472:1060} Switching online monitor to offline  one

2018-04-23 14:30:51.944 :     AGFW:2273294080: {101:17472:1060} Starting offline monitor

2018-04-23 14:30:51.944 :     AGFW:2273294080: {101:17472:1060} Started implicit monitor for  [tomcatha 1 1] interval=10000 delay=10000

2018-04-23 14:30:51.944 :     AGFW:2273294080: {101:17472:1060} Agent sending last reply for:  RESOURCE_STOP[tomcatha 1 1] ID 4099:4770

2018-04-23 14:31:01.946 :CLSDYNAM:2277496576:  [tomcatha]{101:17472:1060} [check] Executing action script:  /u01/app/12.2.0/grid/crs/public/tomcat_action.sh[check]

2018-04-23 14:31:03.499 : CRSCOMM:2485782272:  IpcC: IPC client connection 19 to member 0  has been removed

2018-04-23 14:31:03.499 :CLSFRAME:2485782272:  Removing IPC  Member:{Relative|Node:0|Process:0|Type:1}

2018-04-23 14:31:03.499 :CLSFRAME:2485782272:  Disconnected from CRSD:we2flexdb4 process:  {Relative|Node:0|Process:0|Type:1}

2018-04-23 14:31:03.499 :    AGENT:2273294080: {0:19:35} {0:19:35} Created alert : (:CRSAGF00117:)  :  Disconnected from server, Agent is  shutting down.

2018-04-23 14:31:03.500 :    AGENT:2273294080: {0:19:35} Agfw calling user exitCB, will exit on  return

2018-04-23 14:31:03.500 : USRTHRD:2273294080: {0:19:35} Script  agent is exiting..

 

2018-04-23 14:31:03.500 :    AGENT:2273294080: {0:19:35} Agent is exiting with exit code: 2

2018-04-23 14:31:03.413 :     AGFW:1976366848: {101:17472:1060} tomcatha 1 1 state changed from:  UNKNOWN to: STARTING

2018-04-23  14:31:03.414 :CLSDYNAM:1980569344: [tomcatha]{101:17472:1060} [start]  Executing action script:  /u01/app/12.2.0/grid/crs/public/tomcat_action.sh[start]

2018-04-23  14:31:03.465 :    AGFW:1980569344:  {101:17472:1060} Command: start for resource: tomcatha 1 1 completed with  status: SUCCESS

2018-04-23 14:31:03.466 :CLSDYNAM:1980569344:  [tomcatha]{101:17472:1060} [check] Executing action script:  /u01/app/12.2.0/grid/crs/public/tomcat_action.sh[check]

2018-04-23 14:31:03.466 :CLSFRAME:2180321344:  TM [MultiThread] is changing desired thread  # to 3. Current # is 2

2018-04-23 14:31:03.467 :     AGFW:1976366848: {101:17472:1060} Agent sending reply for:  RESOURCE_START[tomcatha 1 1] ID 4098:617

2018-04-23 14:31:03.518 :     AGFW:1976366848: {101:17472:1060} tomcatha 1 1 state changed from:  STARTING to: OFFLINE

2018-04-23 14:31:03.518 :     AGFW:1976366848: {101:17472:1060} Starting offline monitor

2018-04-23 14:31:03.518 :     AGFW:1976366848: {101:17472:1060} Started implicit monitor for  [tomcatha 1 1] interval=10000 delay=10000

2018-04-23 14:31:03.518 :     AGFW:1976366848: {101:17472:1060} Agent sending last reply for:  RESOURCE_START[tomcatha 1 1] ID 4098:617

2018-04-23 14:31:13.520 :CLSDYNAM:1984771840:  [tomcatha]{101:17472:1060} [check] Executing action script:  /u01/app/12.2.0/grid/crs/public/tomcat_action.sh[check]

2018-04-23  14:31:13.672 :    AGFW:1976366848:  {101:17472:1060} tomcatha 1 1 state changed from: OFFLINE to: ONLINE
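
To follow this activity live on either leaf node during a switch, tailing the trace file is usually enough (a sketch, using the diag layout shown above):

# the path contains the local node name, here derived from the short hostname
tail -f /u01/app/grid/diag/crs/$(hostname -s)/crs/trace/crsd_scriptagent_root.trc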



Summary

The futures CTP history query service currently runs on Tomcat, with the business accessed by deploying static business code onto the Tomcat service. Given the requirement for business continuity, making Tomcat highly available in this way is a worthwhile alternative, especially as more and more production systems move to Oracle 12c: making good use of the new features opens up more possibilities for the future.



