第 1 章** 环境说明
1.1 官方文档
Greenplum官方安装说明:
https://gpdb.docs.pivotal.io/5160/install_guide/install_extensions.html
1.2 系统要求
Operating System | *• Red Hat Enterprise Linux 64-bit 7.x (See the following Note)********• Red Hat Enterprise Linux 64-bit 6.x********• SuSE Linux Enterprise Server 64-bit 12 SP2 and SP3 with kernel version greater than 4.4.73-5. (See********the following Note)********• SuSE Linux Enterprise Server 64-bit 11 SP4 (See the following Note)********• CentOS 64-bit 7.x********• CentOS 64-bit 6.x* |
---|---|
File Systems | • xfs required for data storage on SUSE Linux and Red Hat (ext3 supported for root file system) |
Minimum CPU | Pentium Pro compatible (P3/Athlon and above) |
Minimum Memory | 16 GB RAM per server |
---|---|
Disk Requirements | • 150MB per host for Greenplum installation • Approximately 300MB per segment instance for meta data • Appropriate free space for data with disks at no more than 70% capacity • High-speed, local storage |
Network Requirements | 10 Gigabit Ethernet within the array Dedicated, non-blocking switch NIC bonding is recommended when multiple interfaces are present |
Software and Utilities | zlib compression libraries bash shell GNU tars GNU zip GNU sed (used by Greenplum Database gpinitsystem) perl secure shell |
*Important********😗 SSL is supported only on the Greenplum Database master host system.
1.3 搭建环境介绍
操作系统:CentOS Linux release 7.4.1708 (Core)
机器型号 | PowerEdge R330 *4 |
---|---|
CPU | Intel® Xeon® CPU E3-1220 v5 @ 3.00GHz *4物理核心(16cores) |
内存 | 16G |
磁盘 | 4T |
Swap | 32G |
第 2 章 安装架构
2.1 安装大致步骤
Perform the following tasks in order:
*1.* Make sure your systems meet the System Requirements
*2.* Setting the Greenplum Recommended OS Parameters
*3.* (master only) Creating the Greenplum Database Administrative User Account
*4.* (master only) Installing the Greenplum Database Software
*5.* Installing and Configuring Greenplum on all Hosts
*6.* (Optional) Installing Oracle Compatibility Functions
*7.* (Optional) Installing Optional Modules
*8.* (Optional) Installing Greenplum Database Extensions
*9.* (Optional) Installing and Configuring the Greenplum Platform Extension Framework (PXF)
****10.****Creating the Data Storage Areas
****11.****Synchronizing System Clocks
****12.****Next Steps
2.2 软件信息
greenplum-db-5.10.2-rhel6-x86_64.zip是mpp软件,
greenplum-cc-web-4.3.1-LINUX-x86_64.zip是web监控平台
2.3 架构说明
准备 4 台服务器, 1 台做 master, 3台都做存储共部署 6 个segment 及其镜像
IP | 主机名 | cpu | 内存 | 组件规划 |
---|---|---|---|---|
10.102.254.24 | sdw1 | 16 | 16 | 4*segment |
10.102.254.25 | sdw2 | 16 | 16 | 4*segment |
10.102.254.26 | sdw3 | 16 | 16 | 4*segment |
10.102.254.27 | mdw1 | 16 | 16 | master |
=架构目标==
mdw | sdw1 | sdw2 | sdw3 |
---|---|---|---|
master | seg0p seg1p seg5m seg4m |
seg2p seg3p seg0m seg1m |
seg4p seg5p seg2m seg3m smdw |
第 3 章 安装架构
3.1 关闭防火墙和selinux-所有节点
systemctl stop firewalld.service
systemctl disable firewalld.service
iptables -F
vi /etc/selinux/config SELINUX=DISABLED
# sestatus SELinuxstatus: disabled
复制
3.2 修改host文件 -所有节点
-
1. vi /etc/hosts 2. 3. 10.102.254.24 sdw1 4. 10.102.254.25 sdw2 5. 10.102.254.26 sdw3 6. 10.102.254.27 mdw
复制
3.2 修改主机名
hostnamectl set-hostname sdw1 hostnamectl status 状态 /etc/sysconfig/network
复制
3.3 系统内核参数优化
vi /etc/sysctl.conf
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 250 512000 100 2048
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
sysctl -p
cat > /etc/sysctl.conf << EOF
\# sysctl settings are defined through files in
\# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
\#
\# Vendors settings live in /usr/lib/sysctl.d/.
\# To override a whole file, create a new file with the same in
\# /etc/sysctl.d/ and put new settings there. To override
\# only specific settings, add a file with a lexically later
\# name in /etc/sysctl.d/ and put new settings there.
\#
\# For more information, see sysctl.conf(5) and sysctl.d(5).
kernel.shmmax = 500000000
kernel.shmmni = 4096
kernel.shmall = 4000000000
kernel.sem = 500 1024000 200 4096
kernel.sysrq = 1
kernel.core_uses_pid = 1
kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.msgmni = 2048
net.ipv4.tcp_syncookies = 1
net.ipv4.ip_forward = 0
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.tcp_tw_recycle = 1
net.ipv4.tcp_max_syn_backlog = 4096
net.ipv4.conf.all.arp_filter = 1
net.ipv4.ip_local_port_range = 1025 65535
net.core.netdev_max_backlog = 10000
net.core.rmem_max = 2097152
net.core.wmem_max = 2097152
vm.overcommit_memory = 2
vm.swappiness = 1
kernel.pid_max = 655350
EOF
sysctl -p
复制
3.4 修改Linux最大限制
cat /etc/security/limits.conf vi /etc/security/limits.conf * soft nofile 65536 * hard nofile 65536 * soft nproc 131072 * hard nproc 131072 cat > /etc/security/limits.conf << EOF * soft nofile 65536 * hard nofile 65536 * soft nproc 131072 * hard nproc 131072 EOF
复制
如何是rhel 6.x 请注意 /etc/security/limits.d/90-nproc.conf,详细情况请见文档
3.5 设备与IO-文件系统
设置XFS文件系统并挂载
EXT4是第四代扩展文件系统(英语:Fourth EXtended filesystem,缩写为ext4)是Linux系统下的日志文件系统,是ext3文件系统的后继版本。
Ext4的文件系统容量达到1EB,而文件容量则达到16TB,这是一个非常大的数字了。对一般的台式机和服务器而言,这可能并不重要,但对于大型磁盘阵列的用户而言,这就非常重要了。
XFS是一个64位文件系统,最大支持8EB减1字节的单个文件系统,实际部署时取决于宿主操作系统的最大块限制。对于一个32位Linux系统,文件和文件系统的大小会被限制在16TB。
二者各有特点,而性能表现基本上是差不多的。例如,谷歌公司就考虑将EXT2系统升级,最终确定为EXT4系统。谷歌公司表示,他们还考虑过XFS和JFS。结果显示,EXT4和XFS的表现类似,不过从EXT2升级到EXT4比升级到XFS容易。
例子:
cat >> /etc/fstab << EOF
/dev/sdb1 /greenplum xfs rw,nodev,noatime,inode64,allocsize=16m 0 0
EOF
rw,nodev,noatime,nobarrier,inode64
cat /etc/fstab
3.6 磁盘访问策略
Linux磁盘I/O调度器对磁盘的访问支持不同的策略,默认的为CFQ,GP建议设置为deadline
查看磁盘的I/O调度策略,看到默认的为[cfq]
The deadline scheduler option is recommended. To specify a scheduler until the next system reboot,
run the following:
# echo schedulername > /sys/block/devname/queue/scheduler
echo deadline > /sys/block/sda/queue/scheduler
linux 7
# grubby --update-kernel=ALL --args=“elevator=deadline”
grubby --info=ALL
3.7 调整磁盘预读扇区数
fdisk -l
检查
/sbin/blockdev --getra /dev/sda
设置
/sbin/blockdev --setra 16384 /dev/sda
在参数文件 /etc/rc.d/rc.local 中增加
DELL : blockdev --setra 16384 /dev/sd* (红色部分为硬盘设备标识) HP:blockdev --setra 16384 /dev/cciss/c?d?*
3.8 禁用THP
On systems that use grub2 such as RHEL 7.x or CentOS 7.x, use the system utility grubby. This
command adds the parameter when run as root.
# grubby --update-kernel=ALL --args=“transparent_hugepage=never”
After adding the parameter, reboot the system.
This cat command checks the state of THP. The output indicates that THP is disabled.
$ cat /sys/kernel/mm/*transparent_hugepage/enabled
always [never]
服务方式注册
# 创建 init.d 脚本
echo '#!/bin/sh
case $1 in
start)
if [ -d /sys/kernel/mm/transparent_hugepage ]; then
thp_path=/sys/kernel/mm/transparent_hugepage
elif [ -d /sys/kernel/mm/redhat_transparent_hugepage ]; then
thp_path=/sys/kernel/mm/redhat_transparent_hugepage
else
exit 0
fi
echo never > ${thp_path}/enabled
echo never > ${thp_path}/defrag
unset thp_path
;;
esac’ > /etc/init.d/disable-transparent-hugepages
# 注册systemd文件
echo '[Unit]
Description=Disable Transparent Hugepages
After=multi-user.target
[Service]
ExecStart=/etc/init.d/disable-transparent-hugepages start
Type=simple
[Install]
WantedBy=multi-user.target’ > /etc/systemd/system/disable-thp.service
# 磁盘预读扇区数
/sbin/blockdev --getra /dev/sdb1 # 查看大小
/sbin/blockdev --setra 65535 /dev/sdb1 # 设置大小
# 创建 init.d 脚本
echo '#!/bin/sh
device_name=/dev/sdb1
case $1 in
start)
if mount | grep "^${device_name}" > /dev/null
;then
/sbin/blockdev --setra 65535 ${device_name}
else
exit 0
fi
unset device_name
;;
esac’ > /etc/init.d/blockdev-setra-sdb
# 注册systemd文件
echo '[Unit]
Description=Blocdev --setra N
After=multi-user.target
[Service]
ExecStart=/etc/init.d/blockdev-setra-sdb start
Type=simple
[Install]
WantedBy=multi-user.target’ > /etc/systemd/system/blockdev-setra-sdb.service
# 授权并设置开机启动
chmod 755 /etc/init.d/disable-transparent-hugepages
chmod 755 /etc/init.d/blockdev-setra-sdb
chmod 755 /etc/systemd/system/disable-thp.service
chmod 755 /etc/systemd/system/blockdev-setra-sdb.service
systemctl enable disable-thp blockdev-setra-sdb
3.9 Disable IPC object removal for RHEL 7 or CentOS 7
Set this parameter in /etc/systemd/logind.conf on the Greenplum
Database host systems.
RemoveIPC=no
The setting takes effect after restarting the systemd-login service or rebooting the system. To
restart the service, run this command as the root user.
service systemd-logind restart
cat /etc/systemd/logind.conf
3.10 时间同步
/etc/chrony.conf
systemctl status chronyd.service --查看状态
systemctl start chronyd.service --启动
systemctl enable chronyd.service --使其开机自启
systemctl status chronyd.service
server 10.1.3.1 prefer
查看时间同步源
chronyc sources -v
chronyc sourcestats -v
3.11 控制ssh连接数
/etc/ssh/sshd_config
MaxStartups 10:30:200
systemctl restart sshd.service
3.12 系统依赖包
yum -y install epel-release
yum -y install wget cmake3 git gcc gcc-c++ bison flex libedit-devel zlib zlib-devel perl-devel perl-ExtUtils-Embed python-devel libevent libevent-devel libxml2 libxml2-devel libcurl libcurl-devel bzip2 bzip2-devel net-tools libffi-devel openssl-devel
第 4章 安装软件
4.1 创建用户和组
master only
# groupadd gpadmin # useradd gpadmin -g gpadmin # passwd gpadmin New password: <changeme> Retype new password: <changeme> echo gpadmin | passwd gpadmin --stdin
复制
4.2 root用户解压缩和安装
./greenplum-db-5.10.2-rhel6-x86_64.bin I HAVE READ AND AGREE TO THE TERMS OF THE ABOVE PIVOTAL SOFTWARE LICENSE AGREEMENT. ******************************************************************************** Do you accept the Pivotal Database license agreement? [yes|no] ******************************************************************************** yes ******************************************************************************** Provide the installation path for Greenplum Database or press ENTER to accept the default installation path: /usr/local/greenplum-db-5.10.2 ******************************************************************************** ******************************************************************************** Install Greenplum Database into /usr/local/greenplum-db-5.10.2? [yes|no] ******************************************************************************** yes ******************************************************************************** /usr/local/greenplum-db-5.10.2 does not exist. Create /usr/local/greenplum-db-5.10.2 ? [yes|no] (Selecting no will exit the installer) ******************************************************************************** 安装完成后授权 \# chown -R gpadmin /usr/local/greenplum*(在创建gpadmin后执行) \# chgrp -R gpadmin /usr/local/greenplum*(在创建gpadmin后执行)
复制
4.3 编辑环境变量
cat >> .bashrc << EOF export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1 source /usr/local/greenplum-db/greenplum_path.sh EOF source .bashrc cat >> /home/gpadmin/.bash_profile <<EOF export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1 source /usr/local/greenplum-db/greenplum_path.sh export PGPORT=5432 export PGDATABASE=archdata EOF source /home/gpadmin/.bash_profile
复制
4.4 进行文件配置
切换root
source /usr/local/greenplum-db/greenplum_path.sh
------只在mdw,smdw执行
mkdir /home/gpadmin/gpconfig
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig
------只在mdw,smdw执行
cat >> /home/gpadmin/gpconfig/all_host <<EOF
mdw
sdw1
sdw2
sdw3
EOF
------只在mdw,smdw执行
cat >> /home/gpadmin/gpconfig/all_segment <<EOF
sdw1
sdw2
sdw3
EOF
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_host
chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/all_segment
4.5 设置主机免密码登陆 -
source /usr/local/greenplum-db/greenplum_path.sh
/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host
4.6 确认检查主机连接状态
gpssh -f /home/gpadmin/gpconfig/all_host -e “ls -l”
4.7 批量创建其他节点的用户
gpssh -f /home/gpadmin/gpconfig/all_segment
groupadd gpadmin
useradd gpadmin -g gpadmin
passwd gpadmin
echo gpadmin | passwd gpadmin --stdin
4.8 gpadmin用户-互信
source /usr/local/greenplum-db/greenplum_path.sh
/usr/local/greenplum-db/bin/gpssh-exkeys -f /home/gpadmin/gpconfig/all_host
gpssh -f /home/gpadmin/gpconfig/all_host -e “ls -l”
4.9 检查时间同步
gpssh -f /home/gpadmin/gpconfig/all_host -e “date”
4.10 分发所有seg节点软件
root执行
source /usr/local/greenplum-db/greenplum_path.sh
gpseginstall -f /home/gpadmin/gpconfig/all_host -u gpadmin -p gpadmin
4.11 检查安装情况
o Log in as the gpadmin user and source
• source /usr/local/greenplum-db/greenplum_path.sh
o Use the gpssh utility to see if you can login to all hosts without a password prompt
•
4.12 创建相关目录(root用户)
mkdir -p /greenplum/gpdata/master
chown gpadmin:gpadmin /greenplum/gpdata/master
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘mkdir -p /greenplum/gpdata/primary1’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘mkdir -p /greenplum/gpdata/primary2’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘chown -R gpadmin:gpadmin /greenplum/gpdata’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘mkdir -p /greenplum/gpdata/mirror1’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘mkdir -p /greenplum/gpdata/mirror2’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘chown -R gpadmin:gpadmin /greenplum/gpdata’
or 批量创建
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘mkdir -p /greenplum/gpdata/primary{1…2}’
gpssh -f /home/gpadmin/gpconfig/all_segment -e ‘chown -R gpadmin:gpadmin /greenplum/gpdata’
4.13 验证系统
检查系统参数和测试性能
检查命令:gpcheck -f host_file -m mdw -ssmdw
Validating Hardware Performance
o gpcheckperf can be used to identify hardware and system-level issues on the machines in your Greenplum
Database array.
o Network Performance (gpnetbench*)
• gpcheckperf -f hostfile_gpchecknet_ic1 -r N -d /tmp > subnet1.out
o Disk I/O Performance (dd test) & Memory Bandwidth (stream test)
• gpcheckperf -f hostfile_gpcheckperf -r ds -D -d /data/primary -d /data/mirror
4.14 验证OS配置
source /usr/local/greenplum-db/greenplum_path.sh
gpcheck -f /home/gpadmin/gpconfig/all_host -m mdw
验证硬件性能–这个需要确认(网络和IO)
gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp > checknetwork.out [root@mdw greenplum-db]# cat checknetwork.out /usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r N -d /tmp \------------------- -- NETPERF TEST \------------------- ==================== == RESULT ==================== Netperf bisection bandwidth test mdw -> sdw1 = 112.340000 sdw2 -> sdw3 = 112.340000 sdw1 -> mdw = 112.330000 sdw3 -> sdw2 = 112.330000 Summary: sum = 449.34 MB/sec min = 112.33 MB/sec max = 112.34 MB/sec avg = 112.33 MB/sec median = 112.34 MB/sec gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1 > checkDISKIO.out [root@mdw greenplum-db]# gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1 /usr/local/greenplum-db/./bin/gpcheckperf -f /home/gpadmin/gpconfig/all_host -r ds -D -d /greenplum/gpdata/primary1 -d /greenplum/gpdata/mirror1 \-------------------- -- DISK WRITE TEST \-------------------- \-------------------- -- DISK READ TEST \-------------------- \-------------------- -- STREAM TEST \-------------------- ==================== == RESULT ==================== disk write avg time (sec): 20.88 disk write tot bytes: 132920115200 disk write tot bandwidth (MB/s): 6074.65 disk write min bandwidth (MB/s): 1476.04 [ mdw] disk write max bandwidth (MB/s): 1551.18 [sdw3] -- per host bandwidth -- disk write bandwidth (MB/s): 1476.04 [ mdw] disk write bandwidth (MB/s): 1537.63 [sdw1] disk write bandwidth (MB/s): 1509.80 [sdw2] disk write bandwidth (MB/s): 1551.18 [sdw3] disk read avg time (sec): 59.80 disk read tot bytes: 132920115200 disk read tot bandwidth (MB/s): 2175.57 disk read min bandwidth (MB/s): 454.54 [sdw2] disk read max bandwidth (MB/s): 700.04 [sdw1] -- per host bandwidth -- disk read bandwidth (MB/s): 520.03 [ mdw] disk read bandwidth (MB/s): 700.04 [sdw1] disk read bandwidth (MB/s): 454.54 [sdw2] disk read bandwidth (MB/s): 500.96 [sdw3] stream tot bandwidth (MB/s): 49348.52 stream min bandwidth (MB/s): 12297.76 [ mdw] stream max bandwidth (MB/s): 12388.57 [sdw2] -- per host bandwidth -- stream bandwidth (MB/s): 12297.76 [ mdw] stream bandwidth (MB/s): 12321.47 [sdw1] stream bandwidth (MB/s): 12388.57 [sdw2] stream bandwidth (MB/s): 12340.73 [sdw3]
复制
第 5 章 初始化database
- copy配置文件
cp $GPHOME/docs/cli_help/gpconfigs/gpinitsystem_config /home/gpadmin/gpconfig/gpinitsystem_config
复制
- 设置初始化系统文件列表
cat >> /home/gpadmin/gpconfig/hostfile_gpinitsystem <<EOF sdw1 sdw2 sdw3 EOF chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/gpinitsystem_config chown -R gpadmin:gpadmin /home/gpadmin/gpconfig/hostfile_gpinitsystem
复制
- 调整参数
ARRAY_NAME="EMC Greenplum DW" PORT_BASE=40000 SEG_PREFIX=gpseg declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2) MASTER_HOSTNAME=mdw MASTER_DIRECTORY=/greenplum/gpdata/master MASTER_PORT=5432 TRUSTED_SHELL=ssh CHECK_POINT_SEGMENTS=8 ENCODING=UNICODE MIRROR_PORT_BASE=50000 REPLICATION_PORT_BASE=41000 MIRROR_REPLICATION_PORT_BASE=51000 declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2) vim /home/gpadmin/gpconfig/gpinitsystem_config 修改如下 [gpadmin@mdw ~]$ cat /home/gpadmin/gpconfig/gpinitsystem_config \# FILE NAME: gpinitsystem_config \# Configuration file needed by the gpinitsystem \################################################ \#### REQUIRED PARAMETERS \################################################ \#### Name of this Greenplum system enclosed in quotes. ARRAY_NAME="Greenplum Data Platform" \#### Naming convention for utility-generated data directories. SEG_PREFIX=gpseg \#### Base number by which primary segment port numbers \#### are calculated. PORT_BASE=40000 \#### File system location(s) where primary segment data directories \#### will be created. The number of locations in the list dictate \#### the number of primary segments that will get created per \#### physical host (if multiple addresses for a host are listed in \#### the hostfile, the number of segments will be spread evenly across \#### the specified interface addresses). declare -a DATA_DIRECTORY=(/greenplum/gpdata/primary1 /greenplum/gpdata/primary2) \#### OS-configured hostname or IP address of the master host. MASTER_HOSTNAME=mdw \#### File system location where the master data directory \#### will be created. MASTER_DIRECTORY=/greenplum/gpdata/master \#### Port number for the master instance. MASTER_PORT=5432 \#### Shell utility used to connect to remote hosts. TRUSTED_SHELL=ssh \#### Maximum log file segments between automatic WAL checkpoints. CHECK_POINT_SEGMENTS=8 \#### Default server-side character set encoding. ENCODING=UNICODE \################################################ \#### OPTIONAL MIRROR PARAMETERS \################################################ \#### Base number by which mirror segment port numbers \#### are calculated. MIRROR_PORT_BASE=50000 \#### Base number by which primary file replication port \#### numbers are calculated. REPLICATION_PORT_BASE=41000 \#### Base number by which mirror file replication port \#### numbers are calculated. MIRROR_REPLICATION_PORT_BASE=51000 \#### File system location(s) where mirror segment data directories \#### will be created. The number of mirror locations must equal the \#### number of primary locations as specified in the \#### DATA_DIRECTORY parameter. declare -a MIRROR_DATA_DIRECTORY=(/greenplum/gpdata/mirror1 /greenplum/gpdata/mirror2) \################################################ \#### OTHER OPTIONAL PARAMETERS \################################################ \#### Create a database of this name after initialization. \#DATABASE_NAME=name_of_database \#### Specify the location of the host address file here instead of \#### with the the -h option of gpinitsystem. \#MACHINE_LIST_FILE=/home/gpadmin/gpconfigs/hostfile_gpinitsystem
复制
- 初始化database
gpadmin用户
gpinitsystem -c /home/gpadmin/gpconfig/gpinitsystem_config -h /home/gpadmin/gpconfig/hostfile_gpinitsystem 如何添加master standby和修改mirror分布策略spread mirror gpinitsystem -c gpconfigs/gpinitsystem_config -h gpconfigs/hostfile_gpinitsystem -s standby_master_hostname -S (with a standby master and a spread mirror configuration)
复制
- 检查环境变量
MASTER_DATA_DIRECTORY=/data/master/gpseg-1 GPHOME=/usr/local/greenplum-db PGDATABASE=gpadmin [gpadmin@mdw ~]$ cat .bash_profile \# .bash_profile \# Get the aliases and functions if [ -f ~/.bashrc ]; then . ~/.bashrc fi \# User specific environment and startup programs PATH=$PATH:$HOME/.local/bin:$HOME/bin export PATH export MASTER_DATA_DIRECTORY=/greenplum/gpdata/master/gpseg-1 source /usr/local/greenplum-db/greenplum_path.sh export PGPORT=5432 export PGDATABASE=archdata
复制
第 6 章 连接测试
6.1设置gpadmin远程访问密码
psql postgres gpadmin
alter user gpadmin encrypted password ‘gpadmin’;
\q
6.2查询测试
psql -hmdw -p 5432 -d postgres -U gpadmin -c ‘select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;’
[gpadmin@mdw ~]$ psql -hmdw -p 5432 -d postgres -U gpadmin -c ‘select dfhostname, dfspace,dfdevice from gp_toolkit.gp_disk_free order by dfhostname;’
dfhostname | dfspace | dfdevice
------------±---------±---------------------------
sdw1 | 98708120 | /dev/mapper/VolGroup-root
sdw1 | 98708120 | /dev/mapper/VolGroup-root
sdw2 | 98705600 | /dev/mapper/VolGroup-root
sdw2 | 98705600 | /dev/mapper/VolGroup-root
sdw3 | 98705144 | /dev/mapper/VolGroup-root
sdw3 | 98705144 | /dev/mapper/VolGroup-root
(6 rows)
psql -h hmdw -p 5432 -d postgres -U gpadmin -c ‘\l+’
[gpadmin@mdw ~]$ psql -h mdw -p 5432 -d postgres -U gpadmin -c ‘\l+’
List of databases
Name | Owner | Encoding | Access privileges | Size | Tablespace | Description
-----------±--------±---------±--------------------±------±-----------±--------------------------
postgres | gpadmin | UTF8 | | 73 MB | pg_default |
template0 | gpadmin | UTF8 | =c/gpadmin | 72 MB | pg_default |
: gpadmin=CTc/gpadmin
template1 | gpadmin | UTF8 | =c/gpadmin | 73 MB | pg_default | default template database
: gpadmin=CTc/gpadmin
(3 rows)
[gpadmin@mdw ~]$
第 7 章 常用命令
-
数据库启动:gpstart
常用可参数: -a : 直接启动,不提示终端用户输入确认
-m:只启动master 实例,主要在故障处理时使用 -
数据库停止:gpstop:
常用可参数:-a:直接停止,不提示终端用户输入确认
-m:只停止master 实例,与gpstart –m 对应使用
-M fast:停止数据库,中断所有数据库连接,回滚正在运
行的事务
-u:不停止数据库,只加载pg_hba.conf 和postgresql.conf中运行时参数,当改动参数配置时候使用。
评:-a用在shell里,最多用的还是-M fast。 -
查看实例配置和状态
select * from gp_configuration order by 1 ;
主要字段说明:
Content:该字段相等的两个实例,是一对P(primary instance)和M(mirror
Instance)
Isprimary:实例是否作为primary instance 运行
Valid:实例是否有效,如处于false 状态,则说明该实例已经down 掉。
Port:实例运行的端口
Datadir:实例对应的数据目录 -
gpstate :显示Greenplum数据库运行状态,详细配置等信息
常用可参数:-c:primary instance 和 mirror instance 的对应关系
-m:只列出mirror 实例的状态和配置信息
-f:显示standby master 的详细信息
-Q:显示状态综合信息
该命令默认列出数据库运行状态汇总信息,常用于日常巡检。
评:最开始由于网卡驱动的问题,做了mirror后,segment经常down掉,用-Q参数查询综合信息还是比较有用的。 -
查看用户会话和提交的查询等信息
select * from pg_stat_activity 该表能查看到当前数据库连接的IP 地址,用户名,提交的查询等。另外也可以在master 主机上查看进程,对每个客户端连接,master 都会创建一个进程。ps -ef |grep -i postgres |grep -i con
评:常用的命令,我经常用这个查看数据库死在那个sql上了。 -
查看数据库、表占用空间
select pg_size_pretty(pg_relation_size(‘schema.tablename’));
select pg_size_pretty(pg_database_size(‘databasename’));
必须在数据库所对应的存储系统里,至少保留30%的自由空间,日常巡检,要检查存储空间的剩余容量。
评:可以查看任何数据库对象的占用空间,pg_size_pretty可以显示如mb之类的易读数据,另外,可以与pg_tables,pg_indexes之类的系统表链接,统计出各类关于数据库对象的空间信息。 -
收集统计信息,回收空间
定期使用Vacuum analyze tablename 回收垃圾和收集统计信息,尤其在大数据量删除,导入以后,非常重要
评:这个说的不全面,vacuum分两种,一种是analize,优化查询计划的,还有一种是清理垃圾数据,postres删除工作,并不是真正删除数据,而是在被删除的数据上,坐一个标记,只有执行vacuum时,才会真正的物理删除,这个非常重用,有些经常更新的表,各种查询、更新效率会越来越慢,这个多是因为没有做vacuum的原因。 -
查看数据分布情况
两种方式:
l Select gp_segment_id,count(*) from tablename group by 1 ;
l 在命令运行:gpskew -t public.ate -a postgres
如数据分布不均匀,将发挥不了并行计算的优势,严重影响性能。
评:非常用,gp要保障数据分布均匀。 -
实例恢复:gprecoverseg
通过gpstate 或gp_configuration 发现有实例down 掉以后,使用该命令进行回复。 -
查看锁信息:
SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query
FROM pg_locks l, pg_class c, pg_stat_activity a
WHERE l.relation=c.oid AND l.pid=a.procpid
ORDER BY c.relname;
主要字段说明:
relname: 表名
locktype、mode 标识了锁的类型 -
explain:在提交大的查询之前,使用explain分析执行计划、发现潜在优化机会,避免将系统资源熬尽。
评:少写了个analyze,如果只是explain,统计出来的执行时间,是非常坑爹的,如果希望获得准确的执行时间,必须加上analyze。 -
数据库备份 gp_dump
常用参数:-s: 只导出对象定义(表结构,函数等)
-n: 只导出某个schema
gp_dump 默认在master 的data 目录上产生这些文件:
gp_catalog_1__ :关于数据库系统配置的备份文件
gp_cdatabase_1__:数据库创建语句的备份文件
gp_dump_1__:数据库对象ddl语句
gp_dump_status_1__:备份操作的日志
在每个segment instance 上的data目录上产生的文件:
gp_dump_0__:用户数据备份文件
gp_dump_status_0__:备份日志__ -
数据库恢复 gp_restore
必参数:–gp-k=key :key 为gp_dump 导出来的文件的后缀时间戳
-d dbname :将备份文件恢复到dbname -
登陆与退出Greenplum
#正常登陆
psql gpdb
psql -d gpdb -h gphostm -p 5432 -U gpadmin
#使用utility方式
PGOPTIONS="-c gp_session_role=utility" psql -h -d dbname hostname -p port
#退出
在psql命令行执行\q -
参数查询
psql -c ‘SHOW ALL;’ -d gpdb
gpconfig --show max_connections
评:这个用,可以管道给grep。 -
创建数据库
createdb -h localhost -p 5432 dhdw -
创建GP文件系统
# 文件系统名
gpfsdw
# 子节点,视segment数创建目录
mkdir -p /gpfsdw/seg1
mkdir -p /gpfsdw/seg2
chown -R gpadmin:gpadmin /gpfsdw
# 主节点
mkdir -p /gpfsdw/master
chown -R gpadmin:gpadmin /gpfsdw
gpfilespace -o gpfilespace_config
gpfilespace -c gpfilespace_config -
创建GP表空间
psql gpdb
create tablespace TBS_DW_DATA filespace gpfsdw;
SET default_tablespace = TBS_DW_DATA; -
删除GP数据库
gpdeletesystem -d /gpmaster/gpseg-1 -f -
查看segment配置
select * from gp_segment_configuration; -
文件系统
select * from pg_filespace_entry; -
磁盘、数据库空间
SELECT * FROM gp_toolkit.gp_disk_free ORDER BY dfsegment;
SELECT * FROM gp_toolkit.gp_size_of_database ORDER BY sodddatname;
日志
SELECT * FROM gp_toolkit.__gp_log_master_ext;
SELECT * FROM gp_toolkit.__gp_log_segment_ext; -
表数据分布
SELECT gp_segment_id, count(*) FROM GROUP BY gp_segment_id; -
表占用空间
SELECT relname as name, sotdsize/1024/1024 as size_MB, sotdtoastsize as toast, sotdadditionalsize as other
FROM gp_toolkit.gp_size_of_table_disk as sotd, pg_class
WHERE sotd.sotdoid = pg_class.oid ORDER BY relname; -
索引占用空间
SELECT soisize/1024/1024 as size_MB, relname as indexname
FROM pg_class, gp_toolkit.gp_size_of_index
WHERE pg_class.oid = gp_size_of_index.soioid
AND pg_class.relkind=‘i’; -
OBJECT的操作统计
SELECT schemaname as schema, objname as table, usename as role, actionname as action, subtype as type, statime as time
FROM pg_stat_operations
WHERE objname = ‘’; -
锁
SELECT locktype, database, c.relname, l.relation, l.transactionid, l.transaction, l.pid, l.mode, l.granted, a.current_query
FROM pg_locks l, pg_class c, pg_stat_activity a
WHERE l.relation=c.oid
AND l.pid=a.procpid
ORDER BY c.relname; -
队列
SELECT * FROM pg_resqueue_status; -
gpfdist外部表
# 启动服务
gpfdist -d /share/txt -p 8081 –l /share/txt/gpfdist.log &
# 创建外部表,分隔符为’/t’
drop EXTERNAL TABLE TD_APP_LOG_BUYER;
CREATE EXTERNAL TABLE TD_APP_LOG_BUYER (
IP text,
ACCESSTIME text,
REQMETHOD text,
URL text,
STATUSCODE int,
REF text,
name text,
VID text)
LOCATION (‘gpfdist://gphostm:8081/xxx.txt’)
FORMAT ‘TEXT’ (DELIMITER E’/t’
FILL MISSING FIELDS) SEGMENT REJECT LIMIT 1 percent; -
创建普通表
create table test select * from TD_APP_LOG_BUYER;
# 索引
# CREATE INDEX idx_test ON test USING bitmap (ip);
# 查询数据
select ip , count() from test group by ip order by count();
gpload
# 创建控制文件
# 加载数据
gpload -f my_load.yml
copy
COPY country FROM ‘/data/gpdb/country_data’
WITH DELIMITER ‘|’ LOG ERRORS INTO err_country
SEGMENT REJECT LIMIT 10 ROWS; -
gpfdist外部表
创建可写外部表
CREATE WRITABLE EXTERNAL TABLE unload_expenses
( LIKE expenses )
LOCATION (‘gpfdist://etlhost-1:8081/expenses1.out’,
‘gpfdist://etlhost-2:8081/expenses2.out’)
FORMAT ‘TEXT’ (DELIMITER ‘,’)
DISTRIBUTED BY (exp_id);
# 写权限
GRANT INSERT ON writable_ext_table TO ;
# 写数据
INSERT INTO writable_ext_table SELECT * FROM regular_table;
copy
COPY (SELECT * FROM country WHERE country_name LIKE ‘A%’) TO ‘/home/gpadmin/a_list_countries.out’;
执行sql文件
psql gpdbname –f yoursqlfile.sql
或者psql登陆后执行
\i yoursqlfile.sql