暂无图片
暂无图片
暂无图片
暂无图片
暂无图片

Percona-Toolkit 安装及使用

Ty3306 2022-06-28
1034


1. 安装

下载地址

https://www.percona.com/downloads/percona-toolkit/LATEST/

# 下载 rpm 格式的包



安装相关包

yum install perl-IO perl-Digest perl perl-devel perl-Time-HiRes perl-DBI perl-DBD-MySQL perl-IO-Socket-SSL perl-TermReadKey perl-Digest-MD5


安装

rpm -ivh percona-toolkit-3.2.1-1.el7.x86_64.rpm


权限

pt工具使用权限:SELECT, PROCESS, SUPER, REPLICATION SLAVE


使用方法

参考:https://blog.csdn.net/qq_31144297/article/details/103660677


工具列表

pt-align :              # 输出格式化
pt-archiver : # 归档数据,将一个表的数据转移到另一个表或文件中
ptaskset                      
pt-config-diff : # 对比参数文件和参数的不同点
pt-deadlock-logger : # 记录死锁信息
pt-diskstats : # 检查磁盘信息
pt-duplicate-key-checker : # 检查某个表是否有重复的索引
pt-fifo-split : # 切分文件(当 load data infile 的 文件比较大,则可以通过这个工具进行切分),也可以用 split 工具,见 Load Data.note
pt-find : # 查找,如:数据库中的空表,一天前建立的表 等等 https://www.cnblogs.com/shengdimaya/p/7064370.html
pt-fingerprint              
pt-fk-error-logger : # 功能是记录MySQL外键错误。
pt-heartbeat : # 检查主从延迟            
pt-index-usage : # 这个工具主要是用来分析慢查询的索引使用情况。从log文件中读取插叙语句,并用explain分析他们是如何利用索引。完成分析之后会生成一份关于索引没有被查询使用过的报告。              
pt-ioprofile : # 查看指定进程的 io 负载信息,影响较大,慎用
pt-kill : # 杀会话场景
pt-mext : # 并行查看SHOW GLOBAL STATUS的多个样本的信息。
pt-mongodb-query-digest
pt-mongodb-summary
pt-mysql-summary : # 输出mysql基本信息
pt-online-schema-change : # 在线修改表结构
pt-pg-summary
pt-pmp : # 一是获取进程的堆栈信息,二是对这些堆栈信息进行汇总。
pt-query-digest : # 日志分析  
pt-secure-collect : # 用途:收集、清理、打包和加密数据。
pt-show-grants : # 查看每个账号的权限
pt-sift
pt-slave-delay : # pt-slave-delay故意让主上的操作延迟制定的时间写入到从
pt-slave-find : # pt-slave-find查找和打印出MySQL从库的层次树。
pt-slave-restart : # pt-slave-restart监控一个或者多个MySQL复制slave的错误,然后当复制停止时试图重启。
pt-stalk : # pt-stalk工具可以解决更细粒度的故障现场采集,守护进程的方式试用了一下,可以帮助我们解决一些问题。
pt-summary : # pt-summary方便地汇总服务器的状态和配置。它并不是一个调优工具或者诊断工具
pt-table-checksum : # 检查主从表是否一致(或独立的两个表)
pt-table-sync : # 通过上面的工具检查出问题后可以通过该工具进行同步,最终达到一致
pt-table-usage
pt-upgrade : # 这个工具用来检查在新版本中运行的SQL是否与老版本一样,返回相同的结果,最好的应用场景就是数据迁移的时候。这在升级服务器的时候非常有用,可以先安装并导数据到新的服务器上,然后使用这个工具跑一下sql看看有什么不同,可以找出不同版本之间的差异。
pt-variable-advisor : # pt-variable-advisor是一款分析参数,并且给出参数设置建议的一款PT工具 pt-variable-advisor localhost -uroot -p123
pt-visual-explain : # 格式化explain出来的执行计划按照tree方式输出,方便阅读pt-visual-explain -h192.168.65.10 -P3306 -uroot -p123 -Dmydb_a --connect full.sql
ptx


2. pt-query-digest(慢查询分析工具)

参数说明

--create-review-table       # 当使用--review参数把分析结果输出到表中时,如果没有表就自动创建。

--create-history-table # 当使用--history参数把分析结果输出到表中时,如果没有表就自动创建。

--filter # 对输入的慢查询按指定的字符串进行匹配过滤后再进行分析。

--limit # 限制输出结果百分比或数量,默认值是20,即将最慢的20条语句输出,如果是50%则按总响应时间占比从大到小排序,输出到总   和达到50%位置截止。

--host # mysql服务器地址

--user # mysql用户名

--password # mysql用户密码

--history # 将分析结果保存到表中,分析结果比较详细,下次再使用--history时,如果存在相同的语句,且查询所在的时间区间和历史表 中的不同,则会记录到数据表中.

--review # 将分析结果保存到表中,这个分析只是对查询条件进行参数化,一个类型的查询一条记录,比较简单。当下次使用--review 时,如果存在相同的语句分析,就不会记录到数据表中。

--output # 分析结果输出类型,值可以是report(标准分析报告)、slowlog(Mysql slow log)、json、json-anon,一般使用 report,以便于阅读。

--since # 从什么时间开始分析,可以是指定的某个”yyyy-mm-dd [hh:mm:ss]”格式的时间点,也可以是简单的一个时间值:s(秒)、 h(小时)、m(分钟)、d(天),如12h就表示从12小时前开始统计。

--until # 截止时间,配合—since可以分析一段时间内的慢查询。


例句

# 直接分析慢查询文件:
pt-query-digest slow.log > slow20170803.log

# 分析最近1小时内的查询:
pt-query-digest --since=1h slow.log > 1slow.log

# 分析指定时间范围内的查询:
pt-query-digest slow.log --since '2017-08-03 15:30:00' --until '2017-08-04 10:30:00' > slow83-84.log

# 分析指含有select语句的慢查询:
pt-query-digest --filter '$event->{fingerprint} =~ m/^select/i' slow.log > slow4.log

# 针对某个用户的慢查询:
pt-query-digest --filter '($event->{user} || "") =~ m/^admin/i' slow.log > slow5.log

# 查询所有所有的全表扫描或full join的慢查询:
pt-query-digest --filter '(($event->{Full_scan} || "") eq "yes") ||(($event->{Full_join} || "") eq "yes")' slow.log> slow6.log

# 把查询保存到query_review表:
pt-query-digest --user=root –password=abc123 --review h=localhost,D=test,t=query_review --create-review-table slow.log

# 把查询保存到query_history表:
pt-query-digest --user=root –password=abc123 --review h=localhost,D=test,t=query_ history --create-review-table slow.log

# 通过tcpdump抓取mysql的tcp协议数据,然后再分析:
tcpdump -s 65535 -x -nn -q -tttt -i any -c 1000 port 3306 > mysql.tcp.txt
pt-query-digest --type tcpdump mysql.tcp.txt> slow_report9.log

# 分析binlog:
mysqlbinlog mysql-bin.000001 > mysql-bin000001.sql
pt-query-digest --type=binlog mysql-bin000001.sql > slow10.log

# 分析general log:
pt-query-digest --type=genlog localhost.log > slow11.log


输出说明

[root@iZ2zebthf35ejlps5v87ksZ bin]# ./pt-query-digest /usr/local/src/slowsqlExample/slow0312.log 

第一部分
该工具执行日志分析的用户时间,系统时间,物理内存占用大小,虚拟内存占用大小
# 360ms user time, 10ms system time, 22.56M rss, 187.09M vsz
工具执行时间
# Current date: Fri Mar 20 22:54:14 2020
运行分析工具的主机名    
# Hostname: iZ2zebthf35ejlps5v87ksZ
被分析的文件名
# Files: /usr/local/src/slowsqlExample/slow0312.log
语句总数量,唯一的语句数量,QPS,并发数
# Overall: 906 total, 21 unique, 0.02 QPS, 0.07x concurrency _____________
日志记录的时间范围
# Time range: 2020-03-11 12:22:13 to 2020-03-12 00:16:57
# Attribute         total     min     max     avg     95% stddev median
# ============     ======= ======= ======= ======= ======= ======= =======
语句执行时间
# Exec time         2991s     2s     10s     3s     5s     1s     3s
锁占用时间
# Lock time         552ms   24us   371ms   609us   103us   12ms   57us
发送到客户端的行数
# Rows sent       167.53k       0 17.99k 189.35 487.09   1.22k       0
select语句扫描行数
# Rows examine     980.73M     238   1.96M   1.08M   1.95M 757.80k 753.18k
查询的字符数
# Query size       258.71k     17   1.77k 292.41 463.90 202.02 329.68


第二部分
# Profile
Rank:所有语句的排名,默认按查询时间降序排列,通过--order-by指定
Query ID:语句的ID,(去掉多余空格和文本字符,计算hash值)
Response:总的响应时间
time:该查询在本次分析中总的时间占比
calls:执行次数,即本次分析总共有多少条这种类型的查询语句
R/Call:平均每次执行的响应时间
V/M:响应时间Variance-to-mean的比率
Item:查询对象
# Rank Query ID                       Response time   Calls R/Call V/M  
# ==== =============================== =============== ===== ====== =====
#   1 0xABD1DCCCCD5AA5128E10C27B34... 1246.6948 41.7%   283 4.4053 0.04 UPDATE ziweidashi_deviceinfo
#   2 0x6914B81AAD1785E50708ABD113... 877.6900 29.3%   339 2.5891 0.09 SELECT birthDay_notify
#   3 0x44D9474C6D5C58DD07B5FEEA0D... 299.4193 10.0%   71 4.2172 0.05 SELECT tmall_product_orders
#   4 0xA9BE84CBE3DAA9B1CDD9B5A9EC... 127.0137 4.2%   46 2.7612 0.04 SELECT daily_user_action_log
#   5 0xCF0E12117C971C3013142E3717... 118.3138 4.0%   49 2.4146 0.05 SELECT tmall_user_take_coupon_record
#   6 0x94263184D24186330B13193534...   97.0805 3.2%   35 2.7737 0.56 SELECT tgg_users
#   7 0xC51165F1287A2ECDA221AC1F54...   52.5870 1.8%   22 2.3903 0.04 SELECT util_user_task_log
#   8 0xB8004D6D8A7A7967E04CD81E26...   43.7895 1.5%   16 2.7368 0.08 SELECT daily_user_action_log
#   9 0x910E19224F33DAA6391927B8E8...   41.3720 1.4%   15 2.7581 1.17 SELECT qifugong_tianbi_record
# MISC 0xMISC                           86.7871 2.9%   30 2.8929   0.0 <12 ITEMS>


第三及后续部分,第一条查询语句 query id:0xABD1DCCCCD5AA5128E10C27B34BC04E7
# Query 1: 0.01 QPS, 0.03x concurrency, ID 0xABD1DCCCCD5AA5128E10C27B34BC04E7 at byte 355748
# Scores: V/M = 0.04
# Time range: 2020-03-11 12:24:03 to 2020-03-12 00:16:13
# Attribute   pct   total     min     max     avg     95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         31     283
# Exec time     41   1247s     4s     8s     4s     5s   437ms     4s
# Lock time     69   386ms   24us   371ms     1ms   93us   21ms   44us
# Rows sent     0       0       0       0       0       0       0       0
# Rows examine 18 180.00M 651.14k 651.45k 651.29k 650.62k       0 650.62k
# Query size   10 27.64k     100     100     100     100       0     100
# String:
数据库名
# Databases   taxen_ziweidashi
执行主机
# Hosts       118.190.93.166
执行用户
# Users       devAccount
查询时间占比
# Query_time distribution
#   1us
# 10us
# 100us
#   1ms
# 10ms
# 100ms
#   1s ################################################################
# 10s+
# Tables
#   SHOW TABLE STATUS FROM `taxen_ziweidashi` LIKE 'ziweidashi_deviceinfo'\G
#   SHOW CREATE TABLE `taxen_ziweidashi`.`ziweidashi_deviceinfo`\G
UPDATE ziweidashi_deviceinfo
        SET expired = 1
       WHERE createTime   <=   1583942580685\G
# Converted for EXPLAIN
# EXPLAIN /*!50100 PARTITIONS*/
select  expired = 1 from ziweidashi_deviceinfo where createTime   <=   1583942580685\G

# Query 2: 0.03 QPS, 0.07x concurrency, ID 0x6914B81AAD1785E50708ABD11319E02E at byte 13829
# Scores: V/M = 0.09
# Time range: 2020-03-11 12:22:13 to 16:05:47
# Attribute   pct   total     min     max     avg     95% stddev median
# ============ === ======= ======= ======= ======= ======= ======= =======
# Count         37     339
# Exec time     29   878s     2s     4s     3s     4s   472ms     2s
# Lock time     5   29ms   31us     4ms   86us   98us   229us   66us
# Rows sent     0     24       0       2   0.07       0   0.32       0
# Rows examine 67 665.20M   1.96M   1.96M   1.96M   1.96M       0   1.96M
# Query size   59 154.47k     462     467 466.60 463.90   2.07 463.90
# String:
# Hosts       10.66.186.115
# Users       root
# Query_time distribution
#   1us
# 10us
# 100us
#   1ms
# 10ms
# 100ms
#   1s ################################################################
# 10s+
# Tables
#   SHOW TABLE STATUS LIKE 'birthDay_notify'\G
#   SHOW CREATE TABLE `birthDay_notify`\G
# EXPLAIN /*!50100 PARTITIONS*/
select birthdayno0_.id as id1_1_, birthdayno0_.index_card_show_date as index_ca2_1_, birthdayno0_.userId as userId3_1_, birthdayno0_.push_content as push_con4_1_, birthdayno0_.card_content as card_con5_1_, birthdayno0_.birthday_userId as birthday6_1_, birthdayno0_.birthday_contactId as birthday7_1_, birthdayno0_.need_push as need_pus8_1_ from birthDay_notify birthdayno0_ where birthdayno0_.userId=1304747 and birthdayno0_.index_card_show_date='2020-03-11 00:00:00'\G

……省略


3.pt-duplicate-key-checker

指定整库或某个表都行

pt-duplicate-key-checker --host=192.168.65.10 --user='root' --password='123' --databases=mydb_a --tables=class


4.pt-online-schema-change

OSC原理:
1.按照原表创建新表;
2.修改新表的字段(需要修改的列)
3.在创建一个新表,用于存放在变更过程当中产生的数据;
4.给原始表创建 增、删、改 触发器
5.将原始表中的数据导出(这里可以分片)到文件;
6.将文件到入新表;
7.导入之前先删除新表的辅助索引,然后在导入数据;
8.导入新表的过程中会产生新的数据,把这一部分新的数据也导入到新表(这部分数据在第三部产生的表中)
9.重新吧新表的辅助索引建起来;
10.上一步建索引的同时也会有新的数据产生,也要导入到新表;
11.将新表修改为原始表的表名;

修改存储引擎
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="ENGINE=InnoDB" --execute --nocheck-replication-filters

添加列
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="add column f1 int default null" --execute --nocheck-replication-filters

修改字段类型
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="MODIFY f1 varchar(11)" --execute --nocheck-replication-filters

修改字段名
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="change f2 f3 int" --execute --nocheck-replication-filters
PS:修改前一定要确保名字、类型 的正确性,以免数据丢失,如果报错,则加上--no-check-alter 即可,切记检查、检查、在检查。

删除列
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="drop column f2" --execute --nocheck-replication-filters

加索引
pt-online-schema-change --charset=utf8mb4 --no-version-check --user=root --password=123 --host=192.168.65.10 P=3306,D=mydb_a,t=t --alter="add index index_name (age)" --execute --nocheck-replication-filters


5.pt-table-checksum(检查数据库、表一致性)

# 使用方法

pt-table-checksum --host=192.168.65.10 --port=3306 --user=root --password=123 -d[db_name] -t[table_name] --nocheck-replication-filters --no-check-binlog-format --replicate=test.checksum

不带 -t 参数,则检查整个库中的表

PS:一般与 pt-table-checksum 一起使用,先用 pt-table-checksum 检查是否一致,然后在用 pt-table-sync 处理不一致的问题

参考:https://blog.csdn.net/leshami/article/details/78377444
 https://www.cnblogs.com/xiaoyanger/p/5584554.html



6.pt-table-sync(数据库、表同步)

6.1 有主从关系

a.同时同步两个slave 中的表

 使用 --no-check-slave 才能同时同步多个 从库
    pt-table-sync --no-check-slave --charset=utf8 --databases=mydb_a --tables=class dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3307 --execute --print

    使用 --sync-to-master 一次只能同步一个
    pt-table-sync --sync-to-master --charset=utf8 --databases=mydb_a --tables=class dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print

    使用 --replicate 不能实现,会报错  

    PS:使用 --no-check-slave 时,第一个 dsn 是主,后面2个都是需要同步的从库


b.同步多个库

pt-table-sync --sync-to-master --charset=utf8 --ignore-databases=mysql,sys dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print



c.同步单个库

pt-table-sync --sync-to-master --charset=utf8 --databases=mydb_a --ignore-databases=mysql,sys dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


d.同步单个库中某个表

pt-table-sync --sync-to-master --charset=utf8 --databases=mydb_a --tables=t dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


e.同步单个库中某个表的部分数据

pt-table-sync --sync-to-master --charset=utf8 --databases=mydb_a --tables=teacher --where="teacher_birth < '2001-01-01'" dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print
PS:

1.如果需要同步的表没有主见或唯一键,则需要加上参数 --no-check-slave,否则报错 'REPLICATION SAFETY'

2.如果需要同步的表有主键的话也可以使用 --no-check-slave,或 --sync-to-master ,如果都不加就会报错 'REPLICATION SAFETY'

3.指定了--replicate(指定dsn为主)或者--sync-to-master(指定dsn为从)只能为命令行指定一个dsn

4.如果指定了 --replicate 或 --sync-to-master ,则不会同步没有 主键 或 唯一键 的表

5.如果是只指定选项--replicate,则DSN主机对应的为主库的连接串,在这之前选项--replicate指定的表有保存之前数据不一致的校验结果,可以先通过工具pt-table-checksum进行校验,否则并不会进行同步变更修复。

如:--replicate='percona.checksums'


6.2 无主从关系

a.同时同步两个slave 中的表

pt-table-sync --no-check-slave --charset=utf8 --databases=mydb_a --tables=class dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3307 --execute --print


b.同步多个库

pt-table-sync --charset=utf8 --ignore-databases=mysql,sys dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


c.同步单个库

pt-table-sync --charset=utf8 --databases=mydb_a dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


d.同步单个库中某几个表

pt-table-sync --charset=utf8 --databases=mydb_a --tables=stu,class dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


e.同步单个库中某个表的部分数据

pt-table-sync --charset=utf8 --databases=mydb_a --tables=teacher --where="teacher_birth < '2001-01-01'" dsn=u=root,p=123,h=192.168.65.10,P=3306 dsn=u=root,p=123,h=192.168.65.11,P=3306 --execute --print


PS:
1.目标服务器上的库和表必须存在,否则同步不成功。
2.第一个 dsn 为源库信息,第二个 dsn 为目标库信息。
3.如果有中文的话,要用 --charset=utf8,如果用utf8mb4,则每次每个表只能 insert 一条数据。

参数:
--ignore-databases
--databases
--ignore-tables
--tables
--where
--replicate  
--sync-to-master
--bin-log(set sql_log_bin=1)
--no-bin-log(set sql_log_bin=0)




PS:1.主 IP 在前,从 IP 在后,否则以从的数据为基准来处理
  2.一般与 pt-table-checksum 一起使用,先用 pt-table-checksum 检查是否一致,然后在用 pt-table-sync 处理不一致的问题

 
参考:
http://www.hellojava.com/a/75316.html(详细)
http://blog.sina.com.cn/s/blog_a1e9c7910102vnsd.html


6.3 中文使用说明

2.28 pt-table-sync
2.28.1 NAME
pt-table-sync - Synchronize MySQL table data efficiently.

2.28.2 SYNOPSIS
Usage
pt-table-sync [OPTIONS] DSN [DSN]
pt-table-sync synchronizes data efficiently between MySQL tables. This tool changes data, so for maximum safety, you should back up your databefore using it. When synchronizing a server that is a replication slave with the--replicate or--sync-to-mastermethods, it always makes the changes on the replication master,neverthe replication slave directly. This is in general the only safe way to bring a replica back in sync with its master; changes to the replica are usually the source of the problems in the first place. However, the changes it makes on the master should be no-op changes that set the data to their current values, and actually affect only the replica.
>>pt-table-sync用来高效对mysql数据库间存在不一致表的数据进行同步。该工具会修改工具,所以为了安全,在使用该工具前,对数据库进行备份。当使用--replicate或者--sync-to-master参数来对从库数据进行同步时,如果主从数据不一致,不会直接在从库上修改不一致的数据,而是在主库上进行replace 操作(需要表有唯一键),通过binlog把动作传递给从库,从而达到修改从库数据的目的。导致主库数据不一致的根本原因就是直接在从库上进行增删改操作,所以我们在修复从库数据的时候也尽量避免在从库直接操作。(但是如果主从数据不一致的表没有唯一键,那么不能使用--replicate或者--sync-to-master来修复数据)。当然在主库上做的修改应该是no-op的,并不会修改主库表的当前值,只对从库数据产生影响。

Sync db.tbl on host1 to host2: >>同步host1和host2 的db.tbl表(以host1为源)
pt-table-sync --execute h=host1,D=db,t=tbl h=host2

Sync all tables on host1 to host2 and host3: >>同步host1,和host2,host3 上所有的表(以host1为源)
pt-table-sync --execute host1 host2 host3

Make slave1 have the same data as its replication master: >>以主库来同步slave1上的数据(以master为源)
pt-table-sync --execute --sync-to-master slave1
NOTE:使用--sync-to-master要求被sync的表上有唯一键,且使用该参数只能指定一个从库的dsn

Resolve differences that pt-table-checksum found on all slaves of master1: >>根据pt-table-checksum的检查结果(percona.checksum),同步主库master1与其所有主库上不一致的数据
pt-table-sync --execute --replicate test.checksum master1
NOTE:使用--replicate要求被sync的表上有唯一键,且使用该参数只能指定一个dsn,该dsn为主库

Same as above but only resolve differences on slave1: >>跟上面的作用类似,sync主从数据不一致,但是这里只是比对slave1同主库的不一致并修复
pt-table-sync --execute --replicate test.checksum --sync-to-master slave1

Sync master2 in a master-master replication configuration, where master2’s copy of db.tbl is known or suspected to be incorrect:   >>如果想修复主主结构中某个实例上的表(假设master2 上tbl表数据有问题),应该如下操作:
pt-table-sync --execute --sync-to-master h=master2,D=db,t=tbl
Note that in the master-master configuration, the following will NOT do what you want, because it will make changes directly on master2, which will then flow through replication and change master1’s data: >>(假设master2上db.tbl表数据有问题)主主结构下不能像下面这样进行数据同步,因为这样会直接在master2上进行修改,而该修改又会同步到master1中
# Don't do this in a master-master setup!
pt-table-sync --execute h=master1,D=db,t=tbl master2

2.28.3 RISKS
WARNING: pt-table-sync changes data! Beforeusing this tool, please:
• Read the tool’s documentation
• Review the tool’s known “BUGS”
• Test the tool on a non-production server
• Backup your production server and verify the backups
pt-table-sync is mature, proven in the real world, and well tested, but if used improperly it can have adverse consequences. Always test syncing first with--dry-run and--print.

2.28.4 DESCRIPTION
pt-table-sync does one-way and bidirectional synchronization of table data. It does notsynchronize table structures,indexes, or any other schema objects. The following describes one-way synchronization. “BIDIRECTIONAL SYNCING” is described later. This tool is complex and functions in several different ways. To use it safely and effectively, you should understand three things: the purpose of--replicate, finding differences, and specifying hosts. These three conceptsare closely related and determine how the tool will run. The following is the abbreviated logic:
>>pt-table-sync能够实现单向同步,也可以实现双向同步(目前双向同步还处于测试阶段,后面涉及到双向同步的内容先不做介绍)。pt-table-sync不能同步表结构,索引等对象,只能用来同步记录。下面将介绍单向同步。该工具比较复杂,参数也比较多,想要更安全,更高效的使用它 你需要搞清楚三个概念:the purpose of --replicate, finding differences, and specifying hosts。这三个概念与该工具的运行是息息相关的。下面是单向同步的逻辑过简图:
if DSN has a t part, sync onlythat table:
if 1DSN:
if --sync-to-master:
The DSN is a slave. Connect to its master and sync.
if more than 1 DSN:
The first DSN is the source. Sync each DSN in turn.
else if --replicate:
if --sync-to-master:
The DSN is a slave. Connect to its master, find records
of differences, and fix.
else:
The DSN is the master. Find slaves and connect to each,
find records of differences, and fix.
else:
if only 1 DSN and --sync-to-master:
The DSN is a slave. Connect to its master, find tables and
filter with --databases etc, and sync each table to the master.
else:
find tables, filtering with --databases etc, and sync each
DSN to the first.

pt-table-sync can run in one of two ways: with --replicate or without. The defaultis to run without --replicatewhich causespt-table-sync to automatically find differences efficiently with one of several algorithms (see “ALGORITHMS”).Alternatively, the value of--replicate, if specified, causespt-table-sync to use the differences already found by having previously ran pt-table-checksum with its own --replicate option. Strictly speaking, you don’t need to use--replicate because pt-table-synccan find differences, but many people use--replicateif, for example, they checksum regularly using pt-table-checksum then fix differences as needed withpt-table-sync. If you’re unsure, read each tool’s documentation carefully and decide for yourself, or consult with an expert. >>pt-table-sync能够以两种方式运行,带--replicate或者不带。默认是运行在不带--replicate方式下,这时pt-table-sync会自动检查主从数据的不一致,并进行数据同步。如果指定了--replicate,这时pt-table-sync会根据pt-table-checksum检查的结果来进行数据同步。严格意义上来讲,你不需要指定--replicate,因为pt-table-sync能自己发现数据不一致,但是很多人都喜欢指定--replicate,比如他们会定期使用pt-table-checksum,然后使用pt-table-sync来修复数据。
Regardless of whether --replicate is used or not, you need to specify which hosts to sync. There are two ways:with--sync-to-masteror without.Specifying--sync-to-mastermakespt-table-sync expect one and only slave DSN on the command line. The tool will automatically discover the slave’s master and sync it so that its data is the same as its master. This is accomplished by making changes on the master which then flow through replication and update the slave to resolve its differences.Be careful though: although this option specifies and syncs a single slave, if there are other slaves on the same master, they will receive via replication the changes intended for
the slave that you’re trying to sync.   >>不论是否指定--replicate值,你都需要为pt-table-sync指定host,有两种情况:指定了--sync-to-master或者没有指定。如果使用了--sync-to-master,只能为pt-table-sync指定一个host并且作为slave。这时pt-table-sync会自动查找该slave的主库,并检查主从数据,如果存在不一致使用主库数据同步从库。主从数据的同步是通过在主库对不一致表的修改来完成的(表必须存在唯一键),主库的修改会通过binlog同步到从库,从而实现主从数据一致。注意,虽然使用--sync-to-master参数时我们只能指定一个从库,但是如果你的同步数据操作是在主库进行,那么该动作还是会被传递到所有的从库,而不仅仅是你所指定的这个从库。如果表上没有唯一键,那不能使用--sync-to-master对主从进行sync。
Alternatively, if you do not specify --sync-to-master, the first DSN given on the command line is the source host. There is only ever one source host. If you do not also specify--replicate, then you must specify at least one other DSN as the destination host. There can be one or more destination hosts. Source and destination hosts must be independent; they cannot be in the same replication topology.pt-table-sync will die with an error ifit detects that a destination host is a slave because changes are written directly to destination hosts (and it’s not safe to write directly to slaves). Or, if you specify--replicate(but not--sync-to-master) thenpt-table-syncexpects one and only one master DSN on the command line. The tool will automatically discover all the master’s slaves and sync them to the master. This is the only way to sync several (all) slaves at once(because--sync-to-masteronly specifies one slave).   >>如果未使用--sync-to-master参数,所指定的第一个dsn作为源,同时如果你也没有使用--replicate参数,那么除了源dsn以外你必须至少再指定一个目的dsn。源和目的实例必须是相互独立的,不能处在同一个复制拓扑中。如果pt-table-sync检查发现目的实例是从库,pt-table-sync会自动退出,因为该种情形下是在目的实例直接进行数据修复,如果目的实例是从库,这样做是不安全的(可以通过指定--no-check-slave不检查desitination是否为从库)。如果你指定了--replicate(但是没有指定--sync-to-master)那么只能指定一个dsn,并且把它当做主库。pt-table-sync会自动发现该主库的所有从库,检查数据是否一致,并对不一致的数据进行同步。这是同时修复多个从库数据不一致的唯一方式(即指定--replicate,但不指定--sync-to-master)。

Each host on the command line is specified as a DSN. The first DSN (or only DSN for cases like --sync-to-master) provides default values for other DSNs, whether those other DSNs are specified on the command line or auto-discovered by the tool. So in this example, pt-table-sync --execute h=host1,u=msandbox,p=msandbox h=host2
the host2 DSN inherits the u and p DSN parts from the host1 DSN. Use the --explain-hosts option to see how pt-table-syncwill interpret the DSNs given on the command line. >>命令行中第一个dsn中指定的值即其他dsn的默认值,不论其他的dsn是你显示指定的还是自动发现的(--replicate会自动发现从库,--sync-to-master会自动发现主库)。比如上面的那个例子pt-table-sync登录host2上的实例时使用的用户名/密码是msandbox/msandbox(继承自第一个dsn),当然你也可以显示为第二个dsn指定相关参数
##即你指定的用户在所有的实例都要存在

2.28.5 OUTPUT
If you specify the --verbose option, you’ll see information about the differences between the tables. There is one row per table. Each server is printed separately. For example, >>如果你指定了--verbose,pt-table-sync会为每个被同步的表输出一段如下信息:
# Syncing h=host1,D=test,t=test1
# DELETE REPLACE INSERT UPDATE ALGORITHM START END EXIT DATABASE.TABLE
# 0 0 3 0 Chunk 13:00:00 13:00:17 2 test.test1
Table test.test1 on host1 required 3 INSERTstatements to synchronize and it used the Chunk algorithm (see“ALGORITHMS”). The sync operation for this table started at 13:00:00 and ended17 seconds later (times taken from NOW() on the source host). Because differences were found, its “EXIT STATUS” was 2. >>上面的输出信息表示host1 上的test.test1表需要执行三个insert语句来实现数据同步,使用的算法为chunk。该表sync在13:00:00开始,总共花了17秒时间完成。因为发现了不一致,所以工具的退出状态为2
If you specify the --print option, you’ll see the actual SQL statements that the script uses to synchronize the table if--executeis also specified. >>如果你指定了--print参数和--execute,你能看到用来进行数据同步的具体语句。
If you want to see the SQL statements that pt-table-sync is using to select chunks, nibbles, rows, etc., then specify --printonce and--verbose twice. Be careful though:this can print a lot of SQL statements. >>如果你想看pt-table-sync 的chunk语句,那么指定一次--print,同时指定--verbose两次

There are cases where no combination of INSERT, UPDATE or DELETE statements can resolve differences without violating some unique key. For example, suppose there’s a primary key on columna and a unique key on column b. Then there is no way to sync these two tables with straight forward UPDATE statements:
+---+---+ +---+---+
| a | b | | a | b |
+---+---+ +---+---+
| 1 | 2 | |1|1|
| 2 | 1 | |2|2|
+---+---+ +---+---+
The tool rewrites queries to DELETE and REPLACE in this case. This is automatically handled after the first index violation, so you don’t have to worry about it.
Be careful when using pt-table-sync in any master-master setup. Master-master replication is inherently tricky, and it’s easy to make mistakes. You need to be sure you’re using the tool correctly for master-master replication. See the “SYNOPSIS” for the overview of the correct usage. >>注意在使用pt-table-sync处理主主架构是比较棘手的,也比较容易出错。关于主主架构下的pt-table-sync使用请详细阅读“SYNOPSIS”部分。
Also be careful with tables that have foreign key constraints with ON DELETE or ON UPDATE definitions because these might cause unintended changes on the child tables. See--[no]check-child-tables. In general, this tool is best suited when your tables have a primary key orunique index. Although it can synchronize data in tables lacking a primary key or unique index, it might be best to synchronize that data by another means. >>另外需要注意的是如果你sync的表包含外键(外键使用了on delete或者on update),那么可能会对字表的数据也进行了修改(但这可能并不是我们想要的)。参数--[no]check-child-tables参数。

2.28.6 REPLICATION SAFETY
   Synchronizing a replication master and slave safely is a non-trivial problem, in general. There are all sorts of issues to think about, such as other processes changing data, trying to change data on the slave, whether the destination and source are a master-master pair, and much more.                                                                                         >>如何安全的实现主从不一致数据的同步是个重要的问题。我们需要考虑如下问题,"such as other processes changing data, trying to change data on the slave, whether the destination and source are a master-master pair, and much more"
  In general, the safe way to do it is to change the data on the master, and lett he changes flow through replication to the slave like any other changes. However, this works only if it’s possible to REPLACE into the table on the master.REPLACE works only if there’s a unique index on the table (othe rwise it just acts like an ordinary INSERT).     >>通常安全的做法是在主库修改数据,并通过binlog把变动传递到从库,从而到达修改从库上不一致数据的目的。但是这是有一个前提条件的,即该表上必须有唯一键(因为在主库执行的是replace into操作,而该操作必须要有唯一键,否则即转换为普通的insert操作)
   If your table has unique keys, you should use the --sync-to-master and/or--replicateoptions to sync a slave to its master. This will generally do the right thing. When there is nounique key on the table, there is no choice but to change the data on the slave, andpt-table-sync will detect that you’re trying to do so. It will complain and die
unless you specify --no-check-slave (see --[no]check-slave).                                                                                                                                                                                                       >>如果你的表上有唯一键,你可以通过指定--sync-to-master或者--replicate,或者同时指定这两个参数来同步主从不一致的数据。但是如果你的表上没有唯一键,那么只能通过直接修改从库上的数据来实现主从数据的同步(不能使用--replicate或者--sync-to-master)。但是如果pt-table-sycn检查发现你的destination是从库会报错并退出,可以通过指定--no-check-slave来禁用该检查
   If you’re syncing a table without a primary or unique key on a master-master pair, you must change the data on the destination server. Therefore, you need to specify --no-bin-log for safety (see--[no]bin-log). If you don’t, the changes you make on the destination server will replicate back to the source server and change the data there!         >>如果你需要sync的表没有唯一键(主键或者唯一索引),并且是主主架构,那么你只能在destination上执行修改操作,所以你需要指定--no-bin-log,保证该修改不会写入binlog。
   The generally safe thing to do on a master-master pair is to use the --sync-to-master option so you don’t change the data on the destination server. You will also need to specify --no-check-slave to keep pt-table-syncfrom complaining that it is changing data on a slave. >>

2.28.7 ALGORITHMS
pt-table-sync has a generic data-syncing framework which uses different algorithms to find differences. The tool automatically chooses the best algorithm for each table based on indexes,column types, and the algorithm preferences >>pt-table-sync 有多种算法可以用来查找数据一致性。它会根据表的索引情况,列的类型,和索引的优先级情况自动选择最合适的算法
specified by --algorithms. The following algorithms are available, listed in their default order of preference: >>可以通过--algorithms参数来指定各算法的优先级顺序,默认顺序如下:
Chunk
Finds an index whose first column is numeric (including date and time types),and divides the column’s range of values into chunks of approximately--chunk-size rows. Syncs a chunk at a time by checksumming the entire chunk. If the chunk differs on the source and destination, checksums each chunk’s rows individually to find the rows that differ. It is efficient when the column has sufficient cardinality to make the chunk send up about the right size.The initial per-chunk checksum is quite small and results in minimal network traffic and memory consumption. If a chunk’s rows must be examined, only the primary key columns and a checksum are sent over the network, not the entire row. If a row is found to be different, the entire row will be fetched, but not before.Note that this algorithm will not work if chunking a char column where all the values start with the same character. In that case, the tool will exit and suggest picking a different algorithm.   >>发现有数值列开头的索引(这里数值包含时间类型),并且能够根据该索引把表分成合适的大小的chunk(--chunk-size)。每次只对比一个chunk,如果发现源和目的chunk checksum不同,则分别检查每一行的checksum,找出不一致的行。在表上存在合适的索引,能把表分成指定大小的chunk,那么该算法是很高效的。每个chunk的checksum值是很小的,对网络带宽和内存的消耗也就很小。如果发现chunk的checksum不一致,从而需要对chunk中的每一行进行checksum,但是通过网络只是发送该行的主键以及checksum值,只有检查发现该行数据不一致后,才会发送整行数据进行对比。

Nibble
Finds an index and ascends the index in fixed-size nibbles of --chunk-size rows, using an onbacktracking algorithm (see pt-archiver for more on this algorithm). It is very similar to “Chunk”, but instead of pre-calculating the boundaries of each piece of the table based on index cardinality, it uses LIMIT to define each nibble’s upper limit,and the previous nibble’s upper limit to define the lower limit. It works in steps: one query finds the row that will define the next nibble’s upper boundary, and the next query checksums the entire nibble. If the nibble differs between the source and destination, it examines
the nibble row-by-row, just as “Chunk” does. >>Nibble算法同Chunk算法十分相似,不同的是Nibble算法通过limit 把表分为不同的chunk。

GroupBy
Selects the entire table grouped by all columns, with a COUNT(*) column added.Compares all columns, and if they’re the same, compares the COUNT(*) column’s value to determine how many rows to insert or delete into the destination. Works on tables with no primary key or unique index. >>使用表的所有列进行分组(group by),并且统计每个组成员数(count(*))。比对每个分组的所有列,如果相同,继续比对该分组的count(*)值。该算法适用于没有创建主键和唯一索引的表

Stream
Selects the entire table in one big stream and compares all columns. Selects all columns. Much less efficient than the other algorithms, but works when there is no suitable index for them to use. >>把表中所有数据放到一个stream中,比对所有列。该算法跟其他几个算法相比效率较低。在表上没有合适的索引时,可以使用该算法。

Future Plans
Possibilities for future algorithms are TempTable (what I originally calledbottom-up in earlier versions of this tool), DrillDown (what I originally called top-down), and GroupByPrefix(similar to how SqlYOG Job Agent works). Each algorithm has strengths and weaknesses. If you’d like toimplement your favorite technique for finding differences between two sources of data on possiblydifferent servers, I’m willing to help. The algorithms adhere to a simple interface that makes it pretty easy towrite your own.

2.28.8 BIDIRECTIONAL SYNCING >>双向sync目前还处于测试阶段,就先不说了。
Bidirectional syncing is a new, experimental feature. To make itwork reliably there are a number of strict limitations:
* only works when syncing one server to other independent servers
* does not work in any way with replication
* requires that the table(s)are chunkable with the Chunk algorithm
* is not N-way, only bidirectional between two servers at a time
* does not handle DELETE changes
For example, suppose we have three servers: c1, r1, r2. c1 is thecentral server, a pseudo-master to the other servers
(viz. r1 and r2 are not slaves to c1). r1 and r2 are remote servers. Rows intable foo are updated and inserted on all
three servers and we want to synchronize all the changes between all theservers. Table foo has columns:
id int PRIMARY KEY
ts timestamp auto updated
name varchar
Auto-increment offsets are used so that new rows from any serverdo not create conflicting primary key (id) values. In
general, newer rows, as determined by the ts column, take precedence when asame but differing row is found during
the bidirectional sync. “Same but differing” means that two rows have the sameprimary key (id) value but different
values for some other column, like the name column in this example. Same butdiffering conflicts are resolved by
a “conflict”. A conflict compares some column of the competing rows todetermine a “winner”. The winning row
becomes the source and its values are used to update the other row.
There are subtle differences between three columns used to achievebidirectional syncing that you should be familiar with: chunk column (--chunk-column), comparison column(s) (--columns), and conflict column
(--conflict-column). The chunk columnis only used to chunk the table; e.g. “WHERE id >= 5 AND id
< 10”. Chunks are checksummed and when chunk checksums reveal a difference,the tool selects the rows in that
chunk and checksums the --columns for each row. If a column checksum differs, the rows have one ormore conflicting column values. In a traditional unidirectional sync, theconflict is a moot point because it can be resolved
simply by updating the entire destination row with the source row’s values. Ina bidirectional sync, however, the
--conflict-column (in accordance withother --conflict-* options list below) iscompared to determine
which row is “correct” or “authoritative”; this row becomes the “source”.

To sync all three servers completely, two runs of pt-table-sync are required. The firstrun syncs c1 and r1, then syncs
c1 and r2 including any changes from r1. At this point c1 and r2 are completelyin sync, but r1 is missing any changes
from r2 because c1 didn’t have these changes when it and r1 were synced. So asecond run is needed which syncs the
servers in the same order, but this time when c1 and r1 are synced r1 gets r2’schanges.
The tool does not sync N-ways, only bidirectionally between the first DSN givenon the command line and each
subsequent DSN in turn. So the tool in this example would be ran twice like:
pt-table-sync --bidirectional h=c1h=r1h=r2
The --bidirectional option enables this feature and causes various sanity checks to beperformed. You must
specify other options that tell pt-table-sync how to resolve conflicts for same but differing rows. Theseoptions are:
* --conflict-column
* --conflict-comparison
* --conflict-value
* --conflict-threshold
* --conflict-error"> (optional)
Use --print to test this option before--execute. The printed SQL statements will have comments saying on
which host the statement would be executed if you used --execute.
Technical side note: the first DSN is always the “left” server and the otherDSNs are always the “right” server. Since
either server can become the source or destination it’s confusing to think ofthem as “src” and “dst”. Therefore, they’re
generically referred to as left and right. It’s easy to remember this becausethe first DSN is always to the left of the
other server DSNs on the command line.


2.28.9 EXIT STATUS
The following are the exit statuses (also called return values, orreturn codes) when pt-table-sync finishes and exits. >>pt-table-sync退出状态
STATUS MEANING
====== =======================================================
0 Success.
1 Internal error.
2 At least one table differed on thedestination.
3 Combination of 1and 2.


2.28.10 OPTIONS
Specify at least one of --print,--execute, or--dry-run.
--where and --replicate are mutually exclusive.
This tool accepts additional command-line arguments. Refer to the “SYNOPSIS”and usage information for details.
>>必须至少要为pt-table-sync指定--print,--execute,--dry-run三个参数中的一个,--where和--replicate两个参数是互斥的。

--algorithms
type: string; default: Chunk,Nibble,GroupBy,Stream
Algorithm to use when comparing the tables, in order of preference. For each table,pt-table-syncwill check if the table can be synced with the given algorithms in the order that they’re given. The first algorithm that can sync the table is used. See “ALGORITHMS”.
>>指定用来对比表的算法优先级。对每个表,pt-table-checksum都会根据表的索引情况,列类型,指定的算法优先级顺序等因素为其选择一个最合适的算法。

--ask-pass
Prompt for a password when connecting to MySQL.
>>连接mysql的时候提供密码(--password这个参数是直接在命令行中写密码)

--bidirectional
Enable bidirectional sync between first and subsequent hosts. See “BIDIRECTIONAL SYNCING” for more information.
>>开启双向sync功能(该功能还处于测试阶段)

--[no]bin-log
default: yes
Log to the binary log (SET SQL_LOG_BIN=1). Specifying --no-bin-log will SET SQL_LOG_BIN=0.
>>该参数默认值为yes,即把变更操作记录binlog(set sql_log_bin=1)。你可以指定--no-bin-log来 set sql_log_bin=0.

--buffer-in-mysql
Instruct MySQL to buffer queries in its memory. This option adds the SQL_BUFFER_RESULT option to the comparison queries. This causes MySQL to execute the queries and place them in a temporary table internally before sending the results back to pt-table-sync. The advantage of this strategy is that pt-table-sync can fetch rows as desired without using a lot of memory inside the Perl process, while releasing locks on the MySQL table (to reducecontention with other queries). The disadvantage is that it uses more memory on the MySQL server instead. You probably want to leave --[no]buffer-to-client enabled too, because buffering into a temp table and then fetching it all into Perl’s memory is probably a silly thing to do.This option is most useful for the GroupBy and Stream algorithms, which may fetch a lot of data from the server.
>>指定--buffer-in-mysql即执行set sql_buffer_result=1。即在把mysql的查询结果返回客户端(pt-table-sync)之前,先把结果集保存到一个临时表中。这样做的优点是,pt-table-sync可以每次只从结果集中提取部分行(一个chunk),这样perl 程序消耗的内存比较小,同时减少表的锁定时间减少争用。这样做的缺点是mysql会消耗更多的内存。一般--buffer-to-client参数会和--buffer-in-mysql同时使用。因为如果不使用--buffer-to-client参数的话,mysql会一次性把结果集发送给客户端,这就和我们使用--buffer-in-mysql参数的初衷相违背了。在使用GroupBy 和 Stream算法时,因为结果集会非常大,这时使用--[no]buffer-to-client和--buffer-in-mysql参数会非常有用。

--[no]buffer-to-client
default: yes
Fetch rows one-by-one from MySQL while comparing.This option enables mysql_use_result which causes MySQL to hold the selected rows on the server until the tool fetches them. This allows the tool to use less memory but may keep the rows locked on the server longer.If this option is disabled by specifying --no-buffer-to-client then mysql_store_result is used which causes MySQL to send all selected rows to the tool at once. This may result in the results “cursor” being held open for a shorter time on the server, but if the tables are large, it could take a long time anyway, and use all your memory.For most non-trivial data sizes, you want to leave this option enabled.
This option is disabled when --bidirectional is used.
>>指定该参数后(默认为yes),pt-table-sync会一条一条的从mysql取记录。这时mysql会保存结果集,直到pt-table-sync取完结果集。这样的话pt-table-sync只是消耗很少的内存,但是mysql的行锁可能需要保持更长的时间(可以通过指定--buffer-in-mysql来解决锁的问题)。如果我们指定--no-buffer-to-client,mysql会结果集一次性发送给pt-talbe-sync。这时如果遇到个很大的表,pt-table-sync可能消耗掉你所有的内存。所以当有很多大表的时候,最好指定--buffer-in-mysql和--buffer-to-client

--charset
short form: -A; type: string
Default character set. If the value is utf8, sets Perl’s binmode on STDOUT to utf8, passes the mysql_enable_utf8 option to DBD::mysql, and runs SET NAMES UTF8 after connecting to MySQL. Any other value sets binmode on STDOUT without the utf8 layer, and runs SET NAMES after connecting to MySQL.
>>如果指定--character为utf8,则设置Perl的binmode on STDOUT 为utf8,同时在连接mysql实例后设置SET NAMES UTF8

--[no]check-child-tables
default: yes
Check if --execute will adversely affect child tables. When--replace,--replicate, or --sync-to-master is specified, the tool may sync tables using REPLACE statements. If a table being synced has child tables with ON DELETE CASCADE, ON UPDATE CASCADE, or ON UPDATE SET NULL, the tool prints an error and skips the table because REPLACE becomes DELETE then INSERT,so the DELETE will cascade to the child table and delete its rows. In the worst case, this can delete all rows in child
tables! Specify --no-check-child-tables to disable this check. To completely avoid affecting child tables, also specify --no-foreign-key-checks so MySQL will not cascade any operations from the parent to child ables.This check is only preformed if--execute and one of--replace,--replicate, or --sync-to-master is specified.--print does not check child tables. The error message only prints the first child table found with an ON DELETE CASCADE, ON UPDATE CASCADE, or ON UPDATE SET NULL foreign key constraint. There could be other affected child tables.
>>默认值为yes,即检查pt-table-sync的变更操作是否会影响到字表。当--replace,replicate或者--sync-to-master被使用时,如果可能的话(表上存在唯一键)会通过replace语句来变更数据。如需要变更的表上存在子表(如字表应用了父表的列作为外键,且定义外键时指定了on delete cascade,或者on update cascade或者on update set null),则pt-table-sync会报错,并跳过该表,因为replace是先delete然后insert,这样delete会删除字表上的数据(这可能不是我们想要的)。可以通过指定--no-check-child-tables来禁用该检查。禁用检查并不能避免对字表的影响,可以通过指定--no-foreign-key-checks来排除变更父表对字表的影响。只有当你指定--execute的同时指定--replace或者--replicate或者--sync-to-master时才会进行该检查。--print不会进行该检查。如果检查发现更新父表会影响到字表,会报错,但是只是在该工具第一次发现该情况的时候会报错,之后再发现不会报错。

--[no]check-master
default: yes
With --sync-to-master, try to verify that the detected master is the real master.
>>默认值为yes,当你使用了--sync-to-master参数时,会检查pt-table-sync探测到的主库是否是真正的主库,可以指定--no-check-master来禁止该检查。

--[no]check-slave
default: yes
Check whether the destination server is a slave. If the destination server is a slave, it’s generally unsafe to make changes on it. However, sometimes you have
to; --replace won’t work unless there’s a unique index, for example, so you can’t make changes on the master in that scenario. By default pt-table-sync will complain if you try to change data on a slave. Specify --no-check-slave to disable this check.Use it at your own risk.
>>默认值为yes,检查destination是否为从库。如果destination是从库,那么直接在destination上进行数据变更是不安全的。但是有时候你只能在从库上直接变更,如表上没有唯一索引,replace不能使用,这时你就只能在从库上对不一致的数据进行变更。如果pt-table-sync发现只能在destination上进行变更,并且destination是从库,那么会告警并退出,你可以通过--on-check-slave,来禁用该检查。

--[no]check-triggers
default: yes
Check that no triggers are defined on the destination table. Triggers were introduced in MySQL v5.0.2, so for older versions this option hasno effect because triggers will
not be checked.
>>检查目标表上没有定义触发器。

--chunk-column
type: string
Chunk the table on this column.
>>根据指定的列来对表进行chunk

--chunk-index
type: string
Chunk the table using this index.
>>根据指定的索引来对表进行chunk

--chunk-size
type: string; default: 1000
Number of rows or data size per chunk. The size of each chunk of rows for the “Chunk” and “Nibble” algorithms. The size can be either a number of rows, or a data size. Data sizes are specified with a suffix of k=kibibytes,M=mebibytes, G=gibibytes. Data sizes are converted to a number of rows by dividing by the average row length.
>>指定chunk的大小,可以指定行数,也可以指定数据大小(根据指定大数据大小除以行的平均长度来决定chunk的行数)

--columns
short form: -c; type: array
Compare this comma-separated list of columns.
>>对比数据时,只比对指定列的数据(但是更新时并不是只更新指定的列)

--config
type: Array
Read this comma-separated list of config files; if specified, this must be thefirst option on the command line.
>>可以把pt-table-sync相关参数放在某个文件中,通过--config指定使用该控制文件(如果使用控制文件--config必须放在首位)

--conflict-column
type: string
Compare this column when rows conflict during a --bidirectional sync. When a same but differing row is found the value of this column from each rowis compared according to --conflict-comparison,--conflict-valueand--conflict-thresholdtodetermine which row has the correct data and becomes the source. The column can beany type for which there is an appropriate --conflict-comparison(thisis almost all types except, for example, blobs). This option only works with--bidirectional. See “BIDIRECTIONAL SYNCING” for more information.
>>bidirectional相关参数赞不做解释

--conflict-comparison
type: string
Choose the --conflict-column with this propertyas the source. The option affects how the--conflict-columnvalues from the conflicting rows are compared. Possible
comparisons are one of these MAGIC_comparisons:
newest|oldest|greatest|least|equals|matches
COMPARISON CHOOSES ROW WITH
===================================================================
newest Newest temporal --conflict-column value
oldest Oldest temporal --conflict-column value
greatest Greatest numerical "--conflict-columnvalue
least Least numerical --conflict-column value
equals --conflict-column value equal to --conflict-value
matches --conflict-column value matching Perl regex pattern
--conflict-value
This option only works with --bidirectional. See “BIDIRECTIONAL SYNCING” for more information.
>>bidirectional相关参数赞不做解释

--conflict-error
type: string; default: warn
How to report unresolvable conflicts and conflict errors This option changes how the user is notified when a conflict cannot be resolvedor causes some kind of error.
Possible values are:
* warn: Print a warning to STDERR about the unresolvable conflict
* die: Die, stop syncing, and print a warning to STDERR
This option only works with --bidirectional. See “BIDIRECTIONAL SYNCING” for more information.
>>bidirectional相关参数赞不做解释

--conflict-threshold
type: string
Amount by which one --conflict-column must exceed the other. The--conflict-thresholdpreventsa conflict from being resolved if the absolute difference between the two--conflict-columnvalues is less thanthis amount. For example, if two--conflict-column have timestamp values “2009-12-01 12:00:00” and “2009-12-0112:05:00” the difference is 5 minutes. If --conflict-thresholdisset to “5m” the conflict will be resolved, but if--conflict-threshold is set to “6m” the conflict will fail to resolve because thedifference is not greater than or equal to 6 minutes. In this latter case,--conflict-errorwill report the failure. This option only works with--bidirectional. See “BIDIRECTIONAL SYNCING” for more information.
>>bidirectional相关参数赞不做解释

--conflict-value
type: string
Use this value for certain --conflict-comparison. This option gives the value for equals and matches--conflict-comparison. This option only works with--bidirectional. See “BIDIRECTIONAL SYNCING” for more information.
>>bidirectional相关参数赞不做解释


--databases
short form: -d; type: hash
Sync only this comma-separated list of databases. A common request is to sync tables from one database with tables from anotherdatabase on the same or different
server. This is not yet possible. --databases will not do it, and you can’t do it with the D part of the DSN either because in the absence of a table name it assumes the whole servershould be synced and the D part controls only the connection’s default database.
>>只对该参数指定的数据库进行sync

--defaults-file
short form: -F; type: string
Only read mysql options from the given file. You must give an absolute pathname.
>>指定mysql配置文件,必须指定绝对路径

--dry-run
Analyze, decide the sync algorithm to use, print and exit. Implies --verboseso you can see the results. The results are in the same output format that you’ll see from actually running the tool, but there will be zeros for rows affected. This is because the tool actually executes,but stops before it compares any data and just returns zeros. The zeros do not mean there are no changes to be made.
>>指定--dry-run时pt-table-sync只是分析使用哪种算法,并输出然后退出,而不会去比较实例间数据,也不会去sync。输出格式同真正执行时的输出格式是一样的,但是不会对任何数据做变更,但这并意味着不存在数据不一致。

--engines
short form: -e; type: hash
Sync only this comma-separated list of storage engines.
>>只对该参数指定的相关存储引擎的表进行sync

--execute
Execute queries to make the tables have identical data. This option makes pt-table-sync actually sync table data by executing all the queries that it created to resolve
table differences. Therefore, the tables will be changed! And unless you also specify --verbose, the changes will be made silently. If this is not what you want, see--print or--dry-run.
>>指定该参数,pt-table-sync会查看不一致的数据并进行sync。除非你指定--verbose,否则不会输出变更信息。

--explain-hosts
Print connection information and exit. Print out a list of hosts to which pt-table-sync will connect, with all the various connection options, and exit.
>>指定该参数,pt-table-sync会打印出将要登录的mysql实例的连接信息,并退出。

--float-precision
type: int
Precision for FLOAT and DOUBLE number-to-string conversion. Causes FLOAT and DOUBLE values to be rounded to the specified number of digits after the decimal point, with the ROUND() function in MySQL. This can help avoid checksum mismatches due to different floating-point representations of the same values on different MySQL versions and hardware. The default is no rounding; the values are converted to strings by the CONCAT() function, and MySQL chooses the string representation. If you specifya value of 2, for example,then the values 1.008 and 1.009 will be rounded to 1.01, and will checksum as equal.
>>指定float和double类型在numer-to-string转换时的精确度。

--[no]foreign-key-checks
default: yes
Enable foreign key checks (SET FOREIGN_KEY_CHECKS=1).Specifying --no-foreign-key-checks will SET FOREIGN_KEY_CHECKS=0.
>>指定该参数则登录mysql后设置SET FOREIGN_KEY_CHECKS=1,指定--no-foreign-key-checks登录mysql后设置SET FOREIGN_KEY_CHECKS=0.

--function
type: string
Which hash function you’d like to use for checksums.The default is CRC32. Other good choices include MD5 and SHA1. If you have installed the FNV_64userdefined function,pt-table-syncwill detect it and prefer to use it, because it is much fasterthan the built-ins.You can also use MURMUR_HASH if you’ve installed that user-defined function. Both of these are distributed with Maatkit. See pt-table-checksum for more information and benchmarks.
>>指定checksum的function,默认function为CRC32。(关于该参数的详细信息可以参见我的另一篇博客http://blog.csdn.net/shaochenshuo/article/details/53098224)

--help
Show help and exit.
>>显示help信息,并退出

--[no]hex-blob
default: yes
HEX() BLOB, TEXT and BINARYcolumns.When row data from the source is fetched to create queries to sync the data(i.e. the queries seen with --print and executed by--execute), binary columns are wrapped in HEX() so the binary data does not produce an invalid SQL statement. You can disable this option but you probably shouldn’t.
>>默认值为yes,即在生成sync语句时,如果涉及到BLOB,TEXT,BINARY类型的列时,会使用HEX()对其进行转换,以免因为这些值导致无效的sql。你可以通过--no-hex-blob来禁用该功能,但是建议你不要禁用。

--host
short form: -h; type: string
Connect to host.
>>指定数据库实例的ip地址

--ignore-columns
type: Hash
Ignore this comma-separated list of column names in comparisons.This option causes columns not to be compared. However, if a row is determined to differ between tables, all columns in that row will be synced, regardless. (It is not currently possible to exclude columns from the sync process itself, only from the comparison.)
>>比较数据是否一致时,忽略该参数指定的列。但是如果发现了某行数据不一致,那么该行数据的所有列都会被sync,无法只sync其中某些列。

--ignore-databases
type: Hash
Ignore this comma-separated list of databases.(system databases such as information_schemaandperformance_schemaare ignored by default)
>>比较数据是否一致时,忽略该参数指定的数据(系统库information_schema和performance_schema默认被忽略)

--ignore-engines
type: Hash; default: FEDERATED,MRG_MyISAM
Ignore this comma-separated list of storage engines.
>>比较数据是否一致时,忽略相关存储引擎的表

--ignore-tables
type: Hash
Ignore this comma-separated list of tables.Table names may be qualified with the database name.
>>比较数据是否一致时,忽略该参数指定的表,可以这样指定表database.table

--ignore-tables-regex
type: string; group: Filter
Ignore tables whose names match the Perl regex.
>>作用同--ignore-tables,区别是支持使用perl正则表达式来匹配表名

--[no]index-hint
default: yes
Add FORCE/USE INDEX hints to the chunk and row queries.By default pt-table-sync adds a FORCE/USE INDEX hint to each SQL statement to coerce MySQL into using
the index chosen by the sync algorithm or specified by --chunk-index. This is usually a good thing, but in rare cases the index may not be the best for the query so you can suppress the index hint by specifying --no-index-hint and let MySQL choose the index. This does not affect the queries printed by --print; it only affects the chunk and row queries that pt-tablesync uses to select and compare rows.
>>pt-table-sync对chunk语句和row queries语句使用force/use index强制mysql使用sync算法选择的索引或者--chunk-index指定索引。这样在在绝大多数情况下都是没有问题的,但是在极少数情况下可能因为sync算法或者--chunk-index指定并不是最合适的索引,而导致效率问题。这时可以通过指定--no-index-hint让mysql自己为相关语句选择合适的索引。

--lock
type: int
Lock tables: 0=none, 1=per sync cycle, 2=per table, or 3=globally.
This uses LOCK TABLES. This can help prevent tables being changed while you’re examining them. The possible values are as follows: >>通过--lock控制sync的锁定粒度
VALUE MEANING
===== =======================================================
0 Never lock tables.   >>0表示不锁定
1 Lock and unlock one time per sync cycle (as implemented by the syncing algorithm).This is the most granular level of locking available. For example, the Chunk
algorithm will lock each chunk of C<N> rows, and then unlock themifthey are the same on the source and the destination, before moving on to the next chunk. >>1表示只锁定正在进行examining的chunk的相关行,如果检查发现source和destination该chunk相同,则解锁。
2 Lock and unlock before and after each table. >>2表示锁定正在进行examining的表,examining完成后解锁
3 Lock and unlock once for every server(DSN)synced, with C<FLUSH TABLES WITH READ LOCK>. >>3表示通过 flush tables with read lock锁定正在进行examinig的实例
A replication slave is never locked if --replicate or--sync-to-master is specified, since in theory locking the table on the master should prevent any changes from taking place.(You are not changing data on your slave, right?) If --waitis given, the master (source) is locked and then the tool waitsfor the slave to catch up to the master before continuing.If --transaction is specified, LOCK TABLES is not used. Instead,lock and unlock are implemented by beginning and committing transactions. The exception is if--lockis 3. If --no-transaction is specified, then LOCK TABLES is used for any value of--lock. See--[no]transaction.
>>如果指定了--replicate或者--sync-to-master来同步主从数据,那么从库上的表是不会被锁定的,因为锁定主库的表即可以保证从库的表不会发生变化(当然前提是你不会人为的去修改从库的数据)。

--lock-and-rename
Lock the source and destination table, sync, then swap names. Thisis useful as a less-blocking ALTER TABLE,once the tables are reasonably in sync with each other (which you may choose toaccomplish via any number of means, including dump and reload or even something like pt-archiver). Itrequires exactly two DSNs and assumes they are on the same server, so it does no waiting for replication orthe like. Tables are locked with LOCK TABLES.


--password
short form: -p; type: string
Password to use when connecting. If password contains commas they must beescaped with a backslash:“exam,ple”
>>登录数据库的密码(如果密码中包含逗号,需要使用单引号)

--pid
type: string
Create the given PID file. The tool won’t start if the PID file already existsand the PID it contains is different than the current PID. However, if the PID file exists and the PID it containsis no longer running, the tool will overwrite the PID file with the current PID. The PID file is removedautomatically when the tool exits.
>>启动pt-table-sync工具时会生成pid文件(用户记录该进程的pid号)可以通过--pid来指定该文件名。如果指定的文件已存在,并且文件中包含的pid和当前pid不一致,则pt-table-sync不会启动。但是如果pid文件存在,但是文件中记录的进程已不存在,那么pt-table-sync会使用当前的pid号覆盖写入该文件,并启动pt-table-sync

--port
short form: -P; type: int
Port number to use for connection.
>>指定连接数据库实例的端口

--print
Print queries that will resolve differences.If you don’t trust pt-table-sync, or just want to see what it will do, this is a good way to be safe. These queries are valid SQL and you can run them yourself if you want to sync the tables manually.
>>如果你不相信pt-table-sync,或者说你只是想看看pt-table-sync将会执行哪些语句,你可以指定--print参数(指定--print参数后只打印不输出)。你可以手动执行这些sql来实现数据的sync

--recursion-method
type: array; default: processlist,hosts
Preferred recursion method used to find slaves.Possible methods are:自动发现从库的方法如下
METHOD USES
=========== ==================
processlist SHOW PROCESSLIST >>通过show processlist查找从库(主库使用默认端口时,该参数默认值为processlist)
hosts SHOW SLAVE HOSTS             >>通过show slave host查找从库(主库使用非默认端口时,该参数默认值为hosts)
none Do not find slaves
The processlist method is preferred because SHOW SLAVE HOSTS is not reliable. However, the hosts method is required if the server uses a non-standard port (not 3306). Usually pt-table-sync does the right thing and finds the slaves, but you may give a preferred method and it will be used first. Ifit doesn’t find any slaves, the other
methods will be tried.
>>如果主库使用默认端口,则该参数默认值为processlist,如果主库使用非默认端口,该参数默认值为hosts,使用hosts需要从库配置report_host和report_port参数(注意report_host和report_port参数可能会引发的问题)。

--replace
Write all INSERT and UPDATE statementsas REPLACE.This is automatically switched on as needed when there are unique index violations.
>>pt-table-sync使用replace insert和replace update代替普通的insert和update。在表中有唯一键时,该功能自动启用

--replicate
type: string
Sync tables listed as different in this table.Specifies that pt-table-sync should examine the specified table to find data that differs. The table is exactly the same as the argument of the same name to pt-table-checksum. That is, it contains records of which tables (and ranges of values) differ between the master and slave.For each table and range of values that shows differences between the master and slave,pt-table-checksum will sync that table,with the appropriate WHERE clause,to its master.
This automatically sets --wait to 60 and causes changes to be made on the master instead of the slave.If --sync-to-master is specified, the tool will assume the server you specified is the slave, and connect to the master as usual to sync.Otherwise, it will try to use SHOW PROCESSLIST to find slaves of the server you specified. If it is unable to find any slaves via SHOW PROCESSLIST,it will inspect SHOW SLAVE HOSTS instead. You must configure each slave’s report-host, report-port and other options for this to work right. After finding slaves, it will inspect the specified table on each slave to find data that needs to be synced, and sync it.The tool examines the master’s copy of the table first, assuming that the master is potentially a slave as well. Any table that shows differences there will NOT be synced on the slave(s). For example, suppose your replicationis set up as A->B, B->C, B->D. Suppose you use this argument and specify server B. The tool will examine server B’s copy of the table. If it looks like server B’s data in table test.tbl1 is different from serverA’s copy, the tool will not sync that table on servers C and D.
>>如果在执行pt-table-sync之前,已经通过pt-table-checksum命令找出了主从数据不一致的表。则可以通过--replicate参数使pt-table-sync只对这些表进行sync。如果你指定了--replicate(但是没有指定--sync-to-master)那么只能指定一个dsn,并且把它当做主库。pt-table-sync会自动发现该主库的所有从库。

--set-vars
type: Array
Set the MySQL variables in this comma-separated list of variable=value pairs.By default, the tool sets:wait_timeout=10000 Variables specified on the command line override these defaults.For example, specifying --set-vars wait_timeout=500 overrides the defaultvalue of 10000. The tool prints a warning and continues if a variable cannot be set.
>>通过该参数指定pt-table-sync会话级别的参数,默认情况下pt-table-sync会设置wait_timeout=10000,innodb_lock_wait_timeout=1。如果所指定的参数无法设置,pt-table-sync会输出warning,然后继续工作。


--socket
short form: -S; type: string
Socket file to use for connection.
>>指定所要连接数据库实例的socket文件

--sync-to-master
Treat the DSN as a slave and sync it to its master.Treat the server you specified as a slave. Inspect SHOW SLAVE STATUS, connect to the server’s master, and treat the master as the source and the slave as the destination. Causes changes to be made on the master. Sets --wait to 60 by default, sets --lock to 1 by default, and disables--[no]transaction by default. See also--replicate, which changes this option’s behavior.
>>如果使用了--sync-to-master,只能为pt-table-sync指定一个host并且作为slave。这时pt-table-sync会自动查找该slave的主库,并检查主从数据,如果存在不一致使用主库数据同步从库。主从数据的同步是通过在主库对不一致表的修改来完成的(表必须存在唯一键),主库的修改会通过binlog同步到从库,从而实现主从数据一致。注意,虽然使用--sync-to-master参数时我们只能指定一个从库,但是如果你的同步数据操作是在主库进行,那么该动作还是会被传递到所有的从库,而不仅仅是你所指定的这个从库。如果表上没有唯一键,那么只能直接在从库上修改该表数据。指定了--sync-to-master时,会默认指定--wait为60,--lock为1,并禁用--transaction

--tables
short form: -t; type: hash
Sync only this comma-separated list of tables.Table names may be qualified with the database name.
>>只对该参数指定的表进行sync,可以如果指定表名database.table

--timeout-ok
Keep going if --wait fails.If you specify--wait and the slave doesn’t catch up to the master’s position before the wait times out, the default behavior is to abort. This option makes the tool keep going anyway.Warning: if you are trying to get a consistent comparison between the two servers, you probably don’t want to keep going after a timeout.
>>如果你指定了--wait,在指定时间内从库没有追上主库,那么默认情况下pt-table-sync会退出。如果你指定--timeout-ok,那么在超过--wait指定时间后从库没有追上主库,pt-table-sync也不会退出。

--[no]transaction
Use transactions instead of LOCK TABLES.The granularity of beginning and committing transactions is controlled by --lock. This is enabled by default,but since --lock is disabled by default,it has no effect.Most options that enable locking also disable transactions by default, so if you want to use transactional locking(via LOCK IN SHARE MODE and FOR UPDATE, you must specify --transaction explicitly.If you don’t specify --transaction explicitly pt-table-sync will decide on aper-table basis whether to use
transactions or table locks. It currently uses transactions on InnoDB tables,and table locks on all others.If --no-transaction is specified, then pt-table-sync will not use transactions at all (not even for InnoDB tables) and locking is controlled by --lock.When enabled, either explicitly or implicitly, the transaction isolation levelis set REPEATABLE READ and transactions are started WITH CONSISTENT SNAP SHOT.
>>

--trim
TRIM() VARCHAR columns in BIT_XOR and ACCUM modes. Helps when comparing MySQL 4.1 to >= 5.0.This is useful when you don’t care about the trailing space differences between MySQL versions which vary in their handling of trailing spaces. MySQL 5.0 and later all retain trailing spaces in VARCHAR, while previous
versions would remove them.
>>指定该参数,在比较时会去掉varchar列的前后空格(对于比较4.1 和5.0及更高版本的库时很有帮助)。在mysql 5.0及更高版本中 varchar尾部的空格会被保留,但是在之前的版本中会被删除。所以如果我们比较的是4.1版本数据库和高于5.0版本数据库,最好指定--trim参数(如果你认为尾部是否有空格不是差异的情况下)


--[no]unique-checks
default: yes
Enable unique key checks (SET UNIQUE_CHECKS=1).Specifying --no-unique-checks will SET UNIQUE_CHECKS=0.
>>默认值为yes指定该参数,会在pt-table-sync连接MySQL实例后执行set unique_checks=1。你也可以指定--no-unique-checks来禁用该检查(set unique_checks=0)

--user
short form: -u; type: string
User for login if not current user.
>>指定所要连接数据库实例的用户名

--verbose
short form: -v; cumulative: yes
Print results of sync operations.See “OUTPUT” for more details about the output.


--version
Show version and exit.--[no]version-check
default: yes
Check for the latest version of Percona Toolkit, MySQL, and other programs.This is a standard “check for updates automatically” feature, with twoadditional features. First, the tool checks the version of other programs on the local system in addition to its ownversion. For example, it checks the version of every MySQL server it connects to, Perl, and the Perl moduleDBD::mysql. Second, it checks for and warns about versions with known problems. For example, MySQL 5.5.25 had acritical bug and was re-released as 5.5.25a.Any updates or known problems are printed to STDOUT before the tool’s normaloutput. This feature should never interfere with the normal operation of the tool.For more information, visithttps://www.percona.com/version-check.


--wait
short form: -w; type: time
How long to wait for slaves to catch up to their master.Make the master wait for the slave to catch up in replication before comparing the tables. The value is the number of seconds to wait before timing out (see also--timeout-ok). Sets--lock to 1 and --[no]transaction to 0 by default. If you see an error such as the following,
MASTER_POS_WAIT returned -1 It means the timeout was exceeded and you need to increase it.The default value of this option is influenced by other options. To see what value is in effect, run with--help.To disable waiting entirely (except for locks), specify --wait 0. This helps when the slave is lagging on tables that are not being synced.
>>指定超时时间。pt-table-sync在比对表之前会先检查主从延时,如果从库有延迟,pt-table-sync会等待一定时间(--wait指定),以便从库追上主库。如果在--wait指定时间内从库还没有追上主库,那么pt-table-sync会退出工作(如果同时指定timeout-ok则不会退出)。如果指定了--wait参数会默认设置--lock为1,--transaction为0。如果你看到如下错误,即表示因为从库延时触发了pt-table-sync超时:"MASTER_POS_WAIT returned -1"。--wait的默认值受其他参数测影响。如果想知道该参数实际值为多少,可以指定--help。指定--wait为0则禁止等待,遇到从库延迟,直接退出。

--where
type: string
WHERE clause to restrict syncing to part ofthe table.
>>只检查匹配相应where条件的记录。where条件需要放在括号中

--[no]zero-chunk
default: yes
Add a chunk for rows with zero or zero-equivalent values. The only has an effect when--chunk-size is specified. The purpose of the zero chunk is to capture a potentially large number of zero values that would imbalance the size of the first chunk. For example, if a lot of negative numbers were inserted into an unsigned
integer column causing them to be stored as zeros, then these zero values are captured by the zero chunk instead of the first chunk and all its non-zero values.
>>参数默认值为yes。添加一个zero or zero-equivalent 的chunk。只有在指定指定--chunk-size参数时,该参数才生效。如果向一个非负的列中插入了大量的负数,那么这些负数会作为0来保存。这些列值为0的行会被放入第一个chunk(以该列了chunk表的话),从而造成第一个chunk过大,这时我们指定--zero-chunk,则把这些行放入zero-chunk,吧,避免造成第一个chunk过大。

2.28.11 DSN OPTIONS
These DSN options are used to create a DSN. Each option is givenlike option=value. The options arecasesensitive, so P and p are not the same option. There cannot be whitespacebefore or after the = andif the value
contains whitespace it must be quoted. DSN options arecomma-separated. See the percona-toolkit manpage for full
details.
• A
dsn: charset; copy: yes
Default character set.
• D
dsn: database; copy: yes
Database containing the table to be synced.
• F
dsn: mysql_read_default_file; copy: yes
Only read default options from the given file
• h
dsn: host; copy: yes
Connect to host.
• p
dsn: password; copy: yes
Password to use when connecting. If password contains commas they must beescaped with a backslash:
“exam,ple”
• P
dsn: port; copy: yes
Port number to use for connection.
• S
dsn: mysql_socket; copy: yes
Socket file to use for connection.
• t
copy: yes
Table to be synced.
• u
dsn: user; copy: yes
User for login if not current user.



相关链接:
pt-table-checksum 使用方法
http://blog.csdn.net/shaochenshuo/article/details/56009092


pt-table-sync 中文使用说明
http://blog.csdn.net/shaochenshuo/article/details/53285439


pt-table-checksum 中文使用说明
http://blog.csdn.net/shaochenshuo/article/details/53098224




7.pt-heartbeat

参考:https://www.cnblogs.com/gomysql/p/3687329.html

pt-heartbeat -D mydb_a --table=heartbeat --monitor -h 192.168.65.11



1,在主上运行:--daemonize表示守护进程,后台运行。
pt-heartbeat --user=root --ask-pass --host=127.0.0.1 --create-table -D yayun --interval=1 --update --replace --daemonize

2.在主上运行监测复制延迟
pt-heartbeat -D yayun --table=heartbeat --monitor -h 192.168.0.20


参数:
注意:需要指定的参数至少有 --stop,--update,--monitor,--check。其中--update,--monitor和--check是互斥的,--daemonize和--check也是互斥。

--ask-pass
隐式输入MySQL密码

--charset
字符集设置

--check
检查从的延迟,检查一次就退出,除非指定了--recurse会递归的检查所有的从服务器。

--check-read-only
如果从服务器开启了只读模式,该工具会跳过任何插入。

--create-table
在主上创建心跳监控的表,如果该表不存在。可以自己建立,建议存储引擎改成memory。通过更新该表知道主从延迟的差距。
CREATE TABLE heartbeat (
 ts                    varchar(26) NOT NULL,
 server_id             int unsigned NOT NULL PRIMARY KEY,
 file                  varchar(255) DEFAULT NULL,   -- SHOW MASTER STATUS
 position              bigint unsigned DEFAULT NULL, -- SHOW MASTER STATUS
 relay_master_log_file varchar(255) DEFAULT NULL,   -- SHOW SLAVE STATUS
 exec_master_log_pos   bigint unsigned DEFAULT NULL -- SHOW SLAVE STATUS
);
heratbeat表一直在更改ts和position,而ts是我们检查复制延迟的关键。

--daemonize
执行时,放入到后台执行

--user
-u,连接数据库的帐号

--database
-D,连接数据库的名称

--host
-h,连接的数据库地址

--password
-p,连接数据库的密码

--port
-P,连接数据库的端口

--socket
-S,连接数据库的套接字文件

--file 【--file=output.txt】
打印--monitor最新的记录到指定的文件,很好的防止满屏幕都是数据的烦恼。

--frames 【--frames=1m,2m,3m】
在--monitor里输出的[]里的记录段,默认是1m,5m,15m。可以指定1个,如:--frames=1s,多个用逗号隔开。可用单位有秒(s)、分钟(m)、小时(h)、天(d)。

--interval
检查、更新的间隔时间。默认是见是1s。最小的单位是0.01s,最大精度为小数点后两位,因此0.015将调整至0.02。

--log
开启daemonized模式的所有日志将会被打印到制定的文件中。

--monitor
持续监控从的延迟情况。通过--interval指定的间隔时间,打印出从的延迟信息,通过--file则可以把这些信息打印到指定的文件。

--master-server-id
指定主的server_id,若没有指定则该工具会连到主上查找其server_id。

--print-master-server-id
在--monitor和--check 模式下,指定该参数则打印出主的server_id。

--recurse
多级复制的检查深度。模式M-S-S...不是最后的一个从都需要开启log_slave_updates,这样才能检查到。

--recursion-method
指定复制检查的方式,默认为processlist,hosts。

--update
更新主上的心跳表。

--replace
使用--replace代替--update模式更新心跳表里的时间字段,这样的好处是不用管表里是否有行。

--stop
停止运行该工具(--daemonize),在/tmp/目录下创建一个“pt-heartbeat-sentinel” 文件。后面想重新开启则需要把该临时文件删除,才能开启(--daemonize)。

--table
指定心跳表名,默认heartbeat。



「喜欢这篇文章,您的关注和赞赏是给作者最好的鼓励」
关注作者
【版权声明】本文为墨天轮用户原创内容,转载时必须标注文章的来源(墨天轮),文章链接,文章作者等基本信息,否则作者和墨天轮有权追究责任。如果您发现墨天轮中有涉嫌抄袭或者侵权的内容,欢迎发送邮件至:contact@modb.pro进行举报,并提供相关证据,一经查实,墨天轮将立刻删除相关内容。

评论