PostgreSQL流复制参数max_wal_senders的用法说明
环境:
PostgreSQL 9.2.4
主机:192.25.10.76
从机:192.25.10.71
做postgresql的流复制主从时,会遇到调整max_wal_sengers这个参数,官方文档对这个参数做了一个简要的说明(9.2.4比早先版本多了几句话并做了一些微调),但没有实际的例子。
1.参数说明:
Specifies the maximum number of concurrent connections from standby servers or streaming base backup clients (i.e., the maximum number of simultaneously running WAL sender processes). The default is zero, meaning replication is disabled. WAL sender processes count towards the total number of connections, so the parameter cannot be set higher than max_connections. This parameter can only be set at server start. wal_level must be set to archive or hot_standby to allow connections from standby servers.
也就是说,这个参数是在主机上设置的,是从机连接到主机的并发连接数之总和,所以这个参数是个正整型。默认值是0,也即默认没有流复制功能。该并发连接数从进程上看,就是各个wal sender进程数之和,可以通过ps -ef|grep senders来查看,所以该值不能超过系统的最大连接数(max_connections,该BUG在9.1.5被修复),可以允许超过实际的流复制用户数。该参数更改需要重启DB,比如我只配了一个从机:
[postgres@ndb2 database]$ ps -ef|grep sender postgres 21257 21247 0 20:57 ? 00:00:00 postgres: wal sender process repuser 192.25.10.71(46161) streaming 0/4018ED8 postgres 22193 20949 0 23:02 pts/0 00:00:00 grep sender
2.异常情况
很多时候配置主从的时候会遗漏这个参数,或者没有设置正确(比如实际配的从机数超过设置的连接数),这个时候一般会报错 number of requested standby connections exceeds max_wal_senders (currently X): 备机上显示的日志异常:
2013-08-12 20:53:42.132 CST,,,8859,,5208dad6.229b,1,,2013-08-12 20:53:42 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0) ",,,,,,,,,"" 2013-08-12 20:53:47.137 CST,,,8861,,5208dadb.229d,1,,2013-08-12 20:53:47 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0) ",,,,,,,,,"" 2013-08-12 20:53:52.142 CST,,,8862,,5208dae0.229e,1,,2013-08-12 20:53:52 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0) ",,,,,,,,,"" 2013-08-12 20:53:57.148 CST,,,8864,,5208dae5.22a0,1,,2013-08-12 20:53:57 CST,,0,FATAL,XX000,"could not connect to the primary server: FATAL: number of requested standby connections exceeds max_wal_senders (currently 0) ",,,,,,,,,"
主机上显示的日志异常:
receiver" 2013-08-12 20:43:26.937 CST,,,21064,"",5208d86e.5248,1,"",2013-08-12 20:43:26 CST,,0,LOG,00000,"connection received: host=192.25.10.71 port=46085",,,,,,,,,"" 2013-08-12 20:43:26.938 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,2,"authentication",2013-08-12 20:43:26 CST,2/1195,0,LOG,00000,"replication connection authorized: user=repuser",,,,,,,,,"" 2013-08-12 20:43:26.938 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,3,"startup",2013-08-12 20:43:26 CST,2/0,0,FATAL,53300,"number of requested standby connections exceeds max_wal_senders (currently 0)",,,,,,,,,"walreceiver" 2013-08-12 20:43:26.939 CST,"repuser","",21064,"192.25.10.71:46085",5208d86e.5248,4,"startup",2013-08-12 20:43:26 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.002 user=repuser database= host=192.25.10.71 port=46085",,,,,,,,,"wa lreceiver" 2013-08-12 20:43:41.513 CST,,,21066,"",5208d87d.524a,1,"",2013-08-12 20:43:41 CST,,0,LOG,00000,"connection received: host=192.25.10.71 port=46086",,,,,,,,,"" 2013-08-12 20:43:41.514 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,2,"authentication",2013-08-12 20:43:41 CST,2/1198,0,LOG,00000,"replication connection authorized: user=repuser",,,,,,,,,"" 2013-08-12 20:43:41.514 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,3,"startup",2013-08-12 20:43:41 CST,2/0,0,FATAL,53300,"number of requested standby connections exceeds max_wal_senders (currently 0)",,,,,,,,,"walreceiver" 2013-08-12 20:43:41.515 CST,"repuser","",21066,"192.25.10.71:46086",5208d87d.524a,4,"startup",2013-08-12 20:43:41 CST,,0,LOG,00000,"disconnection: session time: 0:00:00.002 user=repuser database= host=192.25.10.71 port=46086",,,,,,,,,"wa lreceiver" ...
遇到如此问题,则需要检查postgresql.conf文件的max _wal_senders参数了。
3.参数生效
文档上说明需要重启DB使之生效,如果动态使之生效会报错如下:
[postgres@proxy1 ]$ psql psql (9.2.4) Type "help" for help. postgres=# show max_wal_senders ; max_wal_senders ----------------- 0 (1 row) postgres=# set max_wal_senders=1; ERROR: parameter "max_wal_senders" cannot be changed without restarting the server postgres=#
补充: postgresql基于流复制 (streaming replication)的warm-standby
实例一枚:
Primary:
l 归档设置:
Wal_level=archive Archive_mode=on archive_command = 'cp -i %p /data/pgsql/archived_wal/%f'
l 流复制相关设置:
max_wal_senders = '10' #启动复制进程数量限制,必须大于0 max_replication_slots = '10' #为使用replication slot,必须大于0;replication slot作用是保证wal没有同步到standby之前不能从pg_xlog移走; wal_keep_segments = '50' #指定pg_xlog中最少保留的wal数量 select pg_create_physical_replication_slot(‘gp1_a_slot'); #创建replication slot select * from pg_replication_slots; #查询创建的replication slot
l 编辑pg_hba.conf
# Allow replication connections from localhost, by a user with the replication privilege. #host replication postgres 127.0.0.1/32 trust #host replication postgres ::1/128 trust local replication postgres trust host replication postgres 192.168.12.0/24 trust
l 联机备份过程(基础备份)
#touch /var/lib/pgsql/backup_in_progress $psql –c "select pg_start_backup('hot_backup');" $tar -cf /var/lib/pgsql/backup.tar /var/lib/pgsql/data/ $psql -c "select pg_stop_backup();" #rm /var/lib/pgsql/backup_in_progress tar -rf /var/lib/pgsql/backup.tar /var/lib/pgsql/archive/ #打包归档
实例:
psql -c "select pg_start_backup('pgbk10');" tar -zcf pgbk10.tgz data/ psql -c "select pg_stop_backup();"
Standby:
l 编辑recovery.conf
standby_mode = 'on' primary_conninfo = 'host=192.168.12.38 port=5666 user=postgres' primary_slot_name='gp1_a_slot' #restore_command = 'cp /data/pgsql/archived_wal/%f %p' #archive_cleanup_command = 'pg_archivecleanup /data/pgsql/archived_wal %r'
l 将primary上的基础备份传输到standby上
$scp primary: /var/lib/pgsql/backup.tar .
解压备份到standby上的$PGDATA
l 启动standby
$pg_ctl start –D $PGDATA
启动standby后,postgres开始从primary上接收wal日志进行恢复,并且一直保持恢复状态,psql不能登录;
以上为个人经验,希望能给大家一个参考,也希望大家多多支持我们。如有错误或未考虑完全的地方,望不吝赐教。