Introduction to Sentinel Clusters
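Redis Sentinel is commonly used to manage multiple Redis servers. It performs three main tasks:
Monitoring: Sentinel continuously checks whether your master and slaves are working properly.
Notification: when a monitored Redis instance has a problem, Sentinel can notify an administrator or other applications through an API.
Automatic failover: when a master stops working properly, Sentinel starts an automatic failover. It promotes one of the failed master's slaves to be the new master and reconfigures the remaining slaves to replicate from it. When a client tries to connect to the failed master, Sentinel also gives the client the address of the new master, so that the new master takes over from the failed one.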
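Sentinel is a distributed system: you can run multiple Sentinel processes in one deployment. These processes use gossip protocols to exchange information about whether a master is down, and agreement protocols to decide whether to perform an automatic failover and which slave to promote to master.
Each sentinel periodically sends messages to the other sentinels, the master, and the slaves to confirm that they are alive. If a peer does not respond within a configurable time, the sentinel provisionally considers it down (Subjectively Down, or sdown for short).
Only when enough sentinels in the group (at least the configured quorum) report that a master is not responding does the system consider the master truly down (Objectively Down, or odown for short). A vote among the sentinels then selects one of the remaining slaves, promotes it to master, and updates the relevant configuration automatically.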
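The two thresholds involved here can be sketched with a little shell arithmetic. This is only an illustration, not Redis code: the helper names majority, is_odown and can_failover are made up for this example. It assumes the usual Sentinel behaviour that a master is flagged odown once the number of sentinels reporting it down reaches the quorum, while the failover itself has to be authorized by a majority of all sentinels.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; they are not part of Redis.

# majority TOTAL: smallest number of sentinels that is more than half of TOTAL.
majority() {
    echo $(( $1 / 2 + 1 ))
}

# is_odown REPORTS QUORUM: succeed when at least QUORUM sentinels report sdown.
is_odown() {
    [ "$1" -ge "$2" ]
}

# can_failover REACHABLE TOTAL: succeed when a majority of all sentinels can vote.
can_failover() {
    [ "$1" -ge "$(majority "$2")" ]
}

# Example: 3 sentinels, quorum 2, and 2 sentinels see the master as down.
if is_odown 2 2 && can_failover 2 3; then
    echo "failover can start"
fi
```

With three sentinels and a quorum of 2, losing one sentinel still leaves both conditions satisfiable, which is one reason three is a common minimum.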
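Although Sentinel is released as a separate executable, redis-sentinel, it is really just a Redis server running in a special mode. You can also start Sentinel by passing the --sentinel option when starting a regular Redis server.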
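Server planning
The steps below use three servers: 192.168.2.177 (Redis master) and 192.168.2.178 and 192.168.2.180 (Redis slaves). Each of the three also runs a sentinel on port 26379.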
Installing Redis
Download the official package: https://redis.io/download
1. Install the build dependencies
Redis is written in C, so gcc must be installed on the system.
[root@rocketmq-nameserver1 ~]# yum install -y gcc automake autoconf libtool make
2. Extract the downloaded Redis package, change into its directory, and compile. Do this on each of the three servers.
[root@rocketmq-nameserver1 redis-5.0.4]# make install
This takes about 2 minutes, so be patient.
Note: if this error occurs during compilation, building with make MALLOC=libc install instead resolves it.
3. Copy the Redis binaries to the /usr/bin directory
[root@rocketmq-nameserver1 redis-5.0.4]# cd src/
[root@rocketmq-nameserver1 src]# cp redis-cli redis-sentinel redis-server /usr/bin
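Before running the build on each server, it can help to confirm that the tools installed in step 1 are actually on the PATH. The following is a small sketch; missing_tools is a hypothetical helper name, and the list of tools to check is only an example.

```shell
#!/bin/sh
# missing_tools TOOL...: print the name of every tool that is not found on PATH.
# Hypothetical helper for illustration.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Example: check the compiler and make before building Redis.
missing=$(missing_tools gcc make)
if [ -n "$missing" ]; then
    echo "Install these first:" $missing
fi
```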
Redis configuration
List the active settings in the Redis configuration file redis.conf by filtering out comments and blank lines
[root@rocketmq-nameserver1 redis-5.0.4]# grep -Ev "^#|^$" redis.conf
bind 192.168.2.177
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize yes
supervised no
pidfile /wdata/redis/data/redis.pid
loglevel notice
logfile "/wdata/redis/logs/redis.log"
databases 16
always-show-logo yes
save 900 1
save 300 10
save 60 10000
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
dir /wdata/redis
replicaof 192.168.2.177 6379
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync no
repl-diskless-sync-delay 5
repl-disable-tcp-nodelay no
replica-priority 100
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
appendonly no
appendfilename "appendonly.aof"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
lua-time-limit 5000
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
list-max-ziplist-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
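For the Redis configuration file we only need to change the options marked in red above; the other options can be adjusted as needed. Two of them mean the following:
daemonize yes: run Redis in the background when it starts
replicaof 192.168.2.177 6379: the address and port of the cluster's master server. Note that this line must be commented out on the master itself, 192.168.2.177
Copy the modified configuration file to the other two servers (remember to set bind to each server's own IP address afterwards)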
for i in 178 180; do scp redis.conf root@192.168.2.$i:/wdata/redis/config; done
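Since bind and replicaof differ between the master and the slaves, one option is to generate a per-host file from a single template instead of editing each copy by hand. This is a minimal sketch under assumptions: render_redis_conf is a hypothetical helper, the template is expected to contain uncommented bind and replicaof lines as in the listing above, and the master port is fixed at 6379.

```shell
#!/bin/sh
# render_redis_conf TEMPLATE HOST_IP MASTER_IP
# Prints TEMPLATE with bind set to HOST_IP. On the master the replicaof line
# is commented out; on the other hosts it points at MASTER_IP port 6379.
# Hypothetical helper for illustration.
render_redis_conf() {
    template=$1 host_ip=$2 master_ip=$3
    if [ "$host_ip" = "$master_ip" ]; then
        sed -e "s/^bind .*/bind $host_ip/" \
            -e "s/^replicaof /# replicaof /" "$template"
    else
        sed -e "s/^bind .*/bind $host_ip/" \
            -e "s/^replicaof .*/replicaof $master_ip 6379/" "$template"
    fi
}

# Example (not executed here):
#   render_redis_conf redis.conf 192.168.2.178 192.168.2.177 > redis-178.conf
```

The rendered file can then be copied to the matching server with scp as above.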
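Sentinel configuration
List the active settings in the sentinel configuration file sentinel.conf by filtering out comments and blank lines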
[root@rocketmq-nameserver1 redis-5.0.4]# grep -Ev "^#|^$" sentinel.conf
bind 192.168.2.177
port 26379
daemonize yes
pidfile /wdata/redis/data/redis-sentinel.pid
logfile "/wdata/redis/logs/sentinel.log"
dir /wdata/redis
sentinel monitor mymaster 192.168.2.177 6379 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
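In general we only need to change the options marked in red above; the other options can be adjusted as needed. One of them means the following:
sentinel monitor mymaster 192.168.2.177 6379 2: monitor the master at 192.168.2.177, port 6379, under the name mymaster. The final 2 is the quorum: a failover is started once at least 2 sentinels agree that the master is unreachable.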
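To double-check which quorum a given sentinel.conf actually uses, the sentinel monitor line can be parsed directly. A small sketch, where monitor_quorum is a hypothetical helper name and the file is assumed to use the field order shown above (name, IP, port, quorum):

```shell
#!/bin/sh
# monitor_quorum CONF NAME: print the quorum configured for master NAME.
# Hypothetical helper; expects lines of the form
#   sentinel monitor <name> <ip> <port> <quorum>
monitor_quorum() {
    awk -v name="$2" \
        '$1 == "sentinel" && $2 == "monitor" && $3 == name { print $6 }' "$1"
}

# Example (not executed here):
#   monitor_quorum /wdata/redis/config/sentinel.conf mymaster
```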
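Distribute the modified sentinel configuration file to the other two servers (remember to set bind to each server's own IP address afterwards).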
for i in 178 180; do scp sentinel.conf root@192.168.2.$i:/wdata/redis/config; done
Starting Redis
Run the following on each of the three servers
[root@rocketmq-nameserver1 ~]# service redis start
Check whether Redis started successfully
[root@rocketmq-nameserver1 ~]# ps -ef | grep redis
or
[root@rocketmq-nameserver1 ~]# service redis status
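A process showing up in ps does not yet prove that Redis is accepting connections. As a complementary check, one could poll the server with PING until it answers PONG. The sketch below makes a few assumptions: wait_for_redis is a hypothetical helper, the REDIS_CLI variable is introduced here only so the client binary can be swapped out, and no password is configured.

```shell
#!/bin/sh
# Client binary to use; overridable, defaults to redis-cli from PATH.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# wait_for_redis HOST PORT [TRIES]: send PING once per second until the
# server answers PONG; fail after TRIES attempts (default 10).
# Hypothetical helper for illustration.
wait_for_redis() {
    host=$1 port=$2 tries=${3:-10}
    while [ "$tries" -gt 0 ]; do
        if [ "$("$REDIS_CLI" -h "$host" -p "$port" ping 2>/dev/null)" = "PONG" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (not executed here):
#   wait_for_redis 192.168.2.177 6379 && echo "Redis is answering"
```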
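Starting Sentinel
Run the following on each of the three servers
[root@rocketmq-nameserver1 ~]# service sentinel start
Check whether the sentinel started successfully
[root@rocketmq-nameserver1 ~]# ps -ef | grep sentinel
or
[root@rocketmq-nameserver1 ~]# service sentinel status
Checking the sentinel cluster state
Both the Redis service and the Sentinel service have now started successfully. Next we need to verify that the cluster is healthy.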
[root@rocketmq-nameserver1 ~]# redis-cli -h 192.168.2.177 -p 26379
192.168.2.177:26379> SENTINEL sentinels mymaster
1) 1) "name"
2) "648ced6a4a5126ffe053c7190a7787ce8507122d"
3) "ip"
4) "192.168.2.178"
5) "port"
6) "26379"
7) "runid"
8) "648ced6a4a5126ffe053c7190a7787ce8507122d"
9) "flags"
10) "sentinel"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "203"
19) "last-ping-reply"
20) "203"
21) "down-after-milliseconds"
22) "30000"
23) "last-hello-message"
24) "425"
25) "voted-leader"
26) "?"
27) "voted-leader-epoch"
28) "0"
2) 1) "name"
2) "25133c581dc5a5dcc41ed40d720bf417b70d6449"
3) "ip"
4) "192.168.2.180"
5) "port"
6) "26379"
7) "runid"
8) "25133c581dc5a5dcc41ed40d720bf417b70d6449"
9) "flags"
10) "sentinel"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "203"
19) "last-ping-reply"
20) "203"
21) "down-after-milliseconds"
22) "30000"
23) "last-hello-message"
24) "870"
25) "voted-leader"
26) "?"
27) "voted-leader-epoch"
28) "0"
The output shows that the other two sentinels are running normally.
192.168.2.177:26379> SENTINEL masters
1) 1) "name"
2) "mymaster"
3) "ip"
4) "192.168.2.177"
5) "port"
6) "6379"
7) "runid"
8) "f2c92c055ec129186fec93d0b394a99ad120fd1d"
9) "flags"
10) "master"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "800"
19) "last-ping-reply"
20) "800"
21) "down-after-milliseconds"
22) "30000"
23) "info-refresh"
24) "10034"
25) "role-reported"
26) "master"
27) "role-reported-time"
28) "120535"
29) "config-epoch"
30) "0"
31) "num-slaves"
32) "2"
33) "num-other-sentinels"
34) "2"
35) "quorum"
36) "2"
37) "failover-timeout"
38) "180000"
39) "parallel-syncs"
40) "1"
The output above shows that the master Redis is healthy.
192.168.2.177:26379> SENTINEL slaves mymaster
1) 1) "name"
2) "192.168.2.180:6379"
3) "ip"
4) "192.168.2.180"
5) "port"
6) "6379"
7) "runid"
8) "2675617f208ace5c13161e133819be75f7079946"
9) "flags"
10) "slave"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "448"
19) "last-ping-reply"
20) "448"
21) "down-after-milliseconds"
22) "30000"
23) "info-refresh"
24) "4606"
25) "role-reported"
26) "slave"
27) "role-reported-time"
28) "235584"
29) "master-link-down-time"
30) "0"
31) "master-link-status"
32) "ok"
33) "master-host"
34) "192.168.2.177"
35) "master-port"
36) "6379"
37) "slave-priority"
38) "100"
39) "slave-repl-offset"
40) "47120"
2) 1) "name"
2) "192.168.2.178:6379"
3) "ip"
4) "192.168.2.178"
5) "port"
6) "6379"
7) "runid"
8) "803d8178c44a4380243523248dca0838c7964299"
9) "flags"
10) "slave"
11) "link-pending-commands"
12) "0"
13) "link-refcount"
14) "1"
15) "last-ping-sent"
16) "0"
17) "last-ok-ping-reply"
18) "448"
19) "last-ping-reply"
20) "448"
21) "down-after-milliseconds"
22) "30000"
23) "info-refresh"
24) "4606"
25) "role-reported"
26) "slave"
27) "role-reported-time"
28) "235627"
29) "master-link-down-time"
30) "0"
31) "master-link-status"
32) "ok"
33) "master-host"
34) "192.168.2.177"
35) "master-port"
36) "6379"
37) "slave-priority"
38) "100"
39) "slave-repl-offset"
40) "47120"
The output above shows that the Redis slaves are healthy.
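Reading these long listings by eye gets tedious. When redis-cli is run non-interactively and its output is piped, it prints each element on its own line without the numbering, so the replies above become alternating key and value lines. Under that assumption, a single field can be pulled out with awk; field is a hypothetical helper name.

```shell
#!/bin/sh
# field KEY: read alternating key/value lines on stdin and print the value
# that follows each occurrence of KEY. Hypothetical helper for illustration.
field() {
    awk -v key="$1" 'NR % 2 == 1 { k = $0; next } k == key { print }'
}

# Example (not executed here): link status of every slave of mymaster.
#   redis-cli -h 192.168.2.177 -p 26379 SENTINEL slaves mymaster \
#       | field master-link-status
```

For the two slaves shown above, this would be expected to print one status line per slave.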
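Common Sentinel commands
SENTINEL masters # list all monitored masters and their current state
SENTINEL master <master name> # show the state of the specified master
SENTINEL slaves <master name> # list all slaves of the given master and their state
SENTINEL sentinels <master name> # list all sentinels monitoring the specified master
SENTINEL get-master-addr-by-name <master name> # return the IP address and port of the master with the given name
SENTINEL reset <pattern> # reset the state of all masters whose name matches pattern
SENTINEL failover <master name> # force a failover as if the master were unreachable, without asking the other sentinels for agreement; the new configuration is still published so that the other sentinels update theirs
SENTINEL ckquorum <master name> # check whether the current sentinel deployment can reach the quorum needed to fail over the master; useful for verifying the deployment, returns OK when healthy
SENTINEL flushconfig # force the sentinel to write its runtime configuration, including its current state, to disk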
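As an example of using one of these commands from a script, the current master address can be fetched and joined into host:port form. This is a sketch under assumptions: master_addr is a hypothetical helper, REDIS_CLI is a variable introduced here so the client can be swapped out, and redis-cli is assumed to print the IP and the port on two separate lines when its output is piped.

```shell
#!/bin/sh
# Client binary to use; overridable, defaults to redis-cli from PATH.
REDIS_CLI=${REDIS_CLI:-redis-cli}

# master_addr SENTINEL_HOST SENTINEL_PORT MASTER_NAME
# Ask a sentinel for the current master and print it as "ip:port".
# Hypothetical helper for illustration.
master_addr() {
    "$REDIS_CLI" -h "$1" -p "$2" SENTINEL get-master-addr-by-name "$3" \
        | paste -s -d ':' -
}

# Example (not executed here):
#   master_addr 192.168.2.177 26379 mymaster
```

After a failover, the same call should report the address of the promoted slave.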
Redis init script
#!/bin/sh
#chkconfig: 2345 55 25
#
# Simple Redis init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.
### BEGIN INIT INFO
# Provides: redis_6379
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Redis data structure server
# Description: Redis data structure server. See https://redis.io
### END INIT INFO
. /etc/init.d/functions
REDISPORT=6379
EXEC=/usr/bin/redis-server
CLIEXEC=/usr/bin/redis-cli
PIDFILE=/wdata/redis/data/redis.pid
CONF="/wdata/redis/config/redis.conf"
AUTH=""
# Set this to the server's own address (the bind value in redis.conf)
BIND_IP='192.168.2.177'
start(){
    if [ -f "$PIDFILE" ]
    then
        echo "$PIDFILE exists, process is already running or crashed"
        return 1
    fi
    echo "Starting Redis server..."
    # Test the exit status of redis-server itself
    if $EXEC "$CONF"
    then
        echo "Redis is running..."
    else
        echo "Redis not running !"
        return 1
    fi
}
stop(){
    if [ ! -f "$PIDFILE" ]
    then
        echo "$PIDFILE does not exist, process is not running"
    else
        PID=$(cat "$PIDFILE")
        echo "Stopping ..."
        # With a password set, use this line instead:
        #$CLIEXEC -h $BIND_IP -a $AUTH -p $REDISPORT shutdown
        $CLIEXEC -h $BIND_IP -p $REDISPORT shutdown
        # /proc/<pid> disappears once the process has exited
        while [ -x "/proc/${PID}" ]
        do
            echo "Waiting for Redis to shutdown ..."
            sleep 1
        done
        echo "Redis stopped."
    fi
}
status(){
    if ps -ef | grep redis-server | grep -v grep >/dev/null 2>&1
    then
        echo "redis server is running."
    else
        echo "redis server is stopped."
    fi
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        ;;
esac
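The script treats the mere existence of the PID file as "already running or crashed". If Redis was killed without cleaning up, that file is stale and start refuses to run until it is removed by hand. One possible refinement, sketched here with a hypothetical pid_alive helper, is to check whether the recorded PID still belongs to a live process. It assumes the caller is allowed to signal that process, which holds for an init script running as root.

```shell
#!/bin/sh
# pid_alive PIDFILE: succeed only if PIDFILE exists and the PID stored in it
# refers to a process that can currently be signalled.
# Hypothetical helper for illustration.
pid_alive() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    # kill -0 sends no signal; it only checks that the process exists
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Example (not executed here): remove a stale PID file before starting.
#   pid_alive /wdata/redis/data/redis.pid || rm -f /wdata/redis/data/redis.pid
```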
Sentinel init script
#!/bin/sh
#chkconfig: 2345 55 25
#
# Simple Sentinel init.d script conceived to work on Linux systems
# as it does use of the /proc filesystem.
### BEGIN INIT INFO
# Provides: redis_sentinel
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Redis Sentinel
# Description: Redis Sentinel monitoring service. See https://redis.io
### END INIT INFO
. /etc/init.d/functions
REDISPORT=26379
EXEC=/usr/bin/redis-sentinel
CLIEXEC=/usr/bin/redis-cli
PIDFILE=/wdata/redis/data/redis-sentinel.pid
CONF="/wdata/redis/config/sentinel.conf"
AUTH=""
# Set this to the server's own address (the bind value in sentinel.conf)
BIND_IP='192.168.2.177'
start(){
    if [ -f "$PIDFILE" ]
    then
        echo "$PIDFILE exists, process is already running or crashed"
        return 1
    fi
    echo "Starting Sentinel server..."
    # Test the exit status of redis-sentinel itself
    if $EXEC "$CONF"
    then
        echo "Sentinel is running..."
    else
        echo "Sentinel not running !"
        return 1
    fi
}
stop(){
    if [ ! -f "$PIDFILE" ]
    then
        echo "$PIDFILE does not exist, process is not running"
    else
        PID=$(cat "$PIDFILE")
        echo "Stopping ..."
        # With a password set, use this line instead:
        #$CLIEXEC -h $BIND_IP -a $AUTH -p $REDISPORT shutdown
        $CLIEXEC -h $BIND_IP -p $REDISPORT shutdown
        # /proc/<pid> disappears once the process has exited
        while [ -x "/proc/${PID}" ]
        do
            echo "Waiting for Sentinel to shutdown ..."
            sleep 1
        done
        echo "Sentinel stopped."
    fi
}
status(){
    if ps -ef | grep redis-sentinel | grep -v grep >/dev/null 2>&1
    then
        echo "Sentinel server is running."
    else
        echo "Sentinel server is stopped."
    fi
}
case "$1" in
    start)
        start
        ;;
    stop)
        stop
        ;;
    restart)
        stop
        start
        ;;
    status)
        status
        ;;
    *)
        echo "Usage: $0 {start|stop|restart|status}"
        ;;
esac
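In both scripts the stop function waits indefinitely for the process to exit. If the shutdown command never reaches the server, for example because BIND_IP is wrong, that loop never ends. A bounded wait is one possible alternative; the sketch below uses a hypothetical wait_for_exit helper and kill -0 instead of /proc, and again assumes the caller may signal the process.

```shell
#!/bin/sh
# wait_for_exit PID TIMEOUT: wait up to TIMEOUT seconds for PID to disappear.
# Succeeds once the process is gone, fails if it is still there at the end.
# Hypothetical helper for illustration.
wait_for_exit() {
    pid=$1 remaining=$2
    while kill -0 "$pid" 2>/dev/null; do
        [ "$remaining" -gt 0 ] || return 1
        remaining=$((remaining - 1))
        sleep 1
    done
    return 0
}

# Example (not executed here), inside stop() after the shutdown command:
#   wait_for_exit "$PID" 30 || echo "Process $PID did not stop in time"
```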