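Health checks are a key mechanism for keeping an Nginx deployment highly available. There are two main types: passive and active.
Nginx's built-in passive health checks are configured with the following parameters: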
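upstream backend {
    server backend1.example.com max_fails=3 fail_timeout=30s;
    server backend2.example.com max_fails=3 fail_timeout=30s;
}
server {
    location / {
        proxy_pass http://backend;
        # Passive health check (retry) parameters. These are proxy directives:
        # they are valid in http, server, or location, not inside upstream.
        proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;
        proxy_next_upstream_tries 3;
        proxy_next_upstream_timeout 10s;
    }
}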
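Parameter reference:
- max_fails: the number of failed attempts within fail_timeout after which the server is marked unavailable
- fail_timeout: both the time window in which failures are counted and how long the server then stays marked unavailable
- proxy_next_upstream: the conditions under which a request is passed to the next upstream server
- proxy_next_upstream_tries: the maximum number of tries for a request
- proxy_next_upstream_timeout: the time limit for the whole retry sequence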
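To make the interaction of max_fails and fail_timeout more concrete, here is a minimal sketch of an alternative tuning. The upstream name backend_tuned and the host backup1.example.com are hypothetical and only serve as illustration. The values trade a higher failure tolerance for a shorter ejection period.

```nginx
upstream backend_tuned {
    # Eject a server only after 5 failures within 10s, and retry it after 10s
    server backend1.example.com max_fails=5 fail_timeout=10s;
    server backend2.example.com max_fails=5 fail_timeout=10s;

    # Hypothetical standby host: receives traffic only when the
    # primary servers above are unavailable
    server backup1.example.com backup;
}
```

A short fail_timeout brings a recovered server back quickly but also sends live requests to a still-broken server more often, so the right values depend on how expensive a failed request is for the application.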
NGINX Plus supports active health checks natively; the open-source version needs a third-party module. NGINX Plus configuration:
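match backend_ok {  # defined in the http context
    status 200;
}
upstream backend {
    zone backend 64k;
    server backend1.example.com;
    server backend2.example.com;
}
server {
    location / {
        proxy_pass http://backend;
        # Active health check (NGINX Plus only); health_check is a location-level directive
        health_check interval=5s fails=3 passes=2 uri=/health_check match=backend_ok;
        # Probes honor the location's proxy timeouts (these also apply to client requests)
        proxy_connect_timeout 3s;
        proxy_read_timeout 3s;
    }
}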
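When a status code alone is too coarse a signal, the NGINX Plus match block can also test response headers and the body. The following is a sketch only: it assumes that the /health_check endpoint returns JSON containing "status": "ok", which is not something the original configuration states, and the name backend_strict is hypothetical.

```nginx
# Defined in the http context (NGINX Plus only)
match backend_strict {
    # Accept any 2xx or 3xx response
    status 200-399;
    # Assumed response format: a JSON body with "status": "ok"
    header Content-Type ~ "application/json";
    body ~ '"status"\s*:\s*"ok"';
}
```

A health_check directive would reference it with match=backend_strict. A server passes only when all conditions in the block hold.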
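For open-source Nginx, compile and install it with the third-party nginx_upstream_check_module, then configure: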
upstream backend {
server backend1.example.com;
server backend2.example.com;
check interval=3000 rise=2 fall=5 timeout=1000 type=http;
check_http_send "HEAD /health HTTP/1.0\r\n\r\n";
check_http_expect_alive http_2xx http_3xx;
}
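To see what the third-party check module currently thinks of each backend, it provides a check_status directive that renders a status page. The sketch below assumes the module is compiled in; the port 8080 and the path /upstream_status are arbitrary choices for illustration.

```nginx
server {
    listen 8080;

    location /upstream_status {
        # Status page provided by nginx_upstream_check_module
        check_status;
        access_log off;
        # Restrict access to local clients only
        allow 127.0.0.1;
        deny all;
    }
}
```

Requesting this page after taking a backend down is a quick way to confirm that the rise and fall thresholds behave as intended.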
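- Once fail_timeout has elapsed, a server that was marked unavailable automatically rejoins the load-balancing pool
- fail_timeout therefore controls the recovery time
Keepalived configuration:
vrrp_script chk_nginx {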
script "pidof nginx"
interval 2
weight 2
}
vrrp_instance VI_1 {
state MASTER
interface eth0
virtual_router_id 51
priority 101
advert_int 1
authentication {
auth_type PASS
auth_pass 1111
}
virtual_ipaddress {
192.168.1.100
}
track_script {
chk_nginx
}
}
Consul Template configuration for regenerating the upstream list and reloading Nginx:
template {
source = "/etc/nginx/conf.d/upstream.conf.ctmpl"
destination = "/etc/nginx/conf.d/upstream.conf"
command = "nginx -s reload"
}
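The template file referenced above is not shown in the original, so the following is only a sketch of what upstream.conf.ctmpl might contain. It assumes the backends are registered in Consul under the service name "backend", which is a hypothetical name; by default the service function returns only instances that pass their Consul health checks.

```nginx
upstream backend {
{{- range service "backend" }}
    # One line per healthy instance reported by Consul
    server {{ .Address }}:{{ .Port }} max_fails=3 fail_timeout=30s;
{{- else }}
    # Placeholder so the block is never empty when no instance is healthy
    server 127.0.0.1:65535;
{{- end }}
}
```

The else branch is a design choice: an upstream block with no server lines makes the whole Nginx configuration invalid, so the reload command would fail exactly when every backend is down.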
Nginx status monitoring:
server {
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
}
Prometheus monitoring:
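Health checks cause performance problems:
Servers wrongly marked unavailable:
- Adjust max_fails and fail_timeout
Uneven traffic after a server recovers:
- Use the slow_start parameter (available in NGINX Plus) so that a recovered server receives traffic gradually
With properly configured health checks and automatic failure recovery, the availability and reliability of Nginx servers can be improved significantly.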