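A Kubernetes high-availability (HA) deployment mainly removes single points of failure from the control plane, so that the cluster keeps working when individual components or nodes fail. A typical HA architecture includes: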
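etcd is the "brain" of Kubernetes: it stores all cluster data, so it must itself be highly available:
# Install etcd on three nodes
# Example configuration for node 1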
ETCD_NAME=etcd1
ETCD_INITIAL_CLUSTER="etcd1=https://10.0.0.1:2380,etcd2=https://10.0.0.2:2380,etcd3=https://10.0.0.3:2380"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://10.0.0.1:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://10.0.0.1:2379"
ETCD_LISTEN_PEER_URLS="https://0.0.0.0:2380"
ETCD_LISTEN_CLIENT_URLS="https://0.0.0.0:2379"
ETCD_DATA_DIR="/var/lib/etcd"
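Key configuration options:
- initial-cluster: lists all members of the cluster
- initial-cluster-token: a unique identifier for the cluster
- initial-cluster-state: set to new for a new cluster, or to existing when joining an existing cluster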
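The per-node settings differ only in the member name and IP address, so they can be generated from one member list. The following is a minimal sketch under that assumption; the helper names and the token value `etcd-cluster-demo` are hypothetical, and the ports are the ones used in the example above.

```python
def initial_cluster(members: dict[str, str], peer_port: int = 2380) -> str:
    """Build the ETCD_INITIAL_CLUSTER value from a name -> IP mapping."""
    return ",".join(
        f"{name}=https://{ip}:{peer_port}" for name, ip in members.items()
    )


def node_env(name: str, members: dict[str, str], token: str,
             state: str = "new") -> dict[str, str]:
    """Return the environment variables for one etcd member."""
    if state not in ("new", "existing"):
        raise ValueError("state must be 'new' or 'existing'")
    ip = members[name]
    return {
        "ETCD_NAME": name,
        "ETCD_INITIAL_CLUSTER": initial_cluster(members),
        "ETCD_INITIAL_CLUSTER_TOKEN": token,
        "ETCD_INITIAL_CLUSTER_STATE": state,
        "ETCD_INITIAL_ADVERTISE_PEER_URLS": f"https://{ip}:2380",
        "ETCD_ADVERTISE_CLIENT_URLS": f"https://{ip}:2379",
    }


# Addresses taken from the example configuration above.
MEMBERS = {"etcd1": "10.0.0.1", "etcd2": "10.0.0.2", "etcd3": "10.0.0.3"}

if __name__ == "__main__":
    # "etcd-cluster-demo" is a placeholder token, not a required value.
    for key, value in node_env("etcd2", MEMBERS, token="etcd-cluster-demo").items():
        print(f'{key}="{value}"')
```

Every member must receive the same initial-cluster and initial-cluster-token values; only the name and the advertised URLs change per node.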
Put a load balancer in front of the API Servers, using Nginx, HAProxy, or a cloud provider's load balancer:
# Example Nginx configuration
stream {
    upstream kube_apiserver {
        server 10.0.0.1:6443;
        server 10.0.0.2:6443;
        server 10.0.0.3:6443;
    }
    server {
        listen 6443;
        proxy_pass kube_apiserver;
    }
}
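To illustrate what the load balancer does for the API Servers, here is a small conceptual sketch of round-robin selection that skips unreachable backends. It is not how Nginx is implemented; the `RoundRobin` class and `tcp_probe` function are hypothetical names chosen for this example.

```python
import itertools
import socket
from typing import Callable, Iterable, Optional

Backend = tuple[str, int]


def tcp_probe(backend: Backend, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the backend succeeds."""
    try:
        with socket.create_connection(backend, timeout=timeout):
            return True
    except OSError:
        return False


class RoundRobin:
    """Cycle through backends, skipping those the probe reports as down."""

    def __init__(self, backends: Iterable[Backend],
                 probe: Callable[[Backend], bool] = tcp_probe):
        self._backends = list(backends)
        self._cycle = itertools.cycle(self._backends)
        self._probe = probe

    def pick(self) -> Optional[Backend]:
        # Try each backend at most once per call.
        for _ in range(len(self._backends)):
            candidate = next(self._cycle)
            if self._probe(candidate):
                return candidate
        return None  # no backend is reachable
```

With the three API Servers above, a failed 10.0.0.2 would simply be skipped while requests continue to reach 10.0.0.1 and 10.0.0.3.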
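For kube-controller-manager and kube-scheduler, enable leader election with the --leader-elect flag so that only one instance is active at a time:
# /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --leader-elect=true
    - --controllers=*,bootstrapsigner,tokencleaner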
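The idea behind leader election can be sketched as a lease that one instance holds and keeps renewing. The following is a simplified in-memory model, not the actual Kubernetes implementation; the `Lease` class and `try_acquire_or_renew` function are hypothetical, and the 15-second duration is used here only as an illustrative default.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 15.0  # seconds the lease stays valid after a renewal


def try_acquire_or_renew(lease: Lease, identity: str, now: float) -> bool:
    """Return True if `identity` holds the lease after this call."""
    expired = lease.holder is None or now - lease.renew_time > lease.duration
    if lease.holder == identity or expired:
        # Either we already lead (renew) or the old leader stopped renewing (take over).
        lease.holder = identity
        lease.renew_time = now
        return True
    return False
```

A standby instance calls this function periodically and gets False for as long as the current leader keeps renewing; once renewals stop for longer than the lease duration, the next caller takes over.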
Prepare the environment
Initialize the first control-plane (master) node
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT" \
--upload-certs \
--pod-network-cidr=10.244.0.0/16
Join additional control-plane nodes
kubeadm join LOAD_BALANCER_DNS:LOAD_BALANCER_PORT \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash> \
--control-plane \
--certificate-key <key>
Join worker nodes
kubeadm join LOAD_BALANCER_DNS:LOAD_BALANCER_PORT \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
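The two join commands differ only in the --control-plane and --certificate-key flags. As a minimal sketch, a helper can assemble either form and reject obviously malformed placeholders before the command is run; the function name `join_command` and the example endpoint are hypothetical.

```python
import re
import shlex
from typing import Optional

# Bootstrap tokens have the form "<6 chars>.<16 chars>", lowercase letters and digits.
TOKEN_RE = re.compile(r"^[a-z0-9]{6}\.[a-z0-9]{16}$")
# The CA hash is a SHA-256 digest written as "sha256:<64 hex chars>".
CA_HASH_RE = re.compile(r"^sha256:[0-9a-f]{64}$")


def join_command(endpoint: str, token: str, ca_cert_hash: str,
                 certificate_key: Optional[str] = None) -> str:
    """Build a kubeadm join command; pass certificate_key for control-plane nodes."""
    if not TOKEN_RE.match(token):
        raise ValueError("malformed bootstrap token")
    if not CA_HASH_RE.match(ca_cert_hash):
        raise ValueError("malformed CA certificate hash")
    args = [
        "kubeadm", "join", endpoint,
        "--token", token,
        "--discovery-token-ca-cert-hash", ca_cert_hash,
    ]
    if certificate_key is not None:
        args += ["--control-plane", "--certificate-key", certificate_key]
    return shlex.join(args)
```

Calling `join_command(endpoint, token, ca_hash)` yields the worker form, while adding `certificate_key=...` yields the control-plane form.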
Deploy the etcd cluster
etcdctl --endpoints=https://10.0.0.1:2379 --cacert=/etc/etcd/ssl/ca.pem --cert=/etc/etcd/ssl/etcd.pem --key=/etc/etcd/ssl/etcd-key.pem endpoint health
Deploy the API Server
[Unit]
Description=Kubernetes API Server
After=etcd.service
[Service]
ExecStart=/usr/local/bin/kube-apiserver \
--advertise-address=10.0.0.1 \
--allow-privileged=true \
--apiserver-count=3 \
--audit-log-maxage=30 \
--audit-log-maxbackup=3 \
--audit-log-maxsize=100 \
--audit-log-path=/var/log/audit.log \
--authorization-mode=Node,RBAC \
--client-ca-file=/etc/kubernetes/pki/ca.crt \
--enable-admission-plugins=NodeRestriction \
--enable-bootstrap-token-auth=true \
--etcd-cafile=/etc/etcd/ssl/ca.pem \
--etcd-certfile=/etc/etcd/ssl/etcd.pem \
--etcd-keyfile=/etc/etcd/ssl/etcd-key.pem \
--etcd-servers=https://10.0.0.1:2379,https://10.0.0.2:2379,https://10.0.0.3:2379 \
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt \
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key \
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt \
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key \
--requestheader-allowed-names=front-proxy-client \
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt \
--requestheader-extra-headers-prefix=X-Remote-Extra- \
--requestheader-group-headers=X-Remote-Group \
--requestheader-username-headers=X-Remote-User \
--secure-port=6443 \
--service-account-key-file=/etc/kubernetes/pki/sa.pub \
--service-cluster-ip-range=10.96.0.0/12 \
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt \
--tls-key-file=/etc/kubernetes/pki/apiserver.key
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
For kube-controller-manager and kube-scheduler, make sure leader election is enabled:
--leader-elect=true
An HA cluster needs a reliable network plugin. Common choices:
- Calico
- Flannel
- Weave Net
- Cilium
Using Calico as an example:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Check node status
kubectl get nodes -o wide
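For automated checks, the same information is available in machine-readable form from `kubectl get nodes -o json`. The sketch below lists nodes whose Ready condition is not True; the function name and the sample node names are hypothetical.

```python
import json
import sys


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return the names of nodes whose Ready condition is not "True"."""
    bad = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            bad.append(node["metadata"]["name"])
    return bad


# Hand-written sample in the shape of a NodeList, trimmed to the fields used here.
SAMPLE = {
    "items": [
        {"metadata": {"name": "master-1"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "master-2"},
         "status": {"conditions": [{"type": "Ready", "status": "Unknown"}]}},
    ]
}

if __name__ == "__main__" and not sys.stdin.isatty():
    # Usage: kubectl get nodes -o json | python3 this_script.py
    raw = sys.stdin.read()
    if raw.strip():
        print(not_ready_nodes(json.loads(raw)))
```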
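Check component status (the componentstatuses API has been deprecated since Kubernetes v1.19, so the output may be incomplete on newer clusters)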
kubectl get cs
Simulate failures
Use kubectl drain to simulate node maintenance.
Check etcd cluster health
ETCDCTL_API=3 etcdctl --endpoints=https://10.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem \
endpoint health
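An etcd cluster of n members needs a majority (n // 2 + 1) to stay writable, so a three-member cluster tolerates one failure. The sketch below computes how many further failures the cluster can absorb from health-check results. It assumes the results are a JSON list of objects with a boolean `health` field, as produced when all endpoints are queried with JSON output; treat that shape, and the function names, as assumptions.

```python
import json


def quorum_size(members: int) -> int:
    """Smallest number of members that forms a majority."""
    return members // 2 + 1


def remaining_failure_budget(health_json: str, members: int) -> int:
    """Healthy members beyond quorum; a negative value means quorum is lost."""
    entries = json.loads(health_json)
    healthy = sum(1 for entry in entries if entry.get("health") is True)
    return healthy - quorum_size(members)


# Hand-written sample input: one of three members is unhealthy.
SAMPLE_HEALTH = json.dumps([
    {"endpoint": "https://10.0.0.1:2379", "health": True},
    {"endpoint": "https://10.0.0.2:2379", "health": True},
    {"endpoint": "https://10.0.0.3:2379", "health": False},
])
```

For the sample, the budget is 0: the cluster still has quorum, but one more failure would make it unavailable for writes.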
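Certificate management
Use kubeadm certs check-expiration (kubeadm alpha certs check-expiration on releases before v1.20) to check when certificates expire.
etcd backup and restore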
# Backup
ETCDCTL_API=3 etcdctl --endpoints=https://10.0.0.1:2379 \
--cacert=/etc/etcd/ssl/ca.pem \
--cert=/etc/etcd/ssl/etcd.pem \
--key=/etc/etcd/ssl/etcd-key.pem \
snapshot save /backup/etcd-snapshot.db
# Restore
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
--data-dir /var/lib/etcd-from-backup
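Regular backups usually go together with timestamped file names and a retention policy. The following is a minimal sketch of both, assuming snapshots are stored as `etcd-snapshot-<UTC timestamp>.db` in one directory; that naming scheme and the function names are hypothetical choices for this example.

```python
from datetime import datetime
from pathlib import Path


def snapshot_path(backup_dir: Path, now: datetime) -> Path:
    """Return a timestamped snapshot file name inside backup_dir."""
    return backup_dir / f"etcd-snapshot-{now.strftime('%Y%m%dT%H%M%SZ')}.db"


def prune_snapshots(backup_dir: Path, keep: int) -> list[Path]:
    """Delete all but the newest `keep` snapshots and return the deleted paths."""
    # The timestamp format sorts lexicographically in chronological order.
    snapshots = sorted(backup_dir.glob("etcd-snapshot-*.db"))
    stale = snapshots[:-keep] if keep > 0 else snapshots
    for path in stale:
        path.unlink()
    return stale
```

The path returned by `snapshot_path` can be passed to the snapshot save command, and `prune_snapshots` can run afterwards to cap disk usage.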
Kubernetes version upgrades
# Upgrade with kubeadm
kubeadm upgrade plan
kubeadm upgrade apply v1.xx.yy
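kubeadm upgrades are supported within the same minor version or to the next minor version; skipping a minor version is not supported. A small pre-flight check along those lines might look like the sketch below, where the function names and the version numbers in the usage note are illustrative only.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Parse "v1.28.3" or "1.28.3" into (major, minor, patch)."""
    major, minor, patch = version.lstrip("v").split(".")
    return int(major), int(minor), int(patch)


def is_supported_upgrade(current: str, target: str) -> bool:
    """True for a newer patch of the same minor, or any patch of the next minor."""
    cur, tgt = parse_version(current), parse_version(target)
    if tgt <= cur:
        return False  # downgrades and no-ops are rejected
    same_minor = tgt[:2] == cur[:2]
    next_minor = tgt[0] == cur[0] and tgt[1] == cur[1] + 1
    return same_minor or next_minor
```

For example, going from v1.28.3 to v1.29.1 passes the check, while going from v1.28.3 directly to v1.30.0 does not and would need an intermediate upgrade.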
Node planning
Monitoring and alerting
Security hardening
Documentation and drills
Split-brain
Certificate expiration
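One way to catch expiring certificates early is to turn the notAfter date into a number of remaining days and alert below a threshold. The sketch below assumes the date is available in the text form that OpenSSL prints (for example `Jan  5 09:34:43 2018 GMT`); the function name and the idea of a threshold are illustrative assumptions.

```python
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days between `now` (epoch seconds, default: current time) and notAfter."""
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / 86400
```

A negative result means the certificate has already expired.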
Performance bottlenecks
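With the configuration and best practices above, you can build a stable, reliable, highly available Kubernetes cluster that meets the needs of a production environment.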