Monitoring tools:
High-availability architecture:
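# Install Prometheus on both servers
wget https://github.com/prometheus/prometheus/releases/download/v2.45.0/prometheus-2.45.0.linux-amd64.tar.gz
tar xvfz prometheus-2.45.0.linux-amd64.tar.gz
# The trailing slash matches only the extracted directory, not the tarball
cd prometheus-*/
Edit the prometheus.yml configuration: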
global:
  scrape_interval: 15s
  evaluation_interval: 15s

# Start both instances with this same configuration
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # Monitor the other nodes
  - job_name: 'node'
    static_configs:
      - targets: ['node1:9100', 'node2:9100', 'node3:9100']
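Because the two instances scrape independently, it helps to confirm that at least one of them is ready at any given time. The sketch below probes the `/-/ready` endpoint that Prometheus exposes; the hostnames `prometheus1` and `prometheus2` are assumptions, not names taken from the setup above.

```python
"""Probe each Prometheus replica's readiness endpoint (illustrative sketch)."""
from typing import Callable, Dict, Iterable
from urllib.error import HTTPError
from urllib.request import urlopen

# Hypothetical base URLs for the two identically configured instances.
REPLICAS = ["http://prometheus1:9090", "http://prometheus2:9090"]


def http_status(url: str, timeout: float = 3.0) -> int:
    """Return the HTTP status code for url, or 0 if it cannot be reached."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except HTTPError as err:
        # Prometheus answers 503 on /-/ready while it is still starting up.
        return err.code
    except OSError:
        # Covers URLError, refused connections and timeouts.
        return 0


def check_replicas(
    urls: Iterable[str], probe: Callable[[str], int] = http_status
) -> Dict[str, bool]:
    """Map each base URL to True when its /-/ready endpoint answers 200."""
    return {url: probe(url.rstrip("/") + "/-/ready") == 200 for url in urls}


def any_ready(status: Dict[str, bool]) -> bool:
    """The pair is usable as long as at least one replica is ready."""
    return any(status.values())
```

Calling `check_replicas(REPLICAS)` from a cron job or an external watchdog gives a simple signal that is independent of either Prometheus instance. The `probe` parameter exists only so the logic can be exercised without a network.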
wget https://dl.grafana.com/oss/release/grafana-10.1.5.linux-amd64.tar.gz
tar -zxvf grafana-10.1.5.linux-amd64.tar.gz
cd grafana-10.1.5
./bin/grafana-server
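Once Grafana is running, it needs a Prometheus data source. A minimal sketch of registering one through Grafana's HTTP API (`POST /api/datasources`) follows. The data source name, the target URL, and the use of a service account token are assumptions made for illustration.

```python
"""Build a Grafana data source registration request (illustrative sketch)."""
import json
from urllib.request import Request


def prometheus_datasource(name: str, url: str, default: bool = True) -> dict:
    """Payload describing a Prometheus data source proxied through Grafana."""
    return {
        "name": name,
        "type": "prometheus",
        "url": url,
        "access": "proxy",
        "isDefault": default,
    }


def build_request(grafana_url: str, token: str, payload: dict) -> Request:
    """Prepare (but do not send) the POST request for /api/datasources."""
    return Request(
        grafana_url.rstrip("/") + "/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # hypothetical token
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. Pointing `url` at a load balancer or a query layer in front of both Prometheus instances, rather than at a single instance, keeps dashboards working when one replica is down.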
Use a remote storage solution:
- Thanos
- Cortex
- VictoriaMetrics
Taking Thanos as an example:
# Run a Thanos Sidecar alongside each Prometheus instance
# (--network host lets the sidecar reach Prometheus on localhost:9090)
docker run -d \
  --network host \
  -v /path/to/prometheus-data:/prometheus \
  -v /path/to/thanos-config:/etc/thanos \
  quay.io/thanos/thanos:v0.32.0 \
  sidecar \
  --prometheus.url=http://localhost:9090 \
  --tsdb.path=/prometheus
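With a sidecar next to each Prometheus, a Thanos query layer can merge the two replicas' data, provided each Prometheus sets a distinguishing external label. The following is a deliberately simplified illustration of that merge and is not Thanos's actual algorithm. The label name `replica` is an assumption.

```python
"""Simplified illustration of replica deduplication (not Thanos's algorithm)."""
from typing import Dict, List, Tuple

Labels = Tuple[Tuple[str, str], ...]              # sorted (name, value) pairs
Series = Dict[Labels, List[Tuple[int, float]]]    # labels -> [(timestamp, value)]


def strip_label(labels: Labels, name: str) -> Labels:
    """Drop one label so series from different replicas share a key."""
    return tuple((k, v) for k, v in labels if k != name)


def deduplicate(series: Series, replica_label: str = "replica") -> Series:
    """Merge series that differ only in the replica label.

    For each timestamp, the first replica to supply a sample wins, so a gap
    in one replica is filled from the other.
    """
    merged: Dict[Labels, Dict[int, float]] = {}
    for labels, samples in series.items():
        bucket = merged.setdefault(strip_label(labels, replica_label), {})
        for ts, value in samples:
            bucket.setdefault(ts, value)
    return {key: sorted(bucket.items()) for key, bucket in merged.items()}
```

The point of the sketch is the effect: a scrape missed by one instance (for example during a restart) is still present in the merged result as long as the other instance captured it.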
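Use an Alertmanager cluster:
# Start Alertmanager on both servers. Clustering is configured with
# command-line flags, not in alertmanager.yml. Host networking exposes
# 9093 (HTTP) and 9094 (gossip); on the second server, set
# --cluster.peer=alertmanager1:9094 instead.
docker run -d --network host \
  -v "$(pwd)/alertmanager.yml":/etc/alertmanager/alertmanager.yml \
  quay.io/prometheus/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --cluster.peer=alertmanager2:9094
Configure alertmanager.yml:
route:
  group_by: ['alertname']
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'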
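One way to confirm that the two Alertmanager instances have actually joined the same cluster is to query the `/api/v2/status` endpoint on either of them. The sketch below assumes the response's `cluster.peers` list includes the queried instance itself, so a healthy two-node cluster reports two peers.

```python
"""Check Alertmanager cluster membership via /api/v2/status (sketch)."""
import json
from urllib.request import urlopen


def fetch_status(base_url: str, timeout: float = 3.0) -> dict:
    """Fetch the status document from one Alertmanager instance."""
    with urlopen(base_url.rstrip("/") + "/api/v2/status", timeout=timeout) as resp:
        return json.load(resp)


def cluster_ok(status: dict, expected_peers: int = 2) -> bool:
    """True when the cluster reports 'ready' and lists enough peers.

    Assumption: the peers list counts the local instance as well.
    """
    cluster = status.get("cluster", {})
    return (
        cluster.get("status") == "ready"
        and len(cluster.get("peers", [])) >= expected_peers
    )
```

For example, `cluster_ok(fetch_status("http://alertmanager1:9093"))` should return `True` once both peers have settled. The hostname is the one used for the peers in this section.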
Use MySQL primary/replica replication or a Galera cluster:
# Primary server
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = zabbix
# Replica server
[mysqld]
server-id = 2
relay-log = /var/log/mysql/mysql-relay-bin.log
log_bin = /var/log/mysql/mysql-bin.log
binlog_do_db = zabbix
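Replication only protects the Zabbix database if the replica is actually keeping up. A minimal sketch of a health check follows. It takes one row of `SHOW SLAVE STATUS` (or `SHOW REPLICA STATUS` on newer MySQL releases) as a dictionary; the 30-second lag threshold is an arbitrary assumption.

```python
"""Evaluate MySQL replication health from a status row (sketch)."""


def _field(row: dict, old: str, new: str):
    """Read a column under either its legacy or its newer name."""
    return row.get(new, row.get(old))


def replication_healthy(row: dict, max_lag_seconds: int = 30) -> bool:
    """True when both replication threads run and lag is within the limit."""
    io_running = _field(row, "Slave_IO_Running", "Replica_IO_Running")
    sql_running = _field(row, "Slave_SQL_Running", "Replica_SQL_Running")
    lag = _field(row, "Seconds_Behind_Master", "Seconds_Behind_Source")
    if io_running != "Yes" or sql_running != "Yes" or lag is None:
        # A NULL lag means replication is not running at all.
        return False
    return int(lag) <= max_lag_seconds
```

Fetching the row is left to whichever MySQL client library is in use; a dictionary cursor produces input in the expected shape.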
Use a load balancer or VIP:
# Install Zabbix on both servers
wget https://repo.zabbix.com/zabbix/6.0/ubuntu/pool/main/z/zabbix-release/zabbix-release_6.0-4+ubuntu20.04_all.deb
dpkg -i zabbix-release_6.0-4+ubuntu20.04_all.deb
apt update
apt install zabbix-server-mysql zabbix-frontend-php zabbix-apache-conf zabbix-sql-scripts zabbix-agent
# On all monitored nodes
apt install zabbix-agent
Edit /etc/zabbix/zabbix_agentd.conf:
Server=zabbix-server1,zabbix-server2
ServerActive=zabbix-server1:10051,zabbix-server2:10051
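Since every monitored node needs both servers listed, a small check can catch agents that were configured before the second server existed. The sketch below parses the text of `zabbix_agentd.conf` and reports which expected servers are missing from `Server` and `ServerActive`. It assumes hostnames or IPv4 addresses, not bracketed IPv6.

```python
"""Verify that a Zabbix agent config lists every expected server (sketch)."""
from typing import Dict, List


def parse_agent_conf(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blanks and # comments."""
    params: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        params[key.strip()] = value.strip()
    return params


def hosts(value: str) -> List[str]:
    """Split a comma-separated list and drop any :port suffix."""
    return [item.strip().split(":")[0] for item in value.split(",") if item.strip()]


def missing_servers(text: str, expected: List[str]) -> Dict[str, List[str]]:
    """Return, per parameter, the expected servers that are not listed."""
    params = parse_agent_conf(text)
    missing: Dict[str, List[str]] = {}
    for key in ("Server", "ServerActive"):
        listed = set(hosts(params.get(key, "")))
        absent = [server for server in expected if server not in listed]
        if absent:
            missing[key] = absent
    return missing
```

An empty result means the agent will accept passive checks from, and send active check data to, both servers.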
System-level monitoring:
Service-level monitoring:
Business-level monitoring:
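With the approaches above, you can build a highly available monitoring environment for Linux systems, so that monitoring keeps running and continues to provide accurate data even when some components fail.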