Prometheus + Grafana:
Zabbix:
Nagios/Icinga:
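# Monitor CPU load (load averages, not a utilization percentage)
cat /proc/loadavg
# Monitor memory usage
free -m
# Monitor disk space
df -h
# Monitor service processes (replace [service_name] with the real service name)
ps aux | grep [service_name]
systemctl status [service_name]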
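The commands above only print values; an automated check also needs a threshold. A minimal sketch, assuming the POSIX `df -P` column layout (usage percentage in column 5, mount point in column 6) and mount points without spaces. The function name `disks_over` and the 90% limit are illustrative, not part of the original guide.

```shell
#!/bin/bash
# disks_over THRESHOLD: read `df -P` output on stdin and print every
# mount point whose usage percentage is at or above THRESHOLD.
# Hypothetical helper; assumes mount points contain no spaces.
disks_over() {
    awk -v limit="$1" '
        NR > 1 {                    # skip the header row
            pct = $5
            sub(/%/, "", pct)       # "91%" -> "91"
            if (pct + 0 >= limit + 0) print $6 " " pct "%"
        }'
}

# Example: df -P | disks_over 90
```

Reading from stdin rather than calling `df` inside the function keeps the parsing logic easy to try against saved output.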
wget https://github.com/prometheus/prometheus/releases/download/v2.30.3/prometheus-2.30.3.linux-amd64.tar.gz
tar xvfz prometheus-*.tar.gz
cd prometheus-2.30.3.linux-amd64
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets: ['localhost:9100']  # node_exporter port
wget https://github.com/prometheus/node_exporter/releases/download/v1.2.2/node_exporter-1.2.2.linux-amd64.tar.gz
tar xvfz node_exporter-*.tar.gz
cd node_exporter-1.2.2.linux-amd64
./node_exporter &
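Once node_exporter is running, it helps to confirm that it serves metrics before pointing Prometheus at it. A small sketch, assuming the default listen address `localhost:9100` and the `node_load1` gauge that node_exporter exposes on Linux. The helper name `metric_value` is hypothetical, and it only handles samples without labels.

```shell
#!/bin/bash
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose metric name is exactly NAME.
# Comment lines ("# HELP", "# TYPE") start with "#" and never match.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (assumes node_exporter is listening on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

An empty result suggests the exporter is not reachable or the metric name is wrong.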
wget https://github.com/prometheus/alertmanager/releases/download/v0.23.0/alertmanager-0.23.0.linux-amd64.tar.gz
tar xvfz alertmanager-*.tar.gz
cd alertmanager-0.23.0.linux-amd64
route:
  receiver: 'email-notifications'
receivers:
  - name: 'email-notifications'
    email_configs:
      - to: 'admin@example.com'
        from: 'alertmanager@example.com'
        smarthost: 'smtp.example.com:587'
        auth_username: 'alertmanager@example.com'
        auth_password: 'password'
{host:system.cpu.load[all,avg1].last()}>5
- Configure alert media types (email, SMS, webhook, etc.)
- Define alert escalation rules
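The trigger expression above fires when the most recent 1-minute load average exceeds 5. For readers without a Zabbix server at hand, the same comparison can be approximated locally. This is a rough sketch, not how Zabbix evaluates triggers: it reads the first field of `/proc/loadavg`, and the function name `load_above` and its optional file argument are assumptions made for illustration.

```shell
#!/bin/bash
# load_above THRESHOLD [FILE]: exit 0 when the 1-minute load average
# (first field of FILE, default /proc/loadavg) is greater than THRESHOLD,
# exit 1 otherwise, including when FILE is empty.
load_above() {
    awk -v limit="$1" '
        NR == 1 { high = ($1 + 0 > limit + 0) }
        END     { exit !high }' "${2:-/proc/loadavg}"
}

# Example: load_above 5 && echo "1-minute load average is above 5"
```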
#!/bin/bash
# Monitor the HTTP service (nginx) and send an email alert when it is down
SERVICE="nginx"
RECIPIENT="admin@example.com"
if systemctl is-active --quiet "$SERVICE"; then
    echo "$SERVICE is running"
else
    echo "$SERVICE is not running" | mail -s "$SERVICE service failure" "$RECIPIENT"
    systemctl restart "$SERVICE"
fi
Set up a cron job to run the check every 5 minutes:
*/5 * * * * /path/to/monitor_script.sh
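With a 5-minute schedule, a service that keeps failing produces one email per run. One possible way to limit this is a cooldown that remembers when the last alert was sent. The sketch below is an assumption layered on the script above, not part of it: the helper name `should_alert`, the state file path, and the 1800-second window are all hypothetical.

```shell
#!/bin/bash
# should_alert STATE_FILE COOLDOWN_SECONDS: exit 0 when no alert has been
# recorded in STATE_FILE within the last COOLDOWN_SECONDS, and record the
# current time so that later calls inside the window are suppressed.
should_alert() {
    local state_file="$1" cooldown="$2" now last
    now=$(date +%s)
    if [ -r "$state_file" ]; then
        last=$(cat "$state_file")
    fi
    last=${last:-0}                 # missing or empty file: never alerted
    if [ $((now - last)) -ge "$cooldown" ]; then
        echo "$now" > "$state_file"
        return 0
    fi
    return 1
}

# Example (hypothetical path and window):
#   should_alert /var/tmp/nginx_alert.last 1800 && echo "down" | mail -s "alert" admin@example.com
```

The restart attempt can still run on every check; only the notification is rate-limited.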
Layered monitoring:
Alert severity levels:
Avoiding alert fatigue:
Monitoring dashboards:
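With the approaches above, you can build a comprehensive monitoring and automatic alerting system for Linux services, so that system problems are detected and handled promptly.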