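Key log files on Linux systems are typically found at the following locations (which ones exist depends on the distribution):
- /var/log/messages: general system messages (RHEL/CentOS family)
- /var/log/syslog: system log (Debian/Ubuntu family)
- /var/log/auth.log: authentication logs (Debian/Ubuntu; RHEL-family systems use /var/log/secure)
- /var/log/kern.log: kernel log
- /var/log/dmesg: boot messages
- /var/log/nginx/: Nginx logs
- /var/log/mysql/: MySQL logs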
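# Follow the log in real time (-F keeps following across log rotation)
tail -F /var/log/syslog
# Show entries from a time window. The patterns must match the file's own
# timestamp format; traditional syslog lines look like "Oct  1 10:00:00".
sed -n '/2023-10-01 10:00/,/2023-10-01 11:00/p' /var/log/syslog
# Count lines that contain "error" (case-insensitive)
grep -ci "error" /var/log/syslog
# Filter by keyword and highlight matches
grep --color "critical" /var/log/messages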
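A sed range only works when a line matches each boundary pattern exactly: with no match for the start nothing is printed, and with no match for the end the output runs to the end of the file. As an alternative, here is a minimal Python sketch that parses each timestamp and compares it against the window. It assumes every line starts with a `YYYY-MM-DD HH:MM:SS` timestamp (traditional syslog timestamps would need a different format string), and `lines_in_window` is a hypothetical helper name.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout at the start of each line
TS_LENGTH = 19                   # len("2023-10-01 10:00:00")


def lines_in_window(lines, start, end):
    """Yield lines whose leading timestamp falls in [start, end)."""
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LENGTH], TS_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        if start <= ts < end:
            yield line


# Made-up sample lines standing in for a real log file.
sample = [
    "2023-10-01 09:59:58 host app: starting",
    "2023-10-01 10:15:02 host app: error: disk full",
    "2023-10-01 10:59:59 host app: retrying",
    "2023-10-01 11:00:00 host app: recovered",
]
window = list(
    lines_in_window(sample, datetime(2023, 10, 1, 10), datetime(2023, 10, 1, 11))
)
error_count = sum("error" in line.lower() for line in window)
print(len(window), error_count)
```

For a real file, pass an open file object as `lines`; the generator reads it line by line, so large logs do not have to fit in memory.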
rsyslog is the default logging service on many modern Linux distributions. Configuration example:
# Add to /etc/rsyslog.conf
*.emerg    /var/log/emergency.log
auth.*     /var/log/auth.log
Log rotation configuration example:
# /etc/logrotate.d/nginx
/var/log/nginx/*.log {
    daily
    missingok
    rotate 14
    compress
    delaycompress
    notifempty
    create 0640 www-data adm
    sharedscripts
    postrotate
        /usr/sbin/nginx -s reopen
    endscript
}
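Deployment architecture:
1. Filebeat collects logs and ships them to Logstash
2. Logstash filters and processes the logs
3. Elasticsearch stores and indexes the logs
4. Kibana visualizes the results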
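To make the division of labor between the four stages concrete, the following toy Python sketch mimics each role in a few lines. It is an illustration only, not how the real components are implemented: the function names `ship`, `parse`, `index` and `search` are hypothetical, and the `LEVEL message` line layout is an assumption.

```python
import re
from collections import defaultdict

LINE_RE = re.compile(r"^(?P<level>[A-Z]+) (?P<message>.*)$")  # assumed "LEVEL message" layout


def ship(lines):
    """Stage 1 (Filebeat's role): forward raw lines, dropping empty ones."""
    return (line.rstrip("\n") for line in lines if line.strip())


def parse(raw_lines):
    """Stage 2 (Logstash's role): turn raw lines into structured events."""
    for line in raw_lines:
        match = LINE_RE.match(line)
        if match:
            yield match.groupdict()


def index(events):
    """Stage 3 (Elasticsearch's role): build a word -> event ids inverted index."""
    store, inverted = [], defaultdict(set)
    for event in events:
        store.append(event)
        for word in event["message"].lower().split():
            inverted[word].add(len(store) - 1)
    return store, inverted


def search(store, inverted, word):
    """Stage 4 (Kibana's role): query the index and return matching events."""
    return [store[i] for i in sorted(inverted.get(word.lower(), ()))]


# Made-up input lines for the demonstration.
raw = ["INFO service started\n", "\n", "ERROR disk full on /var\n", "ERROR disk latency high\n"]
store, inverted = index(parse(ship(raw)))
hits = search(store, inverted, "disk")
print([hit["level"] for hit in hits])
```

Keeping collection, parsing, storage and querying separate is what lets each stage be scaled or replaced independently in the real stack.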
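Basic Filebeat configuration example:
# filebeat.yml
filebeat.inputs:
  # Newer Filebeat releases deprecate the "log" input in favor of "filestream".
  - type: log
    enabled: true
    paths:
      - /var/log/*.log
output.logstash:
  hosts: ["logstash-server:5044"]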
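Prometheus is suited to metrics monitoring and, combined with Alertmanager, provides alerting:
# Example alerting rules (kept in a separate rules file that prometheus.yml loads via rule_files)
groups:
  - name: node-alerts
    rules:
      - alert: HighCPUUsage
        # rate() is generally preferred over irate() in alert expressions
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"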
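The arithmetic behind the rule can be sketched in plain Python. The expression turns the per-second growth of the idle counter into a busy percentage, and the `for` clause keeps the alert pending until the condition has held for the whole duration. The sketch below covers a single CPU rather than the per-instance average, and `cpu_usage_percent`, `alert_states` and `for_samples` (the `for` duration expressed as a number of evaluation intervals) are hypothetical names chosen for illustration.

```python
def cpu_usage_percent(idle_start, idle_end, interval_seconds):
    """Busy percentage for one CPU from two samples of the idle-seconds counter."""
    idle_rate = (idle_end - idle_start) / interval_seconds  # fraction of time spent idle
    return 100 - idle_rate * 100


def alert_states(usages, threshold=80, for_samples=2):
    """Return 'inactive', 'pending' or 'firing' for each consecutive evaluation.

    The alert fires only after the condition has held for more than
    for_samples consecutive evaluations (an assumption standing in for 'for: 10m').
    """
    states, streak = [], 0
    for usage in usages:
        streak = streak + 1 if usage > threshold else 0
        if streak == 0:
            states.append("inactive")
        elif streak > for_samples:
            states.append("firing")
        else:
            states.append("pending")
    return states


# The idle counter grew by 30 s over a 300 s window, so the CPU was busy about 90% of the time.
usage = cpu_usage_percent(1000.0, 1030.0, 300)
states = alert_states([50, 85, 90, 95, 70, 85])
print(usage, states)
```

The short spike at the end of the sample series never leaves the pending state, which is the point of the `for` clause: brief bursts do not page anyone.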
Loki is a lightweight log aggregation system:
# Example Promtail configuration
server:
  http_listen_port: 9080
  grpc_listen_port: 0
positions:
  # /tmp is usually cleared on reboot; a persistent path avoids re-reading logs
  filename: /tmp/positions.yaml
clients:
  - url: http://loki:3100/loki/api/v1/push
scrape_configs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: varlogs
          __path__: /var/log/*log
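# Install (the package is named swatchdog on some newer releases)
sudo apt install swatch
# Example configuration (~/.swatchrc)
# Depending on the swatch version, "@" may need to be written as "\@"
watchfor /error|fail|critical/
    echo red
    bell 3
    mail addresses=admin@example.com,subject="System error alert"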
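# Monitor several log files at once, filtering each by a pattern
multitail -e "error" /var/log/syslog -e "fail" /var/log/auth.log
# Split the screen into two columns
multitail -s 2 /var/log/nginx/access.log /var/log/nginx/error.log
# Real-time HTML report (COMBINED assumes Nginx's default combined log format)
goaccess /var/log/nginx/access.log --log-format=COMBINED -o /var/www/html/report.html --real-time-html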
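Sending email alerts with mailx:
# Install a mail client (on Debian/Ubuntu, mailutils provides the mail command)
sudo apt install mailutils
# Send a test message
echo "Test message body" | mail -s "Test subject" admin@example.com
# Example script combining log monitoring with email
#!/bin/bash
LOG=/var/log/syslog
# Collect the five most recent lines that contain "error"
ERRORS=$(grep -i "error" "$LOG" | tail -n 5)
if [ -n "$ERRORS" ]; then
    echo "$ERRORS" | mail -s "System error alert $(date)" admin@example.com
fi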
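A script like the one above, when run periodically, re-sends the same error lines until newer ones push them out of the last five. One common remedy is to remember how far the log has already been read, similar in spirit to Promtail's positions file. The Python sketch below illustrates the idea under simple assumptions: `read_new_errors` and the state file layout are hypothetical, and a line that is only partially written at read time is not handled.

```python
import os
import tempfile


def read_new_errors(log_path, state_path, keyword="error"):
    """Return matching lines appended since the previous call, then save the offset."""
    offset = 0
    if os.path.exists(state_path):
        with open(state_path) as state:
            offset = int(state.read().strip() or 0)
    if offset > os.path.getsize(log_path):
        offset = 0  # the log was rotated or truncated, so start over
    with open(log_path, "rb") as log:
        log.seek(offset)
        chunk = log.read()
        new_offset = log.tell()
    with open(state_path, "w") as state:
        state.write(str(new_offset))
    lines = chunk.decode("utf-8", errors="replace").splitlines()
    return [line for line in lines if keyword in line.lower()]


# Demonstration on a temporary file instead of /var/log/syslog.
workdir = tempfile.mkdtemp()
log_path = os.path.join(workdir, "app.log")
state_path = os.path.join(workdir, "app.offset")

with open(log_path, "w") as f:
    f.write("boot ok\nERROR disk full\n")
first = read_new_errors(log_path, state_path)

with open(log_path, "a") as f:
    f.write("Error: timeout\ninfo: done\n")
second = read_new_errors(log_path, state_path)
third = read_new_errors(log_path, state_path)  # nothing new since the last call
print(first, second, third)
```

Each call reports only what was appended since the previous one, so an alert would be sent once per new error rather than on every run.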
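Telegram alert script example:
#!/bin/bash
TOKEN="your_bot_token"
CHAT_ID="your_chat_id"
MESSAGE="Critical system error detected!"
# --data-urlencode keeps spaces and special characters in the message intact
curl -s -X POST "https://api.telegram.org/bot${TOKEN}/sendMessage" \
    -d "chat_id=${CHAT_ID}" --data-urlencode "text=${MESSAGE}"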
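#!/bin/bash
# Disk space check (-P keeps each filesystem on one line for reliable parsing)
DISK_USAGE=$(df -P / | awk 'NR==2 {print $5}' | tr -d '%')
if [ "$DISK_USAGE" -gt 90 ]; then
    echo "Warning: root partition usage is ${DISK_USAGE}%" | \
        mail -s "Disk space alert" admin@example.com
fi
# Memory check: column 7 of "free -m" is "available" on current procps versions;
# column 4 ("free") ignores reclaimable cache and understates usable memory
MEM_AVAILABLE=$(free -m | awk 'NR==2 {print $7}')
if [ "$MEM_AVAILABLE" -lt 100 ]; then
    echo "Warning: only ${MEM_AVAILABLE}MB of memory available" | \
        mail -s "Memory alert" admin@example.com
fi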
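Which memory figure the check reads matters. On Linux, `MemFree` excludes page cache that the kernel can reclaim, while `MemAvailable` is the kernel's estimate of memory that new workloads can use without swapping, so a threshold on `MemFree` alone tends to raise false alarms. The sketch below shows the difference on made-up `/proc/meminfo`-style text; `parse_meminfo` and `low_memory` are hypothetical helper names.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def low_memory(meminfo, min_available_mb=100):
    """True when MemAvailable is below the threshold (falls back to MemFree)."""
    available_kb = meminfo.get("MemAvailable", meminfo.get("MemFree", 0))
    return available_kb // 1024 < min_available_mb


# Made-up sample: little free memory, but plenty available once cache is reclaimed.
sample = """MemTotal:        8000000 kB
MemFree:           51200 kB
MemAvailable:    2048000 kB
"""
info = parse_meminfo(sample)
print(low_memory(info))
```

On a real host the same functions could be fed the contents of `/proc/meminfo`.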
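# Count HTTP status codes (field 9 in the default combined log format)
awk '{print $9}' access.log | sort | uniq -c | sort -rn
# Extract entries from a time window (from the first 10:00 line through the first 11:00 line)
awk '/01\/Oct\/2023:10:00/,/01\/Oct\/2023:11:00/' access.log
# Count requests per client IP and show the top 20
awk '{print $1}' access.log | sort | uniq -c | sort -nr | head -20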
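Splitting on whitespace, as the awk one-liners do, puts the status code in field 9 only when the request line has the usual three parts; a malformed request shifts every later field. A regular expression that anchors on the brackets and quotes is more tolerant. The sketch below assumes the common/combined access log layout, and `summarize` is a hypothetical helper name; the sample lines are made up.

```python
import re
from collections import Counter

# Assumed layout: the common/combined access log format.
ACCESS_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r"(?P<status>\d{3}) (?P<size>\S+)"
)


def summarize(lines):
    """Count status codes and client IPs, skipping lines that do not parse."""
    statuses, ips = Counter(), Counter()
    for line in lines:
        match = ACCESS_RE.match(line)
        if match:
            statuses[match["status"]] += 1
            ips[match["ip"]] += 1
    return statuses, ips


sample = [
    '203.0.113.5 - - [01/Oct/2023:10:00:01 +0000] "GET / HTTP/1.1" 200 612 "-" "curl/8.0"',
    '203.0.113.5 - - [01/Oct/2023:10:00:02 +0000] "GET /missing HTTP/1.1" 404 153 "-" "curl/8.0"',
    '198.51.100.7 - - [01/Oct/2023:10:00:03 +0000] "POST /login HTTP/1.1" 200 45 "-" "Mozilla/5.0"',
    "malformed line",
]
statuses, ips = summarize(sample)
print(statuses.most_common())
print(ips.most_common(20))
```

`Counter.most_common(20)` plays the role of `sort | uniq -c | sort -nr | head -20` in the shell pipeline.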
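#!/usr/bin/perl
use strict;
use warnings;
# Count requests per client IPv4 address in the Nginx access log
open(my $log, '<', '/var/log/nginx/access.log') or die "Cannot open log: $!";
my %ip_count;
while (<$log>) {
    if (/^(\d+\.\d+\.\d+\.\d+)/) {
        $ip_count{$1}++;
    }
}
close($log);
# Print addresses sorted by request count, highest first
foreach my $ip (sort { $ip_count{$b} <=> $ip_count{$a} } keys %ip_count) {
    printf "%15s: %d\n", $ip, $ip_count{$ip};
}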
from sklearn.ensemble import IsolationForest
import numpy as np
# Assume the features have already been extracted from the logs
X = np.array([[1.1], [0.3], [0.5], [100.0], [0.4], [0.6], [120.0]])
# Train the anomaly detection model
clf = IsolationForest(random_state=42)
clf.fit(X)
# Predict anomalies
print(clf.predict(X))  # 1 means normal, -1 means anomalous
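The example above starts from features that have already been extracted. One simple way to obtain a single-column feature matrix of that shape is to count error lines per minute. The sketch below is one possible approach, not the only one: it assumes lines start with a `YYYY-MM-DD HH:MM:SS` timestamp, `errors_per_minute` is a hypothetical helper, and the sample lines are made up.

```python
from collections import Counter
from datetime import datetime


def errors_per_minute(lines, ts_format="%Y-%m-%d %H:%M:%S", ts_length=19):
    """Return one [error_count] row per minute that has any log line, in time order."""
    totals, errors = Counter(), Counter()
    for line in lines:
        try:
            ts = datetime.strptime(line[:ts_length], ts_format)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        minute = ts.replace(second=0)
        totals[minute] += 1
        if "error" in line.lower():
            errors[minute] += 1
    return [[errors[minute]] for minute in sorted(totals)]


sample = [
    "2023-10-01 10:00:05 app: started",
    "2023-10-01 10:00:40 app: error: timeout",
    "2023-10-01 10:01:10 app: ok",
    "2023-10-01 10:02:01 app: ERROR a",
    "2023-10-01 10:02:02 app: ERROR b",
    "2023-10-01 10:02:03 app: ERROR c",
]
features = errors_per_minute(sample)
print(features)
```

The resulting list of rows can be converted with `np.array(features)` and passed to `fit` and `predict` in the same way as `X`.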
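With the practices above, you can build an efficient, reliable Linux log monitoring and alerting system that detects and responds to problems promptly and helps keep the business running.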