Segmentation fault (SIGSEGV)
Out of memory (OOM)
Deadlock
Improper signal handling
Resource exhaustion
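A crash usually also shows up in the exit status that the parent shell sees. A process killed by signal N is reported as 128+N. A segmentation fault therefore appears as 139 (SIGSEGV is 11). An OOM kill, which uses SIGKILL (9), appears as 137, although 137 can also come from a manual kill -9. The sketch below decodes such a status. The function name describe_exit is hypothetical, and the signal numbers assume Linux.

```bash
#!/usr/bin/env bash
# Hypothetical helper: translate a shell exit status into a readable cause.
# Shell convention: a status above 128 means "killed by signal (status - 128)".
describe_exit() {
    local status=$1
    if [ "$status" -eq 0 ]; then
        echo "exited normally"
    elif [ "$status" -gt 128 ]; then
        local sig=$(( status - 128 )) name
        # "kill -l N" prints the signal name without the SIG prefix.
        name=$(kill -l "$sig" 2>/dev/null) || name="UNKNOWN"
        echo "killed by signal $sig (SIG$name)"
    else
        echo "exited with error code $status"
    fi
}
# Usage: /path/to/myapp; describe_exit $?
```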
# View the system logs
journalctl -xe
# or
dmesg | tail -n 50
# Logs for a specific service
journalctl -u service_name
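Kernel messages for these failures use fairly recognizable wording. Examples include "segfault at ..." for a segmentation fault, "Out of memory: Killed process ..." for the OOM killer, and "blocked for more than ... seconds" for hung tasks, which often point to a deadlock. The sketch below is a filter that keeps only such lines. The function name crash_lines and its pattern list are assumptions, and you may need to tune them for your kernel version.

```bash
#!/usr/bin/env bash
# Hypothetical filter: print log lines (read from stdin) that look like crash, OOM or hang events.
# The patterns are assumptions based on common kernel message wording.
crash_lines() {
    grep -iE 'segfault|general protection|out of memory|oom-kill|killed process|blocked for more than'
}
# Usage (reading the kernel log may require root):
#   dmesg | crash_lines
#   journalctl -k -b | crash_lines
```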
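# Enable core dumps (ulimit affects only the current shell and the processes it starts)
ulimit -c unlimited
# Setting core_pattern requires root and does not persist across reboots (%e = executable name, %p = PID)
echo "/tmp/core.%e.%p" | sudo tee /proc/sys/kernel/core_pattern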
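If a crash leaves no core file, check two things. The first is the core size limit of the crashing process. A service started by systemd takes its limits from the unit (for example LimitCORE=), not from your shell's ulimit. The second is the core_pattern value. A pattern that starts with | pipes the dump to a handler program such as systemd-coredump or apport instead of writing a file. On systemd-coredump systems, coredumpctl list and coredumpctl gdb retrieve those dumps. The helper names below are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: explain where the kernel will put a core dump.
# With no argument it reads the live value from /proc (Linux only).
describe_core_pattern() {
    local pattern=${1:-$(cat /proc/sys/kernel/core_pattern)}
    case $pattern in
        '|'*) echo "piped to handler: ${pattern#\|}" ;;
        /*)   echo "written to absolute path: $pattern" ;;
        *)    echo "written relative to the crashing process's working directory: $pattern" ;;
    esac
}

# Hypothetical helper: show the soft core size limit of *this* shell plus the pattern.
check_core_setup() {
    echo "core size limit: $(ulimit -c)"
    describe_core_pattern
}
```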
# Analyze the core dump with gdb (at the gdb prompt, bt prints the backtrace)
gdb /path/to/executable /path/to/corefile
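gdb can also run commands in batch mode, which gives a quick, scriptable look at a core file. The sketch below wraps that. The wrapper name core_backtrace is hypothetical. The pieces it uses are standard gdb options and commands: -batch, -ex, info threads and thread apply all bt. Printing every thread's stack is especially useful in the deadlock case, where threads are typically blocked waiting on locks.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: print the thread list and all backtraces from a core file, non-interactively.
core_backtrace() {
    if [ "$#" -ne 2 ]; then
        echo "usage: core_backtrace <executable> <corefile>" >&2
        return 2
    fi
    # -batch: run the -ex commands, then exit.
    # "thread apply all bt": backtrace of every thread, useful when looking for a deadlock.
    gdb -batch -ex "info threads" -ex "thread apply all bt" "$1" "$2"
}
# Usage: core_backtrace /path/to/executable /path/to/corefile > backtrace.txt
# For a hung process, "gcore <PID>" writes a core file without terminating it.
```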
# Trace system calls with strace (-f follows child processes and threads, -o writes to a file)
strace -f -o trace.log /path/to/program
# Trace library calls with ltrace (a separate log file keeps the strace output from being overwritten)
ltrace -f -o ltrace.log /path/to/program
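In a long trace, the decisive lines are usually near the end. strace marks a delivered signal as --- SIGxxx {...} --- and a fatal one as +++ killed by SIGxxx +++, so grep can pull them out. The function name trace_signals in the sketch below is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: show signal deliveries and fatal exits recorded in an strace log.
# strace writes "--- SIGSEGV {...} ---" when a signal is delivered and
# "+++ killed by SIGSEGV ... +++" when the signal terminates the process.
trace_signals() {
    local log=${1:-trace.log}
    grep -nE -e '--- SIG[A-Z0-9]+ ' -e '\+\+\+ killed by SIG' "$log"
}
# Usage: strace -f -tt -o trace.log /path/to/program; trace_signals trace.log
# To attach to a running process instead: strace -f -tt -p <PID> -o trace.log
```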
# /etc/systemd/system/myapp.service
[Unit]
Description=My Application
After=network.target
[Service]
Type=simple
ExecStart=/path/to/myapp
Restart=always
RestartSec=5
User=myuser
Group=mygroup
Environment="NODE_ENV=production"
[Install]
WantedBy=multi-user.target
Common commands (run daemon-reload after creating or editing the unit file):
systemctl daemon-reload
systemctl start myapp
systemctl enable myapp
systemctl status myapp
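With Restart=always, a binary that crashes on startup is restarted indefinitely. systemd's start rate limit only stops this when more than StartLimitBurst starts happen within StartLimitIntervalSec. The usual defaults are 5 starts in 10 seconds, which RestartSec=5 never reaches. A drop-in file can widen that window and can also raise the core size limit for the service. The sketch below assumes a drop-in named crash-handling.conf (a hypothetical name) and uses illustrative values of 5 starts within 300 seconds. The helper writes to a directory given as an argument, so you can try it without root.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a systemd drop-in for myapp.service into directory $1.
# In production $1 would be /etc/systemd/system/myapp.service.d (requires root).
write_myapp_dropin() {
    local dir=$1
    mkdir -p "$dir" || return 1
    cat > "$dir/crash-handling.conf" <<'EOF'
[Unit]
# Give up (the unit enters "failed") after 5 starts within 300 seconds.
StartLimitIntervalSec=300
StartLimitBurst=5

[Service]
# Allow core dumps of any size; service limits do not come from your shell's ulimit.
LimitCORE=infinity
EOF
}
# Usage (as root):
#   write_myapp_dropin /etc/systemd/system/myapp.service.d
#   systemctl daemon-reload && systemctl restart myapp
```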
# /etc/supervisor/conf.d/myapp.conf
[program:myapp]
command=/path/to/myapp
directory=/path/to/app/dir
user=myuser
autostart=true
autorestart=true
startretries=3
stderr_logfile=/var/log/myapp.err.log
stdout_logfile=/var/log/myapp.out.log
environment=NODE_ENV="production"
# /etc/monit/conf.d/myapp
check process myapp matching "/path/to/myapp"
    start program = "/bin/systemctl start myapp"
    stop program = "/bin/systemctl stop myapp"
    if failed port 8080 protocol http
        request /
        with timeout 5 seconds
        then restart
    if cpu > 80% for 2 cycles then alert
    if cpu > 95% for 5 cycles then restart
    if 3 restarts within 5 cycles then unmonitor
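When running under Docker, set a restart policy (unless-stopped restarts the container whenever it exits, unless it was stopped explicitly):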
docker run --restart unless-stopped -d myapp
In Kubernetes, add a liveness probe; when the probe fails, the kubelet restarts the container:
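apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  # apps/v1 requires a selector that matches the pod template's labels
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myapp:latest
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10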
#!/bin/bash
# Minimal watchdog: every 10 seconds, restart myapp if no process matches its path
while true; do
    if ! pgrep -f "/path/to/myapp" > /dev/null; then
        echo "$(date): Process not running, restarting..." >> /var/log/myapp_monitor.log
        /path/to/myapp &
    fi
    sleep 10
done
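The polling loop above has two weaknesses. It depends on pgrep -f pattern matching, which can also match unrelated processes whose command line contains the same string. It also never learns why the program stopped. The alternative sketch below keeps the program as a direct child, waits for it, logs its exit status, and gives up after a fixed number of restarts. The function supervise and its argument order are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical alternative to polling with pgrep: run the program as a child,
# record how it exited, and restart it up to a fixed number of times.
# Arguments: max_restarts delay_seconds command [args...]
supervise() {
    local max=$1 delay=$2 restarts=0 status
    shift 2
    while true; do
        status=0
        "$@" || status=$?
        echo "$(date '+%F %T'): '$1' exited with status $status"
        if [ "$restarts" -ge "$max" ]; then
            echo "giving up after $restarts restarts"
            return 1
        fi
        restarts=$(( restarts + 1 ))
        sleep "$delay"
    done
}
# Usage: supervise 5 10 /path/to/myapp >> /var/log/myapp_monitor.log 2>&1
```

Because the loop waits on the child directly, the logged status can be decoded (for example 139 for a segfault) instead of only noticing that the process is gone.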
Code quality
Resource limits
# Limit virtual memory (value in KiB; 1000000 is about 1 GB)
ulimit -v 1000000
# Limit the number of open file descriptors
ulimit -n 4096
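ulimit changes apply to the shell that runs them and to everything it starts afterwards, so typing them in an interactive shell restricts that shell too. One common pattern is to set the limits in a subshell and exec the program from it, so only that program inherits them. The name run_limited in the sketch below is hypothetical. The helper sets soft limits (-S) only, so the program can still raise them up to the hard limit. For systemd services, the corresponding unit directives are LimitNOFILE= and LimitAS=.

```bash
#!/usr/bin/env bash
# Hypothetical launcher: apply resource limits only to the launched program.
# Arguments: nofile_limit vmem_kib command [args...]
run_limited() {
    local nofile=$1 vmem=$2
    shift 2
    (
        # Soft limits only, set inside a subshell so the calling shell is unaffected.
        ulimit -S -n "$nofile" || exit 1
        ulimit -S -v "$vmem" || exit 1   # KiB of virtual address space
        exec "$@"
    )
}
# Usage: run_limited 4096 1000000 /path/to/myapp
```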
Graceful shutdown handling
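Service managers such as systemd, Docker and Kubernetes stop a process by sending SIGTERM first and SIGKILL only after a grace period, so the SIGTERM handler is the place to release resources. The sketch below shows this in shell; the worker loop and the cleanup body are placeholders. The blocking step runs in the background and the loop waits for it. That way the trap runs as soon as the signal arrives, whereas a foreground command would delay the trap until that command finished.

```bash
#!/usr/bin/env bash
# Minimal sketch of graceful shutdown for a shell-based worker.
# The 1-second sleep stands in for real work; cleanup is a placeholder.

cleanup() {
    # Release resources here: flush buffers, remove PID/lock files, close connections.
    echo "received SIGTERM, cleaning up"
    if [ -n "${child:-}" ]; then
        kill "$child" 2>/dev/null
    fi
}

worker() {
    # On SIGTERM: clean up, then report a clean shutdown with status 0.
    trap 'cleanup; exit 0' TERM
    while true; do
        # Background the blocking step and wait for it, so the trap fires immediately.
        sleep 1 &
        child=$!
        wait "$child"
    done
}
# Usage: worker & ... later: kill -TERM <PID of worker>
```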
Monitoring and alerting
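A simple check can complement a monitoring system. It reads a process's resident memory from /proc/<PID>/status (the VmRSS field, in kB) and alerts when the value crosses a threshold, which gives early warning of the OOM case. The function names in the sketch below, the threshold, and the use of logger for delivering alerts are all assumptions. The sketch is Linux only.

```bash
#!/usr/bin/env bash
# Hypothetical check: resident memory of a PID in kB, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Print an alert line and return 1 when the RSS of PID $1 exceeds $2 kB.
# Returns 2 if the value cannot be read (no such PID, or a kernel thread).
check_rss() {
    local pid=$1 limit_kb=$2 rss
    rss=$(rss_kb "$pid") || return 2
    if [ -z "$rss" ]; then
        echo "ALERT: cannot read memory of PID $pid"
        return 2
    fi
    if [ "$rss" -gt "$limit_kb" ]; then
        echo "ALERT: PID $pid uses ${rss} kB (limit ${limit_kb} kB)"
        return 1
    fi
    echo "OK: PID $pid uses ${rss} kB"
}
# Usage from cron, sending alerts to syslog:
#   msg=$(check_rss "$(pgrep -o -f /path/to/myapp)" 800000) || logger -t myapp-monitor "$msg"
```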
With the methods and tools above, you can diagnose and fix Linux process crashes effectively and keep critical services available.