Docker provides a built-in automatic restart mechanism, configured with the --restart flag:
docker run --restart=always my-container
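Available restart policies: no (the default, never restart), on-failure[:max-retries] (restart only after a non-zero exit code), always (restart regardless of exit code), and unless-stopped (like always, except that a container stopped explicitly stays stopped).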
Check the restart policy of an existing container:
docker inspect -f "{{ .HostConfig.RestartPolicy.Name }}" container_name
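The policy of an existing container can also be changed in place with `docker update --restart`. As a minimal sketch, the helper below builds a valid `--restart` value and rejects unknown policy names. The function name `restart_flag` and its argument order are assumptions made for illustration; they are not part of Docker.

```shell
#!/bin/sh
# restart_flag POLICY [MAX_RETRIES]
# Prints a --restart flag suitable for docker run or docker update.
# Hypothetical helper: name and interface are illustrative only.
restart_flag() {
    policy=$1
    retries=${2:-}
    case "$policy" in
        no|always|unless-stopped)
            printf '%s\n' "--restart=$policy"
            ;;
        on-failure)
            # on-failure optionally takes a retry limit after a colon
            if [ -n "$retries" ]; then
                printf '%s\n' "--restart=on-failure:$retries"
            else
                printf '%s\n' "--restart=on-failure"
            fi
            ;;
        *)
            echo "unknown restart policy: $policy" >&2
            return 1
            ;;
    esac
}
```

A possible use is `docker update "$(restart_flag on-failure 5)" container_name`, which limits automatic restarts to five attempts.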
Add a health check in the Dockerfile or in the run command:
HEALTHCHECK --interval=30s --timeout=3s \
CMD curl -f http://localhost/ || exit 1
Or at run time:
docker run --health-cmd="curl -f http://localhost || exit 1" \
--health-interval=5m \
--health-timeout=3s \
my-image
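Once a health check is configured, Docker records the result under `.State.Health.Status` as `starting`, `healthy`, or `unhealthy`. The sketch below polls that field until the container is healthy. The function name `wait_healthy` and the 60-second default timeout are assumptions, not Docker features; a container without a health check has no such field, so the loop simply runs until the timeout.

```shell
#!/bin/sh
# wait_healthy CONTAINER [TIMEOUT_SECONDS]
# Polls the health status reported by docker inspect.
# Returns 0 once the status is "healthy", 1 on "unhealthy" or timeout.
# Hypothetical helper: name and default timeout are illustrative only.
wait_healthy() {
    container=$1
    timeout=${2:-60}
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        status=$(docker inspect -f '{{.State.Health.Status}}' "$container" 2>/dev/null)
        case "$status" in
            healthy)
                return 0
                ;;
            unhealthy)
                echo "$container is unhealthy" >&2
                return 1
                ;;
        esac
        # Status is "starting" or unavailable: wait and try again
        sleep 1
        elapsed=$((elapsed + 1))
    done
    echo "timed out waiting for $container" >&2
    return 1
}
```

For example, a deployment script might run `wait_healthy my-container 120 || docker logs my-container` to print the logs when the container never becomes healthy.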
Use docker logs to view container logs, and limit log size with rotation options so that logs do not fill the disk:
docker run --log-driver=json-file --log-opt max-size=10m --log-opt max-file=3 my-container
Prevent failures caused by resource exhaustion:
docker run -m 512m --memory-swap=1g --cpus=1.5 my-container
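When a container exceeds its memory limit, the kernel kills it and Docker records this in `.State.OOMKilled`. The sketch below checks that field so a script can tell an out-of-memory kill from an ordinary crash. The function name `was_oom_killed` is an assumption made for illustration.

```shell
#!/bin/sh
# was_oom_killed CONTAINER
# Succeeds (exit status 0) if Docker reports that the container
# was killed for exceeding its memory limit.
# Hypothetical helper: the name is illustrative only.
was_oom_killed() {
    [ "$(docker inspect -f '{{.State.OOMKilled}}' "$1" 2>/dev/null)" = "true" ]
}
```

A possible use is `was_oom_killed my-container && echo "consider raising the -m limit"`.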
Use a lightweight process manager such as supervisor inside the container:
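# Refresh the package index first, otherwise the install can fail on a clean image
RUN apt-get update && apt-get install -y supervisor
COPY supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# -n keeps supervisord in the foreground so the container does not exit immediately
CMD ["/usr/bin/supervisord", "-n"]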
In production, use an orchestration tool for more robust recovery: Docker Swarm, Kubernetes, or Nomad.
# docker-compose.yml: restart policy combined with a health check
version: '3'
services:
  web:
    image: nginx
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost"]
      interval: 30s
      timeout: 3s
      retries: 3
#!/bin/bash
# Watchdog: attach to the container and rebuild it if it exits with an error
while true; do
    docker start -a my-container || {
        echo "Container failed, taking corrective action..."
        # Custom recovery logic
        docker rm -f my-container
        docker run -d --name my-container --restart=unless-stopped my-image
    }
    sleep 10
done
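# Restart any container as soon as Docker reports that it died.
# Note: the die event also fires for containers that were stopped on purpose.
docker events --filter 'event=die' --format '{{.ID}}' | while read -r container_id; do
    echo "Container $container_id died, restarting..."
    docker restart "$container_id"
done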
Check the container's exit code:
docker inspect -f '{{.State.ExitCode}}' container_name
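Exit codes follow common shell conventions: values above 128 usually mean the process was killed by signal number (code minus 128). The sketch below turns a code into a short description. The function name `explain_exit_code` and the wording of the messages are assumptions for illustration, and the mapping reflects convention rather than a guarantee, since an application may return any code it likes.

```shell
#!/bin/sh
# explain_exit_code CODE
# Prints the conventional meaning of a container exit code.
# Hypothetical helper: name and messages are illustrative only.
explain_exit_code() {
    code=$1
    case "$code" in
        0)   echo "exited normally" ;;
        126) echo "command found but could not be invoked" ;;
        127) echo "command not found" ;;
        137) echo "killed by SIGKILL (128+9), often the OOM killer or docker kill" ;;
        143) echo "terminated by SIGTERM (128+15), for example by docker stop" ;;
        *)
            # Non-numeric input makes the test fail and falls through to else
            if [ "$code" -gt 128 ] 2>/dev/null; then
                echo "killed by signal $((code - 128))"
            else
                echo "application error (exit code $code)"
            fi
            ;;
    esac
}
```

It can be combined with the inspect command above, for example `explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' container_name)"`.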
View the command the container was configured to run:
docker inspect -f '{{.Config.Cmd}}' container_name
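Inspect the state of a stopped container by committing it to a temporary image and starting a shell in that image: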
docker commit container_name temp-image
docker run -it --entrypoint=sh temp-image
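With well-configured restart policies, health checks, and monitoring, Docker containers become considerably more reliable and better able to recover on their own.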